Development of a Decision Support Tool for the Selection of a Data Conversion Tool

Academic Advisor: Jonas Oppenländer
Discipline: Web development, software engineering, semantic web technologies, decision support systems
Degree: BSc. (concrete task) / MSc. (with a broader scientific contribution)
Project:

Requirements:

  • Knowledge of creating web applications

Contents

Context of the work/project

The conversion of existing tabular data (in CSV, SQL, or flat JSON format) is often the starting point for developing Linked Data applications. Linked Data [1] refers to a set of principles for publishing information in a machine-readable form, e.g. in RDF [2].
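To make the conversion step concrete, the following minimal sketch turns CSV rows into RDF triples in N-Triples syntax using only the Python standard library. It is an illustration under stated assumptions, not one of the conversion tools discussed here: the `http://example.org/` namespace, the function names, and the sample data are all hypothetical, and every cell value is emitted as a plain string literal.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, chosen for illustration only.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Map each CSV row to one subject and each non-empty cell to one triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}item/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or value == "":
                continue  # the id becomes the subject; empty cells yield no triple
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Made-up sample data.
sample = "id,name,city\n1,Ada,London\n2,Grace,\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

Real conversion tools additionally deal with datatypes, links between resources, and user-defined mappings, which is where their capabilities start to differ.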

What is the problem

A number of tools are available for converting existing tabular data into RDF. Software engineers and data scientists face the difficult task of selecting the right tool for their specific application requirements. Changing tools later in the project life cycle may be time-consuming and therefore expensive.

Objectives of this thesis

This work is concerned with supporting and facilitating the selection of a data conversion tool, given a set of requirements.

Previous work at the HCC group

In a previous study, we identified a number of conversion tools. Based on the requirements of a concrete research project (IKON) and on past experience, we formulated a set of requirements that the conversion tools must fulfill. We analysed the documentation of each tool and tested whether the tool fulfills these requirements. A number of tools were also installed and tested with sample data. The test results may, however, contain errors. Based on these test results, the tools were classified into groups, which resulted in a first version of a decision tree.

Suggested procedure for the BSc thesis

  • Creation of a set of sample data that is able to address and test all of the given requirements
  • Installation and familiarization with the conversion tools
  • Testing of each tool with the sample data
  • Analysis of the results
  • Development of a web application that supports the selection of the right tool for a given set of requirements
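As a rough sketch of the selection logic such a web application might build on, the example below filters a catalogue of tools by a set of required features. Everything in it is an assumption made for illustration: the tool names (`ToolA`, `ToolB`, `ToolC`) and feature labels are placeholders, not results of the previous study, and a plain subset test stands in for whatever decision model the thesis eventually adopts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    features: frozenset[str]  # requirements the tool was found to fulfill


def select_tools(tools: list[Tool], required: set[str]) -> list[Tool]:
    """Return the tools that fulfill every required feature, sorted by name."""
    needed = frozenset(required)
    return sorted((t for t in tools if needed <= t.features), key=lambda t: t.name)


def missing_features(tool: Tool, required: set[str]) -> list[str]:
    """List the required features a tool lacks, e.g. to explain a rejection."""
    return sorted(frozenset(required) - tool.features)


# Placeholder catalogue; names and features are hypothetical.
TOOLS = [
    Tool("ToolA", frozenset({"csv_input", "sql_input"})),
    Tool("ToolB", frozenset({"csv_input", "json_input", "custom_mapping"})),
    Tool("ToolC", frozenset({"csv_input"})),
]

wanted = {"csv_input", "json_input"}
print([t.name for t in select_tools(TOOLS, wanted)])
print(missing_features(TOOLS[0], wanted))
```

Reporting the missing features alongside the matches lets a user see which requirement to relax when no tool fulfills all of them.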

Suggested procedure for the MSc thesis

In addition to the above, we can provide a real use case in collaboration with the Natural History Museum [3] in Berlin. The tools will be tested with this real data. A suitable method for comparing the conversion results will need to be developed. Using real data instead of the sample data will help uncover additional application requirements and extend the existing set.
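One conceivable starting point for such a comparison, offered only as a sketch and not as the method to be developed, is to treat each tool's output as a set of triples and measure the overlap. The example assumes both outputs are serialized as N-Triples with one triple per line, contain no blank nodes (which would require a graph isomorphism check), and use identical IRIs and literal forms. The function names and sample triples are hypothetical.

```python
def parse_lines(ntriples: str) -> set[str]:
    """Collect the non-empty, non-comment lines of an N-Triples document as a set."""
    return {
        line.strip()
        for line in ntriples.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


def compare_outputs(output_a: str, output_b: str) -> dict:
    """Report triples unique to each output and the Jaccard similarity of both sets."""
    a, b = parse_lines(output_a), parse_lines(output_b)
    union = a | b
    return {
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Made-up outputs of two hypothetical tools for the same input row.
out_a = (
    '<http://example.org/item/1> <http://example.org/property/name> "Ada" .\n'
    '<http://example.org/item/1> <http://example.org/property/city> "London" .\n'
)
out_b = (
    '<http://example.org/item/1> <http://example.org/property/name> "Ada" .\n'
    '<http://example.org/item/1> <http://example.org/property/town> "London" .\n'
)
report = compare_outputs(out_a, out_b)
print(report["only_in_a"], report["only_in_b"], report["jaccard"])
```

A purely syntactic comparison like this flags differences in vocabulary or IRI patterns as mismatches even when the outputs are semantically equivalent, which is one reason a more suitable method is needed.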

References

[1] Heath, T., Bizer, C. (2011): Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.

[2] Schreiber, G., Raimond, Y. (eds.) (2014): RDF 1.1 Primer. https://www.w3.org/TR/rdf11-primer/

[3] https://www.naturkundemuseum.berlin