Workshop "Data Quality Management in Wikidata"

News from Nov 13, 2018

On January 18th, from 9 AM to 5 PM, the workshop "Data Quality Management in Wikidata" will take place at Wikimedia Germany e. V.

Abstract

Within a few years, Wikidata has developed into a central knowledge base for structured data through the collaborative efforts of its peer production community. One of the benefits of peer production is that knowledge is curated and maintained by a wide range of editors with different cultural, experiential, and educational backgrounds, which ideally results in fewer biases and, content-wise, in a more diverse knowledge base.

Ensuring data quality is thus of utmost importance: the goal of Wikidata is to “give more people more access to knowledge” [1], and therefore the data needs to be “fit for use by data consumers” [2]. The Wikidata community has already developed methods and tools that monitor relative completeness (e.g., the Recoin gadget [3]), encourage link validation and correction (e.g., Mix’N’Match [4]), and help editors observe recent changes and identify vandalism [5]. Moreover, the community has started a global discussion about relevant dimensions of data quality in a recent RFC that used a survey of Linked Data quality assessment methods [6] as its starting point to better describe and categorize quality issues and to add further quality aspects and dimensions, with the goal of developing a data quality framework for Wikidata [7]. These quality dimensions fall into several categories, with intrinsic and contextual dimensions being the most crucial. Despite this progress, recent research has shown the dominant role of a Western perspective in the represented languages [8]; more work therefore needs to be done to strive for greater knowledge diversity. It is consequently a major concern of data quality management to support such knowledge diversity and to ensure that Wikidata covers a wide variety of topics from a broad range of trustworthy sources, even where their facts contradict each other.

In this workshop, we would like to emphasize this perspective and discuss existing challenges and opportunities in the field of data quality monitoring and data quality assurance in the context of Wikidata. We would especially like to focus on Wikidata’s unique characteristics: its central role in a network of knowledge bases and other peer production projects (like Wikipedia), its ability to host plural statements and illustrate misinformation from Web information sources, its multilinguality, its community of humans and machines, as well as its dynamicity.

The workshop will give scientific researchers and community members the opportunity to discuss and present preliminary findings, ideas, opinions and demos.

Registration: here

Location: Wikimedia Germany e. V., Tempelhofer Ufer 23-24, 10963 Berlin, Germany

For further information, click here.


References:

  1. https://tinyurl.com/y8hnq3rj
  2. R. Y. Wang and D. M. Strong, “Beyond Accuracy: What Data Quality Means to Data Consumers,” J Manage Inf Syst, vol. 12, no. 4, pp. 5–33, Mar. 1996.
  3. V. Balaraman, S. Razniewski, and W. Nutt, “Recoin: Relative Completeness in Wikidata,” in Companion Proceedings of the The Web Conference 2018, Republic and Canton of Geneva, Switzerland, 2018, pp. 1787–1792.
  4. https://meta.wikimedia.org/wiki/Mix%27n%27match
  5. Lydia Pintscher (2018). Data Quality in Wikidata. Wikimania 2018. https://commons.wikimedia.org/wiki/File:Wikimania_2018_-_data_quality_in_Wikidata_poster.pdf
  6. A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer, “Quality Assessment for Linked Data: A Survey,” Semantic Web, vol. 7, no. 1, pp. 63–93, 2016.
  7. Wikidata:Requests for comment/Data quality framework for Wikidata
  8. L.-A. Kaffee and E. Simperl, “Analysis of Editors’ Languages in Wikidata,” in Proceedings of the 14th International Symposium on Open Collaboration, New York, NY, USA, 2018, pp. 21:1–21:5.