Results of the Workshop: "Data Quality Management in Wikidata"



News from Jan 21, 2019

The workshop “Data Quality Management in Wikidata” was held on January 18, 2019 at Wikimedia Germany. It brought together members of the Wikidata community and scholars interested in, and working on, monitoring and improving data quality on Wikidata.


The workshop began with a welcome note from two of its organizers, Claudia Müller-Birn from Freie Universität Berlin and Cristina Sarasua from University of Zurich.

The first keynote speaker, Amrapali Zaveri, then gave a talk on “Open Data Quality: dimensions, metrics, assessment and improvement” (video, slides). Afterwards, the workshop participants introduced themselves in a round-robin session and formed three discussion groups.

The workshop was designed as a discussion forum organized in three sprints: one to collectively identify the key data quality challenges in Wikidata, a second to brainstorm solutions to the identified challenges, and a third to discuss ways to prioritize the next activities.

The main challenges discussed within the groups included the velocity of Wikidata’s schema, the diverging meaning of items in various languages, subtle vandalism, measuring completeness without introducing bias, and extending references and sources. Suggested solutions included enforcing the addition of references in order to get more of them and, to overcome the language challenge of Wikidata, introducing a record button to allow spoken languages.

After each sprint, participants presented their ideas and shared them with all participants. Based on the discussions in the room, the participants founded the WikidataProject: DataQuality.

The workshop was summarized by the second keynote speaker, Daniel Mietchen (video, slides).

