Hong Zhu:

Analyzing Behavioural Patterns in Online Knowledge Collaborations: A Case Study of Wikidata


  • Database Management
  • Python
  • Web Service
  • Sequential Pattern Mining
Data Processing, Data Mining
Bachelor of Science (B.Sc.)



A knowledge base is a technology for storing complex structured data used by a computer system. Online knowledge collaboration is becoming a primary way to create rich knowledge-based goods, because it allows distributed, self-organized members to work towards shared goals. Defined broadly as the sharing, accumulation, transformation, and co-creation of knowledge [1], online knowledge collaboration has grown in importance in an age of information explosion, and it largely makes up for the shortcomings of traditional organizational mechanisms, given that stable membership, persistent interaction, or shared goals are often absent [2]. At the same time, its rapid development increases the complexity of structured knowledge representations, since no single authority can develop all of them any longer. Hence, it is an important task to better understand and regulate the underlying co-production processes by which users collaboratively edit knowledge bases [3].


Existing studies have investigated the structure of online knowledge collaboration predominantly from a static perspective, overlooking the importance of behavioural interaction patterns in exploring the temporal dynamics of collaboration. The availability of metadata from large-scale collaborative ontology projects such as Wikidata has bridged the gap between sequence analysis theory and temporal dynamics. However, studies of sequences are often restricted to a single perspective, e.g. contributors or activities, instead of identifying patterns across multiple dimensions by combining these factors into event log sequences.


The goal of this paper is to employ and extend the existing framework for analyzing editing motifs [2] by adapting it to the Wikidata domain, and to outline a conceptualization for studying sequential editing behaviours that places more emphasis on temporal dynamics in knowledge collaboration systems. To illustrate the extended methodology and to verify that it can answer the research question (which way of identifying sequences is most effective in terms of data quality?), a case study is carried out using various sequence identifications. This helps us better understand the complex processes in terms of contributor relationships, co-production patterns and sequential consequences.


The work is organized as follows:

  • Survey related research on online knowledge collaboration, especially in the area of Wikidata, and on sequence analysis.
  • Describe a methodological framework for investigating Wikidata’s edit behaviours, augmented from existing work. General approaches corresponding to diverse research directions are presented in detail.
  • Investigate a case study that applies the proposed framework: a data sample of 500 Wikidata items, together with their metadata, is collected and constructed into sequences. The collected data and the corresponding editing behaviours are then examined empirically by mining sequential patterns from a data quality perspective.
  • Conclude the study and discuss future work.
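
The sequence construction mentioned in the case study can be sketched roughly as follows. This is only an illustrative sketch and not the implementation used in this work: the edit records, the field layout (item, timestamp, contributor type, action), the action names and the helper functions `build_sequences` and `frequent_ngrams` are all hypothetical. For brevity, counting contiguous n-grams stands in for a full sequential pattern mining algorithm, which would typically also allow gaps between events.

```python
from collections import Counter, defaultdict

# Hypothetical edit records: (item_id, timestamp, contributor_type, action).
# Real Wikidata revision metadata is richer; these values are made up.
edits = [
    ("Q1", 1, "human", "add_claim"),
    ("Q1", 2, "bot", "add_reference"),
    ("Q1", 3, "human", "edit_label"),
    ("Q2", 1, "human", "add_claim"),
    ("Q2", 2, "bot", "add_reference"),
    ("Q3", 1, "bot", "add_claim"),
    ("Q3", 2, "bot", "add_reference"),
]


def build_sequences(records, encode):
    """Group edits by item, order them by timestamp, and encode each edit
    as one event symbol using the given encode(contributor, action)."""
    by_item = defaultdict(list)
    for item_id, timestamp, contributor, action in records:
        by_item[item_id].append((timestamp, encode(contributor, action)))
    return {
        item: [event for _, event in sorted(events, key=lambda e: e[0])]
        for item, events in by_item.items()
    }


def frequent_ngrams(sequences, n, min_support):
    """Return contiguous length-n patterns that occur in at least
    min_support sequences (support = number of sequences, not occurrences)."""
    support = Counter()
    for seq in sequences.values():
        patterns = {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
        support.update(patterns)
    return {p: c for p, c in support.items() if c >= min_support}


# Two alternative sequence identifications over the same edits:
# a single perspective (activities only) and a combined one
# (contributor type together with activity).
action_seqs = build_sequences(edits, lambda contributor, action: action)
combined_seqs = build_sequences(
    edits, lambda contributor, action: f"{contributor}:{action}"
)

print(frequent_ngrams(action_seqs, n=2, min_support=2))
print(frequent_ngrams(combined_seqs, n=2, min_support=2))
```

Encoding the same edits in different ways changes which patterns become visible: in this toy data, the activity-only view reports that adding a claim is followed by adding a reference on all three items, while the combined view additionally shows that on two of them a human claim is followed by a bot reference.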


[1] Samer Faraj, Sirkka L. Jarvenpaa, and Ann Majchrzak. “Knowledge Collaboration in Online Communities”. In: Organization Science 22.5 (Sept. 2011), pp. 1224–1239. issn: 1526-5455. doi: 10.1287/orsc.1100.0614. url: https://doi.org/10.1287/orsc.1100.0614.

[2] Brian C. Keegan, Shakked Lev, and Ofer Arazy. “Analyzing Organizational Routines in Online Knowledge Collaborations: A Case for Sequence Analysis in CSCW”. In: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. CSCW ’16. San Francisco, California, USA: ACM, 2016, pp. 1065–1079. isbn: 978-1-4503-3592-8. doi: 10.1145/2818048.2819962. url: http://doi.acm.org/10.1145/2818048.2819962.

[3] Simon Walk, Philipp Singer, and Markus Strohmaier. “Sequential Action Patterns in Collaborative Ontology-Engineering Projects: A Case-Study in the Biomedical Domain”. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014. 2014, pp. 1349–1358. doi: 10.1145/2661829.2662049. url: https://doi.org/10.1145/2661829.2662049.