Schaufenster: Mariam Farda-Sarbas

Image Credit: Mariam Farda-Sarbas

Mariam is a DAAD scholarship holder and researcher at the Human-Centered Computing Research Group. Her work focuses on data and user diversity in Wikidata.

In December 2017, I arrived in Germany from my home country, Afghanistan, to pursue my Ph.D. Although I had previously earned my master's degree in Germany in 2010, embarking on a Ph.D. journey in a foreign country still proved to be a profound experience that intertwined personal growth with academic exploration. Having submitted my dissertation in September 2023, and while preparing for my defense in April 2024, I find it is time to reflect on the academic and personal journey that I have completed over the last few years.

Exploring Multilingualism in the Semantic Web Landscape

I started my Ph.D. with a topic that focused on multilingualism in the vast domain of the Semantic Web. Gradually, I narrowed my focus to the application domain of Wikidata, an open and structured knowledge base. Bringing these two fields of interest together, I decided to investigate the multilingual dimension of Wikidata. I approached this field by first conducting a mapping study of the related research landscape. The study revealed dominant research topics, areas of limited research, and notable gaps. It allowed me to map and understand ongoing research on multilingualism in the Wikidata context and drew my attention to the overarching topic of knowledge diversity, which became the focus of my Ph.D. thesis.

Intrigued by Wikidata's aspirations to become a global knowledge base, I observed its escalating popularity, particularly as a foundational data source for numerous projects. However, a notable challenge was emerging simultaneously: the significant impact of automated edits by a limited group of bots (automated accounts run by operators) that overshadowed the contributions of tens of thousands of human users. Seen against the backdrop of Wikidata's ambition to become a global knowledge base, the impact of automated bot edits revealed an interesting tension: a platform that is primarily driven by contributions from bots raises questions about the actual diversity and quality of the global knowledge it seeks to serve.

Motivated to explore the impact of bot edits on Wikidata's global knowledge representation, I wondered how this automated curation could authentically reflect the diversity of the world's population. As someone from an Eastern background working on a predominantly Western project, I also sought to understand how Wikidata could be used to better serve my language and culture.

My research evolved into an in-depth exploration of bots, diversity, and data quality in the context of Wikidata. Together with a group of fellow students, I collected, preprocessed, and analyzed bot request pages, leading to the publication of my first paper. This marked the beginning of a broader endeavor: creating a dataset of Wikidata's edit history to evaluate the impact of bot edits on data quality compared to human edits. The focus of my research remained on diversity, which I addressed by introducing a diversity measurement concept from two main angles: user and data. I subsequently analyzed the impact of bots on the existing diversity levels of Wikidata.

Navigating Unforeseen Challenges: The Pandemic and Personal Struggles

The year 2020 brought unforeseen challenges with the onset of the COVID-19 pandemic, which disrupted research plans, deadlines, and schedules. Despite quarantine restrictions and the responsibility of caring for my child at home, I continued to explore the concept of diversity in the context of Wikidata. Late-night study sessions became a sanctuary, providing both progress and a welcome distraction from the ongoing challenges. As the pandemic continued, a shift to online work and study became the norm. Taking additional courses in data science equipped me with the skills necessary to comprehensively analyze the data I had collected and compiled. In early 2021, a year-long maternity leave after the birth of my second child offered both respite and time for reflection. Unfortunately, it coincided with the darkest days in my homeland, when women's basic rights were brutally denied. Amid this turmoil, my motivation waned, and my faith in "diversity" faltered. However, determined to take advantage of the opportunity I was given, I decided to turn this challenge into motivation.

Resilience, Feedback, and the Path to Graduation

Upon returning to work in 2022, I navigated the complex landscape of academia with renewed vigor. Regular doctoral seminars and meetings, progress presentations, and invaluable feedback from my advisor and research team helped refine and enrich my work. More and more chapters of my dissertation took shape, and feedback encouraged me to look at my research from different angles. Reflecting on this academic journey reveals a narrative woven with resilience, adaptability, and a dedication to contributing to the ever-changing realm of knowledge. From navigating the complexities of Wikidata to facing the challenges of a global pandemic, each experience has not only influenced my research but also enriched my understanding of the transformative potential found in determination and commitment. As I approach my dissertation defense, I confidently anticipate the next chapter, knowing that this journey has prepared me to make meaningful contributions to academia and beyond.

Text: Mariam Farda-Sarbas, edited by Katrin Glinka