Human Centered Computing

Crawler for Scientific Collaboration Data

Supervisor: Prof. Dr. Claudia Müller-Birn
Subject: Information Retrieval, Data/Community Analytics
Degree: Bachelor of Science (B.Sc.)

Requirements:

  • Good knowledge of web technologies is desirable
  • Interest in data analytics
  • Good English skills for the literature review

Content

Scientific collaboration has been defined as “individuals who differ in notable ways sharing information and working toward a particular purpose” [1]. One way to describe scientific collaboration is through co-authorship relations between scientists. Co-authorship is a valid proxy for collaboration, since sharing authorship reflects a kind of mutual engagement. Another, less explored approach is to analyze collaboration between scientists based on successfully funded research projects. In this research, the goal is to write a crawler that regularly collects data from a website (GEPRIS), preprocesses the data, and saves them in a database. This database can then be used for analytical purposes, such as analyzing the degree of collaboration in the German scientific community over time, for example by using techniques from network analysis.
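To illustrate the network-analysis idea, here is a minimal sketch of how project participation could be turned into a collaboration network. It assumes the crawled data has already been reduced to a mapping from project identifier to the names of the people involved. The function names and the toy data are invented for illustration and are not part of GEPRIS.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(projects):
    """Count, for each pair of people, how many projects they share."""
    edges = Counter()
    for people in projects.values():
        # sorted(set(...)) removes duplicates and gives each unordered
        # pair a single canonical (a, b) form with a < b.
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct collaborators per person."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {person: len(others) for person, others in neighbours.items()}


# Invented toy data: project identifier -> people involved.
projects = {
    "P1": ["Ada", "Ben", "Cleo"],
    "P2": ["Ada", "Ben"],
    "P3": ["Cleo", "Dan"],
}

edges = collaboration_edges(projects)
print(edges[("Ada", "Ben")])  # number of projects shared by Ada and Ben
print(degree(edges))
```

Grouping the projects by funding year before calling `collaboration_edges` would give one network per year, which is one possible way to look at collaboration over time.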

The goal of this thesis is to collect, prepare and analyze data from GEPRIS - the Funded Projects Information System of the German Research Foundation.

The GEPRIS website offers information on publicly funded research projects. For each project, GEPRIS provides, among other things, the names of the submitters and their affiliations, the funding period, a description of the project, the associated discipline and the grant program. The data cover all sorts of funding initiatives and schemes, from individual grant programs to coordinated programs. Since no public API is available, the data need to be crawled from the website. To collect the available data, the following steps have to be taken:

  1. Become familiar with the available data.
  2. Create a data model that reflects these data.
  3. If needed, use additional information (for example, the DFG overview of disciplines).
  4. Develop a mechanism to evaluate the completeness and correctness of the collected data.
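
Steps 2 and 4 could be approached along the following lines. This is only a sketch under stated assumptions: the class and field names are hypothetical and merely mirror the attributes listed above (submitters and affiliations, funding period, description, discipline, grant program). The actual data model has to be derived from the data found on the website, and the checks shown are examples rather than a complete validation scheme.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    affiliation: Optional[str] = None


@dataclass
class Project:
    # All field names are assumptions made for this sketch.
    project_id: str
    title: str
    description: str = ""
    discipline: Optional[str] = None
    grant_program: Optional[str] = None
    funding_start: Optional[int] = None  # year
    funding_end: Optional[int] = None    # year; None if still running
    applicants: List[Person] = field(default_factory=list)


def validate(project: Project) -> List[str]:
    """Return a list of completeness/correctness problems (empty if none)."""
    problems = []
    if not project.title.strip():
        problems.append("missing title")
    if not project.applicants:
        problems.append("no applicants")
    if project.discipline is None:
        problems.append("missing discipline")
    if project.funding_start is None:
        problems.append("missing funding start")
    elif (project.funding_end is not None
          and project.funding_end < project.funding_start):
        problems.append("funding period ends before it starts")
    return problems


# Invented example record with two deliberate defects.
record = Project(
    project_id="example-1",
    title="Example project",
    discipline="Computer Science",
    funding_start=2015,
    funding_end=2012,
)
print(validate(record))
```

Running such checks over every crawled record and counting the problems per category gives a first, rough measure of how complete and consistent a crawl is.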

References

  [1] Amabile, T. M., Patterson, C., Mueller, J., Wojcik, T., Odomirok, P. W., Marsh, M., et al. (2001). Academic-practitioner collaboration in management research: A case of cross-profession collaboration. The Academy of Management Journal, 44(2), 418–431. Available from http://dx.doi.org/10.2307/3069464