Scientific collaboration has been defined as “individuals who differ in notable ways sharing information and working toward a particular purpose”. One way to describe scientific collaboration is through co-authorship relations between scientists. Co-authorship is a valid proxy for collaboration, since shared authorship reflects a kind of mutual engagement. Another, less explored approach is to analyze collaboration between scientists based on successfully funded research projects. The goal of this research is to write a crawler that regularly collects data from a website (GEPRIS), preprocesses the data and stores them in a database. This database can then be used for analytical purposes, for example to analyze the degree of collaboration in the German scientific community over time with techniques from network analysis.
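To make the storage and analysis end of this pipeline more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the crawler has already turned each project page into a plain dictionary with an id, title, discipline, grant program, funding period and a list of (applicant, affiliation) pairs. The table layout, field names and the helpers `save_project` and `collaboration_edges` are hypothetical illustrations, not the actual implementation. The sketch also does not fetch or parse GEPRIS pages, because their HTML structure is not described here.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical schema: one row per project, one row per (project, applicant) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS project (
    project_id  TEXT PRIMARY KEY,
    title       TEXT,
    discipline  TEXT,
    programme   TEXT,
    start_year  INTEGER,
    end_year    INTEGER
);
CREATE TABLE IF NOT EXISTS applicant (
    project_id  TEXT,
    name        TEXT,
    affiliation TEXT,
    PRIMARY KEY (project_id, name)
);
"""


def save_project(conn, record):
    """Store one crawled project record (a dict); re-crawling overwrites it."""
    with conn:  # commit on success, roll back on error
        # Drop previously stored applicants so a re-crawl reflects current data.
        conn.execute("DELETE FROM applicant WHERE project_id = ?", (record["id"],))
        conn.execute(
            "INSERT OR REPLACE INTO project VALUES (?, ?, ?, ?, ?, ?)",
            (record["id"], record["title"], record["discipline"],
             record["programme"], record["start"], record["end"]),
        )
        conn.executemany(
            "INSERT OR REPLACE INTO applicant VALUES (?, ?, ?)",
            [(record["id"], name, aff) for name, aff in record["applicants"]],
        )


def collaboration_edges(conn, year=None):
    """Count how many projects each pair of applicants holds jointly.

    If `year` is given, only projects whose funding period covers it are used.
    """
    query = ("SELECT a.project_id, a.name FROM applicant a "
             "JOIN project p ON p.project_id = a.project_id")
    params = ()
    if year is not None:
        query += " WHERE p.start_year <= ? AND p.end_year >= ?"
        params = (year, year)
    members = {}
    for project_id, name in conn.execute(query, params):
        members.setdefault(project_id, set()).add(name)
    edges = Counter()
    for names in members.values():
        # Every pair of applicants on the same project forms one weighted edge.
        for pair in combinations(sorted(names), 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    save_project(conn, {"id": "P1", "title": "Example", "discipline": "Physics",
                        "programme": "Research Grants", "start": 2015, "end": 2018,
                        "applicants": [("Applicant A", "University X"),
                                       ("Applicant B", "University Y")]})
    print(collaboration_edges(conn, year=2016))
```

Keeping applicants in a separate table lets a project with several submitters contribute every applicant pair as an edge. The optional year filter is one simple way to build a separate network per year for analyzing collaboration over time.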
The GEPRIS website offers information on publicly funded research projects. For each project, GEPRIS provides, amongst other things, the names of the applicants and their affiliations, the funding period, a description of the project, the associated discipline and the grant program. The data cover all kinds of funding initiatives and schemes, from individual grant programs to coordinated programs. Since no public API is available, the data need to be crawled from the website. To collect the available data, the following steps have to be taken: