The proceedings of the 2nd International Workshop on Semantics for Biodiversity have been published with Naouel Karam as co-editor. The workshop will take place in Vienna, Austria, on October 22nd, 2017, co-located with the 16th International Semantic Web Conference.
The research paper "Bottom-up Taxon Characterisations with Shared Knowledge: Describing Specimens in a Semantic Context", written by Naouel Karam together with Patrick Plitzner, Tilo Henning, Andreas Müller and Norbert Kilian from the Botanic Garden and Botanical Museum Berlin, is part of the proceedings.
Using the angiosperm order Caryophyllales, we will provide an exemplar use case on optimizing the taxonomic research process with respect to delimitation and characterisation (“description”) of taxa using the European Distributed Institute of Taxonomy (EDIT) Platform for Cybertaxonomy. The workflow for sample data handling of the EDIT platform will be extended: character data (data on genotypic and phenotypic characters of any type, here focusing on morphology) will be captured and stored in structured form. The structure consists of character and character state matrices for individual specimens instead of taxa, which makes it possible to generate taxon characterisations by aggregating the data sets of the individual specimens included. To ensure data integrity, especially for the aggregation process, semantic web technologies will be used to establish and continuously elaborate expert community-coordinated exemplar vocabularies with term ontologies and explanations for characters and states. In cooperation with the "German Federation for Biological Data" (GFBio), the GFBio Terminology Service is used for publishing the ontologies via a public API. The EDIT platform will be extended to use and integrate the GFBio Terminology Service in order to work with the latest version of the ontology used for specimen and taxon descriptions.
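The aggregation step described in this abstract can be illustrated with a minimal sketch. It is not the EDIT platform's implementation: the function name `aggregate_taxon`, the dictionary layout, and the sample characters and values are all assumptions made up for illustration. Each specimen is a mapping from character to observed state, and the taxon characterisation is built by merging these mappings.

```python
from collections import defaultdict


def aggregate_taxon(specimens):
    """Merge per-specimen character data into one taxon-level summary.

    Hypothetical helper: categorical characters collect every observed
    state, numeric characters collapse to a (min, max) range.
    """
    categorical = defaultdict(set)
    numeric = defaultdict(list)
    for characters in specimens:
        for character, value in characters.items():
            if isinstance(value, (int, float)):
                numeric[character].append(value)
            else:
                categorical[character].add(value)
    # Sort the states so the summary is deterministic.
    summary = {c: sorted(states) for c, states in categorical.items()}
    summary.update({c: (min(v), max(v)) for c, v in numeric.items()})
    return summary


# Invented sample data; a specimen may lack a value for some characters.
specimens = [
    {"leaf arrangement": "opposite", "petal colour": "white", "leaf length (mm)": 12.0},
    {"leaf arrangement": "opposite", "petal colour": "pink", "leaf length (mm)": 18.5},
    {"petal colour": "white", "leaf length (mm)": 15.0},
]
taxon = aggregate_taxon(specimens)
```

The sketch only works if every specimen uses exactly the same term for the same character and state, which is the reason the abstract gives for relying on shared, community-coordinated vocabularies.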
The poster paper "Terminologies as a neglected part of research data: Making supplementary research data available through the GFBio Terminology Service" by Naouel Karam and Claudia Müller-Birn, together with David Fichtmüller, Maren Gleisberg and Anton Güntsch from the Botanic Garden and Botanical Museum Berlin, has also been published in the proceedings.
In many research projects, much more data are created than made publicly available. Keeping research data deliberately closed or publishing only selected subsections of the gathered data are unfortunately common practices in academia. Fortunately, such problems have been getting more and more attention in recent years. However, another issue that is still often overlooked concerns research data that are generated as part of a research project but that are generally not considered part of the primary research data. One example of such neglected research data is terminologies, such as controlled vocabularies, that are used to describe or classify primary research data. In this paper we will outline the process that is used by the Terminology Service of the German Federation for Biological Data (GFBio) to prepare and process terminologies so that they can be included in the GFBio Terminology Service, where they are made available to researchers within and outside the original research project. We will also show how making such supplementary research data publicly available will benefit the researchers who share them as well as the scientific community as a whole.
Another poster paper by Naouel Karam, written together with Felicitas Löffler and Friederike Klan from the FSU Jena, Claas-Thido Pfaff from the University of Leipzig and David Fichtmüller from the Botanic Garden and Botanical Museum Berlin, is also part of the proceedings. The title is "What do Biodiversity Scholars Search for? Identifying High-Level Entities for Biological Metadata".
Research questions in biodiversity are as diverse and heterogeneous as data are. Most metadata standards are mainly data-focused and pay little attention to the search perspective. In this work, we introduce a method to analyze the actual information need of biodiversity scholars based on two individual studies: (1) a series of workshops with domain experts and (2) an analysis of research and search questions collected in three different biodiversity projects. We finally present 12 high-level entities that appear in all kinds of biological data across the different sources evaluated.