Towards a Taxonomy of Theories in Software Engineering (Bachelor or Master Thesis)

09 Aug 2023 - 17:03 | Version 6 | UnknownUser

Advisor: Lutz Prechelt. For a Bachelor thesis, the difficulty level of this topic is high (but you will learn a lot). It is very well suited for a Master thesis because it offers many possibilities.

This page is in English to help possible international collaboration. The thesis can be written in English or German as you like.


Background

Theory (as an approach to structuring a discipline and research work in that discipline) and theories (as concrete outcomes of that approach) play an important role in any natural science, social science, or engineering subject [!!!]. In software engineering (SE), work towards theories, let alone work that uses software engineering theories, is still in its infancy.

To improve this situation, it would likely be useful to collect existing theories, proto-theories (conjectures, hypotheses, etc.), and theory fragments. Researchers could then more easily obtain an overview of what exists and decide whether they want to use some of it (and build upon it) or to participate in forming or validating a theory or proto-theory.

Such a collection, however, will quickly become unwieldy unless there is a good mechanism for finding all theories that have one or more properties of interest, by means of a search dialog, browsing, or both.
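To illustrate the kind of search meant here, a minimal sketch in Python. It assumes that each theory model simply lists the ontology classes the theory talks about and that the ontology records only subclass links. All class and theory names are hypothetical placeholders, not taken from BFO or any existing SE ontology. A query for a class also matches theories about any of its subclasses, which is a small instance of the subsumption reasoning an OWL reasoner would provide.

```python
from collections import deque

# Hypothetical mid-level ontology: class -> list of direct superclasses.
SUBCLASS_OF = {
    "Developer": ["Person"],
    "Tester": ["Person"],
    "Team": ["Group"],
    "CodeReview": ["DevelopmentActivity"],
    "Testing": ["DevelopmentActivity"],
}

# Hypothetical theory models: theory name -> classes the theory talks about.
THEORIES = {
    "TheoryA": {"Developer", "CodeReview"},
    "TheoryB": {"Team", "Testing"},
    "TheoryC": {"Tester"},
}


def superclasses(cls):
    """Return cls together with all its direct and indirect superclasses (BFS)."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in SUBCLASS_OF.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def find_theories(*query_classes):
    """Return theories that mention, for every query class, that class or a subclass."""
    matches = []
    for name, classes in THEORIES.items():
        covered = set().union(*(superclasses(c) for c in classes))
        if all(q in covered for q in query_classes):
            matches.append(name)
    return sorted(matches)


if __name__ == "__main__":
    print(find_theories("Person"))               # ['TheoryA', 'TheoryC']
    print(find_theories("DevelopmentActivity"))  # ['TheoryA', 'TheoryB']
    print(find_theories("Person", "DevelopmentActivity"))  # ['TheoryA']
```

Because every query class must be covered, each additional property narrows the result. For example, combining "Person" and "DevelopmentActivity" leaves only TheoryA.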

Goal

In this work, you will

collect a set of theories and theory-oids (coming from software engineering or coming from some other field but having been used in software engineering),
write a short version ("theory abstract") of each,
based on a small, existing upper-level ontology, develop a prototype of an SE-specific mid-level ontology that allows one to describe the content of the theory abstracts approximately,
model the theories (or at least many of them) in this manner, and
provide these models together with the underlying SE-specific mid-level ontology in OWL format.
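To give an impression of what a theory model in an OWL-compatible format might look like, here is a minimal sketch. It serializes one theory individual as Turtle, together with some of the metadata listed under "Suggestions for the approach" (abstract, creation and change dates, DOIs). The following are all assumptions for illustration: the namespace `http://example.org/se-theories#`, the class `se:Theory`, the record fields, and the choice of properties. `dcterms:abstract`, `dcterms:created`, `dcterms:modified`, and `rdfs:seeAlso` are existing Dublin Core and RDF Schema terms, but nothing here prescribes that the thesis must use them. The DOI in the usage example is a placeholder.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TheoryRecord:
    """Hypothetical metadata record for one theory individual."""
    local_name: str  # becomes the IRI se:<local_name>; must be a valid Turtle local name
    abstract: str    # the theory abstract used for the model
    created: date
    modified: date
    dois: list[str] = field(default_factory=list)  # assumed to need no IRI escaping


def escape_literal(text):
    """Escape backslashes, double quotes, and newlines for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_turtle(rec):
    """Serialize rec as Turtle, using an assumed example namespace for se:."""
    lines = [
        "@prefix se: <http://example.org/se-theories#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"se:{rec.local_name} a se:Theory ;",
        f'    dcterms:abstract "{escape_literal(rec.abstract)}" ;',
        f'    dcterms:created "{rec.created.isoformat()}"^^xsd:date ;',
        f'    dcterms:modified "{rec.modified.isoformat()}"^^xsd:date',
    ]
    for doi in rec.dois:
        lines[-1] += " ;"  # previous statement continues with another predicate
        lines.append(f"    rdfs:seeAlso <https://doi.org/{doi}>")
    lines[-1] += " ."  # terminate the subject's statement block
    return "\n".join(lines)


if __name__ == "__main__":
    example = TheoryRecord(
        local_name="ExampleTheory",
        abstract='A short "theory abstract".',
        created=date(2023, 8, 1),
        modified=date(2023, 8, 9),
        dois=["10.0000/placeholder"],  # placeholder, not a real DOI
    )
    print(to_turtle(example))
```

In practice, Protégé or an RDF library would write such files. The hand-written serialization here is only meant to show the shape of the data.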
In the unlikely event that some time remains, we could a) validate the models, b) perform a usability test, and c) develop a workflow (with continuous integration) for extending the collection in the future.

The goal of the models is not to represent the theories with maximal accuracy. The goal is to make the models easy to create, with just enough detail that searching and browsing the collection can be supported well.

Suggestions for the approach

Not necessarily in exactly this order.

Familiarize yourself with Description Logic as a basis for knowledge representation, for example by reading chapters 1, 2, and 8 of "An Introduction to Description Logic".
Acquaint yourself with some software engineering theories (see "Starting points" below and pick for example two theories at random from each of the first three categories) to get a feel for what theories are talking about and how. Pay particular attention to the distinction between key properties of a theory as opposed to mere details.
Learn a little OWL at a superficial level, e.g. from the Wikipedia article or by skimming the OWL 2 Primer (and only skimming! We will not use many of the more advanced constructs).
Our work needs three kinds of parts:
- We will use an established upper-level ontology in order to make our work compatible with other work and to avoid re-inventing the wheel (or even making horrible mistakes).
- We will extend this with our own SE-specific mid-level ontology that provides the terminology used by the theories (classes and relationships).
- The latter is then used to express the theories themselves. This is the concrete data/facts level of the work; the ontology is the abstract modelling-language level (metamodel).
As the upper ontology we will use BFO (Basic Formal Ontology), which is very small and very well established.
- Look at the overview on page 3 of the BFO handbook
- Watch the 2019 presentation by BFO inventor Barry Smith "Introduction to Basic Formal Ontology" and pay particular attention to the part on "Information Entities" near the end.
- Later, read the handbook where needed so that you do not misinterpret something relevant about BFO.
- BFO leans towards the biology domain, in which it is mostly developed and used, and is therefore weak in some of the abstractions we will need to model frequently.
  Hence, our mid-level ontology will have to introduce a few critical quasi-top-level concepts.
For developing the mid-level ontology (our SE ontology), learn and try out the basics of Grounded Theory Methodology (GTM) by working through units 4 and 5 of the course Empirical methods in software engineering (this will take about 1 day of highly concentrated work). The present work will likely not need all of GTM; Open Coding, Theoretical Coding, Constant Comparison, and perhaps a bit of Axial Coding will presumably be sufficient.
Together with the advisor, ask the Dagstuhl participants for pointers to further theories. Collect these.
Find at least 10 additional theories for category 2 of "Starting points".
Write each theory up in a compact, approximate form ("theory abstract", akin to abstracts of research articles). If you found very many theories, we will restrict this step to a useful subset.
Pick about half a dozen of these theories such that they are very different, talking about very different things and/or in different ways.
Now review related work (see "Starting points: SE Ontologies" below). From each related work, learn a little about what might(!) go into the SE ontology.
Now learn the concrete tool for the modeling: Protégé.
- Install Protégé Desktop and familiarize yourself with it (as a tool) and with knowledge modeling.
  Use material from the Protégé Wiki, in particular the Getting Started tutorial and perhaps the "Pizzas in 10 minutes" modeling exercise.
- Convince yourself that you know your stuff by solving the Protégé Murder Mystery
Using your half-dozen theories, develop their models and the SE ontology hand-in-hand:
- The SE ontology constructs will describe terms (entity types, relationship types) and relationships between terms (in particular subconcept relationships).
- Work incrementally as far as possible (one theory after another), but be prepared to rework your set of constructs as needed. Use GTM as a role model for your procedure in at least two respects:
  - Use Theoretical Coding to come up with construct names that explain (not just describe) and define each construct precisely.
  - Revise constructs when Constant Comparison suggests your current design is broken or ill-shaped.
- Balance expressivity against simplicity. Favor simplicity.
- The theory individuals should include lots of metadata, in particular
  - DOIs to publications defining the theory
  - The theory abstract used for the model
  - Creation date, change date
  - (this list will be extended a lot later)
- Write a short documentation for the modeling language.
If possible, perform usability tests on these models: Can subjects understand the model? (One way to find out is to ask them what is present in the theory abstract but missing from the model. It is OK to explain the abstract to them; understanding what a theory is in the first place is difficult.)
Modify the SE ontology and models according to the outcomes of the usability test.
If the SE ontology appears to do the job, model as many of the theory abstracts as you can manage in good quality.
Write a thesis that describes
- how the process went,
- what your design alternatives were (and which you picked and why),
- what was most difficult and how you solved it, and
- what the most important limitations of the resulting language are.
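As a small illustration of the Description Logic background from the first step: a TBox holds terminological axioms about concepts and roles, and an ABox holds assertions about individuals. The names Developer, Person, Theory, PeopleTheory, isAbout, $t_1$, and $d_1$ are hypothetical examples. They are not part of BFO or of the SE ontology to be developed.

```latex
\[
\begin{aligned}
&\text{TBox:} && \mathit{Developer} \sqsubseteq \mathit{Person}\\
&             && \mathit{PeopleTheory} \equiv \mathit{Theory} \sqcap \exists\,\mathit{isAbout}.\mathit{Person}\\
&\text{ABox:} && \mathit{Theory}(t_1),\quad \mathit{isAbout}(t_1, d_1),\quad \mathit{Developer}(d_1)
\end{aligned}
\]
```

From these axioms a reasoner can infer $\mathit{PeopleTheory}(t_1)$. Since $d_1$ is a $\mathit{Developer}$, it is also a $\mathit{Person}$, so $t_1$ is a $\mathit{Theory}$ that is about some $\mathit{Person}$. In OWL, subclass axioms and existential restrictions (ObjectSomeValuesFrom) correspond to these constructs.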
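The three kinds of parts described in the approach (upper-level ontology, SE-specific mid-level ontology, theory models) depend on each other in one direction only. Here is a minimal sketch that uses plain Python dictionaries rather than OWL and assumes single inheritance for simplicity. It checks two things: that every mid-level class reaches an upper-level class through its parent links, and that every theory model uses only declared classes. The upper-level labels are real BFO class names. All SE classes and the theory are hypothetical. `team culture` is deliberately left unanchored to mimic a concept for which BFO offers no obvious parent.

```python
# Upper level: a few real BFO classes, each mapped to its parent class.
UPPER = {
    "entity": None,
    "continuant": "entity",
    "occurrent": "entity",
    "independent continuant": "continuant",
    "material entity": "independent continuant",
    "generically dependent continuant": "continuant",
    "process": "occurrent",
}

# Mid level: hypothetical SE classes, each mapped to its parent class.
MID = {
    "software developer": "material entity",
    "source code": "generically dependent continuant",
    "code review": "process",
    "team culture": "social construct",  # deliberately unanchored: parent is undeclared
}

# Data level: a hypothetical theory individual and the classes its model uses.
THEORY_MODELS = {
    "example theory": {"software developer", "code review", "team culture", "pair programming"},
}


def is_anchored(cls):
    """Return True if following parent links from cls reaches an upper-level class."""
    seen = set()
    while cls in MID and cls not in seen:  # 'seen' guards against cycles
        seen.add(cls)
        cls = MID[cls]
    return cls in UPPER


def check_layers():
    """Return human-readable problems found across the three layers."""
    problems = []
    for cls in MID:
        if not is_anchored(cls):
            problems.append(f"mid-level class {cls!r} is not anchored in the upper ontology")
    declared = set(UPPER) | set(MID)
    for theory, classes in THEORY_MODELS.items():
        for cls in sorted(classes - declared):
            problems.append(f"theory {theory!r} uses undeclared class {cls!r}")
    return problems


if __name__ == "__main__":
    for problem in check_layers():
        print(problem)
```

In the actual work, Protégé and an OWL reasoner would presumably take over such checks. The sketch only shows the intended dependency direction between the layers.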

Starting points: SE Theories

The theories mentioned in [HanSjoDyb07]
Software engineering articles the title of which contains the terms "theory" or "grounded theory".
The set of theories brought (by the organizers) into Dagstuhl seminar 22231 as examples:
Theories or theory fragments in articles named by participants of that Dagstuhl seminar when asked.

Starting points: SE Ontologies

[AlmGomCru06], Section 7 (about research methods, not theories)
SEON, as well as [BorAlmPer16], an article about SEON, and articles that cite it
The three SE taxonomy articles cited in Stephanie Hohenberg's Master thesis

Literature

[AlmGomCru06] Jorge Calmon de Almeida Biolchini, Paula Gomes Mian, Ana Candida Cruz Natali, Tayana Conte, Guilherme Horta Travassos: Scientific research ontology to support systematic review in software engineering, Advanced Engineering Informatics 21:133–151, 2007.

[BorAlmPer16] Fabiano Borges Ruy, Ricardo de Almeida Falbo, Monalessa Perini Barcellos, Simone Dornelas Costa, Giancarlo Guizzardi: SEON: A software engineering ontology network, European Knowledge Acquisition Workshop (EKAW), pp. 527–542, Springer, 2016.

[HanSjoDyb07] Jo E. Hannay, Dag I. K. Sjøberg, Tore Dybå: A systematic review of theory use in software engineering experiments, IEEE Transactions on Software Engineering 33(2):87–107, 2007.