Dissertation topic:
Using interpretable machine learning to understand gene silencing dynamics during X-chromosome inactivation
Disputation topic:
Interpreting Random Forest models to gain mechanistic insights into biological systems
Abstract: Recent technological advances in molecular science have made it possible to analyse biological systems in a high-throughput fashion. The availability of large data sets gives us a unique opportunity to answer challenging biological questions in which the underlying mechanisms are complex and depend on the interplay of many different regulatory factors. However, efficient analysis of such large and complex data sets is nearly impossible by visual inspection or traditional statistical methods. Instead, machine learning (ML) algorithms offer the opportunity to systematically detect underlying patterns in the analysed data sets. A key impediment to using complex ML models to gain new mechanistic insights is their frequent lack of transparency. Complex ML models are often considered “black boxes” because it can be hard or nearly impossible to understand why the model made certain predictions. Particularly in biology, it is increasingly important not just to accurately predict the outcome of a biological system with an ML model, but also to uncover the mechanisms that led to that outcome. To uncover these mechanisms, we have to work on the interpretability of our ML models. In my talk, I will briefly introduce the differences between model-based and post-hoc interpretation methods, using Decision Trees and Random Forest models as examples. Afterwards, I will compare two broad categories of post-hoc interpretation methods: 1) model-agnostic methods, such as feature importance and Shapley values, which can be applied to any ML model, and 2) model-specific methods, such as Tree SHAP and Tree Prototypes, which were specifically developed for tree ensemble methods such as Random Forest models.
Time & Location
18.12.2020 | 10:00