PhD in Mathematics/Bioinformatics
The Bromodomain and Extraterminal domain (BET) protein family comprises BRDT, BRD2, BRD3, and BRD4. These proteins are characterised by a common structure consisting of two bromodomains and an extraterminal domain. The most widely studied BET protein is BRD4, which has emerged as a key transcription regulator. BRD4 plays a fundamental role in regulating transcription initiation, elongation, and termination. Additionally, BRD4 is enriched in super-enhancers, and although it has been linked to promoter-enhancer contacts, much remains unknown about the underlying mechanism.
The thesis project aims to characterise the main determinants of BRD4 chromatin binding through an integrative computational approach. For this purpose, the project will look for the transcription factors and histone modifications that colocalise with BRD4. Specifically, machine learning algorithms will be used to predict the genomic binding of BRD4.
Understanding control mechanisms present in biological processes is crucial for the development of potential therapeutic applications, for instance cell reprogramming or drug target identification. Experimental approaches aimed at identifying possible control targets are usually costly and time-consuming. Mathematical modeling provides a formal framework to study biological systems and to predict potential successful candidate interventions. A common modeling framework is Boolean modeling, which stands out for its ability to capture the qualitative behavior of the system using coarse representations of the interactions between the components, overcoming the usual parametrization problem.
The main goal of this thesis is the study of the control problems present in biological systems and the development of efficient and complete approaches for control strategy identification. In particular, we aim at developing methods to identify sets of minimal controls that are able to induce the desired states in biological systems modeled by Boolean networks. With the goal of making our approaches attractive for application, we establish two key factors: efficiency and diversity. We want our approaches to be able to deal with state-of-the-art networks in a reasonable amount of time while providing as many different optimal control sets as possible. With these factors in mind, we developed two different approaches to meet these needs.
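As a rough illustration of what identifying minimal control sets can look like, the sketch below brute-forces all smallest node clampings that force a target value in every steady state of a toy Boolean network. The three-node network, the function names, and the control notion used here (permanent clamping of nodes, target checked on steady states only) are assumptions made for this example and are not taken from the thesis; exhaustive enumeration is only feasible for very small networks and ignores cyclic attractors.

```python
from itertools import combinations, product

# Hypothetical three-component network (illustration only, not from the thesis).
RULES = {
    "a": lambda s: s["a"],                 # a sustains itself
    "b": lambda s: s["a"] and not s["c"],  # b needs a and the absence of c
    "c": lambda s: not s["b"],             # c is inhibited by b
}
NODES = sorted(RULES)


def step(state, control):
    """Synchronous update; controlled nodes are clamped to their fixed value."""
    return {n: control[n] if n in control else bool(RULES[n](state)) for n in NODES}


def steady_states(control):
    """All fixed points of the controlled network, found by exhaustive enumeration."""
    result = []
    for values in product([False, True], repeat=len(NODES)):
        state = dict(zip(NODES, values))
        state.update(control)
        if step(state, control) == state and state not in result:
            result.append(state)
    return result


def minimal_controls(target):
    """Return every smallest clamping under which all steady states match `target`."""
    for size in range(len(NODES) + 1):
        found = []
        for subset in combinations(NODES, size):
            for values in product([False, True], repeat=size):
                control = dict(zip(subset, values))
                states = steady_states(control)
                if states and all(
                    all(s[n] == v for n, v in target.items()) for s in states
                ):
                    found.append(control)
        if found:  # stop at the first size that admits any solution
            return found
    return []


print(minimal_controls({"c": False}))  # [{'b': True}, {'c': False}]
```

Returning all solutions of minimal size, rather than the first one found, mirrors the diversity criterion described above: both clamping `b` on and clamping `c` off achieve the target with a single intervention.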
El-Athman, Rukeia: Computational analysis of circadian splicing events in human cancer cell lines and mammalian tissues
The circadian clock regulates physiology and behavior of various organisms in synchrony with daily environmental rhythms. At the cellular level, circadian rhythmicity is driven by the interplay of clock genes and proteins that interact via negative feedback loops, thereby causing oscillations with a period of 24 h in the expression of numerous target genes. The resulting rhythms in the abundance of proteins and other biomolecules are responsible for the temporal organization of diverse biological processes. Accumulating evidence suggests that alternative splicing might be one of these clock-controlled processes. Alternative splicing describes a versatile mechanism of gene regulation that generates several distinct protein isoforms from a single gene via the differential inclusion or exclusion of alternate RNA regions. Both disruptions of the circadian clock and aberrant splicing are associated with carcinogenesis and tumor progression. This dissertation seeks to answer the question of whether mammalian alternative splicing is regulated by the circadian clock, and whether the hypothesized regulation differs between cancer cells in different tumor stages. In particular, it tries to elucidate whether changes in circadian-regulated splicing events could be responsible for the production of protein isoforms that contribute to the malignant development of cancer cells. The study is based on data from two human colon cancer cell lines, SW480 and SW620, that have been derived from a primary tumor and a metastasis of the same patient and thus serve as an in vitro model of colorectal tumor progression. A computational analysis was conducted to identify 24 h rhythmic genes and alternative splicing events at the transcriptome level based on the time-series data of both cell lines. As a reference, previously published time-series data of numerous healthy tissues from mouse and baboon organs were analyzed.
The analysis revealed differences in the circadian phenotype of the two cell lines, with the metastasis-derived cell line SW620 exhibiting a stronger dysregulation of circadian rhythmicity. Furthermore, this work shows that splicing-related genes and putative splicing events display 24 h rhythms that differ between primary tumor- and metastasis-derived cells. Both in healthy tissues and cancer cells, rhythmic splicing was found to affect many genes that are themselves involved in splicing, suggesting a partial autoregulation of the process. Several of the spliced candidate genes encode protein isoforms that are involved in processes promoting tumor progression, such as migration and angiogenesis. Taken together, the results presented in this dissertation point to a circadian regulation of alternative splicing that plays a role in cancer development.
Schwieger, Robert: Combining Boolean Networks and Ordinary Differential Equations for Analysis and Comparison of Gene Regulatory Networks
This thesis is concerned with different groups of qualitative models of gene regulatory networks. Four types of models will be considered: interaction graphs, Boolean networks, models based on differential equations and discrete abstractions of differential equations. We will investigate the relations between these modeling frameworks and how they can be used in the analysis of individual models. The focus lies on the mathematical analysis of these models. This thesis makes several contributions in relating these different modeling frameworks. The first approach concerns individual Boolean models and parametrized families of ordinary differential equations (ODEs). To construct ODE models systematically from Boolean models, several automatic conversion algorithms have been proposed. In Chapter 2 several such closely related algorithms will be considered. It will be proven that certain invariant sets are preserved during the conversion from a Boolean network to a model based on ODEs. In the second approach the idea of abstracting the dynamics of individual models to relate structure and dynamics will be introduced. This approach will be applied to Boolean models and models based on differential equations. This allows us to compare groups of models in these modeling frameworks which have the same structure. We demonstrate that this constitutes an approach to link the interaction graph to the dynamics of certain sets of Boolean networks and models based on differential equations. The abstracted dynamics – or more precisely the restrictions on the abstracted behavior – of such sets of Boolean networks or models based on differential equations will be represented as Boolean state transition graphs themselves. We will show that these state transition graphs can be considered as asynchronous Boolean networks. Despite the rather theoretical question this thesis tries to answer, there are many potential applications of the results.
The results in Chapter 2 can be applied to network reduction of ODE models based on Hill kinetics. The results of the second approach in Chapter 4 can be applied to network inference and analysis of Boolean model sets. Furthermore, in the last chapter of this thesis several ideas for applications with respect to experiment design will be considered. This leads to the question of how different asynchronous Boolean networks or different behaviours of a single asynchronous Boolean network can be distinguished.
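To make the idea of invariant sets surviving a Boolean-to-ODE conversion more concrete, the following sketch converts a toy two-component positive feedback loop (x := y, y := x) into Hill-type ODEs and checks numerically that trajectories stay in the region matching their initial Boolean state. The network, the specific right-hand side `hill(y) - x`, the parameters `k` and `n`, and the explicit Euler integration are all assumptions chosen for illustration; they are not the conversion algorithms analysed in the thesis.

```python
def hill(u, k=0.5, n=4):
    """Hill function with threshold k and exponent n (assumed parameters)."""
    return u**n / (u**n + k**n)


def simulate(x, y, dt=0.01, steps=2000):
    """Explicit Euler integration of dx/dt = hill(y) - x, dy/dt = hill(x) - y."""
    trajectory = [(x, y)]
    for _ in range(steps):
        # Both components are updated from the old values of x and y.
        x, y = x + dt * (hill(y) - x), y + dt * (hill(x) - y)
        trajectory.append((x, y))
    return trajectory


def discretize(point, k=0.5):
    """Map a continuous state to a Boolean state by thresholding at k."""
    return tuple(int(v > k) for v in point)


high = simulate(0.8, 0.9)  # starts in the region of Boolean state (1, 1)
low = simulate(0.1, 0.2)   # starts in the region of Boolean state (0, 0)

# The Boolean steady states (1, 1) and (0, 0) correspond to regions
# that the continuous trajectories never leave.
print(all(discretize(p) == (1, 1) for p in high))
print(all(discretize(p) == (0, 0) for p in low))
```

In this toy case the invariance can also be seen directly: each Euler step is a convex combination of the current value and a Hill term, and both lie on the same side of the threshold `k`.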
Cell classifiers are synthetic bio-devices performing type-specific in vivo classification of the cell’s molecular fingerprint. In particular, they can recognize cancerous cells and trigger their apoptosis, shaping novel therapies for cancer patients. Here, the classifiers describe the relationship between cells’ molecular profiles and their annotation as cancerous or non-cancerous. Such a relationship can be represented as a partially defined logical function where the output indicates the cell condition. A single circuit’s processing logic is usually described using a larger individual Boolean function, whereas multi-circuit classifiers are ensembles of simpler logic designs. Such a distributed classifier consists of a group of single-circuit classifiers deciding collectively whether a cell is cancerous according to a predefined threshold function. Both architectures have shown the potential to predict the cell condition with high accuracy. However, the lack of comprehensive workflows to design and evaluate the classifiers, in particular to assess their robustness to noise and novel information, limits their application.
We propose a framework for designing miRNA-based distributed cell classifiers, employing genetic algorithms and Answer Set Programming. We develop optimization criteria comprising the accuracy and robustness of the circuits and train classifiers that achieve high performance (89.78% accuracy for the most-perturbed data set), as shown in multiple simulated data studies. The evaluation performed on cancer data demonstrates that distributed classifiers outperform single-circuit designs by up to 13.40%. Our workflow provides inherently interpretable classifiers that comprise relevant miRNAs previously described in the literature, as well as more complex regulation patterns underlying the data. Ultimately, we show how our approach can be applied to other binary classification problems comprising different biological modalities, such as gene expression or mutation patterns, providing interpretable classifiers.
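The structure of a distributed classifier, several simple circuits combined by a threshold function, can be sketched as follows. The miRNA names, the three circuit rules, and the threshold value are hypothetical and chosen only for illustration; in the framework described above such circuits would be obtained by optimization rather than written by hand.

```python
# Hypothetical single-circuit classifiers over binarized miRNA levels
# (True = highly expressed). Names and rules are illustrative only.
def circuit_1(profile):
    return profile["miR_a"] and not profile["miR_b"]


def circuit_2(profile):
    return profile["miR_c"]


def circuit_3(profile):
    return not profile["miR_b"] and not profile["miR_d"]


CIRCUITS = [circuit_1, circuit_2, circuit_3]


def classify(profile, threshold=2):
    """Label a cell as cancerous when at least `threshold` circuits fire."""
    votes = sum(bool(circuit(profile)) for circuit in CIRCUITS)
    return votes >= threshold


sample = {"miR_a": True, "miR_b": False, "miR_c": False, "miR_d": False}
print(classify(sample))  # True: circuits 1 and 3 fire, reaching the threshold
```

Because each circuit is a short Boolean rule, the resulting decision stays readable: one can list exactly which circuits fired for a given profile.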
This thesis is a contribution to the field of systems biology, where complex processes such as metabolism, gene regulation, or immune responses are formulated as mathematical representations to gain a comprehensive view. In order to create such a representation, called a model, the main characteristics of the system need to be idealized and simplified, where different modeling formalisms require different levels of simplification. This level can be seen as a trade-off between losing details and the amount of information necessary to validate the model. Often models are built even though there is not enough information about the biological system available, which is circumvented by making assumptions. In this thesis, an alternative approach is presented, where the lack of information is included as uncertainty in the system. This uncertainty is used as constraints to create not one but every possible model that lies within these constraints, giving rise to a pool of models. In our group, software for building and analyzing these model pools in the form of logical models was available, thus my work focuses on the biological application of this approach. The main task was to define how biology is translated into the mathematical formalism, to identify which kinds of biological questions can be addressed and to interpret the mathematical results to gain new biological insight. These tasks were collected in a toolbox and applied to three different signaling systems that are interesting for cancer research. I investigated the effect of mutations on a signaling process, connected two pathways with uncertain crosstalk and investigated the controversial regulation of a protein complex involved in metabolism and cancer signaling.
This thesis is a contribution to the field of systems biology, which is concerned with mathematical and computational modelling of biological systems. The aim of the field is to understand biological processes via holistic computational methods. One of the standing problems in systems biology is how to derive a model of a system, preferably one easily understandable by humans, from experimental data and observations. Understandably, the structure of the problem depends heavily on the system of interest and the available data, therefore it is worthwhile to create new methods that utilize particular features, as there can hardly be a universal solution. Here we present an approach for modelling and analysis of complex biological networks that uses a high-level, abstract modelling framework: multi-valued logical networks. In this framework we employ an automated method originating in theoretical computer science, called model checking, which allows for formal reasoning about dynamical systems. We can then create a multitude of candidate models and use model checking to compare their behaviour to experimental data. Our approach, however, produces high volumes of data. To be able to work with the data we use basic statistical methods, which allow us to summarize the dataset into a few key values. In addition, these values can be subsequently compared between multiple datasets. For better understanding we couple these methods with interactive visualization software. The whole framework is implemented in a tool called TREMPPI, which is available under an open-source license and distributed together with this thesis. We illustrate the functions of TREMPPI on three biological studies: two human signalling pathways related to cancer, and a protection mechanism of the bacterium E. coli.
This thesis addresses three challenges in modeling regulatory and signal transduction networks. The starting point is the generalized logical formalism as introduced by R. Thomas and further developed by D. Thieffry, E. H. Snoussi and M. Kaufman. We introduce the fundamental concepts that make up such models, the interaction graph and the state transition graph, as well as model checking, a computer science technique for deciding whether a finite transition system satisfies a given temporal specification. The first problem we turn to is that of whether a given model is consistent with time series data. To do so we introduce query patterns that can be automatically derived from discretized data. Time series data, being such an abundant source of information for reverse engineering, has previously been used in the context of logical models but only under the synchronous, transition-based notion of consistency. The arguably more realistic asynchronous transition relation has so far been excluded from such data-driven reverse engineering, probably because the corresponding non-determinism in the transition system introduces additional obstacles to the already hard problem. Our contribution here is a path-based notion of consistency between model and data that works for any transition relation. In particular, we discuss linear time properties like monotonicity and branching time properties like robustness. The result is a set of query patterns, similar to but more complex than the ones proposed by P. T. Monteiro et al. A toolbox, called TemporalLogicTimeSeries, for the automated construction of queries from data is also presented. The second problem we turn to concerns the two types of long-term behaviors that logical models are capable of producing: steady states, in which the activity levels of all network components are kept at a fixed value, and cyclic attractors, in which some components are unsteady and produce sustained oscillations.
We attempt to understand the emergence of these behaviors by searching for symbolic steady states as defined by H. Siebert. Our main contribution is the introduction of the prime implicant graph, which describes all minimal conditions under which components may change their activities, and an optimization-based algorithm for the enumeration of all maximal and minimal symbolic steady states. Essentially, we generalize the canalizing effects and forcing structure that were first introduced and studied by S. Kauffman and F. Fogelman in the context of random Boolean networks. The chapter includes a theorem that relates symbolic steady states to the existence of positive feedback circuits in the interaction graph. A toolbox, called BoolNetFixpoints, that implements our algorithm is also described. The theme of the last chapter is how to deal with uncertainties that inevitably appear during the modeling of biological systems. One is often forced to resolve them since most types of analysis require a single, fully specified model. The knowledge gap is usually filled by making simplifications or by introducing additional assumptions that are hard to justify and therefore somewhat arbitrary. The alternative is to work with and analyze sets of alternative models, rather than single models. This idea entails additional theoretical and practical challenges: With which language should we describe our partial knowledge about a system? How can predictions be made given that each model in the set may behave differently? How can hypotheses and additional data be added to the current knowledge in a systematic manner? It seems that there are in principle two different approaches. The first one is constraint-based and studied by F. Corblin et al. It translates the partial knowledge and modeling formalism into facts and rules of a logic program. Common solvers can then deduce additional properties or test the validity of given queries across all models.
In contrast, we propose to study the pros and cons of an explicit approach that enumerates all models that agree with a given partial specification. During the first step, models are enumerated and stored in a database. During a second step, models are annotated with additional information that is obtained from custom algorithms. The relationships between the annotations are then analyzed in a third step. The chapter is based on the prototype implementation LogicModelClassifier that performs the discussed steps. Throughout, we apply our results to two previously published models of biological systems. The first one is a small model of the galactose switch which regulates the transcription of genes that are involved in the metabolism of yeast. We address questions that arise during the construction of the model, for example the number of involved components and their interactions, as well as issues related to model validation and model revision with time series data. The case study also discusses different approaches to data discretization. The second one is a medium-sized model of the MAPK network studied by D. Thieffry et al. that is used to predict the cell fate response to different stimuli involving the growth factors EGF, TGFB, FGF and DNA damage. With the methods developed in this thesis we can prove that the model is capable of 18 different asymptotic behaviors, 12 of them steady states and 6 cyclic attractors. The question of which attractor is reached from which initial state is answered and we can show that the response in terms of proliferation or growth arrest and apoptosis is fully determined by the input stimulus.
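The distinction between steady states and cyclic attractors under asynchronous updating can be illustrated on a toy network. The sketch below builds the asynchronous state transition graph of a hypothetical three-component Boolean network with one input and finds its attractors by exhaustive reachability; the network and all function names are assumptions for this example, and the brute-force search stands in for, and is much less scalable than, the symbolic and model-checking methods described above.

```python
from itertools import product


# Hypothetical network: x := (not y) and z, y := x, z := z (z acts as an input).
def update(state):
    x, y, z = state
    return (int((not y) and z), x, z)


def successors(state):
    """Asynchronous transitions: exactly one component moves to its target value."""
    target = update(state)
    return [
        state[:i] + (target[i],) + state[i + 1:]
        for i in range(len(state))
        if target[i] != state[i]
    ]


def reachable(state):
    """All states reachable from `state`, including itself."""
    seen, stack = {state}, [state]
    while stack:
        for nxt in successors(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def attractors(n):
    """A state lies in an attractor if every state it can reach can reach it back."""
    result = set()
    for state in product((0, 1), repeat=n):
        reach = reachable(state)
        if all(state in reachable(other) for other in reach):
            result.add(reach)
    return result


found = attractors(3)
steady = [a for a in found if len(a) == 1]
cyclic = [a for a in found if len(a) > 1]
print(len(steady), len(cyclic))  # 1 1
```

In this toy example the long-term behavior is decided by the input alone: with `z = 0` every trajectory ends in the steady state `(0, 0, 0)`, while with `z = 1` the negative feedback between `x` and `y` produces a sustained oscillation.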
Jamshidi, Shahrad: Comparing discrete, continuous and hybrid modelling approaches of gene regulatory networks
Mathematical modelling of biological networks can help us understand the complex mechanisms that are behind cell proliferation, differentiation or other cellular processes. From these models, we are able to replicate and predict system behaviour that can help in the design of experiments in the systems biology context. Multiple formalisms capture the evolution or dynamics of a system as implied by the network. Ordinary differential equation (ODE) models provide a precise representation of the system, where the concentrations of network components evolve based on chemical kinetics, e.g. mass action kinetics. The kinetic parameters required to generate the dynamics accurately, however, are often lacking, which has led to the development of more qualitative or discrete modelling methods. Discrete formalisms, like the well-known Thomas formalism, provide a very coarse representation of the system's dynamics, whilst still highlighting fundamental features of the network structure. When modelling a given system, it could occur that the different approaches yield contrary dynamics. From a modelling perspective, this is highly impractical as we expect the system to behave uniquely irrespective of the modelling approach used. By mathematically relating different formalisms, we can analyse their dynamics and determine conditions under which the dynamics are common or contrary between formalisms. Hybrid modelling approaches, that is, formalisms that combine discrete and continuous methods, help in relating the purely discrete Thomas formalism with the purely continuous ODE formalism. Approximating the ODEs, we obtain piecewise affine differential equations (PADEs), which have well-defined dynamics that can be discretised to reflect features of the Thomas formalism.
Incorporating the hybrid formalism of PADEs into our analysis, we can break up the otherwise rough transformation between ODE and Thomas formalisms in order to specify the conditions for contrary dynamics to occur between formalisms. Our main result compares the qualitative approach of PADEs with the Thomas formalism. In particular, we show that even though the qualitative parameter information of the PADEs is inherent in the Thomas formalism and vice versa, the two models can still yield contrary dynamics. However, with the well-defined correspondences of the transition systems implied by the two approaches, we can provide proofs of paths and terminal strongly connected components that are common between both formalisms. With our analysis, we bridge the gap between discrete and continuous modelling methods. More specifically, we establish which dynamics are common regardless of the choice of formalism and which dynamics can be seen as artefacts of the formalism. From this analysis, therefore, we achieve a more rigorous modelling framework that allows us to model and predict biological systems with greater accuracy.