This thesis is a contribution to the field of systems biology, where complex processes such as metabolism, gene regulation, or immune responses are formulated as mathematical representations to gain a comprehensive view. In order to create such a representation, called model, main characteristics of the system need to idealized and simplified, where different modeling formalisms require different levels of simplification. This level can be seen as a trade of between loosing details and the amount of necessary information to validate this model. Often models are built even though there is not enough information about the biological system available, which is circumvented by making assumptions. In this thesis, an alternative approach is presented, where the lack of information is included as uncertainty in the system. This uncertainty is used as constraints to create not one but every possible model that lies within these constraints giving rise to a pool of models. In our group, software for building and analyzing these model pools in form of logical models was available, thus my work focuses on the biological application of this approach. The main task was to define how biology is translated into the mathematical formalism, to identify which kind of biological questions can be addressed and to interpret the mathematical results for gaining new biological insight. These tasks were collected in a toolbox and applied to three different signaling systems that are interesting for cancer research. I investigated the effect of mutations on a signaling processes, connected two pathways with uncertain crosstalk and investigated the controversial regulation of a protein complex involved in metabolism and cancer signaling.
This thesis is a contribution to the field of systems biology, which is concerned with mathematical and computational modelling of biological systems. The aim of the field is to understand biological processes via holistic computational methods. One of the standing problems in systems biology is how to derive model of a system, preferably one easily understandable by humans, from experimental data and observations. Understandably, the structure of the problem depends heavily on the system of interest and the available data, therefore it is worthwhile to create new methods that utilize particular features, as there can hardly be a universal solution. Here we present an approach for modelling and analysis of complex biological networks that uses a high-level, abstract modelling framework---the multi-valued logical networks. In this framework we employ an automated method originating in the theoretical computer science, called model checking, which allows for formal reasoning about dynamical systems. We can then create a multitude of candidate models and use model checking method to compare the behaviour of these to experimental data. Our approach however produces high volumes of data. To be able to work with the data we use basic statistical methods, which allow us to summarize the dataset into a few key values. In addition, these values can be subsequently compared between multiple datasets. For better understanding we couple these methods with an interactive visualization software. The whole framework is implemented in a tool called TREMPPI, which is available under an open-source license and distributed together with this thesis. We illustrate the functions of TREMPPI on three biological studies---two human signalling pathways, related to cancer, and a protection mechanism of the bacteria E. Coli.
This thesis addresses three challenges in modeling regulatory and signal transduction networks. Starting point is the generalized logical formalism as introduced by R. Thomas and further developed by D. Thieffry, E. H. Snoussi and M. Kaufman. We introduce the fundamental concepts that make up such models, the interaction graph and the state transition graph, as well as model checking, a computer science technique for deciding whether a finite transition system satisfies a given temporal specification. The first problem we turn to is that of whether a given model is consistent with time series data. To do so we introduce query patterns that can be automatically derived from discretized data. Time series data, being such an abundant source of information for reverse engineering, has previously been used in the context of logical models but only under the synchronous, transition-based notion of consistency. The arguably more realistic asynchronous transition relation has so far been excluded from such data driven reverse engineering, probably because the corresponding non-determinism in the transition system introduces additional obstacles to the already hard problem. Our contribution here is a path-based notion of consistency between model and data that works for any transition relation. In particular, we discuss linear time properties like monotony and branching time properties like robustness. The result are several query patterns, similar to but more complex than the ones proposed by P. T. Monteiro et al. A toolbox, called TemporalLogicTimeSeries for the automated construction of queries from data is also presented. The second problem we turn to concerns the two types of long-term behaviors that logical models are capable of producing: steady states, in which the activity levels of all network components are kept at a fixed value, and cyclic attractors in which some components are unsteady and produce sustained oscillations. We attempt to understand the emergence of these behaviors by searching for symbolic steady states as defined by H. Siebert. Our main contribution is the introduction of the prime implicant graph, which describes all minimal conditions under which components may change their activities, and an optimization-based algorithm for the enumeration of all maximal and minimal symbolic steady states. Essentially, we generalize the canalizing effects and forcing structure that were first introduced and studied by S. Kauffman and F. Fogelman in the context of random Boolean networks. The chapter includes a theorem that relates symbolic steady states to the existence of positive feedback circuits in the interaction graph. A toolbox, called BoolNetFixpoints that implements our algorithm is also described. The theme of the last chapter is how to deal with uncertainties that inevitably appear during the modeling of biological systems. One is often forced to resolve them since most types of analysis require a single, fully specified model. The knowledge gap is usually filled by making simplifications or by introducing additional assumptions that are hard to justify and therefore somewhat arbitrary. The alternative is to work with and analyze sets of alternative models, rather than single models. This idea entails additional theoretical and practical challenges: With which language should we describe our partial knowledge about a system? How can predictions be made given that each model in the set may behave differently? How can hypotheses and additional data be added to the current knowledge in a systematic manner? It seems that there are in principle two different approaches. The first one is constraint-based and studied by F. Corblin et al. It translates the partial knowledge and modeling formalism into facts and rules of a logic program. Common solvers can then deduce additional properties or test the validity of given queries across all models. In contrast, we propose to study the pros and cons of an explicit approach that enumerates all models that agree with a given partial specification. During the first step, models are enumerated and stored in a database. During a second step, models are annotated with additional information that is obtained from custom algorithms. The relationships between the annotations are then analyzed in a third step. The chapter is based on the prototype implemention LogicModelClassifier that performs the discussed steps. Throughout, we apply our results to two previously published models of biological systems. The first one is a small model of the galactose switch which regulates the transcription of genes that are involved in the metabolism of yeast. We address questions that arise during the construction of the model, for example the number of involved components and their interactions, as well as issues related to model validation and model revision with time series data. The case study also discusses different approaches to data discretization. The second one is a medium size model of the MAPK network studied by D. Thieffry et al. that is used to predict the cell fate response to different stimuli involving the growth factors EGF, TGFB, FGF and DNA damage. With the methods developed in this thesis we can prove that the model is capable of 18 different asymptotic behaviors, 12 of them steady states and 6 cyclic attractors. The question of which attractor is reached from which initial state is answered and we can show that the response in terms of proliferation or growth arrest and apoptosis is fully determined by the input stimulus.
Mathematical modelling of biological networks can help us understand the complex mechanisms that are behind cell proliferation, differentiation or other cellular processes. From these models, we are able to replicate and predict system behaviour that can help in the design of experiments in the systems biology context. Multiple formalisms capture the evolution or dynamics of a system as implied by the network. Ordinary differential equation (ODE) models provide a precise representation of the system, where the concentrations of network components evolve based on chemical kinetics, e.g. mass action kinetics. The kinetic parameters required to generate the dynamics accurately, however, are often lacking, which has led to the development of more qualitative or discrete modelling methods. Discrete formalisms, like the well known Thomas formalism, provide a very coarse representation of the systems dynamics, whilst still highlighting fundamental features of the network structure. When modelling a given system, it could occur that the different approaches yield contrary dynamics. From a modelling perspective, this is highly impractical as we expect the system to behave uniquely irrespective of the modelling approach used. By mathematically relating different formalisms, we can analyse the dynamics of the formalisms and determine conditions for which the dynamics of each formalism are common or contrary between formalisms. Hybrid modelling approaches, that is formalisms that combine discrete and continuous methods, help in relating the purely discrete Thomas formalism with the purely continuous ODE formalism. Approximating the ODEs, we obtain piecewise affine differential equations (PADEs), which have well defined dynamics that can be discretised to reflect features of the Thomas formalism. Incorporating the hybrid formalism of PADEs into our analysis, we can break up the otherwise rough transformation between ODE and Thomas formalisms in order to specify the conditions for contrary dynamics to occur between formalisms. Our main result compares the qualitative approach of PADEs with the Thomas formalism. In particular, we show that even though the qualitative parameter information of the PADEs is inherent in the Thomas formalism and vice versa, the dynamics in both models still yield contrary dynamics. However, with the well-defined correspondences of the transition systems implied by the two approaches, we can provide proofs of paths and terminal strongly connected components that are common between both formalisms. With our analysis, we bridge the gap between discrete and continuous modelling methods. More specifically, we establish the dynamics that is common regardless of the choice of formalism and the dynamics that can be seen as artefacts of the formalism. From this analysis, therefore, we achieve a more rigorous modelling framework that allows us to model and predict biological systems with greater accuracy.