Defensio Abstract

Speaker Ulrike Mückstein
Title Sequence-Structure Relations of Single RNA Molecules and Cofolded RNA Complexes


In this work we investigated the folding of RNA sequences into secondary structures from different perspectives. One way to describe the relation between single RNA molecules and their secondary structures is the mapping of sequences onto structures. In this mapping the preimage is the set of all possible sequences of a given length over a given alphabet, and the image is the set of secondary structures adopted by these sequences. When viewed in the context of biological evolution, the sequence is the object under variation, whereas the structure is the target of selection. Thus the RNA sequence-to-structure mapping provides a suitable mathematical model for extracting robust statistical properties of evolutionary dynamics based on RNA replication and mutation.

In recent years RNA sequence-structure maps have been analyzed in great detail by the group of Peter Schuster. In the first part of my thesis the results of this analysis were reevaluated by exhaustive folding and enumeration of the sequence spaces $\mathcal{I}^{(\ell = 9)}_{{AUGC}}$ and $\mathcal{I}^{(\ell = 10)}_{{AUGC}}$, where {A,U,G,C} is the alphabet and $\ell$ the sequence length. Considering only the set of sequences that fold into stable secondary structures, i.e. structures with negative free energies, we were able to confirm the results of previous studies: As expected, there are more sequences than structures. The frequency distribution of secondary structures is highly biased. The majority of sequences fold into a few common structures. Common structures form extended neutral networks. We examined the topology of $\mathcal{I}^{(9)}_{{AUGC}}$ and $\mathcal{I}^{(10)}_{{AUGC}}$ by partitioning sequences into components defined by neighbourhood relations and structural criteria. Using progressively less stringent criteria for the construction of components, we could show that one extensively connected network exists in each of the two sequence spaces. This fact is remarkable because an overwhelming percentage of sequences do not fold at all, so one would expect the sequences forming stable structures to be embedded in a sea of sequences having the open chain as image. The explanation of the apparent paradox is the high dimension of sequence space, nine for $\mathcal{I}_{AUGC}^{(9)}$ and ten for $\mathcal{I}_{AUGC}^{(10)}$: Distances are short in high-dimensional spaces, and connected preimages of structures, which are infrequent compared to the open chain, can readily span distances of the diameter of sequence space.
Furthermore, we could demonstrate that shape space covering holds in $\mathcal{I}^{(9)}_{{AUGC}}$ and $\mathcal{I}^{(10)}_{{AUGC}}$. Shape space covering states that it is sufficient to screen a high-dimensional sphere around an arbitrarily chosen sequence in order to find at least one sequence for every common structure.
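
The enumeration and component analysis described above can be illustrated with a minimal sketch. It is not the procedure used in the thesis: the function `toy_fold` is a hypothetical stand-in that maximizes the number of base pairs (Nussinov-style, with an assumed minimum hairpin loop of three bases) instead of minimizing free energy, and the names `neighbours` and `neutral_components` are chosen here for illustration only. The sketch enumerates a small sequence space, folds every sequence, and partitions the preimage of each structure into components connected by single point mutations.

```python
from itertools import product

ALPHABET = "AUGC"
# Watson-Crick pairs plus the GU wobble pair
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def toy_fold(seq):
    """Hypothetical stand-in for a folding routine: maximize the number of
    base pairs and return one optimal structure in dot-bracket notation."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: base j is unpaired
            for k in range(i, j - MIN_LOOP):  # case: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score
    # Deterministic traceback of one optimal structure
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + best[k + 1][j - 1] + 1 == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


def neighbours(seq):
    """Yield all sequences at Hamming distance one from seq."""
    for pos, old in enumerate(seq):
        for new in ALPHABET:
            if new != old:
                yield seq[:pos] + new + seq[pos + 1:]


def neutral_components(length):
    """Fold every sequence of the given length and split the preimage of
    each structure into components connected by point mutations."""
    fold = {"".join(s): toy_fold("".join(s))
            for s in product(ALPHABET, repeat=length)}
    seen = set()
    components = {}  # structure -> list of component sizes
    for start, structure in fold.items():
        if start in seen:
            continue
        seen.add(start)
        queue, size = [start], 0
        while queue:
            current = queue.pop()
            size += 1
            for nb in neighbours(current):
                if nb not in seen and fold[nb] == structure:
                    seen.add(nb)
                    queue.append(nb)
        components.setdefault(structure, []).append(size)
    return fold, components


if __name__ == "__main__":
    fold, components = neutral_components(6)
    for structure, sizes in sorted(components.items()):
        print(structure, sorted(sizes, reverse=True))
```

The sketch exhausts the sequence space, so it is practical only for short chains: the space contains $4^{\ell}$ sequences, each with $3\ell$ point-mutation neighbours. A thermodynamic folding routine would replace `toy_fold` in any realistic setting.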

The role of neutral networks of structures in evolutionary dynamics has been studied by Peter Schuster's group using computer simulations of RNA populations in a flow reactor. We examined relay series of different evolutionary trajectories over the alphabet {A,U,G,C} to extract common features of RNA structure optimization. We found that relay series need not be monotonic sequences of structures with increasing fitness that converge to a target structure, but may contain structures that are visited more than once in the process of evolutionary optimization. Such structures differ from structures visited only once during the optimization process by having a restricted set of common neighbors of the same fitness. Furthermore, the accessibility relations between common neighbors are highly symmetric, favoring an easy conversion between these structures.

In recent years the importance of sequence-specific interactions between two RNA molecules in the regulation of gene expression has become increasingly apparent. We developed a method to study significant aspects of RNA-RNA interaction. By applying a modified version of McCaskill's partition function algorithm to RNA co-folding, we can provide detailed information about the location of an RNA-RNA interaction, about the structural context of the binding site, and about the energetics of the interaction. Applying the partition function to this problem not only compensates for errors resulting from the inherent imprecision of secondary structure prediction algorithms but also gives a more exact description of the interaction than sampling methods provide.
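
The idea behind a partition function over co-folded structures can be sketched in a strongly simplified form. The following is not the modified algorithm developed in the thesis. It assumes a toy energy model in which every base pair contributes the same energy (`pair_energy`, a hypothetical parameter), and it treats two strands as one concatenated sequence with a cut point, where pairs spanning the cut are exempt from the minimum hairpin loop size. The function name and all parameter values are illustrative assumptions.

```python
import math

# Watson-Crick pairs plus the GU wobble pair
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def partition_function(seq, cut=None, pair_energy=-1.0, rt=0.6):
    """Partition function over all non-crossing secondary structures of seq
    in a toy model where each base pair contributes pair_energy.

    If cut is given, seq is the concatenation of two strands, seq[:cut] and
    seq[cut:], and pairs between the strands ignore the loop-size limit.
    """
    n = len(seq)
    weight = math.exp(-pair_energy / rt)  # Boltzmann factor of one base pair
    # z[i][j] is the partition function of seq[i:j]; empty segments give 1
    z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span
            last = j - 1  # index of the last base in seq[i:j]
            total = z[i][j - 1]  # case: last base is unpaired
            for k in range(i, last):  # case: last base pairs with base k
                spans_cut = cut is not None and k < cut <= last
                if not spans_cut and last - k <= MIN_LOOP:
                    continue
                if (seq[k], seq[last]) in PAIRS:
                    total += z[i][k] * z[k + 1][last] * weight
            z[i][j] = total
    return z[0][n]


if __name__ == "__main__":
    strand_a, strand_b = "GGGAAA", "UUUCCC"
    z_ab = partition_function(strand_a + strand_b, cut=len(strand_a))
    z_a = partition_function(strand_a)
    z_b = partition_function(strand_b)
    # Ensemble free energy gained by allowing the two strands to interact
    print(-0.6 * math.log(z_ab / (z_a * z_b)))
```

Because the partition function sums over all structures instead of reporting a single optimal one, quantities derived from it are ensemble averages. A realistic treatment would use a nearest-neighbour energy model with loop-dependent terms and a duplex initiation penalty, none of which are included in this sketch.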