Speaker | Ulrike Mückstein |
Title | Sequence-Structure Relations of Single RNA Molecules and Cofolded RNA Complexes |
In this work we investigated the folding of RNA sequences into secondary structures from different perspectives. One way to describe the relation between single RNA molecules and their secondary structures is the mapping of sequences onto structures. The domain of this mapping is the set of all possible sequences of a given length over a given alphabet. Its image is the set of secondary structures adopted by these sequences, and the preimage of a structure is the set of sequences folding into it. In the context of biological evolution, the sequence is the object under variation, whereas the structure is the target of selection. The RNA sequence-to-structure mapping therefore provides a suitable mathematical model for extracting robust statistical properties of evolutionary dynamics based on RNA replication and mutation.
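A minimal Python sketch of the mapping just described, assuming a caller-supplied folding function. It enumerates the sequence space of length $\ell$ over $\{A,U,G,C\}$ and groups the sequences into the preimage of each structure. The sizes of these preimages give the frequency distribution of structures. `toy_fold` is a hypothetical stand-in that only pairs the two terminal bases; it is not a real folding algorithm. If the ViennaRNA Python bindings are installed, `RNA.fold(seq)` returns a (structure, minimum free energy) pair, so `lambda s: RNA.fold(s)[0]` could be passed instead. Dropping sequences whose minimum free energy is not negative would then mirror the restriction to stable structures.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List

ALPHABET = "AUGC"


def sequence_space(length: int) -> List[str]:
    """Enumerate all 4**length sequences over {A,U,G,C}."""
    return ["".join(s) for s in product(ALPHABET, repeat=length)]


def preimages(length: int, fold: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group every sequence by the structure `fold` assigns to it.

    Each value is the preimage (list of sequences) of its structure key.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for seq in sequence_space(length):
        groups[fold(seq)].append(seq)
    return dict(groups)


def toy_fold(seq: str) -> str:
    """Hypothetical stand-in for a folding algorithm: pair the two terminal
    bases if they are complementary (GC, AU, GU), else return the open chain.
    A hairpin needs at least three unpaired bases, hence len(seq) >= 5."""
    pairs = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
    if len(seq) >= 5 and (seq[0], seq[-1]) in pairs:
        return "(" + "." * (len(seq) - 2) + ")"
    return "." * len(seq)


if __name__ == "__main__":
    images = preimages(5, toy_fold)
    # Frequency distribution: preimage size per structure, largest first
    for structure, seqs in sorted(images.items(), key=lambda kv: -len(kv[1])):
        print(structure, len(seqs))
```

Passing the folding function as a parameter keeps the enumeration independent of the energy model.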
In recent years, RNA sequence-structure maps have been analyzed in great
detail by the group of Peter Schuster. In the first part of my thesis, the
results of this analysis were reevaluated by exhaustive folding and
enumeration of the sequence spaces $\mathcal{I}^{(9)}_{AUGC}$ and
$\mathcal{I}^{(10)}_{AUGC}$, where $\mathcal{I}^{(\ell)}_{AUGC}$ denotes the
set of all sequences of length $\ell$ over the alphabet $\{A,U,G,C\}$. We
were able to confirm the results of previous studies when considering only
the set of sequences that fold into stable secondary structures, i.e.
structures with negative free energies:
As expected, there are more sequences than structures. The frequency
distribution of secondary structures is highly biased: the majority of
sequences fold into a few common structures, and common structures form
extended neutral networks. We examined the topology of
$\mathcal{I}^{(9)}_{AUGC}$ and $\mathcal{I}^{(10)}_{AUGC}$ by partitioning
sequences into components defined by a neighbourhood relation and
structural criteria. Using progressively less stringent criteria for the
construction of components, we could show that one extensively connected
network exists in each of the two sequence spaces. This is remarkable
because the overwhelming majority of sequences do not fold at all, so one
would expect the sequences forming stable structures to be embedded in a
sea of sequences whose image is the open chain. The apparent paradox is
explained by the high dimension of sequence space, nine for
$\mathcal{I}^{(9)}_{AUGC}$ and ten for $\mathcal{I}^{(10)}_{AUGC}$:
distances are short in high-dimensional spaces, so the preimages of
structures, although small compared with the preimage of the open chain,
can be connected and readily span distances of the order of the diameter
of sequence space. Furthermore, we could demonstrate that shape space
covering holds in $\mathcal{I}^{(9)}_{AUGC}$ and $\mathcal{I}^{(10)}_{AUGC}$:
it is sufficient to screen a high-dimensional sphere of relatively small
radius around an arbitrarily chosen sequence in order to find at least one
sequence for every common structure.
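The geometric facts behind this argument can be sketched in Python. In $\mathcal{I}^{(\ell)}_{AUGC}$ every sequence has $3\ell$ one-error neighbours, and no two sequences are more than $\ell$ point mutations apart. A Hamming sphere of radius $r$ contains $\sum_{k=0}^{r}\binom{\ell}{k}3^k$ sequences, which is the quantity relevant to shape space covering. The `components` function partitions one structure's preimage into connected components using only the strictest criterion, single point mutations between sequences of the same structure. The less stringent structural criteria used in the thesis are not reproduced, and all names are illustrative.

```python
from collections import deque
from math import comb
from typing import Iterator, List, Set

ALPHABET = "AUGC"


def neighbours(seq: str) -> Iterator[str]:
    """Yield the 3*len(seq) one-error mutants (Hamming distance 1)."""
    for i, base in enumerate(seq):
        for other in ALPHABET:
            if other != base:
                yield seq[:i] + other + seq[i + 1:]


def hamming(a: str, b: str) -> int:
    """Hamming distance; at most len(a), the diameter of the sequence space."""
    return sum(x != y for x, y in zip(a, b))


def sphere_size(length: int, radius: int) -> int:
    """Number of sequences within Hamming distance `radius` of a given one:
    sum_{k=0}^{r} C(length, k) * 3**k."""
    return sum(comb(length, k) * 3 ** k for k in range(radius + 1))


def components(preimage: List[str]) -> List[Set[str]]:
    """Split one structure's preimage into connected components, where two
    sequences are adjacent if they differ by a single point mutation."""
    remaining = set(preimage)
    result: List[Set[str]] = []
    while remaining:
        start = remaining.pop()
        comp = {start}
        queue = deque([start])
        while queue:  # breadth-first search restricted to the preimage
            seq = queue.popleft()
            for nb in neighbours(seq):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        result.append(comp)
    return result


if __name__ == "__main__":
    for l in (9, 10):
        # sequences in I^(l), one-error neighbours per sequence, sphere of radius 3
        print(l, 4 ** l, 3 * l, sphere_size(l, 3))
```

For $r = \ell$ the sphere sum reduces to $4^\ell$ by the binomial theorem, so the whole sequence space is covered. Shape space covering concerns radii much smaller than $\ell$.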
The role of neutral networks in evolutionary dynamics has been studied by
Peter Schuster's group using computer simulations of RNA populations in a
flow reactor. We examined the relay series of different evolutionary
trajectories over the alphabet $\{A,U,G,C\}$ to extract common features of
RNA structure optimization. We found that relay series are not necessarily
monotonic sequences of structures with increasing fitness converging to a
target structure. They may also contain structures that are visited more
than once in the course of evolutionary optimization. Such structures
differ from structures visited only once by having a restricted set of
common neighbours with the same fitness. Furthermore, the accessibility
relations between these common neighbours are highly symmetric, favouring
easy conversion between these structures.
In recent years, the importance of sequence-specific interactions between two RNA molecules in the regulation of gene expression has become increasingly apparent. We developed a method to study significant aspects of RNA-RNA interaction. By applying a modified version of McCaskill's partition function algorithm to RNA co-folding, we can provide detailed information about the location of an RNA-RNA interaction, the structural context of the binding site, and the energetics of the interaction. Applying the partition function to this problem not only compensates for errors arising from the inherent imprecision of secondary structure prediction algorithms, but also yields a more exact description of the interaction than sampling methods provide.
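The following Python sketch illustrates the partition-function idea behind such accessibility and energy estimates under strong simplifying assumptions. It uses a Nussinov-style energy model in which every allowed base pair (GC, AU, GU) contributes a fixed, hypothetical `pair_energy` of -1 kcal/mol, and a hairpin needs at least three unpaired bases. This replaces the full nearest-neighbour parameters. It is not the modified co-folding algorithm of the thesis. Instead, it computes a single-molecule partition function $Z$ with a McCaskill-type recursion. From it, the sketch derives the probability that a putative binding site stays unpaired, $P^{u} = Z^{u}/Z$, where $Z^{u}$ is the partition function with the site forced single-stranded. It also derives the corresponding opening energy $-RT\ln P^{u}$. All function names are illustrative.

```python
import math
from typing import FrozenSet

# Toy energy model (assumption): every allowed base pair contributes the same
# free energy; real tools use nearest-neighbour (Turner) parameters.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3                 # minimum number of unpaired bases in a hairpin
RT = 0.0019872 * 310.15      # gas constant times 37 degrees C, in kcal/mol


def partition_function(seq: str, pair_energy: float = -1.0,
                       unpaired: FrozenSet[int] = frozenset()) -> float:
    """Partition function over all secondary structures of `seq`, with the
    positions in `unpaired` forced to stay single-stranded."""
    n = len(seq)
    w = math.exp(-pair_energy / RT)  # Boltzmann weight of one base pair
    # Z[i][j] is the partition function of the substring seq[i:j];
    # the empty substring (i == j) has weight 1.
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Z[i][j - 1]  # case 1: base j-1 is unpaired
            if j - 1 not in unpaired:
                # case 2: base k pairs with base j-1, enclosing >= MIN_LOOP bases
                for k in range(i, j - 1 - MIN_LOOP):
                    if k not in unpaired and (seq[k], seq[j - 1]) in PAIRS:
                        total += Z[i][k] * w * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z[0][n]


def prob_unpaired(seq: str, start: int, end: int, pair_energy: float = -1.0) -> float:
    """Probability that seq[start:end] is entirely single-stranded."""
    region = frozenset(range(start, end))
    constrained = partition_function(seq, pair_energy, region)
    return constrained / partition_function(seq, pair_energy)


def opening_energy(seq: str, start: int, end: int, pair_energy: float = -1.0) -> float:
    """Free energy (kcal/mol) needed to make seq[start:end] accessible."""
    return -RT * math.log(prob_unpaired(seq, start, end, pair_energy))


if __name__ == "__main__":
    seq = "GGGAAACCC"
    print(partition_function(seq), prob_unpaired(seq, 0, 3), opening_energy(seq, 0, 3))
```

Forcing positions to be unpaired only removes terms from the recursion, so $Z^{u} \le Z$ and the opening energy is never negative.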