98-06-008
Automatic Detection of Conserved Base Pairing Patterns
in RNA Virus Genomes
Ivo L. Hofacker and Peter F. Stadler
Almost all RNA molecules - and consequently also almost all
subsequences of a large RNA molecule - form secondary structures. The
presence of secondary structure in itself therefore does not indicate
any functional significance. In fact, we may not expect a conserved
secondary structure for all parts of a viral genome or an mRNA, even if
there is a significant level of sequence conservation.
We present a novel method for detecting conserved RNA secondary
structures in a family of related RNA sequences. Our method is based
on a combination of predicting the matrices of all equilibrium base
pairing probabilities and comparative sequence analysis. In contrast
to purely phylogenetic methods, our algorithm can be used for small
data sets of about ten sequences, efficiently exploiting the
information contained in the sequence variability.
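As a rough illustration of how sequence variability can support a predicted base pair, the sketch below examines two columns of a small alignment. It counts how many distinct canonical pair types occur and how many sequences cannot pair at that position, and it averages per-sequence pairing probabilities. All function names, the toy alignment, and the probability matrices are hypothetical, and the scoring is an assumption made for illustration; it is not the procedure described in the paper.

```python
# Illustrative sketch only: names, data, and scoring are assumptions,
# not the authors' algorithm.

# Watson-Crick and GU wobble pairs, treated here as "able to pair".
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def column_pair_support(alignment, i, j):
    """Return (distinct_pair_types, non_pairing_count) for columns i and j.

    Several distinct pair types with no non-pairing sequences suggest
    compensatory mutations that preserve the base pair. Gaps count as
    non-pairing because they never appear in CANONICAL.
    """
    pair_types = set()
    non_pairing = 0
    for seq in alignment:
        pair = (seq[i], seq[j])
        if pair in CANONICAL:
            pair_types.add(pair)
        else:
            non_pairing += 1
    return len(pair_types), non_pairing


def mean_pair_probability(prob_matrices, i, j):
    """Average the pairing probability of (i, j) over all sequences.

    Each matrix is assumed to be indexed by alignment column and to have
    been computed beforehand by some folding program.
    """
    return sum(p[i][j] for p in prob_matrices) / len(prob_matrices)


# Toy alignment of four sequences (hypothetical data).
alignment = ["GCAAAGC", "GGAAACC", "GUAAAAC", "GCAAAGC"]

print(column_pair_support(alignment, 1, 5))  # varying but always pairing
print(column_pair_support(alignment, 0, 6))  # conserved pair, no variation
print(column_pair_support(alignment, 2, 4))  # never pairs
```

In this toy case columns 1 and 5 pair in every sequence while using three different pair types, which is the kind of pattern a comparative analysis treats as evidence for a conserved pair. Columns 0 and 6 also always pair but carry no such covariation signal.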
The procedure was tested on artificial data sets, showing that it
indeed produces very few false positives. Applications to RNA
viruses (HIV-1 and Hanta virus) use the complete genomic RNAs. In all
cases we have been able to identify most of the known secondary
structure features. In addition, we predict a substantial number of
conserved structural elements which have not been described so far.
Keywords: RNA Secondary Structure Prediction, Structure Alignment,
Conserved Substructures, Compensatory Mutations, RNA Virus Genomes