98-06-008

Automatic Detection of Conserved Base Pairing Patterns in RNA Virus Genomes

Ivo L. Hofacker and Peter F. Stadler

Almost all RNA molecules, and consequently almost all subsequences of a large RNA molecule, form secondary structures. The presence of secondary structure therefore does not in itself indicate any functional significance. In fact, we should not expect a conserved secondary structure for all parts of a viral genome or an mRNA, even if there is a significant level of sequence conservation.
We present a novel method for detecting conserved RNA secondary structures in a family of related RNA sequences. Our method is based on a combination of predicting the matrices of all equilibrium base pairing probabilities and comparative sequence analysis. In contrast to purely phylogenetic methods, our algorithm can be used for small data sets of about ten sequences, efficiently exploiting the information contained in the sequence variability.
The procedure was tested on artificial data sets, showing that it produces very few false positives. Applications to some RNA viruses (HIV-1 and Hanta virus) use the complete genomic RNAs. In all cases we were able to identify most of the known secondary structure features. In addition, we predict a substantial number of conserved structural elements that have not been described so far.
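
The idea of combining base pairing probabilities with comparative sequence analysis can be illustrated with a minimal sketch. This is not the authors' algorithm, only a toy instance of the general approach under stated assumptions: the pairing probabilities are assumed to be precomputed for each sequence and already mapped to alignment columns, the function name `conserved_pairs` and the threshold `min_mean_prob` are hypothetical, and the 0.5 default is arbitrary. For each pair of alignment columns, the sketch averages the pairing probability over all sequences, requires every sequence to carry a pairable base combination, and counts the distinct pair types as a rough indicator of compensatory mutations.

```python
# Canonical and wobble base pairs (assumption: GU wobble pairs are allowed).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def conserved_pairs(alignment, pair_probs, min_mean_prob=0.5):
    """Return candidate conserved base pairs as (i, j, mean_prob, n_pair_types).

    alignment  -- list of equal-length aligned RNA sequences
    pair_probs -- one dict per sequence, mapping alignment columns (i, j)
                  with i < j to that sequence's pairing probability
    """
    n = len(alignment)
    if n == 0 or len(pair_probs) != n:
        raise ValueError("need one probability dict per aligned sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")

    # Every column pair predicted in at least one sequence is a candidate.
    candidates = set()
    for probs in pair_probs:
        candidates.update(probs)

    results = []
    for i, j in sorted(candidates):
        # Mean pairing probability; a missing entry counts as probability 0.
        mean_prob = sum(probs.get((i, j), 0.0) for probs in pair_probs) / n
        if mean_prob < min_mean_prob:
            continue
        observed = [(seq[i].upper(), seq[j].upper()) for seq in alignment]
        # Discard the column pair if any sequence cannot form it
        # (this also rejects gaps, since "-" is not in PAIRS).
        if any(pair not in PAIRS for pair in observed):
            continue
        # More than one pair type means the pairing survived sequence changes.
        results.append((i, j, mean_prob, len(set(observed))))
    return results
```

Column pairs reported with more than one pair type are the interesting ones in this sketch, since the sequence varies there while the ability to pair is retained.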

Keywords: RNA Secondary Structure Prediction, Structure Alignment, Conserved Substructures, Compensatory Mutations, RNA Virus Genomes
