Scanning RNA Virus Genomes for Conserved RNA Secondary Structures


Principal Investigator:
Peter Stadler

Co-Investigator:
Ivo Hofacker


Co-workers:
Roman Stocsits, Christina Witwer, Caroline Thurner

Support:


Fonds zur Förderung der Wissenschaftlichen Forschung, Proj.No. P13545-INF
Begin 1.6.1999

Abstract

Almost all RNA molecules form secondary structures. The presence of secondary structure in itself therefore does not indicate any functional significance. However, if selection acts to preserve a structural element, then that element must have some function. The detection of conserved structural motifs in related RNA sequences is therefore a crucial first step towards understanding their function. In a previous project we developed a set of computational methods for scanning moderate-size samples of RNA sequences for conserved secondary structures.

We propose to refine and extend these methods for determining conserved secondary structures and apply them to a wide variety of different RNA virus families. Our goals are threefold:

  • Conserved secondary structures most likely have an important function in the viral life cycle. Our approach therefore immediately produces a list of promising targets for experimental investigations such as deletion studies.
  • Functional secondary structures evolve much more slowly than the underlying sequences. Conserved secondary structures can therefore be used to extend the viral phylogeny to higher taxa.
  • A list of conserved, and therefore evolved, RNA structures is a valuable dataset in itself. Detailed analysis of such data can yield insights into general questions such as the evolution of robustness.

Our current implementation does not consider regions with conserved structural alternatives, pseudoknots, or non-standard base pairs. All of these are known to be important in the evolution of RNA genomes, and we expect that incorporating them will substantially improve the capabilities of our detection algorithm. Our method of determining secondary structure elements depends crucially on the quality of the sequence alignment. We expect a significant improvement in alignment quality from using a protein sequence alignment for the coding regions and combining it with a nucleic-acid-based alignment.
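
One common way to combine a protein alignment with nucleotide data is to thread each coding sequence back onto its aligned protein row, so that every amino acid column becomes a codon column and every protein gap becomes a three-nucleotide gap. The following is a minimal sketch of that idea only, not the project's actual implementation; the function names `thread_codons` and `codon_alignment` and the toy sequences are illustrative assumptions, and the sketch assumes each coding sequence is in frame and has exactly three nucleotides per aligned residue.

```python
from typing import List


def thread_codons(aligned_protein: str, coding_nt: str, gap: str = "-") -> str:
    """Map an ungapped coding sequence onto one gapped protein alignment row.

    Hypothetical helper: each residue is replaced by its codon and each
    protein gap by three gap characters.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(coding_nt) != 3 * n_residues:
        raise ValueError("coding sequence must have 3 nucleotides per residue")

    columns = []
    pos = 0  # index of the next unused nucleotide
    for aa in aligned_protein:
        if aa == gap:
            columns.append(gap * 3)
        else:
            columns.append(coding_nt[pos:pos + 3])
            pos += 3
    return "".join(columns)


def codon_alignment(protein_rows: List[str], coding_seqs: List[str]) -> List[str]:
    """Thread every coding sequence onto its matching protein alignment row."""
    if len(protein_rows) != len(coding_seqs):
        raise ValueError("need one coding sequence per protein row")
    return [thread_codons(p, s) for p, s in zip(protein_rows, coding_seqs)]


if __name__ == "__main__":
    # Toy data (assumed for illustration): M K - L aligned against M K A L.
    rows = codon_alignment(["MK-L", "MKAL"], ["ATGAAACTG", "ATGAAGGCTCTT"])
    for row in rows:
        print(row)
```

The resulting codon alignment covers only the coding regions; non-coding regions would still need a nucleic-acid-based alignment, and how the two are merged is left open here.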

We plan to apply our technique to essentially all families of RNA viruses for which a sufficient number of complete genomes have been sequenced. This includes a number of important human pathogens, such as HIV, Influenza, Hepatitis B, C and G, and many others.