Scanning RNA Virus Genomes for Conserved RNA Secondary Structures
Principal Investigator:
Peter Stadler
Co-Investigator:
Ivo Hofacker
Co-workers:
Roman Stocsits,
Christina Witwer,
Caroline Thurner
Support:
Fonds zur Förderung der Wissenschaftlichen Forschung,
Proj.No. P13545-INF
Begin 1.6.1999
Abstract
Almost all RNA molecules form secondary structures. The presence of
secondary structure in itself therefore does not indicate any
functional significance. However, if selection acts to preserve a
structural element, then it must have some function. The detection of
conserved structural motifs in related RNA sequences is therefore a
first crucial step towards understanding their functional aspects.
In a previous project we developed a set of computer methods to scan moderate
size samples of RNA sequences for conserved secondary structures.
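The idea of scanning aligned sequences for conserved structure can be illustrated with a minimal sketch. This is not the project's implementation; it only assumes that the sequences are already aligned and asks whether two alignment columns can form a base pair in every sequence, and how many different pair types occur there. The function name `pair_support` and the toy alignment are hypothetical. Several distinct pair types at the same position (compensatory mutations) are the kind of signal that suggests a structure is being preserved by selection.

```python
# Watson-Crick pairs plus the GU wobble pair (assumed pairing rules for this sketch).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_support(alignment, i, j):
    """Return (fraction of sequences that can pair at columns i and j,
    number of distinct pair types observed among the pairing sequences)."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    pairing = [p for p in pairs if p in CANONICAL]
    return len(pairing) / len(pairs), len(set(pairing))


# Toy alignment: three sequences of equal length (hypothetical data).
alignment = ["GGGAAACCC", "GAGAAACUC", "GCGAAACGC"]

conserved_sequence = pair_support(alignment, 0, 8)  # always G-C
covarying = pair_support(alignment, 1, 7)           # G-C, A-U, C-G
unpaired = pair_support(alignment, 3, 5)            # A/A never pairs
```

In this toy case columns 1 and 7 pair in every sequence with three different pair types, whereas columns 0 and 8 pair in every sequence but with a single type, so only the former shows covariation.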
We propose to refine and extend these methods for determining conserved
secondary structures and apply them to a wide variety of different RNA
virus families. Our goals are threefold:
1. Conserved secondary structures most likely have an important
   function in the viral life cycle. Our approach therefore immediately
   produces a list of promising targets for experimental
   investigations such as deletion studies.
2. Functional secondary structures evolve much more slowly than the
   underlying sequences. Conserved secondary structures can therefore
   be used to extend the viral phylogeny to higher taxa.
3. A list of conserved, and therefore evolved, RNA structures is
   a valuable dataset in itself. Detailed analysis of such data can
   yield insights into general questions such as the evolution of
   robustness.
Our current implementation does not consider the possibility of regions
with conserved structural alternatives, pseudo-knots, and non-standard base
pairs. All these topics are known to be important in the evolution of RNA
genomes. We expect a substantial improvement in the capabilities of our
detection algorithm by incorporating these features.
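To make the notion of a pseudo-knot concrete: if a structure is written as a list of base pairs (i, j) with i < j, a pseudo-knot appears whenever two pairs cross, that is, i < k < j < l for pairs (i, j) and (k, l). The following sketch, with the hypothetical helper `crossing_pairs`, only detects such crossings in a given pair list. It is an illustration of the definition, not a description of how the planned extension would work.

```python
def crossing_pairs(pairs):
    """Return every couple of base pairs ((i, j), (k, l)) with i < k < j < l.

    A structure without such crossings is a nested (pseudo-knot free)
    secondary structure.
    """
    ordered = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for a in range(len(ordered)):
        i, j = ordered[a]
        for b in range(a + 1, len(ordered)):
            k, l = ordered[b]
            if i < k < j < l:
                crossings.append((ordered[a], ordered[b]))
    return crossings


# Hypothetical pair lists (0-based positions).
nested = [(0, 8), (1, 7), (2, 6)]   # a simple hairpin stem
knotted = [(0, 5), (1, 4), (3, 8)]  # (3, 8) crosses both other pairs

nested_result = crossing_pairs(nested)
knotted_result = crossing_pairs(knotted)
```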
Our method of determining secondary structure elements depends crucially on
the quality of the sequence alignment. We expect a significant improvement
in alignment quality by using a protein sequence alignment for the coding
regions and combining it with a nucleic acid based alignment.
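One common way to use a protein alignment for coding regions is to thread each nucleotide sequence back onto its aligned protein row, so that every amino acid becomes its codon and every protein gap becomes a three-nucleotide gap. The sketch below shows only this step, under the assumption that each coding sequence is in frame and has exactly three nucleotides per residue; the function name `codon_align` and the example sequences are hypothetical, and the combination with a nucleic acid alignment of the non-coding regions is not covered.

```python
def codon_align(aligned_protein, nucleotides):
    """Thread an ungapped coding sequence onto a gapped protein alignment row.

    Each residue is replaced by its codon, each gap '-' by '---'.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(nucleotides) != 3 * n_residues:
        raise ValueError("coding sequence length must be 3 x number of residues")
    columns = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            columns.append("---")
        else:
            columns.append(nucleotides[pos:pos + 3])
            pos += 3
    return "".join(columns)


# Hypothetical example: two aligned protein rows and their coding sequences.
row_a = codon_align("MK-L", "AUGAAACUG")
row_b = codon_align("MKTL", "AUGAAGACCCUG")
```

Because gaps are inserted in whole codons, the reading frame stays aligned across sequences, which is the property a purely nucleotide-based alignment of coding regions can lose.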
We plan to apply our technique to essentially all families of RNA viruses
for which a sufficient number of complete genomes have been sequenced. This
includes a number of important human pathogens, such as HIV, Influenza,
Hepatitis B, C and G, and many others.
