Compensatory mutations are a strong indication of structural
conservation, while consistent mutations provide a weaker signal. The
covariance term used by RNAalifold therefore assigns a bonus
of 1 kcal/mol to each consistent and 2 kcal/mol for each compensatory
mutation. Sequences that cannot form a standard base pair incur a
penalty of kcal/mol. Thus, for every possible consensus pair
between two columns
and
of the alignment a covariance score
is computed by counting the fraction of sequence pairs
exhibiting consistent and compensatory mutations, as well as the
fraction of sequences that are inconsistent with the pair. The weight
of the covariance term relative to the normal energy function, as well
as the penalty for inconsistent mutations can be changed via command
line parameters.
Apart from the covariance term, the folding algorithm in
RNAalifold is essentially the same as for single sequence
folding. In particular, folding an alignment containing just one
sequence will give the same result as single sequence folding using
RNAfold. For sequences of length
the required CPU
time scales as
while memory
requirements grow as the square of the sequence length. Thus
RNAalifold is in general faster than folding each sequence
individually. The main advantage, however, is that the accuracy of
consensus structure predictions is generally much higher than for
single sequence folding, where typically only between 40% and 70% of
the base pairs are predicted correctly.
Apart from prediction of MFE structures RNAalifold
also implements an algorithm to compute the partition function over
all possible (consensus) structures and the thermodynamic equilibrium
probability for each possible pair. These base pairing probabilities
are useful to see structural alternatives, and to distinguish well
defined regions, where the predicted structure is most likely correct,
from ambiguous regions.
As a first example we'll produce a consensus structure prediction for the following four tRNA sequences.
$ cat four.seq
>M10740 Yeast-PHE GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUUUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCA >K00349 Drosophila-PHE GCCGAAAUAGCUCAGUUGGGAGAGCGUUAGACUGAAGAUCUAAAGGUCCCCGGUUCAAUCCCGGGUUUCGGCA >K00283 Halobacterium volcanii Lys-tRNA-1 GGGCCGGUAGCUCAUUUAGGCAGAGCGUCUGACUCUUAAUCAGACGGUCGCGUGUUCGAAUCGCGUCCGGCCCA >AF346993 CAGAGUGUAGCUUAACACAAAGCACCCAACUUACACUUAGGAGAUUUCAACUUAACUUGACCGCUCUGA
RNAalifold uses aligned sequences as
input. Thus, our first step will be to align the sequences. We use
clustalw in this example, since it's one of the most widely
used alignment programs and has been shown to work well on structural
RNAs. Other alignment programs can be used (including programs that
attempt to do structural alignment of RNAs), but the resulting
multiple sequence alignment must be in Clustal format. Get
clustalw2 and install it as you have done it with the other
packages: http://www.clustal.org/clustal2.