Consensus Structure from Related Sequences

  1. Prepare a sequence file (use file four.seq)
  2. Align the sequences
  3. Compute the consensus structure from the alignment
  4. Inspect the output files alifold.out, alirna.ps, alidot.ps
  5. For comparison fold the sequences individually using RNAfold
	  $ clustalw2 four.seq > four.out
	  $ RNAalifold -p four.aln
	  $ RNAfold -p < four.seq

RNAalifold output:
	__GCCGAUGUAGCUCAGUUGGG_AGAGCGCCAGACUGAAAAUCAGAAGGUCCCGUGUUCAAUCCACGGAUCCGGCA__
	..(((((((..((((.........)))).(((((.......))))).....(((((.......))))))))))))...
	 minimum free energy = -15.07 kcal/mol (-13.50 +  -1.57)
	..(((((((..((((.........)))).(((((.......))))).....(((((.......))))))))))))...
	 free energy of ensemble = -15.63 kcal/mol
	 frequency of mfe structure in ensemble 0.405193
RNAfold output:
	>M10740 Yeast-PHE
	GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUUUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCA
	((((((((........((((.((((((..((((...........))))..))))))..))))..)))))))). (-21.00)
	(((((((,...,,.{,((((.((((((..((((...........))))..))))))..))))),)))))))). [-22.55]
	(((((((.........((((.((((((..((((...........))))..))))))..))))...))))))). {-19.50 d=8.36}
	 frequency of mfe structure in ensemble 0.0805133; ensemble diversity 13.62 
	>K00349 Drosophila-PHE
	[...]

The output contains a consensus sequence and the consensus structure in dot-bracket notation. The consensus structure has an energy of -15.07 kcal/mol, which is the sum of two terms: the average free energy of the structure (-13.50 kcal/mol) and the covariance term (-1.57 kcal/mol). The clearly negative covariance term indicates a fair number of consistent and compensatory mutations in the alignment. Unlike the average free energy, however, the covariance term is not meaningful in the biophysical sense.
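
As a quick sanity check on this decomposition, the two terms in parentheses should add up to the reported energy. The following is a minimal sketch of a shell helper that reads a "minimum free energy" line in the format shown in the RNAalifold output and checks the sum. The function name check_alifold_energy is hypothetical (it is not part of the ViennaRNA package), and the sketch assumes exactly the line layout shown above and an awk on the PATH.

```shell
# Hypothetical helper, not part of ViennaRNA: reads RNAalifold output on stdin
# and checks that "average energy + covariance term" matches the reported total.
check_alifold_energy() {
  awk '/minimum free energy/ {
         # Blank out "=", "(", ")" and the separating "+" so that only
         # words and signed numbers remain; awk then re-splits the fields.
         gsub(/[()+=]/, " ")
         # Fields are now: minimum free energy TOTAL kcal/mol AVERAGE COVARIANCE
         diff = $4 - ($6 + $7)
         if (diff < 0) diff = -diff
         # The printed values are rounded to two decimals, so allow a small slack.
         verdict = (diff <= 0.015) ? "consistent" : "inconsistent"
         printf "%s = %s + %s : %s\n", $4, $6, $7, verdict
       }'
}

# Example with the line from the listing above
printf ' minimum free energy = -15.07 kcal/mol (-13.50 +  -1.57)\n' | check_alifold_energy
```

For the line shown in the listing this prints "-15.07 = -13.50 + -1.57 : consistent". The helper can also be fed directly from the program, for example with RNAalifold -p four.aln | check_alifold_energy.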

Compare the predicted consensus structure with the structures predicted for the individual sequences using RNAfold. How often is the correct "clover-leaf" shape predicted?
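
One rough way to make this comparison without inspecting every structure by eye is to count hairpin loops in the dot-bracket strings, since a clover-leaf has three of them. The sketch below does only that. The function name count_hairpins is hypothetical, and a count of three is a necessary hint rather than proof of a clover-leaf, because it does not check that the three hairpins branch off a single multiloop closed by one stem. It assumes a grep that supports the -o and -E options.

```shell
# Hypothetical helper: count hairpin loops in a dot-bracket string.
# A hairpin loop shows up as "(" followed by one or more "." and then ")".
count_hairpins() {
  printf '%s\n' "$1" | grep -oE '\(\.+\)' | wc -l | tr -d ' '
}

# Consensus structure from the RNAalifold output above
count_hairpins '..(((((((..((((.........)))).(((((.......))))).....(((((.......))))))))))))...'

# MFE structure of the yeast sequence from the RNAfold output above
count_hairpins '((((((((........((((.((((((..((((...........))))..))))))..))))..)))))))).'
```

The first call prints 3 and the second prints 1, so by this measure the consensus structure is compatible with a clover-leaf while the individually folded yeast MFE structure is not.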

A structure-annotated alignment or a color-annotated structure drawing can be produced using the coloraln.pl and colorrna.pl commands. Both read an RNA secondary structure plot and a dot plot created with RNAalifold -p and produce a secondary structure plot with color-annotated sequence notation. Alternatively, RNAalifold can generate these plots itself when called with the --aln and --color options.

  $ RNAalifold --color --aln four.aln
  $ gv aln.ps &
  $ gv alirna.ps &
Sven Findeiss 2013-11-22