Sampling the Boltzmann Ensemble

Draw a sample of size 10,000 from the Boltzmann ensemble
Calculate the desired property by using a perl script

  $ RNAsubopt -p 10000 < test.seq > tt
  $ perl -nle '$h++ if substr($_,5,3) eq "...";
      END {print $h/$.}' tt
      0.426057394260574

A far better way to calculate this property is to use RNAfold -p to get the ensemble free energy, which is related to the partition function via $F = -RT\ln(Q)$ , for the unconstrained () and the constrained case (), where the three bases are not allowed to form base pairs (use option -C), and evaluate $p_c = \exp((F_u - F_c)/RT)$ to get the desired probability.

So let's do the calculation using RNAfold.

  $RNAfold -p

  Input string (upper or lower case); @ to quit
  ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8
  ccgcacagcgggcagugccca
  length = 21
  ccgcacagcgggcagugccca
  .........((((...)))).
   minimum free energy =  -5.70 kcal/mol
  ,,{{...}}||((...))}}.
   free energy of ensemble =  -6.25 kcal/mol
  ..((...))((((...)))). { -4.80 d=3.57}
   frequency of mfe structure in ensemble 0.408876; ensemble diversity 4.29

Now we have calculated the free ensemble energy of the ensemble over all structures (F_u), in the next step we have to calculate it for the structures using a constraint(F_c).

Following notation has to be used for defining the constraint:

$\vert$ : paired with another base
. : no constraint at all
x : base must not pair
: base i is paired with a base j<i
: base i is paired with a base j>i
matching brackets ( ): base i pairs base j

So our constraint should look like this:

  .....xxx............

Next call the application with following command and provide the sequence and constraint we just created.

  $ RNAfold -p -C

The output should look like this

  length = 21
  ccgcacagcgggcagugccca
  .........((((...)))).
   minimum free energy =  -5.70 kcal/mol
  .........((((...)))).
   free energy of ensemble =  -5.73 kcal/mol
  .........((((...)))). { -5.70 d=0.12}
   frequency of mfe structure in ensemble 0.95751; ensemble diversity 0.24

Afterwards evaluate the desired probability according to the formula given before e.g. with a simple perl script.

  perl -e 'print exp(-(6.25-5.73)/(0.00198*310.15))."\n"'

You can see that there is a slight difference between the RNAsubopt run with 10,000 samples and the RNAfold run including all structures.

Sven Findeiss 2013-11-22