Theoretical Biochemistry Group

Institute for Theoretical Chemistry


ViennaRNA Package 2 - Performance

To investigate the impact of the new energy parameters used in ViennaRNA Package 2, we carried out an extensive performance analysis.

Overview

We measured the performance of ViennaRNA Package 2 by comparing the Minimum Free Energy (MFE) predictions of its RNAfold program to those of

  1. RNAfold 1.8.5
  2. UNAfold 3.8
  3. RNAstructure 5.2
in terms of both prediction accuracy and computation speed (runtime).

Please note that this performance analysis compares only thermodynamics-based approaches that rely on the nearest-neighbor energy model and solve the folding problem with Zuker's dynamic programming algorithm.
Benchmarks for other approaches to RNA structure prediction can be found elsewhere in the literature or on the web.

Runtime analysis

Computational speed was measured using a dataset of randomly generated sequences with fixed lengths of

  • 100 nt (100 samples)
  • 500 nt (100 samples)
  • 1000 nt (100 samples)
  • 2500 nt (20 samples)
  • 5000 nt (16 samples)
  • 10000 nt (16 samples)
Measurements were taken on an Intel Core2 6600 CPU running at 2.4 GHz.
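The dataset and timing procedure described above can be sketched in Python as follows. This is a minimal sketch, not the script used for this analysis: it assumes a uniform base composition (the text does not state how the random sequences were drawn), and `fold` stands for any callable that folds a single sequence, for example `RNA.fold` from the ViennaRNA Python bindings if they are installed. The names `DATASET_SPEC`, `random_sequences` and `mean_runtime` are hypothetical.

```python
import random
import statistics
import time
from typing import Callable, Dict, List

# Sequence length -> number of samples, as listed above.
DATASET_SPEC: Dict[int, int] = {100: 100, 500: 100, 1000: 100,
                                2500: 20, 5000: 16, 10000: 16}


def random_sequences(spec: Dict[int, int], seed: int = 0) -> Dict[int, List[str]]:
    """Draw random RNA sequences; assumes equal A/C/G/U frequencies."""
    rng = random.Random(seed)  # fixed seed makes the dataset reproducible
    return {
        length: ["".join(rng.choice("ACGU") for _ in range(length))
                 for _ in range(count)]
        for length, count in spec.items()
    }


def mean_runtime(fold: Callable[[str], object], seqs: List[str]) -> float:
    """Arithmetic mean of wall-clock seconds per call to fold()."""
    timings = []
    for seq in seqs:
        start = time.perf_counter()
        fold(seq)  # result is discarded; only the elapsed time matters here
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)
```

Timing an in-process library call excludes process start-up and file I/O, which are included when timing a stand-alone program, so absolute numbers from this sketch need not match those of a command-line benchmark.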

Figure: Computation times (arithmetic mean)

Unfortunately, RNAstructure 5.2 was unable to predict an MFE structure for the 10000 nt samples within a reasonable time frame and was therefore omitted from that test.

As the computation-time graph above shows, the runtimes of RNAfold 1.8.5 and RNAfold 2.0 are virtually identical. The same holds for memory consumption (data not shown).
Although all tested programs have the same asymptotic time complexity (cubic in sequence length), RNAfold's runtimes compare quite favorably with those of the competing implementations.
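Because the programs share cubic asymptotic complexity, a tenfold increase in sequence length should, in the limit, increase runtime about a thousandfold. One rough way to check how closely measured mean runtimes follow this is to fit the slope of log(runtime) against log(length). The sketch below is illustrative only; `scaling_exponent` and `predicted_ratio` are hypothetical helpers, and on real data the fitted slope may deviate from 3 because of lower-order terms and hardware effects.

```python
import math
from typing import Sequence


def scaling_exponent(lengths: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope of log(time) vs. log(length); about 3 for cubic scaling."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def predicted_ratio(n_from: float, n_to: float, exponent: float = 3.0) -> float:
    """Runtime ratio predicted by a pure power law T(N) ~ N**exponent."""
    return (n_to / n_from) ** exponent
```

For example, `predicted_ratio(1000, 10000)` gives 1000.0 under the cubic model.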

Prediction accuracy

We calculated the following four performance measures to assess the prediction accuracy:

  1. Sensitivity
  2. Positive Predictive Value (PPV)
  3. Matthews correlation coefficient (MCC)
  4. F-measure
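All four measures can be computed from the base pairs shared between a predicted and a reference secondary structure. The sketch below assumes pseudoknot-free dot-bracket input and exact base-pair matching; the text does not state which exact variants were used here. In particular, many RNA benchmarks approximate the MCC as sqrt(sensitivity * PPV), whereas this sketch computes the full formula with true negatives counted over all possible pairs i < j. The function names are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def base_pairs(dot_bracket: str) -> Set[Pair]:
    """Parse a pseudoknot-free dot-bracket string into 0-based (i, j) pairs, i < j."""
    stack, pairs = [], set()
    for j, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(j)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def accuracy(reference: str, predicted: str) -> Dict[str, float]:
    """Sensitivity, PPV, MCC and F-measure of a predicted structure."""
    if len(reference) != len(predicted):
        raise ValueError("structures must have equal length")
    ref, pred = base_pairs(reference), base_pairs(predicted)
    tp = len(ref & pred)   # correctly predicted pairs
    fp = len(pred - ref)   # predicted pairs absent from the reference
    fn = len(ref - pred)   # reference pairs that were missed
    n = len(reference)
    tn = n * (n - 1) // 2 - tp - fp - fn  # all remaining possible pairs i < j
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f_measure = 2 * sens * ppv / (sens + ppv) if sens + ppv else 0.0
    return {"sensitivity": sens, "ppv": ppv, "mcc": mcc, "f_measure": f_measure}
```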

The test set consists of 1817 non-multimer sequence/structure pairs taken from the RNAstrand database, none of which contain pseudoknots in the reference structure. Both versions of RNAfold were run with the -d2 option, whereas UNAfold and RNAstructure were run with their default options.

The table below lists the arithmetic mean of each performance measure. In addition, we performed a bootstrap analysis with 1000 iterations to estimate 95% confidence intervals for these means, shown in brackets.

Values: mean [95% bootstrap confidence interval]

Program             Sensitivity           PPV                   MCC                   F-measure
RNAfold 2.0         0.739 [0.728,0.748]   0.792 [0.781,0.802]   0.763 [0.753,0.773]   0.761 [0.751,0.771]
RNAfold 1.8.5       0.711 [0.701,0.722]   0.773 [0.762,0.784]   0.740 [0.729,0.750]   0.737 [0.727,0.748]
UNAfold 3.8         0.692 [0.682,0.703]   0.766 [0.756,0.778]   0.727 [0.717,0.737]   0.724 [0.714,0.734]
RNAstructure 5.2    0.715 [0.705,0.725]   0.781 [0.769,0.791]   0.745 [0.735,0.755]   0.742 [0.732,0.753]
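The bootstrap confidence intervals can be estimated with a percentile bootstrap over the per-sequence scores. This is a minimal sketch under two assumptions not stated in the text: the resampling unit is one sequence/structure pair, and the interval is taken from the percentiles of the resampled means. `bootstrap_ci` is a hypothetical name.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_ci(values: List[float], iterations: int = 1000,
                 level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the per-sequence scores with replacement; keep each resample's mean.
    means = sorted(statistics.fmean(rng.choices(values, k=n))
                   for _ in range(iterations))
    tail = (1.0 - level) / 2.0
    lo_idx = int(round(tail * iterations))              # e.g. index 25 of 1000
    hi_idx = int(round((1.0 - tail) * iterations)) - 1  # e.g. index 974 of 1000
    return means[lo_idx], means[hi_idx]
```

Applied once per program and per measure, for example to the list of per-sequence MCC values, this yields intervals of the form shown in the table.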

The cumulative distribution of the MCC shows that RNAfold 2.0 outperforms the other programs on the test dataset: a larger fraction of its predictions reach high MCC values.

Figure: Matthews correlation coefficients (cumulative distribution)
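A cumulative distribution curve F(x) gives the fraction of predictions with an MCC of at most x, so a curve lying further to the right means more predictions with high MCC. A minimal sketch of computing the empirical curve and the fraction of predictions above a chosen threshold; the function names are hypothetical and the threshold is an arbitrary choice for comparison.

```python
from bisect import bisect_right
from typing import Callable, List


def make_ecdf(scores: List[float]) -> Callable[[float], float]:
    """Return the empirical CDF F(x) = fraction of scores <= x."""
    ordered = sorted(scores)
    n = len(ordered)

    def cdf(x: float) -> float:
        # bisect_right counts how many sorted scores are <= x
        return bisect_right(ordered, x) / n

    return cdf


def fraction_above(scores: List[float], threshold: float) -> float:
    """Fraction of predictions whose score exceeds `threshold`."""
    ordered = sorted(scores)
    return (len(ordered) - bisect_right(ordered, threshold)) / len(ordered)
```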

However, a closer look at performance across the different RNA classes in our test set reveals wide variation: none of the tested implementations is consistently superior.

Figure: Matthews correlation coefficients per RNA class
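A per-class comparison amounts to grouping per-sequence scores by RNA class, averaging within each group, and checking which program scores highest in each class. The sketch below assumes each test sequence carries a class label (as annotated in the source database); the function names are hypothetical, and any class labels or scores passed to it are up to the caller.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def mean_by_class(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average per-sequence scores (e.g. MCC) within each RNA class."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for rna_class, score in records:
        groups[rna_class].append(score)
    return {rna_class: fmean(scores) for rna_class, scores in groups.items()}


def best_program_per_class(per_program: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each RNA class, name the program with the highest mean score."""
    classes = set().union(*(means.keys() for means in per_program.values()))
    return {
        c: max((p for p in per_program if c in per_program[p]),
               key=lambda p: per_program[p][c])
        for c in classes
    }
```

If the winning program differs between classes, no single implementation is consistently superior in the sense described above.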