RNAwolf

Introduction

RNAwolf, together with a few additional tools, predicts extended RNA secondary structures and is the main tool described in Höner zu Siederdissen et al. (2011). An extended structure may contain non-canonical base pairs and may be a 2-diagram. Base pairs can be formed by any of the 4x4 nucleotide combinations, and each pair is explicitly annotated with the paired edges and with isomerism information. 2-diagrams are required to describe structures where a nucleotide is involved in more than one base pair. In principle, each nucleotide may be involved in up to three interactions, but as those are rarely annotated currently, we restrict ourselves to two. If required, the extension to three interactions is straightforward.
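To make the terminology concrete, here is a minimal Python sketch of an annotated base pair and of the 2-diagram condition. It is an illustration only, not RNAwolf's internal representation: the class and function names are hypothetical, and the three edge names follow the Leontis-Westhof nomenclature, which is an assumption about what "paired edges" refers to here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Edge(Enum):
    # Assumed edge names (Leontis-Westhof nomenclature).
    WATSON_CRICK = "W"
    HOOGSTEEN = "H"
    SUGAR = "S"


class Isomerism(Enum):
    CIS = "c"
    TRANS = "t"


@dataclass(frozen=True)
class ExtendedPair:
    i: int                # 0-based index of the first nucleotide
    j: int                # 0-based index of the second nucleotide
    edge_i: Edge          # edge used by nucleotide i
    edge_j: Edge          # edge used by nucleotide j
    isomerism: Isomerism  # cis or trans orientation


def is_two_diagram(pairs):
    """Return True if no nucleotide takes part in more than two pairs."""
    degree = Counter()
    for pair in pairs:
        degree[pair.i] += 1
        degree[pair.j] += 1
    return all(count <= 2 for count in degree.values())
```

A structure in which nucleotide 0 pairs with both nucleotide 8 and nucleotide 5 passes the check; a third pair involving nucleotide 0 would make it fail.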

As with other algorithms (e.g. MC-Fold-DP), the grammar behind RNAwolf is unambiguous. This allows for exhaustive enumeration of all structures within a certain band above the optimal structure (or simply all co-optimal structures).
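The notion of a band above the optimum can be illustrated with a short Python sketch. It only shows which structures fall inside the band; it does not reproduce how RNAwolf enumerates them. The function name is hypothetical, and the sketch assumes that lower scores are better.

```python
def within_band(scored, delta=0.0):
    """Keep (structure, score) entries scoring at most `delta` above the optimum.

    Assumes lower scores are better; delta=0.0 keeps only co-optimal structures.
    """
    if not scored:
        return []
    best = min(score for _, score in scored)
    return [(structure, score) for structure, score in scored
            if score <= best + delta]
```

Because the grammar is unambiguous, every structure is produced exactly once, so a list built this way needs no deduplication step.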

Using RNAwolf

Basic usage

Basic usage follows the same principle as RNAfold: the sequence is read from standard input.

echo CCCAAAGGG | ./RNAwolf

This returns an extended secondary structure. The resulting structure or structures can be converted into PostScript figures using the supplied Perl script. Further options will be provided soon.

Constrained folding

The --constraint option folds the sequence subject to the given structural constraints:

echo CCCAAAGGG | ./RNAwolf --constraint "(...x...)"

Round brackets () force the nucleotide to pair, x prevents the nucleotide from pairing, and . leaves the nucleotide unrestricted.
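A constraint string can be sanity-checked before it is passed to RNAwolf. The Python sketch below is not RNAwolf's own parser, and the helper name check_constraint is hypothetical. It assumes that the constraint must be as long as the sequence and that brackets are matched as nested pairs, as in dot-bracket notation; neither assumption is confirmed by this document.

```python
def check_constraint(sequence, constraint):
    """Validate a constraint string.

    Returns (forced, blocked): the forced pairs as sorted 0-based (i, j)
    tuples and the positions marked with x. Raises ValueError on bad input.
    """
    # Assumption: one constraint symbol per nucleotide.
    if len(sequence) != len(constraint):
        raise ValueError("constraint and sequence differ in length")
    open_positions = []
    forced = []
    for pos, symbol in enumerate(constraint):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            # Assumption: brackets nest as in dot-bracket notation.
            forced.append((open_positions.pop(), pos))
        elif symbol not in ".x":
            raise ValueError(f"unknown symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    blocked = [pos for pos, symbol in enumerate(constraint) if symbol == "x"]
    return sorted(forced), blocked
```

For the example above, check_constraint("CCCAAAGGG", "(...x...)") reports one forced pair between the first and last nucleotide and one blocked position in the middle of the loop.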

Parameter optimization

Running RNAwolf in parameter optimization mode

This is basically a two-step procedure:

  1. Create training data from RNA STRAND (Andronescu et al. 2008) or FR3D (Sarver et al. 2008) using RNAwolfTrainingData. FR3D has an advantage over the "raw PDB" data, as the task of parsing the PDB files has already been done.

  2. Follow the same steps as in scripts/run.sh (or in scripts/sge).

Sun Grid Engine compatibility

The package comes with a scripts sub-directory. scripts/sge contains three scripts that work in conjunction with the Sun Grid Engine. We currently support parallelizing the folding process (RNAwolf) but not the parameter optimization. This is typically acceptable, as folding takes much more time than optimizing.

The scripts need to be adapted to work on your system. Grab the xstat and xsub scripts as well, if needed.

References

Andronescu, Mirela, Vera Bereg, Holger Hoos, and Anne Condon. 2008. RNA STRAND: The RNA Secondary Structure and Statistical Analysis Database. BMC Bioinformatics 9: 340. doi:10.1186/1471-2105-9-340. http://www.biomedcentral.com/1471-2105/9/340.

Höner zu Siederdissen, Christian, Stephan H. Bernhart, Peter F. Stadler, and Ivo L. Hofacker. 2011. A Folding Algorithm for Extended RNA Secondary Structures. Bioinformatics 27: 129–36. doi:10.1093/bioinformatics/btr220.

Sarver, Michael, Craig L. Zirbel, Jesse Stombaugh, Ali Mokdad, and Neocles B. Leontis. 2008. FR3D: Finding Local and Composite Recurrent Structural Motifs in RNA 3D Structures. Journal of Mathematical Biology: 215–52.