RNAlib-2.5.1
Tree Representation of Secondary Structures


Detailed Description

Secondary structures can be readily represented as trees, where internal nodes represent base pairs and leaves represent unpaired nucleotides. The dot-bracket structure string is already such a tree, written as a string of parentheses (base pairs) and dots (the leaf nodes, i.e. unpaired nucleotides).
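To make the tree reading of a dot-bracket string concrete, the following minimal C sketch matches brackets with a stack, just as one would match parentheses, and counts internal nodes (base pairs) and leaves (unpaired nucleotides). It is not part of RNAlib; the names make_pair_table() and count_tree_nodes() are hypothetical, and every character other than '(' and ')' is simply treated as unpaired.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a 0-based pair table for a dot-bracket string.
 * pt[i] holds the partner of position i, or -1 if i is unpaired (a leaf).
 * Returns NULL on unbalanced brackets or allocation failure; caller frees. */
int *make_pair_table(const char *db)
{
  assert(db != NULL);
  size_t n = strlen(db), top = 0;
  int *pt = malloc((n + 1) * sizeof *pt);
  int *stack = malloc((n + 1) * sizeof *stack);
  int ok = (pt != NULL && stack != NULL);

  for (size_t i = 0; ok && i < n; i++) {
    pt[i] = -1;                          /* '.' (or anything else): leaf */
    if (db[i] == '(') {
      stack[top++] = (int)i;             /* open an internal node */
    } else if (db[i] == ')') {
      if (top == 0) {
        ok = 0;                          /* ')' without matching '(' */
      } else {
        int j = stack[--top];            /* close the innermost open node */
        pt[i] = j;
        pt[j] = (int)i;
      }
    }
  }
  free(stack);
  if (!ok || top != 0) {                 /* error or unmatched '(' left */
    free(pt);
    return NULL;
  }
  return pt;
}

/* Count internal nodes (base pairs) and leaves (unpaired positions).
 * Returns 0 on success, -1 if the structure is not balanced. */
int count_tree_nodes(const char *db, int *pairs, int *leaves)
{
  int *pt = make_pair_table(db);

  if (!pt)
    return -1;
  *pairs = 0;
  *leaves = 0;
  for (size_t i = 0; db[i] != '\0'; i++) {
    if (pt[i] < 0)
      (*leaves)++;
    else if (pt[i] > (int)i)
      (*pairs)++;                        /* count each pair once, at '(' */
  }
  free(pt);
  return 0;
}
```

For the example structure .((..(((...)))..((..)))). discussed below, this reports 7 internal nodes and 11 leaves.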

Alternatively, one may find representations with two types of node labels, P for paired and U for unpaired; a dot is then replaced by (U), and each closing bracket is assigned an additional identifier P. We call this the expanded notation. In [8], a condensed representation of the secondary structure is proposed, the so-called homeomorphically irreducible tree (HIT) representation. Here a stack is represented as a single pair of matching brackets labeled P and weighted by the number of base pairs. Correspondingly, a contiguous stretch of unpaired bases is shown as one pair of matching brackets labeled U and weighted by its length. Generally, any string consisting of matching brackets and identifiers is equivalent to a plane tree with as many different types of nodes as there are identifiers.
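Since the expanded notation is a purely local rewrite of the dot-bracket string, it can be produced in a single pass. The C sketch below illustrates this under the assumption of well-formed input containing only '(', ')' and '.'; db_to_expanded() is a hypothetical name and this is not the RNAlib implementation. The optional virtual root R follows the convention described further below.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: rewrite a dot-bracket string in expanded notation.
 * '.' -> "(U)", '(' -> '(', ')' -> "P)"; with_root != 0 wraps the result
 * in a virtual root "(...R)". Returns a malloc'ed string (caller frees),
 * or NULL for unbalanced input or characters other than '(', ')', '.'. */
char *db_to_expanded(const char *db, int with_root)
{
  assert(db != NULL);
  size_t n = strlen(db);
  char *out = malloc(3 * n + 4);       /* worst case: all dots, plus root */
  char *p = out;
  long depth = 0;                      /* currently open base pairs */

  if (!out)
    return NULL;
  if (with_root)
    *p++ = '(';
  for (size_t i = 0; i < n; i++) {
    if (db[i] == '.') {                /* leaf: unpaired nucleotide */
      memcpy(p, "(U)", 3);
      p += 3;
    } else if (db[i] == '(') {         /* open an internal node */
      *p++ = '(';
      depth++;
    } else if (db[i] == ')' && depth > 0) { /* label P, then close node */
      memcpy(p, "P)", 2);
      p += 2;
      depth--;
    } else {                           /* unmatched ')' or other char */
      free(out);
      return NULL;
    }
  }
  if (depth != 0) {                    /* unmatched '(' */
    free(out);
    return NULL;
  }
  if (with_root) {
    memcpy(p, "R)", 2);
    p += 2;
  }
  *p = '\0';
  return out;
}
```

For instance, .((..)). becomes (U)(((U)(U)P)P)(U), or ((U)(((U)(U)P)P)(U)R) when wrapped in the virtual root.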

Bruce Shapiro proposed a coarse-grained representation [22], which does not retain the full information of the secondary structure. He represents the different structure elements by single matching brackets and labels them as H (hairpin loop), I (interior loop), B (bulge), M (multi-loop), and S (stack).

We extend his alphabet by an extra letter E for external elements. Again, these identifiers may be followed by a weight corresponding to the number of unpaired bases or base pairs in the structure element. All tree representations (except for the dot-bracket form) can be encapsulated into a virtual root (labeled R).
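Shapiro's labels follow from the loop that each base pair closes: no enclosed pair gives a hairpin, two or more give a multi-loop, and exactly one enclosed pair gives a stack, bulge, or interior loop depending on whether unpaired bases lie on neither, one, or both sides. The C sketch below (hypothetical names, not the RNAlib code) assigns one such letter per base pair. It deliberately stops there: it does not merge consecutive stacked pairs into a single S node, add E or R nodes, or emit a bracketed tree string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* 0-based pair table: pt[i] = partner of i, or -1 if unpaired.
 * Returns NULL for unbalanced brackets; caller frees. */
static int *pair_table(const char *db, size_t n)
{
  int *pt = malloc((n + 1) * sizeof *pt);
  int *st = malloc((n + 1) * sizeof *st);
  size_t top = 0;
  int ok = (pt != NULL && st != NULL);

  for (size_t i = 0; ok && i < n; i++) {
    pt[i] = -1;
    if (db[i] == '(') {
      st[top++] = (int)i;
    } else if (db[i] == ')') {
      if (top == 0) {
        ok = 0;
      } else {
        int j = st[--top];
        pt[i] = j;
        pt[j] = (int)i;
      }
    }
  }
  free(st);
  if (!ok || top != 0) {
    free(pt);
    return NULL;
  }
  return pt;
}

/* Shapiro letter for the loop closed by pair (i, j):
 * H hairpin, S stacked pair, B bulge, I interior loop, M multi-loop. */
static char loop_type(const int *pt, int i, int j)
{
  int branches = 0, p = -1, q = -1;

  for (int k = i + 1; k < j; k++) {
    if (pt[k] > k) {                   /* enclosed pair (k, pt[k]) */
      if (branches++ == 0) {
        p = k;
        q = pt[k];
      }
      k = pt[k];                       /* jump over its interior */
    }
  }
  if (branches == 0)
    return 'H';
  if (branches > 1)
    return 'M';

  int left = p - i - 1, right = j - q - 1;   /* unpaired bases per side */
  if (left == 0 && right == 0)
    return 'S';
  if (left == 0 || right == 0)
    return 'B';
  return 'I';
}

/* Hypothetical helper: one loop letter per base pair, ordered by the
 * position of the opening bracket. NULL if unbalanced; caller frees. */
char *shapiro_loop_types(const char *db)
{
  assert(db != NULL);
  size_t n = strlen(db), m = 0;
  int *pt = pair_table(db, n);
  char *out = malloc(n / 2 + 1);       /* at most n/2 base pairs */

  if (!pt || !out) {
    free(pt);
    free(out);
    return NULL;
  }
  for (size_t i = 0; i < n; i++)
    if (pt[i] > (int)i)
      out[m++] = loop_type(pt, (int)i, pt[i]);
  out[m] = '\0';
  free(pt);
  return out;
}
```

For .((..(((...)))..((..)))). this returns SMSSHSH: the outer stacked pair, the multi-loop, and then the two branches, each consisting of stacked pairs ending in a hairpin.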

The following example illustrates the different linear tree representations used by the package:

Consider the secondary structure represented by the dot-bracket string (full tree)

.((..(((...)))..((..)))).

This dot-bracket form is the most convenient condensed notation used by our programs and library functions.

Then, the following tree representations are equivalent:

The Expanded tree is rather clumsy and mostly included for the sake of completeness. The different versions of Coarse Grained Tree Representations are variations of Shapiro's linear tree notation.

For the output of aligned structures from string editing, different representations are needed, in which we put the label on both sides of each node. The above examples for tree representations would then look like:

  a) (UU)(P(P(P(P(UU)(UU)(P(P(P(UU)(UU)(UU)P)P)P)(UU)(UU)(P(P(UU)(U...
  b) (UU)(P2(P2(U2U2)(P3(U3U3)P3)(U2U2)(P2(U2U2)P2)P2)(UU)P2)(UU)
  c) (B(M(HH)(HH)M)B)
     (S(B(S(M(S(HH)S)(S(HH)S)M)S)B)S)
     (E(S(B(S(M(S(HH)S)(S(HH)S)M)S)B)S)E)
  d) (R(E2(S2(B1(S2(M4(S3(H3)S3)(S2(H2)S2)M4)S2)B1)S2)E2)R)

Aligned structures additionally contain the gap character _.


Macros

#define VRNA_STRUCTURE_TREE_HIT   1U
 Homeomorphically Irreducible Tree (HIT) representation of a secondary structure.

#define VRNA_STRUCTURE_TREE_SHAPIRO_SHORT   2U
 (short) Coarse Grained representation of a secondary structure.

#define VRNA_STRUCTURE_TREE_SHAPIRO   3U
 (full) Coarse Grained representation of a secondary structure.

#define VRNA_STRUCTURE_TREE_SHAPIRO_EXT   4U
 (extended) Coarse Grained representation of a secondary structure.

#define VRNA_STRUCTURE_TREE_SHAPIRO_WEIGHT   5U
 (weighted) Coarse Grained representation of a secondary structure.

#define VRNA_STRUCTURE_TREE_EXPANDED   6U
 Expanded Tree representation of a secondary structure.

Functions

char * vrna_db_to_tree_string (const char *structure, unsigned int type)
 Convert a Dot-Bracket structure string into a tree string representation.

char * vrna_tree_string_unweight (const char *structure)
 Remove weights from a linear string tree representation of a secondary structure.

char * vrna_tree_string_to_db (const char *tree)
 Convert a linear tree string representation of a secondary structure back to Dot-Bracket notation.

Macro Definition Documentation

◆ VRNA_STRUCTURE_TREE_HIT

#define VRNA_STRUCTURE_TREE_HIT   1U

#include <ViennaRNA/utils/structures.h>

Homeomorphically Irreducible Tree (HIT) representation of a secondary structure.

See also
vrna_db_to_tree_string()

◆ VRNA_STRUCTURE_TREE_SHAPIRO_SHORT

#define VRNA_STRUCTURE_TREE_SHAPIRO_SHORT   2U

#include <ViennaRNA/utils/structures.h>

(short) Coarse Grained representation of a secondary structure.

See also
vrna_db_to_tree_string()

◆ VRNA_STRUCTURE_TREE_SHAPIRO

#define VRNA_STRUCTURE_TREE_SHAPIRO   3U

#include <ViennaRNA/utils/structures.h>

(full) Coarse Grained representation of a secondary structure.

See also
vrna_db_to_tree_string()

◆ VRNA_STRUCTURE_TREE_SHAPIRO_EXT

#define VRNA_STRUCTURE_TREE_SHAPIRO_EXT   4U

#include <ViennaRNA/utils/structures.h>

(extended) Coarse Grained representation of a secondary structure.

See also
vrna_db_to_tree_string()

◆ VRNA_STRUCTURE_TREE_SHAPIRO_WEIGHT

#define VRNA_STRUCTURE_TREE_SHAPIRO_WEIGHT   5U

#include <ViennaRNA/utils/structures.h>

(weighted) Coarse Grained representation of a secondary structure.

See also
vrna_db_to_tree_string()

◆ VRNA_STRUCTURE_TREE_EXPANDED

#define VRNA_STRUCTURE_TREE_EXPANDED   6U

#include <ViennaRNA/utils/structures.h>

Expanded Tree representation of a secondary structure.

See also
vrna_db_to_tree_string()

Function Documentation

◆ vrna_db_to_tree_string()

char * vrna_db_to_tree_string ( const char *  structure,
unsigned int  type 
)

#include <ViennaRNA/utils/structures.h>

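Convert a Dot-Bracket structure string into a tree string representation.

This function converts a secondary structure in dot-bracket notation into one of the various tree representations for secondary structures. The resulting tree is represented as a string of parentheses and node symbols, similar to the Newick format.

Currently, the following formats are supported, selected by the value of parameter type:
 VRNA_STRUCTURE_TREE_HIT - Homeomorphically Irreducible Tree (HIT) representation of a secondary structure (see also [8])
 VRNA_STRUCTURE_TREE_SHAPIRO_SHORT - (short) Coarse Grained representation of a secondary structure
 VRNA_STRUCTURE_TREE_SHAPIRO - (full) Coarse Grained representation of a secondary structure (see also [22])
 VRNA_STRUCTURE_TREE_SHAPIRO_EXT - (extended) Coarse Grained representation of a secondary structure
 VRNA_STRUCTURE_TREE_SHAPIRO_WEIGHT - (weighted) Coarse Grained representation of a secondary structure
 VRNA_STRUCTURE_TREE_EXPANDED - Expanded Tree representation of a secondary structure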

See also
Tree Representations of Secondary Structures
Parameters
structure   The null-terminated dot-bracket structure string
type   A switch selecting the type of tree string representation (one of the VRNA_STRUCTURE_TREE_* values)
Returns
A tree representation of the input structure
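For illustration, the HIT conversion described at the top of this page can be sketched in plain C as follows. This is a hypothetical stand-alone example (db_to_hit() is not an RNAlib function), not the code behind vrna_db_to_tree_string(), and its output format, for example whether weights of 1 are printed or a root is added, may differ from what the library returns for VRNA_STRUCTURE_TREE_HIT.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 0-based pair table: pt[i] = partner of i, or -1 if unpaired.
 * Returns NULL for unbalanced brackets; caller frees. */
static int *pair_table(const char *db, size_t n)
{
  int *pt = malloc((n + 1) * sizeof *pt);
  int *st = malloc((n + 1) * sizeof *st);
  size_t top = 0;
  int ok = (pt != NULL && st != NULL);

  for (size_t i = 0; ok && i < n; i++) {
    pt[i] = -1;
    if (db[i] == '(') {
      st[top++] = (int)i;
    } else if (db[i] == ')') {
      if (top == 0) {
        ok = 0;
      } else {
        int j = st[--top];
        pt[i] = j;
        pt[j] = (int)i;
      }
    }
  }
  free(st);
  if (!ok || top != 0) {
    free(pt);
    return NULL;
  }
  return pt;
}

/* Append HIT nodes for positions [i, j) of one loop level. */
static void hit_level(const int *pt, int i, int j, char *out, size_t *len)
{
  int k = i;

  while (k < j) {
    if (pt[k] < 0) {                     /* maximal run of unpaired bases */
      int run = 0;
      while (k < j && pt[k] < 0) {
        run++;
        k++;
      }
      *len += (size_t)sprintf(out + *len, "(U%d)", run);
    } else {                             /* helix of directly stacked pairs */
      int p = pt[k], s = 1;
      while (k + s < p - s && pt[k + s] == p - s)
        s++;
      out[(*len)++] = '(';
      hit_level(pt, k + s, p - s + 1, out, len);  /* enclosed loop */
      *len += (size_t)sprintf(out + *len, "P%d)", s);
      k = p + 1;
    }
  }
}

/* Hypothetical dot-bracket -> HIT conversion; with_root adds "(...R)". */
char *db_to_hit(const char *db, int with_root)
{
  assert(db != NULL);
  size_t n = strlen(db), len = 0;
  int *pt = pair_table(db, n);
  char *out = malloc(4 * n + 4);         /* generous upper bound */

  if (!pt || !out) {
    free(pt);
    free(out);
    return NULL;
  }
  if (with_root)
    out[len++] = '(';
  hit_level(pt, 0, (int)n, out, &len);
  if (with_root) {
    memcpy(out + len, "R)", 2);
    len += 2;
  }
  out[len] = '\0';
  free(pt);
  return out;
}
```

With with_root set to 0, the example structure .((..(((...)))..((..)))). becomes (U1)((U2)((U3)P3)(U2)((U2)P2)P2)(U1): one P node per helix and one U node per run of unpaired bases, each weighted by its size.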

◆ vrna_tree_string_unweight()

char * vrna_tree_string_unweight ( const char *  structure)

#include <ViennaRNA/utils/structures.h>

Remove weights from a linear string tree representation of a secondary structure.

This function strips the weights from a linear string tree representation such as HIT or Coarse Grained Tree sensu Shapiro [22].

See also
vrna_db_to_tree_string()
Parameters
structure   A linear string tree representation of a secondary structure with weights
Returns
A linear string tree representation of a secondary structure without weights
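Assuming that weights are written as plain decimal numbers directly after the node labels, as in the examples on this page, stripping them amounts to dropping all digits. A minimal sketch with the hypothetical name tree_unweight(), not the RNAlib implementation:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: drop the numeric weights from a linear tree string,
 * e.g. "((U2)P2)" -> "((U)P)". Assumes weights are decimal digits attached
 * to node labels and that labels themselves contain no digits.
 * Returns a malloc'ed string (caller frees) or NULL. */
char *tree_unweight(const char *tree)
{
  assert(tree != NULL);
  char *out = malloc(strlen(tree) + 1);  /* output is never longer */
  char *p = out;

  if (!out)
    return NULL;
  for (const char *s = tree; *s != '\0'; s++)
    if (!isdigit((unsigned char)*s))     /* keep brackets and labels only */
      *p++ = *s;
  *p = '\0';
  return out;
}
```

For example, (R(E2(S2(H3)S2)E2)R) becomes (R(E(S(H)S)E)R).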

◆ vrna_tree_string_to_db()

char * vrna_tree_string_to_db ( const char *  tree)

#include <ViennaRNA/utils/structures.h>

Convert a linear tree string representation of a secondary structure back to Dot-Bracket notation.

Warning
This function only accepts Expanded and HIT tree representations!
See also
vrna_db_to_tree_string(), VRNA_STRUCTURE_TREE_EXPANDED, VRNA_STRUCTURE_TREE_HIT, Tree Representations of Secondary Structures
Parameters
tree   A linear tree string representation of a secondary structure
Returns
The dot-bracket notation of the secondary structure provided in tree
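The reverse direction is possible for Expanded and HIT strings because each U node maps back to a run of dots and each P node to a run of matching brackets. Coarse-grained strings cannot be handled this way since, as noted above, they do not retain the full information of the secondary structure. A hedged C sketch of the idea, with the hypothetical name tree_to_db() (not the RNAlib implementation), which also accepts an optional virtual root R and makes two passes: one to validate and size the output, one to write it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Append `times` copies of c (or only count them when out is NULL). */
static void put(char *out, size_t *len, char c, size_t times)
{
  for (size_t t = 0; t < times; t++) {
    if (out)
      out[*len] = c;
    (*len)++;
  }
}

/* Convert the node "(children LABEL)" starting at s[*pos] == '('.
 * LABEL is U or P with an optional decimal weight, or the root R.
 * Returns 0 on success and -1 on anything else. */
static int node_to_db(const char *s, size_t *pos, char *out, size_t *len)
{
  size_t open = *pos, close, lab, w = 0;
  int depth = 0;

  for (close = open; s[close] != '\0'; close++) {   /* matching ')' */
    if (s[close] == '(')
      depth++;
    else if (s[close] == ')' && --depth == 0)
      break;
  }
  if (s[close] != ')')
    return -1;
  for (lab = close; lab > open + 1 && s[lab - 1] != ')' && s[lab - 1] != '('; lab--)
    ;                                     /* label = text after last child */
  if (lab == close)
    return -1;                            /* empty label */
  for (size_t k = lab + 1; k < close; k++) {        /* optional weight */
    if (!isdigit((unsigned char)s[k]))
      return -1;
    w = 10 * w + (size_t)(s[k] - '0');
  }
  if (lab + 1 == close)
    w = 1;                                /* Expanded notation: no weight */
  if (w == 0)
    return -1;

  char kind = s[lab];
  if (kind == 'U') {                      /* w unpaired bases, no children */
    if (lab != open + 1)
      return -1;
    put(out, len, '.', w);
  } else if (kind == 'P' || kind == 'R') { /* stack of w pairs, or root */
    if (kind == 'P')
      put(out, len, '(', w);
    for (size_t i = open + 1; i < lab;)
      if (s[i] != '(' || node_to_db(s, &i, out, len) != 0)
        return -1;
    if (kind == 'P')
      put(out, len, ')', w);
  } else {
    return -1;                            /* e.g. coarse-grained H, S, M */
  }
  *pos = close + 1;
  return 0;
}

/* Process a sequence of top-level nodes; only counts if out is NULL. */
static int forest_to_db(const char *tree, char *out, size_t *len)
{
  size_t pos = 0;

  *len = 0;
  while (tree[pos] != '\0')
    if (tree[pos] != '(' || node_to_db(tree, &pos, out, len) != 0)
      return -1;
  return 0;
}

/* Hypothetical Expanded/HIT tree string -> dot-bracket; caller frees. */
char *tree_to_db(const char *tree)
{
  assert(tree != NULL);
  size_t n;
  char *out;

  if (forest_to_db(tree, NULL, &n) != 0)  /* pass 1: validate and size */
    return NULL;
  if ((out = malloc(n + 1)) == NULL)
    return NULL;
  forest_to_db(tree, out, &n);            /* pass 2: write characters */
  out[n] = '\0';
  return out;
}
```

Both the HIT string (U1)((U2)((U3)P3)(U2)((U2)P2)P2)(U1) and the expanded string (U)(((U)(U)P)P)(U) are converted back, to .((..(((...)))..((..)))). and .((..)). respectively, while a coarse-grained string such as ((H)S) yields NULL.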