2.5 Predicting hybridization structures of two molecules

The function of an RNA molecule often depends on its interaction with other RNAs. The following routines therefore predict the structures formed by two RNA molecules upon hybridization.

One approach to co-folding two RNAs consists of concatenating the two sequences and keeping track of the concatenation point in all energy evaluations. Correspondingly, many of the cofold() and co_pf_fold() routines below take one sequence string as argument and use the global variable cut_point to mark the concatenation point. Note that while the RNAcofold program uses the '&' character to mark the chain break in its input, you should not use an '&' when calling the library routines (set cut_point instead).

In a second approach, cofolding is seen as a stepwise process. In the first step the probability that a region is unpaired is calculated, and in the second step this probability is multiplied by the probability of an interaction between the two RNAs. This approach is implemented for the interaction between a long target sequence and a short ligand RNA. The structure of the short RNA is not considered. The function pf_unstru(), which calculates the partition function over all unpaired regions, needs only the longer sequence as input. The function pf_interact(), which calculates the partition function over all possible interactions between the two sequences, needs both sequences as separate strings.

— Variable: int cut_point

cut_point marks the position (starting from 1) of the first nucleotide of the second molecule within the concatenated sequence. The default value of -1 stands for single molecule folding. The cut_point variable is also used by PS_rna_plot() and PS_dot_plot() to mark the chain break in PostScript plots.
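The conversion from RNAcofold-style input (with '&') to the form expected by the library routines can be sketched as follows. The helper split_at_ampersand() is hypothetical and not part of the library; it assumes the input contains at most one '&' and returns a newly allocated concatenated sequence together with the 1-based value to store in cut_point (or -1 if no '&' is present).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a library function): remove the first '&' from
 * `input` and report the 1-based position of the first nucleotide of the
 * second molecule in *cut. If there is no '&', *cut is set to -1.
 * Returns a malloc'ed string that the caller must free, or NULL on
 * allocation failure. */
char *split_at_ampersand(const char *input, int *cut)
{
    const char *amp;
    size_t n, left;
    char *seq;

    assert(input != NULL && cut != NULL);
    n = strlen(input);
    amp = strchr(input, '&');

    if (amp == NULL) {                 /* single molecule */
        seq = malloc(n + 1);
        if (seq == NULL) return NULL;
        memcpy(seq, input, n + 1);
        *cut = -1;
        return seq;
    }

    left = (size_t)(amp - input);      /* length of the first molecule */
    seq = malloc(n);                   /* n - 1 characters plus '\0' */
    if (seq == NULL) return NULL;
    memcpy(seq, input, left);
    memcpy(seq + left, amp + 1, n - left - 1);
    seq[n - 1] = '\0';
    *cut = (int)left + 1;
    return seq;
}
```

With the library linked in, one would then assign the returned value to the global cut_point and pass the concatenated string to cofold() or co_pf_fold().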

— Function: float cofold (char *sequence, char *structure)

The analog of the fold() function. Computes the minimum free energy of two interacting RNA molecules. If cut_point == -1, the results should be the same as with fold().

— Function: void free_co_arrays (void)

Frees arrays allocated by cofold().

— Function: void initialize_cofold (int length)

Allocates memory for MFE cofolding of sequences not longer than length, and sets up the pairing matrix and energy parameters. Explicitly calling initialize_cofold() is normally not necessary, as it will be called automatically before folding.

Prototypes for these functions are declared in cofold.h.

2.6 Partition Function Cofolding

As with folding a single RNA molecule, these routines compute the partition function over all possible structures and the base pair probabilities. They use the same global pf_scale variable to avoid overflows.

To simplify the implementation, the partition function computation is done internally in a null model that does not include the duplex initiation energy (i.e. the entropic penalty for producing a dimer from two monomers). The resulting free energies and pair probabilities are initially relative to that null model. In a second step the free energies can be corrected to include the dimerization penalty, and the pair probabilities can be divided into the conditional pair probabilities given that a dimer is formed or not formed.

— data type: cofoldF struct {double F0AB, FAB, FcAB, FA, FB;}
— Function: cofoldF co_pf_fold (char* sequence, char* structure)

The analog of the pf_fold() function. It computes the partition function of two interacting RNA molecules as well as base pair probabilities, and returns a struct containing free energies computed for different models: F0AB is the free energy in the null model, FAB is corrected to include the initiation penalty, and FcAB includes only real dimers. FA and FB are the free energies of the two individual molecules. As with cofold(), the cut_point global variable has to be set to mark the chain break between the molecules.

— Function: void free_co_pf_arrays (void)

Frees the partition function cofold arrays allocated in co_pf_fold().

— Function: void init_co_pf_fold (int length)

Obsolete function kept for backward compatibility. Allocates memory for partition function cofolding sequences not longer than length, and sets up pairing matrix and energy parameters.

— data type: plist struct {int i; int j; float p;}
— Function: extern struct plist *get_plist (struct plist *pl, int length, double cut_off)

Makes a list of base pairs out of the global *pr array, ignoring all base pairs with a probability less than cut_off. This is useful since the pr array gets overwritten by subsequent foldings.

Prototypes for these functions are declared in co_part_func.h.

2.7 Cofolding all Dimers, Concentrations

After computing the partition functions of all possible dimers, one can compute the probabilities of base pairs, the equilibrium concentrations resulting from given start concentrations, and so on.

— Function: void compute_probabilities (double FAB, double FEA, double FEB, struct plist *prAB, struct plist *prA, struct plist *prB, int Alength)

Given the pair probabilities and free energies (in the null model) for a dimer AB and the two constituent monomers A and B, compute the conditional pair probabilities given that a dimer AB actually forms. Null model pair probabilities are given as lists as produced by get_plist(). The dimer probabilities prAB are modified in place.

Dimer formation is inherently concentration dependent. Given the free energies of the monomers A and B and dimers AB, AA, and BB one can compute the equilibrium concentrations, given input concentrations of A and B, see e.g. Dimitrov & Zuker (2004).

— data type: ConcEnt struct {double A0; double B0;double ABc;double AAc; double BBc; double Ac; double Bc;}
— Function: extern struct ConcEnt *get_concentrations(double FEAB, double FEAA, double FEBB, double FEA, double FEB, double * startconc)

Takes an array startconc of input concentrations with alternating entries for the initial concentrations of molecules A and B (terminated by two zeroes), then computes the resulting equilibrium concentrations from the free energies for the dimers. Dimer free energies should be the dimer-only free energies, i.e. the FcAB entries from the cofoldF struct.

Prototypes for these functions are declared in co_part_func.h.

2.8 Partition Function Cofolding as a stepwise process

In this approach to cofolding the interaction between two RNA molecules is seen as a stepwise process. In a first step, the target molecule has to adopt a structure in which a binding site is accessible. In a second step, the ligand molecule will hybridize with a region accessible to an interaction. Consequently the algorithm is designed as a two step process: The first step is the calculation of the probability that a region within the target is unpaired, or equivalently, the calculation of the free energy needed to expose a region. In the second step we compute the free energy of an interaction for every possible binding site.

— data type: pu_contrib struct {double **H; double **I; double **M; double **E; int length;}
— Function: pu_contrib *pf_unstru (char *sequence, char *structure, int w)

This function calculates the partition function over all unpaired regions in *sequence of a maximal length w. It returns a pu_contrib struct containing four arrays of dimension [i = 1 to length of *sequence][j = 1 to w-1] containing all possible contributions to the probabilities of unpaired regions of maximum length w. Each array in pu_contrib contains one of the contributions to the total probability of being unpaired: the probability of being unpaired within an exterior loop is in array pu_contrib->E, the probability of being unpaired within a hairpin loop is in array pu_contrib->H, the probability of being unpaired within an interior loop is in array pu_contrib->I, and the probability of being unpaired within a multi-loop is in array pu_contrib->M. The total probability of being unpaired is the sum of the four arrays of pu_contrib. Use the functions free_unpaired() and free_pf_two() to free everything allocated for pf_unstru().

— Function: void free_unpaired (void)

Frees the arrays needed for the calculation of the probability of being unpaired by function pf_unstru().

— Function: void free_pf_two (pu_contrib *p_con, FLT_OR_DBL **pin)

If none of the input parameters is set to NULL, frees everything returned by pf_unstru() and pf_interact(). If **pin is set to NULL, frees only the pu_contrib struct returned by function pf_unstru(). If *p_con is set to NULL, frees only the two arrays (each of the length of the target sequence) returned by function pf_interact().

— Function: double **pf_interact (const char *s1, const char *s2, pu_contrib *p_c, int w, int incr3, int incr5)

Calculates the probability of a local interaction between sequence *s1 and sequence *s2, considering the probability that the region of interaction is unpaired within *s1. The longer sequence has to be given as *s1 and the shorter sequence as *s2. Function pf_unstru() has to be called for *s1 beforehand; the resulting probabilities of being unpaired are passed in *p_c. The parameter w gives the maximal length of a subsequence of *s1 considered for the interaction. The parameters incr5 and incr3 allow inclusion of unpaired residues left (incr5) and right (incr3) of the region of interaction in *s1. If the incr options are used, function pf_unstru() has to be called with w=w+incr5+incr3. Function pf_interact() returns two arrays of the length of *s1, which contain the probability of a local interaction (index [0][i]) and the minimum free energy of a local interaction (index [1][i]), where i is the position in sequence *s1. Use free_pf_two() to free the returned arrays. Notice that function pf_interact() calls function free_unpaired()!

Prototypes for these functions are declared in part_func_up.h.