The function of an RNA molecule often depends on its interaction with
other RNAs. The following routines therefore make it possible to predict structures
formed by two RNA molecules upon hybridization.
One approach to co-folding two RNAs consists of concatenating the two
sequences and keeping track of the concatenation point in all energy
evaluations. Correspondingly, many of the cofold() and
co_pf_fold() routines below take one sequence string as argument
and use the global variable cut_point to mark the concatenation
point. Note that while the RNAcofold program uses the '&' character
to mark the chain break in its input, you should not use an '&' when using
the library routines (set cut_point instead).
In a second approach to co-folding two RNAs, cofolding is seen as a
stepwise process. In the first step the probability of an unpaired region
is calculated and in a second step this probability of an unpaired region
is multiplied with the probability of an interaction between the two RNAs.
This approach is implemented for the interaction between a long
target sequence and a short ligand RNA. The structure of the short RNA
is not considered. The function pf_unstru(), which calculates the
partition function over all unpaired regions, needs only the longer
sequence as an input. Function pf_interact(), which calculates the
partition function over all possible interactions between the two
sequences, needs both sequences as separate strings as input.
cut_point marks the position (starting from 1) of the first nucleotide of the second molecule within the concatenated sequence. The default value of -1 stands for single molecule folding. The cut_point variable is also used by PS_rna_plot() and PS_dot_plot() to mark the chain break in PostScript plots.
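As an illustration of the cut_point convention, the following minimal sketch converts an RNAcofold-style input such as "GCGC&AUAU" into the concatenated sequence expected by the library routines, together with the matching cut point. The helper strip_chain_break() is hypothetical and not part of the library; it only assumes the 1-based definition given above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a library function.
 * Copies `input` without its first '&' into a newly allocated string.
 * *cut receives the 1-based position of the first nucleotide of the
 * second molecule, or -1 if `input` contains no '&'.
 * Returns NULL if allocation fails; the caller frees the result. */
char *strip_chain_break(const char *input, int *cut)
{
  assert(input != NULL && cut != NULL);

  size_t len = strlen(input);
  char *seq = malloc(len + 1);
  if (seq == NULL)
    return NULL;

  const char *amp = strchr(input, '&');
  if (amp == NULL) {
    /* single molecule: plain copy, default cut point */
    memcpy(seq, input, len + 1);
    *cut = -1;
    return seq;
  }

  size_t first = (size_t)(amp - input);      /* length of molecule 1 */
  memcpy(seq, input, first);                 /* molecule 1 */
  memcpy(seq + first, amp + 1, len - first); /* molecule 2 plus '\0' */
  *cut = (int)first + 1;
  return seq;
}
```

For "GCGC&AUAU" the sketch yields the sequence "GCGCAUAU" and a cut point of 5. The value stored in *cut would then be assigned to the global cut_point before the folding routine is called.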
The analog to the fold() function. Computes the minimum free energy of two interacting RNA molecules. If cut_point == -1, results should be the same as with fold().
Allocates memory for mfe cofolding of sequences not longer than length, and sets up pairing matrix and energy parameters. Explicitly calling initialize_cofold() is normally not necessary, as it will be called automatically before folding.
Prototypes for these functions are declared in cofold.h.
As for folding one RNA molecule, this computes the partition function of all possible structures and the base pair probabilities. Uses the same global pf_scale variable to avoid overflows.
To simplify the implementation, the partition function computation is done internally in a null model that does not include the duplex initiation energy (i.e. the entropic penalty for producing a dimer from two monomers). The resulting free energies and pair probabilities are initially relative to that null model. In a second step the free energies can be corrected to include the dimerization penalty, and the pair probabilities can be divided into the conditional pair probabilities given that a dimer is formed or not formed.
The analog to the pf_fold() function. It computes the partition function of two interacting RNA molecules as well as base pair probabilities. It returns a struct including free energies computed for different models. F0AB is the free energy in the null model; FAB is corrected to include the initiation penalty; FcAB only includes real dimers. FA and FB are the free energies of the two molecules, respectively. As with cofold(), the cut_point global variable has to be set to mark the chain break between the molecules.
Frees the partition function cofold arrays allocated in co_pf_fold().
Obsolete function kept for backward compatibility. Allocates memory for partition function cofolding sequences not longer than length, and sets up pairing matrix and energy parameters.
Makes a list of base pairs out of the global *pr array, ignoring all base pairs with a probability less than cut_off. This is useful since the pr array gets overwritten by subsequent foldings.
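The thresholding step described above can be sketched as follows. This is only an illustration: the pair_entry type, the collect_pairs() helper, and the dense matrix layout are assumptions made for the example, and the library's own pr array and list type are organized differently.

```c
#include <stdlib.h>

/* Hypothetical list entry; the library's own pair list type may differ. */
typedef struct {
  int    i;
  int    j;
  double p;
} pair_entry;

/* Assumption: `prob` is a dense (n+1) x (n+1) row-major matrix with
 * 1-based indices, where prob[i * (n + 1) + j] is the probability of
 * the pair (i,j) with i < j.
 * Returns a malloc'ed list of all pairs with probability >= cut_off,
 * terminated by an entry with i == 0, or NULL if allocation fails.
 * The caller frees the list. */
pair_entry *collect_pairs(const double *prob, int n, double cut_off)
{
  size_t count = 0;

  /* first pass: count the pairs that survive the cutoff */
  for (int i = 1; i < n; i++)
    for (int j = i + 1; j <= n; j++)
      if (prob[i * (n + 1) + j] >= cut_off)
        count++;

  pair_entry *list = malloc((count + 1) * sizeof *list);
  if (list == NULL)
    return NULL;

  /* second pass: copy the surviving pairs */
  size_t k = 0;
  for (int i = 1; i < n; i++)
    for (int j = i + 1; j <= n; j++) {
      double p = prob[i * (n + 1) + j];
      if (p >= cut_off) {
        list[k].i = i;
        list[k].j = j;
        list[k].p = p;
        k++;
      }
    }

  /* terminator entry */
  list[k].i = 0;
  list[k].j = 0;
  list[k].p = 0.0;
  return list;
}
```

Because the result is an independent copy, it remains valid after a later folding call has overwritten the probability matrix.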
Prototypes for these functions are declared in co_part_func.h.
After computing the partition functions of all possible dimers, one can compute further quantities such as the conditional base pair probabilities and the equilibrium concentrations resulting from given start concentrations.
Given the pair probabilities and free energies (in the null model) for a dimer AB and the two constituent monomers A and B, compute the conditional pair probabilities given that a dimer AB actually forms. Null model pair probabilities are given as a list (as produced by get_plist()); the dimer probabilities prAB are modified in place.
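The conditioning step can be illustrated with the law of total probability: for a pair within one molecule, the null model probability is a mixture of the probability in the dimer and the probability in the free monomer. The sketch below inverts that mixture. It is not the library's implementation; the function name is hypothetical, and the probability p_dimer that a dimer forms is taken as an input, whereas the library derives the corresponding quantity from the free energies.

```c
/* Hypothetical helper illustrating the conditioning step.
 *
 * p_total : pair probability in the null model (dimer and monomers mixed)
 * p_mono  : probability of the same pair in the isolated monomer
 *           (use 0 for pairs between the two molecules)
 * p_dimer : probability that the dimer forms, 0 < p_dimer <= 1
 *
 * From  p_total = p_dimer * p_cond + (1 - p_dimer) * p_mono
 * it follows that
 *       p_cond  = (p_total - (1 - p_dimer) * p_mono) / p_dimer.
 * The result is clamped to [0,1] to absorb rounding errors. */
double conditional_pair_prob(double p_total, double p_mono, double p_dimer)
{
  double p_cond = (p_total - (1.0 - p_dimer) * p_mono) / p_dimer;

  if (p_cond < 0.0)
    p_cond = 0.0;
  if (p_cond > 1.0)
    p_cond = 1.0;
  return p_cond;
}
```

For example, with p_dimer = 0.5, a monomer pair probability of 0.8 and a null model probability of 0.6, the conditional probability comes out as approximately 0.4.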
Dimer formation is inherently concentration dependent. Given the free energies of the monomers A and B and dimers AB, AA, and BB, one can compute the equilibrium concentrations for given input concentrations of A and B; see e.g. Dimitrov & Zuker (2004).
Takes an array startconc of input concentrations with alternating entries for the initial concentrations of molecules A and B (terminated by two zeroes), then computes the resulting equilibrium concentrations from the free energies for the dimers. Dimer free energies should be the dimer-only free energies, i.e. the FcAB entries from the cofoldF struct.
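To make the concentration dependence concrete, here is a deliberately simplified sketch that treats only the heterodimer equilibrium A + B <-> AB and ignores the homodimers AA and BB, so it is not the computation performed by the library. The function name is hypothetical. The equilibrium constant K is taken as an input; under the usual convention it would be obtained from the free energies as exp(-(F_AB - F_A - F_B)/RT), which is an assumption of this example.

```c
/* Simplified sketch: equilibrium concentration of the dimer AB for
 * A + B <-> AB with mass-action constant K = [AB] / ([A] [B]).
 * a0 and b0 are the start concentrations of A and B in the same
 * units that K refers to. Homodimers are NOT considered here.
 *
 * With x = [AB], mass conservation gives [A] = a0 - x, [B] = b0 - x,
 * so x is the root of f(x) = K (a0 - x)(b0 - x) - x on [0, min(a0,b0)].
 * f is strictly decreasing on that interval, so bisection is safe. */
double dimer_concentration(double K, double a0, double b0)
{
  double lo = 0.0;
  double hi = (a0 < b0) ? a0 : b0;

  for (int it = 0; it < 200; it++) {
    double x = 0.5 * (lo + hi);
    double f = K * (a0 - x) * (b0 - x) - x;

    if (f > 0.0)
      lo = x;   /* too little dimer: root lies above x */
    else
      hi = x;   /* too much dimer: root lies at or below x */
  }
  return 0.5 * (lo + hi);
}
```

The remaining monomer concentrations follow as a0 - x and b0 - x. Handling AA and BB as well leads to a coupled system in two unknowns, which needs a two-dimensional solver instead of a single bisection.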
Prototypes for these functions are declared in co_part_func.h.
In this approach to cofolding the interaction between two RNA molecules is seen as a stepwise process. In a first step, the target molecule has to adopt a structure in which a binding site is accessible. In a second step, the ligand molecule will hybridize with a region accessible to an interaction. Consequently the algorithm is designed as a two step process: The first step is the calculation of the probability that a region within the target is unpaired, or equivalently, the calculation of the free energy needed to expose a region. In the second step we compute the free energy of an interaction for every possible binding site.
pu_contrib *pf_unstru (char *sequence, char *structure, int w)
This function calculates the partition function over all unpaired regions in *sequence of a maximal length w. It returns a pu_contrib struct containing four arrays of dimension [i = 1 to length of *sequence][j = 1 to w-1] containing all possible contributions to the probabilities of unpaired regions of maximum length w. Each array in pu_contrib contains one of the contributions to the total probability of being unpaired: the probability of being unpaired within an exterior loop is in array pu_contrib->E, the probability of being unpaired within a hairpin loop is in array pu_contrib->H, the probability of being unpaired within an interior loop is in array pu_contrib->I, and the probability of being unpaired within a multi-loop is in array pu_contrib->M. The total probability of being unpaired is the sum of the four arrays of pu_contrib. Use the functions free_unpaired() and free_pf_two() to free everything allocated for pf_unstru().
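The summation over the four loop types can be sketched as below. The struct unpaired_contrib is a hypothetical stand-in that only mirrors the four array names mentioned above; the actual pu_contrib definition in part_func_up.h may contain further members, and the element type is assumed to be double here.

```c
/* Hypothetical stand-in for the four contribution arrays; the real
 * pu_contrib struct may differ in element type and extra members. */
typedef struct {
  double **E; /* unpaired within an exterior loop */
  double **H; /* unpaired within a hairpin loop   */
  double **I; /* unpaired within an interior loop */
  double **M; /* unpaired within a multi-loop     */
} unpaired_contrib;

/* Total probability of being unpaired for the array entry [i][j],
 * obtained by adding the four loop-type contributions. The indices
 * are passed through unchanged and must lie within the array bounds
 * documented for the contribution arrays. */
double total_unpaired(const unpaired_contrib *c, int i, int j)
{
  return c->E[i][j] + c->H[i][j] + c->I[i][j] + c->M[i][j];
}
```

Keeping the contributions separate, as the library does, makes it possible to inspect which loop context dominates the accessibility of a region before summing.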
Frees the arrays needed for the calculation of the probability of being unpaired by function pf_unstru().
pu_contrib *p_con, FLT_OR_DBL **pin)
If none of the input parameters is set to NULL, frees everything returned by pf_unstru() and pf_interact(). If **pin is set to NULL, frees only the pu_contrib struct returned by function pf_unstru(). If *p_con is set to NULL, frees the two arrays of length of *sequence which are returned by function pf_interact().
pu_contrib *p_c, int w, int incr3, int incr5)
Calculates the probability of a local interaction between sequence *s1 and sequence *s2, considering the probability that the region of interaction is unpaired within *s1. The longer sequence has to be given as *s1 and the shorter sequence as *s2. Function pf_unstru() has to be called for *s1 beforehand; the resulting probabilities of being unpaired for *s1 are passed in *p_c. The parameter w gives the maximal length of the subsequences of *s1 considered for the interaction. The parameters incr5 and incr3 allow the inclusion of unpaired residues left (incr5) and right (incr3) of the region of interaction in *s1. If the incr options are used, function pf_unstru() has to be called with w=w+incr5+incr3. Function pf_interact() returns two arrays of the length of *sequence, which contain the probability of a local interaction (index [0][i]) and the minimum free energy of a local interaction (index [1][i]), where i is the position in sequence *s1. Use free_pf_two() to free the returned arrays. Note that function pf_interact() calls function free_unpaired()!
Prototypes for these functions are declared in part_func_up.h.