RNAlib-2.0.1
H/part_func.h File Reference

Partition function of single RNA sequences.

Functions

float pf_fold (const char *sequence, char *structure)
 Compute the partition function $Q$ of an RNA sequence.
float pf_circ_fold (const char *sequence, char *structure)
 Compute the partition function of a circular RNA sequence.
char * pbacktrack (char *sequence)
 Sample a secondary structure from the Boltzmann ensemble according to its probability.
char * pbacktrack_circ (char *sequence)
 Sample a secondary structure of a circular RNA from the Boltzmann ensemble according to its probability.
void free_pf_arrays (void)
 Free arrays from pf_fold()
void update_pf_params (int length)
 Recalculate energy parameters.
FLT_OR_DBL * export_bppm (void)
 Get a pointer to the base pair probability array.
void assign_plist_from_pr (plist **pl, FLT_OR_DBL *probs, int length, double cutoff)
 Create a plist from a probability matrix.
int get_pf_arrays (short **S_p, short **S1_p, char **ptype_p, FLT_OR_DBL **qb_p, FLT_OR_DBL **qm_p, FLT_OR_DBL **q1k_p, FLT_OR_DBL **qln_p)
 Get the pointers to (almost) all relevant computation arrays used in partition function computation.
char * get_centroid_struct_pl (int length, double *dist, plist *pl)
 Get the centroid structure of the ensemble.
char * get_centroid_struct_pr (int length, double *dist, FLT_OR_DBL *pr)
 Get the centroid structure of the ensemble.
double mean_bp_distance (int length)
 Get the mean base pair distance of the last partition function computation.
double mean_bp_distance_pr (int length, FLT_OR_DBL *pr)
 Get the mean base pair distance in the thermodynamic ensemble.
void bppm_to_structure (char *structure, FLT_OR_DBL *pr, unsigned int length)
 Create a dot-bracket like structure string from base pair probability matrix.
char bppm_symbol (const float *x)
 Get a pseudo dot bracket notation for a given probability information.
void init_pf_fold (int length)
 Allocate space for pf_fold()
char * centroid (int length, double *dist)
double mean_bp_dist (int length)
 Get the mean pair distance of the ensemble.
double expLoopEnergy (int u1, int u2, int type, int type2, short si1, short sj1, short sp1, short sq1)
double expHairpinEnergy (int u, int type, short si1, short sj1, const char *string)

Variables

int st_back
 A flag indicating that the auxiliary arrays needed for stochastic backtracking must be kept throughout the computations.

Detailed Description

Partition function of single RNA sequences.

This file includes (almost) all function declarations within RNAlib that are related to partition function folding.


Function Documentation

float pf_fold ( const char *  sequence,
char *  structure 
)

Compute the partition function $Q$ of an RNA sequence.

If structure is not a NULL pointer on input, it contains on return a string consisting of the letters " . , | { } ( ) ", denoting bases that are essentially unpaired, weakly paired, strongly paired without preference, weakly upstream (downstream) paired, or strongly up- (down-)stream paired, respectively.

If fold_constrained is not 0, the structure string is interpreted on input as a list of constraints for the folding. The character "x" marks bases that must be unpaired, matching brackets " ( ) " denote base pairs, and all other characters are ignored. Any pairs conflicting with the constraint will be forbidden. This is usually sufficient to ensure the constraints are honored.

If do_backtrack has been set to 0, base pairing probabilities will not be computed (saving CPU time); otherwise pr will contain the probability that bases i and j pair.
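
The pseudo-dot-bracket symbols described above can be illustrated with a small classifier. This is only a sketch: the name sketch_symbol, its three inputs (the probabilities that a base is unpaired, opens a pair, or closes a pair) and the 2/3 threshold are assumptions made for illustration and are not taken from this reference.

```c
#include <assert.h>

/* Hypothetical helper, not part of RNAlib: map the probabilities that a base
 * is unpaired, opens a pair, or closes a pair to one pseudo-dot-bracket
 * symbol. The 2/3 threshold is an illustrative assumption. */
char sketch_symbol(double p_unpaired, double p_open, double p_close)
{
    const double strong = 2.0 / 3.0;          /* assumed threshold */
    double p_paired = p_open + p_close;

    assert(p_unpaired >= 0.0 && p_open >= 0.0 && p_close >= 0.0);

    if (p_unpaired > strong) return '.';      /* essentially unpaired */
    if (p_open > strong)     return '(';      /* strongly paired, one direction */
    if (p_close > strong)    return ')';      /* strongly paired, other direction */
    if (p_paired > p_unpaired) {
        if (p_open / p_paired > strong)  return '{';  /* weak, with preference */
        if (p_close / p_paired > strong) return '}';
        return '|';                           /* paired, no preferred direction */
    }
    return ',';                               /* paired no more often than unpaired */
}
```

For instance, a base that is unpaired 60% of the time and otherwise pairs equally in both directions would be reported as ',' by this sketch.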

Note:
The global array pr is deprecated. Users who need the computed base pair probabilities for further computations are advised to use the function export_bppm().
See also:
pf_circ_fold(), bppm_to_structure(), export_bppm()
Parameters:
sequence  The RNA sequence to be computed
structure  A pointer to a char array where base pair probability information may be stored in a pseudo-dot-bracket notation (may be NULL)
Returns:
The Gibbs free energy of the ensemble ( $G = -RT \cdot \log(Q) $) in kcal/mol
float pf_circ_fold ( const char *  sequence,
char *  structure 
)

Compute the partition function of a circular RNA sequence.

See also:
pf_fold()
Parameters:
sequence  The RNA sequence to be computed
structure  A pointer to a char array where base pair probability information may be stored in a pseudo-dot-bracket notation (may be NULL)
Returns:
The Gibbs free energy of the ensemble ( $G = -RT \cdot \log(Q) $) in kcal/mol
char* pbacktrack ( char *  sequence)

Sample a secondary structure from the Boltzmann ensemble according to its probability.

Parameters:
sequence  The RNA sequence
Returns:
A sampled secondary structure in dot-bracket notation
char* pbacktrack_circ ( char *  sequence)

Sample a secondary structure of a circular RNA from the Boltzmann ensemble according its probability.

This function does the same as pbacktrack() but assumes the RNA molecule to be circular.

Parameters:
sequence  The RNA sequence
Returns:
A sampled secondary structure in dot-bracket notation
void update_pf_params ( int  length)

Recalculate energy parameters.

Call this function to recalculate the pair matrix and energy parameters after a change in folding parameters such as the temperature.

FLT_OR_DBL* export_bppm ( void  )

Get a pointer to the base pair probability array.

Accessing the base pair probabilities for a pair (i,j) is achieved by

FLT_OR_DBL *pr = export_bppm();
pr_ij = pr[iindx[i]-j];
See also:
get_iindx()
Returns:
A pointer to the base pair probability array
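
The access pattern pr[iindx[i]-j] stores an upper triangular matrix in a linear array. The following standalone sketch shows one index layout with this property. The layout and both helper names are assumptions made for illustration; real code should take the index table from get_iindx().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an index table: idx[i] - j addresses pair (i, j),
 * 1 <= i <= j <= n, in a linear array with n*(n+1)/2 + 1 slots (slot 0 is
 * unused). The exact layout is an assumption of this sketch. */
int *sketch_make_index(int n)
{
    int i;
    int *idx;

    assert(n > 0);
    idx = malloc(sizeof *idx * (size_t)(n + 1));
    if (idx == NULL)
        return NULL;
    idx[0] = 0;                        /* unused, positions are 1-based */
    for (i = 1; i <= n; i++)
        idx[i] = ((n + 1 - i) * (n - i)) / 2 + n + 1;
    return idx;
}

/* Read the probability of pair (i, j) from the linear array. */
double sketch_pair_prob(const double *pr, const int *idx, int i, int j)
{
    assert(pr != NULL && idx != NULL && i >= 1 && i <= j);
    return pr[idx[i] - j];
}
```

For n = 4 this layout maps (4,4) to slot 1 and (1,1) to slot 10, so every pair with i <= j gets its own slot.
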
void assign_plist_from_pr ( plist **  pl,
FLT_OR_DBL *  probs,
int  length,
double  cutoff 
)

Create a plist from a probability matrix.

The given probability matrix is parsed, and every pair probability above the given threshold creates an entry in the plist.

The end of the plist is marked by an entry whose sequence positions i and j are both 0. This condition should be used to stop looping over its entries.

Note:
This function is threadsafe
Parameters:
pl  A pointer to the plist that is to be created
probs  The probability matrix used for creating the plist
length  The length of the RNA sequence
cutoff  The cutoff value
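
The cutoff filter and the (0,0) end marker can be sketched without the library. Everything named sketch_ below is hypothetical: the entry type stands in for plist, and the dense (length+1) x (length+1) matrix layout is an assumption chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a plist entry. */
typedef struct {
    int i;
    int j;
    double p;
} sketch_pair;

/* Collect all pairs (i, j), 1 <= i < j <= n, whose probability exceeds the
 * cutoff. probs is a dense row-major (n+1) x (n+1) matrix (an assumption).
 * The returned list ends with an entry where i and j are both 0. */
sketch_pair *sketch_pair_list(const double *probs, int n, double cutoff)
{
    int i, j;
    size_t count = 0;
    sketch_pair *list;

    assert(probs != NULL && n > 0);
    /* worst case: every pair passes the cutoff, plus one end marker */
    list = malloc(sizeof *list * ((size_t)n * (size_t)(n - 1) / 2 + 1));
    if (list == NULL)
        return NULL;

    for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++)
            if (probs[i * (n + 1) + j] > cutoff) {
                list[count].i = i;
                list[count].j = j;
                list[count].p = probs[i * (n + 1) + j];
                count++;
            }

    list[count].i = 0;                 /* end marker */
    list[count].j = 0;
    list[count].p = 0.0;
    return list;
}

/* Count the entries by walking the list up to the end marker. */
size_t sketch_pair_count(const sketch_pair *list)
{
    size_t k = 0;

    assert(list != NULL);
    while (list[k].i != 0 || list[k].j != 0)
        k++;
    return k;
}
```

The loop in sketch_pair_count is the stopping condition described above: iterate until both positions are 0.
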
int get_pf_arrays ( short **  S_p,
short **  S1_p,
char **  ptype_p,
FLT_OR_DBL **  qb_p,
FLT_OR_DBL **  qm_p,
FLT_OR_DBL **  q1k_p,
FLT_OR_DBL **  qln_p 
)

Get the pointers to (almost) all relevant computation arrays used in partition function computation.

Parameters:
S_p  A pointer to the 'S' array (integer representation of nucleotides)
S1_p  A pointer to the 'S1' array (2nd integer representation of nucleotides)
ptype_p  A pointer to the pair type matrix
qb_p  A pointer to the QB matrix
qm_p  A pointer to the QM matrix
q1k_p  A pointer to the 5' slice of the Q matrix ( $q1k(k) = Q(1, k)$)
qln_p  A pointer to the 3' slice of the Q matrix ( $qln(l) = Q(l, n)$)
Returns:
Non-zero if everything went fine, 0 otherwise
char* get_centroid_struct_pl ( int  length,
double *  dist,
plist *  pl 
)

Get the centroid structure of the ensemble.

This function is a threadsafe replacement for centroid() with a 'plist' input

The centroid is the structure with the minimal average distance to all other structures:
$ \langle d(S) \rangle = \sum_{(i,j) \in S} (1-p_{ij}) + \sum_{(i,j) \notin S} p_{ij} $
Thus, the centroid is simply the structure containing all pairs with $p_{ij} > 0.5$. The distance of the centroid to the ensemble is written to the memory addressed by dist.

Parameters:
length  The length of the sequence
dist  A pointer to the distance variable where the centroid distance will be written to
pl  A pair list containing base pair probability information about the ensemble
Returns:
The centroid structure of the ensemble in dot-bracket notation
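
The centroid rule (keep every pair with probability above 0.5 and accumulate the distance) can be sketched over a pair list as follows. The entry type and the function name are hypothetical stand-ins, and only pairs present in the list contribute to the distance in this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a plist entry; a list ends with i == 0. */
typedef struct {
    int i;
    int j;
    double p;
} centroid_pair;

/* Build the centroid in dot-bracket notation from a pair list. Pairs with
 * p > 0.5 are part of the centroid and add (1 - p) to *dist; all other
 * listed pairs add p. The caller frees the returned string. */
char *sketch_centroid(int length, double *dist, const centroid_pair *pl)
{
    int k;
    char *s;

    assert(length > 0 && dist != NULL && pl != NULL);
    s = malloc((size_t)length + 1);
    if (s == NULL)
        return NULL;
    for (k = 0; k < length; k++)
        s[k] = '.';
    s[length] = '\0';

    *dist = 0.0;
    for (k = 0; pl[k].i > 0; k++) {
        assert(pl[k].i < pl[k].j && pl[k].j <= length);
        if (pl[k].p > 0.5) {
            s[pl[k].i - 1] = '(';      /* positions are 1-based */
            s[pl[k].j - 1] = ')';
            *dist += 1.0 - pl[k].p;
        } else {
            *dist += pl[k].p;
        }
    }
    return s;
}
```

For a list with pairs (1,8) at 0.9, (2,7) at 0.6 and (3,6) at 0.3 on a sequence of length 8, the first two pairs enter the centroid and the distance is 0.1 + 0.4 + 0.3.
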
char* get_centroid_struct_pr ( int  length,
double *  dist,
FLT_OR_DBL *  pr 
)

Get the centroid structure of the ensemble.

This function is a threadsafe replacement for centroid() with a probability array input

The centroid is the structure with the minimal average distance to all other structures:
$ \langle d(S) \rangle = \sum_{(i,j) \in S} (1-p_{ij}) + \sum_{(i,j) \notin S} p_{ij} $
Thus, the centroid is simply the structure containing all pairs with $p_{ij} > 0.5$. The distance of the centroid to the ensemble is written to the memory addressed by dist.

Parameters:
length  The length of the sequence
dist  A pointer to the distance variable where the centroid distance will be written to
pr  An upper triangular matrix containing base pair probabilities (accessed via iindx, see get_iindx())
Returns:
The centroid structure of the ensemble in dot-bracket notation
double mean_bp_distance ( int  length)

Get the mean base pair distance of the last partition function computation.

See also:
mean_bp_distance_pr()
Parameters:
length  The length of the sequence
Returns:
mean base pair distance in thermodynamic ensemble
double mean_bp_distance_pr ( int  length,
FLT_OR_DBL *  pr 
)

Get the mean base pair distance in the thermodynamic ensemble.

This is a threadsafe implementation of mean_bp_dist().

$ \langle d \rangle = \sum_{a,b} p_a p_b d(S_a,S_b) $
This can be computed from the pair probabilities $p_{ij}$ as
$ \langle d \rangle = \sum_{ij} p_{ij}(1-p_{ij}) $

Note:
This function is threadsafe
Parameters:
length  The length of the sequence
pr  The matrix containing the base pair probabilities
Returns:
The mean pair distance of the structure ensemble
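
The equivalence of the two sums above can be verified on a toy ensemble. The sketch below is an illustration with hypothetical names and a hypothetical data layout: structures are given as 0/1 flags over a set of candidate pairs. Because each candidate pair is counted once here, a factor of 2 appears in the second function; this corresponds to letting the sum over ij run over both orderings of i and j.

```c
#include <assert.h>
#include <stddef.h>

/* Toy ensemble (hypothetical layout): K structures over M candidate pairs.
 * has_pair[k * M + m] is 1 if structure k contains pair m, and w[k] is the
 * probability of structure k. */

/* Explicit double sum over all ordered pairs of structures (a, b), where
 * d(S_a, S_b) is the number of pairs contained in exactly one of the two. */
double sketch_mean_dist_explicit(int K, int M, const double *w,
                                 const int *has_pair)
{
    double d = 0.0;
    int a, b, m;

    assert(K > 0 && M > 0 && w != NULL && has_pair != NULL);
    for (a = 0; a < K; a++)
        for (b = 0; b < K; b++)
            for (m = 0; m < M; m++)
                if (has_pair[a * M + m] != has_pair[b * M + m])
                    d += w[a] * w[b];
    return d;
}

/* The same quantity from pair probabilities: 2 * sum_m p_m * (1 - p_m). */
double sketch_mean_dist_from_probs(int K, int M, const double *w,
                                   const int *has_pair)
{
    double d = 0.0;
    int k, m;

    assert(K > 0 && M > 0 && w != NULL && has_pair != NULL);
    for (m = 0; m < M; m++) {
        double p = 0.0;                /* probability of candidate pair m */
        for (k = 0; k < K; k++)
            if (has_pair[k * M + m])
                p += w[k];
        d += p * (1.0 - p);
    }
    return 2.0 * d;
}
```

The second form needs only the pair probabilities, which is why the mean distance can be computed without enumerating structures.
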
void init_pf_fold ( int  length)

Allocate space for pf_fold()

Deprecated:
This function is obsolete and will be removed soon!
char* centroid ( int  length,
double *  dist 
)
Deprecated:
This function is deprecated and should not be used anymore as it is not threadsafe!
See also:
get_centroid_struct_pl(), get_centroid_struct_pr()
double mean_bp_dist ( int  length)

Get the mean pair distance of the ensemble.

Deprecated:
This function is not threadsafe and should not be used anymore. Use mean_bp_distance() instead!
double expLoopEnergy ( int  u1,
int  u2,
int  type,
int  type2,
short  si1,
short  sj1,
short  sp1,
short  sq1 
)
double expHairpinEnergy ( int  u,
int  type,
short  si1,
short  sj1,
const char *  string 
)