RNAlib-2.0.1
H/alifold.h File Reference

Compute various properties (consensus MFE structures, partition function, Boltzmann distributed stochastic samples, ...) for RNA sequence alignments.

Functions

void update_alifold_params (void)
 Update the energy parameters for alifold function.
float alifold (const char **strings, char *structure)
 Compute the MFE and the corresponding consensus structure of an alignment of sequences.
float circalifold (const char **strings, char *structure)
 Compute the MFE and the corresponding structure of an alignment of sequences, assuming the sequences are circular instead of linear.
FLT_OR_DBL * alipf_export_bppm (void)
 Get a pointer to the base pair probability array.
void free_alifold_arrays (void)
 Free the memory occupied by MFE alifold functions.
int get_mpi (char *Alseq[], int n_seq, int length, int *mini)
 Get the mean pairwise identity in steps from ?to?(ident)
float ** readribosum (char *name)
 Read a ribosum or other user-defined scoring matrix.
float energy_of_alistruct (const char **sequences, const char *structure, int n_seq, float *energy)
 Calculate the free energy of a consensus structure given a set of aligned sequences.
void encode_ali_sequence (const char *sequence, short *S, short *s5, short *s3, char *ss, unsigned short *as, int circ)
 Get arrays with encoded sequence of the alignment.
void alloc_sequence_arrays (const char **sequences, short ***S, short ***S5, short ***S3, unsigned short ***a2s, char ***Ss, int circ)
 Allocate memory for sequence array used to deal with aligned sequences.
void free_sequence_arrays (unsigned int n_seq, short ***S, short ***S5, short ***S3, unsigned short ***a2s, char ***Ss)
 Free the memory of the sequence arrays used to deal with aligned sequences.
float alipf_fold (const char **sequences, char *structure, plist **pl)
 The partition function version of alifold() works in analogy to pf_fold().
float alipf_circ_fold (const char **sequences, char *structure, plist **pl)
FLT_OR_DBL * export_ali_bppm (void)
 Get a pointer to the base pair probability array.
char * alipbacktrack (double *prob)
 Sample a consensus secondary structure from the Boltzmann ensemble according to its probability.

Variables

double cv_fact
 This variable controls the weight of the covariance term in the energy function of alignment folding algorithms.
double nc_fact
 This variable controls the magnitude of the penalty for non-compatible sequences in the covariance term of alignment folding algorithms.

Detailed Description

Compute various properties (consensus MFE structures, partition function, Boltzmann distributed stochastic samples, ...) for RNA sequence alignments.


Function Documentation

void update_alifold_params ( void  )

Update the energy parameters for alifold function.

Call this function to recalculate the pair matrix and energy parameters after a change in folding parameters such as temperature.

float alifold ( const char **  strings,
char *  structure 
)

Compute the MFE and the corresponding consensus structure of an alignment of sequences.

This function predicts the consensus structure for the aligned sequences in 'strings' and returns the minimum free energy; the MFE structure in bracket notation is returned in 'structure'.

Sufficient space must be allocated for 'structure' before calling alifold().

Parameters:
strings    A pointer to a NULL-terminated array of character arrays
structure  A pointer to a character array that may contain a constraining consensus structure (will be overwritten by a consensus structure that exhibits the MFE)
Returns:
The free energy score in kcal/mol
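
The two requirements above (a NULL-terminated array of rows, and a pre-allocated 'structure' buffer) can be sketched as follows. This is a minimal illustration, not library code: alloc_structure_buffer() is a hypothetical helper, and the buffer size of alignment length + 1 bytes is an assumption based on one bracket character per alignment column plus the string terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of RNAlib): checks that `strings` is a
 * NULL-terminated array of equally long rows and returns a zero-filled
 * buffer of alignment length + 1 bytes, or NULL on error.
 * The caller owns the buffer and must free() it. */
char *alloc_structure_buffer(const char **strings)
{
  size_t length, s;

  if (strings == NULL || strings[0] == NULL)
    return NULL;

  length = strlen(strings[0]);
  for (s = 1; strings[s] != NULL; s++)
    if (strlen(strings[s]) != length)
      return NULL;               /* all rows of an alignment must match */

  return calloc(length + 1, 1);  /* +1 for the terminating '\0' */
}
```

With RNAlib linked in, the returned buffer would then be passed as the second argument, for example `float e = alifold(strings, buf);`, and released with free() afterwards.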
float circalifold ( const char **  strings,
char *  structure 
)

Compute the MFE and the corresponding structure of an alignment of sequences, assuming the sequences are circular instead of linear.

Parameters:
strings    A pointer to a NULL-terminated array of character arrays
structure  A pointer to a character array that may contain a constraining consensus structure (will be overwritten by a consensus structure that exhibits the MFE)
Returns:
The free energy score in kcal/mol
int get_mpi ( char *  Alseq[],
int  n_seq,
int  length,
int *  mini 
)

Get the mean pairwise identity in steps from ?to?(ident)

Parameters:
Alseq
n_seq   The number of sequences in the alignment
length  The length of the alignment
mini
Returns:
The mean pairwise identity
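
To make the quantity concrete, here is one common way a mean pairwise identity can be computed. It is a sketch under assumptions, not the implementation of get_mpi(): the function name mean_pairwise_identity() is hypothetical, the result is expressed as a truncated percentage, gap characters are compared like any other character, and the 'mini' output is omitted because its role is not described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not the RNAlib get_mpi() implementation):
 * percentage (0..100, truncated) of identical characters, taken over
 * every pair of rows and every alignment column. Columns where both
 * rows hold a gap count as identical in this simplified version. */
int mean_pairwise_identity(const char *const *aln, int n_seq, int length)
{
  long identical = 0, compared = 0;
  int i, j, k;

  assert(aln != NULL && n_seq >= 0 && length >= 0);

  for (j = 0; j < n_seq - 1; j++)
    for (k = j + 1; k < n_seq; k++)
      for (i = 0; i < length; i++) {
        if (aln[j][i] == aln[k][i])
          identical++;
        compared++;
      }

  /* No pairs or no columns: report 0 instead of dividing by zero. */
  return compared > 0 ? (int)(identical * 100 / compared) : 0;
}
```

For the two rows "ACGU" and "ACGA", three of four columns agree, so this sketch yields 75.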
float energy_of_alistruct ( const char **  sequences,
const char *  structure,
int  n_seq,
float *  energy 
)

Calculate the free energy of a consensus structure given a set of aligned sequences.

Parameters:
sequences  The NULL-terminated array of sequences
structure  The consensus structure
n_seq      The number of sequences in the alignment
energy     A pointer to an array of at least two floats that will hold the free energies (energy[0] will contain the free energy, energy[1] will be filled with the covariance energy term)
Returns:
The free energy in kcal/mol
void encode_ali_sequence ( const char *  sequence,
short *  S,
short *  s5,
short *  s3,
char *  ss,
unsigned short *  as,
int  circ 
)

Get arrays with encoded sequence of the alignment.

This function assumes that enough space has already been allocated in S, s5, s3, ss and as (each array must hold at least sequence length + 2 elements).

Parameters:
sequence  The gapped sequence from the alignment
S         Pointer to an array that holds the encoded sequence
s5        Pointer to an array that holds the next base 5' of alignment position i
s3        Pointer to an array that holds the next base 3' of alignment position i
ss
as
circ      Non-zero to assume the molecules are circular instead of linear (circ=0 means linear)
void alloc_sequence_arrays ( const char **  sequences,
short ***  S,
short ***  S5,
short ***  S3,
unsigned short ***  a2s,
char ***  Ss,
int  circ 
)

Allocate memory for sequence array used to deal with aligned sequences.

Note that these arrays are also initialized according to the given sequence alignment.

See also:
free_sequence_arrays()
Parameters:
sequences  The aligned sequences
S          A pointer to the array of encoded sequences
S5         A pointer to the array that contains the next 5' nucleotide of a sequence position
S3         A pointer to the array that contains the next 3' nucleotide of a sequence position
a2s        A pointer to the array that contains the alignment to sequence position mapping
Ss         A pointer to the array that contains the ungapped sequence
circ       Non-zero to assume the molecules are circular instead of linear (circ=0 means linear)
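
The "alignment to sequence position mapping" and the "ungapped sequence" can be illustrated for a single row of the alignment. The sketch below is not taken from RNAlib: map_alignment_to_sequence() is a hypothetical name, the 1-based indexing of the mapping is an assumption, and only '-' is treated as a gap character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch (not RNAlib code). For one gapped row of length n:
 *   a2s[i]   = number of non-gap characters in columns 1..i (a2s[0] = 0)
 *   ungapped = the row with all '-' characters removed
 * a2s must hold n + 1 entries and ungapped n + 1 bytes.
 * Returns the length of the ungapped sequence. */
size_t map_alignment_to_sequence(const char *row,
                                 unsigned short *a2s,
                                 char *ungapped)
{
  size_t n, i, p = 0;

  assert(row != NULL && a2s != NULL && ungapped != NULL);

  n = strlen(row);
  a2s[0] = 0;
  for (i = 1; i <= n; i++) {
    char c = row[i - 1];
    if (c != '-')                 /* assumption: '-' is the only gap symbol */
      ungapped[p++] = c;
    a2s[i] = (unsigned short)p;   /* sequence position reached at column i */
  }
  ungapped[p] = '\0';
  return p;
}
```

For the row "GC-A-U" this gives the ungapped sequence "GCAU" and the mapping 1, 2, 2, 3, 3, 4 for columns 1 to 6.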
void free_sequence_arrays ( unsigned int  n_seq,
short ***  S,
short ***  S5,
short ***  S3,
unsigned short ***  a2s,
char ***  Ss 
)

Free the memory of the sequence arrays used to deal with aligned sequences.

This function frees the memory previously allocated with alloc_sequence_arrays().

See also:
alloc_sequence_arrays()
Parameters:
n_seq  The number of aligned sequences
S      A pointer to the array of encoded sequences
S5     A pointer to the array that contains the next 5' nucleotide of a sequence position
S3     A pointer to the array that contains the next 3' nucleotide of a sequence position
a2s    A pointer to the array that contains the alignment to sequence position mapping
Ss     A pointer to the array that contains the ungapped sequence
float alipf_fold ( const char **  sequences,
char *  structure,
plist **  pl 
)

The partition function version of alifold() works in analogy to pf_fold().

Pair probabilities and information about sequence covariations are returned via the 'pl' argument as a list. The list is terminated by the first entry with i = 0.

Parameters:
sequences
structure
pl
Returns:
float alipf_circ_fold ( const char **  sequences,
char *  structure,
plist **  pl 
)
Parameters:
sequences
structure
pl
Returns:
FLT_OR_DBL* export_ali_bppm ( void  )

Get a pointer to the base pair probability array.

Accessing the base pair probabilities for a pair (i,j) is achieved by

    FLT_OR_DBL *pr = export_ali_bppm();
    pr_ij = pr[iindx[i]-j];
See also:
get_iindx()
Returns:
A pointer to the base pair probability array
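
The access pattern pr[iindx[i]-j] implies that the upper triangle of the pair matrix is stored in one flat array. The sketch below shows one index table that makes this work. Only the access pattern comes from the text above; the formula inside make_iindx() is an assumption about a row-wise triangular layout, and both function names are hypothetical stand-ins rather than library functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an index table such as the one get_iindx()
 * provides. The formula is an assumed triangular layout, chosen so that
 * idx[i] - j is unique for every 1 <= i <= j <= length.
 * Returns NULL if allocation fails; the caller must free() the table. */
int *make_iindx(int length)
{
  int i;
  int *idx;

  assert(length >= 0);

  idx = calloc((size_t)length + 1, sizeof *idx);
  if (idx == NULL)
    return NULL;

  for (i = 1; i <= length; i++)
    idx[i] = ((length + 1 - i) * (length - i)) / 2 + length + 1;

  return idx;
}

/* Flat array position of the pair (i,j), with 1 <= i <= j. */
int pair_index(const int *idx, int i, int j)
{
  assert(idx != NULL && i >= 1 && i <= j);
  return idx[i] - j;
}
```

With such a table, the probability of pair (i,j) would be read as `pr[pair_index(idx, i, j)]`, which is the same expression as pr[iindx[i]-j] above.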
char* alipbacktrack ( double *  prob)

Sample a consensus secondary structure from the Boltzmann ensemble according to its probability.

Parameters:
prob  to be described (berni)
Returns:
A sampled consensus secondary structure in dot-bracket notation
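
Sampling "according to its probability" means that a structure is drawn with a chance proportional to its Boltzmann weight. The principle can be shown on a small explicit list of weights. This is only an illustration of weighted sampling: sample_weighted() is a hypothetical function, and the library samples by stochastic backtracking rather than by enumerating structures as done here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration (not RNAlib code): pick index k with
 * probability weight[k] / sum(weight), given a uniform random number
 * u in [0, 1). Weights must be non-negative with a positive sum. */
size_t sample_weighted(const double *weight, size_t n, double u)
{
  double total = 0.0, acc = 0.0;
  size_t k;

  assert(weight != NULL && n > 0 && u >= 0.0 && u < 1.0);

  for (k = 0; k < n; k++)
    total += weight[k];
  assert(total > 0.0);

  /* Walk the cumulative sum until it exceeds the scaled random number. */
  for (k = 0; k < n; k++) {
    acc += weight[k];
    if (u * total < acc)
      return k;
  }
  return n - 1;  /* guard against floating-point rounding */
}
```

With weights {1, 3}, values of u below 0.25 select index 0 and all other values select index 1, so the second entry is drawn three times as often.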

Variable Documentation

double cv_fact

This variable controls the weight of the covariance term in the energy function of alignment folding algorithms.

Default is 1.

double nc_fact

This variable controls the magnitude of the penalty for non-compatible sequences in the covariance term of alignment folding algorithms.

Default is 1.