Multiple Sequence Alignment Utilities

Functions to extract features from and to manipulate multiple sequence alignments (MSA).

Specialized Modules:

Deprecated Interface for Multiple Sequence Alignment Utilities

Defines

VRNA_ALN_DEFAULT: #include <ViennaRNA/sequences/alignments.h>
Use default alignment settings.

VRNA_ALN_RNA: #include <ViennaRNA/sequences/alignments.h>
Convert to RNA alphabet.

VRNA_ALN_DNA: #include <ViennaRNA/sequences/alignments.h>
Convert to DNA alphabet.

VRNA_ALN_UPPERCASE: #include <ViennaRNA/sequences/alignments.h>
Convert to uppercase nucleotide letters.

VRNA_ALN_LOWERCASE: #include <ViennaRNA/sequences/alignments.h>
Convert to lowercase nucleotide letters.

VRNA_MEASURE_SHANNON_ENTROPY

#include <ViennaRNA/sequences/alignments.h>

Flag indicating Shannon Entropy measure.

Shannon Entropy is defined as \( H = - \sum_c p_c \cdot \log_2 p_c \)
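The formula above can be illustrated with a short, library-independent Python sketch. It is not the ViennaRNA implementation: the helper name `column_entropy` is hypothetical, and treating every character in the column (including gaps) as an ordinary symbol is an assumption made only for this example.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column.

    Assumption: every character, gaps included, counts as a symbol.
    """
    total = len(column)
    counts = Counter(column)
    # H = - sum_c p_c * log2(p_c), with p_c the relative frequency of symbol c
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

# A fully conserved column has no uncertainty; four equally frequent
# nucleotides give the maximum for a four-letter alphabet.
print(column_entropy("AAAA"))
print(column_entropy("ACGU"))
```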

Typedefs

typedef struct vrna_pinfo_s vrna_pinfo_t: #include <ViennaRNA/sequences/alignments.h>
Typename for the base pair info representing data structure vrna_pinfo_s.

Functions

int vrna_aln_mpi(const char **alignment)

#include <ViennaRNA/sequences/alignments.h>

Get the mean pairwise identity (MPI) of a set of aligned sequences.

SWIG Wrapper Notes: This function is available as function aln_mpi(). See e.g. RNA.aln_mpi() in the Python API.

Parameters:

alignment – Aligned sequences

Returns:

The mean pairwise identity
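As a rough illustration of what a mean pairwise identity measures, the following pure-Python sketch compares every pair of sequences column by column. It is a sketch under stated assumptions, not the library routine: the name `mean_pairwise_identity` is hypothetical, the integer result is assumed to be a percentage (suggested by the `int` return type), and gap characters are compared like any other symbol.

```python
from itertools import combinations

def mean_pairwise_identity(alignment):
    """Integer percentage of identical positions over all sequence pairs.

    Assumptions: all sequences have equal length, and gaps are compared
    like ordinary characters.
    """
    matches = 0
    compared = 0
    for seq_a, seq_b in combinations(alignment, 2):
        for a, b in zip(seq_a, seq_b):
            matches += (a == b)
            compared += 1
    return 100 * matches // compared if compared else 0

aln = ["GCGC", "GCGA", "GCGC"]
print(mean_pairwise_identity(aln))
```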

vrna_pinfo_t *vrna_aln_pinfo(vrna_fold_compound_t *fc, const char *structure, double threshold)

#include <ViennaRNA/sequences/alignments.h>

Retrieve an array of vrna_pinfo_t structures from precomputed pair probabilities.

This array of structures contains information about positionwise pair probabilities, base pair entropy and more.

See also

vrna_pinfo_t, and vrna_pf()

Parameters:

fc – The vrna_fold_compound_t of type VRNA_FC_TYPE_COMPARATIVE with precomputed partition function matrices
structure – An optional structure in dot-bracket notation (may be NULL)
threshold – Do not include results with pair probabilities below threshold

Returns:

The vrna_pinfo_t array

int *vrna_aln_pscore(const char **alignment, vrna_md_t *md)

#include <ViennaRNA/sequences/alignments.h>

SWIG Wrapper Notes: This function is available as overloaded function aln_pscore() where the last parameter may be omitted, indicating md = NULL. See e.g. RNA.aln_pscore() in the Python API.

int vrna_pscore(vrna_fold_compound_t *fc, unsigned int i, unsigned int j): #include <ViennaRNA/sequences/alignments.h>

int vrna_pscore_freq(vrna_fold_compound_t *fc, const unsigned int *frequencies, unsigned int pairs): #include <ViennaRNA/sequences/alignments.h>

char **vrna_aln_slice(const char **alignment, unsigned int i, unsigned int j)

#include <ViennaRNA/sequences/alignments.h>

Slice out a subalignment from a larger alignment.

See also

vrna_aln_free()

Note

The user is responsible for freeing the memory occupied by the returned subalignment, e.g. by calling vrna_aln_free().

Parameters:

alignment – The input alignment
i – The first column of the subalignment (1-based)
j – The last column of the subalignment (1-based)

Returns:

The subalignment between column \(i\) and \(j\)
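Because the column indices are 1-based, a small sketch may help to show which columns end up in the result. The helper `slice_alignment` below is hypothetical and written in plain Python; it assumes that both `i` and `j` are inclusive, which matches the parameter descriptions above but is not verified against the library.

```python
def slice_alignment(alignment, i, j):
    """Return columns i..j (1-based, both inclusive) of each aligned sequence."""
    if not 1 <= i <= j:
        raise ValueError("expected 1 <= i <= j")
    # Python slices are 0-based and end-exclusive, hence [i - 1:j]
    return [seq[i - 1:j] for seq in alignment]

aln = ["GCGCAA", "GC-CAU"]
print(slice_alignment(aln, 2, 4))
```

In Python the returned list is garbage collected; in C the corresponding result has to be released explicitly, as stated in the note above.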

void vrna_aln_free(char **alignment)

#include <ViennaRNA/sequences/alignments.h>

Free memory occupied by a set of aligned sequences.

Parameters:

alignment – The input alignment

char **vrna_aln_uppercase(const char **alignment)

#include <ViennaRNA/sequences/alignments.h>

Create a copy of an alignment with only uppercase letters in the sequences.

See also

vrna_aln_copy()

Parameters:

alignment – The input sequence alignment (the array must be terminated by a NULL entry)

Returns:

A copy of the input alignment where lowercase sequence letters are replaced by uppercase letters

char **vrna_aln_toRNA(const char **alignment)

#include <ViennaRNA/sequences/alignments.h>

Create a copy of an alignment where DNA alphabet is replaced by RNA alphabet.

See also

vrna_aln_copy()

Parameters:

alignment – The input sequence alignment (the array must be terminated by a NULL entry)

Returns:

A copy of the input alignment where DNA alphabet is replaced by RNA alphabet (T -> U)

char **vrna_aln_copy(const char **alignment, unsigned int options)

#include <ViennaRNA/sequences/alignments.h>

Make a copy of a multiple sequence alignment.

This function allows one to create a copy of a multiple sequence alignment. The options parameter additionally allows for sequence manipulation, such as converting DNA to RNA alphabet, and conversion to uppercase letters.

Parameters:

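alignment – The input sequence alignment (the array must be terminated by a NULL entry)
options – Option flags indicating how the aligned sequences should be converted, e.g. VRNA_ALN_RNA, VRNA_ALN_DNA, VRNA_ALN_UPPERCASE, VRNA_ALN_LOWERCASE, or VRNA_ALN_DEFAULT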

Returns:

A (manipulated) copy of the input alignment
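The kind of manipulation described above (alphabet conversion and case conversion on a copy) can be sketched in plain Python as follows. The function `copy_alignment` and its keyword arguments `to_rna` and `uppercase` are hypothetical stand-ins for the option flags; they are not part of the library interface.

```python
def copy_alignment(alignment, to_rna=False, uppercase=False):
    """Return a converted copy of the alignment; the input is left unchanged."""
    dna_to_rna = str.maketrans("Tt", "Uu")  # T -> U, preserving case
    result = []
    for seq in alignment:
        if to_rna:
            seq = seq.translate(dna_to_rna)
        if uppercase:
            seq = seq.upper()
        result.append(seq)
    return result

aln = ["acgt-T", "ACGT-t"]
print(copy_alignment(aln, to_rna=True))
print(copy_alignment(aln, to_rna=True, uppercase=True))
```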

float *vrna_aln_conservation_struct(const char **alignment, const char *structure, const vrna_md_t *md)

#include <ViennaRNA/sequences/alignments.h>

Compute base pair conservation of a consensus structure.

This function computes the base pair conservation (fraction of canonical base pairs) of a consensus structure given a multiple sequence alignment. The base pair types that are considered canonical may be specified using the vrna_md_t.pair array. Passing NULL as parameter md results in default pairing rules, i.e. canonical Watson-Crick and GU Wobble pairs.

SWIG Wrapper Notes: This function is available as overloaded function aln_conservation_struct() where the last parameter md may be omitted, indicating md = NULL. See e.g. RNA.aln_conservation_struct() in the Python API.

Parameters:

alignment – The input sequence alignment (the array must be terminated by a NULL entry)
structure – The consensus structure in dot-bracket notation
md – Model details that specify compatible base pairs (may be NULL)

Returns:

A 1-based vector of base pair conservations
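To make the notion of base pair conservation concrete, here is a library-independent Python sketch using the default pairing rules named above (Watson-Crick and GU wobble pairs). The names `pair_table` and `conservation_struct` are hypothetical. Assigning the same value to both partners of a pair and 0 to unpaired positions are assumptions of this sketch, as is the restriction to `(` and `)` brackets.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}

def pair_table(structure):
    """Map each opening position to its closing partner (1-based)."""
    stack, pairs = [], {}
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs[stack.pop()] = pos
    return pairs

def conservation_struct(alignment, structure):
    """Fraction of sequences forming a canonical pair at each paired position.

    The result is 1-based: index 0 is unused, unpaired positions stay 0.0.
    """
    cons = [0.0] * (len(structure) + 1)
    for i, j in pair_table(structure).items():
        hits = sum((s[i - 1] + s[j - 1]).upper() in CANONICAL for s in alignment)
        cons[i] = cons[j] = hits / len(alignment)
    return cons

aln = ["GCAAAGC", "GCAAAGU", "GAAAAGC"]
print(conservation_struct(aln, "((...))"))
```

In this example the outer pair is canonical in all three sequences, while the inner pair is canonical in only two of them.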

float *vrna_aln_conservation_col(const char **alignment, const vrna_md_t *md_p, unsigned int options)

#include <ViennaRNA/sequences/alignments.h>

Compute nucleotide conservation in an alignment.

This function computes the conservation of nucleotides in alignment columns. The simplest measure is Shannon Entropy, which can be selected by passing the VRNA_MEASURE_SHANNON_ENTROPY flag in the options parameter.

SWIG Wrapper Notes: This function is available as overloaded function aln_conservation_col() where the last two parameters may be omitted, indicating md = NULL, and options = VRNA_MEASURE_SHANNON_ENTROPY, respectively. See e.g. RNA.aln_conservation_col() in the Python API.

See also

VRNA_MEASURE_SHANNON_ENTROPY

Note

Currently, only VRNA_MEASURE_SHANNON_ENTROPY is supported as conservation measure.

Parameters:

alignment – The input sequence alignment (the array must be terminated by a NULL entry)
md_p – Model details that specify known nucleotides (may be NULL)
options – A flag indicating which measure of conservation should be applied

Returns:

A 1-based vector of column conservations

char *vrna_aln_consensus_sequence(const char **alignment, const vrna_md_t *md_p)

#include <ViennaRNA/sequences/alignments.h>

Compute the consensus sequence for a given multiple sequence alignment.

SWIG Wrapper Notes: This function is available as overloaded function aln_consensus_sequence() where the last parameter may be omitted, indicating md = NULL. See e.g. RNA.aln_consensus_sequence() in the Python API.

Parameters:

alignment – The input sequence alignment (the array must be terminated by a NULL entry)
md_p – Model details that specify known nucleotides (may be NULL)

Returns:

The consensus sequence of the alignment, i.e. the most frequent nucleotide for each alignment column
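The "most frequent nucleotide per column" rule can be sketched in a few lines of plain Python. The helper `consensus_sequence` is hypothetical and does not call the library. How ties are resolved and how gaps are handled is not specified above; in this sketch gaps count like any other symbol and ties go to the symbol seen first in the column.

```python
from collections import Counter

def consensus_sequence(alignment):
    """Most frequent symbol of every alignment column, joined into a string.

    Assumptions: gaps count as symbols; ties go to the first symbol seen.
    """
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*alignment)  # iterate over columns, not rows
    )

aln = ["GCGA", "GCGC", "GAGC"]
print(consensus_sequence(aln))
```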

char *vrna_aln_consensus_mis(const char **alignment, const vrna_md_t *md_p)

#include <ViennaRNA/sequences/alignments.h>

Compute the Most Informative Sequence (MIS) for a given multiple sequence alignment.

The most informative sequence (MIS) [Freyhult et al., 2005] displays for each alignment column the nucleotides with frequency greater than the background frequency, projected into IUPAC notation. Columns where gaps are over-represented are in lower case.

SWIG Wrapper Notes: This function is available as overloaded function aln_consensus_mis() where the last parameter may be omitted, indicating md = NULL. See e.g. RNA.aln_consensus_mis() in the Python API.

Parameters:

alignment – The input sequence alignment (the array must be terminated by a NULL entry)
md_p – Model details that specify known nucleotides (may be NULL)

Returns:

The most informative sequence for the alignment

struct vrna_pinfo_s

#include <ViennaRNA/sequences/alignments.h>

A base pair info structure.

For each base pair (i,j) with i,j in [0, n-1] the structure lists:

- its probability ‘p’
- an entropy-like measure for its well-definedness ‘ent’
- the frequency of each type of pair in ‘bp[]’
  - ‘bp[0]’ contains the number of non-compatible sequences
  - ‘bp[1]’ the number of CG pairs, etc.

Public Members

unsigned i: nucleotide position i

unsigned j: nucleotide position j

float p: Probability.

float ent: Pseudo entropy for \( p(i,j) = S_i + S_j - p_ij*ln(p_ij) \).

short bp[8]: Frequencies of pair_types.

char comp: 1 iff pair is in mfe structure