Multiple Sequence Alignments

Defines

VRNA_FILE_FORMAT_MSA_CLUSTAL
#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating ClustalW formatted files.

VRNA_FILE_FORMAT_MSA_STOCKHOLM
#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating Stockholm 1.0 formatted files.

VRNA_FILE_FORMAT_MSA_FASTA
#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating FASTA (Pearson) formatted files.

VRNA_FILE_FORMAT_MSA_MAF
#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating MAF formatted files.

VRNA_FILE_FORMAT_MSA_MIS
#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating most informative sequence (MIS) output.

The default reference sequence output for an alignment is simply a consensus sequence. This flag writes the most informative sequence (MIS) instead.

VRNA_FILE_FORMAT_MSA_DEFAULT
#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating the set of default file formats.

VRNA_FILE_FORMAT_MSA_NOCHECK
#include <ViennaRNA/io/file_formats_msa.h>

Option flag to disable validation of the alignment.

VRNA_FILE_FORMAT_MSA_UNKNOWN
#include <ViennaRNA/io/file_formats_msa.h>

Return flag of vrna_file_msa_detect_format() to indicate unknown or malformatted alignment.

VRNA_FILE_FORMAT_MSA_APPEND
#include <ViennaRNA/io/file_formats_msa.h>

Option flag indicating to append data to a multiple sequence alignment file rather than overwriting it.

VRNA_FILE_FORMAT_MSA_QUIET
#include <ViennaRNA/io/file_formats_msa.h>

Option flag to suppress unnecessary messages on stderr.

VRNA_FILE_FORMAT_MSA_SILENT
#include <ViennaRNA/io/file_formats_msa.h>

Option flag to completely silence any warnings on stderr.
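The option flags above are bit flags: they are combined with bitwise OR and tested with bitwise AND. The following is a minimal sketch of that pattern only. The EX_* names, their numeric values, and both helper functions are hypothetical stand-ins invented for this illustration; the real constants are defined in ViennaRNA/io/file_formats_msa.h and may have different values.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration only. The real constants live in
 * <ViennaRNA/io/file_formats_msa.h> and may use different values. */
#define EX_FMT_CLUSTAL    (1U << 0)
#define EX_FMT_STOCKHOLM  (1U << 1)
#define EX_FMT_FASTA      (1U << 2)
#define EX_FMT_MAF        (1U << 3)
#define EX_FMT_NOCHECK    (1U << 4)
#define EX_FMT_DEFAULT    (EX_FMT_CLUSTAL | EX_FMT_STOCKHOLM | \
                           EX_FMT_FASTA | EX_FMT_MAF)

/* Return 1 if every bit of `flag` is set in `options`, 0 otherwise. */
int
ex_has_flag(unsigned int options, unsigned int flag)
{
  assert(flag != 0); /* testing for "no flag" is meaningless */
  return (options & flag) == flag;
}

/* Mirror the documented convention that passing 0 selects the default set. */
unsigned int
ex_resolve_options(unsigned int options)
{
  return options == 0 ? EX_FMT_DEFAULT : options;
}
```

With these stand-ins, `EX_FMT_STOCKHOLM | EX_FMT_NOCHECK` requests a single format and disables validation, while `ex_resolve_options(0)` yields the full default set.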

Functions

int vrna_file_msa_read(const char *filename, char ***names, char ***aln, char **id, char **structure, unsigned int options)
#include <ViennaRNA/io/file_formats_msa.h>

Read a multiple sequence alignment from file.

This function reads the (first) multiple sequence alignment from an input file. The alignment is split into the sequence id/name part and the actual sequence information, and both are stored in memory as arrays of ids/names and sequences. If the alignment file format allows for additional information, such as an ID of the entire alignment or consensus structure information, this data is retrieved as well and made available. The options parameter specifies the set of alignment file formats that should be tried when retrieving the data. If 0 is passed as options, the list of alignment file formats defaults to VRNA_FILE_FORMAT_MSA_DEFAULT.

Currently, the list of parsable multiple sequence alignment file formats consists of:

  • ClustalW format

  • Stockholm 1.0 format

  • FASTA (Pearson) format

  • MAF format

SWIG Wrapper Notes:

In the target scripting language, only the first and last argument, filename and options, are passed to the corresponding function. The other arguments, which serve as output in the C-library, are available as additional return values. This function exists as an overloaded version where the options parameter may be omitted! In that case, the options parameter defaults to VRNA_FILE_FORMAT_MSA_STOCKHOLM. See, e.g. RNA.file_msa_read() in the Python API and Parsing Alignments in the Python examples.

Note

After successfully reading an alignment, this function validates the data, checking that the sequence identifiers are unique and that all sequences have equal length. This check can be deactivated by passing VRNA_FILE_FORMAT_MSA_NOCHECK in the options parameter.

It is the user's responsibility to free any memory occupied by the output arguments names, aln, id, and structure after calling this function. The function automatically sets the latter two arguments to NULL in case no corresponding data could be retrieved from the input alignment.

Parameters:
  • filename – The name of input file that contains the alignment

  • names – An address to the pointer where sequence identifiers should be written to

  • aln – An address to the pointer where aligned sequences should be written to

  • id – An address to the pointer where the alignment ID should be written to (may be NULL)

  • structure – An address to the pointer where consensus structure information should be written to (may be NULL)

  • options – Options to manipulate the behavior of this function

Returns:

The number of sequences in the alignment, or -1 if no alignment record could be found
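The validation described in the note above (unique sequence identifiers, equal sequence lengths) can be pictured with the following sketch. It is an independent illustration, not the library's implementation; the function name ex_msa_is_consistent and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative check only, not ViennaRNA's implementation.
 * Returns 1 if all n identifiers are distinct and all n aligned
 * sequences have the same length, 0 otherwise.
 */
int
ex_msa_is_consistent(const char **names, const char **aln, int n)
{
  assert(n <= 0 || (names != NULL && aln != NULL));

  if (n <= 0)
    return 1; /* an empty alignment has nothing to contradict */

  size_t width = strlen(aln[0]);

  for (int i = 0; i < n; i++) {
    /* every row must be as long as the first one */
    if (strlen(aln[i]) != width)
      return 0;

    /* quadratic duplicate search over identifiers; fine for a sketch */
    for (int j = i + 1; j < n; j++) {
      if (strcmp(names[i], names[j]) == 0)
        return 0;
    }
  }

  return 1;
}
```

A caller that passes VRNA_FILE_FORMAT_MSA_NOCHECK and later wants a comparable sanity check could run something along these lines on the returned arrays, casting them to const char ** as needed.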

int vrna_file_msa_read_record(FILE *fp, char ***names, char ***aln, char **id, char **structure, unsigned int options)
#include <ViennaRNA/io/file_formats_msa.h>

Read a multiple sequence alignment from a file handle.

Similar to vrna_file_msa_read(), this function reads a multiple sequence alignment from an input file handle. Because it operates on a file handle, it is not limited to the first alignment record and can be used to loop over all alignments within the input.

The alignment is split into the sequence id/name part and the actual sequence information, and both are stored in memory as arrays of ids/names and sequences. If the alignment file format allows for additional information, such as an ID of the entire alignment or consensus structure information, this data is retrieved as well and made available. The options parameter specifies the alignment file format used to retrieve the data. A single format must be specified here; see vrna_file_msa_detect_format() for help in determining the correct MSA file format.

Currently, the list of parsable multiple sequence alignment file formats consists of:

  • ClustalW format

  • Stockholm 1.0 format

  • FASTA (Pearson) format

  • MAF format

SWIG Wrapper Notes:

In the target scripting language, only the first and last argument, fp and options, are passed to the corresponding function. The other arguments, which serve as output in the C-library, are available as additional return values. This function exists as an overloaded version where the options parameter may be omitted! In that case, the options parameter defaults to VRNA_FILE_FORMAT_MSA_STOCKHOLM. See, e.g. RNA.file_msa_read_record() in the Python API and Parsing Alignments in the Python examples.

Note

After successfully reading an alignment, this function validates the data, checking that the sequence identifiers are unique and that all sequences have equal length. This check can be deactivated by passing VRNA_FILE_FORMAT_MSA_NOCHECK in the options parameter.

It is the user's responsibility to free any memory occupied by the output arguments names, aln, id, and structure after calling this function. The function automatically sets the latter two arguments to NULL in case no corresponding data could be retrieved from the input alignment.

Parameters:
  • fp – The file pointer the data will be retrieved from

  • names – An address to the pointer where sequence identifiers should be written to

  • aln – An address to the pointer where aligned sequences should be written to

  • id – An address to the pointer where the alignment ID should be written to (may be NULL)

  • structure – An address to the pointer where consensus structure information should be written to (may be NULL)

  • options – Options to manipulate the behavior of this function

Returns:

The number of sequences in the alignment, or -1 if no alignment record could be found
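Since the caller owns names, aln, id, and structure after each successful read, a loop over several records needs to release them between iterations. The helper below is a sketch of one way to do that. It is hypothetical and not part of ViennaRNA, and it assumes that each of the n array entries, the two arrays themselves, and the optional strings were allocated with the malloc() family, so that free() is the matching deallocator.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of ViennaRNA: release everything a
 * successful read call handed back and reset the caller's pointers.
 * n is the sequence count returned by the read call.
 */
void
ex_msa_release(char ***names, char ***aln, char **id, char **structure, int n)
{
  assert(names != NULL && aln != NULL && id != NULL && structure != NULL);
  assert(n <= 0 || (*names != NULL && *aln != NULL));

  for (int i = 0; i < n; i++) {
    free((*names)[i]);
    free((*aln)[i]);
  }

  free(*names);
  free(*aln);
  free(*id);        /* free(NULL) is a no-op, so unset optional outputs are safe */
  free(*structure);

  /* reset so the same variables can be reused for the next record */
  *names     = NULL;
  *aln       = NULL;
  *id        = NULL;
  *structure = NULL;
}
```

Resetting the pointers to NULL guards against an accidental double free when the same variables are passed to the next read call.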

unsigned int vrna_file_msa_detect_format(const char *filename, unsigned int options)
#include <ViennaRNA/io/file_formats_msa.h>

Detect the format of a multiple sequence alignment file.

This function attempts to determine the format of a file that supposedly contains a multiple sequence alignment (MSA). This is useful in cases where an MSA file contains more than a single record and therefore vrna_file_msa_read() cannot be applied, since it only retrieves the first record. Here, one can try to guess the correct file format using this function and then loop over the file, record by record, using one of the low-level record retrieval functions for the corresponding MSA file format.

SWIG Wrapper Notes:

This function exists as an overloaded version where the options parameter may be omitted! In that case, the options parameter defaults to VRNA_FILE_FORMAT_MSA_DEFAULT. See, e.g. RNA.file_msa_detect_format() in the Python API.

See also

vrna_file_msa_read(), vrna_file_stockholm_read_record(), vrna_file_clustal_read_record(), vrna_file_fasta_read_record()

Note

This function parses the entire first record within the specified file. As a result, it returns VRNA_FILE_FORMAT_MSA_UNKNOWN not only if it can’t detect the file’s format, but also in cases where the file doesn’t contain sequences!

Parameters:
  • filename – The name of input file that contains the alignment

  • options – Options to manipulate the behavior of this function

Returns:

The MSA file format, or VRNA_FILE_FORMAT_MSA_UNKNOWN

int vrna_file_msa_write(const char *filename, const char **names, const char **aln, const char *id, const char *structure, const char *source, unsigned int options)
#include <ViennaRNA/io/file_formats_msa.h>

Write a multiple sequence alignment to file.

SWIG Wrapper Notes:

In the target scripting language, this function exists as a set of overloaded versions, where the last four parameters may be omitted. If the options parameter is missing, the options default to (VRNA_FILE_FORMAT_MSA_STOCKHOLM | VRNA_FILE_FORMAT_MSA_APPEND). See, e.g. RNA.file_msa_write() in the Python API.

Note

Currently, only Stockholm 1.0 output is supported.

Parameters:
  • filename – The output filename

  • names – The array of sequence names / identifiers

  • aln – The array of aligned sequences

  • id – An optional ID for the alignment

  • structure – An optional consensus structure

  • source – A string describing the source of the alignment

  • options – Options to manipulate the behavior of this function

Returns:

Non-zero upon successfully writing the alignment to file
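To give an idea of what a Stockholm 1.0 record contains, the sketch below writes a minimal one to an already opened stream: the format header, an optional alignment ID, one line per sequence, an optional consensus structure, and the record terminator. It is an independent illustration of the file layout and not the library's writer; the function name ex_write_stockholm and its signature are hypothetical, and the source annotation that vrna_file_msa_write() accepts is left out here.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Illustrative writer, not ViennaRNA's: emit one minimal Stockholm 1.0
 * record to fp. Returns 1 on success, 0 if any write fails.
 */
int
ex_write_stockholm(FILE *fp, const char **names, const char **aln, int n,
                   const char *id, const char *structure)
{
  assert(fp != NULL);
  assert(n <= 0 || (names != NULL && aln != NULL));

  /* mandatory format header */
  if (fputs("# STOCKHOLM 1.0\n", fp) == EOF)
    return 0;

  /* optional per-file annotation carrying the alignment ID */
  if (id && fprintf(fp, "#=GF ID %s\n", id) < 0)
    return 0;

  /* one "name sequence" line per aligned sequence */
  for (int i = 0; i < n; i++) {
    if (fprintf(fp, "%s %s\n", names[i], aln[i]) < 0)
      return 0;
  }

  /* optional per-column annotation carrying the consensus structure */
  if (structure && fprintf(fp, "#=GC SS_cons %s\n", structure) < 0)
    return 0;

  /* record terminator */
  return fputs("//\n", fp) != EOF;
}
```

Because each record ends with its own `//` line, several records can follow one another in a single file, which is what makes appending with VRNA_FILE_FORMAT_MSA_APPEND meaningful.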