(Nucleic Acid Sequence) String Utilities

Functions to parse, convert, manipulate, create, and compare (nucleic acid sequence) strings.

Defines

XSTR(s)
#include <ViennaRNA/utils/strings.h>

Stringify a macro after expansion.

STR(s)
#include <ViennaRNA/utils/strings.h>

Stringify a macro argument.
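
The descriptions above only state what STR() and XSTR() do. Here is a minimal sketch, assuming they follow the common two-level C preprocessor idiom (their actual definitions are not shown here). The # operator stringifies its argument without expanding it, so an extra macro level is needed to expand the argument first. All SKETCH_* names are hypothetical.

```c
/* Hypothetical sketch of the usual two-level stringification idiom,
 * assumed (not confirmed) to match STR() and XSTR(). */
#define SKETCH_STR(s)   #s              /* stringify the argument as written */
#define SKETCH_XSTR(s)  SKETCH_STR(s)   /* expand the argument, then stringify */

/* A hypothetical macro used only for illustration */
#define SKETCH_LIMIT    80

/* SKETCH_STR(SKETCH_LIMIT)  yields "SKETCH_LIMIT" */
/* SKETCH_XSTR(SKETCH_LIMIT) yields "80"           */
static const char *unexpanded = SKETCH_STR(SKETCH_LIMIT);
static const char *expanded   = SKETCH_XSTR(SKETCH_LIMIT);
```

The two-level form is what makes it possible to print the value of a numeric limit macro, for instance in a usage or error message.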

FILENAME_MAX_LENGTH
#include <ViennaRNA/utils/strings.h>

Maximum length of filenames that are generated by our programs.

This definition should be used throughout the complete ViennaRNA package wherever a static array holding filenames of output files is declared.

FILENAME_ID_LENGTH
#include <ViennaRNA/utils/strings.h>

Maximum length of id taken from fasta header for filename generation.

This has to be smaller than FILENAME_MAX_LENGTH since, in most cases, a suffix is appended to the ID.
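
A minimal sketch of why the ID limit must leave room for a suffix. The SKETCH_* limits and make_output_filename() are hypothetical. Their numeric values are illustrative only and are not the values of FILENAME_MAX_LENGTH or FILENAME_ID_LENGTH. The "%.*s" conversion truncates the ID so that the appended suffix still fits into the fixed-size buffer.

```c
#include <stdio.h>

/* Hypothetical limits for illustration only; NOT the ViennaRNA values. */
#define SKETCH_MAX_LENGTH  16
#define SKETCH_ID_LENGTH   8

/* Build "<id><suffix>" in a fixed-size buffer, truncating the ID to
 * SKETCH_ID_LENGTH characters. Returns the number of characters written,
 * or -1 if the result would not fit (or on an encoding error). */
static int
make_output_filename(char dest[SKETCH_MAX_LENGTH], const char *id, const char *suffix)
{
  /* "%.*s" prints at most SKETCH_ID_LENGTH characters of id */
  int n = snprintf(dest, SKETCH_MAX_LENGTH, "%.*s%s",
                   SKETCH_ID_LENGTH, id, suffix);

  return (n < 0 || n >= SKETCH_MAX_LENGTH) ? -1 : n;
}
```

With these illustrative limits, an ID such as "sequence_with_long_name" combined with the suffix "_ss.ps" becomes "sequence_ss.ps".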

VRNA_TRIM_LEADING
#include <ViennaRNA/utils/strings.h>

Trim only characters leading the string.

See also

vrna_strtrim()

VRNA_TRIM_TRAILING
#include <ViennaRNA/utils/strings.h>

Trim only characters trailing the string.

See also

vrna_strtrim()

VRNA_TRIM_IN_BETWEEN
#include <ViennaRNA/utils/strings.h>

Trim only characters within the string.

See also

vrna_strtrim()

VRNA_TRIM_SUBST_BY_FIRST
#include <ViennaRNA/utils/strings.h>

Replace remaining characters after trimming with the first delimiter in list.

See also

vrna_strtrim()

VRNA_TRIM_DEFAULT
#include <ViennaRNA/utils/strings.h>

Default settings for trimming, i.e. trim leading and trailing.

See also

vrna_strtrim()

VRNA_TRIM_ALL
#include <ViennaRNA/utils/strings.h>

Trim characters anywhere in the string.

See also

vrna_strtrim()

Functions

char *vrna_strdup_printf(const char *format, ...)
#include <ViennaRNA/utils/strings.h>

Safely create a formatted string.

This function is a safe implementation for creating a formatted character array, similar to sprintf. Internally, it uses the asprintf function, if available, to dynamically allocate a character array large enough to store the supplied content. If asprintf is not available, the function mimics its behavior using vsnprintf.

Note

The returned pointer of this function should always be passed to free() to release the allocated memory.

Parameters:
  • format – The format string (See also asprintf)

  • ... – The list of variables used to fill the format string

Returns:

The formatted, null-terminated string, or NULL if something has gone wrong

char *vrna_strdup_vprintf(const char *format, va_list argp)
#include <ViennaRNA/utils/strings.h>

Safely create a formatted string.

This function is the va_list version of vrna_strdup_printf().

Note

The returned pointer of this function should always be passed to free() to release the allocated memory.

Parameters:
  • format – The format string (See also asprintf)

  • argp – The list of arguments to fill the format string

Returns:

The formatted, null-terminated string, or NULL if something has gone wrong
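
Below is a minimal sketch of the vsnprintf() fallback described for vrna_strdup_printf(), with the va_list variant doing the actual work. The sketch_* names are hypothetical and this is not the ViennaRNA source. The sketch measures the output using a copy of the argument list, allocates the buffer, and then formats into it.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical va_list variant: measure, allocate, then format. */
static char *
sketch_strdup_vprintf(const char *format, va_list argp)
{
  va_list copy;
  char    *buf;
  int     len;

  /* a va_list can only be traversed once, so measure with a copy */
  va_copy(copy, argp);
  len = vsnprintf(NULL, 0, format, copy);
  va_end(copy);

  if (len < 0)
    return NULL;

  buf = malloc((size_t)len + 1);
  if (!buf)
    return NULL;

  if (vsnprintf(buf, (size_t)len + 1, format, argp) < 0) {
    free(buf);
    return NULL;
  }

  return buf;   /* the caller must free() this */
}

/* Hypothetical variadic wrapper around the va_list variant */
static char *
sketch_strdup_printf(const char *format, ...)
{
  va_list argp;
  char    *s;

  va_start(argp, format);
  s = sketch_strdup_vprintf(format, argp);
  va_end(argp);

  return s;
}
```

For example, sketch_strdup_printf("%s:%d", "len", 42) returns a newly allocated "len:42".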

int vrna_strcat_printf(char **dest, const char *format, ...)
#include <ViennaRNA/utils/strings.h>

Safely append a formatted string to another string.

This function is a safe implementation for appending a formatted character array, similar to a combination of strcat and sprintf. The function automatically allocates enough memory to store both the previous content stored at dest and the appended format string. If the dest pointer is NULL, the function allocates memory only for the format string. The function returns the number of characters in the resulting string, or -1 in case of an error.

Parameters:
  • dest – The address of a char * pointer where the formatted string is to be appended

  • format – The format string (See also sprintf)

  • ... – The list of variables used to fill the format string

Returns:

The number of characters in the final string, or -1 on error

int vrna_strcat_vprintf(char **dest, const char *format, va_list args)
#include <ViennaRNA/utils/strings.h>

Safely append a formatted string to another string.

This function is the va_list version of vrna_strcat_printf().

Parameters:
  • dest – The address of a char * pointer where the formatted string is to be appended

  • format – The format string (See also sprintf)

  • args – The list of arguments used to fill the format string

Returns:

The number of characters in the final string, or -1 on error
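
A minimal sketch of the append behavior, assuming *dest is either NULL or a heap-allocated, '\0'-terminated string. The name sketch_strcat_printf() is hypothetical. The buffer is grown with realloc(), which behaves like malloc() when *dest is NULL.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: append formatted text to *dest, growing it as needed.
 * Returns the length of the resulting string, or -1 on error. */
static int
sketch_strcat_printf(char **dest, const char *format, ...)
{
  va_list args, copy;
  size_t  old_len;
  int     add_len;
  char    *buf;

  if (!dest || !format)
    return -1;

  old_len = (*dest) ? strlen(*dest) : 0;

  va_start(args, format);

  /* measure the formatted text with a copy of the argument list */
  va_copy(copy, args);
  add_len = vsnprintf(NULL, 0, format, copy);
  va_end(copy);

  if (add_len < 0) {
    va_end(args);
    return -1;
  }

  buf = realloc(*dest, old_len + (size_t)add_len + 1);
  if (!buf) {
    va_end(args);
    return -1;   /* *dest is left untouched */
  }

  /* write the new text directly after the previous content */
  vsnprintf(buf + old_len, (size_t)add_len + 1, format, args);
  va_end(args);

  *dest = buf;
  return (int)(old_len + (size_t)add_len);
}
```

Starting from a NULL pointer, two calls with "%s" and "GGGG", then "&%s" and "CCCC", leave "GGGG&CCCC" (9 characters) at *dest.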

unsigned int vrna_strtrim(char *string, const char *delimiters, unsigned int keep, unsigned int options)
#include <ViennaRNA/utils/strings.h>

Trim a string by removing (multiple) occurrences of a particular character.

This function removes (multiple) consecutive occurrences of a set of characters (delimiters) within an input string. It may be used to remove leading and/or trailing whitespace or to restrict the maximum number of consecutive occurrences of the characters in delimiters. Setting keep=0 removes all occurrences, while other values reduce multiple consecutive occurrences to at most keep delimiters. This might be useful to reduce multiple whitespace characters to a single one, or to remove empty fields within a comma-separated value string.

The parameter delimiters may be a pointer to a 0-terminated char string containing any set of ASCII characters. If NULL or an empty char string is passed as the delimiter set, all whitespace characters are trimmed. The options parameter is a bit vector that specifies which part of the string should undergo trimming. The implementation distinguishes the leading (VRNA_TRIM_LEADING), trailing (VRNA_TRIM_TRAILING), and in-between (VRNA_TRIM_IN_BETWEEN) part with respect to the delimiter set. Combinations of these parts can be specified by joining the flags with the bitwise OR operator (|).

The following example code removes all leading and trailing whitespace characters from the input string:

char          string[20]  = "  \t blablabla   ";
unsigned int  r           = vrna_strtrim(&(string[0]),
                                         NULL,
                                         0,
                                         VRNA_TRIM_DEFAULT);

SWIG Wrapper Notes:

Since many scripting languages treat strings as immutable objects, this function does not modify the input string directly. Instead, it returns the modified string as second return value, together with the number of removed delimiters.

The scripting language interface provides an overloaded version of this function, with default parameters delimiters=NULL, keep=0, and options=VRNA_TRIM_DEFAULT. See, e.g. RNA.strtrim() in the Python API.

Note

The delimiter always consists of a single character from the set of characters provided. In case of alternative delimiters and a non-zero keep parameter, the first keep delimiters are preserved within the string. Use VRNA_TRIM_SUBST_BY_FIRST to substitute all remaining delimiting characters with the first one from the delimiters list.

Parameters:
  • string – The ‘\0’-terminated input string to trim

  • delimiters – The delimiter characters as 0-terminated char array (or NULL)

  • keep – The maximum number of consecutive occurrences of the delimiter in the output string

  • options – The option bit vector specifying the mode of operation

Returns:

The number of delimiters removed from the string
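
A simplified sketch covering only the default mode used in the example above (keep=0, leading and trailing trimming). sketch_trim_ends() is hypothetical. It does not implement keep > 0, VRNA_TRIM_IN_BETWEEN, or VRNA_TRIM_SUBST_BY_FIRST. As described above, NULL or an empty delimiter set falls back to whitespace. Here, whitespace is assumed to be the characters that isspace() accepts in the "C" locale.

```c
#include <string.h>

/* Hypothetical, simplified sketch: remove leading and trailing delimiter
 * characters in place and return how many characters were removed. */
static unsigned int
sketch_trim_ends(char *string, const char *delimiters)
{
  const char  *set = (delimiters && *delimiters) ? delimiters : " \t\n\v\f\r";
  size_t      len, start = 0, end;

  if (!string)
    return 0;

  len = strlen(string);
  end = len;

  /* skip leading delimiters */
  while (start < end && strchr(set, string[start]))
    start++;

  /* skip trailing delimiters */
  while (end > start && strchr(set, string[end - 1]))
    end--;

  /* shift the kept part to the front and terminate it */
  memmove(string, string + start, end - start);
  string[end - start] = '\0';

  return (unsigned int)(len - (end - start));
}
```

Applied to the example string "  \t blablabla   ", this sketch removes 7 characters and leaves "blablabla".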

char **vrna_strsplit(const char *string, const char *delimiter)
#include <ViennaRNA/utils/strings.h>

Split a string into tokens using a delimiting character.

This function splits a string into an array of strings using a single character that delimits the elements within the string. The default delimiter is the ampersand '&' and will be used when NULL is passed as a second argument. The returned list is NULL terminated, i.e. the last element is NULL. If the delimiter is not found, the returned list contains exactly one element: the input string.

For instance, the following code:

char **tok = vrna_strsplit("GGGG&CCCC&AAAAA", NULL);

for (char **ptr = tok; *ptr; ptr++) {
  printf("%s\n", *ptr);
  free(*ptr);
}
free(tok);

produces this output:

GGGG
CCCC
AAAAA

and properly frees the memory occupied by the returned element array.

See also

vrna_strtrim()

Note

This function internally uses strtok_r() and is therefore considered to be thread-safe. Also note that it is the user's responsibility to free the memory of the array and that of the individual element strings!

In case the input string contains consecutive delimiters, or starts or ends with one or more delimiters, empty strings are produced in the output list, indicating the empty fields of data resulting from the split. Use vrna_strtrim() prior to calling this function to remove any leading, trailing, or in-between empty fields.

Parameters:
  • string – The input string that should be split into elements

  • delimiter – The delimiting character. If NULL, the delimiter is "&"

Returns:

A NULL terminated list of the elements in the string
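
A minimal sketch of the documented contract: split the string at a single delimiter character ('&' by default) into a NULL-terminated array of separately allocated strings. sketch_strsplit() is hypothetical and does not use strtok_r(). It emits an empty string for every empty field, in line with the note above. The returned array is freed exactly as in the earlier example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: split at one delimiter character into a
 * NULL-terminated array of separately allocated strings. Empty fields
 * (consecutive, leading, or trailing delimiters) become empty strings. */
static char **
sketch_strsplit(const char *string, const char *delimiter)
{
  char        delim = (delimiter && *delimiter) ? *delimiter : '&';
  size_t      n = 1, i = 0, len;
  const char  *p, *start;
  char        **tok;

  if (!string)
    return NULL;

  /* k delimiters give k + 1 fields */
  for (p = string; *p; p++)
    if (*p == delim)
      n++;

  tok = malloc(sizeof(char *) * (n + 1));   /* + 1 for the NULL terminator */
  if (!tok)
    return NULL;

  for (start = p = string; ; p++) {
    if (*p != delim && *p != '\0')
      continue;

    /* copy the field [start, p) */
    len     = (size_t)(p - start);
    tok[i]  = malloc(len + 1);
    if (!tok[i]) {
      while (i > 0)
        free(tok[--i]);
      free(tok);
      return NULL;
    }

    memcpy(tok[i], start, len);
    tok[i][len] = '\0';
    i++;

    if (*p == '\0')
      break;

    start = p + 1;
  }

  tok[i] = NULL;
  return tok;
}
```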

char *vrna_strjoin(const char **strings, const char *delimiter)
#include <ViennaRNA/utils/strings.h>

char *vrna_random_string(int l, const char symbols[])
#include <ViennaRNA/utils/strings.h>

Create a random string using characters from a specified symbol set.

Parameters:
  • l – The length of the sequence

  • symbols – The symbol set

Returns:

A random string of length ‘l’ containing characters from the symbol set
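
A minimal sketch, assuming each character is drawn independently from the symbol set. sketch_random_string() is hypothetical and uses rand() for brevity. The modulo step introduces a slight bias, so a better generator may be preferable where statistical quality matters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: draw each position approximately uniformly from
 * the symbol set. Seed the generator with srand() beforehand. */
static char *
sketch_random_string(int l, const char symbols[])
{
  size_t  base;
  char    *r;

  if (l < 0 || !symbols || (base = strlen(symbols)) == 0)
    return NULL;

  r = malloc((size_t)l + 1);
  if (!r)
    return NULL;

  for (int i = 0; i < l; i++)
    r[i] = symbols[(size_t)rand() % base];   /* slight modulo bias */

  r[l] = '\0';
  return r;   /* the caller must free() this */
}
```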

int vrna_hamming_distance(const char *s1, const char *s2)
#include <ViennaRNA/utils/strings.h>

Calculate the Hamming distance between two sequences.

Parameters:
  • s1 – The first sequence

  • s2 – The second sequence

Returns:

The Hamming distance between s1 and s2

int vrna_hamming_distance_bound(const char *s1, const char *s2, int n)
#include <ViennaRNA/utils/strings.h>

Calculate the Hamming distance between two sequences up to a specified length.

This function is similar to vrna_hamming_distance() but, instead of comparing both sequences up to their actual length, only the first ‘n’ characters are taken into account.

Parameters:
  • s1 – The first sequence

  • s2 – The second sequence

  • n – The length of the subsequences to consider (starting from the 5’ end)

Returns:

The Hamming distance between s1 and s2
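
A minimal sketch of both distance functions. The names are hypothetical. The handling of strings of unequal length is an assumption: positions are compared only while both strings still have characters left.

```c
/* Hypothetical sketch: count mismatching positions among the first n
 * characters (5' end first), stopping early at the end of either string. */
static int
sketch_hamming_distance_bound(const char *s1, const char *s2, int n)
{
  int d = 0;

  for (int i = 0; i < n && s1[i] && s2[i]; i++)
    if (s1[i] != s2[i])
      d++;

  return d;
}

/* Hypothetical sketch: count mismatching positions over the common length */
static int
sketch_hamming_distance(const char *s1, const char *s2)
{
  int d = 0;

  for (; *s1 && *s2; s1++, s2++)
    if (*s1 != *s2)
      d++;

  return d;
}
```

For "GGGACCC" and "GGUACCA", the full distance is 2, while restricting the comparison to the first 3 characters gives 1.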

void vrna_seq_toRNA(char *sequence)
#include <ViennaRNA/utils/strings.h>

Convert an input sequence (possibly containing DNA alphabet characters) to RNA alphabet.

This function substitutes T and t with U and u, respectively.

Parameters:
  • sequence – The sequence to be converted

void vrna_seq_toupper(char *sequence)
#include <ViennaRNA/utils/strings.h>

Convert an input sequence to uppercase.

Parameters:
  • sequence – The sequence to be converted
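
Minimal sketches of the two in-place conversions described above. The names are hypothetical. toupper() is called through an unsigned char cast, as the C standard requires for plain char arguments.

```c
#include <ctype.h>

/* Hypothetical sketch: replace T/t by U/u in place */
static void
sketch_seq_toRNA(char *sequence)
{
  for (; sequence && *sequence; sequence++) {
    if (*sequence == 'T')
      *sequence = 'U';
    else if (*sequence == 't')
      *sequence = 'u';
  }
}

/* Hypothetical sketch: convert every character to uppercase in place */
static void
sketch_seq_toupper(char *sequence)
{
  for (; sequence && *sequence; sequence++)
    *sequence = (char)toupper((unsigned char)*sequence);
}
```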

void vrna_seq_reverse(char *sequence)
#include <ViennaRNA/utils/strings.h>

Reverse a string in-place.

This function reverses a character string in the form of an array of characters in-place, i.e. it changes the input parameter.

Parameters:
  • sequence – The string to reverse

Post:

After execution, the input sequence contains the reverse of the string it held prior to the call.
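
A minimal sketch of in-place reversal by swapping characters from both ends towards the middle. sketch_seq_reverse() is hypothetical.

```c
#include <string.h>

/* Hypothetical sketch: reverse a '\0'-terminated string in place */
static void
sketch_seq_reverse(char *sequence)
{
  size_t i, j;

  /* the empty-string check avoids underflow of strlen() - 1 */
  if (!sequence || !*sequence)
    return;

  for (i = 0, j = strlen(sequence) - 1; i < j; i++, j--) {
    char tmp = sequence[i];
    sequence[i] = sequence[j];
    sequence[j] = tmp;
  }
}
```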

char *vrna_DNA_complement(const char *sequence)
#include <ViennaRNA/utils/strings.h>

Retrieve a DNA sequence which resembles the complement of the input sequence.

This function returns a new DNA string which is the complement of the input, i.e. the nucleotide letters A, C, G, and T are substituted by their complements T, G, C, and A, respectively.

Any characters not belonging to the alphabet of the 4 canonical bases of DNA are not altered.

Note

This function also handles lower-case input sequences and treats U of the RNA alphabet like T.

Parameters:
  • sequence – the input DNA sequence

Returns:

The complement of the input DNA sequence
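
A minimal sketch of the complement mapping described above, including lower-case input and U treated like T. sketch_DNA_complement() is hypothetical. Characters outside the mapping are copied unchanged, and the result is a newly allocated string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: A<->T, C<->G (U treated like T), case preserved,
 * all other characters copied unchanged. */
static char *
sketch_DNA_complement(const char *sequence)
{
  size_t  n;
  char    *c;

  if (!sequence)
    return NULL;

  n = strlen(sequence);
  c = malloc(n + 1);
  if (!c)
    return NULL;

  for (size_t i = 0; i <= n; i++) {   /* i == n copies the terminator */
    switch (sequence[i]) {
      case 'A': c[i] = 'T'; break;
      case 'a': c[i] = 't'; break;
      case 'C': c[i] = 'G'; break;
      case 'c': c[i] = 'g'; break;
      case 'G': c[i] = 'C'; break;
      case 'g': c[i] = 'c'; break;
      case 'T': case 'U': c[i] = 'A'; break;
      case 't': case 'u': c[i] = 'a'; break;
      default:  c[i] = sequence[i]; break;
    }
  }

  return c;   /* the caller must free() this */
}
```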

char *vrna_seq_ungapped(const char *sequence)
#include <ViennaRNA/utils/strings.h>

Remove gap characters from a nucleotide sequence.

Parameters:
  • sequence – The original, null-terminated nucleotide sequence

Returns:

A copy of the input sequence with all gap characters removed
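
The description above does not list which characters count as gaps. Here is a minimal sketch, assuming '-', '.', '_' and '~' are treated as gap characters. sketch_seq_ungapped() is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: copy the sequence, skipping the assumed gap
 * characters '-', '.', '_' and '~'. Returns a newly allocated string. */
static char *
sketch_seq_ungapped(const char *sequence)
{
  char *out, *q;

  if (!sequence)
    return NULL;

  out = malloc(strlen(sequence) + 1);
  if (!out)
    return NULL;

  for (q = out; *sequence; sequence++)
    if (!strchr("-._~", *sequence))   /* keep only non-gap characters */
      *q++ = *sequence;

  *q = '\0';
  return out;
}
```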

char *vrna_cut_point_insert(const char *string, int cp)
#include <ViennaRNA/utils/strings.h>

Add a separating ‘&’ character into a string according to cut-point position.

If the cut-point position is less than or equal to zero, this function just returns a copy of the provided string. Otherwise, the cut-point character is inserted at the corresponding position.

Parameters:
  • string – The original string

  • cp – The cut-point position

Returns:

A copy of the provided string including the cut-point character

char *vrna_cut_point_remove(const char *string, int *cp)
#include <ViennaRNA/utils/strings.h>

Remove a separating ‘&’ character from a string.

This function removes the cut-point indicating ‘&’ character from a string and memorizes its position in a provided integer variable. If no ‘&’ is found in the input, the integer variable is set to -1. The function returns a copy of the input string with the ‘&’ being sliced out.

Parameters:
  • string – The original string

  • cp – Pointer to an integer variable that receives the cut-point position (-1 if no ‘&’ is found)

Returns:

A copy of the input string with the ‘&’ being sliced out
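
A minimal round-trip sketch of both cut-point functions. The names are hypothetical, and the position convention is an assumption: cp is taken as the 1-based position of the first character after the '&', so "GGGG&CCCC" corresponds to "GGGGCCCC" with cp = 5. Positions beyond the end of the string are treated like cp <= 0.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch. Assumed convention: cp is the 1-based position of
 * the first character after '&'. Returns a newly allocated string. */
static char *
sketch_cut_point_insert(const char *string, int cp)
{
  size_t  n, k;
  char    *s;

  if (!string)
    return NULL;

  n = strlen(string);

  if (cp <= 0 || (size_t)cp > n + 1) {
    /* no (valid) cut point: return a plain copy */
    s = malloc(n + 1);
    if (s)
      memcpy(s, string, n + 1);
    return s;
  }

  k = (size_t)cp - 1;                         /* characters before '&' */
  s = malloc(n + 2);
  if (!s)
    return NULL;

  memcpy(s, string, k);
  s[k] = '&';
  memcpy(s + k + 1, string + k, n - k + 1);   /* remainder incl. '\0' */
  return s;
}

/* Hypothetical sketch: remove the first '&' and report its position in *cp */
static char *
sketch_cut_point_remove(const char *string, int *cp)
{
  const char  *amp;
  size_t      n, k;
  char        *s;

  if (!string)
    return NULL;

  n   = strlen(string);
  amp = strchr(string, '&');
  s   = malloc(n + 1);
  if (!s)
    return NULL;

  if (!amp) {
    *cp = -1;                                 /* no '&' in the input */
    memcpy(s, string, n + 1);
  } else {
    k   = (size_t)(amp - string);             /* 0-based index of '&' */
    *cp = (int)k + 1;
    memcpy(s, string, k);
    memcpy(s + k, amp + 1, n - k);            /* remainder incl. '\0' */
  }

  return s;
}
```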

size_t *vrna_strchr(const char *string, int c, size_t n)
#include <ViennaRNA/utils/strings.h>

Find (all) occurrences of a character within a string.

Parameters:
  • string – The C string to be scanned

  • c – The character to be searched for

  • n – The maximum number of occurrences to search for (or 0 for all occurrences)

Returns:

A 1-based array of (0-based) positions, or NULL on error. Element 0 holds the number of occurrences found.
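
A minimal sketch of the documented return layout, in which element 0 holds the number of hits and elements 1 to count hold the 0-based positions. sketch_strchr() is hypothetical. The position list grows with realloc() as needed, and the caller frees it.

```c
#include <stdlib.h>

/* Hypothetical sketch: collect up to n positions of c in string
 * (n = 0 means all). Layout: pos[0] = count, pos[1..count] = positions. */
static size_t *
sketch_strchr(const char *string, int c, size_t n)
{
  size_t  count = 0, cap = 8, *pos, *tmp;

  if (!string)
    return NULL;

  pos = malloc(sizeof(size_t) * (cap + 1));
  if (!pos)
    return NULL;

  for (size_t i = 0; string[i]; i++) {
    if (string[i] != (char)c)
      continue;

    if (count == cap) {                 /* grow the position list */
      cap *= 2;
      tmp = realloc(pos, sizeof(size_t) * (cap + 1));
      if (!tmp) {
        free(pos);
        return NULL;
      }
      pos = tmp;
    }

    pos[++count] = i;                   /* 1-based slot, 0-based position */

    if (n > 0 && count == n)            /* stop after n hits */
      break;
  }

  pos[0] = count;
  return pos;
}
```

For "A&B&C&D" and '&', this sketch returns {3, 1, 3, 5}. With n = 2, it returns {2, 1, 3}.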