(Nucleic Acid Sequence) String Utilities
Functions to parse, convert, manipulate, create, and compare (nucleic acid sequence) strings.
Defines
-
XSTR(s)
- #include <ViennaRNA/utils/strings.h>
Stringify a macro after expansion.
-
STR(s)
- #include <ViennaRNA/utils/strings.h>
Stringify a macro argument.
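The difference between the two macros above is the usual two-level stringification idiom of the C preprocessor. The standalone sketch below assumes the common definitions `#define STR(s) #s` and `#define XSTR(s) STR(s)`; `DEMO_LENGTH`, `literal_name`, and `literal_value` are hypothetical names used only for illustration.

```c
#include <string.h>

/* Assumed two-level idiom: STR() applies the # operator directly,
 * XSTR() forwards to STR() so that its argument is expanded first. */
#define STR(s)  #s
#define XSTR(s) STR(s)

/* Hypothetical macro, only for this demonstration */
#define DEMO_LENGTH 80

/* STR(DEMO_LENGTH) stringifies the macro name: "DEMO_LENGTH" */
const char *literal_name = STR(DEMO_LENGTH);

/* XSTR(DEMO_LENGTH) expands to STR(80) first, i.e. "80" */
const char *literal_value = XSTR(DEMO_LENGTH);
```

XSTR() is the variant to use when the value of a macro, rather than its name, has to end up in a string literal, for example as a field width in a format string.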
-
FILENAME_MAX_LENGTH
- #include <ViennaRNA/utils/strings.h>
Maximum length of filenames that are generated by our programs.
This definition should be used throughout the complete ViennaRNA package wherever a static array holding filenames of output files is declared.
-
FILENAME_ID_LENGTH
- #include <ViennaRNA/utils/strings.h>
Maximum length of the ID taken from a FASTA header for filename generation.
This has to be smaller than FILENAME_MAX_LENGTH since, in most cases, some suffix will be appended to the ID.
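A typical use of these two limits, sketched under assumptions: the hypothetical function `filename_from_header()` reads the ID from a FASTA header with a bounded `sscanf()` field width and then appends a suffix. `DEMO_FILENAME_MAX_LENGTH` and `DEMO_FILENAME_ID_LENGTH` (with made-up values 80 and 42) and the `_ss.ps` suffix are placeholders, not the values defined in the header.

```c
#include <stdio.h>
#include <string.h>

#define STR(s)  #s
#define XSTR(s) STR(s)

/* Hypothetical stand-ins for FILENAME_MAX_LENGTH / FILENAME_ID_LENGTH;
 * the real values are defined in <ViennaRNA/utils/strings.h>. */
#define DEMO_FILENAME_MAX_LENGTH 80
#define DEMO_FILENAME_ID_LENGTH  42

/* Build "<id>_ss.ps" from a header such as ">my_seq description".
 * The ID is capped at DEMO_FILENAME_ID_LENGTH characters so that the
 * suffix still fits into DEMO_FILENAME_MAX_LENGTH bytes.
 * Returns 1 on success, 0 if no ID could be read. */
int
filename_from_header(const char *header,
                     char       fname[DEMO_FILENAME_MAX_LENGTH])
{
  char id[DEMO_FILENAME_ID_LENGTH + 1];

  /* XSTR() turns the width into the literal 42, giving ">%42s" */
  if (sscanf(header, ">%" XSTR(DEMO_FILENAME_ID_LENGTH) "s", id) != 1)
    return 0;

  snprintf(fname, DEMO_FILENAME_MAX_LENGTH, "%s_ss.ps", id);
  return 1;
}
```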
-
VRNA_TRIM_LEADING
- #include <ViennaRNA/utils/strings.h>
Trim only characters leading the string.
See also
vrna_strtrim()
-
VRNA_TRIM_TRAILING
- #include <ViennaRNA/utils/strings.h>
Trim only characters trailing the string.
See also
vrna_strtrim()
-
VRNA_TRIM_IN_BETWEEN
- #include <ViennaRNA/utils/strings.h>
Trim only characters within the string.
See also
vrna_strtrim()
-
VRNA_TRIM_SUBST_BY_FIRST
- #include <ViennaRNA/utils/strings.h>
Replace remaining characters after trimming with the first delimiter in list.
See also
vrna_strtrim()
-
VRNA_TRIM_DEFAULT
- #include <ViennaRNA/utils/strings.h>
Default settings for trimming, i.e. trim leading and trailing.
See also
vrna_strtrim()
-
VRNA_TRIM_ALL
- #include <ViennaRNA/utils/strings.h>
Trim characters anywhere in the string.
See also
vrna_strtrim()
Functions
-
char *vrna_strdup_printf(const char *format, ...)
- #include <ViennaRNA/utils/strings.h>
Safely create a formatted string.
This function is a safe implementation for creating a formatted character array, similar to sprintf(). Internally, it uses the asprintf() function, if available, to dynamically allocate a character array large enough to store the supplied content. If asprintf() is not available, it mimics the behavior of asprintf() using vsnprintf().
Note
The returned pointer of this function should always be passed to free() to release the allocated memory.
- Parameters:
format – The format string (See also asprintf)
... – The list of variables used to fill the format string
- Returns:
The formatted, null-terminated string, or NULL if something has gone wrong
-
char *vrna_strdup_vprintf(const char *format, va_list argp)
- #include <ViennaRNA/utils/strings.h>
Safely create a formatted string.
This function is the va_list version of vrna_strdup_printf().
Note
The returned pointer of this function should always be passed to free() to release the allocated memory.
- Parameters:
format – The format string (See also asprintf)
argp – The list of arguments to fill the format string
- Returns:
The formatted, null-terminated string, or NULL if something has gone wrong
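The vsnprintf() fallback mentioned for vrna_strdup_printf() usually follows a measure-then-allocate pattern. The following is a minimal sketch of that pattern, not the library's code; `demo_strdup_vprintf()` and `demo_strdup_printf()` are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical re-implementation of a vsnprintf()-based fallback:
 * measure first, then allocate and format. Caller must free() the result. */
char *
demo_strdup_vprintf(const char *format, va_list argp)
{
  va_list copy;
  int     n;
  char    *buf;

  /* First pass: vsnprintf(NULL, 0, ...) only counts characters.
   * A copy is used because argp is needed again for the second pass. */
  va_copy(copy, argp);
  n = vsnprintf(NULL, 0, format, copy);
  va_end(copy);
  if (n < 0)
    return NULL;

  buf = malloc((size_t)n + 1);
  if (buf == NULL)
    return NULL;

  /* Second pass: write the formatted string including the '\0' */
  if (vsnprintf(buf, (size_t)n + 1, format, argp) < 0) {
    free(buf);
    return NULL;
  }

  return buf;
}

/* Variadic front end, analogous to vrna_strdup_printf() */
char *
demo_strdup_printf(const char *format, ...)
{
  va_list argp;
  char    *s;

  va_start(argp, format);
  s = demo_strdup_vprintf(format, argp);
  va_end(argp);
  return s;
}
```

va_copy() is needed because a va_list that has been consumed by the first vsnprintf() call may not be reused for the second one.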
-
int vrna_strcat_printf(char **dest, const char *format, ...)
- #include <ViennaRNA/utils/strings.h>
Safely append a formatted string to another string.
This function is a safe implementation for appending a formatted character array, similar to a combination of strcat() and sprintf(). The function automatically allocates enough memory to store both the previous content stored at dest and the appended format string. If the string pointer stored at dest is NULL, the function allocates memory only for the format string. The function returns the number of characters in the resulting string or -1 in case of an error.
- Parameters:
dest – The address of a char *pointer where the formatted string is to be appended
format – The format string (See also sprintf)
... – The list of variables used to fill the format string
- Returns:
The number of characters in the final string, or -1 on error
-
int vrna_strcat_vprintf(char **dest, const char *format, va_list args)
- #include <ViennaRNA/utils/strings.h>
Safely append a formatted string to another string.
This function is the va_list version of vrna_strcat_printf().
- Parameters:
dest – The address of a char *pointer where the formatted string is to be appended
format – The format string (See also sprintf)
args – The list of arguments to fill the format string
- Returns:
The number of characters in the final string, or -1 on error
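Appending works the same way, except that the buffer already stored at dest is grown with realloc(). A hedged sketch with the hypothetical names `demo_strcat_vprintf()` and `demo_strcat_printf()`, returning the total length or -1 as described above:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: append a formatted string to *dest, growing the
 * buffer with realloc(). Returns the new total length or -1 on error. */
int
demo_strcat_vprintf(char **dest, const char *format, va_list args)
{
  va_list copy;
  size_t  old_len;
  int     n;
  char    *buf;

  if (dest == NULL || format == NULL)
    return -1;

  old_len = (*dest != NULL) ? strlen(*dest) : 0;

  /* Measure the appended part on a copy of the argument list */
  va_copy(copy, args);
  n = vsnprintf(NULL, 0, format, copy);
  va_end(copy);
  if (n < 0)
    return -1;

  /* realloc(NULL, size) acts like malloc(size), covering *dest == NULL */
  buf = realloc(*dest, old_len + (size_t)n + 1);
  if (buf == NULL)
    return -1;              /* *dest is still valid and unchanged */

  *dest = buf;

  /* Format directly behind the existing content */
  if (vsnprintf(buf + old_len, (size_t)n + 1, format, args) < 0) {
    buf[old_len] = '\0';    /* drop any partial output */
    return -1;
  }

  return (int)(old_len + (size_t)n);
}

/* Variadic front end, analogous to vrna_strcat_printf() */
int
demo_strcat_printf(char **dest, const char *format, ...)
{
  va_list args;
  int     r;

  va_start(args, format);
  r = demo_strcat_vprintf(dest, format, args);
  va_end(args);
  return r;
}
```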
-
unsigned int vrna_strtrim(char *string, const char *delimiters, unsigned int keep, unsigned int options)
- #include <ViennaRNA/utils/strings.h>
Trim a string by removing (multiple) occurrences of a particular character.
This function removes (multiple) consecutive occurrences of a set of characters (delimiters) within an input string. It may be used to remove leading and/or trailing whitespace or to restrict the maximum number of consecutive occurrences of the delimiting characters in delimiters. Setting keep=0 removes all occurrences, while other values reduce multiple consecutive occurrences to at most keep delimiters. This might be useful if one would like to reduce multiple whitespaces to a single one, or to remove empty fields within a comma-separated value string.
The parameter delimiters may be a pointer to a 0-terminated char string containing a set of any ASCII characters. If NULL or an empty char string is passed as the delimiter set, all whitespace characters are trimmed. The options parameter is a bit vector that specifies which part of the string should undergo trimming. The implementation distinguishes the leading (VRNA_TRIM_LEADING), trailing (VRNA_TRIM_TRAILING), and in-between (VRNA_TRIM_IN_BETWEEN) part with respect to the delimiter set. Combinations of these parts can be specified using the bitwise-or operator (|).
The following example code removes all leading and trailing whitespace characters from the input string:
char string[20] = " \t blablabla ";
/* remove all (keep = 0) leading and trailing whitespace characters */
unsigned int r = vrna_strtrim(&(string[0]), NULL, 0, VRNA_TRIM_DEFAULT);
/* string now holds "blablabla" */
- SWIG Wrapper Notes:
Since many scripting languages treat strings as immutable objects, this function does not modify the input string directly. Instead, it returns the modified string as the second return value, together with the number of removed delimiters.
The scripting language interface provides an overloaded version of this function, with default parameters delimiters=NULL, keep=0, and options=VRNA_TRIM_DEFAULT. See, e.g., RNA.strtrim() in the Python API.
See also
VRNA_TRIM_LEADING, VRNA_TRIM_TRAILING, VRNA_TRIM_IN_BETWEEN, VRNA_TRIM_SUBST_BY_FIRST, VRNA_TRIM_DEFAULT, VRNA_TRIM_ALL
Note
The delimiter always consists of a single character from the set of characters provided. In case of alternative delimiters and a non-zero keep parameter, the first keep delimiters are preserved within the string. Use VRNA_TRIM_SUBST_BY_FIRST to substitute all remaining delimiting characters with the first character from the delimiters list.
- Parameters:
string – The ‘\0’-terminated input string to trim
delimiters – The delimiter characters as 0-terminated char array (or NULL)
keep – The maximum number of consecutive occurrences of the delimiter in the output string
options – The option bit vector specifying the mode of operation
- Returns:
The number of delimiters removed from the string
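To make the leading/trailing semantics concrete, here is a deliberately simplified, standalone sketch that only covers keep=0 without in-between trimming. The flags `DEMO_TRIM_LEADING` and `DEMO_TRIM_TRAILING` and the function `demo_strtrim()` are hypothetical; the real VRNA_TRIM_* values and the keep/VRNA_TRIM_SUBST_BY_FIRST handling of vrna_strtrim() are not reproduced here.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical flags for this sketch only; the real VRNA_TRIM_* values
 * are defined in <ViennaRNA/utils/strings.h>. */
#define DEMO_TRIM_LEADING  1U
#define DEMO_TRIM_TRAILING 2U

/* Is c one of the delimiters? NULL or "" means: any whitespace. */
static int
is_delim(char c, const char *delimiters)
{
  if (delimiters == NULL || *delimiters == '\0')
    return isspace((unsigned char)c);

  /* guard '\0': strchr() would otherwise find the terminator */
  return c != '\0' && strchr(delimiters, c) != NULL;
}

/* Simplified trim (keep = 0, no in-between trimming): removes leading
 * and/or trailing delimiters in place, returns the number removed. */
unsigned int
demo_strtrim(char *string, const char *delimiters, unsigned int options)
{
  size_t len, start = 0, end;

  if (string == NULL)
    return 0;

  len = strlen(string);
  end = len;                 /* one past the last character kept */

  if (options & DEMO_TRIM_LEADING)
    while (start < end && is_delim(string[start], delimiters))
      start++;

  if (options & DEMO_TRIM_TRAILING)
    while (end > start && is_delim(string[end - 1], delimiters))
      end--;

  /* shift the kept part to the front and re-terminate */
  memmove(string, string + start, end - start);
  string[end - start] = '\0';

  return (unsigned int)(len - (end - start));
}
```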
-
char **vrna_strsplit(const char *string, const char *delimiter)
- #include <ViennaRNA/utils/strings.h>
Split a string into tokens using a delimiting character.
This function splits a string into an array of strings using a single character that delimits the elements within the string. The default delimiter is the ampersand '&', which is used when NULL is passed as the second argument. The returned list is NULL-terminated, i.e. the last element is NULL. If the delimiter is not found, the returned list contains exactly one element: the input string.
For instance, the following code:
char **tok = vrna_strsplit("GGGG&CCCC&AAAAA", NULL);

for (char **ptr = tok; *ptr; ptr++) {
  printf("%s\n", *ptr);
  free(*ptr);   /* free each element string */
}
free(tok);      /* free the array itself */
produces this output:
GGGG
CCCC
AAAAA
and properly frees the memory occupied by the returned element array.
See also
vrna_strtrim()
Note
This function internally uses strtok_r() and is therefore considered to be thread-safe. Also note that it is the user's responsibility to free the memory of the array and that of the individual element strings!
In case the input string contains consecutive delimiters, or starts or ends with one or multiple delimiters, empty strings are produced in the output list, indicating the empty fields of data resulting from the split. Use vrna_strtrim() prior to a call to this function to remove any leading, trailing, or in-between empty fields.
- Parameters:
string – The input string that should be split into elements
delimiter – The delimiting character. If NULL, the delimiter is "&"
- Returns:
A NULL-terminated list of the elements in the string
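The splitting behavior, including the empty fields described in the note, can be sketched as follows. `demo_strsplit()` is a hypothetical standalone function; it scans with strchr() rather than strtok_r(), because strtok_r() on its own treats a run of delimiters as a single separator and never returns empty tokens.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: split at a single delimiter character, keeping
 * empty fields. Returns a NULL-terminated array; the caller frees every
 * element and then the array. Returns NULL on error. */
char **
demo_strsplit(const char *string, const char *delimiter)
{
  /* default delimiter '&' when none (or an empty one) is given */
  char       d = (delimiter != NULL && *delimiter != '\0') ? *delimiter : '&';
  size_t     n = 1, i = 0, len;
  const char *p, *q;
  char       **tok;

  if (string == NULL)
    return NULL;

  /* number of fields = number of delimiters + 1 */
  for (p = string; *p; p++)
    if (*p == d)
      n++;

  tok = malloc((n + 1) * sizeof(char *));  /* +1 for the NULL terminator */
  if (tok == NULL)
    return NULL;

  for (p = string; ; p = q + 1) {
    q   = strchr(p, d);                    /* end of the current field */
    len = (q != NULL) ? (size_t)(q - p) : strlen(p);

    tok[i] = malloc(len + 1);
    if (tok[i] == NULL) {                  /* undo partial work */
      while (i > 0)
        free(tok[--i]);
      free(tok);
      return NULL;
    }
    memcpy(tok[i], p, len);
    tok[i][len] = '\0';
    i++;

    if (q == NULL)                         /* last field reached */
      break;
  }

  tok[i] = NULL;
  return tok;
}
```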
-
char *vrna_strjoin(const char **strings, const char *delimiter)
- #include <ViennaRNA/utils/strings.h>
-
char *vrna_random_string(int l, const char symbols[])
- #include <ViennaRNA/utils/strings.h>
Create a random string using characters from a specified symbol set.
- Parameters:
l – The length of the sequence
symbols – The symbol set
- Returns:
A random string of length ‘l’ containing characters from the symbol set
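A minimal sketch of random string generation, assuming uniform sampling with the C library's rand() (which carries a slight modulo bias). `demo_random_string()` is hypothetical, and the library may use a different random number generator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: draw l characters from symbols using rand().
 * Seed with srand() beforehand; the caller frees the result. */
char *
demo_random_string(int l, const char symbols[])
{
  size_t n = strlen(symbols);
  char   *r;

  if (l < 0 || n == 0)
    return NULL;

  r = malloc((size_t)l + 1);
  if (r == NULL)
    return NULL;

  for (int i = 0; i < l; i++)
    r[i] = symbols[rand() % n];   /* pick one symbol per position */

  r[l] = '\0';
  return r;
}
```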
-
int vrna_hamming_distance(const char *s1, const char *s2)
- #include <ViennaRNA/utils/strings.h>
Calculate the Hamming distance between two sequences.
- Parameters:
s1 – The first sequence
s2 – The second sequence
- Returns:
The Hamming distance between s1 and s2
-
int vrna_hamming_distance_bound(const char *s1, const char *s2, int n)
- #include <ViennaRNA/utils/strings.h>
Calculate the Hamming distance between two sequences up to a specified length.
This function is similar to vrna_hamming_distance(), but instead of comparing both sequences up to their actual length, only the first ‘n’ characters are taken into account.
- Parameters:
s1 – The first sequence
s2 – The second sequence
n – The length of the subsequences to consider (starting from the 5’ end)
- Returns:
The Hamming distance between s1 and s2
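Both distances count mismatching positions. In the sketch below, with the hypothetical names `demo_hamming_distance()` and `demo_hamming_distance_bound()`, comparison stops at the end of the shorter string; how the library functions treat sequences of unequal length is an assumption here.

```c
/* Hypothetical sketch: count positions where s1 and s2 differ,
 * stopping at the end of the shorter string. */
int
demo_hamming_distance(const char *s1, const char *s2)
{
  int h = 0;

  for (; *s1 && *s2; s1++, s2++)
    if (*s1 != *s2)
      h++;

  return h;
}

/* Same, but compare at most the first n characters (5' end first) */
int
demo_hamming_distance_bound(const char *s1, const char *s2, int n)
{
  int h = 0;

  for (; *s1 && *s2 && n > 0; s1++, s2++, n--)
    if (*s1 != *s2)
      h++;

  return h;
}
```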
-
void vrna_seq_toRNA(char *sequence)
- #include <ViennaRNA/utils/strings.h>
Convert an input sequence (possibly containing DNA alphabet characters) to RNA alphabet.
This function substitutes T and t with U and u, respectively.
- Parameters:
sequence – The sequence to be converted
-
void vrna_seq_toupper(char *sequence)
- #include <ViennaRNA/utils/strings.h>
Convert an input sequence to uppercase.
- Parameters:
sequence – The sequence to be converted
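Both conversions above are simple in-place character maps. A hedged sketch with the hypothetical names `demo_seq_toRNA()` and `demo_seq_toupper()`:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: in-place DNA-to-RNA conversion, T -> U, t -> u */
void
demo_seq_toRNA(char *sequence)
{
  for (; sequence && *sequence; sequence++) {
    if (*sequence == 'T')
      *sequence = 'U';
    else if (*sequence == 't')
      *sequence = 'u';
  }
}

/* Hypothetical sketch: in-place uppercase conversion */
void
demo_seq_toupper(char *sequence)
{
  for (; sequence && *sequence; sequence++)
    *sequence = (char)toupper((unsigned char)*sequence);
}
```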
-
void vrna_seq_reverse(char *sequence)
- #include <ViennaRNA/utils/strings.h>
Reverse a string in-place.
This function reverses a character string in the form of an array of characters in-place, i.e. it changes the input parameter.
- Parameters:
sequence – The string to reverse
- Post:
After execution, the input sequence contains the reverse of the string it held prior to the execution.
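In-place reversal is typically done by swapping characters from both ends towards the middle. A sketch with the hypothetical name `demo_seq_reverse()`:

```c
#include <string.h>

/* Hypothetical sketch: reverse a '\0'-terminated string in place */
void
demo_seq_reverse(char *sequence)
{
  size_t i, j;

  if (sequence == NULL || *sequence == '\0')
    return;

  /* swap the outermost pair, then move inwards */
  for (i = 0, j = strlen(sequence) - 1; i < j; i++, j--) {
    char tmp = sequence[i];
    sequence[i] = sequence[j];
    sequence[j] = tmp;
  }
}
```

Combined with a complement step such as vrna_DNA_complement(), a reversal like this yields the reverse complement of a sequence.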
-
char *vrna_DNA_complement(const char *sequence)
- #include <ViennaRNA/utils/strings.h>
Retrieve a DNA sequence which resembles the complement of the input sequence.
This function returns a new DNA string which is the complement of the input, i.e. the nucleotide letters A, C, G, and T are substituted by their complements T, G, C, and A, respectively.
Any characters not belonging to the alphabet of the 4 canonical bases of DNA are not altered.
Note
This function also handles lower-case input sequences and treats U of the RNA alphabet the same as T.
- Parameters:
sequence – The input DNA sequence
- Returns:
The complement of the input DNA sequence
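A sketch of the substitution described above, including lower-case input and U/u treated like T/t. `demo_DNA_complement()` is a hypothetical stand-in that returns newly allocated memory, which the caller frees.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: A<->T, C<->G (case preserved), U/u handled like
 * T/t; all other characters are copied unchanged. Caller frees result. */
char *
demo_DNA_complement(const char *sequence)
{
  size_t n, i;
  char   *c;

  if (sequence == NULL)
    return NULL;

  n = strlen(sequence);
  c = malloc(n + 1);
  if (c == NULL)
    return NULL;

  for (i = 0; i <= n; i++) {   /* i == n copies the '\0' */
    switch (sequence[i]) {
      case 'A': c[i] = 'T'; break;
      case 'a': c[i] = 't'; break;
      case 'C': c[i] = 'G'; break;
      case 'c': c[i] = 'g'; break;
      case 'G': c[i] = 'C'; break;
      case 'g': c[i] = 'c'; break;
      case 'T':
      case 'U': c[i] = 'A'; break;
      case 't':
      case 'u': c[i] = 'a'; break;
      default:  c[i] = sequence[i]; break;
    }
  }

  return c;
}
```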
-
char *vrna_seq_ungapped(const char *sequence)
- #include <ViennaRNA/utils/strings.h>
Remove gap characters from a nucleotide sequence.
- Parameters:
sequence – The original, null-terminated nucleotide sequence
- Returns:
A copy of the input sequence with all gap characters removed
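The set of gap characters is not specified here, so the sketch below assumes '-', '.', '_', and '~' (the hypothetical `DEMO_GAP_CHARS`); `demo_seq_ungapped()` is likewise a hypothetical name, and the library's gap alphabet may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical gap alphabet for this sketch only */
static const char DEMO_GAP_CHARS[] = "-._~";

/* Hypothetical sketch: copy sequence, skipping gap characters.
 * The caller frees the result. */
char *
demo_seq_ungapped(const char *sequence)
{
  size_t n, j = 0;
  char   *u;

  if (sequence == NULL)
    return NULL;

  n = strlen(sequence);
  u = malloc(n + 1);            /* result is never longer than the input */
  if (u == NULL)
    return NULL;

  for (const char *p = sequence; *p; p++)
    if (strchr(DEMO_GAP_CHARS, *p) == NULL)   /* keep non-gap characters */
      u[j++] = *p;

  u[j] = '\0';
  return u;
}
```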
-
char *vrna_cut_point_insert(const char *string, int cp)
- #include <ViennaRNA/utils/strings.h>
Add a separating ‘&’ character into a string according to cut-point position.
If the cut-point position is less than or equal to zero, this function just returns a copy of the provided string. Otherwise, the cut-point character is set at the corresponding position.
- Parameters:
string – The original string
cp – The cut-point position
- Returns:
A copy of the provided string including the cut-point character
-
char *vrna_cut_point_remove(const char *string, int *cp)
- #include <ViennaRNA/utils/strings.h>
Remove a separating ‘&’ character from a string.
This function removes the ‘&’ character indicating the cut point from a string and stores its position in the integer variable provided. If no ‘&’ is found in the input, the integer variable is set to -1. The function returns a copy of the input string with the ‘&’ sliced out.
- Parameters:
string – The original string
cp – Pointer to an integer that receives the cut-point position
- Returns:
A copy of the input string with the ‘&’ being sliced out
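The two cut-point functions are inverses of each other. The sketch assumes that cp is the 1-based position of the first character after the cut, i.e. the ‘&’ is placed in front of that character; this convention, the treatment of out-of-range positions, the handling of only the first ‘&’, and the names `demo_cut_point_insert()`/`demo_cut_point_remove()` are all assumptions, not taken from the library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch. Assumption: cp is the 1-based position of the first
 * character after the cut; cp <= 0 or cp > strlen(string) yields a copy. */
char *
demo_cut_point_insert(const char *string, int cp)
{
  size_t n = strlen(string);
  char   *s;

  if (cp <= 0 || (size_t)cp > n) {
    s = malloc(n + 1);
    if (s != NULL)
      memcpy(s, string, n + 1);
    return s;
  }

  s = malloc(n + 2);                                    /* +1 for '&' */
  if (s == NULL)
    return NULL;

  memcpy(s, string, (size_t)cp - 1);                    /* first part */
  s[cp - 1] = '&';                                      /* separator */
  memcpy(s + cp, string + cp - 1, n - (size_t)cp + 2);  /* rest incl. '\0' */
  return s;
}

/* Hypothetical sketch: remove the first '&' and store the 1-based position
 * of the character that followed it in *cp, or -1 if there is none. */
char *
demo_cut_point_remove(const char *string, int *cp)
{
  size_t     n   = strlen(string);
  const char *amp = strchr(string, '&');
  char       *s  = malloc(n + 1);

  if (s == NULL)
    return NULL;

  if (amp == NULL) {
    *cp = -1;
    memcpy(s, string, n + 1);
    return s;
  }

  size_t k = (size_t)(amp - string);   /* 0-based index of '&' */
  *cp = (int)k + 1;
  memcpy(s, string, k);
  memcpy(s + k, amp + 1, n - k);       /* rest incl. '\0' */
  return s;
}
```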
-
size_t *vrna_strchr(const char *string, int c, size_t n)
- #include <ViennaRNA/utils/strings.h>
Find (all) occurrences of a character within a string.
- Parameters:
string – The C string to be scanned
c – The character to be searched for
n – The maximum number of occurrences to search for (or 0 for all occurrences)
- Returns:
A 1-based array of (0-based) positions, or NULL on error. Element 0 of the array holds the number of occurrences found.
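The return layout (count at index 0, positions from index 1 on) can be illustrated with a small sketch. `demo_strchr()` is hypothetical; in particular, it returns an array with count 0 when nothing is found, which may differ from what vrna_strchr() does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: pos[0] holds the number k of matches found,
 * pos[1..k] their 0-based positions; n == 0 means "no limit".
 * The caller frees the returned array. */
size_t *
demo_strchr(const char *string, int c, size_t n)
{
  size_t len, k = 0, *pos;

  if (string == NULL)
    return NULL;

  len = strlen(string);

  /* worst case: every character matches, plus one slot for the count */
  pos = malloc((len + 1) * sizeof(size_t));
  if (pos == NULL)
    return NULL;

  for (size_t i = 0; i < len && (n == 0 || k < n); i++)
    if (string[i] == (char)c)
      pos[++k] = i;

  pos[0] = k;
  return pos;
}
```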
-
XSTR(s)