Experimental Structure Probing Data
While RNA secondary structure prediction generally yields good results, the model implemented in the prediction algorithms and its parameters are not perfect. Possible reasons include uncertainties in the parameters and the simplifying assumptions of the model itself. However, prediction performance can be increased by integrating (experimental) RNA structure probing data, such as data derived from selective 2’-hydroxyl acylation analyzed by primer extension (SHAPE), dimethyl sulfate (DMS), inline probing, or similar techniques.
Such experimental probing data is usually integrated in the form of small perturbations to the evaluated energy contributions (soft constraints) that guide the prediction towards the information gained from the experiment.
In the following, you’ll find the respective API symbols that allow for the integration of experimental probing data. In particular, we implement the most commonly used methods of how such data can be converted into pseudo energies that can then be turned into soft constraints.
Specialized Modules:
Generic Probing Data API
Include Experimental Structure Probing Data to Guide Structure Predictions.
Defines
-
VRNA_PROBING_METHOD_DEIGAN2009
- #include <ViennaRNA/probing/basic.h>
A flag indicating probing data conversion method of Deigan et al. [2009] .
-
VRNA_PROBING_METHOD_DEIGAN2009_DEFAULT_m
- #include <ViennaRNA/probing/basic.h>
Default parameter for slope
m
as used in method of Deigan et al. [2009] .
-
VRNA_PROBING_METHOD_DEIGAN2009_DEFAULT_b
- #include <ViennaRNA/probing/basic.h>
Default parameter for intercept
b
as used in method of Deigan et al. [2009] .
-
VRNA_PROBING_METHOD_ZARRINGHALAM2012
- #include <ViennaRNA/probing/basic.h>
A flag indicating probing data conversion method of Zarringhalam et al. [2012] .
-
VRNA_PROBING_METHOD_ZARRINGHALAM2012_DEFAULT_beta
- #include <ViennaRNA/probing/basic.h>
Default parameter
beta
as used in method of Zarringhalam et al. [2012] .
-
VRNA_PROBING_METHOD_ZARRINGHALAM2012_DEFAULT_conversion
- #include <ViennaRNA/probing/basic.h>
Default conversion method of probing data into probabilities as used in method of Zarringhalam et al. [2012] .
-
VRNA_PROBING_METHOD_ZARRINGHALAM2012_DEFAULT_probability
- #include <ViennaRNA/probing/basic.h>
Default probability value for missing data in method of Zarringhalam et al. [2012] .
-
VRNA_PROBING_METHOD_WASHIETL2012
- #include <ViennaRNA/probing/basic.h>
A flag indicating probing data conversion method of Washietl et al. [2012] .
-
VRNA_PROBING_METHOD_EDDY2014_2
- #include <ViennaRNA/probing/basic.h>
A flag indicating probing data conversion method of Eddy [2014] .
This flag indicates to use an implementation that distinguishes two classes of structural context, in particular paired and unpaired positions.
-
VRNA_PROBING_METHOD_MULTI_PARAMS_0
- #include <ViennaRNA/probing/basic.h>
Probing data conversion flag for comparative structure predictions indicating that no parameter is sequence specific.
-
VRNA_PROBING_METHOD_MULTI_PARAMS_1
- #include <ViennaRNA/probing/basic.h>
Probing data conversion flag for comparative structure predictions indicating that the 1st parameter is sequence specific.
-
VRNA_PROBING_METHOD_MULTI_PARAMS_2
- #include <ViennaRNA/probing/basic.h>
Probing data conversion flag for comparative structure predictions indicating that the 2nd parameter is sequence specific.
-
VRNA_PROBING_METHOD_MULTI_PARAMS_3
- #include <ViennaRNA/probing/basic.h>
Probing data conversion flag for comparative structure predictions indicating that the 3rd parameter is sequence specific.
-
VRNA_PROBING_METHOD_MULTI_PARAMS_DEFAULT
- #include <ViennaRNA/probing/basic.h>
Probing data conversion flag for comparative structure predictions indicating default parameter settings.
Essentially, this setting indicates that all probing data is to be converted using the same parameters. Use any combination of VRNA_PROBING_METHOD_MULTI_PARAMS_1, VRNA_PROBING_METHOD_MULTI_PARAMS_2, VRNA_PROBING_METHOD_MULTI_PARAMS_3, and so on to indicate that the first, second, third, or other parameter is sequence specific.
-
VRNA_PROBING_DATA_CHECK_SEQUENCE
- #include <ViennaRNA/probing/basic.h>
Typedefs
-
typedef struct vrna_probing_data_s *vrna_probing_data_t
- #include <ViennaRNA/probing/basic.h>
A data structure that contains RNA structure probing data and specifies how this data is to be integrated into structure predictions.
Functions
-
int vrna_sc_probing(vrna_fold_compound_t *fc, vrna_probing_data_t data)
- #include <ViennaRNA/probing/basic.h>
Apply probing data (e.g. SHAPE) to guide the structure prediction.
- SWIG Wrapper Notes:
This function is attached as method sc_probing() to objects of type fold_compound. See, e.g. RNA.fold_compound.sc_probing() in the Python API.
See also
vrna_probing_data_t, vrna_probing_data_free(), vrna_probing_data_Deigan2009(), vrna_probing_data_Deigan2009_comparative(), vrna_probing_data_Zarringhalam2012(), vrna_probing_data_Zarringhalam2012_comparative(), vrna_probing_data_Eddy2014_2(), vrna_probing_data_Eddy2014_2_comparative()
- Parameters:
fc – The vrna_fold_compound_t the probing data should be applied to in subsequent computations
data – The prepared probing data and probing data integration strategy
- Returns:
The number of probing data sets applied, 0 upon any error
-
vrna_probing_data_t vrna_probing_data_Deigan2009(const double *reactivities, unsigned int n, double m, double b)
- #include <ViennaRNA/probing/basic.h>
Prepare probing data according to Deigan et al. 2009 method.
Prepares a data structure to be used with vrna_sc_probing() to guide RNA folding using the simple linear ansatz
\[ \Delta G_{\text{SHAPE}}(i) = m \ln(\text{SHAPE reactivity}(i)+1)+ b \]
to convert probing data, e.g. SHAPE reactivity values, to pseudo energies whenever a nucleotide \( i \) contributes to a stacked pair. A positive slope \( m \) penalizes high reactivities in paired regions, while a negative intercept \( b \) results in a confirmatory 'bonus' free energy for correctly predicted base pairs. Since the energy evaluation of a base pair stack involves two pairs, the pseudo energies are added for all four contributing nucleotides. Consequently, the energy term is applied twice for pairs inside a helix and only once for pairs adjacent to other structures. For all other loop types the energy model remains unchanged, even when the experimental data strongly disagrees with a certain motif.
- SWIG Wrapper Notes:
This function exists in two forms, (i) as overloaded function probing_data_Deigan2009() and (ii) as constructor of the probing_data object. For the former, the second argument n can be omitted since the length of the reactivities list is determined from the list itself. When the vrna_probing_data_s constructor is called with the three parameters reactivities, m and b, it will automatically create a prepared data structure for the Deigan et al. 2009 method. See, e.g. RNA.probing_data_Deigan2009() and RNA.probing_data() in the Python API.
See also
vrna_probing_data_t, vrna_probing_data_free(), vrna_sc_probing(), vrna_probing_data_Deigan2009_comparative(), vrna_probing_data_Zarringhalam2012(), vrna_probing_data_Zarringhalam2012_comparative(), vrna_probing_data_Eddy2014_2(), vrna_probing_data_Eddy2014_2_comparative()
Note
For further details, we refer to Deigan et al. [2009] .
- Parameters:
reactivities – 1-based array of per-nucleotide probing data, e.g. SHAPE reactivities
n – The length of the reactivities list
m – The slope used for the probing data to soft constraints conversion strategy
b – The intercept used for the probing data to soft constraints conversion strategy
- Returns:
A pointer to a data structure containing the probing data and any preparations necessary to use it in vrna_sc_probing() according to the method of Deigan et al. [2009] or NULL on any error.
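The linear conversion described for vrna_probing_data_Deigan2009() can be illustrated with a minimal standalone sketch. It does not call the library; the helper names are hypothetical, and the values m = 1.8 and b = -0.6 are assumed example parameters rather than values read from the headers.

```python
import math


def deigan_pseudo_energy(reactivity, m, b):
    """Pseudo energy for one nucleotide: m * ln(reactivity + 1) + b."""
    return m * math.log(reactivity + 1.0) + b


def stack_pseudo_energy(reactivities, i, j, m, b):
    """Sum over the four nucleotides of the stack (i, j), (i+1, j-1).

    `reactivities` is 1-based: index 0 is an unused placeholder.
    """
    return sum(deigan_pseudo_energy(reactivities[k], m, b)
               for k in (i, j, i + 1, j - 1))


# Assumed example parameters (hypothetical, not taken from the library).
M, B = 1.8, -0.6
reactivities = [None, 0.0, 0.1, 2.0, 1.5, 0.1, 0.0]

low = deigan_pseudo_energy(0.0, M, B)    # unreactive position: bonus of b
high = deigan_pseudo_energy(2.0, M, B)   # reactive position: penalty
stack = stack_pseudo_energy(reactivities, 1, 6, M, B)
```

With these inputs an unreactive nucleotide receives exactly the intercept as a bonus, while a highly reactive one is penalized, which is the behaviour the text attributes to a positive slope and a negative intercept.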
-
vrna_probing_data_t vrna_probing_data_Deigan2009_comparative(const double **reactivities, const unsigned int *n, unsigned int n_seq, double *ms, double *bs, unsigned int multi_params)
- #include <ViennaRNA/probing/basic.h>
Prepare (multiple) probing data according to Deigan et al. 2009 method for comparative structure predictions.
Similar to vrna_probing_data_Deigan2009(), this function prepares a data structure to be used with vrna_sc_probing() to guide RNA folding using the simple linear ansatz
\[ \Delta G_{\text{SHAPE}}(i) = m \ln(\text{SHAPE reactivity}(i)+1)+ b \]
to convert probing data, e.g. SHAPE reactivity values, to pseudo energies whenever a nucleotide \( i \) contributes to a stacked pair.
This function's purpose is to allow for adding multiple probing data sets as required for comparative structure predictions over multiple sequence alignments (MSA) with n_seq sequences. For that purpose, reactivities can be provided for any of the sequences in the MSA. Individual probing data is always expected to be specified in sequence coordinates, i.e. without considering gaps in the MSA. Therefore, each set of reactivities may have a different length as specified by the parameter n. In addition, each set of probing data may undergo the conversion using different parameters \( m \) and \( b \). Whether or not multiple sets of conversion parameters are provided must be specified using the multi_params flag parameter. Use VRNA_PROBING_METHOD_MULTI_PARAMS_1 to indicate that ms points to an array of slopes for each sequence. Along with that, VRNA_PROBING_METHOD_MULTI_PARAMS_2 indicates that bs is pointing to an array of intercepts for each sequence. Bitwise-OR of the two values renders both parameters to be sequence specific.
See also
vrna_probing_data_t, vrna_probing_data_free(), vrna_sc_probing(), vrna_probing_data_Deigan2009(), vrna_probing_data_Zarringhalam2012(), vrna_probing_data_Zarringhalam2012_comparative(), vrna_probing_data_Eddy2014_2(), vrna_probing_data_Eddy2014_2_comparative(), VRNA_PROBING_METHOD_MULTI_PARAMS_0, VRNA_PROBING_METHOD_MULTI_PARAMS_1, VRNA_PROBING_METHOD_MULTI_PARAMS_2, VRNA_PROBING_METHOD_MULTI_PARAMS_DEFAULT
Note
For further details, we refer to Deigan et al. [2009] .
- Parameters:
reactivities – 0-based array of 1-based arrays of per-nucleotide probing data, e.g. SHAPE reactivities
n – 0-based array of lengths of the reactivities lists
n_seq – The number of sequences in the MSA
ms – 0-based array of the slopes used for the probing data to soft constraints conversion strategy or the address of a single slope value to be applied for all data
bs – 0-based array of the intercepts used for the probing data to soft constraints conversion strategy or the address of a single intercept value to be applied for all data
multi_params – A flag indicating what is passed through parameters ms and bs
- Returns:
A pointer to a data structure containing the probing data and any preparations necessary to use it in vrna_sc_probing() according to the method of Deigan et al. [2009] or NULL on any error.
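How the multi_params flag selects between a single shared value and per-sequence values can be sketched as follows. This is an illustration only: the bit values below are hypothetical stand-ins for the constants defined in ViennaRNA/probing/basic.h, and resolve_params() is not a library function.

```python
# Hypothetical bit values for illustration only; the real constants are
# defined in ViennaRNA/probing/basic.h and may differ.
MULTI_PARAMS_0 = 0
MULTI_PARAMS_1 = 1
MULTI_PARAMS_2 = 2


def resolve_params(ms, bs, n_seq, multi_params):
    """Return one (m, b) tuple per sequence.

    A parameter list is read per sequence only if its flag bit is set;
    otherwise its first element is shared by all sequences.
    """
    per_seq_m = bool(multi_params & MULTI_PARAMS_1)
    per_seq_b = bool(multi_params & MULTI_PARAMS_2)
    return [(ms[s] if per_seq_m else ms[0],
             bs[s] if per_seq_b else bs[0])
            for s in range(n_seq)]


# One slope and one intercept shared by all three sequences.
shared = resolve_params([1.8], [-0.6], 3, MULTI_PARAMS_0)
# Sequence specific slopes, shared intercept.
mixed = resolve_params([1.8, 2.0, 1.5], [-0.6], 3, MULTI_PARAMS_1)
# Bitwise-OR makes both parameters sequence specific.
both = resolve_params([1.8, 2.0, 1.5], [-0.6, -0.4, -0.8], 3,
                      MULTI_PARAMS_1 | MULTI_PARAMS_2)
```

The point of the design is that a caller who uses the same parameters everywhere passes the address of a single value, and only needs full arrays for the parameters whose flag bit is set.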
-
vrna_probing_data_t vrna_probing_data_Zarringhalam2012(const double *reactivities, unsigned int n, double beta, const char *pr_conversion, double pr_default)
- #include <ViennaRNA/probing/basic.h>
Prepare probing data according to Zarringhalam et al. 2012 method.
Prepares a data structure to be used with vrna_sc_probing() to guide RNA folding using the method of Zarringhalam et al. [2012] .
This method first converts the observed probing data of nucleotide \( i \) into a probability \( q_i \) that position \( i \) is unpaired by means of a non-linear map. Then pseudo-energies of the form
\[ \Delta G_{\text{SHAPE}}(x,i) = \beta\ |x_i - q_i| \]
are computed, where \( x_i=1 \) if position \( i \) is unpaired and \( x_i=0 \) if \( i \) is paired in a given secondary structure. The parameter \( \beta \) serves as scaling factor. The magnitude of discrepancy between prediction and experimental observation is represented by \( |x_i - q_i| \).
See also
vrna_probing_data_t, vrna_probing_data_free(), vrna_sc_probing(), vrna_probing_data_Zarringhalam2012_comparative(), vrna_probing_data_Deigan2009(), vrna_probing_data_Deigan2009_comparative(), vrna_probing_data_Eddy2014_2(), vrna_probing_data_Eddy2014_2_comparative()
Note
For further details, we refer to Zarringhalam et al. [2012]
- Parameters:
reactivities – 1-based array of per-nucleotide probing data, e.g. SHAPE reactivities
n – The length of the reactivities list
beta – The scaling factor \( \beta \) of the conversion function
pr_conversion – A flag that specifies how to convert reactivities to probabilities
pr_default – The default probability for nucleotides where reactivity data is missing
- Returns:
A pointer to a data structure containing the probing data and any preparations necessary to use it in vrna_sc_probing() according to the method of Zarringhalam et al. [2012] or NULL on any error.
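The pseudo energy term of the Zarringhalam et al. method can be sketched in a few lines. The sketch below takes an already converted probability q as input, since the reactivity to probability conversion is not reproduced here; the function name and the value of beta are assumptions for illustration.

```python
def zarringhalam_pseudo_energy(x, q, beta):
    """beta * |x - q| for structure indicator x (0 or 1) and probability q."""
    return beta * abs(x - q)


BETA = 0.8  # assumed example value, not read from the library headers
q = 0.9     # converted probability for one position (made-up value)

# Indicator close to q: structure and data agree, small penalty.
e_agree = zarringhalam_pseudo_energy(1, q, BETA)
# Indicator far from q: structure contradicts the data, large penalty.
e_disagree = zarringhalam_pseudo_energy(0, q, BETA)
```

Because the penalty grows linearly with the distance between the indicator and the probability, positions with q near 0.5 contribute almost the same energy to either state and therefore barely influence the prediction.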
-
vrna_probing_data_t vrna_probing_data_Zarringhalam2012_comparative(const double **reactivities, unsigned int *n, unsigned int n_seq, double *betas, const char **pr_conversions, double *pr_defaults, unsigned int multi_params)
- #include <ViennaRNA/probing/basic.h>
Prepare probing data according to Zarringhalam et al. 2012 method for comparative structure predictions.
Similar to vrna_probing_data_Zarringhalam2012(), this function prepares a data structure to be used with vrna_sc_probing() to guide RNA folding using the method of Zarringhalam et al. [2012] .
This function's purpose is to allow for adding multiple probing data sets as required for comparative structure predictions over multiple sequence alignments (MSA) with n_seq sequences. For that purpose, reactivities can be provided for any of the sequences in the MSA. Individual probing data is always expected to be specified in sequence coordinates, i.e. without considering gaps in the MSA. Therefore, each set of reactivities may have a different length as specified by the parameter n. In addition, each set of probing data may undergo the conversion using different parameters \( \beta \). Additionally, the probing data to probability conversion strategy and default values for missing data can be specified in a sequence-based manner. Whether or not multiple conversion parameters are provided must be specified using the multi_params flag parameter. Use VRNA_PROBING_METHOD_MULTI_PARAMS_1 to indicate that betas points to an array of \( \beta \) values for each sequence. VRNA_PROBING_METHOD_MULTI_PARAMS_2 indicates that pr_conversions is pointing to an array of probing data to probability conversion strategies, and VRNA_PROBING_METHOD_MULTI_PARAMS_3 indicates multiple default probabilities for missing data. Bitwise-OR of the three values renders all of them to be sequence specific.
See also
vrna_probing_data_t, vrna_probing_data_free(), vrna_sc_probing(), vrna_probing_data_Zarringhalam2012(), vrna_probing_data_Deigan2009(), vrna_probing_data_Deigan2009_comparative(), vrna_probing_data_Eddy2014_2(), vrna_probing_data_Eddy2014_2_comparative(), VRNA_PROBING_METHOD_MULTI_PARAMS_0, VRNA_PROBING_METHOD_MULTI_PARAMS_1, VRNA_PROBING_METHOD_MULTI_PARAMS_2, VRNA_PROBING_METHOD_MULTI_PARAMS_3, VRNA_PROBING_METHOD_MULTI_PARAMS_DEFAULT
Note
For further details, we refer to Zarringhalam et al. [2012]
- Parameters:
reactivities – 0-based array of 1-based arrays of per-nucleotide probing data, e.g. SHAPE reactivities
n – 0-based array of lengths of the reactivities lists
n_seq – The number of sequences in the MSA
betas – 0-based array with scaling factors \( \beta \) of the conversion function or the address of a scaling factor to be applied for all data
pr_conversions – 0-based array of flags that specifies how to convert reactivities to probabilities or the address of a conversion strategy to be applied for all data
pr_defaults – 0-based array of default probabilities for nucleotides where reactivity data is missing or the address of a single default probability to be applied for all data
multi_params – A flag indicating what is passed through parameters betas, pr_conversions, and pr_defaults
- Returns:
A pointer to a data structure containing the probing data and any preparations necessary to use it in vrna_sc_probing() according to the method of Zarringhalam et al. [2012] or NULL on any error.
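The requirement that probing data is given in sequence coordinates, without considering gaps in the MSA, can be made concrete with a small sketch. It only illustrates the relation between the two coordinate systems; the helper alignment_columns() is hypothetical and is not claimed to mirror how the library performs the mapping internally.

```python
def alignment_columns(aligned_seq, gap_chars="-."):
    """Map 1-based sequence positions to 1-based alignment columns."""
    columns = [0]  # index 0 unused so that positions are 1-based
    for col, char in enumerate(aligned_seq, start=1):
        if char not in gap_chars:
            columns.append(col)
    return columns


# One row of an MSA and its reactivities in sequence coordinates
# (6 nucleotides, 1-based, made-up values).
aligned = "GG-AU--CC"
reactivities = [None, 0.1, 0.2, 1.4, 1.1, 0.3, 0.2]

cols = alignment_columns(aligned)
# Reactivity of each nucleotide keyed by the alignment column it occupies.
per_column = {cols[i]: reactivities[i] for i in range(1, len(reactivities))}
```

Here the aligned row spans nine columns while the reactivities list covers only the six nucleotides, which is why each data set may have its own length n.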
-
vrna_probing_data_t vrna_probing_data_Eddy2014_2(const double *reactivities, unsigned int n, const double *unpaired_data, unsigned int unpaired_len, const double *paired_data, unsigned int paired_len)
- #include <ViennaRNA/probing/basic.h>
Add probing data as soft constraints (Eddy/RNAprob-2 method)
This approach of probing data directed RNA folding uses the probability framework proposed by Eddy [2014] :
\[ \Delta G_{\text{data}}(i) = - RT\ln(\mathbb{P}(\text{data}(i)\mid x_i\pi_i)) \]
to convert probing data to pseudo energies for given nucleotide \( x_i \) and class probability \( \pi_i \) at position \( i \). The conditional probability is taken from a prior distribution of probing data for the respective classes.
Here, the method distinguishes exactly two different classes of structural context, (i) unpaired and (ii) paired positions, following the lines of the RNAprob-2 method of Deng et al. [2016] . The reactivity distribution is computed using Gaussian kernel density estimation (KDE) with bandwidth \( h \) computed using Scott's factor
\[ h = n^{-\frac{1}{5}} \]
where \( n \) is the number of data points of the prior distribution.
See also
vrna_probing_data_t, vrna_probing_data_free(), vrna_sc_probing(), vrna_probing_data_Eddy2014_2_comparative(), vrna_probing_data_Deigan2009(), vrna_probing_data_Deigan2009_comparative(), vrna_probing_data_Zarringhalam2012(), vrna_probing_data_Zarringhalam2012_comparative()
- Parameters:
reactivities – A 1-based vector of probing data, e.g. normalized SHAPE reactivities
n – Length of reactivities
unpaired_data – Pointer to an array of probing data for unpaired nucleotides
unpaired_len – Length of unpaired_data
paired_data – Pointer to an array of probing data for paired nucleotides
paired_len – Length of paired_data
- Returns:
A pointer to a data structure containing the probing data and any preparations necessary to use it in vrna_sc_probing() according to the method of Eddy [2014] or NULL on any error.
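The two-class conversion described for vrna_probing_data_Eddy2014_2() can be sketched with a plain Gaussian KDE. This is a simplified illustration under stated assumptions: the bandwidth h is used directly as the kernel width exactly as the formula reads, the temperature and gas constant are example values, the prior data is made up, and none of the helper names exist in the library.

```python
import math

GAS_CONSTANT = 1.98717e-3   # kcal/(mol*K), assumed example value
TEMPERATURE = 310.15        # K (37 degrees C), assumed example value


def scott_bandwidth(n):
    """Bandwidth h = n^(-1/5) for n prior data points."""
    return n ** (-1.0 / 5.0)


def gaussian_kde(x, data):
    """Gaussian kernel density estimate of `data` evaluated at x."""
    h = scott_bandwidth(len(data))
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data) / norm


def pseudo_energy(x, prior):
    """-RT ln P(x | class), with P estimated from the class prior data."""
    return -GAS_CONSTANT * TEMPERATURE * math.log(gaussian_kde(x, prior))


# Made-up prior reactivities for the two structural classes.
unpaired_prior = [0.8, 1.0, 1.2, 0.9]
paired_prior = [0.05, 0.1, 0.0, 0.2]

# A reactivity of 1.0 is more likely under the unpaired prior, so the
# unpaired state receives the lower (more favourable) pseudo energy.
e_unpaired = pseudo_energy(1.0, unpaired_prior)
e_paired = pseudo_energy(1.0, paired_prior)
```

The sketch makes the role of the prior data visible: the same observed reactivity is scored against both class distributions, and the state whose prior explains the observation better is favoured.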
-
vrna_probing_data_t vrna_probing_data_Eddy2014_2_comparative(const double **reactivities, unsigned int *n, unsigned int n_seq, const double **unpaired_datas, unsigned int *unpaired_lens, const double **paired_datas, unsigned int *paired_lens, unsigned int multi_params)
- #include <ViennaRNA/probing/basic.h>
Add probing data as soft constraints (Eddy/RNAprob-2 method) for comparative structure predictions.
Similar to vrna_probing_data_Eddy2014_2(), this function prepares a data structure for probing data directed RNA folding. It uses the probability framework proposed by Eddy [2014] :
\[ \Delta G_{\text{data}}(i) = - RT\ln(\mathbb{P}(\text{data}(i)\mid x_i\pi_i)) \]
to convert probing data to pseudo energies for given nucleotide \( x_i \) and class probability \( \pi_i \) at position \( i \). The conditional probability is taken from a prior distribution of probing data for the respective classes.
This function's purpose is to allow for adding multiple probing data sets as required for comparative structure predictions over multiple sequence alignments (MSA) with n_seq sequences. For that purpose, reactivities can be provided for any of the sequences in the MSA. Individual probing data is always expected to be specified in sequence coordinates, i.e. without considering gaps in the MSA. Therefore, each set of reactivities may have a different length as specified by the parameter n. In addition, each set of probing data may undergo the conversion using different prior distributions for unpaired and paired nucleotides. Whether or not multiple sets of conversion priors are provided must be specified using the multi_params flag parameter. Use VRNA_PROBING_METHOD_MULTI_PARAMS_1 to indicate that unpaired_datas points to an array of unpaired probing data for each sequence. Similarly, VRNA_PROBING_METHOD_MULTI_PARAMS_2 indicates that paired_datas is pointing to an array of paired probing data for each sequence. Bitwise-OR of the two values renders both parameters to be sequence specific.
See also
vrna_probing_data_t, vrna_probing_data_free(), vrna_sc_probing(), vrna_probing_data_Eddy2014_2(), vrna_probing_data_Deigan2009(), vrna_probing_data_Deigan2009_comparative(), vrna_probing_data_Zarringhalam2012(), vrna_probing_data_Zarringhalam2012_comparative(), VRNA_PROBING_METHOD_MULTI_PARAMS_0, VRNA_PROBING_METHOD_MULTI_PARAMS_1, VRNA_PROBING_METHOD_MULTI_PARAMS_2, VRNA_PROBING_METHOD_MULTI_PARAMS_DEFAULT
- Parameters:
reactivities – 0-based array of 1-based arrays of per-nucleotide probing data, e.g. SHAPE reactivities
n – 0-based array of lengths of the reactivities lists
n_seq – The number of sequences in the MSA
unpaired_datas – 0-based array of 0-based arrays with probing data for unpaired nucleotides or address of a single array of such data
unpaired_lens – 0-based array of lengths for each probing data array in unpaired_datas
paired_datas – 0-based array of 0-based arrays with probing data for paired nucleotides or address of a single array of such data
paired_lens – 0-based array of lengths for each probing data array in paired_datas
multi_params – A flag indicating what is passed through parameters unpaired_datas and paired_datas
- Returns:
A pointer to a data structure containing the probing data and any preparations necessary to use it in vrna_sc_probing() according to the method of Eddy [2014] or NULL on any error.
-
void vrna_probing_data_free(vrna_probing_data_t d)
- #include <ViennaRNA/probing/basic.h>
Free memory occupied by the (prepared) probing data.
-
double **vrna_probing_data_load_n_distribute(unsigned int n_seq, unsigned int *ns, const char **sequences, const char **file_names, const int *file_name_association, unsigned int options)
- #include <ViennaRNA/probing/basic.h>
-
VRNA_PROBING_METHOD_DEIGAN2009