Energy Parameters

Modified Bases

The functions vrna_sc_mod(), vrna_sc_mod_json() and the like implement an energy correction framework that accounts for modified bases in secondary structure predictions. To supply these functions with the energy parameters and general specifications of a base modification, the following JSON data format may be used:

JSON data must consist of a header section modified_base. This header is an object with the mandatory keys:

  • name, specifying a name of the modified base,

  • unmodified, a single upper-case letter denoting the unmodified version of this base,

  • one_letter_code, specifying which letter is used for the modified base in the subsequent energy parameters, and

  • pairing_partners, an array of pairing partners.

The entries of the latter must be upper-case characters. An optional sources key may contain an array of related publications, e.g. those from which the parameters have been derived.
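The header requirements above can be checked before handing the data to the library. The following is a minimal sketch using only the Python standard library; check_header() and is_upper_letter() are hypothetical helpers, not part of the ViennaRNA API, and the top-level key is taken to be modified_base as in the JSON template further below.

```python
import json

# Mandatory header keys as listed in the text above.
MANDATORY_KEYS = ("name", "unmodified", "one_letter_code", "pairing_partners")


def is_upper_letter(value):
    """True if value is a single upper-case character."""
    return isinstance(value, str) and len(value) == 1 and value.isupper()


def check_header(json_text):
    """Return a list of problems found in the header (empty if none).

    Hypothetical helper for illustration, not a library function.
    """
    header = json.loads(json_text).get("modified_base")
    if not isinstance(header, dict):
        return ["missing 'modified_base' object"]

    problems = ["missing key '%s'" % k for k in MANDATORY_KEYS if k not in header]

    if "unmodified" in header and not is_upper_letter(header["unmodified"]):
        problems.append("'unmodified' must be a single upper-case letter")

    partners = header.get("pairing_partners")
    if partners is not None and not (
        isinstance(partners, list) and all(is_upper_letter(p) for p in partners)
    ):
        problems.append("'pairing_partners' must be an array of upper-case letters")

    return problems


example = """
{
  "modified_base" : {
    "name" : "My modification (M)",
    "unmodified" : "G",
    "pairing_partners" : [ "U", "A" ],
    "one_letter_code" : "M"
  }
}
"""
print(check_header(example))  # an empty list means no problems were found
```

The sketch only covers the constraints stated explicitly in the text; it does not inspect the optional sources key or any energy parameters.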

Following the header keys, additional keys may specify the actual energy contributions of the modified base in various loop contexts. All energy contributions must be given as free energies \(\Delta G\) in units of \(\text{kcal} \cdot \text{mol}^{-1}\). To allow for rescaling of the free energies at temperatures that differ from the default (\(37^\circ\text{C}\)), enthalpy parameters \(\Delta H\) may be specified as well. Those, however, are optional. The keys for free energy (at \(37^\circ\text{C}\)) and enthalpy parameters have the suffixes _energies and _enthalpies, respectively.
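To illustrate why the enthalpies are needed for rescaling, here is a minimal sketch of the usual linear temperature extrapolation. It assumes that \(\Delta H\) and \(\Delta S\) are independent of temperature, so that \(\Delta S = (\Delta H - \Delta G_{37}) / T_{37}\) and \(\Delta G(T) = \Delta H - T \cdot \Delta S\). The text above does not spell out the formula used by the library, so this is an assumption, and rescale_energy() is a hypothetical helper rather than a library function.

```python
T37_KELVIN = 37.0 + 273.15  # reference temperature of the *_energies values


def rescale_energy(dG37, dH, temperature_celsius):
    """Extrapolate a free energy (kcal/mol) from 37 degrees C to another temperature.

    Assumes temperature-independent enthalpy and entropy:
        dS    = (dH - dG37) / T37
        dG(T) = dH - T * dS
    """
    T = temperature_celsius + 273.15
    dS = (dH - dG37) / T37_KELVIN
    return dH - T * dS


# Stacking key "APUA" from the pseudouridine example data further below:
# free energy -2.8 and enthalpy -22.08 (both in kcal/mol).
print(rescale_energy(-2.8, -22.08, 25.0))
```

Without a matching _enthalpies entry there is no entropy estimate, so a free energy value could only be used unchanged at other temperatures.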

The parser and underlying framework currently support the following loop contexts:

  • base pair stacks (via the stacking key prefix).

    This key must point to an object with one key-value pair for each stacking interaction for which data is provided. Here, the key consists of four upper-case characters denoting the interacting bases, where the first two represent one strand in 5’ to 3’ direction and the last two the opposite strand in 3’ to 5’ direction. The values are energies in \(\text{kcal} \cdot \text{mol}^{-1}\).

  • terminal mismatches (via the mismatch key prefix).

    This key points to an object with key-value pairs for each available mismatch energy parameter. Keys consist of four nucleotide one-letter codes, arranged as in the base pair stacks above. The second and fourth characters denote the two unpaired mismatching bases, while the other two represent the closing base pair.

  • dangling ends (via the dangle5 and dangle3 key prefixes).

    The objects behind these keys, again, consist of key-value pairs for each dangling end energy parameter. Keys are three characters long, where the first two represent the two nucleotides that form the base pair and the third is the unpaired base that stacks on either the 3’ or the 5’ end of the enclosed part of the base pair.

  • terminal pairs (via the terminal key prefix).

    Terminal base pairs, such as AU or GU, sometimes receive an additional energy penalty. The object behind this key may list energy parameters to apply whenever particular base pairs occur at the end of a helix. Each of those parameters is specified as a key-value pair, where the key consists of two upper-case characters denoting the terminal base pair.
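The key layouts described in the list above can be made explicit with a few small decoding functions. This is a sketch of one reading of the text: for stacking keys, the opposite strand is given 3’ to 5’, so the first character faces the third and the second faces the fourth. The function names are hypothetical and not part of the library.

```python
def decode_stack(key):
    """Split a 4-letter stacking key into its two stacked base pairs.

    Characters 1-2 are one strand (5' to 3'), characters 3-4 the opposite
    strand (3' to 5'), so position 0 pairs with 2 and position 1 with 3.
    """
    if len(key) != 4:
        raise ValueError("stacking keys must have 4 characters")
    return (key[0], key[2]), (key[1], key[3])


def decode_mismatch(key):
    """Split a 4-letter mismatch key into closing pair and unpaired bases.

    The second and fourth characters are the unpaired mismatching bases,
    the first and third form the closing base pair.
    """
    if len(key) != 4:
        raise ValueError("mismatch keys must have 4 characters")
    return {"pair": (key[0], key[2]), "unpaired": (key[1], key[3])}


def decode_dangle(key):
    """Split a 3-letter dangle key into base pair and unpaired base."""
    if len(key) != 3:
        raise ValueError("dangle keys must have 3 characters")
    return {"pair": (key[0], key[1]), "unpaired": key[2]}


print(decode_stack("APUA"))     # pairs A-U and P-A
print(decode_mismatch("CMGM"))  # closing pair C-G, unpaired M and M
print(decode_dangle("UAM"))     # pair U-A, unpaired M
```

Applied to the pseudouridine stacking keys listed further below, every decoded pair containing P has A as its partner, which agrees with the pairing_partners entry of that data set.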

Below is a JSON template specifying most of the possible input parameters. Actual energy parameter files can be found in the source code tarball within the misc/ subdirectory.

{
  "modified_base" : {
    "name" : "My modification (M)",
    "sources" : [
      {
        "authors" : "Author 1, Author 2",
        "title" : "UV-melting of modified oligos",
        "journal" : "Some journal",
        "year" : 2022,
        "doi" : "10.0000/000000"
      }
    ],
    "unmodified" : "G",
    "pairing_partners" : [
      "U","A"
    ],
    "one_letter_code" : "M",
    "fallback" : "G",
    "stacking_energies" : {
      "MAUU" :  -1.2,
      "AGMC" :  -2.73
    },
    "stacking_enthalpies" : {
      "MAUU" :  -11.1,
      "AGMC" :  -9.73
    },
    "terminal_energies" : {
      "MU" : 0.5,
      "UM" : 0.5
    },
    "terminal_enthalpies" : {
      "MU" : 2.0,
      "UM" : 2.0
    },
    "mismatch_energies" : {
      "CMGM" : -1.11,
      "AGUM" : -0.73
    },
    "mismatch_enthalpies" : {
      "CMGM" : -11.11,
      "AGUM" : -7.73
    },
    "dangle5_energies" : {
      "UAM" : -1.01
    },
    "dangle5_enthalpies" : {
      "UAM" : -6.01
    },
    "dangle3_energies" : {
      "CGM" : -2.1,
      "GCM" : -1.3
    }
  }
}

An actual example with real-world data may look like this:

{
  "modified_base" : {
    "name" : "Pseudouridine",
    "sources" : [
      {
        "authors": "Graham A. Hudson, Richard J. Bloomingdale, and Brent M. Znosko",
        "title" : "Thermodynamic contribution and nearest-neighbor parameters of pseudouridine-adenosine base pairs in oligoribonucleotides",
        "journal" : "RNA 19:1474-1482",
        "year" : 2013,
        "doi" : "10.1261/rna.039610.113"
      }
    ],
    "unmodified" : "U",
    "pairing_partners" : [
      "A"
    ],
    "one_letter_code" : "P",
    "fallback" : "U",
    "stacking_energies" : {
      "APUA" :  -2.8,
      "CPGA" : -2.77,
      "GPCA" : -3.29,
      "UPAA" : -1.62,
      "PAAU" : -2.10,
      "PCAG" : -2.49,
      "PGAC" : -2.2,
      "PUAA" : -2.74
    },
    "stacking_enthalpies" : {
      "APUA" : -22.08,
      "CPGA" : -16.23,
      "GPCA" : -24.07,
      "UPAA" : -20.81,
      "PAAU" : -12.47,
      "PCAG" : -17.29,
      "PGAC" : -11.19,
      "PUAA" : -26.94
    },
    "terminal_energies" : {
      "PA" : 0.31,
      "AP" : 0.31
    },
    "terminal_enthalpies" : {
      "PA" : -2.04,
      "AP" : -2.04
    },
    "duplexes" : {
      "CGAPACGGCUAUGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -9.93,
        "dG37_p"  : -10.12
      },
      "CGCPACGGCGAUGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -10.96,
        "dG37_p"  : -11.17
      },
      "CGGPACGGCCAUGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -11.71,
        "dG37_p"  : -11.53
      },
      "CGUPACGGCAAUGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -9.10,
        "dG37_p"  : -8.83
      },
      "CGAPCCGGCUAGGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -11.92,
        "dG37_p"  : -11.53
      },
      "CGCPCCGGCGAGGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -12.93,
        "dG37_p"  : -12.57
      },
      "CGGPCCGGCCAGGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -12.76,
        "dG37_p"  : -12.94
      },
      "CGUPCCGGCAAGGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -9.76,
        "dG37_p"  : -10.24
      },
      "CGAPGCGGCUACGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -11.45,
        "dG37_p"  : -11.40
      },
      "CGCPGCGGCGACGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -12.35,
        "dG37_p"  : -12.45
      },
      "CGGPGCGGCCACGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -12.59,
        "dG37_p"  : -12.81
      },
      "CGUPGCGGCAACGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -10.34,
        "dG37_p"  : -10.11
      },
      "CGAPUCGGCUAAGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -10.42,
        "dG37_p"  : -10.86
      },
      "CGCPUCGGCGAAGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -12.06,
        "dG37_p"  : -11.91
      },
      "CGGPUCGGCCAAGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -12.51,
        "dG37_p"  : -12.27
      },
      "CGUPUCGGCAAAGC" : {
        "length1" : 7,
        "length2" : 7,
        "dG37"    : -9.51,
        "dG37_p"  : -9.58
      },
      "GCGCAPCGCGUA" : {
        "length1" : 6,
        "length2" : 6,
        "dG37"    : -9.90,
        "dG37_p"  : -9.71
      },
      "GCGCCPCGCGGA" : {
        "length1" : 6,
        "length2" : 6,
        "dG37"    : -10.63,
        "dG37_p"  : -10.84
      },
      "GCGCGPCGCGCA" : {
        "length1" : 6,
        "length2" : 6,
        "dG37"    : -10.43,
        "dG37_p"  : -10.46
      },
      "GCGCUPCGCGAA" : {
        "length1" : 6,
        "length2" : 6,
        "dG37"    : -8.55,
        "dG37_p"  : -8.50
      },
      "PAGCGCAUCGCG" : {
        "length1" : 6,
        "length2" : 6,
        "dG37"    : -8.93,
        "dG37_p"  : -8.99
      },
      "PCGCGCAGCGCG" : {
        "length1" : 6,
        "length2" : 6,
        "dG37"    : -9.56,
        "dG37_p"  : -9.66
      },
      "PGGCGCACCGCG" : {
        "length1" : 6,
        "length2" : 6,
        "dG37"    : -10.30,
        "dG37_p"  : -10.27
      },
      "PUGCGCAACGCG" : {
        "length1" : 6,
        "length2" : 6,
        "dG37"    : -9.77,
        "dG37_p"  : -9.65
      }
    }
  }
}