general biohaskell information

Information regarding bioinformatics and Haskell can be found on the biohaskell wiki. There are pointers toward the mailing list as well.

Ketil Malde and I are working on splitting up our bioinformatics libraries into smaller ones. Each library is supposed to do only one thing. This goes against the trend of having all-encompassing bioinformatics libraries but allows us to push new versions easily – at least that is the plan.

I write software for RNA secondary structure, whole genome ncRNA search, miRNAs and other stuff.

individual library and program descriptions

Note that with the library split, the loaders will generally use the enumerator / iteratee packages. Furthermore, I’m strongly considering switching to iteratee completely because of its parallel-composition options.

All packages described here either already provide both library functions and a program, or soon will.

Biobase

This library is currently in the process of being dismantled. Individual data sources go in the BiobaseZZZ libraries. I’ll probably keep it around for some common / often used things.

BiobaseDotP

Parsers for Vienna dot-bracket-like formats. Includes parsers for two-line RNAfold output, RNAstrand dot-bracket notation and the RNAwolf extended RNA secondary structure notation.
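To make the plain dot-bracket case concrete, here is a minimal sketch of how such a string can be turned into base pairs with a stack. It is not the BiobaseDotP API: the names dotBracketPairs and rnafoldPairs are hypothetical, positions are zero-based by assumption, and the extended RNAwolf notation is not handled.

```haskell
import Data.List (sort)

-- Hypothetical helper, not the BiobaseDotP API: convert a plain
-- dot-bracket string into zero-based (i, j) base pairs with i < j.
dotBracketPairs :: String -> Either String [(Int, Int)]
dotBracketPairs = go [] [] . zip [0 ..]
  where
    -- stack: positions of '(' that are still unmatched (innermost first)
    -- acc:   pairs collected so far
    go [] acc [] = Right (sort acc)
    go (i : _) _ [] = Left ("unmatched '(' at position " ++ show i)
    go stack acc ((k, c) : rest) = case c of
      '.' -> go stack acc rest
      '(' -> go (k : stack) acc rest
      ')' -> case stack of
        [] -> Left ("unmatched ')' at position " ++ show k)
        (i : is) -> go is ((i, k) : acc) rest
      _ -> Left ("unexpected character " ++ show c ++ " at position " ++ show k)

-- Hypothetical helper for two-line input: a sequence line followed by a
-- structure line. Anything after the first space on the structure line
-- (for example an energy value) is ignored here.
rnafoldPairs :: String -> Either String (String, [(Int, Int)])
rnafoldPairs txt = case lines txt of
  (sq : st : _) -> do
    ps <- dotBracketPairs (takeWhile (/= ' ') st)
    Right (sq, ps)
  _ -> Left "expected a sequence line and a structure line"
```

For example, dotBracketPairs "((..))" pairs position 0 with 5 and position 1 with 4, while an unbalanced string such as "(()" is reported as an error instead of being silently truncated.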

BiobaseFR3D

Provides importers for FR3D resource files. Of particular interest are basepairs files which describe canonical and non-canonical (non-Watson-Crick) base pairings in RNA secondary structure.

BiobaseInfernal

Loads different Infernal file formats. Currently understands taxonomy files and verbose hits. The other parsers are still based on parsec3 and will be (re-)integrated soon.

BiobaseMAF

Provides a loader for MAF files. Based on Oleg Kiselyov's and John Lato's iteratee.
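As an illustration of what a MAF loader has to deal with, the sketch below parses the "s" (sequence) lines of an alignment block using only plain list functions. It is an assumption-laden toy and not the BiobaseMAF interface: it does not use iteratee, it ignores all other line types, and the names MafSeq, parseSLine and mafSeqLines are made up for this example.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Hypothetical record for one "s" line of a MAF alignment block.
data MafSeq = MafSeq
  { mafSrc     :: String  -- source name, e.g. assembly.chromosome
  , mafStart   :: Int     -- zero-based start in the source sequence
  , mafSize    :: Int     -- number of non-gap characters
  , mafStrand  :: Char    -- '+' or '-'
  , mafSrcSize :: Int     -- total length of the source sequence
  , mafText    :: String  -- aligned text, gaps written as '-'
  } deriving (Show, Eq)

-- Parse a single "s" line; every other line yields Nothing.
parseSLine :: String -> Maybe MafSeq
parseSLine l = case words l of
  ["s", src, start, size, [strand], srcSize, text]
    | strand `elem` "+-" ->
        MafSeq src
          <$> readMaybe start
          <*> readMaybe size
          <*> pure strand
          <*> readMaybe srcSize
          <*> pure text
  _ -> Nothing

-- Collect all well-formed "s" lines of a file, dropping everything else.
mafSeqLines :: String -> [MafSeq]
mafSeqLines = mapMaybe parseSLine . lines
```

A streaming loader would process these lines incrementally instead of reading the whole file into a String, which matters for whole-genome alignments.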

BiobaseTrainingData

Parameter training for RNA secondary structure prediction tools requires data to train on. Since a number of different formats are in use, and handling them all in the training tools is a pain, we have this library and its programs. MkTrainingData converts each of the supported input formats into one common training data format. This format is Haskell-readable (and only partially human-readable) and line-by-line, with one entry per line. Generating additional training data is therefore easy, as one can just cat together different training files.
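The reason a line-by-line, Haskell-readable format concatenates so easily can be sketched with derived Show and Read instances. The record below is hypothetical and its fields are assumptions; the actual training data type in the library may look different.

```haskell
import Text.Read (readMaybe)

-- Hypothetical training entry; the field names are assumptions made for
-- this sketch, not the type used by BiobaseTrainingData.
data TrainingEntry = TrainingEntry
  { entrySequence :: String        -- primary structure
  , entryPairs    :: [(Int, Int)]  -- zero-based base pairs
  , entrySource   :: String        -- where the entry came from
  } deriving (Show, Read, Eq)

-- One entry per line. Derived Show escapes newlines inside strings, so
-- every entry occupies exactly one line.
encodeEntries :: [TrainingEntry] -> String
encodeEntries = unlines . map show

-- Decode line by line. A line that does not parse becomes Nothing, so
-- one bad line does not invalidate the rest of the file.
decodeEntries :: String -> [Maybe TrainingEntry]
decodeEntries = map readMaybe . filter (not . null) . lines
```

Because each line is independent, decoding the concatenation of two encoded files gives the entries of the first file followed by those of the second, which is what makes cat sufficient for merging.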

BiobaseTurner

A data structure for Mathews / Turner RNA and DNA energy parameters. This library currently only provides an importer, not export functions. There are two reasons: (i) we currently have no use-case where we need more than import facilities, and (ii) the file structure is geared towards humans, not machines. If you need to be able to export, send a mail.

BiobaseXNA

Provides representations and functions for RNA primary and secondary structure.

CMCompare

main page

(no library yet, only an executable)

A program to compare two Infernal covariance models. Useful to determine whether a newly designed structural multiple alignment in CM form has high discriminatory power. If it does not, the model will produce a lot of false positives.

MC-Fold-DP

main page

A polynomial-time variant of the MC-Fold RNA folding program by Parisien and Major. Part of our ongoing effort to provide asymptotically fast prediction tools for extended RNA secondary structures.

RNAFold

Haskell version of the ViennaRNA RNAfold program. This is only the library. The program can be found in RNAFoldProgs, but this will change soon.

RNAwolf

main page

The algorithm implemented here provides extended RNA secondary structure prediction. Each predicted nucleotide pairing is annotated with which of the three nucleotide edges is engaged in the pairing. In addition, each nucleotide may be engaged in more than one pairing.
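One way to picture such an extended structure is as a list of pairings, each carrying an edge annotation for both partners. The sketch below is only an illustration and not the RNAwolf data model: the types Edge and ExtPair, the function pairingsOf, and the edge names (taken from the usual Watson-Crick / sugar / Hoogsteen naming) are assumptions.

```haskell
-- Hypothetical edge type; three constructors for the three nucleotide edges.
data Edge = WatsonCrick | Sugar | Hoogsteen
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Hypothetical extended pairing: two zero-based positions, each with the
-- edge it uses in this pairing.
data ExtPair = ExtPair
  { pairI :: Int
  , edgeI :: Edge
  , pairJ :: Int
  , edgeJ :: Edge
  } deriving (Show, Eq)

-- All pairings that nucleotide k takes part in, reported from k's point
-- of view as (own edge, partner position, partner edge). The result can
-- have more than one element, since a nucleotide may pair more than once.
pairingsOf :: Int -> [ExtPair] -> [(Edge, Int, Edge)]
pairingsOf k ps =
  [ (edgeI p, pairJ p, edgeJ p) | p <- ps, pairI p == k ] ++
  [ (edgeJ p, pairI p, edgeI p) | p <- ps, pairJ p == k ]
```

With the pairings ExtPair 0 WatsonCrick 9 WatsonCrick and ExtPair 0 Hoogsteen 5 Sugar, nucleotide 0 appears in two pairings, something a plain dot-bracket string cannot express.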

HsTools

Some helper functions. This library needs a clean-up.