NAME - Compares predicted loci from data files as generated by to a database of covariance models using CMSEARCH.

SYNOPSIS [options] [file]


-b name, --cmsearch-dir=name

Directory with the covariance models for CMSEARCH. Required option.

-s name, --seq-dir=name

Directory with sequence files. For each sequence identifier in your input file you need a corresponding FASTA-formatted file, named with the sequence identifier and the extension .fa or .fasta. For example, if an identifier in your input file is contig100, you need a file named contig100.fa. If your identifiers have the form "assembly.chromosome", as used for example in UCSC alignments, you can also name the file after the chromosome alone, e.g. chr22.fa for the identifier hg17.chr22.
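
The file lookup described for --seq-dir can be sketched in Python as follows. This is a minimal illustration, not the program's own code: the function name find_sequence_file, the search order (full identifier before the chromosome part, .fa before .fasta) and accepting .fasta for the chromosome-only name are assumptions.

```python
from pathlib import Path


def find_sequence_file(seq_dir, identifier):
    """Return the FASTA file for `identifier` in `seq_dir`, or None.

    Hypothetical helper. Tries <id>.fa and <id>.fasta. For identifiers of
    the form "assembly.chromosome" (e.g. hg17.chr22) it then also tries
    the chromosome part alone (chr22.fa, chr22.fasta).
    """
    seq_dir = Path(seq_dir)
    names = [identifier]
    if "." in identifier:
        # "hg17.chr22" -> "chr22": everything after the first dot
        names.append(identifier.split(".", 1)[1])
    for name in names:
        for ext in (".fa", ".fasta"):
            path = seq_dir / (name + ext)
            if path.is_file():
                return path
    return None
```

With a directory seq containing only chr22.fa, find_sequence_file("seq", "hg17.chr22") first tries hg17.chr22.fa and hg17.chr22.fasta, then returns seq/chr22.fa.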

--cmsearch-opts=options

Additional options passed on to cmsearch. For example, use --cmsearch-opts="-T 40" to raise the score threshold to 40. By default a score threshold of log_2(2*length(seq)) is used.
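
As a worked illustration of the default cutoff, here is a minimal Python sketch, assuming that length(seq) is the length in nucleotides of the sequence being searched; the helper name default_threshold is hypothetical. Because the threshold is logarithmic, doubling the sequence length raises it by only one unit: a 512 nt sequence gets log_2(1024) = 10, a 1024 nt sequence gets log_2(2048) = 11.

```python
import math


def default_threshold(seq_length):
    """Default cmsearch score cutoff, log_2(2 * length(seq)).

    Hypothetical helper; assumes seq_length is the length in nucleotides
    of the sequence that is searched.
    """
    if seq_length <= 0:
        raise ValueError("sequence length must be positive")
    return math.log2(2 * seq_length)


if __name__ == "__main__":
    for n in (512, 1024, 100_000):
        print(n, round(default_threshold(n), 2))
```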

-v, --version

Prints version information and exits.

-h, --help

Prints a brief help message and exits.


Prints the manual page and exits.

DESCRIPTION is a simple program to compare your predicted loci to a database of covariance models using CMSEARCH. To use it you need (i) a directory with covariance models (e.g. those for Rfam families), (ii) the sequence files to which the coordinates in your results file refer, and (iii) the cmsearch program from the Infernal package.
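
Before a long run it may help to check the three prerequisites listed above. The following is a minimal Python sketch; the function name missing_prerequisites is hypothetical, and it only checks that the two directories exist and that a cmsearch executable can be found, not that the files inside the directories are valid.

```python
import shutil
from pathlib import Path


def missing_prerequisites(cm_dir, seq_dir, program="cmsearch"):
    """List unmet prerequisites; an empty list means all three are present.

    Hypothetical helper. Checks only that the directories exist and that
    `program` can be found (on the PATH, or at the given path if it
    contains a directory component).
    """
    problems = []
    if not Path(cm_dir).is_dir():
        problems.append(f"covariance model directory not found: {cm_dir}")
    if not Path(seq_dir).is_dir():
        problems.append(f"sequence directory not found: {seq_dir}")
    if shutil.which(program) is None:
        problems.append(f"{program} not found")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites("rfam", "seq"):
        print(problem)
```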

Beware that this search can take a very long time, since every locus is searched with every covariance model!
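
To get a feeling for the cost, the all-against-all comparison can be sketched as pairing each locus with each model. This sketch assumes one cmsearch run per locus/model pair, which is an assumption about how the program schedules its searches; search_jobs is a hypothetical name. The number of runs is the product of the two counts, so for example 1,000 loci against 500 models already means 500,000 searches.

```python
from itertools import product


def search_jobs(loci, models):
    """Pair every locus with every covariance model (all against all).

    Hypothetical helper; assumes one cmsearch run per (locus, model)
    pair, so the number of jobs is len(loci) * len(models).
    """
    return list(product(loci, models))


if __name__ == "__main__":
    jobs = search_jobs(["locus1", "locus2"], ["tRNA.cm", "5S_rRNA.cm"])
    print(len(jobs))  # 4
```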

Make sure that you have the sequence files available and named correctly (see the notes on the --seq-dir option). In this example we assume that the files are in the subdirectory seq.

You can run the following command to compare each locus in the file results.dat with each of the covariance models in the directory rfam:

 # --seq-dir=seq --cmsearch-dir=rfam \
                  results.dat > annotated.dat

For every cmsearch hit, the name of the matching model and the score are added in double quotes as an additional field to the locus line.
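
The annotation step can be sketched in Python as below. This is a minimal illustration assuming tab-separated locus lines, one quoted field per hit, and the layout "model score" with the score rounded to two decimals inside the quotes; these details are assumptions, not the exact output format of the program, and annotate_locus_line is a hypothetical name.

```python
def annotate_locus_line(line, hits):
    """Append one double-quoted "model score" field per cmsearch hit.

    Hypothetical helper. `hits` is a list of (model_name, score) pairs.
    Tab separation and the two-decimal score format are assumptions of
    this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    for model, score in hits:
        fields.append(f'"{model} {score:.2f}"')
    return "\t".join(fields)


if __name__ == "__main__":
    print(annotate_locus_line("locus1\tchr1\t100\t200\n", [("tRNA", 45.1)]))
```

For example, annotate_locus_line("locus1\tchr1\t100\t200", [("tRNA", 45.1)]) returns the line with the extra field "tRNA 45.10" appended; a locus without hits is returned unchanged.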


Ivo Hofacker <>