rnazCmsearch.pl - Compares predicted loci from data files as generated by rnazCluster.pl to a sequence database using CMSEARCH.
rnazCmsearch.pl [options] [file]
The directory with the covariance models for CMSEARCH (--cm-dir). Required option.
Directory with sequence files (--seq-dir). For each sequence identifier in your input file you need a corresponding FASTA formatted file. The files should be named with the sequence identifier and the extension .fa or .fasta. If the identifier in your input file is, for example, contig100, then you should have a file named contig100.fa. (If your identifier is of the form "assembly.chromosome", as used for example by UCSC alignments, it is also possible to name the file chr22.fa for a sequence identifier hg17.chr22.)
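The naming rule above can be sketched as a small lookup routine. This is only an illustration and not the code used by the script: the function name find_sequence_file is hypothetical, and both the lookup order (full identifier first, then the part after the first dot) and the way the chromosome part is split off are assumptions.

```python
from pathlib import Path
from typing import Optional


def find_sequence_file(seq_dir: str, seq_id: str) -> Optional[Path]:
    """Return the FASTA file for seq_id, or None if no candidate exists."""
    # Assumption: try the full identifier first, then the part after the
    # first dot (e.g. "chr22" for "hg17.chr22").
    names = [seq_id]
    if "." in seq_id:
        names.append(seq_id.split(".", 1)[1])
    for name in names:
        for ext in (".fa", ".fasta"):
            candidate = Path(seq_dir) / f"{name}{ext}"
            if candidate.is_file():
                return candidate
    return None
```

Running such a check for every sequence identifier in the input file before starting a long search is a cheap way to catch misnamed files early.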
You can add additional options for cmsearch here. E.g. use --cmsearch-opts="-T 40" to increase the score threshold to 40. By default a score threshold of log_2(2*length(seq)) is used.
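To get a feeling for the default threshold, the formula can be evaluated for a few sequence lengths. The sketch below assumes that length(seq) is the number of nucleotides in the searched sequence; the function name default_score_threshold is hypothetical.

```python
import math


def default_score_threshold(seq_length: int) -> float:
    """Evaluate log_2(2 * length(seq)) for a sequence of seq_length nucleotides."""
    return math.log2(2 * seq_length)


for n in (50, 100, 1000):
    # Prints 6.64, 7.64 and 10.97 for 50, 100 and 1000 nucleotides.
    print(n, round(default_score_threshold(n), 2))
```

The default therefore grows only slowly with sequence length, so a fixed threshold such as -T 40 is much stricter than the default for typical locus sizes.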
Prints version information and exits.
Prints a brief help message and exits.
Prints the manual page and exits.
rnazCMsearch.pl is a simple program to compare your hits to a sequence database using CMSEARCH. To use it you need:
(i) a directory with covariance models (e.g. those for Rfam families),
(ii) the sequence files to which the coordinates in your results file refer,
(iii) the cmsearch program from the Infernal package.
Beware that this search can take a very long time!
Make sure that you have the sequence files available and named correctly (see notes for the --seq-dir option). In this example we assume that the files are in the subdirectory seq. You can run the following command to compare each locus in the file results.dat with each of the covariance models in the directory rfam:
# rnazCMsearch.pl --seq-dir=seq --cm-dir=rfam results.dat > annotated.dat
For any cmsearch hit, the name of the matching model and the score are added in double quotes as an additional field to the locus line.
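A downstream script could pick up these annotations by extracting the double-quoted text from each line. The exact layout of the quoted field is not specified here, so the following sketch only collects whatever stands between double quotes; the function name quoted_fields and the sample line are made up for illustration.

```python
import re
from typing import List


def quoted_fields(line: str) -> List[str]:
    """Return the contents of every double-quoted field in a line."""
    return re.findall(r'"([^"]*)"', line)


# Made-up locus line, for illustration only; real output may look different.
sample = 'locus1\tcontig100\t100\t250\t"exampleModel 42.0"'
print(quoted_fields(sample))
```

Lines without a hit yield an empty list, which makes it easy to separate annotated loci from the rest.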
Ivo Hofacker <ivo@tbi.univie.ac.at>