NAME - Compares predicted loci from data files as generated by to a sequence database using BLAST.

SYNOPSIS [options] [file]


-b name, --blast-dir=name

The directory with your BLAST database. If not set, the value from the BLASTDB environment variable is used.

-d name, --database=name

Name of the BLAST database to compare with. Must exist in the directory set with --blast-dir or in the directory set by BLASTDB.

-s name, --seq-dir=name

Directory with the sequence files. For each sequence identifier in your input file you need a corresponding FASTA formatted file, named with the sequence identifier and the extension .fa or .fasta. For example, if an identifier in your input file is contig100, you should have a file named contig100.fa. (If the identifier has the form "assembly.chromosome", as used for example by UCSC alignments, you can also name the file after the chromosome part alone: chr22.fa for the sequence identifier hg17.chr22.)

-e X, --e-value=X

E-value cutoff. All hits with E < X are reported. (Default: 1e-06)

-v, --version

Prints version information and exits.

-h, --help

Prints a brief help message and exits.


Prints the manual page and exits.
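
The fallback described for --blast-dir (explicit option first, then the BLASTDB environment variable) can be sketched as follows. This is only an illustration of the documented behaviour, not the program's own code; the function name resolve_blast_dir and its parameters are hypothetical.

```python
import os


def resolve_blast_dir(blast_dir=None, environ=None):
    """Return the BLAST database directory to use.

    An explicitly given directory (as with --blast-dir) wins; otherwise
    the value of BLASTDB is used. Returns None if neither is set.
    """
    env = os.environ if environ is None else environ
    if blast_dir:
        return blast_dir
    return env.get("BLASTDB")


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(resolve_blast_dir("rfam", {"BLASTDB": "/data/blast"}))
    print(resolve_blast_dir(None, {"BLASTDB": "/data/blast"}))
```

Passing the environment as a parameter keeps the sketch easy to try out without changing your real BLASTDB setting.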

DESCRIPTION This is a simple program to compare your hits to a sequence database using BLAST. To use it you need (i) a sequence database, (ii) the sequence files to which the coordinates in your results file refer, and (iii) an NCBI BLAST installation, i.e. a blastall executable somewhere on your system.
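
Before running a comparison it can help to confirm that the third prerequisite is met. The following minimal sketch assumes that blastall is expected on your PATH, which is an assumption and not something stated above; the function name find_blastall is hypothetical.

```python
import shutil


def find_blastall(name="blastall"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_blastall()
    if path is None:
        print("blastall was not found on PATH")
    else:
        print("using", path)
```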

First, create a BLAST index for your sequence database. This requires a FASTA formatted file of your database. Assume, for example, that the file rfam contains all sequences of the Rfam database. Run the following command:

 # formatdb -t rfam -i rfam -p F
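
For a nucleotide database (-p F), formatdb normally writes index files with the extensions .nhr, .nin and .nsq next to the input file. The sketch below checks that these exist for a given database name. It assumes a small, single-volume database (large databases are split into volumes with different file names), and the function name missing_index_files is hypothetical.

```python
from pathlib import Path

# Index files formatdb writes for a single-volume nucleotide database.
NUCLEOTIDE_INDEX_SUFFIXES = (".nhr", ".nin", ".nsq")


def missing_index_files(blast_dir, database):
    """Return the names of expected index files that are absent."""
    base = Path(blast_dir)
    return [
        database + suffix
        for suffix in NUCLEOTIDE_INDEX_SUFFIXES
        if not (base / (database + suffix)).is_file()
    ]


if __name__ == "__main__":
    # "rfam" directory and database name as in the example in this manual.
    missing = missing_index_files("rfam", "rfam")
    if missing:
        print("missing index files:", ", ".join(missing))
    else:
        print("database index looks complete")
```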

Make sure that the sequence files are available and named correctly (see the notes for the --seq-dir option). In this example we assume that the files are in the subdirectory seq.
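
The naming rules for sequence files can be sketched as a small lookup. This is an illustration only, not the program's actual lookup code. It assumes that the "assembly.chromosome" form is split at the first dot and that the shortened name may also carry the .fasta extension; the function names are hypothetical.

```python
from pathlib import Path


def candidate_names(seq_id):
    """List acceptable file names for a sequence identifier, in search order."""
    stems = [seq_id]
    if "." in seq_id:
        # "hg17.chr22" -> also try "chr22" (assumed split at the first dot)
        stems.append(seq_id.split(".", 1)[1])
    return [stem + ext for stem in stems for ext in (".fa", ".fasta")]


def find_sequence_file(seq_dir, seq_id):
    """Return the first matching file in seq_dir, or None if there is none."""
    for name in candidate_names(seq_id):
        path = Path(seq_dir) / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    print(candidate_names("contig100"))
    print(candidate_names("hg17.chr22"))
```

Running such a check over all identifiers in your input file before starting BLAST shows missing or misnamed files early.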

You can run the following command to compare each locus in the file results.dat with the newly created rfam database (which is in the subdirectory rfam):

 # --database=rfam --seq-dir=seq \
                --blast-dir=rfam --e-value=1e-06 \
                  results.dat > annotated.dat

If there is a hit with an E-value better than 1e-06, the name of the matching sequence and its E-value are added in double quotes as an additional field to the locus line.
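
The annotation step can be sketched as follows. The exact layout of the added field is not specified here, so the tab separator, the "name|E-value" form inside the quotes, the choice of the single best hit, and the sample locus line are all assumptions made for illustration; annotate_locus is a hypothetical name.

```python
def annotate_locus(line, hits, cutoff=1e-06):
    """Append the best hit below the cutoff to a locus line.

    hits is a list of (name, evalue) pairs. Only hits with E < cutoff
    count; if there are none, the line is returned unchanged.
    """
    significant = [hit for hit in hits if hit[1] < cutoff]
    if not significant:
        return line
    name, evalue = min(significant, key=lambda hit: hit[1])
    # Assumed layout: one extra tab-separated field in double quotes.
    return f'{line}\t"{name}|{evalue:g}"'


if __name__ == "__main__":
    # Hypothetical locus line and hits.
    locus = "locus1\tcontig100\t10\t90"
    hits = [("tRNA", 1e-20), ("5S_rRNA", 1e-08), ("U6", 0.5)]
    print(annotate_locus(locus, hits))
```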


Stefan Washietl