rnazBlast.pl - Compares predicted loci from data files, as generated by rnazCluster.pl, to a sequence database using BLAST.
rnazBlast.pl [options] [file]
--blast-dir
The directory with your BLAST database. If not set, the value of the BLASTDB environment variable is used.
--database
Name of the BLAST database to compare with. It must exist in the directory set with --blast-dir or, if that option is not given, in the directory set by BLASTDB.
--seq-dir
Directory with sequence files. For each sequence identifier in your input file you need a corresponding FASTA-formatted file, named with the sequence identifier and the extension .fa or .fasta. If, for example, an identifier in your input file is contig100, you should have a file named contig100.fa. (If your identifier has the form "assembly.chromosome", as used for example by UCSC alignments, you can also name the file chr22.fa for the sequence identifier hg17.chr22.)
--e-value
E-value cutoff X. All hits with E < X are reported. (Default: 1e-06)
Prints version information and exits.
Prints a brief help message and exits.
Prints the manual page and exits.
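The file-naming rule for --seq-dir can be sketched as a small Bash function. This only illustrates the rule as described above and is not code from rnazBlast.pl: the function name find_seq_file and the order in which candidates are tried (full identifier before the part after the first dot, .fa before .fasta) are assumptions.

```bash
# Hypothetical helper illustrating the --seq-dir naming rule (not taken
# from rnazBlast.pl). Prints the path of the matching file or fails.
# Arguments: sequence directory, sequence identifier.
find_seq_file() {
  local seqdir="${1%/}" id="$2" name ext
  # ${id#*.} strips everything up to the first dot: hg17.chr22 -> chr22;
  # for an identifier without a dot it is the identifier again.
  for name in "$id" "${id#*.}"; do
    for ext in fa fasta; do
      if [ -f "$seqdir/$name.$ext" ]; then
        printf '%s\n' "$seqdir/$name.$ext"
        return 0
      fi
    done
  done
  printf 'no sequence file for %s in %s\n' "$id" "$seqdir" >&2
  return 1
}
```

For example, find_seq_file seq hg17.chr22 prints seq/chr22.fa when only that file is present in seq.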
rnazBlast.pl is a simple program to compare your hits to a sequence database using BLAST. To use it you need (i) a sequence database, (ii) the sequence files to which the coordinates in your results file refer, and (iii) an NCBI BLAST installation, i.e. a blastall executable somewhere on your system.
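Before running the program, a quick check along these lines can confirm that the requirements are in place. It is a minimal sketch: check_prereqs is a hypothetical helper, and it assumes that the legacy NCBI BLAST programs blastall and formatdb should be found on your PATH.

```bash
# Hypothetical pre-flight check for the requirements listed above.
# Assumes blastall and formatdb (legacy NCBI toolkit) are on PATH.
# Argument: the directory you intend to pass to --seq-dir.
check_prereqs() {
  local seqdir="$1" ok=0 tool
  for tool in blastall formatdb; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing executable: %s\n' "$tool" >&2
      ok=1
    fi
  done
  if [ ! -d "$seqdir" ]; then
    printf 'sequence directory not found: %s\n' "$seqdir" >&2
    ok=1
  fi
  return "$ok"
}
```

The function reports every missing item before returning, so one run lists all problems at once.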
First, create a BLAST index for your sequence database. You need a FASTA-formatted file of your database. Assume, for example, that the file rfam contains all sequences of the Rfam database. Run the following command:
# formatdb -t rfam -i rfam -p F
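With -p F, formatdb builds a nucleotide index, written as files such as rfam.nhr, rfam.nin and rfam.nsq named after the input file. The sketch below, a hypothetical helper find_blast_db, checks for the .nin file in the directory given by --blast-dir or, if no directory is given, in BLASTDB. It assumes a single-volume database; very large databases that formatdb splits into several volumes would need a different check.

```bash
# Hypothetical helper: locate a nucleotide BLAST database built with
# "formatdb -p F" (which writes <name>.nhr, <name>.nin, <name>.nsq).
# Arguments: database name, optional --blast-dir value.
# Assumes a single-volume database.
find_blast_db() {
  local db="$1"
  local dir="${2:-${BLASTDB:-}}"   # fall back to $BLASTDB if no dir given
  if [ -z "$dir" ]; then
    printf 'no BLAST directory given and BLASTDB is unset\n' >&2
    return 1
  fi
  dir="${dir%/}"                   # drop a trailing slash
  if [ -f "$dir/$db.nin" ]; then
    printf '%s\n' "$dir/$db"
  else
    printf 'database %s not found in %s\n' "$db" "$dir" >&2
    return 1
  fi
}
```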
Make sure that the sequence files are available and named correctly (see the notes for the --seq-dir option). In this example we assume that the files are in the subdirectory seq.
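If your sequences are in a single multi-FASTA file, they can be split into one file per identifier in the seq subdirectory. This is a minimal sketch, assuming that the first word of each FASTA header (after ">") is exactly the identifier used in results.dat; split_fasta is a hypothetical name, not part of RNAz.

```bash
# Hypothetical helper: split a multi-FASTA file into <identifier>.fa files.
# Assumes the identifier is the first word of each header line.
# Arguments: input FASTA file, output directory.
split_fasta() {
  local fasta="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /^>/ {
      if (out) close(out)          # keep only one output file open
      id = substr($1, 2)           # ">contig100 desc" -> "contig100"
      out = dir "/" id ".fa"
    }
    out { print > out }            # lines before the first header are skipped
  ' "$fasta"
}
```

For example, split_fasta genome.fa seq would create seq/contig100.fa for a header line ">contig100" (genome.fa is a placeholder name).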
You can then run the following command to compare each locus in the file results.dat with the newly created rfam database (which is in the subdirectory rfam):
# rnazBlast.pl --database=rfam --seq-dir=seq \
    --blast-dir=rfam --e-value=1e-06 \
    results.dat > annotated.dat
If there is a hit better than E=1e-06, the name of the matching sequence and the E-value are added, in double quotes, as an additional field to the locus line.
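The cutoff rule (a hit is kept only if E < X) can be illustrated on BLAST tabular output as produced by blastall -m 8, where the 11th column holds the E-value. This sketch does not reproduce how rnazBlast.pl itself parses BLAST output; filter_hits is a hypothetical name, and its default cutoff simply mirrors the documented 1e-06.

```bash
# Hypothetical illustration of the E-value cutoff on tab-separated
# BLAST output (blastall -m 8: column 11 is the E-value).
# Reads hits on stdin and prints those with E strictly below the cutoff.
filter_hits() {
  local cutoff="${1:-1e-06}"
  # "+ 0" forces a numeric comparison, so values like 1e-10 compare correctly.
  awk -F '\t' -v cutoff="$cutoff" '($11 + 0) < (cutoff + 0)'
}
```

A hit with E exactly equal to the cutoff is dropped, matching "better than E=1e-06".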
Stefan Washietl <wash@tbi.univie.ac.at>