rnazIndex.pl - Convert data files as generated by rnazCluster.pl to
different formats.
rnazIndex.pl [options] [file]
--gff
Generate GFF formatted output.
--bed
Generate BED formatted output.
--col #:LABEL
Append a column named LABEL to the HTML table, holding the data from
the input file column with index #, e.g.
rnazIndex.pl --html --col 19:Alifoldz --col 20:RNAmicro annotated.dat
--fasta
Get sequences in FASTA format for loci or windows. See also the
options --seq-dir, --forward and --reverse.
--seq-dir
Directory with sequence files. You only need this for FASTA output
(see option --fasta). The files should be named with the sequence
identifier and the extension .fa or .fasta. If the identifier in your
input file is, for example, contig100, then you should have a file
named contig100.fa. (If your identifier is of the form
"assembly.chromosome", as used for example by UCSC alignments, it is
also possible to name the file chr22.fa for the sequence identifier
hg17.chr22.)
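The naming rule above can be sketched as follows. This is only an illustration of the rule as described here, not the code used by rnazIndex.pl: the function names candidate_names and find_sequence_file are hypothetical, and the order in which the candidates are tried, as well as splitting at the first dot, are assumptions.

```python
from pathlib import Path
from typing import List, Optional


def candidate_names(identifier: str) -> List[str]:
    """List file names that could hold the sequence for an identifier."""
    stems = [identifier]
    # For "assembly.chromosome" identifiers such as hg17.chr22, the part
    # after the dot (chr22) is accepted as a file stem as well.
    # Assumption: the identifier is split at the first dot only.
    if "." in identifier:
        stems.append(identifier.split(".", 1)[1])
    return [stem + ext for stem in stems for ext in (".fa", ".fasta")]


def find_sequence_file(seq_dir: str, identifier: str) -> Optional[Path]:
    """Return the first existing candidate file in seq_dir, or None."""
    for name in candidate_names(identifier):
        path = Path(seq_dir) / name
        if path.is_file():
            return path
    return None
```

With a directory seq that contains chr22.fa, find_sequence_file("seq", "hg17.chr22") would return the path seq/chr22.fa.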
--forward, --reverse
Only relevant for FASTA output (see option --fasta). These options
select whether the forward strand or the reverse complement of the
sequence corresponding to a locus is exported. Since loci don't have
strand information, you might consider both strands for further
analysis. Windows have strand information, so if you export windows
as FASTA these options are ignored.
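For readers unfamiliar with the term, the reverse complement of a sequence can be computed as in the following minimal sketch. The function name reverse_complement is hypothetical and the handling of U and N is an assumption; how rnazIndex.pl treats non-standard characters is not specified here.

```python
# Translation table for the complement of each base. Assumptions: both
# upper and lower case occur, U is complemented to A, N stays N.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order of the bases."""
    return seq.translate(_COMPLEMENT)[::-1]


forward = "ACCGT"
reverse = reverse_complement(forward)  # "ACGGT"
```

Applying the function twice gives back the original sequence for plain DNA input, which is a quick sanity check when both strands of a locus are exported.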
--ucsc
In UCSC MAF alignment files it is common to use sequence identifiers such as "hg17.chr22". However, BED files are usually specific to a given assembly and therefore only "chr22" is used in them. With this option you change any identifier of the form "X.Y" into "Y". Moreover, the scores are multiplied by 1000 and rounded to integers since the UCSC genome browser expects scores between 0 and 1000.
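The two transformations described above can be sketched as follows. This illustrates the description only and is not the implementation in rnazIndex.pl: the function names are hypothetical, splitting at the first dot and the exact rounding mode are assumptions, and the input score is assumed to lie between 0 and 1.

```python
def ucsc_chrom(identifier: str) -> str:
    """Turn an identifier of the form X.Y into Y; leave others unchanged."""
    # Assumption: the identifier is split at the first dot only.
    if "." in identifier:
        return identifier.split(".", 1)[1]
    return identifier


def ucsc_score(score: float) -> int:
    """Scale a score to the 0..1000 integer range used in BED files."""
    # Assumption: the input score lies between 0 and 1.
    return int(round(score * 1000))


chrom = ucsc_chrom("hg17.chr22")  # "chr22"
score = ucsc_score(0.5)           # 500
```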
Use the locus information to generate the lines for the GFF and BED files. This is the default.
Print the "windows" instead of the "loci". This is probably a rarely used function.
--html
With this option you get an HTML table which links to the HTML pages
which you can create by using the --html option in rnazCluster.pl.
Redirect the output to a file in the results directory created by
rnazCluster.pl and open the file with your favourite web browser.
Prints a short help message and exits.
Prints a detailed manual page and exits.
rnazIndex.pl reads tab-delimited data files as generated by
rnazCluster.pl and converts them to GFF, BED or HTML formatted files.
GFF is a widely used annotation file format that is supported by many programs and systems (http://www.sanger.ac.uk/Software/formats/GFF).
BED is the native annotation file format used by the UCSC genome browser (http://genome.ucsc.edu).
# rnazIndex.pl --gff results.dat > results.gff
Converts the results.dat file to GFF format.
# rnazIndex.pl --ucsc --bed results.dat > results.bed
Creates UCSC-style BED output.
# rnazIndex.pl --html results.dat > results/index.html
Generates an HTML formatted table.
# rnazIndex.pl --forward --fasta --seq-dir=seq results.dat
Exports sequences in FASTA format.
Stefan Washietl <wash@tbi.univie.ac.at>