NAME

rnazCluster.pl - Cluster RNAz hits and print a summary of the results.

SYNOPSIS

 rnazCluster.pl [options] [file]

OPTIONS

-c X, --cutoff=X: Only consider hits with RNAz class probability P>X (Default: 0.5)
-w, --windows
-l, --loci: Set these flags to print information for "windows" and/or "loci" in the output. By default, both single windows and combined loci are printed.
-d, --header: Print a header explaining the fields of the output (see below for a detailed description of the fields).
--html: Generates HTML-formatted output of the results in the subdirectory results. This option requires Ghostscript and a few programs from the ViennaRNA package. Specifically, the following executables must be in your PATH: gs, RNAalifold, colorrna.pl, coloraln.pl. Alternatively, you can set the locations of these programs directly in the rnazCluster.pl script. Note that this option makes the program much slower because the figures have to be generated. RNAz must also have been run with the --show-gaps option!
--html-dir: Name of directory where HTML pages are stored. Default: results
-v, --version: Prints version information and exits.
-h, --help: Prints a short help message and exits.
--man: Prints a detailed manual page and exits.
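
The --html option above lists four executables that must be in your PATH. The following is a minimal sketch, not part of rnazCluster.pl, for checking this before starting a long run. It is written in Python for illustration; the function name missing_executables is hypothetical.

```python
import shutil

# Executables that the --html option expects to find in PATH
# (taken from the option description above).
REQUIRED = ["gs", "RNAalifold", "colorrna.pl", "coloraln.pl"]


def missing_executables(names=REQUIRED, path=None):
    """Return the names that cannot be found on the search path.

    With path=None, shutil.which searches the PATH environment variable.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All programs needed for --html were found.")
```

If a program is reported as missing, either add its directory to PATH or adjust its location in the script as described for --html.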

DESCRIPTION

rnazCluster.pl reads RNAz output files and combines hits in overlapping windows to "loci". It prints a summary of the windows and/or loci as tab-delimited text to the standard output. An explanation of the fields can be found below. See the user manual for a more detailed explanation of these values.

To work properly, rnazCluster.pl needs position information in your RNAz output file. This means that the original alignments you scored with RNAz must have contained genomic locations (i.e. MAF files with a reference sequence). Moreover, the original input alignments have to be ordered by the genomic location of the reference sequence.
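
The idea of combining overlapping windows into loci, and the reason the input must be ordered, can be illustrated with a short sketch. This is not the code used by rnazCluster.pl; it is a simplified Python illustration that assumes two windows belong to the same locus when they lie on the same sequence and the second starts at or before the end of the first. The function name cluster_windows and the tuple layout are hypothetical.

```python
def cluster_windows(windows, cutoff=0.5):
    """Merge overlapping windows into loci in a single pass.

    windows: iterable of (sequenceID, start, end, P) tuples, ordered by
             sequence and start position (the single pass relies on this).
    cutoff:  only windows with P > cutoff are considered.
    Returns a list of (sequenceID, start, end, maxP) tuples, one per locus.
    """
    loci = []
    for seq_id, start, end, p in windows:
        if p <= cutoff:
            continue  # below the class probability cutoff
        if loci and loci[-1][0] == seq_id and start <= loci[-1][2]:
            # Assumed overlap rule: same sequence and the window starts
            # at or before the end of the current locus.
            prev_seq, prev_start, prev_end, prev_p = loci[-1]
            loci[-1] = (prev_seq, prev_start, max(prev_end, end), max(prev_p, p))
        else:
            loci.append((seq_id, start, end, p))
    return loci


# Made-up example data, already ordered by genomic location.
example = [
    ("chr1", 100, 220, 0.9),
    ("chr1", 140, 260, 0.7),
    ("chr1", 200, 320, 0.3),   # dropped: P is not above the cutoff
    ("chr1", 400, 520, 0.99),
    ("chr2", 100, 220, 0.8),
]
print(cluster_windows(example))
# [('chr1', 100, 260, 0.9), ('chr1', 400, 520, 0.99), ('chr2', 100, 220, 0.8)]
```

Because each window is compared only with the most recent locus, unordered input would split windows that belong together, which is why ordered alignments are required.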

If you want HTML output please see the notes for the --html option above.

FIELDS

"Window" lines

windowID
Consecutively numbered ID for each window
locusID
The locus which this window belongs to
sequenceID
Identifier of the sequence (e.g. human.chr1 or contig42)
start
Start position of the reference sequence in the window
end
End position of the reference sequence in the window
strand
Indicates if the reference sequence is from the positive or negative strand
N
Number of sequences in the alignment
columns
Number of columns in the alignment
identity
Mean pairwise identity of the alignment
meanMFE
Mean minimum free energy of the single sequences as calculated by the RNAfold algorithm
consensusMFE
"Consensus MFE" for the alignment as calculated by the RNAalifold algorithm
energyTerm
Contribution to the consensus MFE which comes from the energy part of the RNAalifold algorithm
covarianceTerm
Contribution to the consensus MFE which comes from the covariance part of the RNAalifold algorithm
combPerPair
Number of different base combinations per predicted pair in the consensus secondary structure
z
Mean z-score of the sequences in the alignment
SCI
Structure conservation index for the alignment
decValue
Support vector machine decision value
P
RNA class probability as calculated by the SVM

"Loci" lines

locusID
Consecutively numbered ID for each locus
sequenceID
Identifier of the sequence (e.g. human.chr1 or contig42)
start
Start position of the reference sequence in the locus
end
End position of the reference sequence in the locus
strand
Indicates if the reference sequence is from the positive or negative strand
maxN
Maximum number of sequences in the alignments of this locus
maxIdentity
Maximum mean pairwise identity in the alignments of this locus
maxP
Maximum RNA class probability in the alignments of this locus
minZ
Minimum z-score in the alignments of this locus
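
The field lists above can be used to read the tab-delimited output in your own scripts. The following Python sketch is one possible approach, not part of rnazCluster.pl. It assumes that window and locus lines can be told apart by their number of fields (18 and 9, following the lists above) and that header lines printed by --header start with "#"; both are assumptions you should check against your actual output. The function name parse_rnaz_cluster is hypothetical.

```python
import io

# Field names in the order given in the FIELDS section.
WINDOW_FIELDS = [
    "windowID", "locusID", "sequenceID", "start", "end", "strand",
    "N", "columns", "identity", "meanMFE", "consensusMFE", "energyTerm",
    "covarianceTerm", "combPerPair", "z", "SCI", "decValue", "P",
]
LOCUS_FIELDS = [
    "locusID", "sequenceID", "start", "end", "strand",
    "maxN", "maxIdentity", "maxP", "minZ",
]


def parse_rnaz_cluster(handle):
    """Split tab-delimited lines into window and locus records.

    Returns (windows, loci), each a list of dicts mapping field name to
    the raw string value. Lines with an unexpected number of fields,
    empty lines, and lines starting with '#' (assumed headers) are skipped.
    """
    windows, loci = [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == len(WINDOW_FIELDS):
            windows.append(dict(zip(WINDOW_FIELDS, fields)))
        elif len(fields) == len(LOCUS_FIELDS):
            loci.append(dict(zip(LOCUS_FIELDS, fields)))
    return windows, loci


# Made-up values for illustration only; real IDs and numbers will differ.
sample = io.StringIO(
    "\t".join(["locus1", "human.chr1", "100", "220", "+",
               "4", "78.5", "0.98", "-3.2"]) + "\n" +
    "\t".join(["window1", "locus1", "human.chr1", "100", "220", "+",
               "4", "120", "78.5", "-30.1", "-28.4", "-26.0",
               "-2.4", "1.2", "-3.2", "0.94", "1.8", "0.98"]) + "\n"
)
windows, loci = parse_rnaz_cluster(sample)
print(len(windows), len(loci))
```

All values are kept as strings; convert fields such as start, end, or P with int() or float() as needed.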

EXAMPLES

 # rnazCluster.pl rnaz.out

Parses and clusters the hits in the file rnaz.out and prints window and locus information to the standard output.

 # rnazCluster.pl -c 0.9 --html rnaz.out > results90.out

Clusters all hits from the file rnaz.out with P>0.9, writes the tab-delimited output to the file results90.out and, at the same time, generates a website in a subdirectory called results.

AUTHORS

Stefan Washietl <wash@tbi.univie.ac.at>