NAME

rnazCluster.pl - Cluster RNAz hits and print a summary of the results.


SYNOPSIS

 rnazCluster.pl [options] [file]


OPTIONS

-c X, --cutoff=X

Only consider hits with RNAz class probability P>X (Default: 0.5)

-w, --windows
-l, --loci

Set these flags to print information for "windows" and/or "loci" in the output. By default, both single windows and combined loci are printed.

-d, --header

Print a header explaining the fields of the output (see below for a detailed description of the fields).

--html

Generates HTML-formatted output of the results in the subdirectory results. For this option to work, ghostscript and a few programs from the ViennaRNA package must be installed. More precisely, the following executables must be in your PATH: gs, RNAalifold, colorrna.pl, coloraln.pl. Alternatively, you can adjust the locations of these programs directly in the rnazCluster.pl script. Note that this option makes the program very slow because the figures have to be generated. You must also have run RNAz with the --show-gaps option.
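
Before starting a long --html run, it can help to confirm that the four executables named above are actually reachable. The following is a minimal sketch, not part of rnazCluster.pl; the function name missing_tools is hypothetical, and the check covers only PATH lookup, not program locations adjusted inside the script.

```python
import shutil

# Executables the --html option expects in PATH (names taken from the text above).
REQUIRED_TOOLS = ["gs", "RNAalifold", "colorrna.pl", "coloraln.pl"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found in PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All programs needed for --html were found.")
```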

--html-dir

Name of the directory where the HTML pages are stored. Default: results

-v, --version

Prints version information and exits.

-h, --help

Prints a short help message and exits.

--man

Prints a detailed manual page and exits.


DESCRIPTION

rnazCluster.pl reads RNAz output files and combines hits in overlapping windows into "loci". It prints a summary of the windows and/or loci as tab-delimited text to the standard output. An explanation of the fields can be found below. See the user manual for a more detailed explanation of these values.

To work properly, your RNAz output file needs to contain position information. This means that the original alignments you scored with RNAz must have included genomic locations (i.e. MAF files with a reference sequence). Moreover, the original input alignments have to be ordered by the genomic location of the reference sequence.
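
The ordering requirement makes sense if overlapping windows are combined in a single pass over the input. The following is a minimal sketch of that idea, not the actual implementation in rnazCluster.pl. The Window type and merge_into_loci function are hypothetical, strand handling is ignored, and treating a window that starts exactly at the previous end as overlapping is an assumption about the coordinate convention.

```python
from typing import List, NamedTuple


class Window(NamedTuple):
    """Hypothetical minimal record for one hit: sequence ID and coordinates."""
    seq_id: str
    start: int
    end: int


def merge_into_loci(windows: List[Window]) -> List[Window]:
    """Merge overlapping windows on the same sequence into loci.

    Assumes `windows` is already ordered by sequence and start position,
    so each window only needs to be compared with the most recent locus.
    """
    loci: List[Window] = []
    for w in windows:
        if loci and loci[-1].seq_id == w.seq_id and w.start <= loci[-1].end:
            # Overlap with the current locus: extend its end if necessary.
            last = loci[-1]
            loci[-1] = Window(last.seq_id, last.start, max(last.end, w.end))
        else:
            # No overlap (or a different sequence): start a new locus.
            loci.append(w)
    return loci


# Invented coordinates, for illustration only.
hits = [
    Window("contig42", 100, 220),
    Window("contig42", 140, 260),
    Window("contig42", 500, 620),
]
for locus in merge_into_loci(hits):
    print(locus.seq_id, locus.start, locus.end)
```

With unsorted input, a single pass like this would miss overlaps between windows that are not adjacent in the file, which is one reason sorted alignments matter.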

If you want HTML output, please see the notes for the --html option above.


FIELDS

"Window" lines

  1. windowID

    Consecutively numbered ID for each window

  2. locusID

    The locus which this window belongs to

  3. sequenceID

    Identifier of the sequence (e.g. human.chr1 or contig42)

  4. start

    Start position of the reference sequence in the window

  5. end

    End position of the reference sequence in the window

  6. strand

    Indicates if the reference sequence is from the positive or negative strand

  7. N

    Number of sequences in the alignment

  8. columns

    Number of columns in the alignment

  9. identity

    Mean pairwise identity of the alignment

  10. meanMFE

    Mean minimum free energy of the single sequences as calculated by the RNAfold algorithm

  11. consensusMFE

    "Consensus MFE" for the alignment as calculated by the RNAalifold algorithm

  12. energyTerm

    Contribution to the consensus MFE which comes from the energy part of the RNAalifold algorithm

  13. covarianceTerm

    Contribution to the consensus MFE which comes from the covariance part of the RNAalifold algorithm

  14. combPerPair

    Number of different base combinations per predicted pair in the consensus secondary structure

  15. z

    Mean z-score of the sequences in the alignment

  16. SCI

    Structure conservation index for the alignment

  17. decValue

    Support vector machine decision value

  18. P

    RNA class probability as calculated by the SVM

"Loci" lines

  1. locusID

    Consecutively numbered ID for each locus

  2. sequenceID

    Identifier of the sequence (e.g. human.chr1 or contig42)

  3. start

    Start position of the reference sequence in the locus

  4. end

    End position of the reference sequence in the locus

  5. strand

    Indicates if the reference sequence is from the positive or negative strand

  6. maxN

    Maximum number of sequences in the alignments of this locus

  7. maxIdentity

    Maximum mean pairwise identity in the alignments of this locus

  8. maxP

    Maximum RNA class probability in the alignments of this locus

  9. minZ

    Minimum z-score in the alignments of this locus
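
The two field lists above can be used to read the tab-delimited output in a script. The following is a minimal sketch that tells the two line types apart by their number of fields (18 for windows, 9 for loci). The function name parse_output is hypothetical, the assumption that header lines start with "#" is not confirmed by this page, and the sample values and ID strings are invented for illustration.

```python
import io

WINDOW_FIELDS = [
    "windowID", "locusID", "sequenceID", "start", "end", "strand",
    "N", "columns", "identity", "meanMFE", "consensusMFE", "energyTerm",
    "covarianceTerm", "combPerPair", "z", "SCI", "decValue", "P",
]
LOCUS_FIELDS = [
    "locusID", "sequenceID", "start", "end", "strand",
    "maxN", "maxIdentity", "maxP", "minZ",
]


def parse_output(handle):
    """Split tab-delimited lines into window and locus records (dicts of strings)."""
    windows, loci = [], []
    for line in handle:
        line = line.rstrip("\n")
        # Assumption: blank lines and lines starting with "#" carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == len(WINDOW_FIELDS):
            windows.append(dict(zip(WINDOW_FIELDS, fields)))
        elif len(fields) == len(LOCUS_FIELDS):
            loci.append(dict(zip(LOCUS_FIELDS, fields)))
    return windows, loci


# Invented sample lines, for illustration only.
sample = "\n".join([
    "\t".join(["locus1", "contig42", "100", "260", "+", "6", "82.5", "0.98", "-3.2"]),
    "\t".join(["window1", "locus1", "contig42", "100", "220", "+", "6", "120",
               "82.5", "-35.1", "-33.0", "-30.2", "-2.8", "1.4", "-3.2",
               "0.94", "1.9", "0.98"]),
]) + "\n"

windows, loci = parse_output(io.StringIO(sample))
print(len(windows), "window line(s),", len(loci), "locus line(s)")
```

All values are kept as strings here; convert fields such as P or z with float() where numeric comparisons are needed.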


EXAMPLES

 # rnazCluster.pl rnaz.out

Parses and clusters the hits in the file rnaz.out and prints window and locus information to the standard output.

 # rnazCluster.pl -c 0.9 --html rnaz.out > results90.out

Clusters all hits from the file rnaz.out with P>0.9, writes the tab-delimited output to the file results90.out and, at the same time, generates a website in a subdirectory called results.


AUTHORS

Stefan Washietl <wash@tbi.univie.ac.at>