MolScript v1.4 (C) 1993

by Per Kraulis

(This document was adapted to html by A.Godknecht)

MolScript is a program for creating molecular graphics in the form of PostScript plot files. Possible representations are simple wire models, CPK spheres, ball-and-stick models, text labels and Jane Richardson-type schematic drawings of proteins, based on atomic coordinates in various formats. Colour, greyscale, shading and depth cueing can be applied to the various graphical objects.

Written by:

  Per Kraulis, Department of Molecular Biology, Uppsala University, Sweden.
               Center for Structural Biochemistry, Karolinska Inst, Sweden.

In publications, refer to:

 Per J. Kraulis, "MOLSCRIPT: a program to produce both detailed and
           schematic plots of protein structures", Journal of Applied 
           Crystallography (1991) vol 24, pp 946-950.

Copyright, distribution and license

The program source code has been copyrighted by the author. The program may *not* be distributed freely. MolScript is not public domain software. Contact Per Kraulis for instructions, if you want to obtain the program.

The address is:

Dr. Per Kraulis
Center for Structural Biochemistry
Karolinska Institute
NOVUM
S-141 57 Huddinge
SWEDEN

phone +46  8 608 9266
fax   +46  8 608 9290
email pjk@ciclid.csb.ki.se

Distribution
Introduction
Version history
Installing the program
Program dimensions
Program invocation
Command line options
How to use MolScript
A minimal input file
Example files
Input file syntax description
Input stream files
Macros
One input file - one paper - several plots
Encapsulated PostScript
raster3d
Plot header
Stereo plots
Reading coordinates
The coordinate system
Transforming coordinates
Copying atoms into a new molecule
Deleting a molecule
Selection mechanism
Atom selections
Residue selections
Name comparisons
Graphics commands
The graphics state
Graphics state parameters
Colour specifications
Vector specifications
Common problems
Syntax specification
Known bugs
About the program
Acknowledgements
References

Distribution

The program is distributed via Internet (FTP). Tapes can only be considered in cases when there is absolutely no way to distribute via the net. The distribution contains source code, documentation and example input files.

Alternatively, you may ask for permission to copy the program from some other lab.

There is no fee for academic institutions.

Please remember that when plots prepared by MolScript are used in lectures, publications or on similar occasions, a reference must be made to the author and to any papers he has written about MolScript (see above).

Introduction

MolScript reads an input file that describes what to draw, and outputs a PostScript file suitable for plotting by a PostScript laser printer or for displaying by, for example, the program 'xpsview' on the SGI IRIS-4D machines.

The input file specifies which coordinate file(s) to use, how the coordinates are to be transformed within the fixed coordinate system, which graphical objects to create, and any changes from the default values of parameters such as colours, shading and line width.

The output file is in PostScript format, and is meant solely for printers or display programs; it is formatted, but incomprehensible to an ordinary user.

MolScript can produce Encapsulated PostScript files (EPS or EPSF) suitable for importing into certain display programs.

MolScript can also create input for the ray-tracing package 'Raster3D', written by David Bacon, Ethan Merritt and others.

Version history

       6-Dec-1990  first attempts
v1.0  30-Jan-1991  first decently executable version
v1.1  28-Feb-1991  first exportable version
       7-Apr-1991  changed wildcard character '#' to '%' (as in X-PLOR)
      19-Apr-1991  conforms to Document Structuring Conventions version 3.0
      11-Nov-1991  bug fix: output transformation matrix transposed
v1.2  24-Apr-1992  bug fixes: aspect ratio and segment pruning (Eric Fauman),
                   Encapsulated PostScript (Paul McLaughlin, Michael Sutcliffe)
       8-Jul-1992  added wildcards '#' (any number) and '+' (any digit) in
                   string comparisons to conform fully with X-PLOR
v1.3  11-Sep-1992  bug fix: selection in atomradius/atomcolour (Leo Caves)
       3-Nov-1992  bug fix: ABORT changed to MABORT, problem in PCCURV
                   with + signs (Arne Elofsson)
v1.4  21-Dec-1993  modifications for Raster3D v2.0 (Ethan Merritt),
                   added store matrix commands, inline PDB feature, noframe,
                   input stream files, macros


Changes from previous versions

v1.1  None.


v1.2

  - The keyword 'encapsulated' now requires a specification of the maximum
    extent of the paper area to be drawn into by the subsequent plot(s).

  - The wildcards '#' (any number) and '+' (any digit) have been added
    for string comparisons.


v1.3  None.

v1.4

  - Interface to Raster3D v2.0, written by Ethan A. Merritt.

  - Added input stream files facility using '@'.

  - Added macro facility using '$'.

  - Added command line options -s, -e and -r (for UNIX).

  - Added 'inline-PDB' feature, allowing self-contained MolScript input files.

  - Added the commands 'store-matrix' and 'recall-matrix', which are used
    to define and use a specific transformation matrix without having to
    specify it fully each time.

  - Flag 'noframe' to omit the frame around a plot.

  - A selection containing zero atoms no longer gives an error, except for
    'position'.

Version development policy

The development of MolScript progresses at a very uneven speed. New versions containing novel features are planned, but the design and implementation of these take place whenever there is time. No commitments as to release dates can be given.

However, the author gives the following statement of intent: No new features will conflict with previously available features. Earlier versions of MolScript will be proper subsets of later versions, i e everything that works with one version of MolScript will work with a later version. Of course, erroneous behaviour (i e bugs) may be changed.

Exceptions to this will be made only in extremely well-founded cases. There is a high probability that no such exceptions will ever occur. If any do occur, they will be highlighted in the documentation.

The reason for giving this statement is to make it possible for other programmers to produce software that automatically generates MolScript input files, which will be valid also for later versions of MolScript.

Installing the program

Please make sure that you do not mix the files in the 'molscript' and 'forlib' directories for this version of MolScript with those from an earlier version. There will be differences which may produce very strange errors either when compiling/linking, or when running the program.

There is a 'makefile' for the Silicon Graphics IRIS-4D IRIX systems. Other UNIX systems should be able to use the same makefile with only minor modifications. Some examples of such makefiles are included as 'makefile.*'. Note that these other makefiles are no longer kept up to date.

Do the following:

% mv molscript.tar.Z /usr/local    # or similar directory
% cd /usr/local
% zcat molscript.tar.Z | tar xvopf -   # un-compress and un-tar in one go
% cd forlib
% make                             # creates the forlib.a file
% cd ../molscript
% make                             # creates the executable 'molscript'
% make examples                    # runs the PostScript examples
% make raster3d                    # runs the Raster3D examples

The 'forlib.a' object library is created first, and then the 'molscript' executable, which is placed in the current directory. A good place to install the executable is '/usr/local/bin'. Check that the files and directories have the correct protection, so that users can access the program and the examples.

The author has tried to avoid machine specifics in these programs and packages. Of course, no guarantees can be given. The prime candidate for modification when moving to another system is the usage of 'INCLUDE' statements in the source code. These will have to be pre-processed away if your compiler does not support this feature.

Nearly all system-specific source code is located in the 'system.f' and 'system.inc' files which are in the forlib directory. These may have to be modified for other systems.

A specific VAX/VMS version is no longer supported. However, some hints are still present in the source code, so search for the string 'VAX' in the *.f and *.inc files.

Program dimensions

All dimensioning parameters are located in the 'molscript.dim' file. Change whatever is needed, and recompile. The lengths of input and output string buffers cannot be changed without considerable modifications to the source code.

Program invocation

The executable 'molscript' should be located in a directory available in each user's path, such as '/usr/local/bin'. This allows the program to be invoked as any other program. MolScript reads the input file from 'standard input', and writes the PostScript code to 'standard output'. Therefore, the following command creates the PostScript file 'protein.ps' from the input file 'protein.in':

   % molscript < protein.in > protein.ps

On the SGI IRIS-4D, there is system software to view PostScript files. In the old IRIX system (before release 4.0) the command is:

   % psview -F protein.ps

In the new IRIX system (release 4.0 and newer) the command is:

   % xpsview protein.ps

If you want to look at a plot directly without making a PostScript file, give one of the following commands, depending on the system release:

   % molscript < protein.in | psview -F

   % molscript < protein.in | xpsview -

The '-F' means that 'psview' creates the biggest possible display window. The '-' means that 'xpsview' gets its input from 'standard input', not a file. Click the left mouse button when the red border appears, to view the results.

There may exist software similar to 'xpsview' on systems other than the SGI IRIS-4D. Ask your system manager.

The program outputs some messages about what is happening to 'standard error' (i e the window if you use any of the above commands). This is quite normal. The program beeps and aborts if something is wrong in the input file.

On VAX/VMS systems, MolScript is best executed from a COM file. Use the 'ASSIGN/USER' command to assign logical unit FOR001 to the input file, while logical unit FOR002 should be the desired PostScript file. Then execute the MolScript program as usual under VAX/VMS with the 'RUN' command.

The PostScript file is suitable for immediate output on a PostScript laser printer.

Command line options

There are three different options that can be given on the invoking command line:

   -s  silent execution: No messages are output, except when an error occurs.


   -e  produce an Encapsulated PostScript file. This option can be used
       instead of adding the keyword 'encapsulated' at the top of the input
       file. The default area is assumed (see below). If the area is
       explicitly defined in the plot, then this option should not be used.


   -r  produce an input file for Raster3D. This option can be used instead
       of adding the keyword 'raster3d' at the top of the input file.

The -e and -r options are mutually exclusive. The options are given immediately after the program name, before the file redirection specifications. Example:

   % molscript -s < protein.in > protein.ps

This facility will (almost certainly) not work on VAX/VMS systems.
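
For scripted use, the invocation described above can be wrapped in a small helper. The following Python sketch is only an illustration and is not part of the MolScript distribution: the function names 'build_command' and 'run_molscript' are hypothetical, and it assumes that an executable called 'molscript' is available in the user's path.

```python
import subprocess


def build_command(silent=False, mode=None, program="molscript"):
    """Return the argument list for one MolScript run.

    mode is None (ordinary PostScript), "eps" (-e) or "raster3d" (-r).
    Using a single mode argument keeps -e and -r mutually exclusive.
    """
    flags = {None: [], "eps": ["-e"], "raster3d": ["-r"]}
    if mode not in flags:
        raise ValueError("mode must be None, 'eps' or 'raster3d'")
    command = [program]
    if silent:
        command.append("-s")
    return command + flags[mode]


def run_molscript(input_path, output_path, **options):
    """Equivalent of: molscript [options] < input_path > output_path"""
    with open(input_path, "rb") as source, open(output_path, "wb") as target:
        # check=True raises an exception if the program exits with a
        # non-zero status; this sketch assumes (without confirmation from
        # the manual) that MolScript signals errors in that way.
        subprocess.run(build_command(**options), stdin=source,
                       stdout=target, check=True)
```

For example, run_molscript("protein.in", "protein.ps", silent=True) would correspond to the command line shown above.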

How to use MolScript

The best way to use MolScript is to work on an SGI IRIS-4D machine (or a similar system with a program like 'xpsview' and a windowing system). One window should contain the input file within an editor, and another window is used to run MolScript and 'xpsview' to display the results. Edit the input file, save it, run 'molscript', run 'xpsview', edit the input file, and so on.

Start with the simplest possible plot (read coordinates, centre all atoms, output CA trace), and work iteratively to orient the molecule properly. Schematic objects (strand, helix, coil, turn) can be put in at an early stage (use Kabsch & Sander's DSSP program to get the secondary structure assignments, if these are not available directly) and may actually be easier to work with from the start.

Only when a good view has been found is it worthwhile to start the fine tuning: setting slab (to get depth-cueing of coil radius), changing colours and shading, drawing CPK or ball-and-stick, adding labels. In particular the fine tuning of labels (depth cueing, size, position, centering, offset) is very dependent on the orientation being set once and for all.

Colour and shading fine tuning is dependent on exactly what kind of output device you will be using (see below: Colour specifications).

Jane Richardson has listed some general points on how schematic drawings of proteins should look (Richardson 1985). Look at her drawings to get inspiration and guidance (Richardson 1981, 1985).

A point that the author would like to make is that 90 degree views of a structure, if carefully selected, can be very instructive. In fact, such views can often bring home the message better than stereo plots. With MolScript, 90 degree views are ridiculously easy to prepare. Stereo plots, however, are trickier to make (see below).

When displaying superpositions of structures (for instance, sets of structures derived from NMR data), the 'store-matrix' and 'recall-matrix' commands must be used, rather than the ordinary ways of getting a good view. The reason is that the 'position' command gives a slightly different transformation for each coordinate set, thus destroying the least-squares fit that (almost certainly) has been done previously using some other program. In the display of superimposed NMR structures, this gives plots where the structures appear to fit better at the centre of the plot than they actually do. The use of 'store-matrix' after the first coordinate set has been transformed, and 'recall-matrix' for the other coordinate sets, ensures that the original least-squares fit is kept intact.
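
The effect can be illustrated numerically. The following Python sketch is not MolScript input; it uses invented coordinates for two already superimposed models, and shows that centring each model on its own centre-of-gravity shifts the models relative to each other, whereas reusing the transformation of the first model (which is what 'store-matrix' and 'recall-matrix' achieve) leaves the fit intact. Only a translation is considered, for simplicity.

```python
def centroid(coords):
    """Geometrical centre-of-gravity of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def translate(coords, shift):
    """Add the vector 'shift' to every coordinate."""
    return [tuple(c[i] + shift[i] for i in range(3)) for c in coords]


def negate(vector):
    return tuple(-v for v in vector)


# Two invented models of the same three atoms, already superimposed:
# the first two atoms coincide, the third atom differs.
model_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 3.0, 0.0)]

# Each model centred on its own centre-of-gravity: the shifts differ,
# so atoms that coincided before no longer coincide.
own_a = translate(model_a, negate(centroid(model_a)))
own_b = translate(model_b, negate(centroid(model_b)))

# The shift derived from model A reused for model B: the relative
# positions of the two models are unchanged.
shared_b = translate(model_b, negate(centroid(model_a)))
```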

A minimal input file

   ! --- This is a minimal MolScript input file; this line is a comment.

   plot
     read mol "protein.pdb";             ! Read the coordinate file.
     transform atom *                    ! Xform all atoms so that the centre-
       by centre position atom *;        ! of-gravity is placed at origin.
     trace amino-acids;                  ! Output a CA trace of the chain.
   end_plot

   ! --- Here the minimal input file ends.
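
Since MolScript input is plain text, such a file can also be written by another program. The following Python sketch generates the minimal input file shown above; the function name 'minimal_input' is hypothetical, and the molecule name 'mol' is simply the one used in the example.

```python
def minimal_input(coordinate_file, molname="mol"):
    """Return the text of a minimal MolScript input file (CA trace)."""
    lines = [
        "plot",
        f'  read {molname} "{coordinate_file}";',
        "  transform atom * by centre position atom *;",
        "  trace amino-acids;",
        "end_plot",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the input to standard output, so that it can be piped
    # directly into MolScript.
    print(minimal_input("protein.pdb"), end="")
```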

Example files

A number of example input files called '*.in' are part of the distribution. These show some examples of what can be done. The makefile can be used to run these examples.

Input file syntax description

The input file is free-format; blanks, new-lines and tabs can be used freely, as long as items are separated by at least one of these. Comments are indicated by exclamation-mark '!': everything from that character to end-of-line is ignored by MolScript. Input lines longer than 80 characters are truncated, and will therefore probably generate syntax errors.
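
Because over-long lines are truncated silently, it may be worthwhile to check an input file before running MolScript, particularly if the file was generated by another program. The following Python sketch (the function name 'overlong_lines' is hypothetical) reports every line longer than 80 characters.

```python
MAX_LINE_LENGTH = 80  # limit stated in this manual


def overlong_lines(text, limit=MAX_LINE_LENGTH):
    """Return (line number, length) for every line longer than 'limit'."""
    return [(number, len(line))
            for number, line in enumerate(text.splitlines(), start=1)
            if len(line) > limit]
```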

Note the semi-colon ';' after each command (see example above): this indicates the end of the command, and is required. MolScript complains about 'item does not match syntax' if it is missing.

Any error makes the program just stop. No items can be abbreviated, they must be spelled correctly, and must be in the correct character case (i e 'trace' is a valid graphics command, while 'Trace' is not).

The input file must follow a well-defined syntax (Wirth 1976). The syntax descriptions in this manual use a few conventions:

Lower-case letters are used for items that are part of the command exactly as they look. The commands are reserved words, and cannot be used for anything else (except, of course, as part of double-quoted strings).

Lower-case letters are also used to indicate items that are literal values, e g integers. This is indicated by a suffix:

   .i    integer value
   .r    real value (including decimal point)
   .w    word (a string of characters that acts as name or identifier)
   .s    string (a string of characters enclosed by double quotes)

Upper-case letters are syntax items that are further defined somewhere else. Such a definition is indicated by an equals '=' character.

An item that is optional is bracketed by '[]' (square brackets). An item that may be repeated 0 or more times is bracketed by '{}' (curly braces). A choice of one item out of a list is indicated by '< a1 | a2 | ... >' (left-arrow and right-arrow, with vertical-bar separating the items).

Input stream files

MolScript initially reads its input from 'standard input'. An input file can contain a reference to another file from where input should be read until end-of-file, when reading from 'standard input' is resumed.

This change of input stream file is specified by the item

   @filename

where the character '@' indicates that the rest of the item is a file name. Note that there is no semi-colon; this item is not part of the ordinary syntax of MolScript. Rather, it is a special item that only affects the source of the input.

This facility makes it possible for a complicated plot to be split up into several files. For example, if the structure to be plotted must be composed using several 'read', 'copy' and 'delete' statements, then this can be done in a separate file. This file is then referred to using '@' and the drawing done afterwards. In this way, several different drawings using the same coordinate setup can be made easily, with a guarantee that the coordinate sets are identical in the different plots.

The input stream can be changed in a nested fashion, to a depth specified by the MAXSTR parameter in 'molscript.dim'.

Macros

Syntax: MACRO-DEF = macro macroname.w { whatever } end_macro

A macro is a block of text that is referred to by a name (or symbol). In MolScript a macro is defined by the keywords 'macro' and 'end_macro'. The first item after the 'macro' keyword is the macro name (or symbol). Everything else between 'macro' and 'end_macro' is stored as text, without interpretation.

A definition can be placed before a plot, or as an ordinary statement. In the first case, there must be no semi-colon ';' after the keyword end_macro, while in the latter case there must be a semi-colon. The macro is defined until the end of the input file, regardless of where it was defined. Macro definitions cannot be recursive.

A macro is called by giving the macro name preceded by a dollar '$'. That item will be substituted by the contents of the macro, which will be interpreted as usual. As for an input stream file (see above), this item is not part of the ordinary syntax of MolScript. Instead it can be seen as a way to change the input source, similar to the input stream file facility. When the end of the macro has been reached, the input is resumed from after the macro call. Macros may call each other, and may also change input stream files.

As an example, two fragments of input files are given which would produce identical results (note the placement of semi-colons):

     ! ----- variant 1, without macro -----

     cpk either require in type ALA and atom CB,
                require in type THR and atom CG2,
                require in type VAL and atom CG*,
                require in type ILE and either atom CD1 or atom CG2,
                require in type LEU and atom CD*
             or require in type MET and atom CE;


     ! ----- variant 2, with macro -----

     macro methyl-sel
         either require in type ALA and atom CB,
                require in type THR and atom CG2,
                require in type VAL and atom CG*,
                require in type ILE and either atom CD1 or atom CG2,
                require in type LEU and atom CD*
             or require in type MET and atom CE
     end_macro;

     cpk $methyl-sel;

Input stream files and macros can be used together, for instance to create libraries of commonly used residue or atom selections, which are read in at the top of the MolScript input file as macro definitions, and then used later simply by calling the macros.

One input file - one paper - several plots

Syntax: FILE_CONTENTS = [ < ENCAPSULATED | raster3d > ]  { PLOT }

        ENCAPSULATED = encapsulated xlowerleft.r ylowerleft.r
                                    xupperright.r yupperright.r

One input file generates one and only one paper (or display). However, one paper can contain more than one plot. These are positioned independently of each other on the paper. A later plot obliterates the part of a previous plot that it overlaps, if any.

This feature can be used to create stereo plots on one single paper (see below: Stereo plots). It can also be used to show the same structure in different orientations, or to show details or small parts (such as active sites) of a structure together with an overview.

If the required output is not the ordinary PostScript file, then an appropriate flag must be present as the very first item. Alternatively, a command line option controlling this can be used (see above).

Encapsulated PostScript

The flag 'encapsulated' has the effect of making the output PostScript file into a so-called Encapsulated PostScript file (EPS or EPSF), which is suitable for import into a number of other display programs, including some programs on the Macintosh. An EPS file cannot be output directly on a laser printer, nor displayed using 'xpsview'.

The flag 'encapsulated' must be followed by an area specification which indicates the maximum extent of anything subsequently drawn on the paper. The area is specified in PostScript units. Note that there is no semi-colon after the area specification.

It is the user's responsibility to ensure that nothing is drawn outside the specified area; MolScript does not check for this. What happens if this is violated depends entirely on the importing program.

raster3d

The flag 'raster3d' makes MolScript output a file that is meant for the ray-tracing Raster3D package. More precisely, the file is intended as input for the 'render' program in that package.

Version 1.4 of MolScript works with version 2.0 of Raster3D. This interface was written by Ethan A. Merritt.

Version 2.0 of the Raster3D package was written by David J. Bacon, Wayne F. Anderson, Ethan A. Merritt, Michael Murphy and others. The package is freely available and can be obtained by anonymous FTP from stanzi.bchem.washington.edu (IP 128.95.12.38).

If you use pictures made by Raster3D from files produced by MolScript, then be sure to cite *both* programs properly. The references for Raster3D can be found in the README.RASTER3D file that is part of the distribution from the anonymous FTP site mentioned above.

The Raster3D package cannot handle all graphical objects which can be produced by MolScript. In particular, no labels can be rendered in Raster3D. MolScript outputs a message when it ignores a command due to this and other similar limitations. Colour specifications in HSB are automatically converted to RGB for Raster3D. An additional parameter 'helixthickness' has been introduced for the helices in Raster3D.

All graphical objects are output to the Raster3D file, even when they are outside the visible volume. The reason for this is that MolScript does not know what the aspect ratio will be of the image produced by 'render', and so cannot decide confidently which objects will fall outside the image.

The input file for the 'render' program of Raster3D has a header which contains a number of parameters determining various aspects of the pixel map produced by 'render'. When MolScript produces such a file, it puts reasonable default values into the header. In order to get other values into the header file, it is possible to have a header file called 'header.r3d' present in the directory where MolScript is executed, which will be used instead of the default header hard-wired into MolScript. The program does not check that the external header file is valid.

It is also, of course, possible to edit by hand the 'render' input file after it has been produced by MolScript. The meaning of the various entries in the file is described in the Raster3D documentation.

Plot header

Syntax: PLOT = { MACRO-DEF } plot HEADER { COMMAND } end_plot

        HEADER = [ noframe ] [ AREA ] [ BACKGROUND ] [ WINDOW ] [ SLAB ]

The first item in the input file is 'plot' (except for any initial comments or the flags 'encapsulated' or 'raster3d'). Then some optional header items may follow. These must appear here for the current plot, or not at all. The plot is finished by the item 'end_plot', without semi-colon. If a macro definition is given outside of a plot, then there must be no semi-colon after it.

   noframe

Suppresses the drawing of a frame around the plot. The filling in of the background colour will take place regardless.

   AREA = area xlowerleft.r ylowerleft.r xupperright.r yupperright.r ;

Default: area 50.0 100.0 550.0 700.0;

Controls where on the paper the plot will be placed. The values given are the x and y coordinates of the lower left and upper right corners, in PostScript units. The default values are appropriate for A4 paper. A black border line is drawn around the area, unless 'noframe' is given. This item has no effect for Raster3D.

   BACKGROUND = background COLOUR ;

Default: background white;

Determines the colour of the background within the plot area. For colour specifications, see below. This item has no effect for Raster3D.

   WINDOW = window sidelength.r ;

Default: window value-to-encompass-objects-drawn.r;

Sets the length of the side of the coordinate system volume that is viewed. This corresponds to twice the maximum absolute value of the x or y coordinate of a point that is still visible. The window will always be fitted into the plot area. If the plot area is non-square, the actual window will be larger in either x or y (depending on the aspect ratio) than the specified window value.

A large value for window means that a large volume is viewed, making the objects small. Conversely, a small value enlarges the objects and views a smaller part of the coordinate system space.

If no explicit window is set, then MolScript uses a window that just encompasses the drawn objects. That value is then output as a message. This means that the scale (the relation Angstrom : PostScript units) will depend exactly on what is drawn, and its orientation. To obtain a constant scale, window must be explicitly set.
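
The relation between window and scale can be estimated as follows. This Python sketch rests on an assumption drawn from the description above, namely that the window is fitted to the shorter side of the plot area with no margin; the actual program may differ in detail. The function name 'points_per_angstrom' is hypothetical.

```python
def points_per_angstrom(area, window):
    """Approximate scale in PostScript units per Angstrom.

    area is (xlowerleft, ylowerleft, xupperright, yupperright) in
    PostScript units; window is the side length in Angstrom.
    Assumes the window is fitted to the shorter side of the area.
    """
    x0, y0, x1, y1 = area
    return min(x1 - x0, y1 - y0) / window


# Default area with an explicit window of 40 Angstrom.
scale = points_per_angstrom((50.0, 100.0, 550.0, 700.0), 40.0)  # 12.5
```

With a fixed window value the scale stays the same from plot to plot, regardless of what is drawn.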

   SLAB = slab thickness.r ;

Default: slab value-to-encompass-objects-drawn.r;

Determines the depth of the coordinate system volume box that is viewed. Graphical segments that fall outside this volume are not output. If a small value is given, then only a thin slice of the coordinate space is visible.

If no explicit slab is set, then MolScript uses a slab that just encompasses the drawn objects. That value is then output as a message. Certain depth cue effects (coil radius, colour depth cue shading) depend on an explicitly set slab value, otherwise they are not applied.

Both window and slab define a volume with respect to the origin of the coordinate system, such that the volume is bounded by [-window/2, window/2] in x and y (if the plot area is square) and [-slab/2, slab/2] in z. The origin is always at the centre of the viewed volume.
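
As an illustration of these bounds, the following Python sketch tests whether a single point lies within the viewed volume. It assumes a square plot area, and the function name 'inside_viewed_volume' is hypothetical. Note that MolScript itself decides per graphical segment, not per point.

```python
def inside_viewed_volume(point, window, slab):
    """True if (x, y, z) lies within the volume defined by window and slab.

    The volume is centred on the origin and bounded by
    [-window/2, window/2] in x and y, and [-slab/2, slab/2] in z.
    """
    x, y, z = point
    half_window = window / 2.0
    half_slab = slab / 2.0
    return (abs(x) <= half_window and abs(y) <= half_window
            and abs(z) <= half_slab)
```

For example, with window 20.0 and slab 8.0, a point at z = 5.0 is outside the volume even if its x and y coordinates are well within the window.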

If either or both of window and slab have been set explicitly, then graphical segments that fall entirely outside the bounds will not be output at all to the PostScript file. This saves disk space and printing execution time.

Do not set either window or slab until you have found a good orientation. Otherwise you might risk clipping your graphical objects when you change the orientation. Especially with slab, this may go unnoticed.

Stereo plots

There is no specific command to make MolScript create stereo plots. Instead, a stereo plot can be made by creating two similar plots on different papers, the only difference being a slight rotation of all coordinates (about 6 degrees) in one of them.

Alternatively, make two plots on the same paper (position them with the area command) and draw the same objects in both, adding a small rotation to one of them. There is an example input file showing how this is done.
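
The geometry behind the second view can be sketched in Python as below. This is not MolScript input. The choice of the y axis (the vertical axis on the paper) as the rotation axis is an assumption, since the text above does not name the axis, and the direction of rotation appropriate for a given eye is likewise not specified here. The coordinates are invented.

```python
import math


def rotate_y(coords, degrees):
    """Rotate (x, y, z) tuples about the y axis (right-handed system)."""
    angle = math.radians(degrees)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in coords]


# One view uses the coordinates as they are; the other view uses the
# same coordinates rotated by about 6 degrees.
first_view = [(1.0, 2.0, 3.0), (-4.0, 0.5, 2.0)]
second_view = rotate_y(first_view, 6.0)
```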

Reading coordinates

Syntax:  READ = read molname.w < inline-PDB | filename.s > ;

The read command has two arguments: first the molecule name to assign to the atoms read, and then either the keyword 'inline-PDB', or the name of the coordinate file (enclosed by double quotes). Several coordinate sets can be read in by using the command several times, but each coordinate set must be assigned to a different molecule name.

The keyword 'inline-PDB' (followed directly by a semi-colon) instructs MolScript to read the subsequent lines of the input file as a PDB file, until a PDB record with the keyword END appears. This PDB part of the input file is unusual in that it is not free-format, but must follow the PDB format rules exactly. The usual MolScript interpretation of the input file is resumed after the line containing the PDB END record. If there is no such END record, then an error will (most likely) occur as MolScript will try to interpret the rest of the input file as a PDB format file.

This facility allows the creation of entirely self-contained MolScript input files, which contain both coordinate data as well as transformation and drawing commands. It is intended mainly as a format that other programs (such as molecule display or analysis programs) could generate. In this way the MolScript input file can be considered as an intermediate plot file, which is to be converted to printable PostScript by MolScript.
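
A program generating such a self-contained file might proceed along the following lines. This Python sketch is illustrative only: the function name 'self_contained_input' is hypothetical, the PDB records are passed in as ready-made lines (no attempt is made to format them), and the drawing commands are simply those of the minimal input file.

```python
def self_contained_input(pdb_lines, molname="mol"):
    """Return MolScript input text with the coordinates given inline.

    pdb_lines must already follow the PDB format exactly. An END record
    is appended if none is present, since MolScript reads PDB records
    until it finds one.
    """
    body = [line.rstrip("\n") for line in pdb_lines]
    # An END record has the record name 'END' in its first six columns
    # (so 'ENDMDL' is not mistaken for it).
    if not any(line[:6].strip() == "END" for line in body):
        body.append("END")
    lines = (["plot", f"  read {molname} inline-PDB;"]
             + body
             + ["  transform atom * by centre position atom *;",
                "  trace amino-acids;",
                "end_plot"])
    return "\n".join(lines) + "\n"
```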

If a string enclosed by double quotes is given, then that string is interpreted as a coordinate file name. The format of the coordinate file is determined from the file name: If it ends with a suffix that is recognisable to MolScript, then that format is used. Otherwise the file is assumed to be in PDB format. The following formats can be read:

   .PDB  .pdb     Brookhaven Protein Data Bank
   .MSA  .msa     atom input file for Connolly's MS program
   .DG   .dg      atom coordinates from DISGEO (Havel & Wuthrich)
   .RD   .rd      Diamond format (Frodo variant)
   .CDS  .cds     Diamond format (MRC variant)
   .WAH  .wah     Wayne Hendrickson (PROLSQ) format

Please note that MolScript cannot assume responsibility for the anarchy of file formats. If your file cannot be read, then the file is in error, not MolScript.

When the coordinate file has been read, all strings in the data are left-shifted, and blanks are squeezed out, to facilitate comparisons.

As a special case, any '*' (star) characters in atom names are changed to ''' (single-quote) to avoid a clash with the wildcard character used in MolScript. This is relevant for nucleic acids in PDB format; there, primes are represented by '*' characters.
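
The clean-up of atom names can be pictured with the following Python sketch. It assumes that 'squeezed out' means that all blanks are removed, which is an interpretation of the text above and not a statement about the source code; the function name 'normalise_atom_name' is hypothetical.

```python
def normalise_atom_name(raw):
    """Remove blanks and replace '*' by a single quote in an atom name."""
    squeezed = raw.replace(" ", "")   # left-shift and squeeze out blanks
    return squeezed.replace("*", "'")  # avoid clash with the wildcard '*'
```

With this interpretation, the PDB atom name ' C5*' of a nucleic acid would be compared as C5' in selections.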

Default atomic radii and colours (see parameters atomcolour and atomradius) are set when the coordinate file is read.

The chain identifiers in X-PLOR type PDB files (Brunger 1988) are ignored by MolScript.

The coordinate system

There is one single fixed coordinate system: the viewer looks down from the positive z axis towards the origin, at the xy plane. The coordinate system is right-handed; x increases to the right, and y upwards. The viewpoint cannot be changed. To view the molecule from another angle, the atomic coordinates have to be transformed within the coordinate system. The distance unit is Angstrom.

Some parameters are defined in PostScript units on the final plot. Examples are the plot area specification and the line width and the line dash parameters. A PostScript unit is 1/72 of an inch (this is related to typographical conventions).
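
For those who prefer metric units, the conversion is simple arithmetic. The following Python sketch (the function names are hypothetical) converts between PostScript units and millimetres, using 1 inch = 25.4 mm.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def points_to_mm(points):
    """Convert a length in PostScript units to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH


def mm_to_points(mm):
    """Convert a length in millimetres to PostScript units."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


# The default plot area (50 100 550 700) is 500 by 600 units,
# i.e. roughly 176 by 212 mm.
width_mm = points_to_mm(550.0 - 50.0)
height_mm = points_to_mm(700.0 - 100.0)
```

An A4 sheet (210 by 297 mm) corresponds to roughly 595 by 842 PostScript units.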

Transforming coordinates

Syntax: STORE-MATRIX = store-matrix ;

        TRANSFORM = transform ATOM-SEL { by OPERATION } ;

          OPERATION = < CENTRE | TRANSLATION | ROTATION | recall-matrix >

The only way to look at a molecule from different viewpoints in MolScript is to transform the atomic coordinates within the fixed coordinate system. The coordinates of the selected atoms are transformed by applying a matrix defined by one or more operations. This means that different sets of atoms can be transformed independently.

The operations concatenate in the proper order before being applied to the selected atoms. Any number of operations can be given. This means that there is no point in trying to modify previous operations to change the view. Just add another operation to get the desired effect.

An already defined transformation matrix can be stored and used later by the commands 'store-matrix' and 'recall-matrix'. Only one matrix can be stored, and it is always the matrix specified by the most recent 'transform' command which is stored by the 'store-matrix' command. If no 'transform' command has been given when the matrix is stored, then a unit matrix is stored. The stored matrix is not changed when a new plot is started within an input file. This can be used as a means to employ the same transformation in several different plots on the same page.

   CENTRE = centre VECTOR

Transform the coordinates so that the given point is moved to the origin. This is most often used with the 'position' facility, which returns the geometrical centre-of-gravity of a set of atoms (see below: Vector specifications). Note that this will refer to the coordinates as they are, before the operations of that particular transform command have been applied. Therefore, this particular facility will produce rather unpredictable results unless it is the first operation.

   TRANSLATION = translation VECTOR

Translate the coordinates by the specified vector.

   ROTATION = rotation < AXIS | MATRIX >

     AXIS = < x | y | z > degrees.r

     MATRIX = x1.r y1.r z1.r x2.r y2.r z2.r x3.r y3.r z3.r

Rotations occur around the fixed coordinate system origin. A rotation operation is given either by an explicit 3x3 matrix (the validity of which is not checked by MolScript) or by a rotation around a specified axis by the given amount in degrees. A positive value rotates counter-clockwise around the axis, when viewed from the positive axis down towards the origin.
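
The axis rotations can be illustrated with the standard right-handed rotation matrices, which rotate counter-clockwise when viewed from the positive axis towards the origin. The Python sketch below is not MolScript code: the function names are hypothetical, and the column-vector convention used here is an assumption that says nothing about how MolScript interprets an explicit MATRIX. The last function shows why 'centre' is best given first: the point is moved to the origin before the rotation around the origin is applied.

```python
import math

def rotation(axis, degrees):
    """3x3 matrix for a rotation about a fixed axis through the origin."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    if axis == "x":
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if axis == "y":
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    if axis == "z":
        return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    raise ValueError("axis must be 'x', 'y' or 'z'")

def apply(matrix, point):
    """Multiply a 3x3 matrix with a point treated as a column vector."""
    return [sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3)]

def centre_then_rotate(points, centre, axis, degrees):
    """Move 'centre' to the origin, then rotate every point about the origin."""
    m = rotation(axis, degrees)
    return [apply(m, [p[k] - centre[k] for k in range(3)]) for p in points]

# A point on the x axis ends up (to rounding error) on the y axis
# after a rotation of 90 degrees around z.
print(apply(rotation("z", 90.0), [1.0, 0.0, 0.0]))
```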

   recall-matrix

Concatenates the stored matrix to the transformation. If no matrix has been explicitly stored, then this operation has no effect.

Copying atoms into a new molecule

Syntax: COPY = copy newmolname.w ATOM-SEL ;

The selected atoms and their residues can be copied into a new molecule. This is useful for example when creating symmetry-related molecules.

Deleting a molecule

Syntax: DELETE = delete molname.w ;

An entire molecule, defined by the given molecule name, can be deleted. This is useful mainly to make space for other coordinate data to be read in from file, or copied from other molecules.

Selection mechanism

MolScript has a powerful mechanism for selecting sets of atoms or residues as arguments to various commands. The specification works as a logical expression: those atoms or residues are selected for which the entire logical expression evaluates to true.

The logical operations 'not', 'and' and 'or' are implemented, and can be used in a nested fashion. The 'and' operator has the following form:

   require EXP1, EXP2, EXP3 ... and EXPn

which means that expressions 1, 2, 3,..., n must all be true for an atom or residue to be selected. Similarly, the 'or' operator has the form:

   either EXP1, EXP2, EXP3, ... or EXPn

meaning that if any one of expressions 1, 2, 3,..., n is true then that atom or residue is selected. Note the comma ',' characters: they are required between expressions, except before 'and' and 'or', where there must be none.

The basic selection specifications are described below. Note that selection of atoms and residues cannot be freely mixed. The syntax controls which type of selection is valid at any place in an expression.

The selection mechanism was inspired by a similar facility in the molecular dynamics refinement program X-PLOR (Brunger 1988).

Syntax: ATOM-SEL = [ not ] < ATOM-AND | ATOM-OR | ATOM-SPEC >

          ATOM-AND = require ATOM-SEL { , ATOM-SEL } and ATOM-SEL

          ATOM-OR = either ATOM-SEL { , ATOM-SEL } or ATOM-SEL

        RES-SEL = [ not ] < RES-AND | RES-OR | RES-SPEC >

          RES-AND = require RES-SEL { , RES-SEL } and RES-SEL

          RES-OR = either RES-SEL { , RES-SEL } or RES-SEL
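
The logic of 'require', 'either' and 'not' can be mimicked in Python with predicate functions, as in the sketch below. This is not how MolScript is implemented; all names are hypothetical, atoms are reduced to small dictionaries, and names are compared exactly, without wildcards.

```python
def require(*exprs):
    """True only if every sub-expression is true (the 'and' operator)."""
    return lambda item: all(e(item) for e in exprs)

def either(*exprs):
    """True if at least one sub-expression is true (the 'or' operator)."""
    return lambda item: any(e(item) for e in exprs)

def negate(expr):
    """True if the sub-expression is false (the 'not' operator)."""
    return lambda item: not expr(item)

def atom(name):
    return lambda a: a["atom"] == name

def in_type(restype):
    return lambda a: a["type"] == restype

atoms = [
    {"atom": "N", "type": "ALA"},
    {"atom": "CA", "type": "ALA"},
    {"atom": "CB", "type": "ALA"},
    {"atom": "CA", "type": "GLY"},
]

# Corresponds roughly to: require in type ALA and either atom N or atom CA
selection = require(in_type("ALA"), either(atom("N"), atom("CA")))
selected = [a for a in atoms if selection(a)]
print([a["atom"] for a in selected])
```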

Atom selections

If an atom selection expression yields no atoms, then no error occurs; the command it is used in simply gives no output. However, if the atom selection is used for the 'position' vector specification, then an error occurs.

Syntax: ATOM-SPEC = < backbone | ATOM-NAME | B-FACTOR |
                      OCCUPANCY | IN | SPHERE | CLOSE >


   backbone

This is short-hand for the following selection expression:

   either
     require in amino-acids
     and     either atom N, atom CA, atom C or atom O
   or
     require not in amino-acids
     and     either atom *', atom O%P or atom P

That is, if a residue is an amino-acid, then its N, CA, C and O atoms are selected. If it is not an amino-acid, then the atoms with names appropriate for the nucleic acid residue phosphate and (deoxy)ribose groups are selected. In the latter case an expression that selects all primed atoms is used.

   ATOM-NAME = atom name.w

Selects the atoms with the given name. The name may contain wildcard characters (see below: Name comparisons).

   B-FACTOR = b-factor low.r high.r

Selects all atoms with a B-factor value within the given range.

   OCCUPANCY = occupancy low.r high.r

Selects all atoms with an occupancy value within the given range.

   IN = in RES-SEL

Selects all atoms within the selected residue(s). This is a selection specification much used for the commands 'ball-and-stick' and 'cpk'.

   SPHERE = sphere VECTOR radius.r

Selects all atoms within a sphere with centre at the given vector (see below: Vector specifications) and with the given radius.

   CLOSE = close ATOM-SEL distance.r

Selects all atoms closer than the given distance to any of the given atoms. The atoms given as argument are NOT part of the finally selected set. That is, this expression selects only neighbours to certain atoms, excluding the atoms themselves.
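
The behaviour of 'close' can be sketched in Python as follows. Atoms are represented only by their coordinates and identified by list index; the function name and data layout are hypothetical.

```python
import math

def close(coords, given, distance):
    """Indices of atoms closer than 'distance' to any given atom,
    excluding the given atoms themselves."""
    given = set(given)
    result = []
    for i, xyz in enumerate(coords):
        if i in given:
            continue  # the argument atoms are never part of the result
        if any(math.dist(xyz, coords[g]) < distance for g in given):
            result.append(i)
    return result

coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(close(coords, [0], 1.5))
```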

Residue selections

By residue name, MolScript means residue number with chain identifier and insertion codes, if any, added (no blanks in between). Residue type means kind of residue (amino acid or other).

Syntax: RES-SPEC = < amino-acids | waters | MOLECULE |
                     SEQUENCE | RES-NAME | RES-TYPE | CONTAINS >


   amino-acids

This is short-hand for the following selection expression:

   either type ALA, type SER, type THR, type GLY, type PRO, type CPR,
          type ASN, type GLN, type ASP, type GLU, type ARG, type LYS,
          type HIS, type PHE, type TYR, type TRP, type TRY,
          type VAL, type ILE, type LEU, type MET,
          type CYS, type CSH, type CYH or type CSM

All standard three-letter codes for amino acid residues are recognized, as well as some non-standard ones; CPR for cis-proline, TRY for tryptophan, and CSH, CYH and CSM for cysteine.

   waters

This is short-hand for the following selection expression:

   either type H2O, type HHO, type OHH, type HOH,
          type OH2, type SOL or type WAT

At least some of the commonly occurring residue type designations for water molecules are covered by this expression.

   MOLECULE = molecule molname.w

Selects the residues within the specified molecule. The molecule name is that given when the coordinate file was read. The name may contain wildcard characters (see below: Name comparisons).

   SEQUENCE = from resname.w to resname.w

Selects a stretch of residues between and including the given residues. If there is actually more than one stretch of residues in the read coordinate files that match, then all stretches are selected. The names may contain wildcard characters (see below: Name comparisons). Note that MolScript does NOT complain if the last residue name is not found. In such a case it simply selects all residues from the first given to the end.

For example, if a coordinate file contains amino acids from 1 to 100, and waters also numbered 1 to 57 (as may occur in PDB files), then a sequence selection expression 'from 5 to 15' will pick both the stretch of amino-acid residues from 5 to 15, and the waters from 5 to 15.

This is usually not a problem in connection with commands such as helix, strand and coil, since any selected non-amino acid residues are simply ignored by these. In fact, this behaviour can be advantageous when dealing with symmetrical subunits. The wildcard character facility can then be used to pick both strands (or whatever) in both chains with one single command.
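
The sequence selection behaviour described above, including the case of several matching stretches and a missing last residue, can be sketched in Python. This is an illustration of the described behaviour under the assumption of exact name matching (no wildcards), not the MolScript source; the function name is hypothetical.

```python
def select_sequence(names, first, last):
    """Indices of all stretches running from 'first' to 'last' inclusive.
    A stretch whose 'last' name never appears runs to the end of the list."""
    selected = []
    inside = False
    for i, name in enumerate(names):
        if not inside and name == first:
            inside = True
        if inside:
            selected.append(i)
            if name == last:
                inside = False
    return selected

# 100 amino-acid residues followed by 57 waters, both numbered from 1.
names = [str(n) for n in range(1, 101)] + [str(n) for n in range(1, 58)]
picked = select_sequence(names, "5", "15")
print(len(picked))  # residues 5 to 15 in both stretches
```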

   RES-NAME = residue resname.w

Selects the residue(s) with the given name. The name may contain wildcard characters (see below: Name comparisons). Note that the residue name is left-shifted and blanks squeezed out when the coordinate file is read (see above: Reading coordinates). This means that the chain identifier and insertion code (if any) are part of the residue name, even if they were separate in the input coordinate file.

   RES-TYPE = type restype.w

Selects the residues with the given type. The type may contain wildcard characters (see below: Name comparisons).

   CONTAINS = contains ATOM-SEL

Selects the residue(s) that contain the given atom(s).

Name comparisons

Comparisons between atom names, residue types and names, and molecule names with those read from the coordinate file follow certain rules:

The comparison is case-sensitive; 'Tyr' is not equal to 'TYR'.

All strings have been left shifted when read from the coordinate file. All blanks have been squeezed out of the strings.

It is possible to use wildcards in the comparison: '*' means any string (zero or more characters), '%' means any single character, '#' means any number (zero or more digits), and '+' means any single digit. Some examples:

   atom *           all atoms
   atom N*          all nitrogen atoms (and sodium, neon, niobium,...)
   atom %G*         all gamma (G) atoms; CG, OG, OG1, SG (and possibly others)
   type T*          residue types THR, TRP and TYR (and possibly others)
   type T%R         residue types THR and TYR

If the coordinate file contains '*' in atom names (nucleic acids in PDB files) then these are converted into single-quotes '''. If your coordinate file contains '*' in residue names or types, or '%', '#' or '+' characters anywhere, then you are in trouble; proper name comparison will be very difficult.
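
The wildcard rules can be expressed compactly by translating a pattern into a regular expression, as in the Python sketch below. This is only one way to obtain the behaviour described above, not necessarily how MolScript does it; the function names are hypothetical.

```python
import re

# The four wildcard characters and their regular expression equivalents.
WILDCARDS = {"*": ".*", "%": ".", "#": "[0-9]*", "+": "[0-9]"}

def wildcard_to_regex(pattern):
    """Translate a MolScript-style wildcard pattern into a compiled regex."""
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in pattern]
    return re.compile("".join(parts))

def matches(pattern, name):
    """Case-sensitive comparison of a whole name against a pattern."""
    return wildcard_to_regex(pattern).fullmatch(name) is not None

for name in ["CG", "OG1", "SG", "CA"]:
    print(name, matches("%G*", name))
```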

Graphics commands

The following commands actually produce objects that are visible on the plot (provided they are not clipped by window or slab). Aspects of their appearance are controlled by the current graphics state, i e the currently defined values of the various parameters (see below: The graphics state). The parameters affecting each command are listed.

The commands trace, coil, turn, helix and strand have some properties in common. Nothing will be output between CA atoms that are farther apart than 4.2 Angstrom. It is therefore possible to give a residue selection without bothering about explicitly handling chain breaks. Any selected non-amino acid type residues are simply ignored by these commands. The termini of these graphical objects always end at the CA atom positions, to ensure continuity.
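
The chain break rule can be pictured as splitting the list of CA positions wherever two consecutive atoms are farther apart than 4.2 Angstrom. The Python sketch below shows this; the function name is hypothetical and the code is not taken from MolScript.

```python
import math

MAX_CA_DISTANCE = 4.2  # Angstrom, the cutoff stated above

def split_at_breaks(ca_coords, cutoff=MAX_CA_DISTANCE):
    """Split consecutive CA positions into unbroken chain segments."""
    segments = []
    current = []
    for xyz in ca_coords:
        if current and math.dist(current[-1], xyz) > cutoff:
            segments.append(current)  # gap too large: close the segment
            current = []
        current.append(xyz)
    if current:
        segments.append(current)
    return segments

cas = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
       (20.0, 0.0, 0.0), (23.8, 0.0, 0.0)]
print([len(s) for s in split_at_breaks(cas)])
```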

Note that none of these commands are mutually exclusive. It is quite possible to create both helix and strand for the same residues. Of course, the results will look ridiculous.

   BALL-AND-STICK = ball-and-stick ATOM-SEL [ ATOM-SEL ] ;

Parameters: atomradius, atomcolour, bonddistance, stickradius, sticktaper,
            planecolour, colourdepthcue, linewidth, linedash, linecolour,
            depthcue

If one atom selection is given, then small spheres are drawn for the selected atoms with a radius of 1/4 of the atom radius. Sticks are drawn between atoms that are closer to each other than the value of the parameter bonddistance.

If two atom selections are given, then no balls are drawn, only sticks connecting one atom in the first selection and another atom in the second selection.

The colour of the sticks is determined by the planecolour parameter. The colourdepthcue parameter applies to the colour, but the shading parameters do not. The line parameters apply to the outer line of the stick.

   BONDS = bonds ATOM-SEL [ ATOM-SEL ] ;

Parameters: bonddistance, linecolour, linewidth, linedash, depthcue

Bonds in the form of lines are drawn between the selected atoms, if they are closer to each other than the value of the parameter bonddistance. If one atom selection is given, then all bonds between atoms within this selection are drawn.

If two atom selections are given, then only bonds connecting one atom in the first selection and another atom in the second selection are drawn. This is useful when one single bond distance cutoff is insufficient to give the desired result (for example, the bonds to the Fe atom versus the cross-ring distances in the five-membered rings in heme groups).
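
The difference between the one-selection and two-selection forms can be sketched in Python. Atoms are identified by index into a coordinate list; the function names are hypothetical, and the default cutoff is the bonddistance default given in this manual.

```python
import math
from itertools import combinations

def bonds_within(coords, sel, bonddistance=1.9):
    """All bonds between atoms inside one selection."""
    return [(i, j) for i, j in combinations(sorted(sel), 2)
            if math.dist(coords[i], coords[j]) < bonddistance]

def bonds_between(coords, sel1, sel2, bonddistance=1.9):
    """Only bonds joining an atom in sel1 with an atom in sel2."""
    pairs = set()
    for i in sel1:
        for j in sel2:
            if i != j and math.dist(coords[i], coords[j]) < bonddistance:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)

coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 2.5, 0.0)]
print(bonds_within(coords, [0, 1, 2, 3]))
print(bonds_between(coords, [2], [3], bonddistance=2.6))
```

The second call shows the use case mentioned above: a longer cutoff is applied only to the pair of selections that needs it, without creating unwanted bonds elsewhere.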

   COIL = coil RES-SEL ;

Parameters: coilradius, smoothsteps, splinefactor, segments, shading,
            shadingexponent, planecolour, colourdepthcue, linewidth,
            linedash, linecolour, depthcue

Creates a smooth coil winding through CA atom positions. The coil goes through the exact CA positions only at the first and last atoms. At least two consecutive CA atoms are needed.

The CA coordinates are first copied and smoothed over a number of iterations (parameter smoothsteps) using Priestle's method (Priestle 1988). Then a curve is drawn through the smoothed coordinates, and a coil is created. The radius of the coil is determined by a parameter. If slab has been explicitly set, then depth cueing applies to the coil radius. The curvature of the coil can be modulated with the parameter splinefactor. Shading, line and depth cue parameters apply as usual.

   CPK = cpk ATOM-SEL ;

Parameters: atomradius, atomcolour, linecolour, linewidth, linedash,
            colourdepthcue, depthcue

Spheres with the van der Waals radii of the atoms are drawn. The colours and radii are given default values when the coordinate file is read (see below: Graphics state parameters). The circle around the sphere is affected by the line parameters and by the depthcue parameter. The colourdepthcue parameter affects the sphere colour. The spheres are not actually spheres, but disks. They will therefore not overlap entirely correctly.

   HELIX = helix RES-SEL ;

Parameters: helixwidth, coilradius, segments, shading, shadingexponent,
            planecolour, plane2colour, colourdepthcue, linewidth, linedash,
            linecolour, depthcue, helixthickness

Creates a smooth helical ribbon through the CA atom positions. The helix width (which is a parameter) tapers off at the termini to twice the coilradius. Since the helix passes through the CA atoms, the helix radius cannot be changed. The helix creation method is optimized for regular alpha-helix, but handles common irregularities like proline bends reasonably well. Only CA atom coordinates are used to compute the helix. At least three consecutive CA atoms are needed, otherwise no helix is created.

The colour of the outer surface of the helix is controlled by the planecolour parameter, while the inner surface uses the plane2colour parameter. Shading, line and depth cue parameters apply as usual. Note that the shading is performed as if the helix had been rotated into the plane of the paper.

The helixthickness parameter is relevant only for Raster3D output.

   LABEL = label < VECTOR | ATOM-SEL > string.s ;

Parameters: depthcue, linecolour, labelcentre, labelclip, labelmask,
            labeloffset, labelrotation, labelsize

Outputs a label at a specified position. If an atom selection is given, then the string will be output at the positions of all the given atoms. The string to output must be enclosed by double quotes.

The colour of the string characters is determined by linecolour. The size is determined by labelsize given in PostScript units. Depth cue applies; strings farther away from the viewer will be smaller. The parameter labelclip determines whether the strings will be hidden by graphical objects, or will appear no matter what (of course, window and slab always apply).

The position of the string can be fine-tuned by the labeloffset parameter. This vector is added to the position (explicit or that of the atoms) before the string is actually put into the view. The labelcentre parameter defines what point in the string is put at the coordinate; if switched on, the middle of the string is used, otherwise the lower-left point of the string is used. The orientation of the string is controlled by the labelrotation parameter.

The string itself is processed in several steps before output. If the variant with an atom selection is used, then special format codes in the string are replaced by data for each specified atom. The format codes are:

   %r   name of the residue the atom is in (6 characters)
   %t   type of the residue the atom is in (4 characters)
   %c   one-letter code for residue type, X if non-amino acid (1 character)
   %a   atom name (4 characters)

Each of these items can appear only once in the string. The character case of the items is taken as it is in the coordinate file (almost always uppercase). This, however, is affected by the label mask, see below. Any characters that are not format codes in the string will be part of the finally output string, even blanks. The percent character '%' cannot be output as such in label strings with the atom selection variant of the label command.

The string thus produced by replacing format codes (if any) is passed through a step that uses a mask to determine whether to change the case of some characters or not, and if they are to be output as Greek characters. The mask is set by the labelmask parameter, and consists of a string where a flag is set for each character position.

The available label mask flags are:

   r    Roman characters (default)
   g    Greek characters
  ' '   (blank) no change of character case (default)
   l    change character to lowercase
   u    change character to uppercase

The r and g flags are independent of the ' ', l and u flags. When setting the labelmask parameter, note that a character flag has to be in exactly the correct position in the string for correct processing. Note that to get the Greek characters normally used for amino-acid residue atoms, the characters will have to be changed to lowercase as well.

The final processing step consists of squeezing out blanks that were part of the residue name or type, or the atom name when format codes have been used. Note the order here: The label mask is applied first, and only afterwards are blanks removed. Otherwise it would be very difficult to predict the effect of the label mask.

Two examples of string processing:

   'LABEL'
   'lABEL'   after applying mask 'l'

   '%r.%a'         in an atom selection label command (dot means blank)
   '123....CA..'   after replacing the format codes for an atom
   '123....Ca..'   after applying mask '........l..'
   '123.Ca'        after final squeezing out of blanks from format code items
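
The three processing steps can be reproduced with the following Python sketch, which follows the order given above: format codes are replaced first, then the mask is applied, and only then are blanks removed from the inserted items. It is an illustration, not MolScript code. The function names are hypothetical, only the case flags 'l', 'u' and blank are handled (the r and g flags are ignored), and a stray '%' is simply copied through.

```python
# Field widths of the format codes, as listed above.
FIELD_WIDTHS = {"r": 6, "t": 4, "c": 1, "a": 4}

def expand(template, values):
    """Replace format codes; return (character, came_from_item) pairs."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template) and template[i + 1] in FIELD_WIDTHS:
            code = template[i + 1]
            field = values[code].ljust(FIELD_WIDTHS[code])
            out.extend((c, True) for c in field)
            i += 2
        else:
            out.append((ch, False))
            i += 1
    return out

def apply_mask(chars, mask):
    """Change character case position by position according to the mask."""
    result = []
    for pos, (c, from_item) in enumerate(chars):
        flag = mask[pos] if pos < len(mask) else " "
        if flag == "l":
            c = c.lower()
        elif flag == "u":
            c = c.upper()
        result.append((c, from_item))
    return result

def squeeze_items(chars):
    """Drop blanks that came from format code items; keep literal blanks."""
    return "".join(c for c, from_item in chars if not (from_item and c == " "))

def make_label(template, values, mask=""):
    return squeeze_items(apply_mask(expand(template, values), mask))

# The second example above: '%r %a' for atom CA in residue 123,
# with the flag 'l' in the ninth mask position.
print(make_label("%r %a", {"r": "123", "a": "CA"}, " " * 8 + "l"))
```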

The fonts used in the output file are Times-Roman and Symbol.

No labels are output to a Raster3D file.

   LINE = line VECTOR { to VECTOR } ;

Parameters: linewidth, linedash, linecolour, depthcue

Draws lines from the first coordinate, to the next, and so on. Any number of coordinates can be given. Line and depth cue parameters apply as usual. Note that the coordinates are given in the fixed coordinate system, which does not transform with atom coordinates.

   STRAND = strand RES-SEL ;

Parameters: strandthickness, strandwidth, smoothsteps, segments, shading,
            shadingexponent, planecolour, plane2colour, colourdepthcue,
            linewidth, linedash, linecolour, depthcue

Creates an arrow that winds through the smoothed CA positions of the selected residues. The smoothing is performed in the same way as for coil, and the parameter smoothsteps applies. Only the CA atom coordinates are used for computing strand position and normal vectors. At least three consecutive CA atoms are needed, otherwise no strand is created.

The strand will not go through the CA positions except at the ends. The segments parameter applies, but the actual number of segments created will be (segments / 2) + 1. The reason for this is that the curvature for strands is less than for helices, so fewer segments are needed. The strand thickness and width are determined by parameters.

The colour of the strand width surface is determined by planecolour, while the strand thickness surface is controlled by the plane2colour parameter. Shading, line and depth cue parameters apply as usual.

   TRACE = trace RES-SEL ;

Parameters: linecolour, linewidth, linedash, depthcue

Draws lines between consecutive CA atoms in the chain of selected residues.

   TURN = turn RES-SEL ;

Parameters: coilradius, splinefactor, segments, shading, shadingexponent,
            planecolour, colourdepthcue, linewidth, linedash, linecolour,
            depthcue

Same as coil, but no smoothing of the coordinates is performed. The resulting curve will therefore go through all CA atom positions.

The graphics state

There are a number of parameters, collectively termed the graphics state, that control various aspects of the graphical objects. When a graphical object, such as a helix, is created, the coordinates and appearance of it are determined by the graphics state and the data at that stage in the processing of the input file. If the helix width is changed at some later point in the input file, then this will affect only helices created later, not those that have already been made.

All parameters work in this way. This means that different objects in the same plot can have independent values for colour, size, shading, and so on.

Syntax: SET = set PARAMETER { , PARAMETER } ;

          PARAMETER = < ATOMCOLOUR | ATOMRADIUS | BONDDISTANCE | COILRADIUS |
                        COLOURDEPTHCUE | DEPTHCUE | HELIXTHICKNESS |
                        HELIXWIDTH | LABELCENTRE | LABELCLIP | LABELMASK |
                        LABELOFFSET | LABELROTATION | LABELSIZE | LINECOLOUR |
                        LINEDASH | LINEWIDTH | PLANECOLOUR | PLANE2COLOUR |
                        SEGMENTS | SHADING | SHADINGEXPONENT | SMOOTHSTEPS |
                        SPLINEFACTOR | STICKRADIUS | STICKTAPER |
                        STRANDTHICKNESS | STRANDWIDTH >

The set command is used to change the graphics state. Each set command defines a new graphics state. Any number of parameters can be changed in one single set command. Note the ',' (comma) between each parameter specification.

The number of graphics states that can be defined within one plot is limited (but high; this is determined in the molscript.dim file). Therefore, it is a wise strategy to group as many parameter changes into as few 'set' commands as possible.

Graphics state parameters

   ATOMCOLOUR = atomcolour ATOM-SEL COLOUR

Defaults: atom C* grey 0.2, atom N* blue, atom O* red, atom H* white,
          atom S* yellow, atom P* purple, others grey 0.8

Applies to: cpk, ball-and-stick

Change colour for the specified atoms. Each atom can in principle be given its own colour.

   ATOMRADIUS = atomradius ATOM-SEL radius.r

Defaults: atom C* 1.7, atom N* 1.7, atom O* 1.35, atom H* 1.0, atom S* 1.8,
          atom P* 1.8, others 1.7

Applies to: cpk, ball-and-stick

Change radius for the specified atoms. Each atom can in principle be given its own radius. Radii are used directly for cpk, while balls in ball-and-stick use the radii after dividing by 4. The unit is Angstrom.

   BONDDISTANCE = bonddistance distance.r

Default: 1.9

Applies to: bonds, ball-and-stick

Change maximum bond distance. Atoms closer than this will have bonds or sticks drawn between them, otherwise not. The unit is Angstrom.

   COILRADIUS = coilradius radius.r

Default: 0.2

Applies to: coil, turn, helix

Change the radius of coil and turn segments. The width of helices at the termini is twice the value of this parameter. The unit is Angstrom.

   COLOURDEPTHCUE = colourdepthcue factor.r

Default: 0.0  (i e switched off)

Applies to: coil, turn, helix, strand, cpk, ball-and-stick

Change the degree of colour depth cueing. Colours will be mixed with black according to distance from the viewer. Valid values are in the range [-1.0, 1.0]. A positive value means that objects farther away are made progressively more black, while a negative value means that objects closer to the viewer are made more black. A value of 1.0 (-1.0) makes the farthest (nearest) object completely black. Only plane, sphere and stick colours are affected, not line colour. This effect is combined with shading.

   DEPTHCUE = depthcue factor.r

Default: 0.75

Applies to: line, bonds, label, trace, coil, turn, helix, strand, cpk,
            ball-and-stick

Change the degree of general depth cueing. Line widths (including lines drawn at edges of plane segments and spheres), character sizes and coil and turn segment radii will be reduced the farther away they are from the viewer. The factor ranges from 0.0 (no effect) to 1.0 (full effect). Coil and turn radii are affected only if slab has been set explicitly.

The default value is appropriate for line width and coil and turn radii, but generally too high for character size.

   HELIXTHICKNESS  = helixthickness thick.r

Default: 0.3

Applies to: helix (only for Raster3D)

Change the thickness of the helix ribbon. The unit is Angstrom. This parameter has an effect only for the Raster3D output file; helix ribbons in the PostScript output file are always of zero thickness.

   HELIXWIDTH = helixwidth width.r

Default: 2.4

Applies to: helix

Change the width of the helix ribbon. The unit is Angstrom. The width of the helix at the termini is twice the value of the parameter coilradius.

   LABELCENTRE = labelcentre < on | off >

Default: on

Applies to: label

Switch on or off the centering of labels at the position given.

   LABELCLIP = labelclip < on | off >

Default: off

Applies to: label

Switches on or off clipping of labels against other graphical segments.

   LABELMASK = labelmask maskstring.s

Default: "rrrrr..."  (Roman characters)
         "     ..."  (no change of character case)

Applies to: label

Sets the mask to be applied to label strings before outputting. This is described above: Graphics commands, label.

   LABELOFFSET = labeloffset VECTOR

Default: 0.0 0.0 0.0

Applies to: label

Sets the translation vector to be added to the position given for labels before placing them in the coordinate system. The unit is Angstrom.

   LABELROTATION = labelrotation < on | off >

Default: off

Applies to: label

Switch on or off rotation of labels 90 degrees.

   LABELSIZE = labelsize size.r

Default: 20.0

Applies to: label

Change the size, in PostScript units, of label characters. The actual character size of a label also depends on the depth cue.

   LINECOLOUR = linecolour COLOUR

Default: black

Applies to: line, bonds, label, trace, coil, turn, helix, strand,
            ball-and-stick, cpk

Change the colour of lines and labels. The lines at the edges of planes and spheres are also affected.

   LINEDASH = linedash length.r

Default: 0.0

Applies to: line, bonds, trace, coil, turn, helix, strand, ball-and-stick, cpk

Change the dashing of lines. A value of 0.0 gives solid lines, otherwise the value gives the length of each dash in PostScript units. A value of 6.0 gives reasonable results.

   LINEWIDTH = linewidth width.r

Default: 1.0

Applies to: line, bonds, trace, coil, turn, helix, strand, ball-and-stick, cpk

Change the width of lines, in PostScript units. The lines at the edges of planes and spheres are also affected.

The linewidth parameter value is divided by 25 to give the radius of the lines in Raster3D files in Angstrom units. This corresponds very roughly to the width of lines in a PostScript plot of a small protein.

   PLANECOLOUR = planecolour COLOUR

Default: white

Applies to: helix, strand, coil, turn, ball-and-stick

Change the colour of the main planes: these are the outer planes of helices, the flat planes of strands, the planes of coil and turn and also the area of sticks (in ball-and-stick).

   PLANE2COLOUR = plane2colour COLOUR

Default: grey 0.5

Applies to: helix, strand

Change the colour of the secondary planes: these are the inner planes of helices and the thickness planes of strands.

   SEGMENTS = segments number.i

Default: 6

Applies to: helix, strand, coil, turn

Change the number of graphical segments to create between each CA atom point. More segments give smoother graphical objects, at the price of larger PostScript files that take longer to display or print. For strands, the number of segments between each CA atom point is actually (number / 2) + 1, where the division is integer division. The default value is appropriate for medium-sized proteins, and should be decreased for large proteins, and increased for small proteins or close-up views.
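
As a small worked example of the strand rule, the Python lines below compute the effective number of segments for a few values of the segments parameter. The function name is hypothetical.

```python
def strand_segments(segments):
    """Segments actually created per CA interval for strands."""
    return segments // 2 + 1  # integer division, as stated above

for value in (2, 6, 7, 12):
    print(value, strand_segments(value))
```

With the default value of 6, strands therefore get 4 segments between each CA atom point.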

   SHADING = shading factor.r

Default: 0.5

Applies to: helix, strand, coil, turn

Change the amount of shading effect to mix in with the plane colour. Valid values are in the range [0.0, 1.0]. A value of 0.0 disables shading, while a value of 1.0 makes planes whose normals are nearly parallel to the paper completely black. When plotting large proteins, it is usually a good idea to switch off shading altogether; this reduces the clutter of the picture.

   SHADINGEXPONENT = shadingexponent exponent.r

Default: 1.5

Applies to: helix, strand, coil, turn

Change the exponent of the shading function relating the plane normal to the amount of shading. A large exponent makes the shading strong already at a small angle between viewer and plane normal, while a small exponent makes the shading appear only at a large angle. The value must be larger than 0.0.

   SMOOTHSTEPS = smoothsteps number.i

Default: 2

Applies to: strand, coil

Change the number of Priestle smoothing steps (Priestle 1988) to apply to coil and strand CA atom coordinates before using them to create coil and strand graphical objects. The value must be larger than 0. The command turn would be the same as coil if this parameter were to have the value 0.

   SPLINEFACTOR = splinefactor factor.r

Default: 1.0

Applies to: coil, turn

Change the vector length factor for creating the Hermite spline curves that are used for coil and turn. A value of 0.0 gives straight lines between each control point, while larger values give progressively stronger curvature.
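
The effect of the factor can be seen in a plain cubic Hermite segment, sketched in Python below. The Hermite basis functions are standard, but the way the tangent vectors are chosen here (half the chord between the neighbouring control points, scaled by the factor) is an assumption made for illustration; MolScript may construct its tangents differently. All function names are hypothetical.

```python
def hermite(p0, p1, m0, m1, t):
    """Point on a cubic Hermite segment from p0 to p1 with tangents m0, m1."""
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return tuple(h00 * a + h10 * b + h01 * c + h11 * d
                 for a, b, c, d in zip(p0, m0, p1, m1))

def tangent(prev, nxt, splinefactor):
    """Assumed tangent: half the chord between neighbours, scaled by the factor."""
    return tuple(splinefactor * 0.5 * (n - p) for p, n in zip(prev, nxt))

def segment_point(points, splinefactor, t):
    """Point at parameter t on the segment between points[1] and points[2]."""
    m1 = tangent(points[0], points[2], splinefactor)
    m2 = tangent(points[1], points[3], splinefactor)
    return hermite(points[1], points[2], m1, m2, t)

ctrl = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(segment_point(ctrl, 0.0, 0.25))  # lies on the straight line y = x - 1
print(segment_point(ctrl, 1.0, 0.25))  # pulled away from the straight line
```

With a factor of 0.0 both tangents vanish, so every point of the segment is a weighted average of the two end points and the segment is a straight line.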

   STICKRADIUS = stickradius radius.r

Default: 0.2

Applies to: ball-and-stick

Change the radius of sticks. The unit is Angstrom.

   STICKTAPER = sticktaper factor.r

Default: 0.75

Applies to: ball-and-stick

Change the amount of tapering effect when a stick points away from the viewer. A too large value can give undesirable pseudo-depth cue effects. Valid values are in the range [0.0, 1.0], where 0.0 is no effect, and 1.0 is full effect.

   STRANDTHICKNESS = strandthickness thickness.r

Default: 0.6

Applies to: strand

Change the thickness of strands. The thickness must be larger than 0.01. The unit is Angstrom.

   STRANDWIDTH = strandwidth width.r

Default: 2.0

Applies to: strand

Change the width of strands. The width must be larger than 0.02. The unit is Angstrom.

Colour specifications

Colours can be specified by a name (a total of 8 defined), by greyscale, by the RGB system, or by the HSB system.

The RGB system uses the basic colours red, green and blue. From three real values in the range [0.0, 1.0], a colour is defined by mixing the basic colours in these amounts. Thus, '0.0 1.0 0.0' gives pure green, and '0.3 0.3 0.0' gives yellow of low intensity. Note that there are no comma characters ',' between the values.

The HSB system is based on a 'colour wheel', similar to that used by the Evans & Sutherland PS300 system. Three real values in the range [0.0, 1.0] specify a colour. The first value defines hue (0.0 = red, 0.2 = yellow, 0.333 = green, 0.6666 = blue, 0.8 = violet, 1.0 = red again), the second saturation (the whiteness of the colour: 0.0 = completely white, 1.0 = pure colour), and the third is brightness (0.0 = black, 1.0 = full intensity).

The greyscale takes one real value in the range [0.0, 1.0], where 0.0 = black, and 1.0 = white.

The defined colour names are: black, white, red, yellow, green, cyan, blue and purple. These are simply abbreviations for RGB specifications.

Note that the actual colours that are displayed depend on the hardware. Different screens can give slightly different colours, even though the specification is exactly the same. Ordinary laser printers convert colours into greyscale. The exact mapping of display colour (or greyscale) to hardcopy colour (greyscale) is machine-dependent. A colour that looks good on one machine may be bad on another. You will have to try it out for yourself.

A rule-of-thumb is that if you know that the plot is going to be output on a greyscale laser printer, then avoid colours, since they will generally be rendered too dark on such a machine. Instead use explicit greyscale specifications, often rather light ones.

The item 'gray' may be used instead of 'grey', for reasons of backwards compatibility (i.e., correcting this mistake properly would upset some users).

Syntax: COLOUR = < GREY | RGB | HSB | black | white | red | yellow | green | cyan | blue | purple >

          GREY = grey value.r

          RGB = rgb red.r green.r blue.r

          HSB = hsb hue.r saturation.r brightness.r
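
The three numeric forms of a colour specification can be reduced to a common RGB triple. The Python sketch below is an illustration only: the function name colour_to_rgb is hypothetical, and it assumes that HSB behaves like the standard HSV model in Python's colorsys module. The hue landmarks listed above (for example 0.2 for yellow) do not coincide exactly with standard HSV, so the result is an approximation of what MolScript produces. The eight colour names are left out, since their RGB values are not given here.

```python
import colorsys


def colour_to_rgb(spec):
    """Convert a 'grey', 'rgb' or 'hsb' colour specification to an RGB tuple."""
    words = spec.split()
    # float() rejects values written with commas, such as '0.0,'.
    kind, values = words[0], [float(w) for w in words[1:]]
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("colour values must lie in the range [0.0, 1.0]")
    if kind in ("grey", "gray") and len(values) == 1:
        return (values[0], values[0], values[0])
    if kind == "rgb" and len(values) == 3:
        return tuple(values)
    if kind == "hsb" and len(values) == 3:
        # Assumption: HSB is treated as the standard HSV colour model.
        return colorsys.hsv_to_rgb(*values)
    raise ValueError("unrecognised colour specification: " + spec)


pure_green = colour_to_rgb("rgb 0.0 1.0 0.0")
mid_grey = colour_to_rgb("grey 0.5")
red_from_hsb = colour_to_rgb("hsb 0.0 1.0 1.0")
```

A specification with separating commas, such as 'rgb 0.0, 1.0, 0.0', raises a ValueError in this sketch, which mirrors the rule that the values are written without commas.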

Vector specifications

A vector (coordinate) can be specified by explicit xyz values, which are written without separating commas, or by specifying an atom selection after the item 'position'. In the latter case, the geometrical centre-of-gravity of the selected atoms is computed and used as the vector value. Note that in transform commands, this is done before any of the operations have been applied.

Syntax: VECTOR = < XYZ | POSITION >

          XYZ = x.r y.r z.r

          POSITION = position ATOM-SEL
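
The value produced by the 'position' form is simply the unweighted mean of the selected atom coordinates. A minimal Python sketch of that arithmetic follows; the function name centre_of_gravity and the coordinates are made up for the example and are not part of MolScript.

```python
def centre_of_gravity(coords):
    """Return the unweighted mean position of a non-empty list of (x, y, z)."""
    if not coords:
        raise ValueError("the atom selection is empty")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]  # made-up coordinates
centre = centre_of_gravity(atoms)

# Subtracting the centre from every atom moves the centre of the set to the origin.
centred = [tuple(a - c for a, c in zip(atom, centre)) for atom in atoms]
```

Because the mean is computed once from the coordinates as given, applying further operations to the atoms afterwards does not change the vector value already obtained.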

Common problems

The following is a list of common problems, and their remedies.

   'item does not match syntax'

Check that there is a semi-colon ';' after the previous command. If there is, then look for some other syntactical error close to where the failure occurs in the input file.

   'identifier or word is reserved word'

You are trying to use a command word as, for example, a molecule name. Or maybe you have forgotten some argument to a command.

   'no space for xxx; MAXyyy'

There is too little space for some kind of graphical segment. Either simplify your plot (omit parts, or reduce the segments parameter), or ask your system manager to change the relevant program dimension parameter (given by the MAXyyy part in the error message) and recompile the program.

   'unexpected end-of-file in input'

Check that there is a newline after the 'end_plot' item. At least on the SGI IRIS-4D, the last line must end with a newline, otherwise it will be ignored when end-of-file is reached.

   weird helices or strands

If you use too generous a definition of helices and strands, then most likely there will be strange effects at the termini. To reduce or remove the problem, just shorten the secondary structure elements. Usually the output from DSSP (Kabsch & Sander 1983) can be used to define helices and strands that look fine. The algorithm for helix graphical object generation assumes reasonably good alpha-helical geometry, but can usually handle deviations like proline bends gracefully. Of course, it cannot take care of all kinds of irregularities.

   difficulty to position label strings

Since parameters such as labelcentre and labeloffset are part of the graphics state, and hence persistent, it is a common mistake to set an offset for a particular label and then forget that all labels after that in the input file will also be affected. Simply reset the offset or centring parameters before the subsequent labels.

   coils/turns not connected to helices/strands

This is most likely caused by an improper residue selection in the relevant commands. If residues 1 to 4 and 7 to 10 are strands, then the turn should be given from 4 to 7, not from 5 to 6. The graphical objects begin and end at the given residues.

   coil radius is not depth-cued

Coil radius is depth-cued only when 'slab' is set explicitly in the input file.
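
When input files are generated by a script, the two purely syntactic problems in the list above (a missing semi-colon after a command, and a missing newline after 'end_plot') can be avoided mechanically. The Python sketch below shows one way to do this. The function name build_plot, the molecule name and the file name are hypothetical, the double quotes around the file name are an assumption about how strings are written, and macro definitions (which are not terminated by a semi-colon) are not handled.

```python
def build_plot(commands):
    """Return the text of one plot, with command terminators normalised."""
    lines = ["plot"]
    for command in commands:
        # Terminate every command with exactly one semi-colon.
        lines.append("  " + command.strip().rstrip(";").rstrip() + ";")
    lines.append("end_plot")
    # End the last line with a newline character.
    return "\n".join(lines) + "\n"


text = build_plot([
    'read mol "protein.pdb"',  # hypothetical molecule and file names
    "trace amino-acids;",
])
```

The returned text can then be written to a file and given to MolScript as input.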

Syntax specification

FILE_CONTENTS = [ < ENCAPSULATED | raster3d > ]  { PLOT }

  ENCAPSULATED = encapsulated xlowerleft.r ylowerleft.r
                              xupperright.r yupperright.r

PLOT = { MACRO-DEF } plot HEADER { COMMAND } end_plot

HEADER = [ AREA ] [ BACKGROUND ] [ WINDOW ] [ SLAB ]

  AREA = area xlowerleft.r ylowerleft.r xupperright.r yupperright.r ;

  BACKGROUND = background COLOUR ;

  WINDOW = window sidelength.r ;

  SLAB = slab thickness.r ;

COMMAND = < READ | COPY | DELETE | TRANSFORM | STORE-MATRIX | CPK | BONDS |
            BALL-AND-STICK | TRACE | COIL | TURN | HELIX | STRAND | LINE |
            LABEL | SET | MACRO-DEF >

  READ = read molname.w filename.s ;

  COPY = copy newmolname.w ATOM-SEL ;

  DELETE = delete molname.w ;

  TRANSFORM = transform ATOM-SEL { by OPERATION } ;

    OPERATION = < CENTRE | TRANSLATION | ROTATION | recall-matrix >

      CENTRE = centre VECTOR

      TRANSLATION = translation VECTOR

      ROTATION = rotation < AXIS | MATRIX >

        AXIS = < x | y | z > degrees.r

        MATRIX = x1.r y1.r z1.r x2.r y2.r z2.r x3.r y3.r z3.r

  STORE-MATRIX = store-matrix ;

  CPK = cpk ATOM-SEL ;

  BONDS = bonds ATOM-SEL [ ATOM-SEL ] ;

  BALL-AND-STICK = ball-and-stick ATOM-SEL [ ATOM-SEL ] ;

  TRACE = trace RES-SEL ;

  COIL = coil RES-SEL ;

  TURN = turn RES-SEL ;

  HELIX = helix RES-SEL ;

  STRAND = strand RES-SEL ;

  LINE = line VECTOR { to VECTOR } ;

  LABEL = label < VECTOR | ATOM-SEL > string.s ;

  SET = set PARAMETER { , PARAMETER } ;

    PARAMETER = < ATOMCOLOUR | ATOMRADIUS | BONDDISTANCE | COILRADIUS |
                  COLOURDEPTHCUE | DEPTHCUE | HELIXTHICKNESS | HELIXWIDTH |
                  LABELCENTRE | LABELCLIP | LABELMASK | LABELOFFSET |
                  LABELROTATION | LABELSIZE | LINECOLOUR | LINEDASH |
                  LINEWIDTH | PLANECOLOUR | PLANE2COLOUR | SEGMENTS | SHADING |
                  SHADINGEXPONENT | SMOOTHSTEPS | SPLINEFACTOR | STICKRADIUS |
                  STICKTAPER | STRANDTHICKNESS | STRANDWIDTH >

      (the parameter items are defined under the heading 'Parameters')


  MACRO-DEF = macro macroname.w { whatever } end_macro

  VECTOR = < XYZ | POSITION >

    XYZ = x.r y.r z.r

    POSITION = position ATOM-SEL


  ATOM-SEL = [ not ] < ATOM-AND | ATOM-OR | ATOM-SPEC >

    ATOM-AND = require ATOM-SEL { , ATOM-SEL } and ATOM-SEL

    ATOM-OR = either ATOM-SEL { , ATOM-SEL } or ATOM-SEL

    ATOM-SPEC = < backbone | ATOM-NAME | B-FACTOR |
                  OCCUPANCY | IN | SPHERE | CLOSE >

      ATOM-NAME = atom name.w

      B-FACTOR = b-factor low.r high.r

      OCCUPANCY = occupancy low.r high.r

      IN = in RES-SEL

      SPHERE = sphere VECTOR radius.r

      CLOSE = close ATOM-SEL distance.r


  RES-SEL = [ not ] < RES-AND | RES-OR | RES-SPEC >

    RES-AND = require RES-SEL { , RES-SEL } and RES-SEL

    RES-OR = either RES-SEL { , RES-SEL } or RES-SEL

    RES-SPEC = < amino-acids | waters | MOLECULE |
                 SEQUENCE | RES-NAME | RES-TYPE | CONTAINS >

      MOLECULE = molecule molname.w

      SEQUENCE = from resname.w to resname.w

      RES-NAME = residue resname.w

      RES-TYPE = type restype.w

      CONTAINS = contains ATOM-SEL


  COLOUR = < GREY | RGB | HSB | black | white | red | yellow | green | cyan | blue | purple >

    GREY = grey value.r

    RGB = rgb red.r green.r blue.r

    HSB = hsb hue.r saturation.r brightness.r

Known bugs

In general, the simple hidden-surface removal algorithm relies on segments being small, and not overlapping severely. If unreasonable values for, say, helix width or atom radii are given, then the algorithm breaks down, and very strange results are obtained.

If slab is set explicitly, then clipping is performed, but only on a segment-by-segment basis. A graphical segment will be fully visible, or not at all.

The strands sometimes show defects at twist points, and when viewed directly from the ends. If this is a real problem in your case, then either increase the segment number (for example from 6 to 10) for that strand, or rotate the atoms slightly around x or y. This should take care of the problem.

Lines delimiting strands can sometimes appear jagged, especially when a thick linewidth is used, and close to strand twist points. This bug is due to the simple hidden-surface removal algorithm, and there is no simple way to eliminate this effect.

In the CPK models, the spheres do not overlap properly in PostScript plots. They are simply created as flat disks and then depth-sorted.

When using a mixture of representations, such as helices and ball-and-stick, then some graphical segments are occasionally clipped incorrectly. This is due to the simple hidden-surface removal algorithm. In a few cases, this can be remedied by changing orientation slightly.

The depth cue colour shading generally does not produce very good results. That is why it is turned off by default.

The graphics commands bonds and ball-and-stick are inefficient in execution time, especially in the case of two atom selections given when the first selection contains many atoms. Therefore, put the selection with fewest atoms as the first one.

The input procedures for atomic coordinate file formats CDS and WAH have not been properly tested, and may contain bugs.

No schematic representations for nucleic acids (Lesk & Lesk 1989) have been implemented in this version.

There should be an option to produce files of graphical objects in some general format for other rendering programs.

The Encapsulated PostScript produced by version 1.1 was flawed. That particular problem has now been fixed, but unfortunately has not been checked thoroughly. Other bugs may still lurk here.

The input line counter is not incremented when the 'inline-PDB' facility is used. This means that line numbers reported for errors will be incorrect when the error occurs after an 'inline-PDB' statement has been executed.

Labels cannot be output to a Raster3D file.

About the program

MolScript was conceived in the tradition of molecular graphics of Jane Richardson, Arthur Lesk & Karl Hardman, and John Priestle. The aim was to have the ability to produce nice schematic drawings of proteins as well as detailed views of specific residues. Of course, the plots should be better than those made either by drawing by hand, or with previously available programs. The author believes that this aim has been reached.

The idea was to use PostScript as the target graphics code, and to use the facilities of PostScript to get as powerful a program as possible with relatively little work. The main trick is to use the painter's metaphor for hidden-surface removal: just depth sort the graphical objects, and output the farthest objects first. PostScript then takes care of hidden-surface removal, since the definition for it states that marks applied to the surface (lines or area fill colour) obliterate whatever was present at that point previously. An unfortunate side effect of this is that it is practically impossible to implement translucent surfaces within this framework.
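
The depth-sort-and-overwrite idea can be shown in a few lines. The Python sketch below paints one-dimensional 'segments' onto a row of character cells. It is a toy model, not MolScript's code: the function name paint and the segment layout are invented for the example, and it assumes that a larger z value means closer to the viewer.

```python
def paint(segments, width):
    """Paint (z, start, end, mark) segments onto a row of cells, farthest first."""
    canvas = ["."] * width
    # Assumption: a larger z value means closer to the viewer.
    for z, start, end, mark in sorted(segments, key=lambda s: s[0]):
        for x in range(start, end):
            canvas[x] = mark  # later (nearer) marks obliterate earlier ones
    return "".join(canvas)


segments = [
    (5.0, 2, 6, "N"),   # near segment
    (-3.0, 0, 4, "F"),  # far segment
]
row = paint(segments, 8)
```

Each segment is treated as lying at a single depth, which is why such a scheme depends on segments being small and not overlapping severely, and why a mark, once painted over, cannot show through as a translucent surface would require.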

Good shading effects were deemed essential. Effort has been put into making the shading produce good results, and allowing the user to control it properly. The graphical objects helix, strand and coil were made to look as much like Jane Richardson's drawings as possible.

A property of MolScript that the author thinks is a feature, not a bug, is that irregularities in helices and strands are still visible to some degree in the schematic drawings. Especially helices manage to convey bends and other irregularities without turning ugly. Since only CA-coordinates are used for creating the schematic drawings, even poorly refined structures can give reasonable-looking plots.

The spline function used to create segment coordinates for various objects (coil, turn, strand, helix) is the Hermite spline (Foley & Van Dam 1982). This is based on two control points, which the curve passes through, and directional vectors at these points. The advantage of this spline is that it passes through all its control points. However, it requires direction vectors of defined lengths. In MolScript, such vectors are computed in different ways depending on the graphical object.

The fixed coordinate system simplifies a number of computational aspects. Due to the advanced coordinate transformation facility, the fixed coordinate system is not a limitation. On the contrary, it is actually easier to get good views in this way.

Orthogonal projection is used for the object-to-view mapping. Perspective projection was not considered worth the extra programming effort. A special stereo command was considered, but was found to violate too many implicit design rules. Believe it or not, but the program structure simply does not allow easy implementation of an automatic stereo option.

The design of the interface aims at logical behaviour, and a clear underlying functional model. Once the user understands this model, it should be easy to predict the effect of various commands, even in very complicated cases.

In effect, MolScript is a compiler, which takes the input file as source code that is compiled into a plot in PostScript form. Hence, the input file parsing is done with a general package that implements well-known principles for basic syntax definition and analysis (Wirth 1976).

The implementation uses a number of general procedure packages accumulated by the author during a number of years. This was in fact a necessary (but not sufficient) requirement for the successful completion of this project.

The interface to Raster3D has been made such that the same input file should produce similar results when the output is a Raster3D picture as when a PostScript picture is made. However, there are some differences.

Acknowledgements

I thank T. Alwyn Jones, Mats Kihlen, Ylva Lindqvist, Erling Wikman, Hans Eklund, Carl-Ivar Branden and others at the Department of Molecular Biology, BMC, Uppsala University, for support, ideas and comments. This work was supported in part by Nordisk Industrifond and The Swedish Natural Science Research Council (NFR).

The following have contributed bug discoveries, fixes and other suggestions for the post-v1.1 versions of the program:

Eric Fauman, Michael Sutcliffe, Paul McLaughlin, Leo Caves, Arne Elofsson. Ethan A. Merritt wrote the interface to Raster3D v2.0.

References

Adobe Systems Inc, "PostScript Language Reference Manual", Addison-Wesley,
  Reading, Massachusetts, 1985.

Adobe Systems Inc, "PostScript Language Reference Manual, Second Edition",
  Addison-Wesley, Reading, Massachusetts, 1990.

D J Bacon & W F Anderson, J Mol Graph (1988) v 6, pp 219-220.

A T Brunger, "X-PLOR, version 1.5 Manual" Yale University, New Haven,
  Connecticut, 1988.

J D Foley & A Van Dam, "Fundamentals of Interactive Computer Graphics",
  Addison-Wesley, 1982.

W Kabsch & C Sander, Biopolymers (1983) v 22, pp 2577-2637.

A M Lesk & K D Hardman, Science (1982) v 216, pp 539-540.

A M Lesk & K D Hardman, Methods in Enzymology (1985) v 115, pp 381-390.

V I Lesk & A M Lesk, J Appl Cryst (1989) v 22, pp 569-571.

W M Newman & R F Sproull, "Principles of Interactive Computer Graphics",
  McGraw-Hill, 1979.

E A Merritt & M Murphy (in preparation).

J P Priestle, J Appl Cryst (1988) v 21, pp 572-576.

J S Richardson, Adv Prot Chem (1981), v 34, pp 167-339.

J S Richardson, Methods in Enzymology (1985) v 115, pp 359-380.

N Wirth, "Algorithms + Data Structures = Programs", Prentice-Hall, 1976.
