dnacheck

-- only available on UNIX systems on which Midas is installed --

-- not yet implemented in Chimera --

NAME

dnacheck - regenerate DNA files to match Protein Data Bank specifications

SYNOPSIS

dnacheck [ -c file ] [ -h ] [ -r ] [ -A ] [ -C directory ] [ -D ] [ -I ] [ PDB_file [ output_file ] ]

DESCRIPTION

Dnacheck reads a Protein Data Bank (PDB) file containing DNA atomic coordinates and creates a file that matches the Protein Data Bank format for nucleotides. Dnacheck corrects three types of problems:

1. Residue and atom name mapping. PDB specifies a set of residue and atom names for nucleotides. Frequently, data files not from the PDB use slightly different naming conventions. Dnacheck knows about some of these conventions and will convert names to the PDB standard set.
2. AMBER output. AMBER's older force field (Weiner et al.) models nucleotides as separate phosphate and sugar-base residues. Dnacheck recombines the sugar-base residues with the phosphate residues to form complete PDB nucleotide residues.
3. Reversed direction. PDB format requires that DNA strands be specified starting from the 5' end. Some files have strands that start from the 3' end. Dnacheck reverses the order of the residues in these strands.

Dnacheck does not, unfortunately, correct RNA file deficiencies.

CONFIGURATION FILES AND BLUEPRINTS

Dnacheck works in four steps:

1. reads a configuration file describing a list of blueprint files and some naming conventions associated with the blueprints, and then reads the blueprints
2. applies the blueprints to the input file to take care of type (1) problems
3. applies simple heuristics to take care of type (2) problems
4. takes care of type (3) problems by applying simple heuristics to the first two residues of a chain to determine whether strand-reversal is necessary

Blueprints are simply PDB-format files each containing the ATOM or HETATM records of a single residue, followed by CONECT records. The CONECT records must be present, and may be generated using the pdbrun command.
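As an illustration of the blueprint layout, the following Python sketch splits the lines of a blueprint into atom records and bonds. It is not dnacheck's own code: the function name read_blueprint and the returned tuples are hypothetical, and the column positions are the standard fixed-width PDB fields (serial in columns 7-11, atom name in 13-16, residue name in 18-20, CONECT serials in five-column fields from column 7).

```python
def read_blueprint(lines):
    """Split blueprint lines into atom records and CONECT bonds (sketch)."""
    atoms = []  # (record_type, serial, atom_name, residue_name)
    bonds = []  # (serial, bonded_serial)
    for line in lines:
        record = line[0:6].strip()
        if record in ("ATOM", "HETATM"):
            serial = int(line[6:11])
            atoms.append((record, serial, line[12:16].strip(), line[17:20].strip()))
        elif record == "CONECT":
            # Serial numbers occupy consecutive 5-column fields after the record name.
            end = len(line.rstrip())
            fields = [line[i:i + 5].strip() for i in range(6, end, 5)]
            serials = [int(f) for f in fields if f]
            for other in serials[1:]:
                bonds.append((serials[0], other))
    if not bonds:
        # The man page states that CONECT records must be present.
        raise ValueError("blueprint has no CONECT records")
    return atoms, bonds


# Two made-up atoms of a residue named T, built with fixed-width formatting.
sample = [
    "%-6s%5d %-4s %3s" % ("ATOM", 1, " P", "T"),
    "%-6s%5d %-4s %3s" % ("ATOM", 2, " O1P", "T"),
    "%-6s%5d%5d" % ("CONECT", 1, 2),
]
atoms, bonds = read_blueprint(sample)
```

The sample records are invented for the example and carry no coordinates; a real blueprint would contain complete ATOM or HETATM lines.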

The format of the configuration file is most easily explained by example, in this case part of the default configuration file:

    blueprint T
    synonym THY THE
    alias C5M C7 C5A
    alias O4* O1*
    alias O1P OA
    alias O2P OB

The first line states that there is a blueprint that should be applied to residues named "T". (By convention, the file name of the blueprint is the same as the name of the residue to which it applies.) The second line states that this blueprint should also be applied to residues named "THY" or "THE". The third line states that atoms named "C7" or "C5A" should be renamed "C5M". The fourth through sixth lines add similar name translations.
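The description above can be read as a small keyword-per-line grammar. The Python sketch below parses it under that assumption; the function name parse_config and the dictionary layout are hypothetical, and the real configuration format may accept keywords or syntax not shown in the example.

```python
def parse_config(text):
    """Parse blueprint/synonym/alias lines into a list of descriptions (sketch)."""
    blueprints = []
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue  # skip blank lines
        keyword, args = words[0], words[1:]
        if keyword == "blueprint":
            if not args:
                raise ValueError("blueprint line needs a residue name")
            current = {"name": args[0], "synonyms": [], "aliases": {}}
            blueprints.append(current)
        elif current is None:
            raise ValueError("%r appears before any blueprint line" % keyword)
        elif keyword == "synonym":
            current["synonyms"].extend(args)
        elif keyword == "alias":
            # The first name is the standard one; the rest are translated to it.
            for old in args[1:]:
                current["aliases"][old] = args[0]
        else:
            raise ValueError("unknown keyword %r" % keyword)
    return blueprints


example = """\
blueprint T
synonym THY THE
alias C5M C7 C5A
alias O4* O1*
alias O1P OA
alias O2P OB
"""
parsed = parse_config(example)
```

Storing aliases as a mapping from the nonstandard name to the standard name makes the later renaming step a single dictionary lookup per atom.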

The configuration file consists of a series of these blueprint descriptions. Residues in the input file matching a blueprint are altered to have the same residue type as the blueprint name, atoms in the residues are translated if they match one of the aliases, and atom record types (ATOM vs. HETATM) are modified to match those in the blueprint. If there are residues in the input file that do not match any blueprints, they are left unmodified.
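The matching rules in the preceding paragraph can be sketched in Python as follows. This is an illustration under stated assumptions, not dnacheck's implementation: residues are modeled as (residue_name, atoms) pairs, each atom as a (record_type, atom_name) pair, and the "records" mapping stands in for the record types that would be read from the blueprint file. All names are hypothetical.

```python
def apply_blueprints(residues, blueprints):
    """Rename residues and atoms to match their blueprint (sketch)."""
    # Index each blueprint by its own name and by every synonym.
    by_name = {}
    for bp in blueprints:
        for name in [bp["name"]] + bp["synonyms"]:
            by_name[name] = bp

    result = []
    for res_name, atoms in residues:
        bp = by_name.get(res_name)
        if bp is None:
            # No matching blueprint: the residue is left unmodified.
            result.append((res_name, list(atoms)))
            continue
        new_atoms = []
        for record, atom_name in atoms:
            atom_name = bp["aliases"].get(atom_name, atom_name)
            # Use the blueprint's record type when the atom appears in it.
            record = bp["records"].get(atom_name, record)
            new_atoms.append((record, atom_name))
        result.append((bp["name"], new_atoms))
    return result


thymine = {
    "name": "T",
    "synonyms": ["THY", "THE"],
    "aliases": {"C7": "C5M", "C5A": "C5M", "O1*": "O4*"},
    "records": {"C5M": "ATOM", "O4*": "ATOM"},  # assumed blueprint contents
}
residues = [
    ("THY", [("HETATM", "C7"), ("ATOM", "O1*")]),
    ("HOH", [("ATOM", "O")]),
]
fixed = apply_blueprints(residues, [thymine])
```

In this example the THY residue becomes T with atoms C5M and O4*, both of type ATOM, while the HOH residue passes through unchanged because no blueprint matches it.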

COMMAND-LINE ARGUMENTS

-c config-file
Specify the name of the configuration file. The default config-file is config. Config-file must be in either the current directory or the default directory (see -C).

-h
Convert any residue which is not connected to other residues into a "hetero-residue" (i.e., change the PDB records for the atoms in the residue from type ATOM to type HETATM). Normally, dnacheck retains the record type of atoms from the input file. This option is most useful when there are many unconnected residues, such as waters, which are of type ATOM in the input file, but should actually be of type HETATM.

-r
Renumber the residues. Normally, dnacheck retains the sequence numbers, insertion codes, and chain identifiers of residues from the input file. This option makes dnacheck renumber the residues and set all insertion codes to the space character. Hetero-residues are given consecutive sequence numbers starting from 1. The other residues are split into chains, with residues in each chain given consecutive sequence numbers starting from 1. If there is only one chain in the file, the chain identifier is set to the space character; otherwise, the chain identifiers are set to consecutive alphabetic characters starting with "A".

-A
Do not fix type (2) problems (see DESCRIPTION).

-C directory
Set the default directory for configuration files and blueprints to directory. Normally, the default directory is /usr/local/chimera/share/dnacheck. Dnacheck looks for configuration files and blueprints first in the current directory, then in the default directory. Even with this option set, dnacheck will still search the current directory first.

-D
Do not fix type (3) problems (see DESCRIPTION).

-I
Do not fix type (1) problems (see DESCRIPTION).

PDB_file
The input Protein Data Bank (PDB) file may contain any legal PDB records. Only ATOM records will be used. All others are silently discarded. If PDB_file is not given or is -, the data is read from standard input.

output_file
The output is a set of PDB-format records. If output_file is not given or is -, the records are written to standard output.
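To summarize how the arguments combine, here is a Python sketch that parses the same option letters with the standard getopt module and lists the search order for configuration files. It only mirrors the behavior documented above; the names parse_args, search_path, and the settings keys are hypothetical and do not come from dnacheck itself.

```python
import getopt
import os

DEFAULT_DIR = "/usr/local/chimera/share/dnacheck"  # default stated under -C


def parse_args(argv):
    """Parse dnacheck-style arguments into a settings dictionary (sketch)."""
    opts, rest = getopt.getopt(argv, "c:hrAC:DI")
    settings = {
        "config": "config",     # -c
        "hetero": False,        # -h
        "renumber": False,      # -r
        "fix_names": True,      # type (1), disabled by -I
        "fix_amber": True,      # type (2), disabled by -A
        "fix_direction": True,  # type (3), disabled by -D
        "directory": DEFAULT_DIR,
        "input": "-",           # "-" stands for standard input
        "output": "-",          # "-" stands for standard output
    }
    for flag, value in opts:
        if flag == "-c":
            settings["config"] = value
        elif flag == "-h":
            settings["hetero"] = True
        elif flag == "-r":
            settings["renumber"] = True
        elif flag == "-A":
            settings["fix_amber"] = False
        elif flag == "-C":
            settings["directory"] = value
        elif flag == "-D":
            settings["fix_direction"] = False
        elif flag == "-I":
            settings["fix_names"] = False
    if len(rest) > 2:
        raise getopt.GetoptError("too many file arguments")
    if rest:
        settings["input"] = rest[0]
    if len(rest) > 1:
        settings["output"] = rest[1]
    return settings


def search_path(name, directory):
    """Candidate locations in the documented order: current directory first."""
    return [os.path.join(os.curdir, name), os.path.join(directory, name)]


settings = parse_args(["-r", "-C", "/tmp/bp", "-D", "in.pdb"])
candidates = search_path(settings["config"], settings["directory"])
```

With these arguments, renumbering is enabled, strand-reversal fixing is disabled, input comes from in.pdb, and output goes to standard output because no output_file was given.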

LIMITATIONS

Only DNA structures are handled; RNA structures are not. If someone were to develop RNA blueprints (and test them using the -C option), we would be willing to redistribute them.

AUTHOR

Conrad Huang
Computer Graphics Laboratory
University of California, San Francisco