Chimera Data Representation and File Formats

Tom Goddard
August 17, 2012

Here we summarize the kinds of data Chimera operates on.

The Chimera file formats page lists about 70 file formats read by Chimera. Details of some of the data representations are described in the Chimera Programmer's Guide.

Molecules with Atoms

Chimera represents molecules following the Protein Data Bank (PDB) representation. A PDB entry, identified by a 4-character ID code, corresponds to a Chimera Molecule. The terminology is imperfect, since an entry is sometimes a large molecular assembly such as a whole ribosome rather than a single molecule. Chimera has Chains, Residues and Atoms, using the same conventions as the PDB. Chimera explicitly represents Bonds between atoms, while the PDB mostly handles bonds implicitly using predefined residue topologies.
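
The hierarchy described above can be sketched as a small set of Python classes. This is only an illustration of the data model: the class and field names (Molecule, Chain, Residue, Atom, bonds) are hypothetical and are not Chimera's actual Python classes.

```python
from dataclasses import dataclass, field

# Hypothetical classes illustrating the Molecule > Chain > Residue > Atom
# hierarchy with explicit bonds. These are not Chimera's own classes.

@dataclass
class Atom:
    name: str      # PDB atom name, e.g. "CA"
    element: str   # element symbol, e.g. "C"
    xyz: tuple     # (x, y, z) in Angstroms

@dataclass
class Residue:
    name: str      # residue type, e.g. "GLY"
    number: int
    atoms: list = field(default_factory=list)

@dataclass
class Chain:
    chain_id: str
    residues: list = field(default_factory=list)

@dataclass
class Molecule:
    name: str
    chains: list = field(default_factory=list)
    bonds: list = field(default_factory=list)   # explicit (Atom, Atom) pairs

    def atoms(self):
        """Return all atoms in chain, residue, atom order."""
        return [a for c in self.chains for r in c.residues for a in r.atoms]

# A one-residue example with made-up coordinates.
n = Atom("N", "N", (0.0, 0.0, 0.0))
ca = Atom("CA", "C", (1.5, 0.0, 0.0))
c = Atom("C", "C", (2.0, 1.4, 0.0))
o = Atom("O", "O", (1.2, 2.3, 0.0))
gly = Residue("GLY", 1, [n, ca, c, o])
mol = Molecule("example", chains=[Chain("A", [gly])],
               bonds=[(n, ca), (ca, c), (c, o)])
```

Storing bonds as an explicit list on the molecule mirrors the point made above: the bonds are part of the data rather than being inferred from residue templates each time they are needed.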

Each conformation of an NMR ensemble is treated as a separate molecule.

Arbitrary attributes (numbers and text), such as atom charge, residue conservation or SNP mutation sites, can be assigned to atoms, bonds, residues and molecules using the define attribute capability. Standard attributes are read from PDB files: bfactor, occupancy, and anisotropic b-factor.

Molecular Dynamics Trajectories

Dynamics trajectories maintain atom coordinates for many time points. The set of atoms cannot vary over the trajectory. Trajectories are shown and analyzed with the Chimera MD Movie tool.
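
A minimal sketch of this idea, assuming a hypothetical Trajectory class (not part of Chimera or the MD Movie tool), stores the atom list once and one coordinate list per time point, and rejects frames whose size does not match:

```python
import math

class Trajectory:
    """Coordinates for a fixed set of atoms at many time points.

    Hypothetical illustration, not a Chimera class.
    """

    def __init__(self, atom_names):
        self.atom_names = list(atom_names)
        self.frames = []   # one list of (x, y, z) tuples per time point

    def add_frame(self, coords):
        # The set of atoms cannot vary over the trajectory, so every
        # frame must supply exactly one coordinate per atom.
        if len(coords) != len(self.atom_names):
            raise ValueError("frame has %d coordinates, expected %d"
                             % (len(coords), len(self.atom_names)))
        self.frames.append([tuple(c) for c in coords])

    def displacement(self, atom_index, frame_a, frame_b):
        """Distance moved by one atom between two time points."""
        return math.dist(self.frames[frame_a][atom_index],
                         self.frames[frame_b][atom_index])

traj = Trajectory(["N", "CA"])
traj.add_frame([(0, 0, 0), (1, 0, 0)])
traj.add_frame([(0, 0, 0), (1, 3, 4)])
```

Because the atom list is shared by all frames, per-atom analyses such as the displacement above reduce to indexing the same position in different frames.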

Molecular Assemblies

Symmetric molecular assemblies such as virus capsids are described using an atomic model of the asymmetric unit (e.g. a PDB model) and positioning matrices that place copies of the asymmetric unit. Each copy is placed using a 3 by 4 rotation and translation matrix, taken for example from the REMARK 350 BIOMT records of PDB files, with the Chimera Multiscale Models tool or the sym command. Standard symmetries (cyclic, dihedral, helical, icosahedral, crystal space groups, ...) are supported, and matrices are generated to represent those symmetries.
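
To make the 3 by 4 matrix convention concrete, the sketch below generates matrices for cyclic symmetry about the z axis and applies them to the coordinates of an asymmetric unit. The function names are hypothetical and the code does not use Chimera's own symmetry routines. Each matrix row holds three rotation terms followed by one translation term.

```python
import math

def cyclic_matrices(n):
    """Return n 3x4 matrices for n-fold cyclic symmetry about the z axis."""
    mats = []
    for k in range(n):
        a = 2 * math.pi * k / n
        c, s = math.cos(a), math.sin(a)
        # Rows are (rotation | translation); translation is zero here.
        mats.append(((c, -s, 0.0, 0.0),
                     (s,  c, 0.0, 0.0),
                     (0.0, 0.0, 1.0, 0.0)))
    return mats

def apply_matrix(m, xyz):
    """Apply a 3x4 rotation and translation matrix to one point."""
    x, y, z = xyz
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)

def place_copies(coords, matrices):
    """Return one transformed coordinate list per positioning matrix."""
    return [[apply_matrix(m, p) for p in coords] for m in matrices]

# Four copies of a one-atom "asymmetric unit" arranged with C4 symmetry.
copies = place_copies([(1.0, 0.0, 0.0)], cyclic_matrices(4))
```

The first matrix is the identity, so the first copy is the asymmetric unit itself; the remaining copies are rotated by 90, 180 and 270 degrees.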

Sequences

Sequences and multiple sequence alignments are supported by the Chimera MultAlign Viewer, with standard handling of deletions and insertions and attributes such as conservation at each residue position of an alignment. Sequences are associated with specific chains of atomic models, and the association allows approximate sequence matches.

Density Maps

Density maps and other 3-dimensional numeric scalar arrays such as electrostatic potential are handled by the Chimera Volume Viewer tool. Scalar values can be 8, 16, 32 or 64 bit signed or unsigned integers, or 32 or 64 bit floating point values. The 3-dimensional grid is mapped to physical coordinates (e.g. Angstroms) by an arbitrary linear transform and translation (3 by 4 matrix), which allows skewed axes for crystallographic density.
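
The grid-to-physical mapping can be sketched as follows. The helper names are hypothetical and this is not the Volume Viewer code. The first three columns of the 3 by 4 matrix are the physical-space vectors for one step along each grid axis, and the fourth column is the position of grid point (0, 0, 0). Axis vectors that are not perpendicular give a skewed grid.

```python
def grid_matrix(a, b, c, origin):
    """Build a 3x4 matrix from three grid step vectors and an origin."""
    return tuple((a[r], b[r], c[r], origin[r]) for r in range(3))

def grid_to_xyz(matrix, ijk):
    """Map grid indices (i, j, k) to physical coordinates (x, y, z)."""
    i, j, k = ijk
    return tuple(row[0] * i + row[1] * j + row[2] * k + row[3]
                 for row in matrix)

# Orthogonal grid: 1.5 Angstrom steps in x and y, 2.0 in z.
ortho = grid_matrix((1.5, 0, 0), (0, 1.5, 0), (0, 0, 2.0), (-10, -10, 0))

# Skewed grid: the second axis is not perpendicular to the first.
skew = grid_matrix((2, 0, 0), (1, 2, 0), (0, 0, 3), (0, 0, 0))

p_ortho = grid_to_xyz(ortho, (2, 4, 1))
p_skew = grid_to_xyz(skew, (1, 1, 0))
```

With the skewed matrix, one step along the second grid axis changes both x and y, which is what a non-orthogonal crystallographic unit cell requires.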

Masks defining regions of a 3-dimensional density map are supported by the Chimera Segment Map tool. Masks are 3-d integer arrays where the value at each grid point is an id number for the region that grid point belongs to.
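
As an illustration of this representation, the sketch below uses nested Python lists in place of real 3-d arrays. The function names are hypothetical and are not part of the Segment Map tool. It counts the grid points in each region and extracts the density values belonging to one region.

```python
from collections import Counter

def region_sizes(mask):
    """Count grid points for each region id in a 3-d integer mask."""
    return Counter(v for plane in mask for row in plane for v in row)

def masked_values(density, mask, region_id):
    """Copy of density with values outside the given region set to 0.0."""
    return [[[d if m == region_id else 0.0
              for d, m in zip(drow, mrow)]
             for drow, mrow in zip(dplane, mplane)]
            for dplane, mplane in zip(density, mask)]

# A tiny 1 x 2 x 3 map and a mask of the same shape with region ids 0, 1, 2.
density = [[[0.1, 0.9, 0.8],
            [0.7, 0.6, 0.2]]]
mask = [[[0, 1, 1],
         [2, 2, 0]]]

sizes = region_sizes(mask)
region1 = masked_values(density, mask, 1)
```

Since each grid point stores a single id, regions defined this way cannot overlap.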

Map time series are handled by the Volume Series tool. Separate time points are handled as independent maps, which must have identical sizes and transform matrices.

Geometric Objects

Chimera represents coarse models as spheres connected by cylinders using the Volume Tracer tool. Individual spheres could represent fluorescently labeled segments along a stretch of DNA. They could also represent ribosomes in a cell (just spheres without cylinders). An XML file format called Chimera marker model format was developed 8 years ago for these models. Markers (spheres) and links (cylinders) have radii, colors and arbitrary user-assigned attribute values.
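
The sketch below writes a small marker model as XML using Python's standard library. The element and attribute names used here (marker_set, marker, link, id, id1, id2, x, y, z, r, g, b, radius) are an assumption about the marker file layout and should be checked against the Chimera documentation before relying on them. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def marker_set_xml(name, markers, links):
    """Serialize markers (spheres) and links (cylinders) to an XML string.

    Element and attribute names are assumed, not taken from a specification.
    """
    root = ET.Element("marker_set", name=name)
    for m in markers:
        ET.SubElement(root, "marker", {k: str(v) for k, v in m.items()})
    for link in links:
        ET.SubElement(root, "link", {k: str(v) for k, v in link.items()})
    return ET.tostring(root, encoding="unicode")

# Two markers joined by one link; coordinates and colors are made up.
markers = [
    {"id": 1, "x": 0.0, "y": 0.0, "z": 0.0, "r": 1, "g": 1, "b": 0, "radius": 2.0},
    {"id": 2, "x": 10.0, "y": 0.0, "z": 0.0, "r": 1, "g": 1, "b": 0, "radius": 2.0},
]
links = [{"id1": 1, "id2": 2, "r": 0, "g": 0, "b": 1, "radius": 0.5}]

xml_text = marker_set_xml("example", markers, links)
```

A model with only markers and no links, such as the ribosome example above, is produced by passing an empty links list.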

Surfaces represented as sets of triangles are supported. They are commonly used to depict solvent-excluded molecular surfaces, density map contour surfaces, surfaces spanning curves (usually hand-traced), and segmentation surfaces traced in another program, IMOD. Solvent-excluded surfaces associate an atom with each surface vertex. Surfaces have normal vectors and per-vertex colors, and arbitrary sets of vertices or triangles can be hidden. A surface consists of an arbitrary number of pieces which are independently selectable. A surface piece can consist of disconnected patches. Surfaces can be closed or have edges.
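
The distinction between closed surfaces and surfaces with edges can be illustrated with a triangle list of vertex indices. The sketch below uses a standard edge-counting check and is not necessarily how Chimera tests this. An edge used by exactly one triangle lies on the boundary, so a closed surface has no such edges. The function name is hypothetical.

```python
from collections import Counter

def boundary_edges(triangles):
    """Return sorted edges (vertex index pairs) used by only one triangle."""
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with the smaller index first so that both
            # orientations of the same edge are counted together.
            counts[(min(u, v), max(u, v))] += 1
    return sorted(e for e, n in counts.items() if n == 1)

# A tetrahedron is closed: every edge is shared by two triangles.
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]

# Removing one face leaves a surface with a triangular boundary.
open_surface = tetrahedron[:3]

closed_boundary = boundary_edges(tetrahedron)
open_boundary = boundary_edges(open_surface)
```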

Standard geometric shapes such as spheres, cylinders, boxes, cones, tubes and icosahedra are represented as surfaces using the shape command. Geometric shapes, including lines, arrows, polygons, labels and spheres among others, can also be drawn with the Chimera BILD language. An example use is to represent alpha helices identified in electron microscopy as cylinders. Geometric shapes are also used as coarse representations of atomic models, generated by the Axes, Planes and Centroids tool.
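
For the helix example, a BILD file can be generated with a few lines of Python. This is a sketch with a hypothetical function name. It assumes the BILD .color directive takes a color name and the .cylinder directive takes two end points followed by a radius. The helix end points, radius and color are made-up values.

```python
def helices_to_bild(helices, radius=2.5, color="orange"):
    """Return BILD text drawing each helix as a cylinder.

    helices is a list of ((x1, y1, z1), (x2, y2, z2)) end point pairs.
    """
    lines = [".color %s" % color]
    for (x1, y1, z1), (x2, y2, z2) in helices:
        lines.append(".cylinder %g %g %g %g %g %g %g"
                     % (x1, y1, z1, x2, y2, z2, radius))
    return "\n".join(lines) + "\n"

# Two helices given by their axis end points in Angstroms.
bild_text = helices_to_bild([((0, 0, 0), (0, 0, 15)),
                             ((8, 0, 0), (8, 0, 12.5))])
```

The resulting text can be saved to a file with a .bild suffix and opened in Chimera alongside the density map.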

Renderings of all Chimera 3-d objects can be exported in several geometric formats such as X3D, VRML, OBJ, STL, POV-Ray and WebGL, typically for transfer to animation software, 3-d printers and web display.

Geometric shape objects are in general not connected with molecular data (atoms, residues, molecules).

Depiction Data

Depiction data is a key type of data created in Chimera and includes colors, display styles, camera view points, and visibility information. An example is the depiction of an enzyme binding site with only conserved residues shown in atomic detail, color coded by residue function. Depiction data is saved in Chimera session files.