# Import Chimera modules used in this example.
import chimera
# First, we'll open up a model to work with. This molecule (4fun) is very small,
# comprised of just a couple residues, but it is perfect for illustrating
# some important components of Chimera's object model.
# For more information on how to open/close models in Chimera, see the
# "Basic Model Manipulation" Example in the Programmer's Guide (coming soon). For now,
# just understand that this code opens up any molecules stored in the file
# '4fun.pdb' and returns a list of references to opened models.
opened = chimera.openModels.open('4fun.pdb')
# Because only one molecule was opened, 'opened' is a list with just one element.
# Get a reference to that element (which is a 'Molecule'
# instance) and store it in 'mol'
mol = opened[0]
# Now that we have a molecule to work with, an excellent way of examining its data structures is to flatten it out and write
# it to a file. We'll write this file in the 'mol2' format, a free-format ascii file that describes molecular structure.
# It is not necessary to have any prior knowledge of the 'mol2' format to understand this example, just a basic
# comprehension of file formats that use coordinate data. Check out the finished product.
# It should serve as a good reference while you're going through the example.
# Get a reference to a file to write to:
f = open("4fun.mol2", 'w')
# mol2 uses a series of Record Type Indicators (RTI), that indicate the type of structure that will be described in
# the following lines.
# An RTI is simply an ASCII string which starts with an asterisk ('@'), followed by a string of characters,
# and is terminated by a new line.
# Here, we define some RTI's that we will use througout the file to describe the various parts of our model:
MOLECULE_HEADER = "@MOLECULE"
ATOM_HEADER = "@ATOM"
BOND_HEADER = "@BOND"
SUBSTR_HEADER = "@SUBSTRUCTURE"
# Don't worry about this next dictionary for now. It will just be used to
# map Chimera-specific things to their corresponding mol2-specific things.
chimera2sybyl = {'Car' : 'C.ar',
'C3' : 'C.3',
'C2' : 'C.2',
'C1' : 'C.1',
'Cac' : 'Any',
'N3+' : 'N.4',
'N3' : 'N.3',
'Npl' : 'N.pl3',
'N2' : 'N.2',
'N1' : 'N.1',
'Nox' : 'Any',
'Ntr' : 'Any',
'Ng+' : 'Any',
'O3' : 'O.3',
'Oar' : 'Any',
'O2' : 'O.2',
'O2-' : 'Any',
'O3-' : 'Any',
'S3+' : 'Any',
'S3' : 'S.3',
'S2' : 'S.2',
'Sac' : 'Any',
'Son' : 'S.O2',
'Sxd' : 'S.O',
'Pac' : 'Any',
'Pox' : 'Any',
'P3+' : 'Any',
'HC' : 'Any',
'H' : 'H',
'DC' : 'Any',
'D' : 'Any'
}
# Writing Out per-Molecule Information
# The "@MOLECULE" RTI indicates that the next couple of lines will contain information relevant
# to the molecule as a whole. First, write out the Record Type Indicator (RTI):
f.write("%s\n" % MOLECULE_HEADER)
# The next line contains the name of the molecule. This can be accessed through the 'mol.name' attribute.
# (Remember, 'mol' is a reference to the molecule we opened). If the model you open came from a pdb file, 'name' will most
# often be the name of the file (without the '.pdb' extension). For this example, 'mol.name' is "4fun".
f.write("%s\n" % mol.name)
# Next, we need to write out the number of atoms, number of bonds, and number of substructures in the model (substructures
# can be several different things; for the sake of simplicity, the only substructures we'll worry about here are residues).
# This data is accessible through attributes of a molecule object: 'mol.atoms', 'mol.bonds', and 'mol.residues' all contain
# lists of their respective components. We can determine how many atoms, bonds, or residues this
# molecule has by taking the 'len' of the appropriate list.
# save the list of references to all the atoms in 'mol':
ATOM_LIST = mol.atoms
# save the list of references to all the bonds in 'mol':
BOND_LIST = mol.bonds
# save the list of references to all the residues in 'mol':
RES_LIST = mol.residues
f.write("%d %d %d\n" % ( len(ATOM_LIST), len(BOND_LIST), len(RES_LIST)) )
# type of molecule
f.write("PROTEIN\n")
# indicate that no charge-related information is available
f.write("NO_CHARGES\n")
f.write("\n\n")
# Writing Out per-Atom Information
# Next, write out atom-related information. In order to indicate this, we must first write out the
# atom RTI:
f.write("%s\n" % ATOM_HEADER)
# Each line under the 'ATOM' RTI consists of information pertaining to a single atom. The following information about each
# atom is required: an arbitrary atom id number, atom name, x coordinate, y coordinate, z coordinate, atom type, id of the
# substructure to which the atom belongs , name of the substructure to which the atom belongs.
# You can look at each atom in the molecule by looping through its 'atoms' attribute.
# Remember, 'ATOM_LIST' is the list of atoms stored in 'mol.atoms.' It's more efficient
# to get the list once, and assign it to a variable, then to repeatedly ask for 'mol.atoms'.
for atom in ATOM_LIST:
# Now that we have a reference to an atom, we can write out all the necessary information to the file.
# The first field is an arbitrary id number. We'll just use that atom's index within the 'mol.atoms' list.
f.write("%d " % ATOM_LIST.index(atom) )
# Next, we need the name of the atom, which is accessible via the 'name' attribute.
f.write("%s " % atom.name)
# Now for the x, y, and z coordinate data.
# Get the atom's 'xformCoord' object. This is essentially a wrapper that holds information about the
# coordinate position (x,y,z) of that atom. 'xformCoord.x', 'xformCoord.y', and 'xformCoord.z' store the x, y,
# and z coordinates,
# respectively, as floating point integers. This information comes from the coordinates given for each atom
# specification in the input file
coord = atom.xformCoord()
f.write("%g %g %g " % (coord.x, coord.y, coord.z) )
# The next field in this atom entry is the atom type. This is a string which stores information about the
# chemical properties of the atom. It is accessible through the 'idatmType' attribute of an atom object.
# Because Chimera uses slightly different atom types than SYBYL (the modeling program for which .mol2 is the primary
# input format), use a dictionary called chimera2sybyl (defined elsewhere) that converts Chimera's atom types to
# the corresponding SYBYL version of the atom's type.
f.write("%s " % chimera2sybyl[atom.idatmType])
# The last two fields in an atom entry pertain to any substructures to which the atom may belong.
# As previously noted, we are only interested in residues for this example.
# Every atom object has a 'residue' attribute, which is a reference to the residue to which that atom belongs.
res = atom.residue
# Here, we'll use 'res.id' for the substructure id field. 'res.id' is a string which represents a unique id
# for that residue (a string representation of a number, i.e. "1" , which are sequential, for all the
# residues in a molecule).
f.write("%s " % res.id)
# The last field to write is substructure name. Here, we'll use the 'type' attribute of 'res'. the 'type' attribute contains
# a string representation of the residue type (e.g. "HIS", "PHE", "SER"...). Concatenate onto this the residue's 'id'
# to make a unique name for this substructure (because it is possible, and probable, to have more than one
# "HIS" residue in a molecule. This way, the substructure name will be "HIS6" or "HIS28")
f.write("%s%s\n" % (res.type, res.id) )
f.write("\n\n")
# Writing Out per-Bond Information
# Now for the bonds. The bond RTI says that the following lines will contain information about bonds.
f.write("%s\n" % BOND_HEADER)
# Each line after the bond RTI contains information about one bond in the molecule.
# As noted earlier, you can access all the bonds in a molecule through the 'bonds' attribute,
# which contains a list of bonds.
for bond in BOND_LIST:
# each bond object has an 'atoms' attribute, which is list of length 2, where each item in the list is
# a reference to one of the atoms to which the bond connects.
a1, a2 = bond.atoms
# The first field in a mol2 bond entry is an arbitrary bond id. Once again, we'll just use that
# bond's index in the 'mol.bonds' list
f.write("%d " % BOND_LIST.index(bond) )
# The next two fields are the ids of the atoms which the bond connects. Since we have a reference to both these
# atoms (stored in 'a1' and 'a2'), we can just get the index of those objects in the 'mol.atoms' list:
f.write("%s %s " % (ATOM_LIST.index(a1), ATOM_LIST.index(a2)) )
# The last field in this bond entry is the bond order. Chimera doesn't currently calcuate bond orders,
# but for our educational purposes here, this won't be a problem.
# The mol2 format expects bond order as a string: "1" (first-order), "2" (second-order), etc., so
# just write out "1" here (even though this may not be correct).
f.write("1\n")
f.write("\n\n")
# Writing Out per-Residue Information
# Almost done!!! The last section contains information about the substructures (i.e. residues for this example)
# You know the drill:
f.write("%s\n" % SUBSTR_HEADER)
# We've already covered some of these items (see above):
for res in RES_LIST:
# residue id field
f.write("%s " % res.id )
# residue name field
f.write("%s%s " % (res.type, res.id) )
# the next field specifies the id of the root atom of the substructure. For the case of residues,
# we'll use the alpha-carbon as the root.
# Each residue has an 'atoms' attribute which is a dictionary. The keys in this dictionary are
# atom names (e.g. 'C', 'N', 'CA'), and the values are lists of references to atoms in the residue that have that
# name. So, to get the alpha-carbon of this residue:
alpha_carbon = res.atoms['CA'][0]
# and get the id of 'alpha_carbon' from the 'mol.atoms' list
f.write("%d " % ATOM_LIST.index(alpha_carbon) )
# The final field of this substructure entry is a string which specifies what type of substructure it is:
f.write("RESIDUE\n")
f.write("\n\n")
f.close()
# And that's it! Don't worry if you didn't quite understand all the ins and outs of the mol2 file format.
# The purpose of this exercise was to familiarize yourself with Chimera's object model; writing out a mol2 file
# was just a convenient way to do that. The important thing was to gain an understanding of how Chimera's atoms,
# bonds, residues, and molecules all fit together.