[Chimera-users] PDB Header via command line?

Boris Steipe boris.steipe at utoronto.ca
Thu Nov 20 15:39:49 PST 2014

This was on my ToDo list ... so here you go.

If you are interested only in a list of missing atoms or residues as defined in the header, you don't need Chimera at all. These are standard strings in the PDB header. Just use grep:

$ grep -H "REMARK 465 " *.pdb
$ grep -H "REMARK 470 " *.pdb
Option -c gives you a count of how many REMARK records of this type are in each file, for a more concise output. 
(cf. http://www.wwpdb.org/documentation/format33/remarks2.html) You might also want to consider REMARKs 475 and 480 ... fantasy coordinates.

On the other hand, if you don't trust the PDB header records (I myself wouldn't), and want to compare with a topology library and list all missing atoms, a small modification of Elaine's script will work. If its only "certain"(?) residues you are looking for, add a test for "r.type" after "for r in m.residues:"


Disclaimer: I am new to this, so if this code can be improved I am happy to learn :-) 

# =================================================================
# listMissingAtoms.py
# Compares the template for each residue in all models with the
# actually existing atoms. Lists all atoms that are missing
# from the model. Considers only existing residues and does not
# consider header REMARKs.
# Test with PDB files 4CH8, 3G85 and 4UMN in a local directory.
# 3G85 has missing residues, but all observed residues are complete.

import chimera
import os
from os.path import expanduser

listH = False  # Set to True if missing H atoms should also be listed

# close all currently open models
for x in chimera.openModels.list(all=True):

os.chdir(expanduser("~") + "/test") # cd to a subdirectory of my home directory
pdbFiles = [f for f in os.listdir('.') if f.endswith(".pdb")]

out = ''

for pdb in pdbFiles:
    chimera.openModels.open(os.getcwd() + "/" + pdb)

    for m in chimera.openModels.list(modelTypes=[chimera.Molecule]):
        missing = []
        for r in m.residues:
            tmplRes = chimera.restmplFindResidue(r.type, False, False)
            atoms = ''
            if not tmplRes:
                continue  # this residue type is not in the template library
            foundAt = r.atomNames() # atoms in pdbfile
            for templAt in tmplRes.atomsMap.keys(): # atoms in template
                if ((templAt[0] != "H" or listH) and
                     templAt not in foundAt):   # atom in template is not in pdbfile
                    # add this atom to the string
                    atoms += templAt + " "
            if atoms: # string is not empty
                res = ("%3s %4d%1s%1s: " %
                          r.id.insertionCode if r.id.insertionCode else " ",
                          r.id.chainId) +
                # add this residue information to "missing"  
        if missing: # mising atoms were found ...
            # add the pdb filename to the output string
            out += "\n\n" + pdb + "\n"
            for line in missing:
                # add all residues to output string
                out += "    " + line + "\n"
    for x in chimera.openModels.list(all=True):


# === END =========================================================

On Nov 20, 2014, at 1:48 PM, Eric Pettersen <pett at cgl.ucsf.edu> wrote:

> On Nov 20, 2014, at 7:25 AM, Korbin West <khwest16 at wabash.edu> wrote:
>> Hi,
>> I'm trying to find missing atoms in certain residues. I know they are in the pdb header information, but is there a way I can use Chimera or python script to access this using the command line only? I have a large number of proteins to go through, so checking them all individually is not really efficient.
> Every Molecule model has a pdbHeaders attribute that is a dictionary whose keys are PDB records types (e.g. "REMARK", "HETNAM") and whose values are lists of text lines for that record type.  Here's some sample code that would print out all the REMARK records of each model, prefixed by the model name:
> from chimera import openModels, Molecule
> for m in openModels.list(modelTypes=[Molecule]):
> 	try:
> 		for rec in m.pdbHeaders["REMARK"]:
> 			print m.name, rec
> 	except KeyError:
> 		print m.name, "has no REMARK records"
> Maybe you can adapt the above into a script that does what you need.
> --Eric
>                         Eric Pettersen
>                         UCSF Computer Graphics Lab
>                         http://www.cgl.ucsf.edu
> _______________________________________________
> Chimera-users mailing list
> Chimera-users at cgl.ucsf.edu
> http://plato.cgl.ucsf.edu/mailman/listinfo/chimera-users

More information about the Chimera-users mailing list