Molecular Data in Chimera 2

The Messy Parts

Eric Pettersen

UCSF Computer Graphics Lab

Mostly the Same

Most of the molecular data and API are unchanged from Chimera 1. Anything not explicitly mentioned in this presentation remains intact.

Changes to:

Altloc handling
Per-coordset attrs and pseudobonds
Associated models
Sequences/chains/chain IDs/alignments
Miscellaneous

Altloc Handling

In Chimera 1, each altloc had a separate atom instance. This produced problems in determining how many bonds an atom actually had, how many atoms were in a residue, etc.

In Chimera 2, atoms will have a "current altloc" (much like molecules have a current coordinate set):

Changing an atom's altloc may also change its neighbors, grand-neighbors, etc. to be consistent with the new altloc.
Altlocs are ordered from most to least probable, using the Chimera 1 criteria computed on an aggregate basis (i.e. across connected altloc sets).
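
The "current altloc" idea can be sketched as follows. This is only an illustration under stated assumptions, not the Chimera 2 API: the class `Atom`, the method `set_altloc`, and the helper `bond` are hypothetical names, and probability ordering of altlocs is not modeled. Each atom is a single instance (so bond and atom counts stay correct), and switching altloc spreads through bonded atoms that also carry that altloc.

```python
class Atom:
    """Toy atom with one coordinate per altloc (hypothetical, for illustration)."""

    def __init__(self, name, coords_by_altloc):
        self.name = name
        # e.g. {"A": (x, y, z), "B": (x, y, z)}; a single "" key means no alternates
        self.coords_by_altloc = dict(coords_by_altloc)
        self.neighbors = []
        self.current_altloc = next(iter(self.coords_by_altloc))

    @property
    def coord(self):
        """Coordinate for the current altloc."""
        return self.coords_by_altloc[self.current_altloc]

    def set_altloc(self, altloc):
        """Switch this atom, then every connected atom that also has `altloc`."""
        if altloc not in self.coords_by_altloc:
            raise ValueError(f"{self.name} has no altloc {altloc!r}")
        seen = set()
        stack = [self]
        while stack:
            atom = stack.pop()
            if id(atom) in seen or altloc not in atom.coords_by_altloc:
                continue  # already handled, or outside the connected altloc set
            seen.add(id(atom))
            atom.current_altloc = altloc
            stack.extend(atom.neighbors)


def bond(a, b):
    """Record a bond between two atoms (one instance per atom, whatever the altloc)."""
    a.neighbors.append(b)
    b.neighbors.append(a)
```

In this sketch, propagation stops at the first atom that lacks the requested altloc, which is one possible reading of "connected altloc set".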

Per-Coordset Attrs and Pseudobonds

Attributes—Fetching
If an attr is not found in an atom/bond/residue/molecule, it is then looked up in the coordset. The coordset lookup uses a special call rather than a simple getattr, both to avoid clashing with built-in coordset attrs/methods and to keep the same attr on different instances from colliding.
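
A minimal sketch of that fallback, assuming hypothetical names throughout (`get_traj_attr`, `CoordSet._traj_attrs`, `Molecule.current`); none of these are taken from the Chimera 2 code. Coordset values are keyed by object identity plus attr name, so two atoms with the same attr do not collide and nothing touches the coordset's own attributes.

```python
_MISSING = object()  # sentinel: lets None be a legitimate default


class CoordSet:
    def __init__(self):
        # (id of owning object, attr name) -> value
        self._traj_attrs = {}


class Molecule:
    def __init__(self, num_coordsets=1):
        self.coordsets = [CoordSet() for _ in range(num_coordsets)]
        self.current = 0  # index of the current coordset


class Atom:
    def __init__(self, molecule):
        self.molecule = molecule


def get_traj_attr(obj, name, default=_MISSING, coordset=None):
    """Look on the object first, then in the requested (default: current) coordset."""
    try:
        return getattr(obj, name)
    except AttributeError:
        pass
    mol = obj.molecule
    index = mol.current if coordset is None else coordset
    value = mol.coordsets[index]._traj_attrs.get((id(obj), name), default)
    if value is _MISSING:
        raise AttributeError(name)
    return value
```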

Attributes—Setting
Per-coordset attrs are set with a special call (set_traj_attr?) that first checks whether the attr exists in the main object. If it does, the existing value is copied into all coordsets and removed from the main object. In either case, the attr is then set in the requested coordset (by default, the current coordset). Molecule models will have an is_traj attr so that trajectories can be identified even if only one coordset is loaded.
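
The setting rule can be sketched as below. The name `set_traj_attr` comes from the slide (where it is still tentative); the signature, the `traj_attrs` dictionary, and the toy classes are assumptions made for illustration only.

```python
class CoordSet:
    def __init__(self):
        self.traj_attrs = {}  # (id of owning object, attr name) -> value


class Molecule:
    def __init__(self, num_coordsets=1, is_traj=False):
        self.coordsets = [CoordSet() for _ in range(num_coordsets)]
        self.current = 0        # index of the current coordset
        self.is_traj = is_traj  # True even when only one coordset is loaded


class Atom:
    def __init__(self, molecule):
        self.molecule = molecule


def set_traj_attr(obj, name, value, coordset=None):
    """Set a per-coordset attr, migrating any value stored on the object itself."""
    mol = obj.molecule
    key = (id(obj), name)
    if name in vars(obj):
        # The attr lives on the main object: copy it into every coordset,
        # then remove it from the main object.
        old = vars(obj).pop(name)
        for cs in mol.coordsets:
            cs.traj_attrs[key] = old
    index = mol.current if coordset is None else coordset
    mol.coordsets[index].traj_attrs[key] = value
```

After a migration, coordsets other than the requested one keep the old value, so the attr is defined for every frame.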

Per-Coordset Attrs and Pseudobonds

Pseudobonds
There will be two types of pseudobond groups (inheriting from a common base). One is essentially the same as Chimera 1. The other is associated with a molecule and only has pseudobonds between atoms of that molecule. It will store its pseudobonds in the coordsets of that molecule, and will close when the molecule closes. The pseudobond group manager holds the first type and has a convenience function for getting all groups, but each molecule is the actual manager of its groups.
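
One way to picture the two group types is the sketch below. All class and method names (`PseudobondGroup`, `GlobalPseudobondGroup`, `MoleculePseudobondGroup`, `PseudobondGroupManager`, `all_groups`) are hypothetical, and passing the open molecules to `all_groups` is an assumption made to keep the example self-contained.

```python
class CoordSet:
    def __init__(self):
        self.pseudobonds = {}  # group category -> list of (atom, atom)


class Atom:
    def __init__(self, molecule):
        self.molecule = molecule


class PseudobondGroup:
    """Common base for both group types."""

    def __init__(self, category):
        self.category = category


class GlobalPseudobondGroup(PseudobondGroup):
    """Chimera 1 style group: may link atoms of any molecules."""

    def __init__(self, category):
        super().__init__(category)
        self._pbs = []

    def add(self, a1, a2):
        self._pbs.append((a1, a2))

    def pseudobonds(self):
        return list(self._pbs)


class MoleculePseudobondGroup(PseudobondGroup):
    """Group tied to one molecule; its pseudobonds live in that molecule's coordsets."""

    def __init__(self, category, molecule):
        super().__init__(category)
        self.molecule = molecule

    def _coordset(self, index):
        mol = self.molecule
        return mol.coordsets[mol.current if index is None else index]

    def add(self, a1, a2, coordset=None):
        if a1.molecule is not self.molecule or a2.molecule is not self.molecule:
            raise ValueError("both atoms must belong to the group's molecule")
        pbs = self._coordset(coordset).pseudobonds
        pbs.setdefault(self.category, []).append((a1, a2))

    def pseudobonds(self, coordset=None):
        return list(self._coordset(coordset).pseudobonds.get(self.category, []))


class Molecule:
    def __init__(self, num_coordsets=1):
        self.coordsets = [CoordSet() for _ in range(num_coordsets)]
        self.current = 0
        self.pb_groups = {}  # the molecule manages its own groups

    def pb_group(self, category):
        if category not in self.pb_groups:
            self.pb_groups[category] = MoleculePseudobondGroup(category, self)
        return self.pb_groups[category]

    def close(self):
        # Closing the molecule closes its groups and drops their pseudobonds.
        for cs in self.coordsets:
            cs.pseudobonds.clear()
        self.pb_groups.clear()


class PseudobondGroupManager:
    """Holds the global groups; all_groups() is only a convenience view."""

    def __init__(self):
        self.global_groups = {}

    def global_group(self, category):
        if category not in self.global_groups:
            self.global_groups[category] = GlobalPseudobondGroup(category)
        return self.global_groups[category]

    def all_groups(self, molecules):
        groups = list(self.global_groups.values())
        for mol in molecules:
            groups.extend(mol.pb_groups.values())
        return groups
```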

Associated/Subsidiary Models

In Chimera 1, a model can have associated models that close when it closes. In Chimera 2, an associated model is more like a "subsidiary" model: it closes when the primary closes, moves when the primary moves, and is shown in the model panel "under" the primary model.

Unlike Chimera 1, a molecular surface model in Chimera 2 will be associated with its molecule.

To keep a subsidiary model open past the closure of its primary, it needs to be copied and made primary. In the case of a molecular surface, it would also need to be converted into a generic surface.
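
The subsidiary behavior described above can be sketched like this. The `Model` class, its `kind` strings, and `copy_as_primary` are hypothetical names chosen for the example, not the Chimera 2 model API.

```python
class Model:
    """Toy model with optional subsidiary models (hypothetical, for illustration)."""

    def __init__(self, name, kind="generic", primary=None):
        self.name = name
        self.kind = kind
        self.position = (0.0, 0.0, 0.0)
        self.is_open = True
        self.primary = primary
        self.subsidiaries = []  # shown "under" this model in the model panel
        if primary is not None:
            primary.subsidiaries.append(self)

    def move(self, dx, dy, dz):
        """Moving a primary moves its subsidiaries too."""
        x, y, z = self.position
        self.position = (x + dx, y + dy, z + dz)
        for sub in self.subsidiaries:
            sub.move(dx, dy, dz)

    def close(self):
        """Closing a primary closes its subsidiaries first."""
        for sub in list(self.subsidiaries):
            sub.close()
        if self.primary is not None:
            self.primary.subsidiaries.remove(self)
            self.primary = None
        self.is_open = False

    def copy_as_primary(self):
        """Return an independent copy that survives the original primary's closure."""
        # A molecular surface cannot outlive its molecule, so the copy
        # becomes a generic surface.
        kind = "generic surface" if self.kind == "molecular surface" else self.kind
        clone = Model(self.name + " copy", kind=kind)
        clone.position = self.position
        return clone
```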

Sequences/Chains et al.

Definitions

Sequence: Essentially a glorified list of letters
Chain: Subclass of Sequence, with a corresponding list of polymeric residues, some of which may be None/NULL. Has an associated chain ID.

Nitty-Gritty Details
Chain IDs

Chain IDs can be multiple characters, but should only be so in extreme cases (> 62 chains in an asymmetric unit).
Chain IDs need not be unique.
Chain IDs are first looked up in the residue's chain (polymeric residues) and then in the residue itself. Singleton residues can have non-blank chain IDs.
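
The lookup order for chain IDs can be sketched as follows, assuming hypothetical `Chain` and `Residue` classes with a `chain_id` property; the real attribute names may differ.

```python
class Chain:
    def __init__(self, chain_id, residues):
        self.chain_id = chain_id        # may be longer than one character
        self.residues = list(residues)  # may contain None for missing structure
        for res in self.residues:
            if res is not None:
                res.chain = self


class Residue:
    def __init__(self, name, own_chain_id=" "):
        self.name = name
        self.chain = None               # set when the residue joins a chain
        self._own_chain_id = own_chain_id

    @property
    def chain_id(self):
        # Polymeric residues report their chain's ID; singleton residues
        # (waters, ligands, ions) fall back to the ID stored on the residue.
        if self.chain is not None:
            return self.chain.chain_id
        return self._own_chain_id
```

Nothing here enforces uniqueness, so two `Chain` objects can share an ID.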

Sequences/Chains et al.

More Nitty-Gritty Details
Chains

Chains are formed from residues connected through a polymeric bond or an implied polymeric bond (missing structure).
Other connecting bonds (e.g. disulphide bonds) are irrelevant.
Connecting two residues through a polymeric bond combines their chains, with the chain ID being the "earlier" of the two (N-terminal, 5' end). No implicit residue renumbering occurs.
Splitting chains at a polymeric bond produces two chains (assuming multiple residues in both parts) with the same ID.
Unlike the list of residues in a molecule, the list of residues in a chain can contain None/NULL values. Using empty residues is a possible alternative, but it has issues with indeterminate sequence numbers and insertion codes, and with residues not directly owned by the molecule (i.e. not in molecule.residues).
For the sake of alignments containing chains, when a chain is destroyed it first fires a "demotion to sequence" trigger with a copy of itself as a sequence in the trigger data.
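
The chain rules above can be sketched as below. Everything here is an assumption for illustration: the functions `join` and `split`, the `destroy` method, and the class-level `demotion_handlers` list (a stand-in for the real trigger mechanism) are hypothetical, and residues are represented by plain strings.

```python
class Sequence:
    """Essentially a list of letters."""

    def __init__(self, letters):
        self.letters = list(letters)

    def __str__(self):
        return "".join(self.letters)


class Chain(Sequence):
    """A Sequence with a parallel residue list (None marks missing structure)."""

    demotion_handlers = []  # stand-in for the "demotion to sequence" trigger

    def __init__(self, chain_id, letters, residues):
        super().__init__(letters)
        if len(residues) != len(self.letters):
            raise ValueError("need exactly one residue slot per letter")
        self.chain_id = chain_id
        self.residues = list(residues)

    def destroy(self):
        # Fire the trigger first, handing listeners a plain-Sequence copy.
        snapshot = Sequence(self.letters)
        for handler in Chain.demotion_handlers:
            handler(self, snapshot)
        self.residues = []


def join(earlier, later):
    """Combine two chains; the earlier (N-terminal / 5') chain's ID wins."""
    # Residues are carried over untouched: no implicit renumbering.
    return Chain(earlier.chain_id,
                 earlier.letters + later.letters,
                 earlier.residues + later.residues)


def split(chain, index):
    """Split at a polymeric bond; both parts keep the same chain ID."""
    return (Chain(chain.chain_id, chain.letters[:index], chain.residues[:index]),
            Chain(chain.chain_id, chain.letters[index:], chain.residues[index:]))
```

An alignment holding a chain could register a handler and swap in the plain `Sequence` copy when the chain goes away.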

Sequences/Chains et al.

Even More Nitty-Gritty Details

Alignments
Alignments will be classes so that they can be handled in nogui mode, but unlike sequences/chains they will be Python classes.

Miscellaneous

MolResIds get the heave-ho.
No covalent bond across a missing-structure gap — just a pseudobond. A read-only atom property (numSubstituents?) returns a value that accounts for the pseudobond.
Not really a change: no persistent empty molecules/residues/chains
PDB mixtures handled as separate structures
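
The missing-structure item can be sketched as follows. The property name `numSubstituents` is the tentative one from the slide; the `Atom` class and its `bonds` and `gap_pseudobonds` lists are assumptions made for this example.

```python
class Atom:
    def __init__(self, name):
        self.name = name
        self.bonds = []            # covalently bonded atoms
        self.gap_pseudobonds = []  # partners across a missing-structure gap

    @property
    def numSubstituents(self):
        """Read-only count that also includes partners across a gap."""
        return len(self.bonds) + len(self.gap_pseudobonds)


def add_bond(a, b):
    a.bonds.append(b)
    b.bonds.append(a)


def add_gap_pseudobond(a, b):
    # No covalent bond is created across the gap, only a pseudobond.
    a.gap_pseudobonds.append(b)
    b.gap_pseudobonds.append(a)
```

Because the property has no setter, assigning to it raises `AttributeError`, which matches the read-only intent.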

San Francisco, Feb 2013