Molecular Data in Chimera 2
The Messy Parts
Eric Pettersen
UCSF Computer Graphics Lab
Mostly the Same
Most molecular data/API remains unchanged. Unless mentioned explicitly in this presentation, such data/API remains intact.
Changes to:
- Altloc handling
- Per-coordset attrs and pseudobonds
- Associated models
- Sequences/chains/chain IDs/alignments
- Miscellaneous
Altloc Handling
In Chimera 1, each altloc had a separate atom instance. This produced problems in determining how many bonds an atom actually had, how many atoms were in a residue, etc.
In Chimera 2, atoms will have a "current altloc" (much like molecules have a current coordinate set):
- Changing an atom's altloc may also change its neighbors, grand-neighbors, etc. to be consistent with the new altloc.
- Altlocs are ordered from most to least probable, using the Chimera 1 criteria computed on an aggregate basis (i.e. across connected altloc sets).
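A minimal sketch of the current-altloc idea follows. The names (Atom, set_altloc, bond) are illustrative assumptions, not the Chimera 2 API, and the propagation rule (stop at atoms that lack the requested altloc) is only one plausible reading of "consistent with the new altloc".

```python
class Atom:
    """Hypothetical atom: one instance, one coordinate per altloc."""

    def __init__(self, name, altloc_coords):
        self.name = name
        self.altloc_coords = dict(altloc_coords)  # altloc label -> (x, y, z)
        self.neighbors = []
        # Assumption: the first altloc supplied is the most probable one.
        self.current_altloc = next(iter(self.altloc_coords))

    @property
    def coord(self):
        return self.altloc_coords[self.current_altloc]

    def set_altloc(self, altloc):
        """Switch this atom and, transitively, bonded atoms that have `altloc`."""
        seen = set()
        stack = [self]
        while stack:
            atom = stack.pop()
            if id(atom) in seen or altloc not in atom.altloc_coords:
                continue  # already handled, or this atom has no such altloc
            seen.add(id(atom))
            atom.current_altloc = altloc
            stack.extend(atom.neighbors)


def bond(a, b):
    a.neighbors.append(b)
    b.neighbors.append(a)


ca = Atom("CA", {"": (0.0, 0.0, 0.0)})  # no alternate locations
cb = Atom("CB", {"A": (1.5, 0.0, 0.0), "B": (1.4, 0.3, 0.0)})
cg = Atom("CG", {"A": (2.5, 1.0, 0.0), "B": (2.2, 1.4, 0.0)})
bond(ca, cb)
bond(cb, cg)

cb.set_altloc("B")  # CG follows CB; CA is untouched
```

Because each atom is a single instance, counting its bonds (len(cb.neighbors)) no longer depends on which altloc is being viewed.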
Per-Coordset Attrs and Pseudobonds
Attributes—Fetching
If an attr is not found in an atom/bond/residue/molecule, it is then looked up in the coordset. The lookup uses a special call rather than a simple getattr, to avoid clashing with built-in coordset attrs/methods and with the same attr of other instances.
Attributes—Setting
Per-coordset attrs are set with a special call (set_traj_attr?) that first checks to see if the attr exists in the main object. If it does exist there, it is copied into all coordsets and destroyed in the main object. In either case, the attr is then set in the requested coordset (default: current coordset).
Molecule models will have an is_traj attr so that trajectories can be identified even if only one coordset is loaded.
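The fetch and set rules above can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: get_traj_attr, the traj_attrs dictionary, and the class layouts are hypothetical, and set_traj_attr is only the tentative name mentioned on the slide.

```python
class CoordSet:
    def __init__(self, cs_id):
        self.id = cs_id
        # Keyed by (object, attr name), so per-object values neither collide
        # with each other nor with the coordset's own attributes.
        self.traj_attrs = {}


class Molecule:
    def __init__(self, num_coordsets, is_traj=False):
        self.coordsets = [CoordSet(i) for i in range(num_coordsets)]
        self.current = self.coordsets[0]
        self.is_traj = is_traj  # stored explicitly, not inferred from the count


class Atom:
    def __init__(self, molecule, name):
        self.molecule = molecule
        self.name = name


def get_traj_attr(obj, name, coordset=None):
    """Look in the object first, then in the coordset (default: current)."""
    if name in vars(obj):
        return vars(obj)[name]
    cs = obj.molecule.current if coordset is None else coordset
    try:
        return cs.traj_attrs[(obj, name)]
    except KeyError:
        raise AttributeError(name) from None


def set_traj_attr(obj, name, value, coordset=None):
    """Set a per-coordset value, first migrating any object-level value."""
    mol = obj.molecule
    if name in vars(obj):
        shared = vars(obj).pop(name)  # destroy in the main object
        for cs in mol.coordsets:      # copy into all coordsets
            cs.traj_attrs[(obj, name)] = shared
    cs = mol.current if coordset is None else coordset
    cs.traj_attrs[(obj, name)] = value


mol = Molecule(3, is_traj=True)
atom = Atom(mol, "CA")
atom.charge = -0.5  # ordinary attribute on the atom
set_traj_attr(atom, "charge", 0.25, coordset=mol.coordsets[1])
```

After the call, the atom no longer carries charge itself: coordsets 0 and 2 hold the migrated value and coordset 1 holds the new one.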
Per-Coordset Attrs and Pseudobonds
Pseudobonds
There will be two types of pseudobond groups (inheriting from a common base). One is essentially the same as Chimera 1. The other is associated with a molecule and only has pseudobonds between atoms of that molecule. It will store its pseudobonds in the coordsets of that molecule, and will close when the molecule closes. The pseudobond group manager holds the first type and has a convenience function for getting all groups, but each molecule is the actual manager of its groups.
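One way to picture the two group types is sketched below. All class and method names are hypothetical, atoms are reduced to strings, and keying coordset storage by category is an assumption made for brevity.

```python
class PseudobondGroup:
    """Common base: a named category of pseudobonds."""

    def __init__(self, category):
        self.category = category


class GlobalGroup(PseudobondGroup):
    """Chimera 1 style: may link atoms of any models; keeps its own list."""

    def __init__(self, category):
        super().__init__(category)
        self._pbonds = []

    def add(self, atom1, atom2):
        self._pbonds.append((atom1, atom2))

    def pseudobonds(self):
        return list(self._pbonds)


class MoleculeGroup(PseudobondGroup):
    """Owned by one molecule; pseudobonds live in that molecule's coordsets."""

    def __init__(self, category, molecule):
        super().__init__(category)
        self.molecule = molecule

    def add(self, atom1, atom2, coordset=None):
        mol = self.molecule
        if atom1 not in mol.atoms or atom2 not in mol.atoms:
            raise ValueError("both atoms must belong to the owning molecule")
        cs = mol.current_coordset if coordset is None else coordset
        mol.coordset_pbonds[cs].setdefault(self.category, []).append((atom1, atom2))

    def pseudobonds(self, coordset=None):
        mol = self.molecule
        cs = mol.current_coordset if coordset is None else coordset
        return list(mol.coordset_pbonds[cs].get(self.category, []))


class Molecule:
    def __init__(self, atoms, num_coordsets=1):
        self.atoms = set(atoms)
        self.current_coordset = 0
        self.coordset_pbonds = [{} for _ in range(num_coordsets)]
        self.groups = {}  # the molecule manages its own groups

    def pseudobond_group(self, category):
        if category not in self.groups:
            self.groups[category] = MoleculeGroup(category, self)
        return self.groups[category]

    def close(self):
        self.groups.clear()  # per-molecule groups close with the molecule


class PseudobondManager:
    def __init__(self):
        self.global_groups = {}
        self.molecules = []

    def all_groups(self):
        """Convenience: global groups plus every molecule's own groups."""
        groups = list(self.global_groups.values())
        for mol in self.molecules:
            groups.extend(mol.groups.values())
        return groups


mgr = PseudobondManager()
mol = Molecule(["N", "CA", "C", "O"], num_coordsets=2)
mgr.molecules.append(mol)
mgr.global_groups["distances"] = GlobalGroup("distances")
hbonds = mol.pseudobond_group("hbonds")
hbonds.add("N", "O", coordset=0)
hbonds.add("CA", "O", coordset=1)
```

In this sketch the manager only aggregates; creating and closing a per-molecule group goes through the molecule itself.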
Associated/Subsidiary Models
In Chimera 1, a model can have associated models that close when it closes.
In Chimera 2, an associated model is more like a "subsidiary" model:
it closes when the primary closes, moves when the primary moves, and is shown in the
model panel "under" the primary model.
Unlike Chimera 1, a molecular surface model in Chimera 2 will be associated with
its molecule.
To keep a subsidiary model open past the closure of its primary, it needs to be
copied and made primary. In the case of a molecular surface, it would also need to be
converted into a generic surface.
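The subsidiary relationship might look like the sketch below. The Model class, its methods, and the model names are illustrative assumptions; the conversion of a molecular surface into a generic surface is not modeled here.

```python
class Model:
    """Hypothetical model with optional subsidiary models."""

    def __init__(self, name, primary=None):
        self.name = name
        self.primary = primary
        self.subsidiaries = []
        self.position = (0.0, 0.0, 0.0)
        self.is_open = True
        if primary is not None:
            primary.subsidiaries.append(self)

    def move(self, dx, dy, dz):
        x, y, z = self.position
        self.position = (x + dx, y + dy, z + dz)
        for sub in self.subsidiaries:  # subsidiaries move with the primary
            sub.move(dx, dy, dz)

    def close(self):
        for sub in self.subsidiaries:  # subsidiaries close with the primary
            sub.close()
        self.subsidiaries = []
        self.is_open = False

    def copy_as_primary(self):
        """Return an independent copy that has no primary of its own."""
        clone = Model(self.name + " copy")
        clone.position = self.position
        return clone


molecule = Model("1abc")
surface = Model("1abc surface", primary=molecule)
molecule.move(1.0, 2.0, 0.0)

kept = surface.copy_as_primary()  # copy first, then close the primary
molecule.close()
```

The copy survives because it was never registered under the primary; the original surface closes along with its molecule.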
Sequences/Chains et al.
Definitions
- Sequence
- Essentially a glorified list of letters
- Chain
- Subclass of Sequence, with a corresponding list of polymeric residues,
some of which may be None/NULL. Has an associated chain ID.
Nitty-Gritty Details
Chain IDs
- Chain IDs can be multiple characters, but should only be so in
extreme cases (> 62 chains in an asymmetric unit).
- Chain IDs need not be unique.
- Chain IDs are first looked up in the residue's chain (polymeric residues)
and then in the residue itself. Singleton residues can have non-blank chain IDs.
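The chain ID lookup order can be sketched as a property. The class names and the blank default ID are assumptions for illustration only.

```python
class Chain:
    def __init__(self, chain_id):
        self.chain_id = chain_id


class Residue:
    def __init__(self, name, chain=None, chain_id=" "):
        self.name = name
        self.chain = chain         # set for polymeric residues
        self._chain_id = chain_id  # used only when there is no chain

    @property
    def chain_id(self):
        # First the residue's chain, then the residue itself.
        if self.chain is not None:
            return self.chain.chain_id
        return self._chain_id


chain_a = Chain("A")
gly = Residue("GLY", chain=chain_a)  # polymeric: ID comes from the chain
hoh = Residue("HOH", chain_id="A")   # singleton with a non-blank ID
zn = Residue("ZN")                   # singleton with a blank ID
```

Note that gly and hoh report the same ID even though only one of them belongs to a chain, which is consistent with chain IDs not being unique.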
Sequences/Chains et al.
More Nitty-Gritty Details
Chains
- For residues connected through a polymeric bond, or an implied
polymeric bond (missing structure)
- Other connecting bonds (e.g. disulphide bonds) are irrelevant.
- Connecting two residues through a polymeric bond combines their chains,
with the chain ID being the "earlier" of the two (N-terminal, 5' end).
No implicit residue renumbering occurs.
- Splitting chains at a polymeric bond produces two chains (assuming multiple
residues in both parts) with the same ID.
- Unlike the list of residues in a molecule, the list of residues in a chain
can contain None/NULL values. Using empty residues is a possible alternative,
but has issues with indeterminate sequence numbers and insertion codes and
with residues not directly owned by the molecule (i.e. not in
molecule.residues).
- For the sake of alignments containing chains, when a chain is destroyed it
first fires a "demotion to sequence" trigger with a copy of itself as
a sequence in the trigger data.
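The joining, splitting, and demotion rules above can be sketched as follows. The function names (join, split, destroy) and the handler signature are hypothetical, and residues are reduced to strings so that the retained numbering is visible.

```python
class Sequence:
    """Essentially a list of letters."""

    def __init__(self, letters):
        self.letters = list(letters)


class Chain(Sequence):
    """Sequence plus a parallel residue list that may contain None."""

    def __init__(self, chain_id, letters, residues):
        super().__init__(letters)
        self.residues = list(residues)
        if len(self.residues) != len(self.letters):
            raise ValueError("need one residue slot per letter")
        self.chain_id = chain_id


def join(earlier, later):
    """Combine two chains; the N-terminal / 5' chain supplies the ID."""
    return Chain(earlier.chain_id,
                 earlier.letters + later.letters,
                 earlier.residues + later.residues)


def split(chain, index):
    """Split at a polymeric bond; both parts keep the same chain ID."""
    return (Chain(chain.chain_id, chain.letters[:index], chain.residues[:index]),
            Chain(chain.chain_id, chain.letters[index:], chain.residues[index:]))


def destroy(chain, handlers):
    """Fire a 'demotion to sequence' trigger carrying a plain Sequence copy."""
    demoted = Sequence(chain.letters)
    for handler in handlers:
        handler(chain, demoted)


a = Chain("A", "GS", ["GLY 1", "SER 2"])
b = Chain("B", "TAK", [None, "ALA 4", "LYS 5"])  # first residue is missing
merged = join(a, b)  # residue numbers are left as they were
front, back = split(merged, 3)

received = []
destroy(merged, [lambda chain, seq: received.append(seq)])
```

An alignment listening for the trigger can swap the dying chain for the plain sequence it receives, so the alignment stays valid after the structure closes.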
Sequences/Chains et al.
Even More Nitty-Gritty Details
Alignments
Alignments will be classes so that they can be handled in nogui mode,
but unlike sequences/chains they will be Python classes.
Miscellaneous
- MolResIds get the heave-ho.
- No covalent bond across a missing-structure gap — just a pseudobond.
A read-only atom property (numSubstituents?) returns a value that accounts
for the pseudobond.
- Not really a change: no persistent empty molecules/residues/chains
- PDB mixtures handled as separate structures
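The missing-structure bullet above could work roughly as sketched here. This assumes that "accounts for the pseudobond" means counting it as one substituent; num_substituents is a Python-style stand-in for the tentative numSubstituents name, and the other names are hypothetical.

```python
class Atom:
    def __init__(self, name):
        self.name = name
        self.bonds = []            # covalent bonds only
        self.gap_pseudobonds = []  # missing-structure pseudobonds

    @property
    def num_substituents(self):
        # Read-only: covalent neighbors plus neighbors implied across a gap.
        return len(self.bonds) + len(self.gap_pseudobonds)


def add_bond(a, b):
    a.bonds.append(b)
    b.bonds.append(a)


def add_gap_pseudobond(a, b):
    a.gap_pseudobonds.append(b)
    b.gap_pseudobonds.append(a)


ca = Atom("CA")
c = Atom("C")
o = Atom("O")
n_after_gap = Atom("N")  # first atom on the far side of a missing segment

add_bond(c, ca)
add_bond(c, o)
add_gap_pseudobond(c, n_after_gap)  # no covalent bond across the gap
```

Here the carbonyl carbon has two covalent bonds but reports three substituents, so code that reasons about valence does not mistake the gap for a chain terminus.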