If Chimera were written completely in Python, we might have been able to use one of the persistent object packages (e.g., from Zope). However, since we have custom (C Python) objects, both from the OTF and in extensions, we need our own method of saving objects.
A primary consideration is whether to use a single file for the entire session, or to have a file for each model and a Chimera-specific file for all other objects. The former approach has the advantage of simplicity, but the latter is more flexible. In particular, if the amount of data is very large (e.g., a molecular dynamics trajectory), it may be desirable to use the original data file rather than to save models explicitly to a single file. Thus, a saved session will consist of a session file containing Chimera-specific data and a number of model data files, each in whatever format is most appropriate.
Even though the information in the session file is Chimera-specific, there
is no reason to design a new
Frequently, it would be impossible to use the original model data files because additional data is added to molecules or atoms during the session. In these cases, the model data files need to be explicitly saved. Model data file format candidates include Protein Data Bank (PDB), macromolecular Crystallographic Information File (mmCIF), Tripos (mol2), and Chemical Markup Language (CML). None of the four formats uses binary data, so all are reasonably portable. The PDB specification does not allow extension of the format; adding information via USER records is clumsy and verbose. The mmCIF specification does allow extensions, but it is difficult to write a general parser that can handle all legal mmCIF format files. The mol2 specification comes from Tripos and is a proprietary standard. This leaves CML as the only viable model data file format. CML defines a document type definition (DTD) for use with XML. Since we already want to use XML for the session file format, CML is the clear winner for the model data file format.
The major steps in creating a session file are:
Chimera core objects are created by core code under our control. Hence, they are in known modules and may be easily identified. Chimera extension objects, on the other hand, are created by extension code about which we have little knowledge. To save extension objects, we need to define a protocol for locating and communicating with them. The protocol is described in Extension Management. The combined set of core and extension objects is called the save set.
Once the save set has been identified, we need to make sure that it is closed, i.e., the set contains all the information needed to recreate the entire set of objects when the session is restored. (Note that this falls short of verifying that the set is complete, i.e., the set contains all the information needed to identically reproduce the session on restoration. Closure may be verified using only the information within the set; completeness can only be verified by examining the entire universe of objects in the session.)
For implementation purposes, a set is closed when member objects only refer to Python primitives or other member objects. Closure is guaranteed by checking all references within each object of the save set for references to non-member objects. The standard Python introspection mechanisms (e.g., type, dir, and __dict__) may be used to recursively traverse Python member objects to check for references outside of the save set. These same mechanisms are not guaranteed to work with Python extension objects implemented in C or C++; we require these objects to supply a verification method. In fact, for any object that has a chimeraSessionVerification method, the method should be used rather than the standard introspection procedure. In addition to checking for closure, we need to check that each member object has a procedure for converting itself to XML (see below).
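The closure check can be sketched in Python as follows. This is a minimal illustration, not Chimera code: the helper names `outside_references` and `is_closed` are hypothetical, and the signature assumed for chimeraSessionVerification (it receives the member table and returns the non-member objects it refers to) is an assumption, since the text does not define it.

```python
PRIMITIVES = (type(None), bool, int, float, complex, str, bytes)
CONTAINERS = (list, tuple, set, frozenset)


def outside_references(obj, members):
    """Return objects reachable from obj that are neither primitives nor members.

    members maps id(object) -> object for every object in the save set.
    """
    # Objects that supply their own hook are trusted over introspection.
    # Assumed signature: hook(members) -> iterable of non-member objects.
    verify = getattr(obj, "chimeraSessionVerification", None)
    if callable(verify):
        return list(verify(members))

    outside = []
    seen = set()
    # Pure-Python objects expose their attributes through __dict__.
    stack = list(vars(obj).values()) if hasattr(obj, "__dict__") else []
    while stack:
        value = stack.pop()
        if isinstance(value, PRIMITIVES) or id(value) in members:
            continue
        if id(value) in seen:
            continue
        seen.add(id(value))
        if isinstance(value, dict):
            # Look inside built-in containers rather than reporting them.
            stack.extend(value.keys())
            stack.extend(value.values())
        elif isinstance(value, CONTAINERS):
            stack.extend(value)
        else:
            outside.append(value)
    return outside


def is_closed(save_set):
    """True when no member refers to an object outside the save set."""
    members = {id(obj): obj for obj in save_set}
    return all(not outside_references(obj, members) for obj in save_set)
```

The sketch treats built-in containers as transparent, so a list of member objects does not break closure, while a list that holds a non-member does.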
If a save set is incomplete, there are two approaches to make the set complete:
The former approach tries to generate a superset of the initial save set, but has the requirement that member objects must be able to identify referenced objects not in the save set. The latter approach finds a subset of the initial save set, but has the drawback that some objects will not be saved in the session.
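The superset and subset strategies can be sketched side by side. Both functions below are illustrative only; `references` is a hypothetical caller-supplied callable that returns the non-primitive objects a given object refers to, which is exactly the capability the superset approach requires of member objects.

```python
def close_by_growing(save_set, references):
    """Superset approach: add every referenced non-member until none remain."""
    members = {id(obj): obj for obj in save_set}
    pending = list(save_set)
    while pending:
        obj = pending.pop()
        for target in references(obj):
            if id(target) not in members:
                members[id(target)] = target
                pending.append(target)  # the new member's references need checking too
    return list(members.values())


def close_by_shrinking(save_set, references):
    """Subset approach: repeatedly drop members that refer to non-members."""
    members = {id(obj): obj for obj in save_set}
    changed = True
    while changed:
        changed = False
        for key, obj in list(members.items()):
            if any(id(target) not in members for target in references(obj)):
                del members[key]  # removal may invalidate other members
                changed = True
    return list(members.values())
```

Note that shrinking can cascade: removing one object may leave its referrers with dangling references, so they are removed on a later pass.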
Again, if Chimera were written completely in Python, conversion of a set of Python objects to XML would be easy. The marshal module of the XML package from the Python XML SIG already does this. Unfortunately, Chimera has a number of Python extension objects (wrappy-generated wrappers around Object Technology Framework C++ objects), so the standard code will not work as is. The division into session file and model data files further complicates the conversion procedure.
The first problem in saving a set of objects is how we save object references. Since object references have no obvious textual counterpart, we need to create unique identifiers for all objects and store object references as identifiers of the target objects. Python already provides such a mechanism: the built-in id function. The main hitch to using this function comes when we want to use the original model data file in the saved session. If the original model data file does not contain the Python id information (e.g., in a pristine PDB file), there is no guarantee that the same objects will retain the same Python id when the model is rebuilt in the restored session. Therefore, the unique identifier problem needs to be solved in a Chimera-specific way: by creating a mapping from Chimera unique identifiers to Python objects for use in converting object references to XML.
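One way to picture such a mapping is a small registry that hands out session-stable identifiers and resolves them back to objects. The class name `IdRegistry`, its methods, and the identifier formats ("obj1", "model0/atom3") are all hypothetical; the text does not specify how Chimera identifiers are formed.

```python
import itertools


class IdRegistry:
    """Maps session-stable identifiers to live objects and back."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._by_ident = {}   # identifier -> object (also keeps the object alive)
        self._by_object = {}  # id(object) -> identifier

    def identifier(self, obj):
        """Return the identifier for obj, assigning a new one on first use."""
        key = id(obj)
        if key not in self._by_object:
            self.register("obj%d" % next(self._counter), obj)
        return self._by_object[key]

    def register(self, ident, obj):
        """Bind a caller-chosen identifier to obj.

        Useful for objects rebuilt from an original model data file, where
        the identifier can be derived from stable data such as position in
        the file instead of from the Python id.
        """
        self._by_ident[ident] = obj
        self._by_object[id(obj)] = ident

    def lookup(self, ident):
        """Resolve an identifier read from the session file."""
        return self._by_ident[ident]
```

Because the registry holds a reference to every registered object, a Python id cannot be recycled for a different object while the registry is alive, so keying the reverse table on id(obj) is safe within one save or restore operation.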
The second problem is how we save Python extension objects written in C or C++. As with verification, standard Python objects may be saved by using the introspection mechanisms, but these mechanisms may not work with Python extension objects. We therefore require, again as with verification, that extension objects supply a method for saving themselves in XML.
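A minimal sketch of this dispatch, using the standard library's xml.etree.ElementTree, is shown below. The hook name `chimeraSessionXML`, the element and attribute names, and the `ident_of` callable (which returns the unique identifier of an object) are assumptions made for illustration; none of them comes from the Chimera design.

```python
import xml.etree.ElementTree as ET

PRIMITIVES = (type(None), bool, int, float, str)


def object_to_xml(obj, ident_of):
    """Convert one object to an XML element.

    Extension objects are expected to provide their own conversion hook
    (hypothetical name: chimeraSessionXML); plain Python objects fall back
    to introspection of their instance attributes.
    """
    hook = getattr(obj, "chimeraSessionXML", None)
    if callable(hook):
        return hook(ident_of)

    elem = ET.Element("object", {"id": ident_of(obj),
                                 "class": type(obj).__name__})
    # vars() only works for objects with a __dict__, which is why
    # C/C++ extension objects need the hook above.
    for name, value in sorted(vars(obj).items()):
        child = ET.SubElement(elem, "attr", {"name": name})
        if isinstance(value, PRIMITIVES):
            child.set("type", type(value).__name__)
            child.text = repr(value)
        else:
            # Object references are stored as identifiers of the target.
            child.set("ref", ident_of(value))
    return elem
```

Calling `ET.tostring(object_to_xml(obj, ident_of), encoding="unicode")` then yields the text that would be written to the session file for that object.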
The last problem is the order in which the objects are saved. While XML processing may be done by reading the entire document into memory and processing it as a grove of data, it is still desirable to make it possible to recreate the Chimera session in a one-pass, on-the-fly reconstruction. If the objects are saved in such a way that (most) object references refer to previously created objects, then a one-pass restoration algorithm is feasible. (Due to the division between the session file and the model data files, model objects must be saved separately from other objects. Since many of the object references are expected to be from extension objects to molecular data, the ordering problem is actually simplified by the multiple-file constraint.)
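An ordering with this property can be obtained from a depth-first, post-order traversal of the reference graph, as in the sketch below. The function name `save_order` and the `references` callable (returning the objects a given object refers to) are hypothetical. Cycles cannot be fully ordered, which is why only most references end up pointing backwards.

```python
def save_order(objects, references):
    """Order objects so that each one follows the objects it refers to.

    References that close a cycle are left as forward references; the
    restore step has to patch those after the target is created.
    """
    members = {id(obj) for obj in objects}
    ordered = []
    done = set()
    in_progress = set()

    def visit(obj):
        key = id(obj)
        if key not in members or key in done or key in in_progress:
            return
        in_progress.add(key)
        for target in references(obj):
            visit(target)  # emit referenced objects first
        in_progress.discard(key)
        done.add(key)
        ordered.append(obj)

    for obj in objects:
        visit(obj)
    return ordered
```

The recursion depth equals the longest reference chain, so a very deep object graph would call for an explicit stack instead.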
The complete session saving procedure is then:
Session restoration is the inverse of the session creation process:
As with session creation, C/C++ Python extension objects are problematic. The session file must contain enough information to identify the Python extension, function and function arguments needed to create these objects.
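To make the requirement concrete, the sketch below assumes a hypothetical session-file element that names the module, the factory function, and the literal arguments needed to recreate an object. The element layout and the function names are illustrative assumptions, and a standard library class stands in for a real extension factory.

```python
import ast
import importlib
import xml.etree.ElementTree as ET


def restore_object(elem):
    """Recreate one object from its (hypothetical) session-file element."""
    module = importlib.import_module(elem.get("module"))
    factory = getattr(module, elem.get("function"))
    # Arguments are stored as Python literals; literal_eval parses them
    # without executing arbitrary expressions.
    args = [ast.literal_eval(arg.text) for arg in elem.findall("arg")]
    return factory(*args)


def restore_session(xml_text):
    """One-pass restore: build a table from identifier to recreated object."""
    table = {}
    for elem in ET.fromstring(xml_text).iter("object"):
        table[elem.get("id")] = restore_object(elem)
    return table


# Usage example with fractions.Fraction standing in for an extension factory.
SESSION = """
<session>
  <object id="obj1" module="fractions" function="Fraction">
    <arg>3</arg>
    <arg>4</arg>
  </object>
</session>
"""
restored = restore_session(SESSION)
```

Importing a module and calling a function named in a file runs code chosen by whoever wrote that file, so a scheme like this is only appropriate for session files from a trusted source.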