Saved Session in Chimera

Design

A session file saves the state of a program in such a way that the program may be terminated and later restarted at (approximately) the same point. For Chimera, the state of the program is arrived at via a combination of module initialization code and object creation. To simplify saving sessions, we need to design the code so that only objects need to be saved in sessions; i.e., the module initialization code must set up program state to be the same for all sessions, and objects must contain all session-specific information. Thus, saving a Chimera session file reduces to saving the state of all objects, and restarting reduces to creating objects with the states from the session file.

If Chimera were written completely in Python, we might have been able to use one of the persistent object packages (e.g., from Zope). However, since we have custom Python objects implemented in C or C++, both from the Object Technology Framework (OTF) and in extensions, we need our own method of saving objects.

A primary consideration is whether to use a single file for the entire session, or to have a file for each model and a Chimera-specific file for all other objects. The former approach has the advantage of simplicity, but the latter is more flexible. In particular, if the amount of data is very large (e.g., a molecular dynamics trajectory), it may be desirable to use the original data file rather than to save models explicitly to a single file. Thus, a saved session will consist of a session file containing Chimera-specific data and a number of model data files, each in whatever format is most appropriate.

Even though the information in the session file is Chimera-specific, there is no reason to design a new format for storing the information. A number of extensible data description languages have been designed. Currently, the most promising appears to be the eXtensible Markup Language (XML), which uses Document Type Definitions (DTDs) to define file formats. By defining a Chimera session file DTD, tools for processing Chimera sessions may be easily built using available XML tools.

Frequently, it will be impossible to use the original model data files because additional data is added to molecules or atoms during the session. In these cases, the model data files need to be explicitly saved. Model data file format candidates include Protein Data Bank (PDB), macromolecular Crystallographic Information File (mmCIF), Tripos (mol2), and Chemical Markup Language (CML). None of the four formats uses binary data, so all are reasonably portable. The PDB specification does not allow extension of the format; adding information via USER records is clumsy and verbose. The mmCIF specification does allow extensions, but it is difficult to write a general parser that can handle all legal mmCIF format files. The mol2 specification comes from Tripos and is a proprietary standard. This leaves CML as the only viable model data file format. CML defines a Document Type Definition (DTD) for use with XML. Since we already want to use XML for the session file format, CML is the clear winner for the model data file format.

Session Creation

The major steps in creating a session file are:

  1. Locate the set of all objects that need to be saved;
  2. Verify that the set is closed; and
  3. Convert each object into an XML representation that can be used to recreate the object when the session is restored.

Location

Chimera core objects are created by core code under our control. Hence, they are in known modules and may be easily identified. Chimera extension objects, on the other hand, are created by extension code about which we have little knowledge. To save extension objects, we need to define a protocol for locating and communicating with them. The protocol is described in Extension Management. The combined set of core and extension objects is called the save set.

Verification

Once the save set has been identified, we need to make sure that it is closed, i.e., the set contains all the information needed to recreate the entire set of objects when the session is restored. (Note that this falls short of verifying that the set is complete, i.e., the set contains all the information needed to identically reproduce the session on restoration. Closure may be verified using only the information within the set; completeness can only be verified by examining the entire universe of objects in the session.)

For implementation purposes, a set is closed when member objects only refer to Python primitives or other member objects. Closure is verified by checking all references within each object of the save set for references to non-member objects. The standard Python introspection mechanisms (e.g., type, dir, and __dict__) may be used to recursively traverse Python member objects to check for references outside of the save set. These same mechanisms are not guaranteed to work with Python extension objects implemented in C or C++; we require these objects to supply a verification method. In fact, for any object that has a chimeraSessionVerification method, the method should be used rather than the standard introspection procedure. In addition to checking for closure, we need to check that each member object has a procedure for converting itself to XML (see below).
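The closure check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: chimeraSessionVerification is assumed to return an iterable of the objects its owner refers to (its signature is not specified here), built-in containers are treated as transparent and traversed rather than required to be members, and the helper names referenced_objects and find_outside_references are hypothetical.

```python
# Types that count as "Python primitives" for the closure test (an assumption).
PRIMITIVES = (type(None), bool, int, float, complex, str, bytes)
# Built-in containers are traversed instead of being treated as members.
CONTAINERS = (dict, list, tuple, set, frozenset)


def referenced_objects(obj):
    """Return the objects directly referenced by obj."""
    # Prefer the object's own verification method when it has one.
    # ASSUMPTION: the method returns an iterable of referenced objects.
    verify = getattr(obj, "chimeraSessionVerification", None)
    if callable(verify):
        return list(verify())
    if isinstance(obj, dict):
        return list(obj.keys()) + list(obj.values())
    if isinstance(obj, CONTAINERS):
        return list(obj)
    # Plain Python instances: introspect the attribute dictionary.
    return list(getattr(obj, "__dict__", {}).values())


def find_outside_references(save_set):
    """Return (member, outsider) pairs for references that leave the save set."""
    members = {id(obj) for obj in save_set}
    violations = []
    for member in save_set:
        seen = {}  # id -> object; holding the object keeps its id stable
        stack = referenced_objects(member)
        while stack:
            obj = stack.pop()
            if isinstance(obj, PRIMITIVES):
                continue
            if id(obj) in members or id(obj) in seen:
                continue
            seen[id(obj)] = obj
            if isinstance(obj, CONTAINERS):
                stack.extend(referenced_objects(obj))
            else:
                violations.append((member, obj))
    return violations
```

A save set is closed, in the sense used above, exactly when find_outside_references returns an empty list.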

If a save set is not closed, there are two approaches to making it closed:

  1. Recursively add non-member objects, or
  2. Recursively delete offending member objects.

The former approach tries to generate a superset of the initial save set, but requires that member objects be able to identify referenced objects that are not in the save set. The latter approach finds a subset of the initial save set, but has the drawback that some objects will not be saved in the session.
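Both approaches amount to a fixed-point iteration over the save set. The sketch below is illustrative only: references is a hypothetical callable that returns the non-primitive objects an object refers to, and the function names are not part of any existing Chimera interface.

```python
def close_by_adding(save_set, references):
    """Grow the save set until every referenced object is a member."""
    closed = list(save_set)
    members = {id(obj) for obj in closed}
    pending = list(closed)
    while pending:
        obj = pending.pop()
        for target in references(obj):
            if id(target) not in members:
                members.add(id(target))
                closed.append(target)
                pending.append(target)  # its own references must be checked too
    return closed


def close_by_deleting(save_set, references):
    """Shrink the save set until no member refers to a non-member."""
    kept = list(save_set)
    while True:
        members = {id(obj) for obj in kept}
        survivors = [
            obj for obj in kept
            if all(id(target) in members for target in references(obj))
        ]
        if len(survivors) == len(kept):
            return kept  # nothing was removed, so the set is closed
        kept = survivors  # removals may expose new offenders; iterate again
```

Deletion has to repeat until nothing changes, because removing one offending object may leave other members with dangling references to it.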

Conversion

Again, if Chimera were written completely in Python, conversion of a set of Python objects to XML would be easy. The marshal module of the XML package from the Python XML SIG already does this. Unfortunately, Chimera has a number of Python extension objects (wrappy-generated wrappers around Object Technology Framework C++ objects), so the standard code will not work as is. The division into session file and model data files further complicates the conversion procedure.

The first problem in saving a set of objects is how we save object references. Since object references have no obvious textual counterpart, we need to create unique identifiers for all objects and store object references as identifiers of the target objects. Python already provides such a mechanism: the built-in id function. The main hitch to using this function comes when we want to use the original model data file in the saved session. If the original model data file does not contain the Python id information (e.g., in a pristine PDB file), there is no guarantee that the same objects will retain the same Python id when the model is rebuilt in the restored session. Therefore the unique identifier problem needs to be solved in a Chimera-specific way: by creating a mapping from Chimera unique identifiers to Python objects for use in converting object references to XML.
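A minimal sketch of such a mapping follows. The class name IdentifierMap, its methods, and the "obj<N>" identifier format are all hypothetical; the only idea taken from the text is that identifiers are assigned by Chimera rather than taken from the built-in id function, so they can be stored in a file and re-bound to freshly created objects on restoration.

```python
import itertools


class IdentifierMap:
    """Two-way mapping between session identifiers and live objects."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._identifier_by_key = {}  # id(obj) -> session identifier
        self._object_by_identifier = {}  # session identifier -> obj

    def register(self, obj, identifier=None):
        """Assign an identifier to obj, or bind a caller-supplied one."""
        key = id(obj)
        if key in self._identifier_by_key:
            return self._identifier_by_key[key]
        if identifier is None:
            # Hypothetical identifier format; skip any already bound.
            identifier = "obj%d" % next(self._counter)
            while identifier in self._object_by_identifier:
                identifier = "obj%d" % next(self._counter)
        elif identifier in self._object_by_identifier:
            raise ValueError("identifier already in use: %r" % identifier)
        self._identifier_by_key[key] = identifier
        # Holding the object here keeps id(obj) valid for the map's lifetime.
        self._object_by_identifier[identifier] = obj
        return identifier

    def identifier_of(self, obj):
        return self._identifier_by_key[id(obj)]

    def object_for(self, identifier):
        return self._object_by_identifier[identifier]
```

When saving, register is called without an identifier; when restoring, the same method can be called with the identifier read from the session file so that later references resolve to the rebuilt object.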

The second problem is how we save Python extension objects written in C or C++. As with verification, standard Python objects may be saved by using the introspection mechanisms, but these mechanisms may not work with Python extension objects. As with verification, we require that extension objects supply a method for saving themselves in XML.
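The dispatch between the two cases might look like the following sketch, which uses the standard xml.etree.ElementTree module. The method name chimeraSessionSave, the element and attribute names, and the identifier_of callback are all assumptions made for illustration; no session file DTD has been defined at this point.

```python
import xml.etree.ElementTree as ET

# Values written inline; everything else is stored as an object reference.
INLINE_TYPES = (type(None), bool, int, float, str)


def object_to_xml(obj, identifier_of):
    """Return an XML element describing obj.

    identifier_of maps an object to its session identifier (hypothetical).
    """
    # ASSUMPTION: extension objects expose a method that builds their element.
    saver = getattr(obj, "chimeraSessionSave", None)
    if callable(saver):
        return saver(identifier_of)
    # Plain Python objects: introspect the attribute dictionary.
    element = ET.Element(
        "object", {"id": identifier_of(obj), "class": type(obj).__name__}
    )
    for name, value in sorted(vars(obj).items()):
        attribute = ET.SubElement(element, "attribute", {"name": name})
        if isinstance(value, INLINE_TYPES):
            attribute.set("type", type(value).__name__)
            attribute.text = repr(value)
        else:
            attribute.set("ref", identifier_of(value))
    return element
```

Calling ET.tostring on the returned element yields the text that would be written to the session file.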

The last problem is the order in which the objects are saved. While XML processing may be done by reading the entire document into memory and processing it as a grove of data, it is still desirable to make it possible to recreate the Chimera session in a one-pass, on-the-fly reconstruction. If the objects are saved in such a way that (most) object references refer to previously created objects, then a one-pass restoration algorithm is feasible. (Due to the division between session file and model data files, model objects must be saved separately from other objects. Since many of the object references are expected to be from extension objects to molecular data, the ordering problem is actually simplified by the multiple file constraint.)

The complete session saving procedure is then:

  1. Create the identifier-object mapping for saving object references;
  2. Save model objects in model data files (if original data files cannot be used);
  3. Sort the remaining objects to minimize forward object references; and
  4. Convert the sorted list of objects into XML and save it in the session file.
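One way to carry out the sorting step is a depth-first, post-order traversal, which is a standard topological sort and is offered here as a sketch rather than as the chosen algorithm. The references callable and both function names are hypothetical. Where objects refer to each other in a cycle, at least one forward reference is unavoidable, and the second helper lists those that remain.

```python
def order_for_saving(objects, references):
    """Order objects so that referenced objects come first where possible."""
    members = {id(obj) for obj in objects}
    started = set()
    ordered = []

    def visit(obj):
        if id(obj) in started:
            return  # already emitted, or on the current path (a cycle)
        started.add(id(obj))
        for target in references(obj):
            if id(target) in members:
                visit(target)
        ordered.append(obj)  # emitted only after everything it refers to

    for obj in objects:
        visit(obj)
    return ordered


def forward_references(ordered, references):
    """Return (object, target) pairs where target is saved after object."""
    position = {id(obj): index for index, obj in enumerate(ordered)}
    return [
        (obj, target)
        for obj in ordered
        for target in references(obj)
        if id(target) in position and position[id(target)] > position[id(obj)]
    ]
```

The recursive form is kept for brevity; very deep reference chains would call for an explicit stack to stay within the interpreter's recursion limit.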

Session Restoration

Session restoration is the inverse of the session creation process:

  1. Recover the identifier-object mapping from the session file;
  2. Recreate model objects from model data files;
  3. Recreate objects from session file; and
  4. Repair any forward object references.

As with session creation, C/C++ Python extension objects are problematic. The session file must contain enough information to identify the Python extension, function and function arguments needed to create these objects.
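The one-pass restoration with a final repair step can be sketched as follows. The record layout (a dictionary with "id", "module", "function", "args", and "refs" keys) is purely hypothetical and stands in for whatever the session file DTD eventually specifies; importlib.import_module is the standard library call used to locate the creating module by name.

```python
import importlib


def restore(records):
    """Recreate objects from records and return {identifier: object}."""
    objects = {}
    pending = []  # forward references: (owner, attribute name, target id)
    for record in records:
        # Locate the module and function that create this kind of object.
        module = importlib.import_module(record["module"])
        factory = getattr(module, record["function"])
        obj = factory(*record.get("args", ()))
        objects[record["id"]] = obj
        for name, target_id in record.get("refs", {}).items():
            if target_id in objects:
                setattr(obj, name, objects[target_id])
            else:
                pending.append((obj, name, target_id))  # target not created yet
    # Repair forward references now that every object exists.
    for owner, name, target_id in pending:
        # A KeyError here means the saved set was not closed.
        setattr(owner, name, objects[target_id])
    return objects
```

If the objects were saved in an order that minimizes forward references, the pending list stays short and most references are resolved during the single pass.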


conrad@cgl.ucsf.edu / Saved Session in Chimera / January 2000