Design Concepts

The OTF is a system for generating C++ classes useful to biochemical application developers. The major classes for which the OTF currently offers built-in support are:

Atom
Bond
Residue
Molecule

The functionality that the OTF can provide with the above classes is separated into coherent areas referred to as toolkits. The functionality each toolkit offers is further divided into one or more components that can be used on an individual basis. Application developers using the OTF run a program called genlib to compose exactly those components they want in their application into the above classes.

Each component has an interface and one or more implementations. The interface specifies what functionality the component offers to the application (classes, member functions, etc.), and each implementation contains all the code necessary to implement the interface completely. Some components offer multiple implementations so that, where the implementation involves an important design tradeoff (e.g. speed vs. memory use), implementations embodying different choices in that tradeoff can be provided. Application developers select a particular implementation when selecting a component for inclusion in their application.
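As a rough illustration of one interface with two implementations, the sketch below shows a speed vs. memory tradeoff in plain C++. All names here (AtomRecord, MassRecompute, MassCached, SelectedMass, massOfWater) are hypothetical and are not OTF components, and the type alias only stands in for the selection that genlib performs; genlib's own syntax is not shown.

```cpp
#include <vector>

// Hypothetical atom record; the real OTF Atom class is generated by genlib.
struct AtomRecord {
    double mass;
};

// Implementation A: favors memory. Nothing is stored beyond the atoms
// themselves; the total is recomputed on every call.
class MassRecompute {
public:
    void addAtom(const AtomRecord& a) { atoms_.push_back(a); }
    double totalMass() const {
        double sum = 0.0;
        for (const AtomRecord& a : atoms_) sum += a.mass;
        return sum;
    }
private:
    std::vector<AtomRecord> atoms_;
};

// Implementation B: favors speed. A running total is kept (extra storage),
// so totalMass() takes constant time.
class MassCached {
public:
    void addAtom(const AtomRecord& a) {
        atoms_.push_back(a);
        cached_ += a.mass;
    }
    double totalMass() const { return cached_; }
private:
    std::vector<AtomRecord> atoms_;
    double cached_ = 0.0;
};

// Stand-in for the developer's choice of implementation (an assumption;
// in the OTF this choice is made when running genlib).
using SelectedMass = MassCached;

// Application code uses only the shared interface, so switching the
// alias above requires no change here.
double massOfWater() {
    SelectedMass m;
    m.addAtom({1.008});
    m.addAtom({1.008});
    m.addAtom({15.999});
    return m.totalMass();
}
```

Both classes expose the same two member functions, which is what lets the application remain unchanged when a different implementation is selected.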

To customize the generated classes for the application's specific needs, the developer writes C++ code that genlib composes into the class at the same time as the selected components. This code is written in the same fashion as the standard OTF components: regular C++ code embedded in simple delimiters that genlib can recognize. Consequently, if the customizations are of some general utility, they can be organized into components and toolkits that can easily be re-used in future applications.

The classes generated by genlib are designed to be used as is and should not be subclassed. This is because the generated classes have containers referring to other generated classes (e.g. the Molecule class has containers of both Atoms and Bonds). In order for the containers to refer to subclasses, the base classes would need to be template classes. See the Why Didn't We... section for why this and other possible approaches were avoided. The genlib mechanism is provided so that the classes' functionality can be extended while avoiding subclassing.

Run-time Determined Member Data

The OTF is designed to work with extensible data sources such as mmCIF and CEX. A consequence of this is that at compile time it may not be possible to know what types of data the application will want to associate with the various classes at run time; the data source may include unanticipated data that nonetheless may be of interest to the end user. In the face of this, the traditional scheme where a class provides a separate access function for each of its data fields is too inflexible. Instead, an OTF toolkit component provides a base class offering a single indirect access function where the data to be retrieved is indicated by the argument to the access function. So, instead of a call such as:

double x = atom->radius();

the OTF access function would be:

double x = atom->propValue("radius");

For convenience, some direct access functions to commonly used data fields are also provided (and more can be provided by the developer via the genlib customization mechanism).
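One way such an indirect access function could be built is sketched below. This is an assumption-laden illustration, not the OTF's code: PropertyHolder, setProp, and hasProp are hypothetical names, values are assumed to be plain doubles, and a missing property is assumed to raise an exception. Only the propValue and radius calls mirror the examples above.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the base class a toolkit component might provide.
class PropertyHolder {
public:
    // Store or overwrite a named value discovered at run time.
    void setProp(const std::string& name, double value) { props_[name] = value; }

    // Report whether a value has been stored under this name.
    bool hasProp(const std::string& name) const { return props_.count(name) != 0; }

    // Single indirect access function: the argument names the data wanted.
    double propValue(const std::string& name) const {
        auto it = props_.find(name);
        if (it == props_.end())
            throw std::out_of_range("no such property: " + name);
        return it->second;
    }

private:
    std::map<std::string, double> props_;
};

// Hypothetical Atom with one direct accessor layered on the indirect one.
class Atom : public PropertyHolder {
public:
    double radius() const { return propValue("radius"); }
};
```

With this arrangement a reader for an extensible format can call setProp for any field it encounters, including fields nobody anticipated at compile time, and the application can still retrieve them by name.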

Why Didn't We...

... use C++ inheritance?

Trying to provide the OTF functionality in a set of simple base classes has two major problems. One is that the Molecule base class wants to contain instances of the class derived from the Bond base class, for example, but only knows about the base class. This means that Molecule member functions returning "Bonds" can only return the Bond base class instead of the derived class. This leads to tedious error-prone casts from the base class to the derived class throughout the application.

The other problem is that these base classes cannot create instances of the derived classes. This is crucial when attempting to read in a PDB file, for example, where Atoms, Bonds, etc. need to be created.

Okay, then why didn't we...
... use template base classes?

Using template base classes solves the problem of being unable to return derived classes from base class member functions, but suffers from serious deficiencies that genlib alleviates. All the deficiencies spring from a single source: the template class/function argument list.

This is probably best understood by example: the Molecule class needs to contain Atoms, Bonds, and Residues, and therefore needs these classes as template arguments. The Atom class, meanwhile, keeps a back pointer to its containing Molecule, and therefore needs not only Molecule as a template argument, but Atom, Bond, and Residue, since those are template arguments for Molecule. But wait, this means that the Molecule class needs Molecule as a template argument as well, since Molecule contains Atoms!

The upshot is that every template class winds up needing every other template class on its argument list. This destroys the advantage of encapsulation since adding a new class to the library, such as Ring, requires modification of every other class in the library, not just those classes that use the new class.

Since encapsulation is broken, the library is no longer extensible; two different developers can no longer extend the capabilities of the library in two different (but separate and distinct) ways since the changes propagate throughout the entire library and thereby make the libraries incompatible with one another. This means that re-use of functionality is destroyed.

Finally, (again since encapsulation is broken) the functionality offered by the library cannot easily be separated into functional groups. The application developer has to include the entire library's functionality in his/her application, since it is all interconnected.

Well, wouldn't it be nice, at least, to...
... offer precompiled libraries?

Very few components of the OTF are sufficiently separable from other parts of the OTF that they can be compiled down to object code while other components are still capable of assembly with genlib. Even though the application developer will typically not extend many of the OTF components, the unmodified components will still refer to classes (e.g. Atom) that are extended. And despite the fact that the component won't utilize or refer to the extended parts of the class, recompilation is still necessary since the granularity of C++ compilation dependence mechanisms is at the level of the entire interface file (because C++ needs to know the size of class instances), rather than at the level of particular components of the interface.

Well, then why not...
... provide a single library encompassing all functionality?

Because one doesn't apply a thumbtack with a pile driver. A small application needing a minimal subset of the OTF should not be required to pull in every single feature of the OTF into the application. This leads to bloated inefficient code. And while the "single library" approach is marginally feasible with the OTF at its current early stages of release, more mature releases of the OTF will include toolkits for NMR processing, molecular mechanics, etc. At that point, attempting to compile it as a single library would exhaust compiler limits, disk space, and any sane person's patience.
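Returning to the inheritance answer above, the cast problem can be seen in a few lines. This is a minimal sketch with hypothetical class names (BondBase, MoleculeBase, MyBond, firstBondOrder); it is not how any OTF class is declared.

```cpp
#include <cstddef>
#include <vector>

// Library-side base classes (hypothetical).
class BondBase {
public:
    virtual ~BondBase() {}
};

class MoleculeBase {
public:
    void addBond(BondBase* b) { bonds_.push_back(b); }

    // The library knows only BondBase, so that is all it can return.
    // For the same reason, code inside this class could only ever
    // construct a BondBase, never the application's derived class.
    BondBase* bond(std::size_t i) const { return bonds_[i]; }

private:
    std::vector<BondBase*> bonds_;
};

// Application-side extension (hypothetical).
class MyBond : public BondBase {
public:
    explicit MyBond(int order) : order_(order) {}
    int order() const { return order_; }
private:
    int order_;
};

// Every use of the extension needs a cast back to the derived type.
// Returns -1 if the stored bond is not actually a MyBond.
int firstBondOrder(const MoleculeBase& mol) {
    MyBond* b = dynamic_cast<MyBond*>(mol.bond(0));
    return b ? b->order() : -1;
}
```

The cast in firstBondOrder is the kind of code that would have to be repeated wherever the application touches a bond obtained from the molecule.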
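The template answer above can likewise be sketched. In this hypothetical arrangement (AtomBase, BondBase, ResidueBase, MoleculeBase and the My* classes are illustrative names only), each class template has to be able to name every other class, so all of them carry the same full argument list.

```cpp
#include <cstddef>
#include <vector>

// Every template carries the complete list of library classes.
template <class AtomT, class BondT, class ResidueT, class MoleculeT>
class AtomBase {
public:
    AtomBase() : molecule_(nullptr) {}
    void setMolecule(MoleculeT* m) { molecule_ = m; }
    MoleculeT* molecule() const { return molecule_; }
private:
    MoleculeT* molecule_;  // back pointer to the containing molecule
};

template <class AtomT, class BondT, class ResidueT, class MoleculeT>
class BondBase {};

template <class AtomT, class BondT, class ResidueT, class MoleculeT>
class ResidueBase {};

template <class AtomT, class BondT, class ResidueT, class MoleculeT>
class MoleculeBase {
public:
    // Store the atom and set its back pointer to the derived molecule.
    AtomT* addAtom(AtomT* a) {
        atoms_.push_back(a);
        a->setMolecule(static_cast<MoleculeT*>(this));
        return a;
    }
    std::size_t numAtoms() const { return atoms_.size(); }
private:
    std::vector<AtomT*> atoms_;
};

// Forward declarations so the concrete classes can name one another.
class MyAtom;
class MyBond;
class MyResidue;
class MyMolecule;

class MyAtom : public AtomBase<MyAtom, MyBond, MyResidue, MyMolecule> {};
class MyBond : public BondBase<MyAtom, MyBond, MyResidue, MyMolecule> {};
class MyResidue : public ResidueBase<MyAtom, MyBond, MyResidue, MyMolecule> {};
class MyMolecule : public MoleculeBase<MyAtom, MyBond, MyResidue, MyMolecule> {};
```

Adding a Ring class to this sketch would mean adding a fifth parameter to all four templates and to every line that instantiates them, including in classes that never use rings.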
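The recompilation point in the precompiled-libraries answer above rests on object size. The sketch below uses two hypothetical layouts (AtomV1, AtomV2, with an invented formalCharge field) to show why code compiled against one layout cannot be reused once a customization adds a field.

```cpp
#include <cstddef>

// Atom as an unmodified component would have seen it (hypothetical layout).
struct AtomV1 {
    double x, y, z;
};

// The same class after an application adds one field through customization.
struct AtomV2 {
    double x, y, z;
    int formalCharge;  // hypothetical added field
};

// The added field changes the size of every instance.
static_assert(sizeof(AtomV2) > sizeof(AtomV1),
              "adding a member grows the object");

// Code that allocates or indexes atoms bakes the instance size into its
// object code; this helper makes that dependence explicit.
inline std::size_t bytesForAtoms(std::size_t count, std::size_t atomSize) {
    return count * atomSize;
}
```

A component compiled when Atom looked like AtomV1 would compute the wrong storage for an array of AtomV2 objects, even if it never reads the new field, which is why it has to be recompiled.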
