Sequences and Structures Tutorial

This tutorial focuses on looking at sequences and structures together using the Multalign Viewer extension. Members of the enolase superfamily of enzymes are used to illustrate these features of Chimera. The enolase superfamily has been described in several publications, including:

P.C. Babbitt, M.S. Hasson, J.E. Wedekind, D.R. Palmer, W.C. Barrett, G.H. Reed, I. Rayment, D. Ringe, G.L. Kenyon, and J.A. Gerlt, "The Enolase Superfamily: A General Strategy for Enzyme-Catalyzed Abstraction of the Alpha-Protons of Carboxylic Acids" Biochemistry 35:16489 (1996).

To follow along with the tutorial, you will first need to download the MSF file super8.msf into your working directory. This file contains a multiple sequence alignment of the barrel domains of eight enolase superfamily members.

On Windows/Mac, click the Chimera icon; on UNIX, start Chimera from the system prompt:

unix: chimera

A basic Chimera window should appear after a few seconds. Choose the menu item Tools... Homology... Multalign Viewer. In the resulting dialog, locate and open super8.msf. A sequence window containing the alignment will appear. The sequence window has its own preferences; choose Preferences... Layout from the sequence window menu and change the font size and the sequence-wrapping behavior as desired. Size and place the sequence window and main Chimera window so that both are visible. If the sequence window becomes obscured at any point, it can be raised from the Chimera Tools menu (Tools... MAV - super8.msf... Raise).

When the cursor is in the sequence window, the Page Down key (or space) moves the alignment view down and Page Up (or Shift-space) moves the alignment view up.

The names of the sequences are shown on the left; a consensus sequence and conservation histogram are shown above the multiple alignment. The superfamily is very diverse, in that there are very few positions in the alignment where all the sequences have the same residue. The completely conserved positions are indicated with a red capital letter in the consensus sequence.
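
The idea of a completely conserved position can be illustrated outside Chimera with a short Python sketch. It is not Multalign Viewer's own code; the function name conserved_columns and the toy alignment are hypothetical, and the sketch simply reports the columns in which every sequence carries the same residue.

```python
def conserved_columns(alignment):
    """Return 1-based column numbers where every sequence has the same residue.

    Columns containing a gap character ('-' or '.') are never reported.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col].upper() for seq in alignment}
        if len(residues) == 1 and not residues & {"-", "."}:
            conserved.append(col + 1)
    return conserved


# Hypothetical toy alignment, not taken from super8.msf.
toy_alignment = [
    "MKD-LEA",
    "MRDGLEV",
    "MKDALEI",
]
print(conserved_columns(toy_alignment))
```

In a diverse family such as the enolase superfamily, the list returned for a real alignment would be short relative to the alignment length.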

The first sequence in the alignment, named mr, corresponds to the barrel domain of mandelate racemase from Pseudomonas putida. The next-to-last sequence in the alignment, named enolyeast, corresponds to the barrel domain of enolase from Saccharomyces cerevisiae. There are multiple structures in the Protein Data Bank for each of these sequences; 2mnr (mandelate racemase) and 4enl (enolase) are used in this tutorial.

To open the structures, choose File... Fetch by ID from the Chimera menu. In the resulting dialog, check the PDB ID option (if it is not already checked) and the option to Keep dialog up after Fetch. Enter 2mnr in the blank marked PDB ID and click Fetch. Repeat the process to fetch 4enl, and then click Close to dismiss the dialog. Chimera will attempt to find the files within a local installation of the Protein Data Bank. If a file is not found locally, Chimera will try to retrieve it from the Protein Data Bank web site. If for some reason this procedure does not work, go to the Protein Data Bank web site, download the two PDB files, and then open them as local files (with File... Open).

The view is initially centered on a protein colored white, the mandelate racemase structure. Off to the side (and possibly out of view) is the enolase structure, which is colored magenta. The structures are not matched in any way; the coordinates are taken straight from the Protein Data Bank. Rotate, translate, and scale the structures as needed to get a better look (see mouse manipulation to review how this is done). If you like, open the Side View; choosing Tools... Viewing Parameters... Side View is one of many possible ways to do this. Continue moving and scaling the structures as desired throughout the tutorial.

Notice that in the sequence window, the sequence names mr and enolyeast are now shown in bold within boxes colored white and magenta, respectively. This means that the sequences have been compared with the sequences in the structures, found to match sufficiently well, and automatically associated with the structures. The colors indicate which sequence has been associated with which structure.
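
One way to picture this kind of association is the following Python sketch. It is only an illustration under stated assumptions, not the procedure Chimera actually uses: the function names, the sliding comparison, and the 10% mismatch threshold are all hypothetical, and missing residues in the structure are ignored.

```python
def best_offset(aligned_seq, chain_seq):
    """Slide the ungapped alignment sequence along the chain sequence.

    Returns (offset, mismatches) for the placement with the fewest mismatches,
    or None if the ungapped sequence is empty or longer than the chain.
    """
    query = aligned_seq.replace("-", "").replace(".", "").upper()
    chain = chain_seq.upper()
    if not query or len(query) > len(chain):
        return None
    best = None
    for offset in range(len(chain) - len(query) + 1):
        window = chain[offset:offset + len(query)]
        mismatches = sum(1 for q, c in zip(query, window) if q != c)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def associates(aligned_seq, chain_seq, max_mismatch_fraction=0.1):
    """Decide whether the sequence matches the chain sufficiently well.

    max_mismatch_fraction is a hypothetical threshold chosen for illustration.
    """
    placement = best_offset(aligned_seq, chain_seq)
    if placement is None:
        return False
    query_length = len(aligned_seq.replace("-", "").replace(".", ""))
    return placement[1] / query_length <= max_mismatch_fraction


# Hypothetical sequences, not taken from super8.msf, 2mnr, or 4enl.
print(best_offset("KD-LEAV", "MMKDLEAVGG"))
print(associates("KD-LEAV", "MMKDLEAVGG"))
```

The offset found this way is also what makes it possible to translate alignment positions into structure residue numbers.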

The alignment includes just the barrel domains, so it will be simplest to display only a chain trace of these portions. First, open the Command Line; choosing Tools... Keyboard... Command Line is one of many possible ways to do this. Next, find out which residue numbers in the structures correspond to the beginning and end of the alignment. In general, these numbers will not be the same as the numbers marked on the alignment. However, placing the cursor over any residue in a structure-associated sequence in the alignment will report the corresponding structure residue number in the status area near the bottom of the Multalign Viewer sequence window. In each of the two structure-associated sequences, find out the starting and ending residue numbers by placing the cursor over the terminal residues in the alignment. Doing this reveals that the residue ranges are 134-317 in the mandelate racemase structure and 151-400 in the enolase structure. Display the alpha-carbon chain traces of the barrel domains and increase the linewidth (review atom specification syntax if desired):

Command: chain #0:134-317@ca
Command: chain #1:151-400@ca
Command: linewidth 3
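
The difference between alignment positions and structure residue numbers, noted above, can be sketched in Python. This is a simplified illustration rather than Chimera's own bookkeeping: the function column_to_residue and the toy fragment are hypothetical, and the sketch assumes the structure residues are numbered consecutively with none missing.

```python
def column_to_residue(aligned_seq, first_residue_number):
    """Map 1-based alignment columns to structure residue numbers.

    Assumes consecutive residue numbering starting at first_residue_number,
    with no insertion codes or missing residues. Gap columns map to None.
    """
    mapping = {}
    residue_number = first_residue_number
    for column, char in enumerate(aligned_seq, start=1):
        if char in "-.":
            mapping[column] = None
        else:
            mapping[column] = residue_number
            residue_number += 1
    return mapping


# Hypothetical aligned fragment; 134 is the mandelate racemase start
# residue quoted in the text.
toy = "AD--KLE"
mapping = column_to_residue(toy, 134)
print(mapping[1], mapping[3], mapping[5], mapping[7])
```

Because gap columns consume alignment positions but no residues, the two numberings drift apart as gaps accumulate.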

The sequence alignment can be used to guide a structural match. From the sequence window menu, choose Structure... Match... and then make 2mnr the reference structure and 4enl the structure to match. Click OK without checking any boxes. This causes the match to use all pairs of alpha-carbons of residues aligned in the sequence alignment. The match is fairly rough: the RMSD (reported in the status area) is 8.4 Å. It is evident that not all of the residues aligned in the sequence alignment are really structurally equivalent. Some loops in enolase (magenta) are much longer than those in mandelate racemase (white). Try the structure matching again, but this time, check the box marked Iterate by pruning... and edit the angstrom value to 1.0 before clicking OK. This will superimpose only the pairs that collectively match very well in space. Visually, the match is much improved; the dissimilar loop regions were not used in the match.
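
The effect of pruning on RMSD can be illustrated with a deliberately simplified Python sketch. It is not Chimera's algorithm: the fit here only translates one point set onto the other (a real superposition also optimizes rotation), exactly one pair is dropped per cycle, and all function names and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Return the mean x, y, z of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def translate_onto(reference, mobile):
    """Shift mobile so its centroid coincides with the reference centroid."""
    ref_c = centroid(reference)
    mob_c = centroid(mobile)
    shift = tuple(r - m for r, m in zip(ref_c, mob_c))
    return [tuple(c + s for c, s in zip(p, shift)) for p in mobile]


def rmsd(reference, mobile):
    """Root-mean-square distance over paired points."""
    squared = [math.dist(a, b) ** 2 for a, b in zip(reference, mobile)]
    return math.sqrt(sum(squared) / len(squared))


def prune_and_fit(reference, mobile, cutoff):
    """Refit, then drop the worst pair, until no pair exceeds the cutoff.

    Stops early if only three pairs remain.
    Returns (kept_indices, rmsd_over_kept_pairs).
    """
    kept = list(range(len(reference)))
    while True:
        ref = [reference[i] for i in kept]
        fitted = translate_onto(ref, [mobile[i] for i in kept])
        distances = [math.dist(a, b) for a, b in zip(ref, fitted)]
        worst = max(range(len(kept)), key=distances.__getitem__)
        if distances[worst] <= cutoff or len(kept) <= 3:
            return kept, rmsd(ref, fitted)
        del kept[worst]


# Hypothetical coordinates: four pairs differ by a pure shift, the fifth
# pair plays the role of a loop that does not match.
reference = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]
mobile = [(5, 5, 5), (6, 5, 5), (7, 5, 5), (8, 5, 5), (9, 15, 5)]

print(rmsd(reference, translate_onto(reference, mobile)))
print(prune_and_fit(reference, mobile, cutoff=1.0))
```

In this toy case a single outlying pair inflates the RMSD over all pairs, and removing it lets the remaining pairs superimpose closely, which mirrors the improvement seen when the dissimilar loops are excluded.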

Next, we will see where some of the conserved residues are within the structures. In the sequence window, use the mouse to drag a box around the column containing the first completely conserved residue in the alignment (the aspartate, D, at alignment position 99). This creates a sequence region and selects the corresponding residues in the associated structures, which can then be displayed:

Actions... Atoms/Bonds... show

Open the sequence window region browser (Tools... Region Browser in the sequence window menu). If any sequence regions are created by mistake, click on the line in the browser describing the region and then Delete. Create another region (this time, press Ctrl along with the mouse button to start a new region) for the next completely conserved residue. This is a glutamate, E, at alignment position 148; display it too:

Actions... Atoms/Bonds... show

The displayed residues point into the center of the barrel, ligating a catalytically important metal ion. Display the Mn++ ion in mandelate racemase:

Select... Chemistry... element... other... Md-Ni... Mn
Actions... Atoms/Bonds... show
Actions... Atoms/Bonds... sphere
Actions... Color... yellow
Select... Clear Selection

Sequence regions can also be created automatically. From the sequence window menu, choose Structure... Secondary Structure... show actual. This creates regions in the structure-associated sequences named structure helices and structure strands colored goldenrod and lime green, respectively (yes, these are really named colors). The region names are listed in the region browser. Clicking on a region name in the browser or on the region itself in the sequence window selects the residues in the structure. In the sequences not associated with structures, Structure... Secondary Structure... show predicted creates regions named predicted helices and predicted strands colored gold and light green, respectively. The GOR method is used for prediction. Compare the predictions with the actual secondary structure.
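
The grouping of residues into helix and strand regions can be sketched in Python as follows. This is an illustrative sketch only, not how Multalign Viewer builds its regions: the function ss_regions, the one-letter codes (H for helix, E for strand), and the example string are assumptions made here.

```python
from itertools import groupby


def ss_regions(ss_string, first_residue_number=1):
    """Group a per-residue secondary-structure string into contiguous regions.

    'H' marks helix and 'E' marks strand; any other character is skipped.
    Returns a dict of inclusive (start, end) residue-number ranges.
    """
    regions = {"helices": [], "strands": []}
    names = {"H": "helices", "E": "strands"}
    position = first_residue_number
    for code, run in groupby(ss_string):
        length = len(list(run))
        if code in names:
            regions[names[code]].append((position, position + length - 1))
        position += length
    return regions


# Hypothetical assignment string, not derived from 2mnr or 4enl.
example = ss_regions("--HHHH--EEE-HH", first_residue_number=134)
print(example)
```

Running the same grouping on predicted and actual assignments for one sequence would give two sets of ranges that can be compared side by side.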

Try other operations as desired; see the Multalign Viewer documentation for a full description of its functions, including alignment editing. A number of sequence alignment formats can be read in and (with or without prior editing) written out.

When finished, end the Chimera session:

Command: stop


meng@cgl.ucsf.edu / October 2004