Elaine Meng, Eric Pettersen, Conrad Huang, and Thomas E. Ferrin;
Resource for Biocomputing, Visualization, and Informatics
University of California, San Francisco
Comparing multiple proteins is an essential part of research on protein structure and function. Even when a single protein is the primary focus of a study, additional related sequences can be used to reveal regions of high and low conservation. Depending on the set of sequences being compared, conservation can then be related to some aspect of structure or function, such as catalysis or recognition of a binding partner. In general, the number of sequences available for a set of proteins far surpasses the number of structures that have been solved. Alignments of hundreds of sequences provide much more information on conservation than can be obtained from the smaller set of known structures. Conversely, structural comparisons can provide information not available from sequences alone. Because overall structure tends to be more conserved than sequence, comparisons in three dimensions can provide residue equivalences between proteins even when the corresponding sequences are too divergent to align with confidence. Methods for combining the two types of data (sequence and structure) and for bridging between them are important for functional analyses.
Integrated sequence/structure analysis is a major area of technological research and development in our Resource Center. Several extensions to Chimera have been created in this area. The specific goals of this research project are to:
- Enhance Chimera for the display and analysis of sequence and structure data;
- Facilitate comparisons among related structures;
- Link sequence and structure information in intuitive and powerful ways.
The primary Chimera extensions in this area are Multalign Viewer, for viewing sequence alignments together with associated structures; MatchMaker, which superimposes structures without requiring a pre-existing sequence alignment; and Match→Align, which generates a sequence alignment from the spatial proximities of residues in a superposition of two or more structures.
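To make the idea behind Match→Align concrete, here is a minimal pairwise sketch in Python. It is not Chimera's implementation and handles only two chains. It assumes both chains are already superimposed in a common frame, each given as a one-letter sequence plus one CA coordinate per residue. The function name `proximity_alignment` and its 3.0 angstrom default `cutoff` are hypothetical, illustrative choices. Residues may be paired only when their CA atoms lie within the cutoff. An LCS-style dynamic program keeps the largest order-preserving set of such pairs, and every other residue is placed against a gap.

```python
import math


def proximity_alignment(seq_a, coords_a, seq_b, coords_b, cutoff=3.0):
    """Align two superimposed chains by spatial proximity (illustrative sketch).

    seq_x is a string of one-letter residue codes; coords_x holds one (x, y, z)
    CA position per residue, already in a common frame. The cutoff value is a
    hypothetical default, not Chimera's setting.
    """
    n, m = len(seq_a), len(seq_b)
    # close[i][j] is True when residue i of A and residue j of B may be paired.
    close = [[math.dist(coords_a[i], coords_b[j]) <= cutoff for j in range(m)]
             for i in range(n)]
    # score[i][j] = max number of order-preserving pairs in seq_a[i:], seq_b[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(score[i + 1][j], score[i][j + 1])
            if close[i][j]:
                best = max(best, score[i + 1][j + 1] + 1)
            score[i][j] = best
    # Trace back from the start, emitting one alignment column per step.
    row_a, row_b = [], []
    i = j = 0
    while i < n and j < m:
        if close[i][j] and score[i][j] == score[i + 1][j + 1] + 1:
            row_a.append(seq_a[i]); row_b.append(seq_b[j]); i += 1; j += 1
        elif score[i + 1][j] >= score[i][j + 1]:
            row_a.append(seq_a[i]); row_b.append("-"); i += 1
        else:
            row_a.append("-"); row_b.append(seq_b[j]); j += 1
    # Any leftover residues are unpaired and go against gaps.
    while i < n:
        row_a.append(seq_a[i]); row_b.append("-"); i += 1
    while j < m:
        row_a.append("-"); row_b.append(seq_b[j]); j += 1
    return "".join(row_a), "".join(row_b)
```

For example, with toy coordinates in which chain B sits 1 angstrom above residues 2 to 4 of chain A, `proximity_alignment("ACDE", [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)], "CDE", [(3.8, 1, 0), (7.6, 1, 0), (11.4, 1, 0)])` returns `("ACDE", "-CDE")`. Because pairing depends only on distance, the sketch can align residues even where the sequences themselves share little identity.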
Figure caption: Three pectate lyase structures (top left) associated with sequences in an alignment from Pfam, displayed with Multalign Viewer (bottom). Associations are indicated with color swatches behind the sequence names, and the structures have been superimposed using the alignment. Strands and helices are indicated with green and gold regions, respectively, and residues within 3.5 angstroms of the active-site ion are shown in red in both sequence and structure. The Conservation header, shown as a histogram, was calculated with the AL2CO (Pei and Grishin, 2001) entropy-based method and independent-counts weighting. Top right: one of the three structures is shown as a worm, with worm radius proportional to B-factor (an attribute automatically present for structures read from PDB files). The worm is colored from blue through pink to yellow with increasing values of the Conservation header. Gray segments in the worm correspond to alignment columns for which conservation was not calculated because the fraction of gaps was 0.5 or higher.
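The caption's entropy-based Conservation header and its 0.5 gap cutoff can be illustrated with a short Python sketch. It is a simplification, not AL2CO: it uses plain, unweighted Shannon entropy over the 20 standard amino acids and omits AL2CO's independent-counts weighting, so its numbers will not match the header in the figure. The names `column_conservation` and `conservation_profile` are hypothetical. A column is skipped, like the gray worm segments, when its gap fraction is 0.5 or higher.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(column, gap_chars="-.", max_gap_fraction=0.5):
    """Conservation score for one alignment column, or None if not computed.

    Score = ln(20) - H, where H is the unweighted Shannon entropy (natural log)
    of the amino-acid frequencies in the column, so higher means more
    conserved. Columns whose gap fraction is max_gap_fraction or higher are
    skipped (None), like the gray worm segments in the figure.
    """
    gaps = sum(1 for c in column if c in gap_chars)
    if gaps / len(column) >= max_gap_fraction:
        return None
    residues = [c.upper() for c in column if c.upper() in AMINO_ACIDS]
    if not residues:  # e.g. a column containing only unknown residues ("X")
        return None
    total = len(residues)
    entropy = -sum((n / total) * math.log(n / total)
                   for n in Counter(residues).values())
    return math.log(len(AMINO_ACIDS)) - entropy


def conservation_profile(rows, **kwargs):
    """Apply column_conservation to every column of equal-length aligned rows."""
    return [column_conservation([row[k] for row in rows], **kwargs)
            for k in range(len(rows[0]))]
```

For the toy alignment `["AC-", "AD-", "AE-"]`, the first column (all A) gets the maximum score ln 20, the second (C, D, E) gets ln 20 minus ln 3, and the all-gap third column is left uncalculated (`None`). Values like these could then be mapped onto a color range, as with the blue-to-yellow worm in the figure.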
References:
- Meng EC, Pettersen EF, Couch GS, Huang CC, Ferrin TE: Tools for integrated sequence-structure analysis with UCSF Chimera. BMC Bioinformatics 2006, 7:339.
- Pei J, Grishin NV: AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics 2001, 17(8):700-712.