Online Structure Alignment Resources (April 2005; later additions as noted)
This list is not meant to be comprehensive, but includes a
reasonably broad sample of sites I think are handy and convenient to use.
The test cases involve structures with low pairwise sequence identity:
- Pairwise - 2mnr (mandelate racemase), 4enl (enolase)
- Multiple - 2mnr, 4enl, 1nu5 (MLE II)
Default settings were generally used.
Note that the MatchMaker tool in Chimera performs a similar function:
it constructs a sequence alignment and superimposes the structures
accordingly. As of version 1.2129 (June 2005), secondary structure can
be used to help construct the alignment, which allows pairs with lower
percent identity to be superimposed correctly
(see results from using MatchMaker).
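Since the test cases were chosen for low pairwise sequence identity, it may
help to see how that number can be computed from an alignment. The sketch
below is only one possible convention (identical columns divided by columns
where both sequences have a residue); other tools may use a different
denominator. The function name and the toy sequences are hypothetical and
are not taken from 2mnr or 4enl.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where both aligned sequences have a residue.

    Assumption: seq_a and seq_b are rows of the same alignment (equal length),
    and the denominator is the number of ungapped column pairs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    paired = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
              if a != gap and b != gap]
    if not paired:
        return 0.0
    identical = sum(1 for a, b in paired if a == b)
    return 100.0 * identical / len(paired)


# Hypothetical toy alignment, for illustration only
aligned_a = "MKV-LSAGD"
aligned_b = "MRVALS-GE"
print(f"{percent_identity(aligned_a, aligned_b):.1f}% identity")
```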
Superposition Servers
- FAST - pairwise; paper compares the method to DaliLite, CE, and K2.
Reference:
Zhu J, Weng Z.
FAST: a novel protein structure alignment algorithm.
Proteins. 2005 Feb 15;58(3):618-27.
Availability:
server;
executables for Linux, IBM, Sun, Alpha, Mac OS X can be downloaded
Performance:
~5 seconds for pairwise.
Shows a sequence alignment in the web page;
superimposed coordinates are available within the downloadable
RasMol script (puts both chains into one model and labels them
A and B; Chimera complains about the bad PDB records
but will display the structures nonetheless).
- MultiProt - truly multiple rather than assembly of pairwise alignments;
can handle circular permutation.
Method seems elaborate (opinion from reading the paper) but works well.
Reference:
Shatsky M, Nussinov R, Wolfson HJ.
A method for simultaneous alignment of multiple protein structures.
Proteins. 2004 Jul 1;56(1):143-56.
Availability:
server;
can be downloaded for Linux
Performance: ~28 seconds for 3-way using sequence order.
Reports several alignments: five of all three, five with various pairings.
I took the 3-way alignment with the most residues aligned.
Coordinates are combined in a single file but come out as separate
submodels in Chimera (#0.1, #0.2, #0.3).
- SuperPose - multiple; authors claim the server is especially user-friendly.
Reference:
Maiti R, Van Domselaar GH, Zhang H, Wishart DS.
SuperPose: a simple server for sophisticated structural superposition.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W590-4.
Availability:
server
(uses a number of other programs including VADAR and ClustalW)
Performance:
~25 seconds for pairwise
(not including waiting for the display applet).
Results page allows download of superposition in PDB format,
sequence alignment, and
RMSD information.
Coordinates are combined in a single file but come out as separate
submodels in Chimera (#0.1, #0.2, #0.3).
Only two files can be uploaded, so according to the instructions you have to
combine input files to superimpose more than two structures.
I had trouble getting this to work. The support person said
the structures must have chain IDs; however, adding chain IDs to
my input did not help. What did work was the other advice: use the server
to superimpose two structures, then use that output (which, incidentally,
did NOT include chain IDs) as one of the inputs for
another round of superposition.
- SALIGN (added September 2005) - multiple
Reference:
Marti-Renom MA, Madhusudhan MS, Sali A.
Alignment of protein sequences by their profiles.
Protein Sci. 2004 Apr;13(4):1071-87.
Availability:
server
Performance:
~15 seconds for 3-way.
The resulting web page shows the superposition in a Jmol window
and a sequence alignment. The sequence alignment can be downloaded
in PIR, FASTA, or PAP format, and a
PDB file of the superimposed coordinates
can be downloaded.
Coordinates are combined in a single file but come out as separate
submodels in Chimera (#0.1, #0.2, #0.3).
These were not in the order of specification in the
input file, but the submodels
were also given chain IDs (A, B, C) that did reflect that order.
Rather than using that PDB file,
I find it easier to open the alignment in Chimera, autoload
the structures from the PDB (since the sequences are named with
the PDB IDs), and match the structures using the sequences,
because this retains the original chain IDs plus ligands, ions, etc.
- K2, K2SA - pairwise; the program (but not the server?)
can handle circular permutation.
Reference:
Szustakowski JD, Weng Z.
Protein structure alignment using a genetic algorithm.
Proteins. 2000 Mar;38(4):428-40.
Availability:
server
Performance: ~30 seconds using sequential constraints.
Can download coordinates for second protein (transformed to
match first protein) and a log file.
- CE (pairwise), CE-MC (multiple, building upon
pairwise CE alignments)
[Figures: CE; CE-MC]
References:
Shindyalov IN, Bourne PE.
Protein structure alignment by incremental combinatorial extension (CE)
of the optimal path.
Protein Eng. 1998 Sep;11(9):739-47.
Guda C, Lu S, Scheeff ED, Bourne PE, Shindyalov IN.
CE-MC: a multiple protein structure alignment server.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W100-3.
Availability:
CE server ("software" available
but I can't tell what is in the tar file without downloading it),
CE-MC server (binaries and
source code available for Linux and Solaris)
Performance:
~55 seconds for pairwise (CE server). ~4 minutes for multiple alignment
(CE-MC); e-mail address required for this option.
Results include structure-based sequence alignment and
single downloadable PDB in which the structures
are given different chain IDs (Chimera complains about duplicate
atom serial numbers but displays the structures nonetheless).
In the CE-MC results, the residue numbers are shifted over one column
from where they should be, so the file has to be edited before viewing.
The support person said the problem would be fixed in the next update.
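The duplicate atom serial numbers that Chimera complains about in combined
files like the CE output can be removed by renumbering before opening the
file. Below is a minimal sketch, assuming standard fixed-column PDB records
(serial number in columns 7-11). The function name is hypothetical, CONECT
records are simply dropped because they would refer to the old numbers, and
ANISOU records and files with more than 99,999 atoms are not handled.

```python
def renumber_serials(pdb_lines):
    """Return a copy of pdb_lines with ATOM/HETATM/TER serials renumbered from 1.

    Only columns 7-11 (the serial field) are rewritten. CONECT records are
    dropped rather than left pointing at stale serial numbers.
    """
    out = []
    serial = 0
    for line in pdb_lines:
        line = line.rstrip("\n")
        record = line[:6].strip()
        if record == "CONECT":
            continue
        if record in ("ATOM", "HETATM", "TER"):
            serial += 1
            padded = line.ljust(11)  # a bare "TER" line is shorter than 11 columns
            line = f"{padded[:6]}{serial:5d}{padded[11:]}"
        out.append(line)
    return out


# Two chains that both start at serial 1, as in a RasMol-style combined file
example = [
    "ATOM      1  CA  ALA A   1      11.104  13.207   2.100  1.00 20.00           C",
    "ATOM      1  CA  GLY B   1      12.000  14.000   3.000  1.00 20.00           C",
]
for fixed in renumber_serials(example):
    print(fixed)
```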
- DaliLite - pairwise (the Dali server instead compares a single input
structure against a representative set from the PDB, like VAST Search)
Reference:
Holm L, Park J.
DaliLite workbench for protein structure comparison.
Bioinformatics. 2000 Jun;16(6):566-7.
Availability:
server;
download (Unix) and secondary server available from
another site
Performance: ~1 minute; reports several alignments and their stats;
for each, coordinates can be downloaded for the second protein.
The figure shows the superposition with the most residues aligned.
- Protein3Dfit (added July 2005) - pairwise
Reference:
Lessel U, Schomburg D.
Similarities between protein 3-D structures.
Protein Engineering. 1994 Oct;7(10):1175-87.
Availability:
server
Performance:
~25 seconds for pairwise (default settings; various parameters
can be adjusted by the user).
The resulting web page shows the superposition in a
Jmol window plus a sequence alignment and RMSD table. Conveniently,
the links offer many different ways to download the results.
One way to save the superposition is to download the
transformed coordinates of the second structure;
the sequence alignment can be saved in PIR format.
- later additions (not evaluated)
- OPAAS
(Optimal, Permuted and other Alternative Alignments of protein Structures)
- 3D-SS (choice of method: STAMP or ProFit)
- ARTS
(for nucleic acid structures)
- Mammoth
(output coordinates are CA-only,
but the sequence alignment can be downloaded in various formats)
- FATCAT (Flexible structure AlignmenT by Chaining Aligned
fragment pairs allowing Twists; i.e., one of the
structures may be split into pieces)
- POSA
(flexible; multiple; requires e-mail address)
- GANGSTA
(seq-order-independent; requires e-mail address)
- SCMBB pairwise protein superposition server
- CPalign
(seq-order-independent; requires e-mail address)
- TopMatch (successor of ProSup)
- PDBj
superposition servers (RASH and GASH) - seek to maximize NER (number
of equivalent residues) at 4 Å cutoff
- servers that don't provide output PDB coordinates (as far as I can tell)
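The NER measure mentioned for the PDBj servers (number of equivalent
residues at a 4 Å cutoff) can be illustrated with a small sketch. It assumes
the two structures are already superimposed and that the residues are already
paired by an alignment, one CA coordinate per residue. The function name and
the inclusive cutoff are assumptions, not details taken from the RASH/GASH
servers.

```python
import math


def count_equivalent_residues(coords_a, coords_b, cutoff=4.0):
    """Count aligned residue pairs whose coordinates lie within cutoff (in Å).

    coords_a[i] and coords_b[i] are the (x, y, z) positions of the i-th aligned
    pair, taken from structures that have already been superimposed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must pair up one-to-one")
    return sum(1 for a, b in zip(coords_a, coords_b)
               if math.dist(a, b) <= cutoff)


# Hypothetical coordinates: pair distances are 3 Å, 5 Å, and 0 Å
first = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
second = [(0.0, 0.0, 3.0), (1.0, 5.0, 0.0), (10.0, 0.0, 0.0)]
print(count_equivalent_residues(first, second))
```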
Circularly Permuted Test System (May 2005)
Test structures: 1qq5 A chain (HAD), 3chy (CheY)
These are in different superfamilies according to SCOP,
but it has been noted that the structures appear to
be related by circular permutation (see
paper by Ridder and Dijkstra).
The active site
residues are in about the same positions in 3D, although
in different N->C orders. Selected active site residues
(those in a column are aligned in 3D):
-----------------------------------------------------------
                loop 1    loop 2    loop 3    loop 4
-----------------------------------------------------------
2-HAD X. autotrophicus *
                D8        S114      K147      171SS---D176
epoxide hydrolase M. musculus, H. sapiens *
                D9        T123      K160      184DD---N189
CheY E. coli, S. typhimurium *
                D57       T87       K109      12DD13
-----------------------------------------------------------
[Figures: superimposed manually; MultiProt (~10-15 seconds; the choice with
the most residues aligned and, as a tie-breaker, the lowest RMSD);
K2SA (~15-20 seconds; wrong, see log file)]
The MultiProt results correctly place the corresponding
active site residues together; the K2SA results are shifted
over by one strand in the central sheet.
NB: The K2SA server appears to impose sequential
constraints even when I set the option NOT to do so,
which would explain the failure on this test case.
I reported the problem and plan to try again if it is fixed.
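The circular permutation in this test system can be seen directly from the
residue numbers in the active-site table: sorting the loops by sequence
position gives the same cyclic order in both proteins, but with a different
starting point. The sketch below checks this. The function names are
hypothetical, and for loop 4 the first residue of each range (171 and 12)
is used as the position.

```python
def loop_order(positions):
    """Return loop names sorted by residue number, i.e. in N->C order."""
    return [name for name, _ in sorted(positions.items(), key=lambda kv: kv[1])]


def is_rotation(order_a, order_b):
    """True if order_b is a cyclic rotation of order_a (including no shift)."""
    if len(order_a) != len(order_b):
        return False
    n = len(order_a)
    doubled = order_a + order_a
    return any(doubled[i:i + n] == order_b for i in range(n))


def is_circular_permutation(order_a, order_b):
    """True if the orders differ but are related by a cyclic rotation."""
    return order_a != order_b and is_rotation(order_a, order_b)


# Residue numbers from the table above
HAD = {"loop 1": 8, "loop 2": 114, "loop 3": 147, "loop 4": 171}
CHEY = {"loop 1": 57, "loop 2": 87, "loop 3": 109, "loop 4": 12}

print(loop_order(HAD))
print(loop_order(CHEY))
print(is_circular_permutation(loop_order(HAD), loop_order(CHEY)))
```

A method that enforces sequential order cannot pair all four loops at once,
which is consistent with the K2SA failure noted above.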
Sources of Structure-Based Sequence Alignments
(which can then be used in Chimera's Multalign Viewer
to match the corresponding structures)
- 3DCoffee - multiple; primarily intended to generate
sequence alignments but can use 3D structures to do so;
can also combine multiple sequence alignments
[Figures: Multalign Viewer matching defaults; with iteration (pruning)]
Reference:
Poirot O, Suhre K, Abergel C, O'Toole E, Notredame C.
3DCoffee: a web server for mixing Sequences and Structures into
multiple sequence alignments.
Nucleic Acids Research 2004 Jul 1;32(Web Server issue):W37-40.
Availability:
TCoffee server (includes 3DCoffee option)
Performance:
The three PDB files were submitted using the Advanced 3DCoffee page.
Results were returned in ~4 1/2 minutes and include an
alignment (clustal format, *.aln)
and dendrogram information (*.dnd).
The superpositions were generated in Chimera by using the sequence
alignment with and without pruning of CA pairs more than 2.0 Å apart.
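To illustrate what fitting "with pruning" means, here is a simplified
sketch: fit all aligned CA pairs, drop the pair that is farthest apart if it
exceeds the cutoff, and refit until every remaining pair is within 2.0 Å.
This is not Chimera's implementation (its pruning schedule may differ); it
assumes NumPy is available, uses the standard Kabsch least-squares fit, and
all function names and test coordinates are hypothetical.

```python
import numpy as np


def kabsch(mobile, target):
    """Least-squares rotation R and translation t so that mobile @ R.T + t ~ target."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    H = (mobile - mobile_center).T @ (target - target_center)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_center - R @ mobile_center
    return R, t


def superimpose_with_pruning(mobile, target, cutoff=2.0, min_pairs=3):
    """Refit repeatedly, removing the worst pair until all pairs are within cutoff."""
    keep = np.arange(len(mobile))
    while True:
        R, t = kabsch(mobile[keep], target[keep])
        dist = np.linalg.norm(mobile[keep] @ R.T + t - target[keep], axis=1)
        worst = int(dist.argmax())
        if dist[worst] <= cutoff or len(keep) <= min_pairs:
            return R, t, keep
        keep = np.delete(keep, worst)  # simplification: one pair per cycle


# Synthetic example: a helix-like trace, rotated and shifted, with two bad pairs
i = np.arange(20)
mobile = np.column_stack([5 * np.cos(i), 5 * np.sin(i), 1.5 * i])
angle = np.radians(40.0)
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle), np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
shift = np.array([3.0, -2.0, 7.0])
target = mobile @ rot.T + shift
target[7] += [15.0, 0.0, 0.0]
target[12] += [0.0, -12.0, 0.0]

R, t, kept = superimpose_with_pruning(mobile, target)
dropped = sorted(set(range(len(mobile))) - set(kept.tolist()))
print(len(kept), "pairs kept; dropped pairs:", dropped)
```

Without pruning, the badly matched pairs pull the fit away from the
well-conserved core; with pruning, the final transformation is determined
only by the pairs that actually superimpose.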
- S4 database (Structure-Based Sequence Alignments of
SCOP Superfamilies) - 40% ID filtered subset (ASTRAL40).
A drawback is that there may not be sequences for (or sufficiently
similar to) the structures you want to superimpose.
Also, I had problems with browsing the database, so it was easier
to just download the whole thing (available as a tar file).
Alignments are in clustal format (*.aln) which can be used in
Chimera. Sequences are named with ASTRAL domain IDs, so you can auto-load
all domains (if not too many) or just open a subset in the standard way.
Example: S4 "Enolase C-terminal domain-like" superfamily page
Reference:
Casbon J, Saqi MA.
S4: structure-based sequence alignments of SCOP superfamilies.
Nucleic Acids Res. 2005 Jan 1;33(Database issue):D219-22.
- HOMSTRAD database (Homologous Structure Alignment
Database) - proteins grouped by family (narrower groupings,
likely also alignable with sequence methods including MatchMaker
in Chimera). The advantage of using the database alignments
even if the sequences are easily alignable is that they can contain many
sequences (avoids a step of combining pairwise alignments).
The database also includes superimposed coordinates, but
in the annoying RasMol way (a single file with different chain IDs
but many duplicate atom serial numbers).
I prefer to download the PIR-format alignment and use it in Chimera.
Sequences are named with PDB IDs, so you can auto-load all structures
(if not too many) or just open a subset in the standard way.
Example: HOMSTRAD enolase family page
References:
Mizuguchi K, Deane CM, Blundell TL, Overington JP.
HOMSTRAD: a database of protein structure alignments for homologous families.
Protein Sci. 1998;7:2469-71.
Stebbings LA, Mizuguchi K.
HOMSTRAD: recent developments of the Homologous Protein Structure
Alignment Database.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D203-7.
- DBAli - contains an all-against-all comparison of protein structures
(September 2005: contains 888,695,103 pairwise structural alignments and
family-based multiple structure alignments for 23,987 non-redundant chains).
Multiple superpositions are presented just as described for SALIGN above.
Pairwise superpositions are presented similarly,
with sequence alignments downloadable in PIR format and
superposition coordinates downloadable in PDB format
(the two chains are given different IDs but combined into a single model,
i.e., are not separated by END records).
Reference:
Marti-Renom MA, Ilyin VA, Sali A.
DBAli: a database of protein structure alignments.
Bioinformatics. 2001;17:746-7.
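For combined files like the DBAli pairwise output, where two chains with
different IDs share a single model, the chains can be separated before
viewing. The sketch below assumes standard fixed-column PDB records (chain ID
in column 22) and wraps each chain in its own MODEL/ENDMDL pair, which should
make them open as separate submodels. The function names are hypothetical.

```python
def split_by_chain(pdb_lines):
    """Group ATOM/HETATM lines by the chain ID found in column 22 (index 21)."""
    chains = {}
    for line in pdb_lines:
        line = line.rstrip("\n")
        if line[:6].strip() in ("ATOM", "HETATM") and len(line) > 21:
            chains.setdefault(line[21], []).append(line)
    return chains


def as_separate_models(pdb_lines):
    """Return PDB lines with each chain wrapped in its own MODEL/ENDMDL pair."""
    out = []
    for number, lines in enumerate(split_by_chain(pdb_lines).values(), start=1):
        out.append(f"MODEL     {number:4d}")
        out.extend(lines)
        out.append("ENDMDL")
    out.append("END")
    return out


# Hypothetical three-atom input with chains A and B interleaved
combined = [
    "ATOM      1  CA  ALA A   1",
    "ATOM      2  CA  GLY B   1",
    "ATOM      3  CA  SER A   2",
]
for model_line in as_separate_models(combined):
    print(model_line)
```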
- later additions (not evaluated)
Elaine Meng / meng@cgl.ucsf.edu