Online Structure Alignment Resources (April 2005; later additions as noted)

This list is not meant to be comprehensive, but includes a reasonably broad sample of sites I think are handy and convenient to use.

The test cases involve structures with low pairwise sequence identity:

Pairwise - 2mnr (mandelate racemase) 4enl (enolase)
Multiple - 2mnr 4enl 1nu5 (MLE II)

Default settings were generally used.

Note that the MatchMaker tool in Chimera performs a similar function: it constructs a sequence alignment and superimposes the structures accordingly. Secondary structure can be used to help construct the alignment, which allows pairs with lower percent identity to be superimposed correctly (see results from using MatchMaker).
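Since the test cases are chosen for low pairwise sequence identity, it may help to see how that number can be computed from an aligned pair. The following is a minimal sketch, assuming identity is counted only over columns where both sequences have a residue (other definitions use a different denominator). The function name and the toy sequences are hypothetical and are not taken from 2mnr or 4enl.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # gap column: not counted in the denominator
        compared += 1
        if res_a.upper() == res_b.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment ("-" marks a gap)
seq_a = "MKT-AYIAKQR"
seq_b = "MRTLAY-GKHR"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

With a different denominator (for example, the length of the shorter sequence) the value changes, so percent identities from different servers are not always directly comparable.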

Superposition Servers

FAST - pairwise; paper compares the method to DaliLite, CE, and K2.
Reference: Zhu J, Weng Z. FAST: a novel protein structure alignment algorithm. Proteins. 2005 Feb 15;58(3):618-27.
Availability: server; executables for Linux, IBM, Sun, Alpha, Mac OS X can be downloaded
Performance: ~5 seconds for pairwise. Shows a sequence alignment in the web page; superimposed coordinates are available within the downloadable RasMol script (which puts both chains into one model and labels them A and B; Chimera complains about the bad PDB records but will display the structures nonetheless).
MultiProt - truly multiple rather than assembly of pairwise alignments; can handle circular permutation. Method seems elaborate (opinion from reading the paper) but works well.

Reference: Shatsky M, Nussinov R, Wolfson HJ. A method for simultaneous alignment of multiple protein structures. Proteins. 2004 Jul 1;56(1):143-56.
Availability: server; can be downloaded for Linux
Performance: ~28 seconds for 3-way using sequence order. Reports several alignments: five of all three, five with various pairings. I took the 3-way alignment with the most residues aligned. Coordinates are combined in a single file but come out as separate submodels in Chimera (#0.1, #0.2, #0.3).
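The submodel behavior is what one would expect if the combined file separates the structures with MODEL/ENDMDL records; that is an assumption here, not something checked against the server output. Under that assumption, a minimal sketch for pulling the structures apart outside Chimera could look like this (the function name and the toy input are hypothetical):

```python
def split_models(pdb_text: str) -> dict:
    """Group ATOM/HETATM lines by the MODEL record that precedes them."""
    models = {}
    current = None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line[6:].split()[0])  # model serial number
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif record in ("ATOM", "HETATM") and current is not None:
            models[current].append(line)
    return models


# Hypothetical three-model file with one atom per model
combined = "\n".join([
    "MODEL        1",
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  CA  GLY A   1       1.000   0.000   0.000",
    "ENDMDL",
    "MODEL        3",
    "ATOM      1  CA  SER A   1       2.000   0.000   0.000",
    "ENDMDL",
])
models = split_models(combined)
for number, lines in sorted(models.items()):
    print(number, len(lines))
```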
SuperPose - multiple; authors claim the server is especially user-friendly.

Reference: Maiti R, Van Domselaar GH, Zhang H, Wishart DS. SuperPose: a simple server for sophisticated structural superposition. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W590-4.
Availability: server (uses a number of other programs including VADAR and ClustalW)
Performance: ~25 seconds for pairwise (not including waiting for the display applet). Results page allows download of superposition in PDB format, sequence alignment, and RMSD information. Coordinates are combined in a single file but come out as separate submodels in Chimera (#0.1, #0.2, #0.3). Only two files can be uploaded, so according to the instructions you have to combine input files to superimpose more than two structures. I had trouble getting this to work. The support person said the structures must have chain IDs; however, adding chain IDs to my input did not help. What did work was following the advice to superimpose two structures with the server and then use that output (which, incidentally, did NOT include chain IDs) as one of the inputs for another round of superposition.
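For readers who want to try the chain-ID route anyway, adding a chain ID to a PDB file means writing a single character into column 22 of each coordinate record. The sketch below assumes standard fixed-column PDB formatting; `set_chain_id` and `toy_atom` are hypothetical helper names, and nothing here reflects what SuperPose itself expects beyond the support person's remark.

```python
def set_chain_id(pdb_text: str, chain_id: str) -> str:
    """Write chain_id into column 22 of every coordinate record."""
    if len(chain_id) != 1:
        raise ValueError("a PDB chain ID is a single character")
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM", "ANISOU", "TER")):
            padded = line.ljust(22)  # short TER lines may end before column 22
            line = padded[:21] + chain_id + padded[22:]
        out.append(line)
    return "\n".join(out) + "\n"


def toy_atom(serial: int, name: str, resseq: int) -> str:
    """Build a hypothetical ATOM line with a blank chain ID field."""
    return (f"ATOM  {serial:5d}  {name:<3s} MET  {resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


no_chain = "\n".join([toy_atom(1, "N", 1), toy_atom(2, "CA", 1), "TER"])
with_chain = set_chain_id(no_chain, "B")
print(with_chain, end="")
```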
SALIGN (updated June 2012) - multiple

Reference: Braberg H, Webb B, Tjioe E, Pieper U, Sali A, Madhusudhan MS. SALIGN: A Webserver for alignment of multiple protein sequences and structures. Bioinformatics. 2012 May 21. [Epub ahead of print]
Availability: server or DBAli lookup
Performance: Notification of server calculations is by email generally within an hour, depending on the server load, and the results page includes a link to “launch Chimera” to display the superimposed structures and sequence alignment. DBAli database lookup takes a few seconds and the resulting web page shows the superposition in a Jmol window and a sequence alignment. The sequence alignment can be downloaded in PIR, FASTA, or PAP format, and a PDB file of the superimposed coordinates can be downloaded. Coordinates are combined in a single file but come out as separate submodels in Chimera (#0.1, #0.2, #0.3). These were not in the order of specification in the input file, but the submodels were also given chain IDs (A, B, C) that did reflect that order. Rather than using that PDB file, I find it easier to open the alignment in Chimera, autoload the structures from the PDB (since the sequences are named with the PDB IDs), and match the structures using the sequences, because this retains the original chain IDs plus ligands, ions, etc.
K2, K2SA - pairwise; the program (but not the server?) can handle circular permutation.
Reference: Szustakowski JD and Weng Z. Protein structure alignment using a genetic algorithm. Proteins, 38(4):428-440, Mar 2000.
Availability: server
Performance: ~30 seconds using sequential constraints. Can download coordinates for second protein (transformed to match first protein) and a log file.
CE (pairwise), CE-MC (multiple, building upon pairwise CE alignments)

Figure panels: CE, CE-MC

References: Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998 Sep;11(9):739-47.

Guda C, Lu S, Scheeff ED, Bourne PE, Shindyalov IN. CE-MC: a multiple protein structure alignment server. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W100-3.
Availability: CE and CE-MC servers used in test no longer available, but see jCE server and software download
Performance: ~55 seconds for pairwise (CE server). ~4 minutes for multiple alignment (CE-MC); e-mail address required for this option. Results include structure-based sequence alignment and single downloadable PDB in which the structures are given different chain IDs (Chimera complains about duplicate atom serial numbers but displays the structures nonetheless). In the CE-MC results, the residue numbers are shifted over one column from where they should be, so the file has to be edited before viewing. The support person said the problem would be fixed in the next update.
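The duplicate serial number complaint can be silenced by renumbering the atoms before opening the file. The following is a minimal sketch, assuming standard fixed-column PDB records in which the atom serial number occupies columns 7-11; the helper names are hypothetical, and CONECT or TER records that refer to the old serial numbers are deliberately left untouched.

```python
def renumber_serials(pdb_text: str, start: int = 1) -> str:
    """Give every ATOM/HETATM record a unique, sequential serial number."""
    serial = start
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Columns 7-11 (indices 6-10) hold the serial number.
            line = f"{line[:6]}{serial:5d}{line[11:]}"
            serial += 1
        out.append(line)
    return "\n".join(out) + "\n"


def toy_atom(serial: int, chain: str, resseq: int) -> str:
    """Build a hypothetical CA-only ATOM line for the example."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Two chains whose serial numbers both start at 1, as in a naive merge
merged = "\n".join([
    toy_atom(1, "A", 1), toy_atom(2, "A", 2),
    toy_atom(1, "B", 1), toy_atom(2, "B", 2),
])
fixed = renumber_serials(merged)
print(fixed, end="")
```

The five-column field overflows past 99999 atoms, which this sketch does not handle.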
DaliLite - pairwise (the Dali server instead compares a single input structure against a representative set from the PDB, like VAST Search)

Reference: Holm L, Park J. DaliLite workbench for protein structure comparison. Bioinformatics. 2000 Jun;16(6):566-7.
Availability: server used in test no longer available, but see newer-version DaliLite server
Performance: ~1 minute; reports several alignments and their stats; for each, coordinates can be downloaded for the second protein. The figure shows the superposition with the most residues aligned.
Protein3Dfit (added July 2005) - pairwise
Reference: Lessel U, Schomburg D. Similarities between protein 3-D structures. Protein Engineering. 1994 Oct;7(10):1175-87.
Availability: server (URL updated Feb 2009)
Performance: ~25 seconds for pairwise (default settings; various parameters can be adjusted by the user). The resulting web page shows the superposition in a Jmol window plus a sequence alignment and RMSD table. Links include many different ways (convenient!) to download the results. One way to save the superposition is by downloading the transformed coordinates of the second structure, and the sequence alignment can be saved in PIR format.
CLICK (added Oct 2011) - pairwise, sequence-order-independent, multiple solutions returned
Reference: Nguyen MN, Tan KP, Madhusudhan MS. CLICK--topology-independent comparison of biomolecular 3D structures. Nucleic Acids Res. 2011 Jul;39(Web Server issue):W24-8.
Availability: server
Performance: ~35 seconds for pairwise (default settings; parameters can be adjusted). The resulting web page shows solutions in order of decreasing % structure overlap, each with a Jmol window and (fragmentary) sequence alignment. Results can be downloaded as PDB files of the superimposed structures.
Later additions, not evaluated:
- ARTS (for nucleic acid structures)
- FATCAT (Flexible structure AlignmenT by Chaining Aligned fragment pairs allowing Twists; i.e., one of the structures may be split into pieces)
- POSA (flexible; multiple; e-mail address requested but not mandatory)
- RCSB PDB protein comparison tool (pairwise comparison with java-FATCAT, java-CE, FATCAT, Mammoth, TM-Align, TopMatch, or sequence alignment methods)
- GANGSTA+ (seq-order-independent; server options require e-mail address)
- TopMatch (successor of ProSup) (see also TopSearch for multiprotein complexes)
- PDBj superposition servers (RASH and GASH) - seek to maximize NER (number of equivalent residues) at 4 Å cutoff
- C-alpha match (pairwise, seq-order-independent)
- PDBeFold
- 3D-SS (choice of method: STAMP or ProFit)
See also Structural alignment software at Wikipedia

Circularly Permuted Test System (May 2005)

Test structures: 1qq5 A chain (HAD) 3chy (CheY)

These are in different superfamilies according to SCOP, but it has been noted that the structures appear to be related by circular permutation (see paper by Ridder and Dijkstra). The active site residues are in about the same positions in 3D, although in different N→C orders. Selected active site residues (those in a column are aligned in 3D):

   -----------------------------------------------------------
                loop 1     loop 2     loop 3     loop 4
   -----------------------------------------------------------
   2-HAD X. autotrophicus                                *
                 D8         S114       K147      171SS---D176
   epoxide hydrolase M. musculus, H. sapiens        *
                 D9         T123       K160      184DD---N189
   CheY E. coli, S. typhimurium                     *
                D57         T87        K109       12DD13
   -----------------------------------------------------------

Figure panels:
- superimposed manually
- MultiProt (~10-15 seconds); the choice with the most residues aligned and, as a tie-breaker, the lowest RMSD
- K2SA (~15-20 seconds); wrong, see log file

The MultiProt results correctly place the corresponding active site residues together; the K2SA results are shifted over by one strand in the central sheet. NB: The K2SA server appears to impose sequential constraints even when I set the option NOT to do so, which would explain the failure on this test case.
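The difference in N→C order can be made concrete with the residue numbers from the table above. The sketch below is a toy check, not any server's algorithm: it sorts the 3D-matched pairs by position in the first protein and asks whether the second protein's positions are increasing (sequential), increasing after a cyclic rotation (consistent with circular permutation), or neither. The function names are hypothetical, and loop 4 is represented by the first residue number listed for it, which is an arbitrary illustrative choice.

```python
def is_increasing(order):
    """True if each position is larger than the one before it."""
    return all(a < b for a, b in zip(order, order[1:]))


def classify_alignment(pairs):
    """pairs: (residue in protein A, residue in protein B) matched in 3D."""
    b_order = [b for _, b in sorted(pairs)]
    if is_increasing(b_order):
        return "sequential"
    for shift in range(1, len(b_order)):
        rotated = b_order[shift:] + b_order[:shift]
        if is_increasing(rotated):
            return "circular permutation"
    return "non-sequential"


# Loops 1-4 from the table (loop 4: first listed residue number)
had_vs_epoxide_hydrolase = [(8, 9), (114, 123), (147, 160), (171, 184)]
had_vs_chey = [(8, 57), (114, 87), (147, 109), (171, 12)]

print(classify_alignment(had_vs_epoxide_hydrolase))  # sequential
print(classify_alignment(had_vs_chey))               # circular permutation
```

A method restricted to sequential alignments cannot return the second pairing, which is consistent with the K2SA outcome described above.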

Sources of Structure-Based Sequence Alignments

Listed here are Web servers that generate alignments from your input; the alignments can then be used in Chimera's Multalign Viewer to match the corresponding structures. Databases of precalculated alignments are listed elsewhere.

3DCoffee - multiple; primarily intended to generate sequence alignments but can use 3D structures to do so; can also combine multiple sequence alignments

Figure panels: Multalign Viewer matching with defaults; with iteration (pruning)

Reference: Poirot O, Suhre K, Abergel C, O'Toole E, Notredame C. 3DCoffee: a web server for mixing Sequences and Structures into multiple sequence alignments. Nucleic Acids Research 2004 Jul 1;32(Web Server issue):W37-40.
Availability: TCoffee server (includes 3DCoffee option)
Performance: The three PDBs were submitted using the Advanced 3DCoffee page. Results were returned in ~4 1/2 minutes and include an alignment (clustal format, *.aln) and dendrogram information (*.dnd). The superpositions were generated in Chimera by using the sequence alignment with and without pruning of CA pairs more than 2.0 Å apart.
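To illustrate what pruning does to the fit statistics, here is a minimal sketch of a single pruning pass over CA pairs that have already been superimposed. It only filters pairs by distance and recomputes the RMSD; it does not refit the structures, whereas an iterative procedure would presumably refit after each round. The function names and coordinates are hypothetical.

```python
import math


def rmsd(pairs):
    """Root-mean-square distance over matched coordinate pairs."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in pairs) / len(pairs))


def prune_pairs(pairs, cutoff=2.0):
    """Keep only pairs whose atoms are within cutoff (in Å) of each other."""
    return [(p, q) for p, q in pairs if math.dist(p, q) <= cutoff]


# Hypothetical CA pairs (x, y, z in Å) after superposition
ca_pairs = [
    ((0.0, 0.0, 0.0), (0.5, 0.0, 0.0)),  # 0.5 Å apart
    ((1.0, 0.0, 0.0), (1.0, 1.0, 0.0)),  # 1.0 Å apart
    ((2.0, 0.0, 0.0), (2.0, 0.0, 5.0)),  # 5.0 Å apart, pruned at 2.0 Å
]
kept = prune_pairs(ca_pairs, cutoff=2.0)
print(f"all pairs:    {len(ca_pairs)} pairs, RMSD {rmsd(ca_pairs):.2f}")
print(f"after pruning: {len(kept)} pairs, RMSD {rmsd(kept):.2f}")
```

Pruning lowers the RMSD at the cost of fewer matched pairs, so the two numbers are best reported together.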
PROMALS3D - web server takes input FASTA sequences and/or PDB structures, generates multiple sequence alignments in a few formats (see my Mar 08 journal club presentation)

meng[at]cgl.ucsf.edu / home page

