MatchMaker

MatchMaker superimposes protein or nucleic acid structures by first creating pairwise sequence alignments, then fitting the aligned residue pairs. Residue types and/or secondary structure information can be used to create the initial sequence alignments. Fitting uses one point per residue. Optionally, a structure-based multiple sequence alignment can be computed after the structures have been superimposed.

Note: if it is already known which residues in one structure should be paired with which residues in the other, the command match can be used instead. See superimposing structures for a discussion of the different methods available in Chimera. See also: Match -> Align, Multalign Viewer, the Superpositions and Alignments tutorial, and:

Tools for integrated sequence-structure analysis with UCSF Chimera. Meng EC, Pettersen EF, Couch GS, Huang CC, Ferrin TE. BMC Bioinformatics. 2006 Jul 12;7:339.

There are several ways to start MatchMaker, a tool in the Structure Comparison category. MatchMaker is also implemented as the command mmaker (or matchmaker).

The MatchMaker dialog is organized by the main steps to be performed:

generating pairwise sequence alignments
matching, i.e., superimposing the structures according to those pairwise alignments
optionally, creating a multiple sequence alignment from the structural superposition

Save settings writes the current MatchMaker parameters to the preferences file. Reset to defaults resets the dialog to the factory default parameter settings without changing any preferences.

Clicking OK or Apply will start the calculations with or without closing the dialog, respectively. Sequence alignment scores, parameter values, and structure RMSDs will be reported in the Reply Log.

Cancel simply closes the dialog, while Help opens this manual page in a browser window.

Initial Pairwise Sequence Alignments

Further restrict matching to current selection allows ignoring residues of the reference and/or match structures that are not selected. In general, restriction should only be used in specific cases to suppress results that would otherwise be obtained. For example, two chains that would otherwise align on their N-terminal domains can be forced to align on their C-terminal domains by selecting the C-terminal domains and using the restriction option. Otherwise, restriction is not recommended, because full-length alignments tend to be of higher quality, and iteration already serves to exclude poorly superimposed regions from the final fit. Although unselected parts of matched chains will appear in the resulting sequence alignment (if shown), they have simply been added back in as “filler,” without consideration of how the characters align, after alignment and matching of only the selected residues.

Chain pairing options:

Best-aligning pair of chains between reference and match structure (default) - One reference structure and one or more structures to match should be chosen. For each structure to be matched, the reference-match pair of chains with the highest sequence alignment score will be used.
Specific chain in reference structure with best-aligning chain in match structure - One reference chain and one or more structures to match should be chosen. Individual lines or blocks of lines can be chosen with the left mouse button; Ctrl-click toggles the status of a line. For each structure to be matched, the chain that aligns to the reference chain with the highest sequence alignment score will be used.
Specific chain(s) in reference structure with specific chain(s) in match structure - One or more reference chains should be chosen from the list. For each reference chain chosen, one chain to be matched should be chosen from the corresponding pulldown menu. If multiple chains are to be matched to the same reference chain, it is necessary to match them in separate steps (by choosing the chain to match and then clicking Apply). A given chain cannot be matched to two different reference chains simultaneously, and chains from the same structure (molecule model) cannot simultaneously serve as a reference chain and a chain to match.
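The first two pairing modes both reduce to picking the highest-scoring pair among the candidate chains. The following is a minimal sketch of that selection, not MatchMaker's code: the function names are hypothetical, and the scoring function is a toy placeholder (identities at equal positions) standing in for a real sequence alignment score.

```python
from itertools import product


def toy_alignment_score(seq_a, seq_b):
    """Placeholder score (assumption): identical residues at the same index."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


def best_chain_pair(ref_chains, match_chains, score=toy_alignment_score):
    """Return (ref_id, match_id, score) for the highest-scoring chain pair.

    ref_chains and match_chains map chain IDs to sequences. Passing a
    single reference chain corresponds to the "specific reference chain"
    mode; passing all reference chains corresponds to the default mode.
    """
    best = None
    for (ref_id, ref_seq), (m_id, m_seq) in product(
        ref_chains.items(), match_chains.items()
    ):
        s = score(ref_seq, m_seq)
        if best is None or s > best[2]:
            best = (ref_id, m_id, s)
    return best


reference = {"A": "MKTAYIAK", "B": "GSHMLEDP"}
to_match = {"X": "GSHMLEDP", "Y": "MKTAYLAK"}

# Default mode: best pair over all reference and match chains.
print(best_chain_pair(reference, to_match))

# Specific reference chain A: best-aligning chain in the match structure.
print(best_chain_pair({"A": reference["A"]}, to_match))
```

With more than one structure to match, the same selection would be repeated once per structure.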

Alignment algorithm:

Needleman-Wunsch (default) - global
Smith-Waterman - local
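The difference between global and local alignment can be illustrated with a short dynamic-programming sketch. It is not MatchMaker's implementation: it assumes simple match/mismatch scores and a single linear gap penalty rather than a substitution matrix with separate opening and extension penalties, and the function name and parameters are hypothetical.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of sequences a and b.

    local=False: Needleman-Wunsch (global, end gaps are penalized).
    local=True:  Smith-Waterman (local, scores are floored at zero).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            H[i][j] = cell
            best = max(best, cell)
    # Global: score of aligning both full sequences; local: best cell anywhere.
    return best if local else H[-1][-1]


seq1 = "TTTTACDEF"
seq2 = "ACDEFGGGG"
print("global:", align_score(seq1, seq2))
print("local: ", align_score(seq1, seq2, local=True))
```

For these two toy sequences the global score is -9, because the full lengths must be accounted for, whereas the local score is 5, from the shared ACDEF segment alone.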

Sequence alignment scoring can include a residue similarity term, a secondary structure term, and gap penalties.

Matrix (default BLOSUM-62) - what substitution matrix to use for the residue similarity part of the score. If an amino acid matrix is chosen, only peptide sequences will be aligned; if a nucleic acid matrix is chosen, only nucleic acid sequences will be aligned. An error message will appear if there are no reference-match pairs of the appropriate type.
Gap penalties: When secondary structure scoring is included, the secondary-structure-specific Gap opening penalties (Intra-helix, Intra-strand, Any other) are used instead of the single Gap opening penalty. The same Gap extension penalty is used, however.
Include secondary structure score (N%) (default on and N=30) - whether to include a secondary structure term in the score, and at what weight relative to the residue similarity term. Show parameters reveals the secondary structure scoring parameters. N reflects the relative weights of the terms, which can be adjusted by moving the slider. If the weight is 30%, for example,
total score = 0.70(residue similarity score) + 0.30(secondary structure score) – gap penalties
Setting the weight to 0% is not the same as turning the option off, however. The values in the secondary structure Scoring matrix (for all pairwise combinations of H helix, S strand, and O other) and the secondary-structure-specific Gap opening penalties can be adjusted. Reset secondary structure scoring parameters to defaults can be used to restore the default values of all secondary structure scoring parameters.
Compute secondary structure assignments (available when secondary structure scoring is used; default on) - whether to first identify helices and strands by running the ksdssp algorithm, overwriting any pre-existing secondary structure assignments (except for CA-only structures, which are automatically skipped). This option may improve superposition results by generating consistent assignments, whereas pre-existing assignments may reflect the use of different criteria on different structures. Ksdssp parameter defaults can be adjusted with the compute SS dialog (opened from the Model Panel).
Show pairwise alignment(s) (default off) - whether to display the resulting pairwise reference-match sequence alignments; each will be shown in a separate Multalign Viewer window. When fit iteration is employed, the pairs used in the final fit will be shown in the alignment as a region (colored boxes) named matched residues. The header named RMSD is automatically shown and other headers hidden. This line shows the spatial variation among residues associated with a column. In the pairwise case, the value is simply the distance between atoms in the two residues associated with a column.
Note: When the fit has been restricted to selected residues, the unselected residues of matched chains will still appear in the alignment, but merely as a convenient compact representation; how they are aligned is not meaningful.
Note: These pairwise sequence alignments can be considered a by-product of superposition. Successful superposition only requires these alignments to be partly correct, as incorrect portions tend to be excluded from the fit during iteration. If the sequences are easy to align (highly similar), the sequence alignments are likely to be correct throughout. However, if the sequences are more distantly related, parts of the alignments may be incorrect even when a successful superposition is produced. In those cases, a structure-based alignment should be superior.
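The weighting of the score terms described above can be written out as a small function. This is only a sketch of the stated formula; the function name, argument names, and the example numbers are hypothetical, and how MatchMaker accumulates the individual terms internally is not covered here.

```python
def total_score(similarity, ss_score, gap_penalties, ss_weight=0.30):
    """Combine the score terms as in the formula for a weight of N%.

    ss_weight is N/100, the relative weight of the secondary structure
    term; the residue similarity term receives the remaining weight.
    """
    if not 0.0 <= ss_weight <= 1.0:
        raise ValueError("ss_weight must be between 0 and 1")
    return (1.0 - ss_weight) * similarity + ss_weight * ss_score - gap_penalties


# Made-up term values, weighted at the default 30%.
print(total_score(similarity=100.0, ss_score=40.0, gap_penalties=12.0))

# Same terms with the slider at 0%: only similarity and gaps contribute.
print(total_score(100.0, 40.0, 12.0, ss_weight=0.0))
```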

Matching

Fitting uses one point per residue: CA atoms in amino acids and C4' atoms in nucleic acids. If a nucleic acid residue lacks a C4' atom (some lower-resolution structures are P traces), its P atom will be paired with the P atom of the aligned residue.
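The choice of fitting atom for an aligned residue pair can be sketched as follows. The function and the representation of residues as sets of atom names are hypothetical, and returning None when no usable atom is present in both residues is an assumption, since that case is not described above.

```python
def fit_atom_name(residue_kind, atoms_a, atoms_b):
    """Name of the atom used to fit an aligned residue pair, or None.

    residue_kind is "amino acid" or "nucleic acid"; atoms_a and atoms_b
    are the atom names present in the two aligned residues.
    """
    if residue_kind == "amino acid":
        candidates = ["CA"]
    elif residue_kind == "nucleic acid":
        # Prefer C4'; fall back to P when C4' is missing (e.g., P traces).
        candidates = ["C4'", "P"]
    else:
        raise ValueError("unknown residue kind: %r" % residue_kind)
    for name in candidates:
        if name in atoms_a and name in atoms_b:
            return name
    return None  # assumption: the pair cannot be used in the fit


print(fit_atom_name("amino acid", {"N", "CA", "C", "O"}, {"CA"}))
print(fit_atom_name("nucleic acid", {"P", "C4'", "C1'"}, {"P"}))
```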

Iterate by pruning long atom pairs until no pair exceeds [x] angstroms (default on and x=2.0) - whether to iteratively remove far-apart residue pairs from the “match list” used to superimpose the structures. This does not change the initial sequence alignment, but restricts which columns of that alignment will be used in the final fit. Otherwise, all of the columns containing both sequences (i.e. without a gap) will be used. In each cycle of iteration, atom pairs are removed from the match list and the remaining pairs are fitted, until no matched pair is more than x Å apart. The atom pairs removed are either the 10% farthest apart of all pairs or the 50% farthest apart of all pairs exceeding the cutoff, whichever is the lesser number of pairs. Iteration tends to exclude sequence-aligned but conformationally dissimilar regions such as flexible loops, allowing a tighter fit of the best-matching "core" regions.
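The pruning rule can be sketched as follows. This is an illustration under stated assumptions rather than MatchMaker's code: the function names are hypothetical, fractional counts are truncated, at least one pair is dropped per cycle so that the loop always makes progress, and the refitting step is represented by a caller-supplied callback instead of a real least-squares superposition.

```python
def pairs_to_prune(distances, cutoff=2.0):
    """Indices of the pairs to drop in one cycle; empty when converged."""
    over = [i for i, d in enumerate(distances) if d > cutoff]
    if not over:
        return []
    n_ten_percent = int(0.10 * len(distances))  # 10% of all pairs
    n_half_over = int(0.50 * len(over))         # 50% of pairs beyond the cutoff
    # Whichever is the lesser number; the floor of one pair is an assumption.
    n_drop = max(1, min(n_ten_percent, n_half_over))
    farthest_first = sorted(
        range(len(distances)), key=lambda i: distances[i], reverse=True
    )
    return farthest_first[:n_drop]


def iterate_fit(pairs, distances_after_fit, cutoff=2.0):
    """Prune and refit until no remaining pair is more than cutoff apart.

    distances_after_fit(pairs) is a hypothetical callback that refits the
    given pairs and returns their distances in the same order.
    """
    pairs = list(pairs)
    while True:
        drop = set(pairs_to_prune(distances_after_fit(pairs), cutoff))
        if not drop:
            return pairs
        pairs = [p for i, p in enumerate(pairs) if i not in drop]


# Toy data: distances stay fixed instead of changing with each refit.
fixed = [0.4, 0.6, 0.9, 1.1, 1.3, 1.5, 1.9, 2.5, 3.8, 6.0]
kept = iterate_fit(range(len(fixed)), lambda ps: [fixed[p] for p in ps])
print(kept)
```

In a real fit the distances change after every cycle, so pairs that initially exceed the cutoff may fall within it once the worst outliers have been removed.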

Regardless of which chain(s) in a model to be matched are aligned in sequence to the reference, the entire model will be reoriented.

Final Structure-Based Sequence Alignment

If MatchMaker is used simply to superimpose structures, this step can be omitted. However, if one also wants a corresponding structure-based sequence alignment, this step is recommended, especially if the sequences are dissimilar.

After superposition, compute structure-based multiple sequence alignment (default off) - call Match -> Align to generate a sequence alignment consistent with the superposition. If this option is not used, Match -> Align can still be started independently later.

Calculating a structure-based alignment can take several minutes, depending on the number of structures, but there are advantages:

it can provide better statistics for describing structural similarity (RMSD, etc.) because more alignment columns are correct
it can produce a multiple sequence alignment, whereas the initial sequence alignments are only pairwise

The output sequence alignment is automatically shown in Multalign Viewer and can be saved to a file from that tool. The fully populated columns are highlighted as a region (colored boxes). Clicking the region will select the corresponding parts of the structures, in effect their common cores. The header named RMSD shows the spatial variation per column.

Notes

Meaning of 0% secondary structure score. Turning off Include secondary structure score is not the same as moving the slider to zero with the option turned on. When the option is on:

The secondary-structure-specific gap opening penalties are used regardless of the slider position.
If Compute secondary structure assignments is also turned on, ksdssp is run and may change pre-existing secondary structure assignments.
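The first point can be made concrete with a sketch of how the gap opening penalty might be chosen. The function name and the penalty values are illustrative placeholders, not values taken from this page; the slider weight is deliberately not an argument, because the choice depends only on whether the option is on.

```python
def gap_open_penalty(ss_type, include_ss, penalties):
    """Gap opening penalty for a gap within secondary structure ss_type.

    ss_type is "H" (helix), "S" (strand), or "O" (other). include_ss
    mirrors the Include secondary structure score checkbox.
    """
    if not include_ss:
        return penalties["general"]  # the single Gap opening penalty
    key = {"H": "intra_helix", "S": "intra_strand"}.get(ss_type, "any_other")
    return penalties[key]


# Illustrative values only (assumption).
example_penalties = {
    "general": 12,
    "intra_helix": 18,
    "intra_strand": 18,
    "any_other": 6,
}

print(gap_open_penalty("H", include_ss=False, penalties=example_penalties))
# Option on: the specific penalty applies even with the slider at 0%.
print(gap_open_penalty("H", include_ss=True, penalties=example_penalties))
```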

UCSF Computer Graphics Laboratory / November 2011