Elaine Meng meng at cgl.ucsf.edu
Fri Jul 20 17:44:24 PDT 2012

Hi Anand,
You should always use your own scientific judgement and knowledge of the system (your proteins of interest) rather than expecting perfect results from any algorithm.  There may not be any "perfect" set of parameters, and which settings are best will vary from system to system.

Even if the 3D superposition is as good as possible, there may be flexible regions that are not spatially aligned even though they are homologous.  Conversely, there can be regions that overlap in 3D even though they are not homologous, because of conformational changes or insertions/deletions in one protein relative to the other.

In other words, even given the best possible superposition from MatchMaker followed by a completely correct mapping from 3D to sequence alignment by Match->Align, there may be places in the alignment that you need to correct manually based on your knowledge of the relationships among the proteins. You can do some manual editing of the sequence alignment in Chimera:

Match->Align only pays attention to the spatial proximity of the CA atoms, not the residue types. In this case, the CA atom of the W residue in 2E31 is slightly closer to the CA atoms of the L residues of the other structures than to the CA atoms of their (in all likelihood homologous) W residues, even though the W residues are also very close.  This does not mean the structural superposition is wrong, only that the conformation of that peptide is a little different from the others.  This is more likely to occur near flexible termini and loops.
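
To illustrate why a purely distance-based pairing can put W opposite L, here is a minimal Python sketch. It is not Match->Align's actual algorithm. The function name nearest_ca and the coordinates are invented for illustration, and the 5.0 cutoff simply echoes the 5 angstrom value mentioned in the question.

```python
import math

def nearest_ca(query, others, cutoff=5.0):
    """Return (label, distance) for the CA in `others` closest to `query`,
    or None if no CA lies within `cutoff` angstroms.
    Residue type is deliberately ignored; only distance matters."""
    best = None
    for label, xyz in others.items():
        d = math.dist(query, xyz)
        if d <= cutoff and (best is None or d < best[1]):
            best = (label, d)
    return best

# Hypothetical CA coordinates in angstroms (made up, not taken from 2E31)
w_2e31 = (0.0, 0.0, 0.0)
reference = {"L": (1.5, 0.0, 0.0), "W": (2.0, 0.0, 0.0)}

match = nearest_ca(w_2e31, reference)
print(match)  # ('L', 1.5): the L wins although the W is only 0.5 angstroms farther
```

Both reference residues are well within the cutoff, so a half-angstrom difference in conformation is enough to change which one gets paired.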

If you want the columns of the alignment to reflect homology, just manually edit it to make the LW motifs line up.
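
Before editing by hand, it can help to see which rows have the motif in a different column. The following is a rough Python sketch under stated assumptions: the alignment is held as a dict of equal-length gapped strings, the helper motif_columns is hypothetical, and the toy rows are made up rather than taken from the actual F-box sequences.

```python
def motif_columns(alignment, motif="LW"):
    """Map each sequence name to the 0-based alignment column where `motif`
    first starts (gaps ignored when searching), or None if it is absent."""
    result = {}
    for name, row in alignment.items():
        # Alignment columns that hold a residue rather than a gap character
        cols = [i for i, c in enumerate(row) if c not in "-."]
        ungapped = "".join(row[i] for i in cols)
        idx = ungapped.find(motif)
        result[name] = cols[idx] if idx >= 0 else None
    return result

# Toy alignment rows, invented for illustration only
toy = {"2E31_A": "AK-LWDE", "other_1": "AKLW-DE", "other_2": "AKLW-DE"}

cols = motif_columns(toy)
print(cols)  # {'2E31_A': 3, 'other_1': 2, 'other_2': 2}
```

Rows whose reported column differs from the majority are the ones to shift when editing the alignment.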

Another possibility is to use a program that incorporates both types of information (3D proximity and residue types) to generate a sequence alignment from a structural superposition, and see if it works better for you. I know of one such program, Staccato:

However, it looks like the Chimera approach was highly successful for the most part in generating your multiple sequence alignment.  I hope this helps,
Elaine C. Meng, Ph.D.                       
UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
Department of Pharmaceutical Chemistry
University of California, San Francisco

On Jul 20, 2012, at 3:40 PM, Anand K S Rao wrote:

> Greetings to all CHIMERA Forum members / users / bug-fixers
> I am a PhD student trying to identify F-box domains in plant proteomes, primarily via sequence-based approaches such as profile HMM-based discovery. To build the profile HMM, I am building my seed alignment from a structural alignment of solved crystal structures of human F-box proteins.
> Specifically, I seek some help from you or any of your lab members regarding what might possibly be a bug in your CHIMERA, and also help with protocol / parameter choice for both Matchmaker and Match->Align work I am doing. The PDBs trimmed to just the domain regions of my interest are attached as zipped folder #1.
> When I use default parameters for Matchmaker followed by default parameters for Match->Align, I obtain a multiple sequence alignment that appears well conserved except at the N- and C-termini (based on the RMSD track in the output EPS file). However, for 2E31-A, the LW di-peptide (in the 2nd row of the alignment) is mismatched with the rest of the multiple sequence alignment. Please see attachment #2 - this mismatch in the MSA is actually a structural match as seen via CHIMERA and PyMOL.
> So my question #1 is: how can I be sure that the other residues have been accurately aligned in 3D and then accurately aligned in the 2D MSA? If this is even an artifact, is it caused by some error in my choice of methods / parameters, or is this indeed a mis-alignment that should instead have LW aligned at those two positions in all sequences? (That is how it looked even in PyMOL after structural superposition.)
> And my question #2 is: what are the most appropriate parameters for my Match->Align work? Should I use iteration for Matchmaker or not, and if so, for how many cycles or until convergence? In either case, what should the residue RMSD not exceed? Should I retain the default 5 Å or go down to a conservative 1 Å instead?
> Thanks for any help in this regard.
> Sincerely,
> Anand

