[Chimera-users] Restricting "select by conservation" to a subset of sequences in a multiple alignment
olibclarke at gmail.com
Wed Feb 18 09:43:36 PST 2015
Many thanks for the detailed explanation - it certainly looks like there are some useful features here that I didn’t know about, and I will definitely take advantage of them in future.
With regard to the conservation similarity features, I will read further, but I don’t quite know how to interpret the mavConservation attribute when conservation is calculated using the AL2CO method - it doesn’t seem to be a simple "percentage of sequences that are identical at this position" like it is otherwise - it does not range between 0 and 1, for example.
What I would like, in addition to the present options, is to have the option to take into account similarity, not just identity, when calculating mavConservation (or perhaps have a separate attribute, mavSimilar) - so that rather than mavConservation being the percentage of sequences that are identical at a given position, it would be the percentage of sequences whose residues are similar at that position.
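
To make the requested measure concrete, here is a minimal sketch in plain Python. It is not Chimera code and does not use the Chimera API; the group definitions, the function names, and the choice to count gaps as non-matching are all assumptions made for illustration. Groups may overlap (H is both aromatic and basic here), and the score is the largest fraction of sequences that fall into any single group, with plain identity as the lower bound.

```python
from collections import Counter

# Hypothetical similarity groups (an assumption, not taken from Chimera).
# Groups are allowed to overlap: H appears in two of them.
SIMILAR_GROUPS = [
    set("ILVM"),   # aliphatic
    set("FWYH"),   # aromatic
    set("KRH"),    # basic
    set("DE"),     # acidic
    set("STNQ"),   # polar, uncharged
]

GAP_CHARS = set("-.")


def identity_fraction(column):
    """Fraction of all sequences that carry the most common residue."""
    residues = [c for c in column if c not in GAP_CHARS]
    if not residues:
        return 0.0
    # Gaps stay in the denominator, so they count as non-matching.
    return Counter(residues).most_common(1)[0][1] / len(column)


def similarity_fraction(column, groups=SIMILAR_GROUPS):
    """Largest fraction of all sequences covered by one similarity group."""
    residues = [c for c in column if c not in GAP_CHARS]
    if not residues:
        return 0.0
    counts = Counter(residues)
    best = max(counts.values())  # identity is the floor
    for group in groups:
        covered = sum(n for res, n in counts.items() if res in group)
        best = max(best, covered)
    return best / len(column)


def per_column(alignment, score):
    """Apply a column score to every column of equal-length aligned rows."""
    return [score(col) for col in zip(*alignment)]


alignment = ["KDI", "RDL", "HEV", "K-A"]
identity = per_column(alignment, identity_fraction)
similarity = per_column(alignment, similarity_fraction)
```

With these toy sequences the first column (K, R, H, K) is only half identical but fully similar under the basic group, which is the kind of position the identity-only measure under-reports in divergent alignments.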
> On Feb 18, 2015, at 12:11 PM, Elaine Meng <meng at cgl.ucsf.edu> wrote:
> Hi Oliver,
> There are some capabilities that at least partially cover these requests, which I'll try to outline here…
> (1) Subset analysis. If you have a tree being shown with your alignment, you can click a node in the tree and choose to open that branch of the alignment in a separate window. Then you can analyze that subset of sequences using the separate Multalign Viewer window. We don't have anything to limit analysis to a subset of what's in a sequence window, though. Trees to go with alignments are available from several sources, such as PFAM. You open the alignment, then from the alignment window menu choose Tree… Load, then browse to the tree file.
> (2) Conservation calculation options. Use Multalign Viewer menu: Preferences… Headers, and set Conservation style to AL2CO. Then you have a number of sophisticated choices for calculating conservation: entropy measure, variability measure, and sum-of-pairs with many choices of which AA similarity matrix to use, with or without sequence weighting and smoothing over windows. I'm continually wishing more people would make use of these, and try to mention them in all of my tutorials! The AL2CO options come from the program of that name provided by the Grishin group.
> Theoretically you could even make your own matrix and put it in the proper place for discovery by the code, although that would be a lot of work. Probably one of the existing matrices would be suitable: there are several BLOSUM, some PAM, and some structure-derived (SDM, HSDM). I added the latter fairly recently.
> You can also calculate your own custom values externally from Chimera and then load them in from a "header file," to be displayed above the alignment just like the built-in Conservation.
> See tutorial: Mapping Sequence Conservation on Structures with Chimera
> I hope this helps,
> Elaine C. Meng, Ph.D.
> UCSF Computer Graphics Lab (Chimera team) and Babbitt Lab
> Department of Pharmaceutical Chemistry
> University of California, San Francisco
> On Feb 18, 2015, at 7:39 AM, Oliver Clarke <olibclarke at gmail.com> wrote:
>> Hi, I very much like the sequence tools within Chimera - I find the capacity to load in multiple alignments and select by conservation particularly handy.
>> I have two suggestions that I would love to see in a future version of Chimera (or maybe they are already in there and I don’t know about them!):
>> 1. Selection of sub-groups of sequences within a multiple alignment for selecting residues by conservation. This would enable the easy selection/identification of residues that are specifically conserved within individual families with different functions or substrates - for example, a position that is an arginine, and always an arginine, in one subgroup of proteins that we know have a particular function, and always a glutamate in another group of proteins that we know possess a different but related function. It is possible to do this currently by loading in multiple different sequence alignments and saving and intersecting selections, but having a popup menu in the select/render by attribute tab where you could select or deselect sequences that contribute to the analysis of conservation would make it much easier.
>> 2. Optional addition of sequence similarity, as well as identity, to the analysis of sequence conservation would make analysis of alignments including quite divergent sequences much more meaningful. Even better would be if the definition of similar groups for this purpose was user-adjustable and had allowance for overlapping groups of similar residues.
>> Oliver Clarke.
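
The first suggestion in the original message (positions that are invariant within each subgroup but differ between subgroups, such as always arginine in one family and always glutamate in another) can be sketched outside Chimera as follows. This is only an illustration under stated assumptions: the function names are hypothetical, subgroups are given as lists of row indices into the alignment, and "conserved" is taken to mean strictly identical with no gaps.

```python
GAP_CHARS = set("-.")


def conserved_residue(alignment, rows, col):
    """Return the residue shared by every listed row at col, else None."""
    residues = {alignment[r][col] for r in rows}
    if len(residues) != 1:
        return None
    residue = residues.pop()
    return None if residue in GAP_CHARS else residue


def subgroup_specific_positions(alignment, group_a, group_b):
    """List (column, residue_a, residue_b) where each subgroup is
    invariant but the two subgroups carry different residues."""
    hits = []
    for col in range(len(alignment[0])):
        a = conserved_residue(alignment, group_a, col)
        b = conserved_residue(alignment, group_b, col)
        if a is not None and b is not None and a != b:
            hits.append((col, a, b))
    return hits


# Toy alignment: rows 0-1 form one family, rows 2-3 another.
alignment = ["GRA", "GRS", "GET", "GEA"]
hits = subgroup_specific_positions(alignment, [0, 1], [2, 3])
```

Column indices are 0-based alignment columns, so they would still need to be mapped to structure residue numbers before being used for selection.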