Online Sources of Sequence Alignments

This list is by no means exhaustive, but it includes several sources of protein multiple sequence alignments for use in Chimera.**

Why you might want such alignments:

I would then work with the sequences and structure(s) using Chimera, for example to evaluate conservation, superimpose structures, evaluate conformational variability, or morph between different structures.

Related Chimera tutorials: Sequences and Structures, Superpositions and Alignments

**Chimera is not meant to handle large multiple sequence alignments (those containing several hundred to thousands of sequences). A good tool for working with large alignments is Jalview. It allows redundancy filtering and saving a smaller alignment in various formats (aligned FASTA, etc.) that can be read into Chimera. Jalview also calculates per-column conservation, both directly and via the AACon web service. These annotation values can be exported as CSV, but it can be difficult to reformat them properly into a sequence alignment header file (with the same number of positions as the filtered alignment) or a residue attribute file for use in Chimera. A direct connection between Jalview and Chimera is under development.
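As a rough illustration of the reformatting step mentioned above, the following Python sketch maps per-column scores onto the residues of one aligned sequence and writes them in the layout of a Chimera residue attribute file. It is only a sketch under several assumptions: the scores have already been read from the exported CSV into a list (the CSV layout is not described here, so no parsing is attempted), the aligned sequence corresponds to a structure chain with contiguous residue numbering and no missing residues, and the function name, the attribute name `jvConservation`, the starting residue number, and the chain ID are all hypothetical.

```python
GAP_CHARS = "-."


def attribute_file_text(aligned_seq, column_scores, first_resnum=1,
                        chain="A", attr_name="jvConservation"):
    """Build the text of a residue attribute file from per-column scores.

    aligned_seq   -- one row of the alignment, gaps included
    column_scores -- one score per alignment column (same length as the row)
    first_resnum  -- ASSUMED residue number of the first non-gap character
    chain         -- ASSUMED chain ID of the matching structure
    """
    if len(aligned_seq) != len(column_scores):
        raise ValueError("need exactly one score per alignment column")

    lines = ["attribute: %s" % attr_name, "recipient: residues"]
    resnum = first_resnum
    for char, score in zip(aligned_seq, column_scores):
        if char in GAP_CHARS:
            # Gap in this sequence: the column has no residue to annotate.
            continue
        # Data lines start with a tab: <tab>residue-spec<tab>value
        lines.append("\t:%d.%s\t%s" % (resnum, chain, score))
        resnum += 1
    return "\n".join(lines) + "\n"


# Hypothetical example: a five-column alignment row with one gap.
example_text = attribute_file_text("AC-DE", [1, 2, 3, 4, 5],
                                   first_resnum=10, chain="A")
```

The resulting text could be saved to a file and read with Define Attribute or the defattr command. If the structure has missing residues or insertion codes, the simple counter above will assign values to the wrong residues, so a real script would need the actual residue numbers.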

The list is organized as follows:

Descriptions here are minimal. I recommend consulting literature references and/or other documentation provided on the sites to better understand their contents and methodology.

Points to consider:

(A) Alignments containing proteins of known structure

These databases contain sequence alignments of proteins with experimentally determined 3D structures. Typically the names in the alignment are structure identifiers, which makes it easy to autofetch all the structures in a single step in Chimera (from the sequence alignment window, choose Structure... Load Structures). Alternatively, you can fetch just a subset of the structures individually with the open command or File... Fetch by ID.

(B) Alignments that do not necessarily contain proteins of known structure

If the corresponding tree in New Hampshire (aka Newick) format is available, it can be loaded after the sequence alignment has been opened.

(C) Server-generated multiple alignment from a single input

(D) DIY: Find sequences individually, use alignment server

Issues to consider include how diverse the set of sequences should be, alignment quality, and balance: an alignment could oversample some areas of the intended “sequence space” and undersample others. Imbalance can be reduced by filtering out sequences above some level of sequence identity and, in Chimera, by using sequence-weighting options when calculating conservation.
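To make the idea of identity-based filtering concrete, here is a minimal Python sketch of a greedy filter on an aligned FASTA file. It is not the method used by Jalview, Chimera, or any particular server. The function names, the 80% cutoff, and the choice to compute identity only over columns where both sequences have a residue are all assumptions made for illustration.

```python
GAP_CHARS = "-."


def parse_aligned_fasta(text):
    """Return a list of (name, aligned_sequence) pairs from FASTA text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def percent_identity(seq_a, seq_b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a not in GAP_CHARS and b not in GAP_CHARS]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def filter_redundant(records, cutoff=80.0):
    """Greedy filter: keep a sequence only if its identity to every
    sequence kept so far is below the cutoff (in percent)."""
    kept = []
    for name, seq in records:
        if all(percent_identity(seq, kept_seq) < cutoff
               for _, kept_seq in kept):
            kept.append((name, seq))
    return kept


# Hypothetical toy alignment, not real data.
demo = """>a
ACDEFGHIKL
>b
ACDEFGHIKM
>c
ACDE-GHIKL
>d
WWWWWGHIKL
"""
kept_names = [name for name, _ in filter_redundant(parse_aligned_fasta(demo))]
```

Because the filter is greedy, the result depends on input order: whichever member of a similar group comes first is the one retained. Putting the sequences of greatest interest (for example, those with known structures) first keeps them in the filtered set.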

I used the DIY approach to make the alignments in the Chimera “hormone-receptor complex” demo (under Tools... Demos in the menu) because I wanted to include sequences for the hormone and receptor from the same six species. The sequences were similar enough to align easily, so I didn't have to worry about tweaking parameters to improve the results.

Look up sequences (I usually save or text-edit the sequences into a single FASTA file):
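For those who prefer a script to a text editor, combining individually saved sequences into one FASTA file can be sketched in Python as follows. This is only an illustration: the function name and the example sequence names are hypothetical, and the check for duplicate names is a precaution added here rather than a requirement stated by any of the servers.

```python
def merge_fasta_texts(texts):
    """Combine the contents of several FASTA files into one FASTA text.

    Blank lines are dropped, and a duplicate sequence name (the first
    word after '>') raises ValueError so that sequences stay distinct.
    """
    names = []
    out_lines = []
    for text in texts:
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                if name in names:
                    raise ValueError("duplicate sequence name: %r" % name)
                names.append(name)
            elif not names:
                raise ValueError("sequence data found before any '>' line")
            out_lines.append(line)
    return "\n".join(out_lines) + "\n"


# Hypothetical example with made-up names and residues.
merged = merge_fasta_texts([
    ">hormone_human\nMKT\n\n",
    ">receptor_human example description\nAAA\nCCC\n",
])
```

The returned text can then be written to a single file and submitted to an alignment server.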

Use a server to align them (order is merely alphabetical):
July 2014 / meng[at]cgl.ucsf.edu / home page