Online Sources of Sequence Alignments

This list is by no means exhaustive, but it includes several sources of protein multiple sequence alignments for use in Chimera. Why you might want such alignments:

I would then work with the sequences and structure(s) using Chimera, for example to evaluate conservation, superimpose structures, evaluate conformational variability, or morph between different structures. Be aware that different sources define family and superfamily differently, although in every case a family groups more closely related proteins and a superfamily more diverse ones.

The list is organized as follows:

Descriptions here are minimal. I recommend consulting literature references and/or other documentation provided on the sites to better understand their contents and methodology.

Points to consider:

(A) Alignments containing proteins of known structure

These databases contain sequence alignments of proteins with experimentally determined 3D structures. Typically the names in the alignment are structure identifiers, which makes it easy to autofetch all the structures in a single step in Chimera (from the sequence alignment window, choose Structure... Load Structures). Of course, you can instead fetch just a subset of the structures individually with the open command or File... Fetch by ID.

(B) Alignments that do not necessarily contain proteins of known structure

If the corresponding tree in New Hampshire (aka Newick) format is available, it can be loaded after the sequence alignment has been opened.
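
Before loading a tree, it can help to check that its leaf labels correspond to the sequence names in the alignment. The following is a minimal Python sketch of such a check, not part of Chimera. It assumes unquoted leaf labels and a tree with more than one leaf, and the function names (newick_leaf_names, compare_names) are hypothetical.

```python
import re


def newick_leaf_names(newick):
    """Return the leaf labels of a Newick string, in order of appearance.

    Assumption: labels are unquoted. A leaf label directly follows "(" or ",",
    whereas internal-node labels and support values follow ")".
    """
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


def compare_names(tree_names, alignment_names):
    """Return (names only in the tree, names only in the alignment)."""
    tree, aln = set(tree_names), set(alignment_names)
    return sorted(tree - aln), sorted(aln - tree)


# Hypothetical example: the tree has leaf C, the alignment has sequence D.
leaves = newick_leaf_names("((A:0.1,B:0.2)90:0.05,C:0.3);")
only_tree, only_alignment = compare_names(leaves, ["A", "B", "D"])
print(leaves, only_tree, only_alignment)
```

If either returned list is non-empty, the tree and alignment were probably not generated from the same set of sequences.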

(C) Server-generated multiple alignment from a single input

(D) DIY: Find sequences individually, use alignment server

Issues to consider are how diverse the set of sequences should be, alignment quality, and balance: an alignment could oversample some areas of the intended “sequence space” and undersample others. Imbalance can be reduced by filtering out sequences above some level of sequence identity and, in Chimera, by using sequence-weighting options when calculating conservation.
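
As an illustration of identity-based filtering, here is a minimal Python sketch that works on sequences that are already aligned. It is a simple greedy filter under stated assumptions, not the method used by any particular database or by Chimera: identity is counted only over columns where both sequences have a residue, the 90% cutoff is arbitrary, and the function names and example sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in "-." and y not in "-."]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def filter_redundant(aligned, max_identity=90.0):
    """Greedy filter: keep a sequence only if it is no more than max_identity
    percent identical to every sequence already kept."""
    kept = []
    for name, seq in aligned:
        if all(percent_identity(seq, kept_seq) <= max_identity
               for _, kept_seq in kept):
            kept.append((name, seq))
    return kept


# Hypothetical aligned sequences; seqB matches seqA at every shared column.
example = [
    ("seqA", "MKT-LLVA"),
    ("seqB", "MKTALLVA"),
    ("seqC", "MRSALIGA"),
]
print([name for name, _ in filter_redundant(example)])  # ['seqA', 'seqC']
```

Because the filter is greedy, the result depends on input order: the first member of each group of near-identical sequences is the one retained.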

I used the DIY approach to make the alignments in the Chimera “hormone-receptor complex” demo (under Tools... Demos in the menu) because I wanted to include sequences for the hormone and receptor from the same six species. The sequences were similar enough to align easily, so I didn't have to worry about tweaking parameters to improve the results.

Look up sequences (I usually save or text-edit the sequences into a single FASTA file):
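
Combining individually retrieved sequences into a single FASTA file can also be scripted. The following is a minimal Python sketch, assuming each sequence was saved as FASTA-format text. The function names and the example sequence names are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-format text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def merge_fasta(texts, width=60):
    """Merge several FASTA texts into one, rejecting duplicate names."""
    seen = set()
    out = []
    for text in texts:
        for name, seq in parse_fasta(text):
            if name in seen:
                raise ValueError("duplicate sequence name: " + name)
            seen.add(name)
            out.append(">" + name)
            # Wrap sequence lines at a fixed width for readability.
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Hypothetical inputs: two separately saved sequences.
merged = merge_fasta([">hormone_human\nMKT\nAAA\n", ">hormone_mouse\nMKS\n"])
print(merged, end="")
```

Rejecting duplicate names up front avoids ambiguity later, when sequences need to be told apart by name in the alignment.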

Use a server to align them (order is merely alphabetical):
November 2013 / meng[at] / home page