Multalign Viewer (Sequence)

Multalign Viewer shows amino acid and nucleotide sequences:

multiple sequence alignments
- read from files (several formats)
- created by other tools in Chimera (MatchMaker, Match -> Align)
- pseudo-multiple alignments from Blast Protein
individual sequences of structures in Chimera (when started as Sequence)

The sequences are compared to structures in Chimera and automatically associated as appropriate. Associated structures are not required, but when they are present, selecting residues in the sequence with the mouse selects the corresponding parts of the structures (and vice versa), and the structures can be superimposed based on the sequence alignment, among other capabilities. Measures of sequence conservation and spatial variability (of associated structures) can be shown, and there are several options for adjusting the display and saving results. Multalign Viewer status and alignment data are included in saved sessions.

See also: Blast Protein, the Sequences and Structures tutorial, the Alignments tutorial, and the following reference:

Tools for integrated sequence-structure analysis with UCSF Chimera. Meng EC, Pettersen EF, Couch GS, Huang CC, Ferrin TE. BMC Bioinformatics. 2006 Jul 12;7:339.

STARTUP AND INPUT

There are several ways to start Multalign Viewer, a tool in the Sequence category. Explicitly starting Multalign Viewer brings up a dialog for opening a sequence alignment. In fact, simply opening a file in any of the registered sequence alignment formats (without starting Multalign Viewer first) will automatically use it to show the alignment. Several alignments can be open at the same time, each in a separate sequence window.

Individual sequences of structures in Chimera can be shown:

by starting the Sequence tool (Sequence category)
by choosing one or more molecule models in the left side of the Model Panel and clicking the sequence... button in the right side of the panel

If there is more than one bonded multiresidue chain in the model(s), the chains will be listed in a dialog for designating which sequences to Show. The sequence of each chosen chain will be shown in a separate window, including any residues not in the structure itself but in SEQRES records in the input structure file. Residues with standard names are assigned the corresponding standard one-letter codes, while certain nonstandard residues are assigned the same one-letter codes as closely related standard residues:

ASH (protonated ASP) = D
CYX (disulfide-bonded CYS) = C
GLH (protonated GLU) = E
HID/HIE/HIP (different protonation states of HIS) = H
HYP (hydroxyproline) = P
MSE (selenomethionine) = M
+N (methylated nucleic acid) = N, where N = A/C/G/T/U
additional relationships described in MODRES records in PDB input files
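The naming rules above amount to a lookup table with a few fallbacks. The following is a hypothetical Python sketch of those rules, not Chimera's own code: the function name `one_letter`, the `modres` argument (a dict from a MODRES residue name to its standard parent name), and the use of `X` for unrecognized names are all assumptions made for illustration.

```python
# Hypothetical sketch of the residue-name -> one-letter-code rules described above.
# Not Chimera's implementation; the names and the "X" fallback are assumptions.

STANDARD = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
}

# Nonstandard names listed in the manual, mapped to the related standard code.
NONSTANDARD = {
    "ASH": "D", "CYX": "C", "GLH": "E",
    "HID": "H", "HIE": "H", "HIP": "H",
    "HYP": "P", "MSE": "M",
}


def one_letter(res_name, modres=None):
    """Return the one-letter code for a residue name.

    modres: optional dict {nonstandard name: standard parent name},
    for example built from MODRES records (assumed input format).
    """
    name = res_name.strip().upper()
    if name in STANDARD:
        return STANDARD[name]
    if name in NONSTANDARD:
        return NONSTANDARD[name]
    # "+N": methylated nucleic acid, where N is one of A/C/G/T/U
    if len(name) == 2 and name[0] == "+" and name[1] in "ACGTU":
        return name[1]
    if modres and name in modres:
        return STANDARD.get(modres[name].strip().upper(), "X")
    return "X"  # assumed fallback for unrecognized names
```

For example, `one_letter("MSE")` gives `M`, and with `modres={"SEP": "SER"}` the call `one_letter("SEP", modres)` gives `S`.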

THE SEQUENCE WINDOW

The figure shows part of the sequence window contents for the input file apoex.fa. In addition to sequences and their names, numbering and header lines may be displayed. Font size, sequence wrapping behavior, and the coloring of residue one-letter codes can be controlled in the Appearance preferences. Boxes enclosing one or more residues are called alignment regions. One can search for occurrences of a particular string or pattern of residues in one or all of the sequences using Edit... Find Subsequence or Edit... Find PROSITE Pattern in the Multalign Viewer menu.

When the cursor is in the sequence window, the Page Down key (or space) moves the view down to start with the block below the topmost block whose beginning is currently visible; Page Up (or Shift-space) moves the view up to start with the block above the topmost block whose beginning is currently visible.

The Hide button closes the sequence window without changing the state of Multalign Viewer. The sequence window can be reinvoked using the Raise option for the Multalign Viewer instance in the Tools menu (abbreviated MAV). This is also useful when the window has become obscured by other windows. Help opens this manual page in a browser window, and Quit exits from Multalign Viewer.

NUMBERING

Numbering can be displayed over the alignment and to the left and/or right of the sequences. Numbering displays can be controlled with the Numberings menu.

Alignment numbering (for the alignment as a whole) includes gap positions and always starts at 1. Whether it should be shown initially can be set in the Headers preferences. When present, alignment numbering is shown above any headers.

Sequence numbering (for individual sequences) includes only non-gap positions and can start with numbers other than 1. Sequence start numbers are set to 1 when an alignment is opened, except:

when a sequence is generated from a structure chain in Chimera (for example, with Match -> Align), its start number is set to the number of the first residue in the structure chain
when all sequence names in the alignment are of the form string/N–M, the trailing /N–M is stripped from the names and the sequence start numbers are set to the respective values of N

Sequence start numbers can be changed arbitrarily with Numberings... Adjust Sequence Numberings. Even with a correct start number, however, a sequence's numbering will be incorrect following any missing internal residues (for example, in a sequence derived from a structure missing a loop). When an alignment is saved, /N–M can be appended to each sequence name, where N is the sequence's start number and M is calculated from N and the total number of residues in the sequence.
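The /N–M naming convention can be sketched as two small Python helpers: one applies the all-or-nothing stripping rule on input, the other builds the suffix on output. This is an illustrative sketch, not Chimera's code. It assumes that M is computed as N plus the number of residues minus 1, that `-` and `.` are the gap characters, and that a plain hyphen (or an en dash) separates N and M; the function names are hypothetical.

```python
import re

# Trailing "/N-M" on a sequence name; a hyphen or an en dash is accepted.
_RANGE = re.compile(r"^(.*)/(\d+)[-\u2013](\d+)$")


def strip_ranges(names):
    """Return (names, start_numbers).

    Suffixes are stripped only if every name has one; otherwise the names
    are kept unchanged and every start number is 1.
    """
    matches = [_RANGE.match(n) for n in names]
    if names and all(matches):
        return [m.group(1) for m in matches], [int(m.group(2)) for m in matches]
    return list(names), [1] * len(names)


def name_with_range(name, start, aligned_seq, gap_chars="-."):
    """Append /N-M, with M derived from N and the residue (non-gap) count."""
    n_residues = sum(1 for c in aligned_seq if c not in gap_chars)
    return f"{name}/{start}-{start + n_residues - 1}"
```

Because the rule is all-or-nothing, a single name without a suffix leaves every name intact and every start number at 1.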

ALIGNMENT HEADERS

Headers are lines of information shown above an alignment in the sequence window. They can be hidden/shown with the checkboxes in the Multalign Viewer Headers menu. How headers are calculated and which are shown initially (for alignments, not single sequences) can be controlled in the Headers preferences. Note that header usefulness will depend on what sequences are included in an alignment and whether they are aligned correctly.

The contents of a header can be either character (a single character per column) or numeric (displayed as a histogram). If all values in a numeric header fall within the range 0 to 1, they are used directly as histogram bar heights. If the range extends below 0 or above 1, the values are converted into bar heights between 0 and 1 according to:

for values ≥ 0, height = $1 - \tfrac{1}{2}e^{-\mathrm{value}}$

for values < 0, height = $\tfrac{1}{2}e^{\mathrm{value}}$

However, the original values (not those used for histogram display) are assigned as residue attributes and written out when header contents are saved.
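The bar-height rule can be written as a short Python function. This sketch follows the two formulas above; the function name `bar_heights` is hypothetical, and it assumes the header values arrive as a plain list of numbers with no missing entries.

```python
import math


def bar_heights(values):
    """Map numeric header values to histogram bar heights in [0, 1]."""
    # Values already within [0, 1] are used directly.
    if all(0.0 <= v <= 1.0 for v in values):
        return list(values)
    heights = []
    for v in values:
        if v >= 0:
            heights.append(1.0 - 0.5 * math.exp(-v))  # 0 -> 0.5, large -> near 1
        else:
            heights.append(0.5 * math.exp(v))         # large negative -> near 0
    return heights
```

The two branches meet at 0.5 when the value is 0, so the mapping is continuous and monotonic, which preserves the ordering of the original values.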

Headers may be dynamic, with values that update automatically when independent variables such as the sequence alignment or structure associations are changed, or static, with values that stay the same.

The following dynamic headers are available by default:

Consensus - consensus sequence as defined in the Headers preferences
Conservation - conservation as specified in the Headers preferences
Charge variation - range of residue formal charges, assuming –1 for D/E, +1 for H/K/R, and 0 for all other types
RMSD - spatial variation among the associated structures; root-mean-square distance using one point per residue associated with the column. No value is calculated for columns with fewer than two associated structure residues.
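Per-column values for the Charge variation and RMSD headers can be sketched in Python as follows. The charge assignments are the ones listed above. Two details are assumptions, because the manual does not spell them out: gap positions are ignored for charge variation, and RMSD is taken as the root-mean-square of all pairwise distances among the column's points. The definitive code is in share/MAVHeader/ChimeraExtension.py; the function names here are hypothetical.

```python
import math
from itertools import combinations

# Formal charges assumed by the Charge variation header; all other types are 0.
CHARGE = {"D": -1, "E": -1, "H": 1, "K": 1, "R": 1}


def charge_variation(column, gap_chars="-."):
    """Range (max - min) of formal charges in one alignment column."""
    charges = [CHARGE.get(c.upper(), 0) for c in column if c not in gap_chars]
    if not charges:
        return None  # all-gap column (assumed to have no value)
    return max(charges) - min(charges)


def column_rmsd(points):
    """RMS of pairwise distances among one point per associated residue.

    Returns None for fewer than two points, as the manual specifies.
    """
    if len(points) < 2:
        return None
    squared = [math.dist(p, q) ** 2 for p, q in combinations(points, 2)]
    return math.sqrt(sum(squared) / len(squared))
```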

Arbitrary, user-defined headers can also be shown:

Static headers can be added by reading in a header definition file with Headers... Load in the Multalign Viewer menu. Loading values for a header with the same name as an existing header will create an additional header line, not replace the values in the existing header line. GC (Generic per-Column) annotations in alignments in Stockholm format are also treated as static headers.
Dynamic headers can be defined with Python code. For examples, see the file share/MAVHeader/ChimeraExtension.py within the Chimera installation (it defines the Charge variation and RMSD headers described above). Rather than changing this file in the MAVHeader module, which might be overwritten whenever a new version of Chimera is installed, users should create an analogous file named ChimeraExtension.py in a directory of their own, then use the Tools preferences to add the parent of that directory to the locations Chimera searches for extensions.

The following apply to headers other than alignment numbering:

Headers can be reordered (a header line is placed below the others when turned on with the Headers menu).
Header line contents can be saved to a file (Headers... Save in the Multalign Viewer menu).
Clicking within a header selects any structure residues associated with that column of the alignment.
Values of headers are available as residue attributes named mavHeaderName in any associated structures. Numerical attributes will appear in the attribute lists of Render/Select by Attribute; character attributes will be listed only in the Select by Attribute portion. It may be necessary to Refresh the attribute menus or values in these tools to update them with any changes in Multalign Viewer header information.

ALIGNMENT REGIONS

Alignment regions can be defined manually or by any of several operations within Multalign Viewer. A region is shown using a colored rectangle or outline in the sequence window; a single region can contain any number of disjoint and/or abutting rectangular blocks. Regions should be created as desired after any sequence reordering, as that operation will delete all existing regions.

A region can be created manually by dragging with the left mouse button within the sequence window. Dragging downward into the following block highlights to the end of the preceding block. The initial region colors can be controlled in the Regions preferences. Shift-dragging with the left mouse button adds to the active region. Ctrl-dragging creates a new region and makes it the active region.

The active region is the region most recently created manually, clicked on in the sequence window, or designated as Active in the Region Browser. The corresponding parts of any associated structures are highlighted in the main Chimera window by becoming selected. Only one region can be active at a time. In the sequence window, the active region is indicated with a dashed outline; clicking the active region deactivates it, and clicking a different region deactivates the former active region and makes the new region active. A region with no interior color is only responsive to clicks on its borders. Where regions overlap, only the highest is responsive to clicks.

The Region Browser (Tools... Region Browser in the Multalign Viewer menu) can be used to manage the existing regions. Checkboxes reflect and control which region is Active (if any) and which are Shown in the sequence window. Clicking an entry in the Region Browser highlights the line and chooses the region. More than one region can be chosen at a time. The chosen region(s) are not necessarily active.

The Border color and Interior color of the chosen region(s) can be changed using the adjacent color wells. One can specify whether the chosen region(s) can Include gaps; when this setting is false, only residues and not gap positions will be enclosed. Although a given region may then appear as disjoint blocks, it will still be a single region.

The Delete button on the Region Browser deletes the chosen region(s). The region named Chimera selection cannot be deleted, but it can include zero residues. Since regions may overlap, a region can be considered higher or lower than another. Raise puts the chosen region(s) in front of any other regions in the sequence window and moves the corresponding entries in the Region Browser to the top. Lower puts the chosen region(s) behind any overlapping regions in the sequence window and moves the corresponding entries in the Region Browser to the bottom. Copy copies the chosen region(s) and requests a new name for each copy, while Rename requests a new name for each chosen region.

Close dismisses the Region Browser and Help opens this manual page in a browser window.

Alternatively, when the cursor focus is in the sequence window, the up arrow and down arrow keys can be used to raise and lower the active region, respectively; pressing the Delete key will delete the active region.

Besides manual creation, there are several ways to generate regions or change their contents:

by searching for occurrences of a particular string or pattern of residues
by reading a sequence coloring format (SCF) file or seqsel file (Tools... Load SCF/Seqsel File) generated by the program JEvTrace. The two formats are slightly different, but both contain information for coloring regions in the sequence and in any associated structures. The file reader automatically determines which format is being used. The formats are simple enough that users wishing to color residues alike in sequence and structure (for any reason) may wish to create a file manually and read it in by this route.
as a by-product of sequence-structure association
by selection within sequence-associated structures
by secondary structure (actual in structure-associated sequences, predicted in sequences not associated with structures)

TREES

A Newick-format tree corresponding to the alignment can be read and displayed to the left of the sequences with Tree... Load in the Multalign Viewer menu. A newly loaded tree will replace any previously shown tree. The tree can be hidden/shown with Tree... Show Tree.

The sequence names in the tree must be identical to those in the alignment; a tree whose names are only a subset of the alignment's names (or vice versa) cannot be used. If the order of sequences in the tree differs from that in the alignment, a dialog will appear, requesting permission to reorder the alignment sequences for clearer display (to avoid tree branch crossings). Reordering will delete all region information, although a region reflecting the current Chimera selection (if any) will subsequently (re)appear.

When the tree includes branch lengths, both solid and dotted lines are used for display, with horizontal solid line lengths proportional to branch lengths. When the tree does not include branch lengths, only solid lines are used.
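The name-matching requirement can be checked before loading a tree. This hypothetical Python sketch pulls leaf labels out of a simple Newick string with a regular expression and compares them with the alignment's sequence names. It handles trees with or without branch lengths, but it assumes unquoted labels and no bracketed comments, so it is not a full Newick parser.

```python
import re


def newick_leaf_names(newick):
    """Leaf labels in order: text after "(" or "," up to ":", ",", ")" or ";".

    Internal-node labels (which follow ")") are not captured.
    Quoted labels and [comments] are not handled by this sketch.
    """
    return re.findall(r"[(,]\s*([^\s(),:;\[\]']+)", newick)


def tree_matches_alignment(newick, seq_names):
    """True if the tree's leaf names and the alignment names are the same set."""
    return set(newick_leaf_names(newick)) == set(seq_names)
```

The leaf order returned by `newick_leaf_names` is also the order that would avoid branch crossings, which is what the reordering dialog offers to apply.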

Clicking a node in the tree makes it red and chooses it for subsequent operations. Tree... Extract Subalignment copies the sequences defined by the chosen node into a separate sequence window. The alignment of the sequences is kept exactly the same as in the full alignment, including any all-gap columns, so that the subalignment is initially the same length as the full alignment. If desired, all-gap columns can be deleted with the Edit menu.
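Extracting a subalignment and then removing its all-gap columns can be sketched in Python, representing an alignment as a dict from sequence name to gapped sequence string. The representation, the gap characters (`-` and `.`), and the function names are assumptions for illustration.

```python
def extract_subalignment(alignment, names):
    """Copy the chosen rows unchanged, so the length matches the full alignment."""
    return {n: alignment[n] for n in names}


def drop_all_gap_columns(alignment, gap_chars="-."):
    """Remove columns in which every sequence has a gap."""
    if not alignment:
        return {}
    length = len(next(iter(alignment.values())))
    keep = [i for i in range(length)
            if any(seq[i] not in gap_chars for seq in alignment.values())]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
```

Keeping the two steps separate mirrors the tool's behavior: the subalignment starts out column-for-column identical to the full alignment, and gap removal is an explicit, optional second step.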

EDITING AND SAVING THE ALIGNMENT

Alignment editing allows rearranging residues and gaps without changing the component sequences. Residues cannot be created, deleted, mutated, or reordered within a sequence, but gap positions can be created and deleted.

Edit... Show Editing Keys in the Multalign Viewer menu brings up a dialog summarizing Ctrl-key editing functions. The active region can be moved one column at a time with Ctrl-left arrow and Ctrl-right arrow, or as far as possible in one step with Shift-Ctrl-arrow (until the active region abuts either some other residues or the end of the alignment). If the active region already abuts the alignment start or end, new columns will be created as needed to accommodate the movement. Such movements can be undone step-by-step with Ctrl-down arrow or collectively with Ctrl-Esc. Only Ctrl-key editing can be undone, not the larger-scale operations described below (sequence addition, deletion, and reordering). Such larger-scale operations clear the editing history, but otherwise all Ctrl-key editing operations are retained. A previously undone movement can be redone with Ctrl-up arrow. Undo/redo repositions residues but not the active region box, since the active region could have been changed during editing.
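A one-column move to the right can be sketched for a single row of the alignment. The region's columns shift right by one, and the gap they move into reappears on the left; the move is blocked if a residue sits immediately to the right. This is a hypothetical Python illustration, not Chimera's editing code. It works on one row only (in a real alignment a newly created end column would be added to every row) and uses `-` as the gap character.

```python
def shift_right(seq, start, end, gap="-"):
    """Move columns start..end (inclusive) of one row one column to the right.

    Returns the edited row, or None if a residue blocks the move.
    """
    if end + 1 == len(seq):
        seq = seq + gap  # region abuts the end: create a new column
    if seq[end + 1] != gap:
        return None      # region abuts another residue
    return seq[:start] + gap + seq[start:end + 1] + seq[end + 2:]
```

Residues are only repositioned, never created or removed, which matches the rule that editing rearranges residues and gaps without changing the component sequences.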

Sequences can be added, reordered, and deleted and gap columns added and deleted using the Multalign Viewer Edit menu. Sequence reordering will delete all region information, although a region reflecting the current Chimera selection (if any) will subsequently (re)appear.

The Edit menu also allows copying a sequence as plain text so it can be pasted into some other application window.

Choosing File... Save As from the Multalign Viewer menu brings up a dialog for saving the sequence alignment to a file. It is possible to save just the active region to a file; the region may consist of disjoint sections, but each sequence (row) included must contain the same set of columns. All-gap columns (for example, arising when a region to be saved includes only a subset of the sequences) can be omitted from the output, and numbering can be appended to the sequence names. The sequence alignment formats available for saving are the same as those that can be read. An alignment saved in Stockholm format will automatically include annotations describing the secondary structure of any associated structures.

The sequence window contents can also be saved as an EPS file (File... Save EPS in the Multalign Viewer menu).

PERCENT IDENTITY

The percent identity between any pair of sequences in the alignment can be calculated. Tools... Percent Identity in the Multalign Viewer menu opens a dialog for specifying the sequences (by name from the pulldown menus labeled Compare and with) and what to divide by:

shorter sequence length (default)
longer sequence length
non-gap columns in common

The order in which the sequences are specified is not important.
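The three denominators can be made concrete with a short Python sketch. It counts identical residues in columns where both sequences have residues and divides by the chosen quantity. Case-insensitive comparison, `-` and `.` as gap characters, and returning 0 when the denominator is 0 are assumptions; `percent_identity` is a hypothetical name.

```python
def percent_identity(seq1, seq2, denominator="shorter", gap_chars="-."):
    """Percent identity of two aligned (equal-length) sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    # Columns where both sequences have a residue
    common = [(a, b) for a, b in zip(seq1, seq2)
              if a not in gap_chars and b not in gap_chars]
    identical = sum(1 for a, b in common if a.upper() == b.upper())
    len1 = sum(1 for c in seq1 if c not in gap_chars)
    len2 = sum(1 for c in seq2 if c not in gap_chars)
    if denominator == "shorter":
        denom = min(len1, len2)
    elif denominator == "longer":
        denom = max(len1, len2)
    elif denominator == "common":
        denom = len(common)  # non-gap columns in common
    else:
        raise ValueError(f"unknown denominator: {denominator!r}")
    return 100.0 * identical / denom if denom else 0.0
```

Every quantity used here is symmetric in the two sequences, which is why the order in which they are specified does not matter.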

Apply calculates the percent identity without dismissing the dialog, while OK calculates the percent identity and dismisses the dialog. The percent identity is reported in the status area near the bottom of the sequence window and the Chimera Reply Log. Close simply dismisses the dialog, and Help opens this manual page in a browser window.

ANNOTATIONS

Annotations and comments associated with an alignment as a whole (as opposed to individual columns or sequences) can be viewed and edited using Edit... Alignment Annotations in the Multalign Viewer menu. Changes are not assessed for correctness.

Alignment annotations are shown in the upper part of the dialog. Each has two parts, a name and a value (generally text). For example, MSF length is a possible name and 366 is a possible associated value. A new annotation can be added by clicking New, specifying a name, and then entering a value in the adjacent field. Values can have multiple lines even though only one is shown; pressing return in a value field starts a new line, but the previous information is retained. Clicking Delete and then choosing an annotation name removes the corresponding annotation.

Alignment comments, shown in the lower part of the dialog, may consist of multiple lines of free-form text.

Not all sequence alignment formats can accommodate annotations. Thus, saved files may not include this information or reflect any changes that have been made. Only the Stockholm format accommodates arbitrary annotations, whereas both Stockholm and RSF formats allow for comments.

Markups in Stockholm format are handled as follows:

GF (Generic per-File) markups are interpreted as alignment annotations or comments (GF CC indicates the latter).
GC (Generic per-Column) markups are treated as headers; they can be shown above the alignment and are available as residue attributes in any associated structures.
GS (Generic per-Sequence) markups are not displayed by Multalign Viewer.
GR (Generic per-Sequence and per-Column) markups are not displayed by Multalign Viewer, but when an alignment is saved in Stockholm format, the file will automatically include GR markup lines describing the secondary structure of any associated structures.
The GR markup line for a sequence and its associated structure starts with
```
#=GR seq-name Chimera_actual_SS_struct-name
```
The structure name is included because there can be more than one structure associated with a given sequence. Chimera-generated GR secondary structure markups may include the symbols:

symbol	meaning
.	gap in sequence
H	helix
E	strand
C	other structure
X	sequence residue not associated with structure residue

If a sequence did not have a GR SS markup in the original input file and is associated with only one structure, another markup line with the same contents but named SS instead of Chimera_actual_SS_struct-name will also be included.
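A GR secondary-structure line can be assembled from an aligned sequence and the per-residue assignments of its associated structure, using the symbols in the table. This Python sketch is illustrative only. It assumes that fields are separated by single spaces (Chimera's actual column padding is not described here), that the structure name is a plain string, and that assignments are given as a list with one entry per residue (`H`, `E`, `C`, or `None` for a residue with no associated structure residue). The function name and example names are hypothetical.

```python
def gr_ss_line(seq_name, struct_name, aligned_seq, ss_codes, gap_chars="-."):
    """Build a "#=GR ... Chimera_actual_SS_..." line for one sequence/structure.

    ss_codes: one entry per residue of the ungapped sequence.
    """
    residues = iter(ss_codes)
    symbols = []
    for c in aligned_seq:
        if c in gap_chars:
            symbols.append(".")  # gap in sequence
        else:
            code = next(residues)
            symbols.append("X" if code is None else code)
    return f"#=GR {seq_name} Chimera_actual_SS_{struct_name} {''.join(symbols)}"
```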

SEQUENCE-STRUCTURE ASSOCIATION

A sequence in the alignment will be associated automatically with a structure chain if their sequences can be aligned without too many mismatches. The allowable number of mismatches is user-specified as a proportion of the total number of residues in the structure chain (1/10 by default; see the Structure preferences). For automatic association, gaps in the structure sequence relative to the sequence in the alignment file can only occur where residues are missing from the structure (for example, a flexible loop with insufficient density for coordinates to be determined). The order in which the sequence and structure files are opened does not matter. Associations are reported in the status area near the bottom of the sequence window and the Chimera Reply Log. A structure (even if it has multiple chains) cannot be associated with more than one sequence, but a single sequence can be associated with more than one structure. If more than one sequence matches a given structure chain, the single best-matching sequence is associated.
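The mismatch threshold and best-match rule can be illustrated with a simplified Python sketch. It slides the structure chain's sequence along each ungapped alignment sequence, keeps the placement with the fewest mismatches, accepts it only if the mismatches do not exceed the allowed fraction of the chain length, and returns the best-matching sequence name. It is a deliberate simplification, not Chimera's algorithm: it assumes the chain has no missing internal residues, so no gaps are allowed in the structure sequence at all. The function names are hypothetical.

```python
def best_placement(chain_seq, seq):
    """(offset, mismatches) for the best ungapped placement, or None if too short."""
    n = len(chain_seq)
    best = None
    for off in range(len(seq) - n + 1):
        mism = sum(1 for a, b in zip(chain_seq, seq[off:off + n]) if a != b)
        if best is None or mism < best[1]:
            best = (off, mism)
    return best


def associate(chain_seq, alignment, max_fraction=0.1, gap_chars="-."):
    """Name of the alignment sequence to associate with a chain, or None."""
    chain_seq = chain_seq.upper()
    limit = max_fraction * len(chain_seq)  # 1/10 of the chain length by default
    best_name, best_mism = None, None
    for name, aligned in alignment.items():
        ungapped = "".join(c for c in aligned if c not in gap_chars).upper()
        placement = best_placement(chain_seq, ungapped)
        if placement is None:
            continue
        mism = placement[1]
        if mism <= limit and (best_mism is None or mism < best_mism):
            best_name, best_mism = name, mism
    return best_name
```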

Associations can be changed or added if the automated procedure does not give the desired result.

The names of structure-associated sequences are shown in bold over a box indicating the model-level color of the structure. For example, APE_HUMAN in the sequence window figure is associated with a structure whose model-level color is white. If multiple models are associated with the same sequence, the box is shown as a dark green dashed outline. When the cursor is placed over the name of a sequence, information on the sequence and any associated structure(s) is shown in the status area near the bottom of the sequence window. When the cursor is placed over a sequence residue that is associated with a residue in one or more structures, information on the structure residue(s) is given. Association information can be saved to a file.

Sequence-structure association may create new regions. If all residues of a sequence are matched, no region is created. Otherwise, regions are created containing the matched segment(s), any mismatches, and any gaps in the structure relative to the sequence. The region names report the associated structure chain and the number of gaps or mismatches. The initial colors and display status (shown or hidden) of these regions can be controlled in the Regions preferences.

Association enables several types of sequence-structure crosstalk:

Clicking/dragging on sequence → structure selection.
- When a region is made the active region (for example, when newly created by dragging), any associated structure residues will be selected.
- Clicking within a header line will select any structure residues associated with that column of the alignment.
Structure selection → sequence region. By default, making a selection in one or more sequence-associated structures will create a region named Chimera selection in the associated sequences. Every residue with at least one atom or bond selected will be included in the region. The color of the region and whether it should be created can be controlled in the Regions preferences. The region will not be created when the selection merely reflects the current active region.
Focusing the view.
- Clicking with the right mouse button on a structure-associated residue in the sequence window will focus the graphics window view on the corresponding structure residue(s).
- Shift-right-clicking within a region (or right-clicking within the region but not on a residue) will focus the view on all of the structure residues associated with the region.

When the current selection includes one or more sequence-associated residues, Structure... Expand Selection to Columns will expand the selection to include all residues associated with the same column(s) of the sequence alignment. This can be reversed with Select... Undo in the main Chimera menu.

Changing Associations

Associations can be changed or added using the Structure/Sequence Associations panel (Structure... Associations in the Multalign Viewer menu). The panel lists the molecule models open in Chimera. An association is specified by choosing a chain (if the model has more than one chain) and the name of the sequence to associate with that chain. The choices for association include the sequences in the alignment displayed by Multalign Viewer and none. When the setting is none, there is an option to associate with best match, i.e. to compare the structure chain with all of the sequences in the alignment and associate it with the one that yields the fewest mismatches.

Associations will be made as specified, no matter how inappropriate. Apply performs the associations without dismissing the panel, while OK performs the associations and dismisses the panel. Close dismisses the panel without changing the associations. Help opens this manual page in a browser window.

Automatic Structure Loading

Structure... Load Structures in the Multalign Viewer menu can be used to open structure files corresponding to sequences that are not already structure-associated. The corresponding structure files are determined from the sequence names using the rules given in the Structure preferences, then retrieved and opened (as described for Fetch by ID). If Automatically load Structures is turned on (also in the Structure preferences), this will occur as soon as an alignment is read.

Regions Defined by Secondary Structure

In sequences associated with structures, Structure... Secondary Structure... show actual in the Multalign Viewer menu creates regions named structure helices (pale yellow with gold outline) and structure strands (pale green with darker outline). Protein helix and strand assignments are taken from the input structure file or generated with ksdssp. (Regardless of whether the regions exist, however, an alignment saved in Stockholm format will automatically include annotations describing the secondary structure of any associated structures.)

Regions showing secondary structure assignments (if any) are automatically included when the individual sequence of a structure is displayed with the Sequence tool.

In sequences not associated with any structures, Structure... Secondary Structure... show predicted in the Multalign Viewer menu creates regions named predicted helices (gold outline) and predicted strands (green outline). The prediction is done with GOR.

RESIDUE ATTRIBUTES

Structure residues associated with residues in a sequence alignment shown by Multalign Viewer are assigned attributes:

mavPercentConserved - the value for a residue is the percent conservation of the most prevalent residue type at the corresponding position in the alignment.
mavHeader, where Header is the name of a header line - the value for a residue is the character or number (shown with bar height in a histogram) in the header line at the corresponding position of the alignment. If Conservation is shown as a histogram, for example, there will be a mavConservation attribute with numerical values; if it is shown with characters, the attribute values will be characters.
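The mavPercentConserved value for one column can be sketched as the count of the most common residue type divided by a denominator. The manual does not say whether gaps count toward that denominator, so this hypothetical Python helper exposes it as an assumed `count_gaps` option (counting all sequences by default). Comparison is assumed to be case-insensitive.

```python
from collections import Counter


def percent_conserved(column, gap_chars="-.", count_gaps=True):
    """Percent of the most prevalent residue type in one alignment column."""
    residues = [c.upper() for c in column if c not in gap_chars]
    if not residues:
        return 0.0  # all-gap column (assumed value)
    top_count = Counter(residues).most_common(1)[0][1]
    denom = len(column) if count_gaps else len(residues)
    return 100.0 * top_count / denom
```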

Superposition assessment generates an additional residue attribute. Structure residues not associated with residues in the sequence alignment are not assigned attribute values.

Structure... Select (or Render) by Conservation in the Multalign Viewer menu opens the corresponding portion of Render/Select by Attribute, set to the residue attribute mavConservation if numerical, otherwise mavPercentConserved.

Numerical attributes will appear in the attribute lists of Render/Select by Attribute; character attributes will be listed only in the Select by Attribute portion. It may be necessary to Refresh the attribute menus or values in these tools to update them with any changes in Multalign Viewer header information.

STRUCTURAL SUPERPOSITION

Structure... Match in the Multalign Viewer menu allows structures to be superimposed based on the alignment of their associated sequences. The dialog contains two subpanels, each listing the structures associated with sequences in the alignment. One reference structure should be chosen in the left side, but any number of structures to be matched (superimposed) with it can be chosen in the right side. Residues in each Structure to match are paired with the aligned (in the sequence alignment) residues of the Reference structure. If both structures are associated with the same sequence, the correspondence is even more obvious. Fitting uses one point per residue: CA in amino acid residues and C4' in nucleic acid residues. If a nucleic acid residue lacks a C4' atom (some lower-resolution structures are P traces), its P atom will be paired with the P atom of the aligned residue. The number of atom pairs fitted and the resulting RMSD are reported in the Chimera Reply Log and the status area near the bottom of the sequence window.
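The pairing step, and the choice of fitting atom, can be sketched in Python. Residues are paired column by column wherever both aligned sequences have a residue and both of those residues are associated with structure residues. For nucleic acids, C4' is used when both residues have it; otherwise the P atoms are paired. This sketch assumes that each sequence's associations are given as a list with one entry per ungapped position (a residue identifier or `None`) and that atom names are given as sets; the function names are hypothetical, and the least-squares fit itself is not shown.

```python
def aligned_pairs(ref_aligned, match_aligned, ref_assoc, match_assoc, gap_chars="-."):
    """(reference residue, match residue) pairs from aligned columns."""
    pairs = []
    ri = mi = 0  # positions in the ungapped sequences
    for rc, mc in zip(ref_aligned, match_aligned):
        r_res = m_res = None
        if rc not in gap_chars:
            r_res = ref_assoc[ri]
            ri += 1
        if mc not in gap_chars:
            m_res = match_assoc[mi]
            mi += 1
        if r_res is not None and m_res is not None:
            pairs.append((r_res, m_res))
    return pairs


def fit_atom_pair(ref_atoms, match_atoms, nucleic):
    """Name of the atom to pair for one residue pair, or None if unavailable."""
    if not nucleic:
        name = "CA"
    elif "C4'" in ref_atoms and "C4'" in match_atoms:
        name = "C4'"
    else:
        name = "P"  # e.g. P-trace structures lacking C4'
    return name if name in ref_atoms and name in match_atoms else None
```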

Match highly conserved residues only restricts the least-squares fit to the well-conserved (at least 80%) positions in the alignment. These are the positions shown as capital letters in the consensus sequence.

Match active region only restricts matching to the positions in the current active region of the alignment.

Use pseudobonds to show matched atoms indicates that lines (pseudobonds) should be drawn between the matched atoms. For each matched pair of structures, a pseudobond group is created and colored uniquely (in order, the named colors dark green, dodger blue, sienna, yellow, spring green, purple, gray, and coral are used). Each group is named matches of..., where the rest of the name indicates the structures and chains that were matched. The PseudoBond Panel can be used to change the appearance of or delete the pseudobonds.

Iterate by pruning long atom pairs until no pair exceeds [x] angstroms refers to an iterative fitting procedure: in each cycle, atom pairs are removed from the match list and the remaining pairs are fitted, until no matched pair is more than x angstroms apart (x=2.0 by default). The atom pairs removed are either the 10% farthest apart of all pairs or the 50% farthest apart of all pairs exceeding the cutoff, whichever is the lesser number of pairs. The result is that the best-matching "core" regions are maximally superimposed; conformationally dissimilar regions such as flexible loops are not included in the final fit, even though they may be aligned in the sequence alignment.
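One pruning cycle can be sketched in Python: given the distances of the currently fitted pairs, it removes the smaller of two sets, the farthest 10% of all pairs or the farthest 50% of the pairs exceeding the cutoff. How Chimera rounds these percentages is not stated, so this sketch assumes truncation with a minimum of one pair (otherwise a single long pair could never be removed). The refit between cycles is not shown, and `prune_step` is a hypothetical name.

```python
def prune_step(distances, cutoff=2.0):
    """Indices of atom pairs to keep after one pruning cycle.

    Returns all indices unchanged when no pair exceeds the cutoff (converged).
    """
    over = [i for i, d in enumerate(distances) if d > cutoff]
    if not over:
        return list(range(len(distances)))
    n_ten = max(1, int(0.10 * len(distances)))  # 10% farthest of all pairs
    n_half = max(1, int(0.50 * len(over)))      # 50% farthest of those over cutoff
    n_remove = min(n_ten, n_half)               # whichever is fewer
    by_distance = sorted(range(len(distances)), key=lambda i: distances[i], reverse=True)
    drop = set(by_distance[:n_remove])
    return [i for i in range(len(distances)) if i not in drop]
```

In the full procedure, the kept pairs would be refitted, distances recomputed, and `prune_step` called again until it returns every remaining pair.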

Create region showing matched residues indicates that the residues paired to generate the final fit should be shown as a region named matched residues.

Apply performs the matching (superposition) without dismissing the dialog. OK performs the matching and dismisses the dialog, Cancel dismisses the dialog without performing a match, and Help opens this manual page in a browser window.

SUPERPOSITION ASSESSMENT

The spatial variation among multiple superimposed structures can be shown and analyzed using the RMSD header and corresponding residue attribute, mavRMSD. All of the structure residues associated with a column in the alignment are assigned the same mavRMSD value.

Pairwise assessments of spatial variation can be performed by choosing Structure... Assess Match from the Multalign Viewer menu. Each pairwise comparison creates an attribute of the residues of one structure containing the distances from the sequence-aligned residues of a reference structure.

A Reference structure should be chosen from the pulldown menu of structures associated with sequences in the alignment. One or more Structures to evaluate should be chosen from the list of remaining structures. The structures should already be superimposed, but the superposition can have been generated in any way, including manually (not necessarily using Structure... Match).

One point per residue is used for the comparison. The distance between each reference-evaluation pair of residues aligned in the sequence alignment (or corresponding to the same residue, if the reference and evaluation structures are associated with the same sequence) is measured. The distances are assigned as a residue attribute of the evaluation structure(s), named matchDist by default. After attribute values have been assigned, Select by Attribute will appear, set to the new residue attribute.
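A rough sketch of how matchDist-style values could be derived from one reference/evaluation pair of aligned rows. It assumes one (x, y, z) point per residue (for example, the alpha carbon), with `None` for residues present in the sequence but missing from the structure. The function name and data layout are hypothetical.

```python
import math


def match_distances(ref_row, eval_row, ref_coords, eval_coords, gap_chars="-."):
    """Map each evaluation residue index to its distance from the aligned reference residue.

    ref_row and eval_row are gapped rows of the same alignment; *_coords hold one
    point per ungapped residue, in sequence order (None if absent from the structure).
    """
    result = {}
    ri = ei = 0  # running residue indices within each sequence
    for r, e in zip(ref_row, eval_row):
        r_is_res = r not in gap_chars
        e_is_res = e not in gap_chars
        if r_is_res and e_is_res:
            p, q = ref_coords[ri], eval_coords[ei]
            if p is not None and q is not None:  # skip residues missing from a structure
                result[ei] = math.dist(p, q)
        if r_is_res:
            ri += 1
        if e_is_res:
            ei += 1
    return result
```

When the reference and evaluation structures are associated with the same sequence, the same row can simply be passed twice.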

OK performs the comparison and dismisses the dialog, while Close just dismisses the dialog. Help opens this manual page in a browser window.

PREFERENCES

The Preferences section of the Multalign Viewer menu controls preferences specific to Multalign Viewer. Settings are saved to the Chimera preferences file as soon as they are changed.

Multalign Viewer preferences are grouped together in sections shown as index cards:

Appearance - sequence display parameters
Structure - structure-loading rules
Headers - creation and display of sequence alignment headers
Regions - creation and display of regions (colored boxes)

Close dismisses the Multalign Viewer preferences tool, and Help opens this manual page in a browser window.

Note that a typed-in value will not be applied until Apply, OK (which also dismisses the preferences tool), or the Enter (return) key has been pressed. Changes in other types of settings take effect and are saved immediately.

The Appearance section of the Multalign Viewer preferences controls sequence text arrangement and coloring. Most of the preferences in this section can be set separately for Multiple alignments and Single sequences.

Line wrapping - whether sequences should be shown as a single horizontal bar or wrapped to form successive blocks
- Wrap to new line every [n]0 residues (default for single sequences) - wrap every n tens of positions (default n=5, or 50 positions), regardless of the number of sequences
- Wrap if [k] or fewer sequences (default for multiple alignments, not relevant to single sequences) - wrap at the interval specified above if the alignment contains no more than k sequences (default 8)
- Never wrap
- Space between wrapped blocks - whether to pad blocks vertically (default true for alignments, false for single sequences)
Font
- Use [m] point (Courier/Helvetica/Times), optionally Bold (m=12 by default)
Spacing
- Column separation (pixels) (default 0 for alignments, –2 for single sequences)
- Line separation (pixels) (default 1 for alignments, 8 for single sequences)
- Space after every 10 residues (default true for alignments, false for single sequences)
Residue letter coloring
- Color scheme
  - black (default for single sequences) - color all letters black
  - Clustal X (default for alignments) - use Clustal X coloring, which depends on both residue type and the pattern of conservation within a column
  - ribbon - color one-letter codes to match the associated structure residues, the average color where multiple structures are associated, and black where no structures are associated; ribbons will match the lettering, but need not be displayed for this option to work. Possible uses include making the letters match a structure that has been colored by attribute, by domain, or over a rainbow range of hues.
  - Kyte-Doolittle hydrophobicity - color one-letter codes by Kyte-Doolittle hydrophobicity, ranging from red for the most hydrophobic amino acids to blue for the most hydrophilic, with black in the middle
  - custom coloring schemes can be listed here by describing them in the Clustal X parameter file format and reading them in with Tools... Load Color Scheme
Sequence names
- Use ellipsis for names longer than [j] (j=30 characters by default)
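For the Kyte-Doolittle scheme, a coloring function might look like the following. The hydropathy values are the published Kyte-Doolittle scale. The linear red-black-blue ramp centered at 0 and the black color for gaps and unknown codes are assumptions, since the exact interpolation Chimera uses is not described here.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def kd_color(code):
    """(r, g, b) in 0..1 for a one-letter code: red = most hydrophobic,
    black = 0, blue = most hydrophilic (a linear ramp; an assumption)."""
    value = KYTE_DOOLITTLE.get(code.upper())
    if value is None:
        return (0.0, 0.0, 0.0)  # assumption: gaps and unknown codes stay black
    t = value / 4.5  # the scale runs from -4.5 to +4.5, so t lies in -1..1
    if t >= 0:
        return (t, 0.0, 0.0)
    return (0.0, 0.0, -t)
```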

The Structure section of the Multalign Viewer preferences sets rules for automatic sequence-structure association and structure loading.

Auto-association - criteria for automatic sequence-structure association
- Allow one mismatch per [n] structure residues (n=10 by default)
Structure loading - whether/how sequence names relate to the names of the corresponding structure files; whether to retrieve and open such files (as described for Fetch by ID) as soon as a sequence alignment is read. No more than one structure per sequence will be loaded. If a sequence name corresponds to both a SCOP file and a PDB file, the SCOP file will be loaded (unless the Load SCOP... option is turned off).
- Load SCOP file for each unassociated sequence whose name appears to be a SCOP ID (sid) (on by default)
- Load PDB file for each unassociated sequence whose name...
  - is <pdb code>
    (e.g., "1gcn")
  - starts with <pdb code>
    (e.g., "1k6w_A")
  - contains "pdb|" + <pdb code>
    (NCBI style; e.g., "gi|1421399|pdb|1FTZ")
  - starts with [string] + <pdb code>
    where string is user-specified
  These correspondence rules are not mutually exclusive; all but the last are on by default.
- Automatically load structures (off by default) - whether to retrieve and open structures as soon as a sequence alignment is read; if not, the procedure can still be invoked with Structure... Load Structures in the Multalign Viewer menu
- Do not autoload if more than [N] structures would load (on and N=10 by default)
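The sequence-name rules can be approximated with regular expressions. This sketch assumes a PDB code is a digit 1-9 followed by three letters or digits, checks the rules in the order listed, and returns the first match (so at most one structure per sequence). SCOP IDs are not handled. The function and its flags are hypothetical and only mirror the preference checkboxes.

```python
import re

# Assumption: a classic 4-character PDB ID (a digit 1-9, then three letters/digits).
PDB_CODE = r"[1-9][0-9A-Za-z]{3}"


def pdb_code_from_name(name, is_code=True, starts_with=True, ncbi=True, prefix=None):
    """Return the lower-case PDB code implied by a sequence name, or None.

    The flags mirror the four correspondence rules; prefix is the
    user-specified [string] (None means that rule is off, as by default).
    """
    patterns = []
    if is_code:
        patterns.append(r"^(%s)$" % PDB_CODE)      # e.g. "1gcn"
    if starts_with:
        patterns.append(r"^(%s)" % PDB_CODE)       # e.g. "1k6w_A"
    if ncbi:
        patterns.append(r"pdb\|(%s)" % PDB_CODE)   # e.g. "gi|1421399|pdb|1FTZ"
    if prefix:
        patterns.append(r"^%s(%s)" % (re.escape(prefix), PDB_CODE))
    for pattern in patterns:
        match = re.search(pattern, name)
        if match:
            return match.group(1).lower()
    return None
```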

The Headers section of the Multalign Viewer preferences controls how sequence alignment headers are calculated and which are shown initially (for alignments, not single sequences).

Consensus style (majority in column including gaps/majority in column ignoring gaps) - what to show in the header line named Consensus. The consensus sequence shows the residue type most prevalent (not necessarily >50%) at each position in the alignment; highly conserved residues (80% or greater) are capitalized and shown in purple, except that the completely conserved residues are shown in red. If several residue types are equally the most prevalent, one is chosen at random to appear in the consensus sequence. When gaps are included, however, the consensus will show a gap at positions where a gap is the most prevalent or equally the most prevalent as one or more residue types.
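The consensus rules for a single column can be sketched as follows. The function is hypothetical, and so is treating both `-` and `.` as gap characters. The page does not say whether the 80% and 100% thresholds are measured against all sequences or only nongapped ones; this sketch assumes all sequences.

```python
import random
from collections import Counter

GAP_CHARS = "-."


def consensus_char(column, include_gaps=True, rng=random):
    """Consensus character and highlight color for one alignment column.

    column holds one character per sequence. Returns (char, color), where color
    is "red" (completely conserved), "purple" (80% or more), or None.
    """
    counts = Counter(c.upper() for c in column)
    gap_count = sum(n for c, n in counts.items() if c in GAP_CHARS)
    residue_counts = {c: n for c, n in counts.items() if c not in GAP_CHARS}
    if not residue_counts:
        return ("-", None)  # all-gap column
    best = max(residue_counts.values())
    if include_gaps and gap_count >= best:
        return ("-", None)  # a gap is the most prevalent, or tied for most prevalent
    # Ties among residue types are broken at random.
    winner = rng.choice(sorted(c for c, n in residue_counts.items() if n == best))
    # Assumption: conservation is measured against all sequences in the column.
    fraction = best / len(column)
    if fraction == 1.0:
        return (winner, "red")
    if fraction >= 0.8:
        return (winner, "purple")
    return (winner.lower(), None)
```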
Conservation style (AL2CO/Clustal histogram/Clustal characters/identity histogram) - what to show in the header line named Conservation and assign as the mavConservation attribute of residues in associated structures. The values can be saved to a file.
AL2CO refers to the program described in:
AL2CO: calculation of positional conservation in a protein sequence alignment. Pei J, Grishin NV. Bioinformatics. 2001 Aug;17(8):700-12.
Several AL2CO parameters can be adjusted:
- Frequency estimation method (unweighted/modified Henikoff & Henikoff/independent counts) - whether/how to weight the sequences
- Conservation measure (entropy-based/variance-based/sum of pairs) - which of the formulas available in AL2CO to use for conservation
- Averaging window (1 by default) - window width, or how many alignment positions to average in a sliding window. The result is assigned to the central column of the window when the width is odd, the column left of center when the width is even (e.g., the fourth position in a window of 8), with various adjustments at the termini. The authors of AL2CO recommend a width of 3 for motif analyses.
- Gap fraction (0.5 by default) - no value is calculated for columns in which gap fraction or more of the sequences have gaps
- Sum-of-pairs matrix [choices: BLOSUM-30, BLOSUM-35, BLOSUM-40, BLOSUM-45, BLOSUM-50, BLOSUM-55, BLOSUM-60, BLOSUM-62 (default), BLOSUM-65, BLOSUM-70, BLOSUM-75, BLOSUM-80, BLOSUM-85, BLOSUM-90, BLOSUM-100, BLOSUM-N, identity, PAM-40, PAM-120, PAM-150, PAM-250] - which amino acid similarity matrix to use with the AL2CO sum of pairs Conservation measure
- Matrix transformation (none/normalization/adjustment) - whether/how to modify the matrix used with the AL2CO sum of pairs Conservation measure
  normalization: $S'_{ab} = S_{ab} / \sqrt{S_{aa} S_{bb}}$
  adjustment: $S''_{ab} = 2S_{ab} - \tfrac{1}{2}(S_{aa} + S_{bb})$
The AL2CO paper should be consulted for further details and cited if the results are used in a publication. The conservation values from AL2CO are in standard deviations from the mean (sometimes called Z-scores) and can range from –∞ (least conserved) to +∞ (most conserved).
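Two of the AL2CO settings above, the averaging window and the matrix transformation, are easy to state in code. This sketch assumes a plain list of per-column values with no missing entries and computes full windows only, leaving the termini as `None` (AL2CO's own terminal adjustments are not reproduced). The matrix is a hypothetical dict keyed by residue pairs.

```python
import math


def window_average(values, width=1):
    """Sliding-window average, one result per column (None where no full window fits).

    The result goes to the central column for odd widths and to the column just
    left of center for even widths (the 4th position of a width-8 window).
    """
    out = [None] * len(values)
    offset = (width - 1) // 2
    for start in range(len(values) - width + 1):
        out[start + offset] = sum(values[start:start + width]) / width
    return out


def transform_matrix(S, mode="none"):
    """Apply the AL2CO matrix transformation to a symmetric score dict.

    S maps (a, b) residue pairs to scores and must include every (a, a) entry.
    """
    if mode == "none":
        return dict(S)
    out = {}
    for (a, b), s in S.items():
        s_aa, s_bb = S[(a, a)], S[(b, b)]
        if mode == "normalization":
            out[(a, b)] = s / math.sqrt(s_aa * s_bb)
        elif mode == "adjustment":
            out[(a, b)] = 2 * s - 0.5 * (s_aa + s_bb)
        else:
            raise ValueError("unknown mode: %s" % mode)
    return out
```

Note that normalization sets every diagonal score to 1, while adjustment leaves diagonal scores unchanged ($2S_{aa} - S_{aa} = S_{aa}$) and lowers off-diagonal scores.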
In a Clustal histogram, full bar height indicates complete identity, 2/3 bar height indicates Clustal strong group conservation, and 1/3 bar height indicates Clustal weak group conservation.
Clustal characters are the characters used in Clustal format to indicate conservation: "*" for complete identity, ":" for strong group conservation, and "." for weak group conservation.
An identity histogram shows what proportion of the sequences have the most prevalent nongap character per alignment column. Histogram bar heights are proportional to (M-1)/(N-1), where M is the number of occurrences of the most prevalent residue type at a position in the alignment and N is the number of sequences in the alignment. Thus, if every sequence has a different residue at a given position, the bar height is zero, not 1/N. However, M/N values (not the values used for display purposes) are used for the mavConservation attribute and written out when conservation values are saved.
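The Clustal and identity-histogram values for a single column can be sketched as follows. The strong and weak groups are those listed in the Clustal W documentation. Treating any gap in the column as "no conservation" follows Clustal's convention but is an assumption about Chimera's exact behavior, as are the gap characters `-` and `.`.

```python
from collections import Counter

GAP_CHARS = "-."
# Strong and weak conservation groups as listed in the Clustal W documentation.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def clustal_conservation(column):
    """(Clustal character, histogram bar height) for one column."""
    residues = {c.upper() for c in column}
    if not residues or residues & set(GAP_CHARS):
        return (" ", 0.0)  # assumption: any gap means no conservation mark
    if len(residues) == 1:
        return ("*", 1.0)  # complete identity
    if any(residues <= set(g) for g in STRONG_GROUPS):
        return (":", 2 / 3)  # strong group conservation
    if any(residues <= set(g) for g in WEAK_GROUPS):
        return (".", 1 / 3)  # weak group conservation
    return (" ", 0.0)


def identity_values(column):
    """(histogram bar height, mavConservation value) for one column.

    M = count of the most prevalent nongap character, N = number of sequences.
    Bar height is (M-1)/(N-1); the attribute value is M/N.
    """
    n = len(column)
    counts = Counter(c.upper() for c in column if c not in GAP_CHARS)
    m = max(counts.values()) if counts else 0
    if n > 1:
        bar = max(0.0, (m - 1) / (n - 1))  # clamp: an all-gap column would give -1/(N-1)
    else:
        bar = float(m)  # assumption for a one-sequence "alignment"
    return bar, m / n
```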
Show numbering initially (true/false) - whether to show alignment numbering when an alignment is opened (does not apply to single sequences)
Show consensus initially (true/false) - whether to show the Consensus header when an alignment is opened (does not apply to single sequences)
Show conservation initially (true/false) - whether to show the Conservation header when an alignment is opened (does not apply to single sequences)

The Regions section of the Multalign Viewer preferences controls whether a selection will be shown as a region and sets initial region colors.

Show Chimera selection as region (true/false) - whether to show a selection within structures as a region named Chimera selection within associated sequences (if any); a residue will be included in the region if any of its atoms are selected
Chimera selection region border color (a color well, No color by default) - initial border color for the region showing the Chimera selection
Chimera selection region interior color (a color well, green by default) - initial interior color for the region showing the Chimera selection
New region border color (a color well, No color by default) - initial border color for manually created regions
New region interior color (a color well, white by default) - initial interior color for manually created regions
Use ellipsis for region names longer than [j] (j=25 characters by default)

Structure association - whether and how to show regions created during association
- Draw region depicting successful matches (on/off; color wells for the border and interior, default black and No color, respectively)
- Draw region depicting mismatches (on/off; color wells for the border and interior, default No color and red, respectively)
- Draw region depicting structure gaps (on/off; color wells for the border and interior, default red and No color, respectively)

MENU LISTING

The Multalign Viewer menu includes the following sections:

File
Edit
Structure
Headers
Numberings
Tree
Tools
Preferences

File

Save As... bring up a dialog for saving the alignment to a file. The sequence alignment formats available for saving are the same as those that can be read. Options:
- Restrict save to active region (off by default)
- Omit all-gap columns (default on)
- Append sequence numberings to sequence names (default on if sequence numbering is displayed, otherwise off)
Save EPS... bring up a dialog for saving an Encapsulated PostScript file of the sequence window contents. Options:
- color mode (color/grayscale/black & white)
- rotate 90 (true/false)
- extent (visible region/entire alignment)
- hide tree control nodes (true/false) - whether to omit the small squares from a tree display
If the alignment is wrapped (or would wrap if it were longer), one EPS file will be saved with the specified name. Otherwise, the sequence names and alignment will be saved in two separate EPS files with "-names" and "-alignment" appended to the specified name (before the ".eps" suffix, if present).
Save Association Info... save a text file that states which structure is associated with which sequence and lists the correspondences between alignment positions and structure residues. The Residue naming style indicates how residues will be specified in the file:
- simple (for example, PHE 16)
- command-line specifier (for example, :16)
Hide - close the sequence window without changing the state of Multalign Viewer (the window can be reopened using the Raise option for the Multalign Viewer instance in the Chimera Tools menu, abbreviated MAV)
Quit - exit from Multalign Viewer

Edit

Copy Sequence... copy a sequence as text that can be pasted into another application window. Options:
- Remove gaps from copy (default on) - exclude gap characters (typically present when the sequence is part of a multiple alignment)
- Restrict copy to active region (default off) - only include residues that are within the active region, even if they form disjoint segments
Reorder Sequences - open a dialog for reordering sequences. The dialog lists the names of the sequences in their current order. The order can be changed by clicking on a name and moving it to the top, bottom, one higher, or one lower. Any order can be achieved through combinations of these operations. OK and Apply perform the reordering in the sequence window with and without dismissing the dialog, respectively. Sequence reordering will delete all existing regions, although a region reflecting the current Chimera selection (if any) will subsequently (re)appear.
Insert All-Gap Columns - open a dialog for inserting all-gap columns into the alignment (presumably to facilitate subsequent editing that moves residues into the new space). The dialog allows specification of how many gap columns to add, where to insert them, and the character to use for gap positions. OK and Apply insert the gap columns with and without dismissing the dialog, respectively.
Delete Sequences/Gaps... open a dialog for deleting sequences and/or all-gap columns. One or more sequences can be slated for deletion by choosing their names from the list. Ctrl-click toggles the status of a sequence. To choose a block of sequences without dragging, click on the first (or last) and then Shift-click on the last (or first) in the desired block. Delete all-gap columns indicates that columns containing only gaps (evaluated after sequence deletion, if any) should be removed. Clicking Apply (or OK, which also closes the dialog) performs the indicated deletions.
Add Sequence - open a dialog for adding a new sequence to the alignment. A Sequence name should be specified. The new sequence will be aligned to the others using the Needleman-Wunsch algorithm (unless plain text is simply appended). The existing alignment will not be changed, except that gap positions may be inserted between columns of the existing alignment. Clicking Apply (or OK, which also closes the dialog) adds the sequence to the alignment.

Sequence input options:
- Plain Text - add a sequence that has been entered or pasted in the Sequence field as plain text. It will be converted to upper case and any spaces removed automatically. The sequence should not contain gap characters unless, counting gaps, it is exactly the same length as the existing alignment and the option to Simply append... is on.
- From Structure - add the sequence of a structure chain open in Chimera
Alignment parameters:
- Matrix (default BLOSUM-62) - substitution matrix for scoring the alignment. The score for aligning a residue in the new sequence with a column in the existing alignment is the sum of the matrix values for the residue versus each of the residues in the column (and zero versus a gap position) divided by the number of sequences (rows) in the existing alignment. Summed over the entire alignment, these contributions form the residue similarity score. When the secondary structure option is used, there is also a secondary structure score.
- Opening penalty (default 12) - penalty for opening a gap in the existing alignment, regardless of whether it is already flanked by gaps in any of the sequences. For opening a gap in the new sequence, this penalty is scaled by the fraction of sequences in the existing alignment not gapped at that position. When secondary structure scoring is used, secondary-structure-specific gap opening penalties apply instead.
- Extension penalty (default 1) - penalty for extending a gap between columns of the existing alignment. For extending a gap in the new sequence, this penalty is scaled by the fraction of sequences in the existing alignment not gapped at that position.
- Character (default .) - character for displaying gap positions created during alignment of the new sequence
- Include secondary structure score (N%) - include a secondary structure term in the alignment score and use secondary-structure-specific gap opening penalties. This option applies only when adding a sequence from a structure while at least one other sequence in the alignment is already associated with a structure. Detailed settings can be shown by clicking Show parameters and hidden again by clicking Hide parameters. N sets the relative weights of the two terms (N% secondary structure, 100-N% residue similarity) and can be adjusted by moving the slider. If N is 30, for example,
  total score = 0.70(residue similarity score) + 0.30(secondary structure score) – gap penalties
  The values in the secondary structure Scoring matrix (for all pairwise combinations of H helix, S strand, and O other) and the secondary-structure-specific Gap opening penalties can be adjusted. The secondary structure score is computed in the same way as described for the residue similarity score, except that sequences without associated structures contribute neither to the score nor to the sequence count used for normalization. When secondary structure scoring is used (even if the slider is set to zero),
  - the secondary-structure-specific Gap opening penalties are used instead of the single Opening penalty
  - the Extension penalty remains the same
  - the total penalty is computed as described above, except that sequences without associated structures contribute neither to the penalty nor to the sequence count used for normalization
  Contributions from multiple structures associated with the same sequence in the alignment are averaged so that the sequence only counts "once." The secondary structure scoring parameters can be reset to their default values.
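The Add Sequence scoring rules can be written out directly. This sketch covers only the scoring terms: the residue similarity contribution for one column, the scaled penalty for a gap in the new sequence, and the weighted total. The Needleman-Wunsch dynamic programming that uses them is omitted. The matrix is a hypothetical dict holding both orderings of each residue pair, and `-` and `.` are assumed to be the gap characters.

```python
GAP_CHARS = "-."


def column_score(residue, column, matrix):
    """Similarity of one new-sequence residue to one existing alignment column.

    Sum of matrix scores versus each residue in the column (zero versus a gap),
    divided by the number of rows in the existing alignment.
    """
    total = sum(matrix[(residue, c)] for c in column if c not in GAP_CHARS)
    return total / len(column)


def new_sequence_gap_penalty(base_penalty, column):
    """Opening or extension penalty for a gap in the new sequence at this column,
    scaled by the fraction of existing sequences not gapped there."""
    nongapped = sum(1 for c in column if c not in GAP_CHARS)
    return base_penalty * nongapped / len(column)


def combined_score(similarity, ss_score, gap_penalties, ss_percent):
    """Total score with the secondary structure slider at ss_percent (0-100)."""
    w = ss_percent / 100.0
    return (1.0 - w) * similarity + w * ss_score - gap_penalties
```

For example, opening a gap in the new sequence opposite a column that is half gaps costs 12 × 2/4 = 6 with the default opening penalty.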
Alignment Annotations... open a dialog for viewing and editing alignment annotations
Show Editing Keys... bring up a dialog summarizing Ctrl-key editing functions

Find Subsequence... find occurrences of a string of residues in one or all sequences and make them into a region named search result
Find PROSITE Pattern... find all matches to a pattern specified with PROSITE pattern syntax in one or all sequences and make them into a region named search result (residue codes must be capitalized; extended syntax involving an asterisk, e.g. <{C}*>, is not supported)
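To illustrate the PROSITE syntax accepted by Find PROSITE Pattern, here is a minimal converter from a PROSITE pattern to a Python regular expression. It is a sketch, not Chimera's parser. It handles `x`, `[...]`, `{...}`, repeat counts `(n)` and `(n,m)`, and the `<`/`>` terminal anchors, but not `>` inside brackets.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # optional terminating period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]   # N-terminal anchor
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # C-terminal anchor
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                           # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"       # any residue except these
        # Single letters and [ABC] classes are already valid regex syntax.
        if low is not None and high is not None:
            core += "{%s,%s}" % (low, high)
        elif low is not None:
            core += "{%s}" % low
        parts.append(prefix + core + suffix)
    return "".join(parts)
```

The result can be used with `re.finditer` on a sequence with its gaps removed; the match positions are then residue indices in that sequence.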

Structure

Load Structures - for sequences not already associated with a structure, retrieve and open the corresponding structure files (as described for Fetch by ID); correspondence is determined from sequence names according to the rules given in the Structure preferences
Match... open a dialog for superimposing structures based on the sequence alignment
Assess Match... open a dialog for evaluating how well the sequence-aligned residues of structures are superimposed in space
Associations... open a dialog for adding or changing associations between structures and sequences
Secondary Structure
- show actual - create regions named structure helices and structure strands colored goldenrod and lime green (see the named colors) and corresponding to residues in associated structures that are in helices and strands, respectively. Protein helix and strand assignments are taken from the input structure file or generated with ksdssp.
- show predicted - create regions named predicted helices and predicted strands colored gold and light green (see the named colors) in sequences without associated structures. Helices and strands in these sequences are predicted using GOR.
Select by Conservation... open the Select by Attribute tool to select residues in sequence-associated structures by conservation
Render by Conservation... open the Render by Attribute tool to color or otherwise change the appearance of residues in sequence-associated structures by conservation
Expand Selection to Columns - if any sequence-associated residues are selected, expand the selection to include all residues associated with the same column(s) of the sequence alignment
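The column expansion can be sketched on sequence positions alone. Here a selection is a set of hypothetical (row index, residue index) pairs. Mapping those positions to structure residues, and keeping selected atoms that belong to no sequence, are left out.

```python
def expand_to_columns(rows, selected, gap_chars="-."):
    """Expand selected (row, residue) positions to every residue in the same columns."""
    col_of = {}   # (row, residue) -> alignment column
    members = {}  # alignment column -> set of (row, residue)
    for r, row in enumerate(rows):
        res = 0
        for col, ch in enumerate(row):
            if ch in gap_chars:
                continue
            col_of[(r, res)] = col
            members.setdefault(col, set()).add((r, res))
            res += 1
    expanded = set(selected)
    for item in selected:
        if item in col_of:
            expanded |= members[col_of[item]]
    return expanded
```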

Headers

Save... save a header definition file containing alignment positions and header values. Header values are the characters or numbers (shown with bar heights in a histogram) in a header line such as Conservation in the sequence window. No-value positions (for example, with the AL2CO Conservation style, positions where gap fraction or more of the sequences have gaps) can be omitted from the file.
Load... read a header definition file to show a new header line and have its values available as residue attributes in any associated structures

The available headers are listed in the bottom part of the menu; checkboxes control which are shown in the sequence window:

Consensus
Conservation
another_header_name
(etc.)

Numberings

Overall Alignment - whether to show alignment numbering

Left Sequence - whether to show sequence numbering to the left of each row of sequences
Right Sequence - whether to show sequence numbering to the right of each row of sequences
Adjust Sequence Numberings... open a dialog for specifying sequence start numbers. One or more sequences can be chosen with the mouse. Ctrl-click toggles the status of a sequence. To choose a block of sequences without dragging, click on the first (or last) and then Shift-click on the last (or first) in the desired block. Clicking Apply (or OK, which also closes the dialog) applies the new Start number to the chosen sequence(s).

Tree

Load... read a tree in Newick format and display it to the left of the sequences in the sequence window
Show Tree - hide/show the previously loaded tree

Extract Subalignment - copy the subalignment defined by the chosen node of the tree into a separate sequence window
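Extract Subalignment amounts to collecting the leaf names below a tree node and keeping the matching rows. The minimal Newick reader below is a sketch under stated assumptions: whitespace is removed before parsing, quoted labels and bracketed comments are not supported, branch lengths are discarded, and leaf names are assumed to match sequence names exactly.

```python
def parse_newick(text):
    """Parse Newick text into nested (children, name) tuples; leaves have no children."""
    text = "".join(text.split())  # assumption: whitespace is insignificant
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ")"
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        name = text[start:pos].split(":")[0]  # drop any branch length
        return (children, name)

    return node()


def leaf_names(tree_node):
    children, name = tree_node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaf_names(child)]


def extract_subalignment(alignment, tree_node):
    """alignment maps sequence names to gapped rows; keep the rows under tree_node."""
    keep = set(leaf_names(tree_node))
    return {name: row for name, row in alignment.items() if name in keep}
```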

Tools

Percent Identity... calculate the percent identity between a pair of sequences in the alignment
Region Browser - manage the existing regions
Load SCF/Seqsel File... read a JEvTrace file; color sequence regions and the corresponding parts of any associated structures accordingly
Load Color Scheme... read a parameter file to set the coloring of residue codes in the sequence window. When a scheme is loaded, the residue codes are colored accordingly and the name of the custom coloring file is added to the list of choices for Residue letter coloring in the Appearance preferences. The dialog for loading the file includes an option to Make this scheme the default. The location of each custom coloring file is saved in the preferences file. If a custom coloring file is later renamed, moved, or deleted, trying to load the corresponding scheme will generate an error.
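Percent identity depends on what the count of identical positions is divided by, which this page does not specify. The sketch below makes the denominator a parameter: shorter, longer, or mean ungapped length, or the number of columns where both sequences have residues. These choices are assumptions offered for comparison, not a description of the dialog's options.

```python
def percent_identity(row1, row2, denominator="shorter", gap_chars="-."):
    """Percent identity of two gapped rows from the same alignment."""
    if len(row1) != len(row2):
        raise ValueError("rows must come from the same alignment")
    identical = aligned = 0
    for a, b in zip(row1, row2):
        if a in gap_chars or b in gap_chars:
            continue  # only columns where both sequences have a residue
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    len1 = sum(c not in gap_chars for c in row1)
    len2 = sum(c not in gap_chars for c in row2)
    denom = {"shorter": min(len1, len2), "longer": max(len1, len2),
             "mean": (len1 + len2) / 2, "aligned": aligned}[denominator]
    return 100.0 * identical / denom if denom else 0.0
```

The same pair of rows can give noticeably different values; for "ACDE-" versus "AC-EF", three identities give 75% of the shorter length but 100% of the aligned columns.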

Preferences

UCSF Computer Graphics Laboratory / November 2009