Multalign Viewer

Multalign Viewer allows viewing and manipulation of multiple sequence alignments. If the corresponding structures are displayed in Chimera, there can be crosstalk showing which regions on the structures are those highlighted in the sequences, and vice versa. In addition, structures can be superimposed based on the sequence alignment. The state of Multalign Viewer (along with sequence alignment data) is included in saved sessions. For an informal introduction, see the Sequences and Structures tutorial.

MinRMS, a stand-alone program available from the UCSF Computer Graphics Lab, produces structural superpositions and other data that can be displayed with AlignPlot, as well as sequence alignments that can be displayed with Multalign Viewer (and MSF Viewer). However, many other programs can generate sequence alignments suitable as input to Multalign Viewer. Likewise, structural alignments can be generated in a multitude of ways.

Multalign Viewer is under development. New features will continue to be added.

STARTUP AND INPUT

There are several ways to start Multalign Viewer, a tool in the Homology category. Explicitly starting Multalign Viewer brings up a dialog for opening files. File formats that can be read and written are:

Format	Prefix	Suffix
Aligned FASTA	afasta:	.afasta .afa .fasta .fa
Aligned NBRF/PIR	pir:	.pir
Clustal ALN	aln:	.aln .clustal .clustalw .clustalx
GCG MSF	msf:	.msf
GCG RSF	rsf:	.rsf
Stockholm	pfam: hmmer:	.pfam .sto

One of these types can be chosen from the File type: pulldown menu near the bottom of the dialog, or the type can be inferred from the suffix(es) of the chosen file(s). Once a file has been read, a sequence window containing the alignment appears; if multiple files are read, each is shown in a separate sequence window.
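As an illustration of inferring the type from a suffix, the minimal Python sketch below maps each suffix in the table above to its format name. The names `SUFFIX_TO_FORMAT` and `infer_format` are hypothetical and are not part of Chimera's API. The sketch also assumes that suffix matching ignores case, and it does not handle the prefixes.

```python
import os

# Hypothetical lookup table built from the format table above (not Chimera code).
SUFFIX_TO_FORMAT = {
    ".afasta": "Aligned FASTA", ".afa": "Aligned FASTA",
    ".fasta": "Aligned FASTA", ".fa": "Aligned FASTA",
    ".pir": "Aligned NBRF/PIR",
    ".aln": "Clustal ALN", ".clustal": "Clustal ALN",
    ".clustalw": "Clustal ALN", ".clustalx": "Clustal ALN",
    ".msf": "GCG MSF",
    ".rsf": "GCG RSF",
    ".pfam": "Stockholm", ".sto": "Stockholm",
}

def infer_format(path):
    """Guess an alignment format from the file suffix (case-insensitive here)."""
    suffix = os.path.splitext(path)[1].lower()
    if suffix not in SUFFIX_TO_FORMAT:
        # Corresponds to having to pick a File type explicitly in the dialog.
        raise ValueError("unrecognized suffix %r" % suffix)
    return SUFFIX_TO_FORMAT[suffix]
```

For example, `infer_format("apoex.fa")` gives `"Aligned FASTA"`.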

Multalign Viewer menu items are explained within the following sections, and a full listing with short descriptions is included at the end of this page.

Note that Multalign Viewer can be started automatically from AlignPlot (see the show sequence alignment feature).

THE SEQUENCE WINDOW

Multalign Viewer sequence window

This figure shows part of the sequence window contents for the input file apoex.fa. A Consensus sequence and Conservation information are shown above the multiple alignment. The consensus sequence shows the residue types most prevalent (not necessarily >50%) at each position in the alignment; highly conserved residues (80% or greater) are capitalized and shown in purple, except that the completely conserved residues are shown in red. The Analysis preferences control residue coloring and how consensus and conservation are determined and represented.
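The consensus and capitalization rules can be sketched in Python as follows. This is a minimal illustration, not Multalign Viewer's code, and the function name `consensus_row` is hypothetical. It makes several assumptions. Gaps are ignored when the residue types are counted, but percentages are taken over all sequences. Ties are broken by `Counter` order rather than at random. Lowercase is used for positions below the 80% threshold.

```python
from collections import Counter

def consensus_row(sequences, high=0.8, gap="-"):
    """Return (consensus string, per-column tags) for equal-length aligned rows.

    Tags are "identical" (100%, shown in red), "conserved" (>= high, shown in
    purple) or "" otherwise. Percentages use all sequences as the denominator.
    """
    n = len(sequences)
    letters, tags = [], []
    for column in zip(*sequences):
        counts = Counter(c.upper() for c in column if c != gap)
        if not counts:                      # all-gap column
            letters.append(gap)
            tags.append("")
            continue
        residue, m = counts.most_common(1)[0]
        fraction = m / n
        if fraction == 1.0:
            letters.append(residue)
            tags.append("identical")
        elif fraction >= high:
            letters.append(residue)         # capitalized: highly conserved
            tags.append("conserved")
        else:
            letters.append(residue.lower())
            tags.append("")
    return "".join(letters), tags
```

With the rows `ACDF`, `ACDF`, `ACEF`, `ACEF` and `ACEG`, the sketch yields the consensus `ACeF`. The first two columns are completely conserved, and the last column is 80% conserved.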

Font size and sequence wrapping behavior can be controlled with the Layout preferences. The regions (boxes enclosing one or more residues) are described below. One can search for occurrences of a particular string or pattern of residues in one or all of the sequences using Find Subsequence and Find PROSITE Pattern (under Tools in the Multalign Viewer menu).

When the cursor is in the sequence window, the Page Down key (or space) moves the view down to start with the block below the topmost block whose beginning is currently visible; Page Up (or Shift-space) moves the view up to start with the block above the topmost block whose beginning is currently visible.

The Hide button closes the sequence window without changing the state of Multalign Viewer. The sequence window can be reopened using the Raise option for the Multalign Viewer instance (abbreviated MAV) in the Chimera Tools menu; this is also useful when the window has become obscured by other windows. Help opens this manual page in a browser window, and Quit exits from Multalign Viewer.

REGIONS

A sequence region can be defined manually or by any of several operations within Multalign Viewer. A region is shown using a colored rectangle or outline in the sequence window; a single region can contain any number of disjoint and/or abutting rectangular blocks.

A region can be created manually by dragging with the left mouse button within the sequence window. Dragging down into the next block extends the highlight to the end of the preceding block. The initial region colors can be controlled in the Regions preferences. Shift-dragging with the left mouse button adds to the active region, whereas Ctrl-dragging creates a new region and makes it the active region.

The active region is the latest region created manually, or the region last clicked on (either in the sequence window or in the listing of regions in the Region Browser). The corresponding parts of any associated structures are highlighted in the main Chimera window by becoming selected. Only one region can be active at a time. In the sequence window, the active region is indicated with a dashed outline; clicking the active region deactivates it, and clicking a different region deactivates the former active region and makes the new region active. A region with no interior color is only responsive to clicks on its borders. Where regions overlap, only the highest is responsive to clicks.

The Region Browser (Tools... Region Browser in the Multalign Viewer menu) controls the appearance of regions within the sequence window. Clicking an entry in the Region Browser makes that region the active region; any pre-existing active region is deactivated. The properties shown always apply to the active region. Border color and Interior color can be changed using the adjacent color wells.

The Delete button on the Region Browser deletes the active region; the Delete key has the same effect. Since regions may overlap, a region can be considered higher or lower than another. Raise puts the active region in front of any other regions in the sequence window and moves its listing in the Region Browser to the top. Lower puts the active region behind any overlapping regions in the sequence window and moves its listing in the Region Browser to the bottom. Alternatively, when the cursor is in the sequence window, the up arrow and down arrow keys can be used to raise and lower the active region, respectively. Cancel dismisses the Region Browser, and Help opens this manual page in a web browser.

Other ways to create regions:

by searching for occurrences of a particular string or pattern of residues
by reading a sequence coloring format (SCF) file or seqsel file (Tools... Load SCF/Seqsel File) generated by the program JEvTrace. The two formats are slightly different, but both contain information for coloring regions in the sequence and in any associated structures. The file reader automatically determines which format is being used. The formats are simple enough that users wishing to color residues alike in sequence and structure (for any reason) may wish to create a file manually and read it in by this route.
as a by-product of sequence-structure association
by selection within sequence-associated structures
by secondary structure (actual in structure-associated sequences, predicted in sequences not associated with structures)
by superposition assessment

EDITING AND SAVING THE ALIGNMENT

A region newly created by dragging within the sequence window can be moved using Ctrl-left arrow and Ctrl-right arrow. Gaps can be created, extended, reduced, or removed, but residues cannot be added or removed. As the region moves, it pushes the flanking residues ahead of it. Movement stops once all the gap positions in the direction of movement have been used up.
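To illustrate the pushing behavior, the minimal Python sketch below handles a single row with one move to the right. It is an assumption-laden illustration: the name `move_right` is hypothetical, the viewer itself moves every row in the region, and a move to the left would mirror this logic. The nearest gap to the right of the segment absorbs the shift. The residues between the segment and that gap are pushed along, and a new gap opens where the segment began.

```python
def move_right(row, start, end, gap="-"):
    """Shift row[start:end] one column right, pushing flanking residues.

    Returns the new row, or None when no gap remains to the right (all gap
    positions in the direction of movement are exhausted). Residues are never
    added or removed; only one gap changes position.
    """
    nearest = row.find(gap, end)
    if nearest == -1:
        return None
    # Open a gap at `start`, keep everything up to the absorbing gap, drop that gap.
    return row[:start] + gap + row[start:nearest] + row[nearest + 1:]
```

For example, `move_right("ABC-D", 0, 1)` moves `A` and pushes `B` and `C` ahead of it, giving `-ABCD`. A second call returns `None` because no gap is left to the right.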

Choosing File... Save As... from the Multalign Viewer menu brings up a dialog for saving the sequence alignment to a file. It is possible to save just the active region to a file; although the region may consist of disjoint sections, each sequence (row) included must contain the same set of columns.
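The column requirement for saving a region can be checked as in the Python sketch below. It is a minimal sketch that assumes a hypothetical representation of a region as rectangular blocks `(first_row, last_row, first_col, last_col)` with inclusive indices. Neither the representation nor the name `saveable_columns` comes from Chimera.

```python
def saveable_columns(blocks):
    """Return the sorted columns shared by every row of a region.

    Raises ValueError if rows cover different column sets, in which case the
    region cannot be saved as an alignment.
    """
    columns_by_row = {}
    for first_row, last_row, first_col, last_col in blocks:
        for row in range(first_row, last_row + 1):
            columns_by_row.setdefault(row, set()).update(range(first_col, last_col + 1))
    distinct = {frozenset(cols) for cols in columns_by_row.values()}
    if len(distinct) != 1:
        raise ValueError("rows in the region do not all contain the same columns")
    return sorted(next(iter(distinct)))
```

Two disjoint blocks spanning the same rows pass the check. Blocks that cover different rows fail it.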

The formats available for saving are the same as those that can be read. An alignment saved in Stockholm format will automatically include annotations describing the secondary structure of any associated structures. In addition, the sequence window contents can be saved as an EPS file.

ANNOTATIONS

Sequence alignment files may contain various annotations.

Alignment annotations (those describing the alignment as a whole) can be viewed and altered by choosing Tools... Alignment Annotations from the Multalign Viewer menu. The dialog shows any annotations from the input file, each of which can be changed by typing in the adjacent field. Changes are not assessed for correctness.

Each annotation in the top section of the panel has two parts, a name and a value (generally text). For example, MSF length is a name and 366 is a possible associated value. A new annotation can be added by clicking New and specifying a name. The name will appear in the top section of the panel, and a value can be entered by typing in the adjacent field. Values can have multiple lines even though only one is shown; hitting return when typing into a value field retains the previous information and starts a new line. Clicking Delete and then choosing an annotation name removes the corresponding annotation. Multiple lines of free-form text can be placed in the Comments area in the lower section of the panel.

Not all formats can accommodate annotations. Thus, saved files may not include this information or reflect any changes that have been made. Only the Stockholm format accommodates arbitrary annotations, whereas both Stockholm and RSF formats can include comments. The Stockholm format GF (Generic per-File) markups are interpreted as alignment annotations, whereas the GC (Generic per-Column) markups are treated as headers and displayed similarly to the consensus and conservation information.

When an alignment is saved in Stockholm format, the saved file will automatically include GR (Generic per-sequence and per-column) markup lines describing the secondary structure of any associated structures. The GR markup line for a sequence and its associated structure starts with

#=GR seq-name Chimera_actual_SS_struct-name

The structure name is included because there can be more than one structure associated with a given sequence. Chimera-generated GR secondary structure markups may include the symbols:

symbol	meaning
.	gap in sequence
H	helix
E	strand
C	other structure
X	sequence residue not associated with structure residue

If a sequence did not have a GR SS markup in the original input file and is associated with only one structure, another markup line with the same contents but named SS instead of Chimera_actual_SS_struct-name will also be included. GR markups are not displayed by Multalign Viewer.
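The GR line layout and the symbol table above can be illustrated with the minimal Python sketch below. It assumes the per-residue secondary structure (H, E or C) is already known for associated residues. The name `chimera_gr_line` is hypothetical, and the column padding used in real Stockholm files is omitted.

```python
def chimera_gr_line(seq_name, struct_name, aligned_seq, ss_by_index):
    """Build a Chimera-style GR secondary-structure line for one sequence.

    ss_by_index maps the 0-based index of a residue in the ungapped sequence
    to 'H', 'E' or 'C'. Residues absent from the mapping have no associated
    structure residue and are written as 'X'; gap columns are written as '.'.
    """
    symbols = []
    residue_index = 0
    for ch in aligned_seq:
        if ch in ".-":
            symbols.append(".")
        else:
            symbols.append(ss_by_index.get(residue_index, "X"))
            residue_index += 1
    feature = "Chimera_actual_SS_" + struct_name
    return "#=GR %s %s %s" % (seq_name, feature, "".join(symbols))
```

In this sketch, an aligned sequence `AC-DE` whose residues 1 and 2 are helical, residue 3 is unassociated and residue 4 is "other" produces the annotation string `HH.XC`.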

SEQUENCE-STRUCTURE ASSOCIATION

A sequence in the alignment will be associated automatically with a structure chain if their sequences can be aligned without too many mismatches. The allowable number of mismatches is user-specified as a proportion of the total number of residues in the structure chain (1/10 by default; see the Structure section of the Multalign Viewer preferences). For automatic association, gaps in the structure sequence relative to the sequence in the alignment file can only occur where residues are missing from the structure (for example, a flexible loop with insufficient density for coordinates to be determined). The order in which the sequence and structure files are opened does not matter. Associations are reported in the status area near the bottom of the sequence window and the Chimera Reply Log. A structure (even if it has multiple chains) cannot be associated with more than one sequence, but a single sequence can be associated with more than one structure. If more than one sequence matches a given structure chain, the single best-matching sequence is associated.
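The mismatch tolerance can be expressed as a simple test, shown in the minimal Python sketch below. The sketch makes a strong simplifying assumption: the sequence and the structure chain are already in register, with `None` marking residues missing from the structure. Chimera performs the alignment itself, so `can_auto_associate` (a hypothetical name) covers only the final counting step. It also assumes that the total number of structure residues means the residues actually present.

```python
def can_auto_associate(sequence, structure, per=10):
    """Return True if mismatches do not exceed one per `per` structure residues.

    `structure` has one entry per sequence position; None marks a residue that
    is missing from the structure (the only place a structure gap may occur).
    """
    if len(sequence) != len(structure):
        return False
    present = [(s, r) for s, r in zip(sequence, structure) if r is not None]
    mismatches = sum(1 for s, r in present if s.upper() != r.upper())
    return mismatches <= len(present) / per
```

With the default of one mismatch per 10 residues, a 20-residue chain may differ from the sequence at up to two positions.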

The names of structure-associated sequences are shown in bold over a box indicating the model-level color of the structure. For example, APE_HUMAN in the sequence window figure is associated with a structure whose model-level color is white. If multiple models are associated with the same sequence, the box is shown as a dark green dashed outline. When the cursor is placed over the name of a sequence, information on the sequence and any associated structure(s) is shown in the status area near the bottom of the sequence window. When the cursor is placed over a sequence residue that is associated with a residue in one or more structures, information on the structure residue(s) is given.

Sequence-structure association may create new regions. If all residues of a sequence are matched, no region is created. Otherwise, by default

the matched part(s) of a sequence become(s) a region named matches of... with a black border and no interior color
any gap(s) in the structure relative to the sequence become(s) a region named gaps... with a red border and no interior color
any mismatch(es) become(s) a region named mismatches... with a red interior and no border

The region names also specify the structure and chain that were sequence-associated and the number of gaps or mismatches (indicated by "..." above).

Adding or Changing Associations

Associations can be added or changed using the Structure/Sequence Associations panel (Structure... Associations... in the Multalign Viewer menu). The name of each open structure model is shown next to pulldown menus for specifying the chain (if applicable) and the sequence name. To specify an association, choose the sequence to associate with the model and, if applicable, the chain through which the association should be made. The sequence choices are those in the alignment displayed by Multalign Viewer, plus none. The specified associations will be made regardless of how poorly the sequence and structure match.

Apply performs the indicated association(s) without dismissing the panel. OK performs the indicated association(s) and dismisses the panel, Close dismisses the panel without changing the associations, and Help opens this manual page in a browser window.

Automatic Structure Loading

Structure... Load Structures... in the Multalign Viewer menu can be used to open structure files corresponding to sequences that are not already structure-associated. The corresponding structure files are determined from the sequence names using the rules given in the Structure preferences, then retrieved and opened (as described for Fetch by ID). If Automatically load structures is turned on (also in the Structure preferences), this will occur as soon as an alignment is read.

Focusing the View

Clicking with the right mouse button on a structure-associated residue in the sequence window will focus the view in the graphics window on the corresponding structure residue (or residues, since there could be more than one structure associated with the sequence). Shift-right-clicking within a region (or right-clicking within the region but not on a residue) will focus the view on all of the structure residues associated with the sequence region, if any. A region with no interior color is only responsive to clicks on its borders.

Regions Defined by Selection

By default, making a selection within one or more structures creates a region named Chimera selection in the associated sequences (if any). Each residue with at least one atom or bond selected will be included in the region. Whether selection creates a region and the colors of the resulting region can be controlled in the Regions preferences. The Chimera selection region is not created when the structure selection merely reflects the current active region.

Regions Defined by Secondary Structure

Regions can be defined within structure-associated sequences based on helices and sheets in the corresponding structures. Structure... Secondary Structure... show actual in the Multalign Viewer menu creates regions named structure helices and structure strands colored goldenrod and lime green (see the named colors), respectively. Helices and strands are defined by HELIX and SHEET records in the input file, or if these are not present, using ksdssp. (Regardless of whether the regions exist, however, an alignment saved in Stockholm format will automatically include annotations describing the secondary structure of any associated structures.)

In sequences not associated with any structures, Structure... Secondary Structure... show predicted in the Multalign Viewer menu creates regions named predicted helices and predicted strands colored gold and light green (see the named colors) that correspond to helices and strands, respectively, predicted using GOR.

Residue Conservation Attribute

Structure residues associated with residues in a sequence alignment shown by Multalign Viewer are assigned an attribute named mavPercentConserved. The attribute value equals the percent conservation of the most prevalent residue type at the corresponding position within the sequence alignment. Structure residues not associated with residues in the sequence alignment are not assigned mavPercentConserved values.

Structure... Select by Conservation... in the Multalign Viewer menu opens the Select by Attribute tool to select residues in structures by their mavPercentConserved values; similarly, Structure... Render by Conservation... opens the Render by Attribute tool for coloring or otherwise changing the appearance of residues based on their mavPercentConserved values.

STRUCTURAL SUPERPOSITION

Structure... Match... in the Multalign Viewer menu allows structures to be superimposed based on the alignment of their associated sequences. The dialog contains two subpanels, each listing the structures associated with sequences in the alignment. One reference structure should be chosen on the left side, and any number of structures to be matched (superimposed) with it can be chosen on the right side. Residues in each Structure to match are paired with the residues of the Reference structure to which they are aligned in the sequence alignment. If both structures are associated with the same sequence, the pairing is even more direct: residues associated with the same sequence residue are paired. One point per residue is used for the least-squares fitting: CA in amino acid residues and C4' in nucleic acid residues. The number of atom pairs fitted and the resulting RMSD are reported in the Chimera Reply Log and the status area near the bottom of the sequence window.

Match highly conserved residues only causes only the well-conserved (at least 80%) positions in the alignment to be used for the least-squares fit. These are the positions shown as capital letters in the consensus sequence.

Match active region only causes only the positions in the current active region of the alignment to be used for matching.

Use pseudobonds to show matched atoms indicates that lines (pseudobonds) should be drawn between the matched atoms. For each matched pair of structures, a pseudobond group is created and colored uniquely (in order, the named colors dark green, dodger blue, sienna, yellow, spring green, purple, gray, and coral are used). Each group is named matches of..., where the rest of the name indicates the structures and chains that were matched. The PseudoBond Panel can be used to change the appearance of or delete the pseudobonds.

Create region showing matched residues indicates that the residues paired to generate the fit should be shown as a region named matched residues.

Iterate by pruning long atom pairs until no pair exceeds [x] angstroms refers to an iterative fitting procedure: in each cycle, atom pairs are removed from the match list and the remaining pairs are fitted, until no matched pair is more than x angstroms apart (x=5.0 by default). The atom pairs removed are either the 10% farthest apart of all pairs or the 50% farthest apart of all pairs exceeding the cutoff, whichever is the lesser number of pairs. The result is that the best-matching "core" regions are maximally superimposed; conformationally dissimilar regions such as flexible loops are not included in the final fit, even though they may be aligned in the sequence alignment.
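The Python sketch below illustrates the pruning loop. It uses NumPy and a standard SVD-based (Kabsch) least-squares superposition as the fitting step. This is a minimal sketch, not Chimera's implementation, and the names `kabsch` and `iterative_match` are hypothetical. The sketch makes three further assumptions. The number of pairs to prune is rounded down, with a minimum of one pair. The removed pairs are the farthest ones remaining. The loop stops if only three pairs remain.

```python
import numpy as np

def kabsch(mobile, ref):
    """Least-squares rotation r and translation t with mobile @ r.T + t ~ ref."""
    mc, rc = mobile.mean(axis=0), ref.mean(axis=0)
    h = (mobile - mc).T @ (ref - rc)
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # exclude reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, rc - r @ mc

def iterative_match(mobile, ref, cutoff=5.0):
    """Fit, prune the farthest pairs, and refit until no pair exceeds cutoff.

    mobile and ref are (N, 3) arrays of paired CA/C4' coordinates.
    Returns (r, t, kept_indices, rmsd) from the final cycle.
    """
    kept = np.arange(len(mobile))
    while True:
        r, t = kabsch(mobile[kept], ref[kept])
        dist = np.linalg.norm(mobile[kept] @ r.T + t - ref[kept], axis=1)
        rmsd = float(np.sqrt(np.mean(dist ** 2)))
        n_far = int(np.count_nonzero(dist > cutoff))
        if n_far == 0 or len(kept) <= 3:
            return r, t, kept, rmsd
        # Lesser of 10% of all pairs and 50% of the pairs beyond the cutoff.
        n_remove = max(1, min(len(kept) // 10, n_far // 2))
        farthest_first = np.argsort(dist)[::-1]
        kept = np.sort(kept[farthest_first[n_remove:]])
```

In this sketch, a rigidly rotated copy of a structure with two displaced "loop" points ends up with those two pairs pruned and an RMSD of essentially zero for the remaining core.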

Apply performs the matching (superposition) without dismissing the dialog. OK performs the matching and dismisses the dialog, Cancel dismisses the dialog without performing a match, and Help opens this manual page in a browser window.

SUPERPOSITION ASSESSMENT

Structure... Assess Match... in the Multalign Viewer menu opens a dialog for evaluating how well the sequence-aligned residues of structures are superimposed in space. The dialog has two subpanels, each listing the structures associated with sequences in the alignment. One reference structure should be chosen on the left side, and any number of structures to be compared with it can be chosen on the right side. The structures to evaluate should already be superimposed on the reference structure, but the superposition can have been generated in any way, including manually (not necessarily using Structure... Match). One point per residue is used for the comparison: CA in amino acid residues and C4' in nucleic acid residues. The distance between each reference-evaluation pair of residues aligned in the sequence alignment (or corresponding to the same residue, if the reference and evaluation structures are associated with the same sequence) is measured. Using the specified Cutoff distance, a region is created in the sequence(s) associated with the evaluation structure(s):

named further than... when the pulldown option is set to beyond
named closer than... when the pulldown option is set to within

The Region color is specified by a color well.
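The beyond/within test can be sketched in Python as below. This minimal sketch assumes the paired CA/C4' coordinates are already keyed by alignment column. The name `assess_match` is hypothetical. The choice of `<=` for "within" is an assumption, since the manual does not say how a distance exactly at the cutoff is treated.

```python
import math

def assess_match(ref_coords, eval_coords, cutoff, mode="beyond"):
    """Return the alignment columns whose paired points fall beyond/within cutoff.

    ref_coords and eval_coords map alignment column -> (x, y, z); only
    columns present in both (residues aligned in both structures) are compared.
    """
    hits = []
    for column in sorted(ref_coords.keys() & eval_coords.keys()):
        d = math.dist(ref_coords[column], eval_coords[column])
        if (d > cutoff) if mode == "beyond" else (d <= cutoff):
            hits.append(column)
    return hits
```

The returned columns correspond to the residues that would make up the further than... or closer than... region.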

OK performs the comparison and dismisses the dialog, whereas Apply performs the comparison without dismissing the dialog. Close dismisses the dialog, and Help opens this manual page in a browser window.

PREFERENCES

The Preferences section of the Multalign Viewer menu controls preferences specific to Multalign Viewer; settings are saved with other preferences in the Chimera preferences file. Multalign Viewer preferences are grouped together in sections shown as index cards:

Layout
Structure
Analysis
Regions

Close dismisses the Multalign Viewer preferences tool, and Help opens this manual page in a browser window.

Note that a typed-in value will not be applied until Apply or OK (which also dismisses the preferences tool) is clicked, or the Enter (Return) key is pressed. Other types of settings take effect immediately.

The Layout section of the Multalign Viewer preferences controls the font and arrangement of text in the sequence window.

Alignment wrapping - whether the alignment should be continuous from left to right or automatically wrap to successive blocks; there are three mutually exclusive settings:
- Wrap to new line every [n]0 residues - wrap every 10n positions in the alignment no matter how many sequences are aligned (n=5 by default)
- Never wrap
- Wrap if [k] or fewer sequences (the default) - wrap every 10n positions in alignments that contain k or fewer sequences (n=5 and k=8 by default)
Font - the font size and type to use for the aligned sequences, their names, the consensus sequence, and the alignment numbering
- Use [m] point (Courier/Helvetica/Times) (m=12 by default)
Spacing
- Column separation (pixels) (0 by default)
- Line separation (pixels) (1 by default)
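The three wrapping settings can be summarized by the minimal Python sketch below. It returns the starting column of each block. The function name and the `mode` strings are hypothetical labels for the three settings, not Chimera identifiers.

```python
def block_starts(alignment_length, n_sequences, n=5, k=8, mode="if_few"):
    """Return 0-based starting columns of the blocks in the sequence window.

    mode "always": Wrap to new line every [n]0 residues
    mode "never":  Never wrap
    mode "if_few": Wrap if [k] or fewer sequences (the default)
    """
    width = 10 * n
    wrap = mode == "always" or (mode == "if_few" and n_sequences <= k)
    if not wrap:
        return [0]          # one continuous block
    return list(range(0, alignment_length, width))
```

With the defaults, a 120-column alignment of 5 sequences is shown in blocks starting at columns 0, 50, and 100. The same alignment with 9 sequences is not wrapped.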

The Structure section of the Multalign Viewer preferences allows adjustment of criteria for automatic sequence-structure association and automatic loading of structures corresponding to the sequences in an alignment.

Auto-association - criteria for automatic sequence-structure association
- Allow one mismatch per [n] structure residues (n=10 by default)
Structure loading - whether/how sequence names relate to the names of the corresponding structure files; whether to retrieve and open such files (as described for Fetch by ID) as soon as a sequence alignment is read. No more than one structure per sequence will be loaded. If a sequence name corresponds to both a SCOP file and a PDB file, the SCOP file will be loaded (unless the Load SCOP... option is turned off).
- Load SCOP file for each unassociated sequence whose name appears to be a SCOP ID (sid) (on by default)
- Load PDB file for each unassociated sequence whose name...
  - is < pdb code >
    (e.g., "1gcn")
  - starts with < pdb code >
    (e.g., "1k6w_A")
  - contains "pdb|" + < pdb code >
    (NCBI-style; e.g., "gi|1421399|pdb|1FTZ")
  - starts with [string] + < pdb code >
    where string is user-specified
  These correspondence rules are not mutually exclusive; all but the last are on by default.
- Automatically load structures (off by default) - whether to retrieve and open structures as soon as a sequence alignment is read; if not, the procedure can still be invoked with Structure... Load Structures in the Multalign Viewer menu
- Do not autoload if more than [N] structures would load (on and N=10 by default)
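The Python sketch below applies the four PDB name rules in the order listed. It is a minimal sketch with several assumptions. A PDB code is taken to be a classic 4-character ID (a digit followed by three alphanumerics). "Starts with" is taken to need no delimiter after the code. Passing `prefix=None` stands for the prefix rule being turned off, as it is by default. The name `pdb_code_for` is hypothetical, and SCOP sids are not handled.

```python
import re

PDB_CODE = r"[0-9][A-Za-z0-9]{3}"   # assumed 4-character PDB ID pattern

def pdb_code_for(name, prefix=None):
    """Return the PDB code implied by a sequence name, or None."""
    if re.fullmatch(PDB_CODE, name):                  # is <pdb code>
        return name
    m = re.match(PDB_CODE, name)                      # starts with <pdb code>
    if m:
        return m.group(0)
    m = re.search(r"pdb\|(" + PDB_CODE + ")", name)   # contains "pdb|" + <pdb code>
    if m:
        return m.group(1)
    if prefix and name.startswith(prefix):            # starts with [string] + <pdb code>
        m = re.match(PDB_CODE, name[len(prefix):])
        if m:
            return m.group(0)
    return None
```

Applied to the examples above, the sketch yields `1gcn`, `1k6w`, and `1FTZ`. A name such as `APE_HUMAN` yields no code.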

The Analysis section of the Multalign Viewer preferences controls how consensus and conservation are determined and represented.

Consensus style (majority in column including gaps/majority in column ignoring gaps/none) - what is shown for the row marked Consensus in the sequence window. If several residue types are equally the most prevalent, one is chosen at random to appear in the consensus sequence. When gaps are included, however, the consensus will show a gap at positions where a gap is the most prevalent or equally the most prevalent as one or more residue types. None means that the row will not be shown.
Conservation style (Clustal histogram/Clustal characters/identity histogram/none) - what is shown for the row marked Conservation in the sequence window. None means that the row will not be shown.
Identity histogram bars show what proportion of the residues at a position are the same type as in the Consensus sequence; bar height is proportional to (M-1)/(N-1), where M is the number of occurrences of the consensus residue type at a position in the alignment and N is the number of sequences in the alignment. Thus, if every sequence has a different residue at a given position, the bar height is zero, not 1/N.
In a Clustal histogram, full bar height indicates complete identity, 2/3 bar height indicates Clustal strong group conservation, and 1/3 bar height indicates Clustal weak group conservation.
Clustal characters are the characters used in Clustal format to indicate conservation: "*" for complete identity, ":" for strong group conservation, and "." for weak group conservation.
Residue letter coloring (black/Clustal X) - how one-letter residue codes are colored in the sequence window. Clustal X coloring depends on both residue type and the pattern of conservation within a column.
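The identity histogram formula can be checked with the minimal Python sketch below, which computes the bar height for one column. The name `identity_bar_height` is hypothetical. Two choices are assumptions: gap characters are not counted as a residue type, and an alignment with fewer than two sequences gets a height of zero, since (N-1) would be zero.

```python
from collections import Counter

def identity_bar_height(column, gap="-"):
    """Return (M - 1) / (N - 1) for one alignment column.

    M: occurrences of the most prevalent (consensus) residue type;
    N: number of sequences, i.e. len(column).
    """
    n = len(column)
    counts = Counter(c.upper() for c in column if c != gap)
    if n < 2 or not counts:
        return 0.0
    m = counts.most_common(1)[0][1]
    return (m - 1) / (n - 1)
```

A fully identical column gives 1.0. A column of four different residues gives 0.0, not 1/4, matching the description above.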

The Regions section of the Multalign Viewer preferences controls whether a selection will be shown as a region and sets initial region colors.

Show Chimera selection as region (true/false) - whether to show a selection within structures as a region named Chimera selection within associated sequences (if any); a residue will be included in the region if any of its atoms are selected
Chimera selection region border color (a color well)
Chimera selection region interior color (a color well)
New region border color (a color well) - initial border color for manually created regions
New region interior color (a color well) - initial interior color for manually created regions

MULTALIGN VIEWER MENU

File

Save As... bring up a dialog for saving the sequence alignment to a file. It is possible to save just the active region to a file; although the region may consist of disjoint sections, each sequence (row) included must contain the same set of columns. The formats available for saving are the same as those that can be read. An alignment saved in Stockholm format will automatically include annotations describing the secondary structure of any associated structures.
Save EPS... bring up a dialog for saving an Encapsulated PostScript file of the sequence window contents. Options:
- color mode - (color/grayscale/black & white)
- rotate 90 - (true/false)
- extent - (visible region/entire alignment)
If the alignment is wrapped (or would wrap if it were longer), one EPS file will be saved with the specified name. Otherwise, the sequence names and alignment will be saved in two separate EPS files with "-names" and "-alignment" appended to the specified name (before the ".eps" suffix, if present).
Hide - close the sequence window without changing the state of Multalign Viewer (the window can be reopened using the Raise option for the Multalign Viewer instance in the Chimera Tools menu, abbreviated MAV)
Quit - exit from Multalign Viewer
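The Save EPS... naming rule can be written out as the small Python function below. This is a minimal sketch; the name `eps_filenames` is hypothetical. It assumes that only a lowercase ".eps" suffix is recognized, as written in the description above.

```python
def eps_filenames(name, wrapped):
    """Return the EPS file name(s) written for a given name and wrap state."""
    if wrapped:
        return [name]                       # one file with the specified name
    if name.endswith(".eps"):
        stem, suffix = name[:-4], ".eps"    # insert before the ".eps" suffix
    else:
        stem, suffix = name, ""
    return [stem + "-names" + suffix, stem + "-alignment" + suffix]
```

For an unwrapped alignment saved as `fig.eps`, the sketch yields `fig-names.eps` and `fig-alignment.eps`.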

Structure

Load Structures - for sequences not already associated with a structure, retrieve and open the corresponding structure files (as described for Fetch by ID); correspondence is determined from sequence names according to the rules given in the Structure preferences
Match... open a dialog for superimposing structures based on the sequence alignment
Assess Match... open a dialog for evaluating how well the sequence-aligned residues of structures are superimposed in space
Associations... open a dialog for adding or changing associations between structures and sequences
Secondary Structure
- show actual - create regions named structure helices and structure strands colored goldenrod and lime green (see the named colors) and corresponding to residues in associated structures that are in helices and strands, respectively. Helices and strands are defined by HELIX and SHEET records in the input file, or if these are not present, using ksdssp.
- show predicted - create regions named predicted helices and predicted strands colored gold and light green (see the named colors) in sequences without associated structures. Helices and strands in these sequences are predicted using GOR.
Select by Conservation... open the Select by Attribute tool to select residues in sequence-associated structures by conservation
Render by Conservation... open the Render by Attribute tool to color or otherwise change the appearance of residues in sequence-associated structures by conservation

Tools

Find Subsequence... find occurrences of a string of residues in one or all sequences and make them into a region named search result
Find PROSITE Pattern... find all matches to a pattern specified with PROSITE pattern syntax in one or all sequences and make them into a region named search result (residue codes must be capitalized; extended syntax involving an asterisk, e.g. <{C}*>, is not supported)
Region Browser - control the appearance of regions in the Multalign Viewer sequence window
Alignment Annotations... show alignment annotations from the sequence alignment file, allow changes and additions
Load SCF/Seqsel File - read file from JEvTrace with information on creating/coloring regions in sequences and any associated structures
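A basic PROSITE pattern can be translated into a regular expression for searching, as in the minimal Python sketch below. This is an illustration of the general technique, not Multalign Viewer's search code, and the name `prosite_to_regex` is hypothetical. It covers only the basic elements. Elements are separated by "-". "x" matches any residue, [..] lists allowed residues, and {..} lists excluded residues. (n) and (n,m) give repeat counts, "<" and ">" anchor the pattern at the sequence ends, and a trailing "." ends the pattern. Like the tool, it assumes capitalized residue codes and does not handle the extended asterisk syntax.

```python
import re

def prosite_to_regex(pattern):
    """Translate basic PROSITE pattern syntax into a Python regular expression."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")
        at_end = element.endswith(">")
        element = element.strip("<>")
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except those listed
        # Single residue letters and [..] alternatives are already valid regex.
        if high is not None:
            core += "{%s,%s}" % (low, high)
        elif low is not None:
            core += "{%s}" % low
        regex.append(("^" if at_start else "") + core + ("$" if at_end else ""))
    return "".join(regex)
```

For example, `C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.` becomes `C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H`, which can be passed to `re.finditer` on each ungapped sequence.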

Preferences

Layout - control the font and arrangement of text in the Multalign Viewer sequence window
Structure - adjust criteria for automatic sequence-structure association and automatic loading of structures corresponding to the sequences in an alignment
Analysis - control how consensus and conservation are determined and represented
Regions - control whether a selection will be shown as a region; set initial region colors

UCSF Computer Graphics Laboratory / October 2004