mda

Usage:
mda input-sequence output-folder options

The command mda (for MultiDomain Assembler) performs several steps toward creating a model of a multidomain protein:

Obtains the protein sequence specified by input-sequence, which can be either of the following:
- a UniProt accession number (example: P02751); UniProt's ID mapping service can be used to obtain UniProt accession numbers from other sequence database identifiers
- a pathname to a local file containing the sequence in FASTA format (example: /Users/miller/Desktop/mybpc3.fa); if the file contains multiple sequences, only the first will be used
With that sequence as the query, searches the Protein Data Bank (PDB) sequences to find known structures of similar sequence, using a BLAST web service hosted by the UCSF Resource for Biocomputing, Visualization, and Informatics (RBVI).
Imports the corresponding PDB structures into Chimera and lays them out from left to right according to sequence matches along the query from N- to C-terminus. Multiple hits to the same or overlapping segments are stacked vertically; gaps in structural coverage are shown as transparent dark gray spheres proportional in volume to the number of missing residues. See the display options.
Performance depends on the number of structures imported into Chimera, which can be very large, especially for runs with little filtering (permissive BLAST score and percent ID cutoffs, little or no winnowing) on queries with many known structures. Repeated uses of mda will reuse any open models that still meet the criteria and close any others. The command can be aborted during structure import by clicking the red stop icon in the status line to cancel the foreground task.
The center of rotation method is set to independent so that models rotate about their individual centers rather than a single collective center. The initial layout including scale and model orientations is saved as a position named stacked. Positions can be restored from the Rapid Access interface or with the command reset (e.g., reset stacked). Another position named overlay, better suited to the modeling step, is also saved.
In the specified output-folder, writes output files:
- the pseudo-multiple sequence alignment from BLAST in aligned FASTA format
- a text file containing information from the run
Opens the pseudo-multiple sequence alignment from BLAST in Multalign Viewer; this can be suppressed (along with the Modeller dialog) using showAlignment false.
Opens the interface to Modeller comparative (homology) modeling with the query sequence as the target, all of the imported hit structures designated as templates, and other options set as recommended for use with mda, including thorough optimization. The overlay position should be restored before comparative modeling to decrease the likelihood of generating models with knots. Comparative modeling relies on an accurate sequence alignment. If necessary, the sequence alignment can be edited manually or realigned using Multalign Viewer, or the output FASTA file can be edited outside of Chimera and reopened. Note that the BLAST alignment may omit parts of the hit chains.
On first use, mda also generates three database files (MDA_seqs.db, MDA_blast.db and MDA_uniprot.db) to speed up subsequent uses of the command; these files are not human-readable. They are placed in an MDA subdirectory of the user's Chimera download directory.
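
The first step above notes that when a local FASTA file contains multiple sequences, only the first is used. The following standalone Python sketch illustrates that behavior; it is not mda's own code, and the function name first_fasta_sequence and the example records are made up for illustration.

```python
def first_fasta_sequence(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; ignore it and the rest
            header = line[1:]
        elif header is not None:
            residues.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(residues)


# Made-up two-record input; only the first record is returned.
example = ">seq1 first\nMKT\nAYI\n>seq2 second\nGGG\n"
header, sequence = first_fasta_sequence(example)
print(header, sequence)
```

Checking a file this way before running mda can confirm which sequence will actually be used as the query.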

Multidomain assembly is under development; please send any comments and suggestions to chimera-users@cgl.ucsf.edu.

Examples:

mda Q14896 ~/Desktop/MDA percentId 50 showAlignment false coloring blastscore skip 3CX2
mda Q14896 ~/Desktop/MDA limit 1,20; reset overlay

A smaller example (as of October 2014, imports five structures):

mda p45379 ~/Desktop/MDA percent 60 group false color blast
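
For scripted or repetitive use, command lines like those above can be assembled programmatically. The sketch below is a minimal standalone Python helper, assuming only the syntax shown on this page (input sequence, output folder, then keyword/value pairs). The name build_mda_command is hypothetical and not part of Chimera.

```python
def build_mda_command(query, output_folder, **options):
    """Assemble an mda command string from keyword/value pairs."""
    parts = ["mda", query, output_folder]
    for keyword, value in options.items():
        if isinstance(value, bool):
            # the documentation lists true/false as the boolean values
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # comma-separated lists, as in limit 1,20 or skipPDB pdb1,pdb2
            value = ",".join(str(v) for v in value)
        parts.extend([keyword, str(value)])
    return " ".join(parts)


cmd = build_mda_command(
    "Q14896",
    "~/Desktop/MDA",
    percentId=50,
    showAlignment=False,
    coloring="blastscore",
    skipPDB=["3CX2"],
)
print(cmd)
```

Within Chimera's Python environment, a string built this way could then be passed to chimera.runCommand; outside Chimera it can simply be written to a command file.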

Options

Option keywords can be truncated to unique strings and their case does not matter. A vertical bar “|” designates mutually exclusive options, and default values are indicated with bold. Synonyms for true: True, 1. Synonyms for false: False, 0.

Initial search and filtering
Display
Suppressing dialogs
Further limiting the hits

Initial search and filtering

winnow max-per-region
Use the BLAST option to winnow the results to no more than max-per-region hits per region, as described in Berman et al., J Comput Biol. 7:293 (2000). Lower values correspond to more aggressive winnowing and fewer hits returned. All other filters (options below) are applied after winnowing. If the option is not specified or set to zero, no winnowing is done.

minScore score
Keep only BLAST hits with scores of at least score (default 50). The BLOSUM62 scoring matrix is used. Lowering the cutoff may increase structural coverage of the query, but low-scoring hits may be poorly aligned with the query sequence.

percentId percent
Keep only BLAST hits with percent sequence identity (%ID) of at least percent. %ID is based only on the region of hit-query alignment from BLAST. If the option is not specified, no filtering by %ID is done.

includeNative true | false
Whether to keep all hits (PDB entries) that are annotated with the same UniProt ID as the query, regardless of the minScore and percentId settings.
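
The way minScore, percentId, and includeNative combine can be sketched as follows. This is a standalone Python illustration under stated assumptions, not mda's implementation: the dictionary layout of a hit, the function name filter_hits, and the example records are all invented, and include_native is a required argument because no default is assumed here.

```python
def filter_hits(hits, query_uniprot, include_native, min_score=50, percent_id=None):
    """Apply score and %ID cutoffs, optionally exempting 'native' hits.

    hits: list of dicts with keys 'pdb', 'score', 'percent_id', 'uniprot'
    (a made-up layout for this sketch).
    """
    kept = []
    for hit in hits:
        if include_native and hit["uniprot"] == query_uniprot:
            kept.append(hit)  # same UniProt ID as the query: bypass cutoffs
            continue
        if hit["score"] < min_score:
            continue
        if percent_id is not None and hit["percent_id"] < percent_id:
            continue
        kept.append(hit)
    return kept


# Invented example records (placeholder identifiers, not real entries).
hits = [
    {"pdb": "hitA", "score": 120, "percent_id": 95.0, "uniprot": "Q00001"},
    {"pdb": "hitB", "score": 40, "percent_id": 30.0, "uniprot": "Q00001"},
    {"pdb": "hitC", "score": 80, "percent_id": 45.0, "uniprot": "Q99999"},
    {"pdb": "hitD", "score": 30, "percent_id": 90.0, "uniprot": "Q99999"},
]
with_native = filter_hits(hits, "Q00001", include_native=True, percent_id=50)
without_native = filter_hits(hits, "Q00001", include_native=False, percent_id=50)
print([h["pdb"] for h in with_native], [h["pdb"] for h in without_native])
```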

suppressDoubles true | false
Whether to keep only the highest-scoring hit to a given PDB entry. An entire PDB structure will be imported if any part is kept as a hit, but redundant chains will be hidden and its position in the mda layout will be based on that match to the query. If suppressDoubles is true (default), a PDB entry will not be imported more than once. If false, it is possible for the same PDB entry to be imported more than once and laid out in different places corresponding to different matches along the query. Regardless of this setting, however, only the PDB chain with the most structure residues will be kept as a hit when BLAST finds multiple identical chains from the same PDB entry.
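
The core of suppressDoubles, keeping only the highest-scoring hit per PDB entry, can be sketched in standalone Python. The hit layout, the function name suppress_doubles, and the example values are assumptions for illustration; the separate handling of identical chains described above is not modeled.

```python
def suppress_doubles(hits):
    """Keep only the highest-scoring hit for each PDB entry."""
    best = {}
    for hit in hits:
        current = best.get(hit["pdb"])
        if current is None or hit["score"] > current["score"]:
            best[hit["pdb"]] = hit
    return list(best.values())


# Invented hits: two matches to 'entry1' at different places along the query.
hits = [
    {"pdb": "entry1", "chain": "A", "score": 90},
    {"pdb": "entry2", "chain": "A", "score": 70},
    {"pdb": "entry1", "chain": "B", "score": 150},
]
unique = suppress_doubles(hits)
print([(h["pdb"], h["chain"]) for h in unique])
```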

forceBlast true | false
Whether to re-run BLAST instead of using any previously cached results. Even if this option is false, previously cached results will only be used when the query (the uniprotID) and winnowing level are the same as in an earlier use of mda. Whenever mda runs BLAST, the new results will be added to the cache, and if there were earlier results for the same query and winnowing level, they will be overwritten. Using cached results will speed mda execution at the possible cost of missing any newer PDB entries added to the sequence database since the time of caching.
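
The caching behavior described above (results keyed by query and winnowing level, overwritten whenever BLAST is re-run) can be illustrated with a small standalone Python sketch. The in-memory dictionary, the names get_hits and fake_blast, and the placeholder results are assumptions; mda's actual cache lives in the database files mentioned earlier.

```python
def get_hits(cache, query, winnow, run_blast, force_blast):
    """Return cached hits for (query, winnow) unless a re-run is forced."""
    key = (query, winnow)
    if not force_blast and key in cache:
        return cache[key]
    hits = run_blast(query, winnow)
    cache[key] = hits  # new results overwrite any earlier entry for this key
    return hits


calls = []

def fake_blast(query, winnow):
    """Stand-in for the real BLAST search; records each invocation."""
    calls.append((query, winnow))
    return ["result of run %d" % len(calls)]


cache = {}
first = get_hits(cache, "Q14896", 0, fake_blast, force_blast=False)
second = get_hits(cache, "Q14896", 0, fake_blast, force_blast=False)  # cached
third = get_hits(cache, "Q14896", 5, fake_blast, force_blast=False)   # new winnow level
fourth = get_hits(cache, "Q14896", 0, fake_blast, force_blast=True)   # forced re-run
print(len(calls))
```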

Display

group true | false
Whether to group sets of structures that cover similar parts of the query. A structure will be included in such a set if, in the alignment according to BLAST, it overlaps any other member of the set by at least 25 residues, but extends less than 25 residues beyond the alignment region of the member with the longest such region. If grouping is true (default), the structures in each set will be superimposed, their models will be grouped in the Model Panel, and only the representative (the member with the longest region of alignment) will be shown. If this option is false, the models in a set will still be laid out in the same “column,” but vertically separated from one another.
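
One possible reading of the grouping rule is sketched below in standalone Python. Several details are assumptions not stated above: hits are processed from longest to shortest aligned region so that the first member of each set is its representative, the 25-residue extension limit is checked separately at each end, and a hit joins the first set it qualifies for. The names and example ranges are invented.

```python
def overlap(a, b):
    """Number of query positions shared by two (name, start, end) hits."""
    return max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)


def group_hits(hits, threshold=25):
    """Group (name, start, end) hits; positions are inclusive query indices."""
    groups = []
    # longest aligned region first, so groups[i][0] is the representative
    for hit in sorted(hits, key=lambda h: -(h[2] - h[1] + 1)):
        for group in groups:
            rep = group[0]
            extends = max(rep[1] - hit[1], hit[2] - rep[2], 0)
            if extends < threshold and any(overlap(hit, m) >= threshold for m in group):
                group.append(hit)
                break
        else:
            groups.append([hit])
    return groups


# Invented alignment ranges along a query.
hits = [("A", 1, 200), ("B", 10, 210), ("C", 150, 400), ("D", 180, 230)]
groups = group_hits(hits)
print([[h[0] for h in g] for g in groups])
```

In this example, A joins B's set (it extends only 9 residues beyond B), while D falls entirely within C's region and joins C's set.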

hideSubmodels true | false
Whether to hide submodels (except for the first) of multi-model PDB entries, namely NMR ensembles.

hideAltChain true | false
Whether to hide additional chains of a PDB entry that are identical to (same molecular entity as) the BLAST hit chain.

hideComplex true | false
Whether to hide nearby chains (nonidentical to the BLAST hit) and small molecules.

deleteHidden true | false
Whether to delete submodels or chains hidden by the three preceding options (improves performance).

coloring scheme
The scheme for coloring the structures can be one of the following:

mutations (default) - residues identical to those in the query sequence gray, residues that differ from the query red, unaligned parts transparent gray, complexed molecules blue (see named colors)
percentid - aligned residues a single color (per model) ranging from gray for the highest %ID to red for the lowest %ID (note the highest and lowest values might not be very far apart), unaligned residues the same color but transparent, complexed molecules blue
blastscore - aligned residues a single color (per model) ranging from gray for the highest BLAST score to red for the lowest BLAST score (note the highest and lowest values might not be very far apart), unaligned residues the same color but transparent, complexed molecules blue
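
The percentid and blastscore schemes both map a per-model value onto a gray-to-red range. A minimal standalone Python sketch of such a mapping follows; linear interpolation and the specific RGB values for red and gray are assumptions for illustration, not values taken from Chimera. It also shows why a narrow spread between the highest and lowest values still spans the full color range.

```python
def ramp_color(value, lowest, highest,
               low_rgb=(1.0, 0.0, 0.0),    # assumed red for the lowest value
               high_rgb=(0.5, 0.5, 0.5)):  # assumed gray for the highest value
    """Linearly interpolate an RGB color between the lowest and highest values."""
    if highest == lowest:
        return high_rgb  # all models share one value; use a single color
    t = (value - lowest) / float(highest - lowest)
    return tuple(lo + t * (hi - lo) for lo, hi in zip(low_rgb, high_rgb))


# Invented %ID values for three models.
print(ramp_color(40, 40, 90))
print(ramp_color(65, 40, 90))
print(ramp_color(90, 40, 90))
```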

In addition, three attributes are assigned to residues in the hit structures to allow custom coloring. The first two are quantitative, suitable for showing with Render by Attribute or the rangecolor command:

mda_blastscore - BLAST score, with the same value assigned to all residues in the same model as the BLAST hit
mda_percentid - %ID of BLAST hit, with the same value assigned to all residues in the same model as the hit
Although the third has numerical values, it is a classification rather than a continuously varying quantity:

mda_alignment, with values:

1 - in the BLAST-aligned region, same amino acid as the target
2 - in the BLAST-aligned region, different amino acid than the target
3 - in the hit chain but not in the BLAST-aligned region
4 - in a co-complexed chain (a different chain than the hit)

Residues in each class above can be specified directly in commands, e.g.: color yellow :/mda_alignment=1
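
The four mda_alignment classes can be restated as a small decision function. The standalone Python sketch below only mirrors the definitions listed above; the function name and its boolean arguments are hypothetical and do not correspond to anything in Chimera.

```python
def alignment_class(in_hit_chain, in_aligned_region, hit_residue=None, target_residue=None):
    """Return the mda_alignment class (1-4) for one residue of a hit structure."""
    if not in_hit_chain:
        return 4  # co-complexed chain, different from the hit chain
    if not in_aligned_region:
        return 3  # hit chain, but outside the BLAST-aligned region
    if hit_residue == target_residue:
        return 1  # aligned and identical to the target
    return 2      # aligned but different from the target


print(alignment_class(True, True, "A", "A"))
print(alignment_class(True, True, "A", "G"))
print(alignment_class(True, False))
print(alignment_class(False, False))
```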

Suppressing dialogs (helpful during scripted or repetitive use)

noConfirm true | false
Whether to simply import the hit structures (no matter how many) rather than asking the user for permission when more than ten are found.

showAlignment true | false
Whether to open Multalign Viewer and the Modeller interface. Not opening these dialogs allows faster execution.

Further limiting the hits (structures to be used as templates)

limit min-per-residue[,max-gap-size]
Limit the number of hits (with priority given to those with higher %ID) such that every residue in the target sequence is covered by the smallest number of hits without going below min-per-residue and without consecutive stretches of target residues lacking structural coverage greater than max-gap-size (default 20 residues), if possible. (It may not be possible, depending on the initial set of hits and other filtering criteria.) Exclusions by this option can be overridden with keepPDB.
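
The two quantities that the limit option constrains, the number of hits covering each target residue and the longest stretch without structural coverage, can be computed as in the standalone Python sketch below. It only measures coverage for a given set of hits; it does not reproduce how mda selects hits. The function name and the example ranges are invented.

```python
def coverage_report(target_length, hits):
    """Return (per-residue hit counts, longest uncovered stretch).

    hits: list of (start, end) ranges along the target, 1-based and inclusive.
    """
    counts = [0] * target_length
    for start, end in hits:
        for pos in range(max(start, 1), min(end, target_length) + 1):
            counts[pos - 1] += 1
    longest_gap = run = 0
    for count in counts:
        run = run + 1 if count == 0 else 0
        longest_gap = max(longest_gap, run)
    return counts, longest_gap


# Invented example: residues 61-85 of a 100-residue target have no coverage.
counts, longest_gap = coverage_report(100, [(1, 40), (30, 60), (86, 100)])
print(longest_gap)
```

Here the uncovered stretch is 25 residues long, which exceeds the default max-gap-size of 20, so this set of hits alone would not meet the gap criterion.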

keepPDB pdb1[,pdb2,...]
Retain hits with the specified PDB IDs even if they would have been removed by the limit option.

skipPDB pdb1[,pdb2,...]
Omit hits with the specified PDB IDs.