
Tool: ESMFold

ESMFold (Evolutionary Scale Modeling) is an artificial intelligence method for predicting protein structures. The method is described in:

Evolutionary-scale prediction of atomic-level protein structure with a language model. Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Smetanin N, Verkuil R, Kabeli O, Shmueli Y, Dos Santos Costa A, Fazel-Zarandi M, Sercu T, Candido S, Rives A. Science. 2023 Mar 17;379(6637):1123-1130.

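The ChimeraX ESMFold tool can fetch existing models from the ESM Metagenomic Atlas, search the Atlas with BLAST for models of similar sequences, and run new predictions on the ESM Metagenomic Atlas server.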

ESMFold-predicted structures vary in confidence levels (see coloring) and should be interpreted with caution. The related tool ESMFold Error Plot plots residue-residue alignment errors for ESMFold structures. The ESMFold tool is also implemented as the esmfold command. See the ChimeraX ESMFold example. See also: AlphaFold, Blast Protein, Modeller Comparative, Model Loops, Rotamers

ESMFold Dialog
ESMFold Coloring Dialog
ESMFold Error Plot

ESMFold Dialog

The ESMFold tool can be opened from the Structure Prediction section of the Tools menu and manipulated like other panels (more...).

The Sequence can be specified by UniProt name or accession number, pasted in as plain text, or chosen from the menu of currently open protein structure chains.

Fetch
Search
Predict
Options

Fetch gets the most sequence-similar model available from the ESM Metagenomic Atlas for each specified chain. Specifying a whole model specifies all of its protein chains. For each chain, a model is obtained for the single top hit identified by K-mer search of the ESM Metagenomic Atlas. The corresponding command is esmfold match. If the sequence was specified by structure chain, then:

  1. the chain ID of the predicted structure is made the same as the corresponding chain of the existing model
  2. the predicted structure is superimposed onto the existing chain using matchmaker, and the results of the superposition are reported in a table in the Log
  3. by default, the predicted structure is trimmed to the same residue range as the existing chain (details...)
  4. structure-comparison attributes are assigned to the residues of the predicted structure; these attributes can be used for coloring and other purposes.
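The trimming step (item 3) can be pictured with a short Python sketch. It is not ChimeraX's implementation: it assumes the predicted residues have already been numbered to match the existing chain, and it simply drops predicted residues outside the first-to-last residue range of the existing chain while keeping any internal gaps. The dictionary layout and the function name `trim_to_range` are hypothetical.

```python
from typing import Dict, List


def trim_to_range(predicted: Dict[int, str], existing_numbers: List[int]) -> Dict[int, str]:
    """Keep only predicted residues whose numbers fall inside the existing chain's range.

    predicted: hypothetical mapping of residue number -> residue name (predicted model).
    existing_numbers: residue numbers present in the existing (e.g., experimental) chain.
    Assumes both use the same residue numbering.
    """
    if not existing_numbers:
        # Nothing to trim against; return an unmodified copy.
        return dict(predicted)
    start, end = min(existing_numbers), max(existing_numbers)
    # Terminal extensions beyond the existing range are dropped; internal residues stay.
    return {num: name for num, name in predicted.items() if start <= num <= end}
```

For example, a 100-residue prediction trimmed against an existing chain that spans residues 10 to 80 keeps 71 residues, even if the existing chain has internal gaps.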

The fetched models are stored locally in ~/Downloads/ChimeraX/ESMFold/, where ~ indicates a user's home directory. If a file specified for opening is not found in this local cache, the file will be fetched and cached.
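The caching behavior described above follows a common check-then-fetch pattern. The Python sketch below illustrates it under stated assumptions: `open_cached` and the `fetch` callback are hypothetical stand-ins (not ChimeraX functions), and only the default cache location comes from this page.

```python
from pathlib import Path
from typing import Callable, Optional

# Default cache location named in this documentation page.
DEFAULT_CACHE = Path.home() / "Downloads" / "ChimeraX" / "ESMFold"


def open_cached(filename: str,
                fetch: Callable[[str], bytes],
                cache_dir: Optional[Path] = None) -> Path:
    """Return a local path for filename, downloading it only on a cache miss.

    fetch is a hypothetical callback returning the file contents;
    it stands in for the real network download.
    """
    cache_dir = DEFAULT_CACHE if cache_dir is None else cache_dir
    path = cache_dir / filename
    if not path.exists():
        # Cache miss: create the directory if needed, then fetch and store the file.
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(filename))
    return path
```

Calling `open_cached` twice with the same file name invokes `fetch` only once; the second call returns the stored copy.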

Search uses a BLAST web service hosted by the UCSF RBVI to search the ESM Metagenomic Atlas with default parameters: the BLOSUM62 amino acid similarity matrix for scoring hits, an e-value cutoff of 1e-3, and a maximum of 100 unique sequences returned. Different values of these parameters can be specified with the corresponding command, esmfold search. Search differs from Fetch in that it uses BLAST instead of fast (but low-sensitivity) K-mer searching, accepts only a single chain or sequence as input, and returns a list of hits for the user to inspect rather than automatically fetching the single top hit per chain. When results are returned, the hits are listed in a Blast Protein window. Double-clicking a hit uses esmfold fetch to retrieve the model; alternatively, multiple chosen hits can be retrieved at once with the results panel context menu or the Load Structures button (details...).
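To make the default Search limits concrete, the sketch below applies the e-value cutoff and the 100-unique-sequence limit to hits that are assumed to have already been scored (BLOSUM62 scoring happens inside BLAST and is not modeled). The `Hit` record and `filter_hits` function are hypothetical; this is not the RBVI web service's code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    atlas_id: str   # hypothetical identifier field
    sequence: str
    evalue: float


def filter_hits(hits: Iterable[Hit], cutoff: float = 1e-3, max_seqs: int = 100) -> List[Hit]:
    """Keep hits at or below the e-value cutoff, best first, one per unique sequence."""
    kept: List[Hit] = []
    seen = set()
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > cutoff:
            break  # hits are sorted ascending, so every remaining hit also fails
        if hit.sequence in seen:
            continue  # keep only the best-scoring hit for each sequence
        seen.add(hit.sequence)
        kept.append(hit)
        if len(kept) == max_seqs:
            break
    return kept
```

Deduplicating by sequence before applying the count limit means the 100-hit cap counts distinct sequences, matching the "unique sequences" wording above.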

Predict runs a calculation on the prediction server provided by the ESM Metagenomic Atlas. The corresponding command is esmfold predict.

The Options button shows/hides additional options:

The model for a sequence that was specified by structure chain will be superimposed on that chain and assigned structure-comparison attributes for further analysis (details...).

Please note the following caveats of running a prediction:

Coloring shows the ESMFold Coloring dialog for applying different color schemes to the predicted structures, as well as hiding, showing, and selecting their residues based on attribute value.

Error plot draws the ESMFold Error Plot, in which color gradations show (for each pairwise combination of residues) the expected error in position of one residue when the true and predicted structures are aligned based on the other residue.

ESMFold Coloring Dialog

Clicking the Coloring button on the main ESMFold tool shows the ESMFold Coloring dialog for applying different color schemes to the predicted structures, as well as hiding, showing, and selecting their residues based on attribute value.

When first opened, ESMFold-predicted structures are automatically colored by the pLDDT confidence measure (same as for AlphaFold except mapped to 0-1 instead of 0-100) in the B-factor field:

...in other words, using

color bfactor palette esmfold

The Color Key graphical interface or a command can be used to draw a corresponding color key, for example:

key red:low orange: yellow: cornflowerblue: blue:high  [other-key-options]
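Because color bfactor with a palette maps values along a gradient, the sketch below interpolates linearly between color stops. The stop positions 0, 0.5, 0.7, 0.9, and 1 are an assumption: the AlphaFold-style pLDDT thresholds 0, 50, 70, 90, and 100 rescaled to 0-1, following the note above that the ESMFold scale is the AlphaFold one mapped to 0-1. The colors are the standard CSS RGB values of the named colors. This is an illustration, not ChimeraX's palette code, and `plddt_color` and `to_alphafold_scale` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Tuple

RGB = Tuple[float, float, float]

# Assumed stops: AlphaFold-style thresholds (0, 50, 70, 90, 100) rescaled to 0-1.
STOPS: List[Tuple[float, RGB]] = [
    (0.0, (255, 0, 0)),       # red
    (0.5, (255, 165, 0)),     # orange
    (0.7, (255, 255, 0)),     # yellow
    (0.9, (100, 149, 237)),   # cornflowerblue
    (1.0, (0, 0, 255)),       # blue
]


def plddt_color(value: float) -> RGB:
    """Linearly interpolate an RGB color for an ESMFold pLDDT value on the 0-1 scale."""
    value = min(max(value, 0.0), 1.0)  # clamp out-of-range inputs
    positions = [pos for pos, _ in STOPS]
    i = bisect_right(positions, value) - 1
    if i >= len(STOPS) - 1:
        return STOPS[-1][1]  # value sits exactly on the top stop
    (x0, c0), (x1, c1) = STOPS[i], STOPS[i + 1]
    t = (value - x0) / (x1 - x0)  # fractional position between the two stops
    return tuple(a + t * (b - a) for a, b in zip(c0, c1))


def to_alphafold_scale(esmfold_plddt: float) -> float:
    """Convert an ESMFold pLDDT (0-1) to the AlphaFold 0-100 scale."""
    return esmfold_plddt * 100.0
```

A value halfway between two stops gets the midpoint color, so 0.25 maps to (255, 82.5, 0), halfway between red and orange.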

In the ESMFold Coloring dialog, the Residues to act on are specified by using the menus to choose an ESMFold-predicted model and one of the following:

Buttons act on the designated residues:

The ESMFold Coloring dialog does not color continuously along a gradient to show the attribute values. For coloring along a gradient, see Render by Attribute and/or the commands color bfactor (for the confidence value, which is read from the B-factor field of the PDB file) and color byattribute (for other numerical attributes).

ESMFold Error Plot

Besides the per-residue pLDDT confidence measure, ESMFold gives for each pair of residues (X,Y) the expected position error at residue X if the predicted and true structures were aligned on residue Y. These residue-residue “predicted aligned error” or PAE values are not provided by the prediction server, but are available for structures already in the ESM Metagenomic Atlas and can be shown with ESMFold Error Plot, which can be opened from the Structure Prediction section of the Tools menu and manipulated like other panels (more...). See also: esmfold contacts
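The (X,Y) ordering matters because PAE is generally not symmetric. As a minimal sketch, assume the PAE values are held as a square nested list in which `pae[x][y]` is the expected error at residue x after alignment on residue y, using 0-based indices. Real files may use the opposite row/column convention, so check before reusing. `pae_at` and `mean_block_pae` are hypothetical helpers; the second averages a rectangular region of the plot, such as the off-diagonal block between two putative domains.

```python
from typing import Sequence


def pae_at(pae: Sequence[Sequence[float]], x: int, y: int) -> float:
    """Expected position error (Å) at residue x when aligned on residue y (0-based)."""
    return pae[x][y]


def mean_block_pae(pae: Sequence[Sequence[float]],
                   rows: Sequence[int], cols: Sequence[int]) -> float:
    """Average PAE over residues `rows` (error measured) aligned on residues `cols`."""
    values = [pae[x][y] for x in rows for y in cols]
    return sum(values) / len(values)
```

A low mean for a block suggests the relative placement of the two residue sets is predicted confidently; because the matrix is asymmetric, the transposed block can give a different answer.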

Choosing the corresponding ESMFold structure from the menu of open atomic models associates it with the plot. This association allows coloring the structure as described below and lets selections on the plot highlight the corresponding parts of the structure.

The PAE values can be either:

The PAE plot can also be shown by clicking the Error plot button on the ESMFold dialog or by using the command esmfold pae, or the command esmfold fetch or esmfold match with the option pae true.

When the mouse cursor is over the plot, the residue pair and PAE value at its current position are reported in the bottom right corner of the window.
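The hover readout amounts to converting a cursor position into a matrix cell. The sketch assumes a hypothetical square plot with its origin at the top-left corner, rows (residue X) running downward, and columns (residue Y) running rightward; the actual geometry of the ChimeraX plot widget is not described on this page, and `hover_report` is an illustrative name.

```python
from typing import Sequence, Tuple


def hover_report(px: float, py: float, width: float, height: float,
                 pae: Sequence[Sequence[float]]) -> Tuple[int, int, float]:
    """Map a cursor position (pixels) to (residue X, residue Y, PAE), 1-based residues."""
    n = len(pae)
    # Scale pixels to matrix indices, clamping so the plot edges stay in range.
    col = min(max(int(px / width * n), 0), n - 1)
    row = min(max(int(py / height * n), 0), n - 1)
    return row + 1, col + 1, pae[row][col]
```

Clamping keeps a cursor resting on the right or bottom edge of the plot from indexing one past the last residue.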

Clicking Color PAE Domains clusters the residues into coherent domains (sets of residues with relatively low PAE values) and uses randomly chosen colors to distinguish these domains in the structure (details...). Clicking Color pLDDT returns the structure to the default confidence coloring.
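ChimeraX's own domain clustering is described under details... and is not reproduced here. As a simpler stand-in, the sketch below groups residues by single linkage: two residues are joined when the PAE in both directions is below a cutoff, and each connected group becomes a domain. The 5 Å default cutoff and the name `pae_domains` are assumptions for illustration. Single linkage can chain distinct domains together through a few low-PAE residues, so its output may differ from ChimeraX's.

```python
from typing import Dict, List, Sequence


def pae_domains(pae: Sequence[Sequence[float]], cutoff: float = 5.0) -> List[List[int]]:
    """Group 0-based residue indices connected by low PAE (simplified single linkage)."""
    n = len(pae)
    parent = list(range(n))  # union-find forest, one tree per provisional domain

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Require low error in both directions because PAE is not symmetric.
            if max(pae[i][j], pae[j][i]) < cutoff:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Order domains by their first residue for a stable result.
    return sorted(groups.values(), key=lambda g: g[0])
```

Each returned group could then be given its own randomly chosen color, mirroring what the Color PAE Domains button does for ChimeraX's own clusters.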

The plot's context menu includes:

The Color Key graphical interface or a command can be used to draw (in the main graphics window) a color key for the PAE plot. For example, to make a color key that matches the pae or paegreen scheme, respectively:

key pae :0 : : :15 : : :30  showTool true
key paegreen :0 : : :15 : : :30  showTool true

A title for the color key (e.g., “Predicted Aligned Error (Å)”) would need to be created separately with 2dlabels.


UCSF Resource for Biocomputing, Visualization, and Informatics / March 2023