Tool: AlphaFold

AlphaFold is an artificial intelligence method for predicting protein structures that has been highly successful in recent tests. The method is described in:

Highly accurate protein structure prediction with AlphaFold. Jumper J, Evans R, Pritzel A, et al. Nature. 2021 Aug;596(7873):583-589.
Protein complex prediction with AlphaFold-Multimer. Evans R, O'Neill M, Pritzel A, et al. bioRxiv 2021.

The ChimeraX AlphaFold tool:

AlphaFold-predicted structures vary in confidence levels (see coloring) and should be interpreted with caution. The related tool AlphaFold Error Plot plots the predicted errors in interactions between different parts of an AlphaFold structure.

The AlphaFold tool is also implemented as the alphafold command. Several ChimeraX presentations and videos show modeling with AlphaFold and related analyses.

See also: ESMFold, Blast Protein, Modeller Comparative, Model Loops, Rotamers

AlphaFold Dialog
AlphaFold Coloring Dialog
AlphaFold Error Plot

AlphaFold Dialog

The AlphaFold tool can be opened from the Structure Prediction section of the Tools menu and manipulated like other panels (more...).

The Sequence can be specified by UniProt name or accession number, pasted in as plain text, or chosen from the menu of currently open protein structure chains.

Fetch
Search
Predict
Options

Fetch gets the most sequence-similar model available from the AlphaFold Database for each specified chain. Specifying a whole model specifies all of its protein chains. For each chain, a model is obtained for the exact UniProt entry if available, otherwise the single top hit identified by K-mer search of the AlphaFold Database (details...). The corresponding command is alphafold match. If the sequence was specified by structure chain, then:

  1. the chain ID of the predicted structure is made the same as the corresponding chain of the existing model
  2. the predicted structure is superimposed onto the existing chain using matchmaker, and the following are reported in a table in the Log:
  3. by default, the predicted structure is trimmed to the same residue range as the existing chain (details...)
  4. the following attributes are assigned to the residues of the predicted structure: These attributes can be used for coloring and other purposes.
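
The fallback K-mer search that Fetch uses when no exact UniProt entry is available can be pictured with a small sketch. This is a simplified illustration of the general technique (ranking database sequences by the number of shared length-k substrings), not the actual ChimeraX or AlphaFold Database implementation. The function names, the value k=5, and the toy database entries are all hypothetical.

```python
def kmers(seq, k=5):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def top_kmer_hit(query, database, k=5):
    """Return the ID of the database sequence sharing the most k-mers
    with query, or None if no sequence shares any k-mer."""
    query_kmers = kmers(query, k)
    best_id, best_shared = None, 0
    for entry_id, seq in database.items():
        shared = len(query_kmers & kmers(seq, k))
        if shared > best_shared:
            best_id, best_shared = entry_id, shared
    return best_id


# Hypothetical toy database; real searches run against the AlphaFold Database.
toy_db = {
    "ENTRY_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "ENTRY_B": "GSHMLEDPVDAFKLMNQ",
}
print(top_kmer_hit("KTAYIAKQRQISF", toy_db))  # prints ENTRY_A
```

Counting shared K-mers is fast because it needs no alignment, which is also why it is less sensitive than BLAST for distantly related sequences.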

The fetched models are stored locally in ~/Downloads/ChimeraX/AlphaFold/, where ~ indicates a user's home directory. If a file specified for opening is not found in this local cache, the file will be fetched and cached.
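
The cache behavior described above (use the local copy if present, otherwise fetch and store it) follows a common pattern that might be sketched as follows. This is a minimal illustration, not ChimeraX code: the function name cached_or_fetch, the fetch callback, and the file name example.cif are hypothetical, and the demo writes to a temporary directory rather than to the real cache.

```python
import tempfile
from pathlib import Path


def cached_or_fetch(filename, cache_dir, fetch):
    """Return the path of filename inside cache_dir.

    On a cache miss, fetch(filename) is called to obtain the file
    contents as bytes, which are then written into the cache.
    """
    cache_dir = Path(cache_dir).expanduser()  # expands a leading ~
    path = cache_dir / filename
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(filename))
    return path


# Demo with a stand-in fetch function and a throwaway cache directory.
requests = []

def fake_fetch(name):
    requests.append(name)
    return b"placeholder structure data"

with tempfile.TemporaryDirectory() as tmp:
    cached_or_fetch("example.cif", tmp, fake_fetch)  # miss: fetches
    cached_or_fetch("example.cif", tmp, fake_fetch)  # hit: no fetch
    print(len(requests))  # prints 1
```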

Search uses a BLAST web service hosted by the UCSF RBVI to search the AlphaFold Database using default parameters: BLOSUM62 amino acid similarity matrix for scoring the hits, an e-value cutoff of 1e-3, and a maximum of 100 unique sequences returned. Different values of these parameters can be specified using the corresponding command, alphafold search. Search differs from Fetch in that it uses BLAST instead of fast (but low-sensitivity) K-mer searching, accepts only a single chain or sequence as input, and returns a list of hits for the user to inspect rather than fetching the single top hit per chain automatically. When results are returned, the hits are listed in a Blast Protein window. Double-clicking a hit uses alphafold fetch to retrieve the model, or multiple chosen hits can be retrieved at once by using the results panel context menu or Load Structures button (details...).
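
To make the roles of the e-value cutoff and the maximum number of unique sequences concrete, the following sketch applies both limits to a list of hits. It only illustrates how such limits interact and is not the BLAST service's own logic; the hit dictionaries, the function name filter_hits, and the choice to treat a hit exactly at the cutoff as passing are assumptions.

```python
def filter_hits(hits, cutoff=1e-3, max_seqs=100):
    """Keep hits with e-value <= cutoff, at most one per unique sequence,
    best (lowest e-value) first, stopping after max_seqs hits."""
    kept, seen = [], set()
    for hit in sorted(hits, key=lambda h: h["evalue"]):
        if hit["evalue"] > cutoff or hit["sequence"] in seen:
            continue
        seen.add(hit["sequence"])
        kept.append(hit)
        if len(kept) == max_seqs:
            break
    return kept


# Hypothetical hits for illustration only.
hits = [
    {"id": "hit1", "sequence": "MKTAYIAK", "evalue": 1e-30},
    {"id": "hit2", "sequence": "MKTAYIAK", "evalue": 1e-20},  # duplicate sequence
    {"id": "hit3", "sequence": "MKSAYIAR", "evalue": 1e-5},
    {"id": "hit4", "sequence": "GSHMLEDP", "evalue": 0.5},    # above cutoff
]
print([h["id"] for h in filter_hits(hits)])  # prints ['hit1', 'hit3']
```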

Predict runs a calculation on Google Colab using ColabFold, an open-source, optimized version of AlphaFold 2. The corresponding command is alphafold predict. Users should cite:

ColabFold: making protein folding accessible to all. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. Nat Methods. 2022 Jun;19(6):679-682.

For predicting a complex (multimer), the sequences of all chains in the complex must be given. The same sequence must be given multiple times if it occurs in multiple copies in the complex. The sequences can be specified either collectively as a model number chosen from the menu of currently open models (e.g. when that model contains multiple chains), or individually within a comma-separated list of UniProt identifiers or pasted-in amino acid sequences. Prediction may only be feasible for smaller complexes (details...).
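
Because a sequence must be repeated once per copy, it can help to assemble the comma-separated list programmatically before pasting it in. The sketch below shows one way to do that; the function name complex_sequences and the short example sequences are hypothetical.

```python
def complex_sequences(chains):
    """Build a comma-separated sequence list from (sequence, copies) pairs,
    repeating each sequence once per copy in the complex."""
    parts = []
    for sequence, copies in chains:
        if copies < 1:
            raise ValueError("each chain needs at least one copy")
        parts.extend([sequence] * copies)
    return ",".join(parts)


# A hypothetical complex with two copies of one chain and one of another.
print(complex_sequences([("MKTAYIAK", 2), ("GSHMLEDP", 1)]))
# prints MKTAYIAK,MKTAYIAK,GSHMLEDP
```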

A warning will appear saying that this Colab notebook is from GitHub (that is, it was not authored by Google), with a button to click to run anyway. Users will need to have a Google account and to sign into it via a browser. Once that is done, the sign-in may be remembered depending on the user's browser settings; it is not kept in the ChimeraX preferences. See the example video for an explanation of the images/plots from ColabFold that appear in the Colab window and where to find downloaded files.

The Options button shows/hides additional options:

The model for a sequence that was specified by structure chain will be superimposed on that chain and assigned structure-comparison attributes for further analysis (details...).

Please note the following caveats of running a prediction:

Coloring shows the AlphaFold Coloring dialog for applying different color schemes to the predicted structures, as well as hiding, showing, and selecting their residues based on attribute value.

Error plot draws the AlphaFold Error Plot, in which color gradations show (for each pairwise combination of residues) the expected error in position of one residue when the true and predicted structures are aligned based on the other residue.

See also: batch predictions

AlphaFold Coloring Dialog

Clicking the Coloring button on the main AlphaFold tool shows the AlphaFold Coloring dialog for applying different color schemes to the predicted structures, as well as hiding, showing, and selecting their residues based on attribute value.

When first opened, AlphaFold-predicted structures are automatically colored by the pLDDT confidence measure in the B-factor field:

...in other words, using

color bfactor palette alphafold

The Color Key graphical interface or a command can be used to draw a corresponding color key, for example:

key red:low orange: yellow: cornflowerblue: blue:high  [other-key-options]

In the AlphaFold Coloring dialog, the Residues to act on are specified by using the menus to choose an AlphaFold-predicted model and one of the following:

Buttons act on the designated residues:

The AlphaFold Coloring dialog does not color continuously along a gradient to show the attribute values. For coloring along a gradient, see Render by Attribute and/or the commands color bfactor (for the confidence value, which is read from the B-factor field of the PDB file) and color byattribute (for other numerical attributes).
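
For readers unfamiliar with what coloring along a gradient means in practice, the sketch below linearly interpolates a color between neighboring anchor points of a palette. It is an illustration of the general technique, not ChimeraX's implementation. The anchor values (0, 50, 70, 90, 100) and the RGB triples for the named colors are assumptions chosen to resemble the confidence color key shown earlier.

```python
from bisect import bisect_right

# Assumed anchor points: (confidence value, RGB color).
PALETTE = [
    (0, (255, 0, 0)),        # red
    (50, (255, 165, 0)),     # orange
    (70, (255, 255, 0)),     # yellow
    (90, (100, 149, 237)),   # cornflowerblue
    (100, (0, 0, 255)),      # blue
]


def gradient_color(value, palette=PALETTE):
    """Return an RGB color for value by linear interpolation between the
    two neighboring anchors; values outside the range are clamped."""
    anchors = [v for v, _ in palette]
    if value <= anchors[0]:
        return palette[0][1]
    if value >= anchors[-1]:
        return palette[-1][1]
    hi = bisect_right(anchors, value)  # index of first anchor above value
    (v0, c0), (v1, c1) = palette[hi - 1], palette[hi]
    t = (value - v0) / (v1 - v0)
    return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))


print(gradient_color(60))  # halfway from orange to yellow: (255, 210, 0)
```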

AlphaFold Error Plot

Besides the per-residue pLDDT confidence measure, AlphaFold gives for each pair of structural entities (X,Y) the expected position error at entity X if the predicted and true structures were aligned on Y. Structural entities include standard biopolymer residues as well as the individual atoms of other types of residues: ligands, ions, glycans, and post-translationally modified residues. Only AlphaFold 3 (not earlier versions) generates predictions that include these other types of residues. The “predicted aligned error” or PAE values can be shown with AlphaFold Error Plot, which can be opened from the Structure Prediction section of the Tools menu and manipulated like other panels (more...). See also: the AlphaFold Error Estimates example and video, alphafold contacts

Choosing the corresponding AlphaFold structure from the menu of open atomic models associates it with the plot. This association allows the structure to be colored as described below and allows selections on the plot to highlight the corresponding parts of the structure.

The PAE values can be either:

The PAE plot can also be shown by clicking the Error plot button on the AlphaFold dialog, or by using the alphafold pae command, the alphafold fetch or alphafold match command with the option pae true, or the open command.

When the mouse cursor is over the plot, the residue pair and PAE value at its current position are reported in the bottom right corner of the window.

Clicking Color PAE Domains clusters the entities into coherent domains (sets with relatively low PAE values) and uses randomly chosen colors to distinguish these domains in the structure (details...). Clicking Color pLDDT returns the structure to the default confidence coloring.
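
The idea of grouping entities into sets with mutually low PAE can be sketched with a simple graph approach: link two entities when their PAE is below a threshold in both directions, then take connected components. This is a deliberately simplified stand-in and not necessarily the clustering method ChimeraX uses; the function name pae_domains, the 5 Å threshold, the minimum domain size, and the toy matrix are all assumptions.

```python
def pae_domains(pae, max_pae=5.0, min_size=2):
    """Group entity indices into domains, defined here as connected
    components of the graph linking pairs with low PAE.

    pae is a square matrix (list of lists) of predicted aligned errors.
    """
    unvisited = set(range(len(pae)))
    domains = []
    while unvisited:
        start = min(unvisited)
        unvisited.discard(start)
        stack, component = [start], [start]
        while stack:
            i = stack.pop()
            # PAE is not symmetric, so require a low value in both directions.
            linked = [j for j in unvisited
                      if max(pae[i][j], pae[j][i]) <= max_pae]
            for j in linked:
                unvisited.discard(j)
            stack.extend(linked)
            component.extend(linked)
        if len(component) >= min_size:
            domains.append(sorted(component))
    return domains


# Toy 5x5 matrix: entities 0-2 form one rigid unit, 3-4 another.
toy_pae = [
    [0, 1, 2, 20, 20],
    [1, 0, 1, 20, 20],
    [2, 1, 0, 20, 20],
    [20, 20, 20, 0, 1],
    [20, 20, 20, 1, 0],
]
print(pae_domains(toy_pae))  # prints [[0, 1, 2], [3, 4]]
```

Each resulting group could then be given its own color, which is the effect the Color PAE Domains button produces on the structure.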

The plot's context menu includes:

The Color Key graphical interface or a command can be used to draw (in the main graphics window) a color key for the PAE plot. For example, to make a color key that matches the pae or paegreen scheme, respectively:

key pae :0 : : :15 : : :30  showTool true
key paegreen :0 : : :15 : : :30  showTool true

A title for the color key (e.g., “Predicted Aligned Error (Å)”) would need to be created separately with 2dlabels.


UCSF Resource for Biocomputing, Visualization, and Informatics / May 2024