Tool: AlphaFold

AlphaFold is an artificial intelligence method for predicting protein structures that has been highly successful in recent tests. The method is described in:

Highly accurate protein structure prediction with AlphaFold. Jumper J, Evans R, Pritzel A, et al. Nature. 2021 Aug;596(7873):583-589.
Protein complex prediction with AlphaFold-Multimer. Evans R, O'Neill M, Pritzel A, et al. bioRxiv 2021.

The ChimeraX AlphaFold tool:

finds and retrieves existing models from the AlphaFold Database:

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Varadi M, Anyango S, Deshpande M, et al. Nucleic Acids Res. 2022 Jan 7;50(D1):D439-D444.
The database contains models for sequences (as single chains, not complexes) in UniProt:
- Version 1 (Jul 2021, used by ChimeraX 1.3): ~360,000 sequences, reference proteomes of 21 species including Homo sapiens
- Version 2 (Dec 2021 and Jan 2022 releases combined, default in ChimeraX 1.4): ~1 million sequences, v1 + most of Swiss-Prot + sequences relevant to neglected tropical disease or antimicrobial resistance
- Version 3 (Jul 2022): >200 million sequences
- Version 4 (Nov 2022; default in ChimeraX 1.5 and later): bugfix of version 3, updating the coordinates of ~4% of the entries
runs new AlphaFold predictions on Google Colab using ColabFold, an open-source, optimized version of AlphaFold 2:
ColabFold: making protein folding accessible to all. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. Nat Methods. 2022 Jun;19(6):679-682.

AlphaFold-predicted structures vary in confidence levels (see coloring) and should be interpreted with caution. The related tool AlphaFold Error Plot plots the predicted errors in the relative positions of different parts of an AlphaFold structure.

The AlphaFold tool is also implemented as the alphafold command. Several ChimeraX presentations and videos show modeling with AlphaFold and related analyses. See also: Boltz, ESMFold, Blast Protein, Modeller Comparative, Model Loops, Rotamers

AlphaFold Dialog
AlphaFold Coloring Dialog
AlphaFold Error Plot


AlphaFold Dialog

The AlphaFold tool can be opened from the Structure Prediction section of the Tools menu and manipulated like other panels.

The Sequence can be specified by UniProt name or accession number, pasted in as plain text, or chosen from the menu of currently open protein structure chains.

Dialog buttons: Fetch, Search, Predict, Options

Fetch gets the most sequence-similar model available from the AlphaFold Database for each specified chain. Specifying a whole model specifies all of its protein chains. For each chain, a model is obtained for the exact UniProt entry if available, otherwise for the single top hit identified by K-mer search of the AlphaFold Database. The corresponding command is alphafold match. If the sequence was specified by structure chain, then:

the chain ID of the predicted structure is made the same as the corresponding chain of the existing model
the predicted structure is superimposed onto the existing chain using matchmaker, and the following are reported in a table in the Log:
- Chain – chain ID
- UniProt Name and UniProt Id (accession number)
- RMSD – Cα root-mean-square deviation between the predicted and experimental structures, over all residues of the latter
- Length – number of residues in the predicted structure
- Seen – number of residues with atomic coordinates in the experimental structure
- % Id – percent identity in the sequence alignment generated by matchmaker for superposition; the number of positions with identical residues divided by the length of the shorter sequence
by default, the predicted structure is trimmed to the same residue range as the existing chain (details...)
the following attributes are assigned to the residues of the predicted structure:
- c_alpha_distance – Cα distance between corresponding positions of the predicted and existing chains after their superposition (the matchmaker step above)
- missing_structure – positions missing from the coordinates of the existing chain
- same_sequence – positions with the same residue type as the existing chain
These attributes can be used for coloring and other purposes.
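The % Id, RMSD, and c_alpha_distance definitions above can be made concrete with a few lines of arithmetic. This is a minimal sketch, not ChimeraX's implementation: it assumes the two sequences are already aligned (gaps written as "-") and that the Cα coordinates are already superimposed and paired position by position; the function names are hypothetical.

```python
import math


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical aligned positions divided by the ungapped length of the shorter sequence, times 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # A position counts only if both residues match and it is not a gap in both.
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    shorter = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    return 100.0 * identical / shorter


def c_alpha_distances(predicted, experimental):
    """Per-position Cα distances between paired (x, y, z) coordinates of superimposed chains."""
    return [math.dist(p, e) for p, e in zip(predicted, experimental)]


def c_alpha_rmsd(predicted, experimental) -> float:
    """Root-mean-square of the paired Cα distances."""
    distances = c_alpha_distances(predicted, experimental)
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```

For example, percent_identity("MKT-LV", "MKTALI") gives 80.0: four identical positions (M, K, T, L) over the five residues of the shorter sequence.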

The fetched models are stored locally in ~/Downloads/ChimeraX/AlphaFold/, where ~ indicates a user's home directory. If a file specified for opening is not found in this local cache, the file will be fetched and cached.
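The per-chain choice that Fetch makes (the exact UniProt entry if present, otherwise the single top K-mer hit) can be sketched as below. The actual K-mer search of the AlphaFold Database is not described here, so this sketch swaps in a toy score, the number of shared 3-mers, over a hypothetical in-memory dictionary of entries; k = 3, the scoring, and all names are assumptions.

```python
def kmers(seq: str, k: int = 3) -> set:
    """All length-k substrings of a sequence (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def choose_entry(uniprot_id, query_seq: str, database: dict, k: int = 3):
    """Return the exact entry if present, else the entry sharing the most k-mers with the query.

    database is a hypothetical dict mapping accession -> sequence.
    Returns None when no entry shares any k-mer with the query.
    """
    if uniprot_id in database:
        return uniprot_id  # exact UniProt entry available
    query = kmers(query_seq, k)
    best, best_score = None, 0
    for accession, seq in database.items():
        score = len(query & kmers(seq, k))
        if score > best_score:  # ties keep the earlier entry
            best, best_score = accession, score
    return best
```

Counting shared k-mers is fast because it needs no alignment, which is also why such searches can miss distant homologs that BLAST would find.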

Search uses a BLAST web service hosted by the UCSF RBVI to search the AlphaFold Database using default parameters: BLOSUM62 amino acid similarity matrix for scoring the hits, similarity score cutoff e-value 1e-3, returning a maximum of 100 unique sequences. However, different values of these parameters can be specified using the corresponding command, alphafold search. Search differs from Fetch in that it uses BLAST instead of fast (but low-sensitivity) K-mer searching, accepts only a single chain or sequence as input, and returns a list of hits for the user to inspect, rather than fetching the single top hit per chain automatically. When results are returned, the hits are listed in a Blast Protein window. Double-clicking a hit uses alphafold fetch to retrieve the model, or multiple chosen hits can be retrieved at once by using the results panel context menu or Load Structures button.
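The default Search limits can be pictured as a post-filter on the BLAST hits. The sketch below applies the stated defaults (e-value cutoff 1e-3, at most 100 unique sequences) to a hypothetical Hit record; whether the cutoff is inclusive and how the RBVI service actually ranks and deduplicates hits are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit."""
    accession: str
    sequence: str
    evalue: float


def filter_hits(hits, max_evalue=1e-3, max_seqs=100):
    """Keep hits within the e-value cutoff, best first, one per unique sequence."""
    kept, seen = [], set()
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            break  # list is sorted, so every later hit also fails the cutoff
        if hit.sequence in seen:
            continue  # duplicate sequence, keep only the best-scoring copy
        seen.add(hit.sequence)
        kept.append(hit)
        if len(kept) == max_seqs:
            break
    return kept
```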

Predict runs a calculation on Google Colab using ColabFold, an open-source, optimized version of AlphaFold 2. The corresponding command is alphafold predict. Users should cite:

ColabFold: making protein folding accessible to all. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. Nat Methods. 2022 Jun;19(6):679-682.

For predicting a complex (multimer), the sequences of all chains in the complex must be given. The same sequence must be given multiple times if it occurs in multiple copies in the complex. The sequences can be specified either collectively as a model number chosen from the menu of currently open models (e.g. when that model contains multiple chains), or individually within a comma-separated list of UniProt identifiers or pasted-in amino acid sequences. Prediction may only be feasible for smaller complexes.
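Because each copy of a repeated chain must be listed, the comma-separated input directly encodes the stoichiometry of the complex. A minimal sketch of splitting such a list, assuming pasted sequences may contain stray whitespace or line breaks; the function names are hypothetical.

```python
from collections import Counter


def parse_chain_list(text: str) -> list:
    """Split a comma-separated list of sequences (or identifiers), removing all whitespace."""
    chains = ["".join(part.split()) for part in text.split(",")]
    return [c for c in chains if c]  # drop empty entries, e.g. from a trailing comma


def stoichiometry(chains: list) -> Counter:
    """Count how many copies of each distinct chain were given."""
    return Counter(chains)
```

For a homodimer plus one other chain, the first sequence is simply listed twice, and stoichiometry reports a count of 2 for it.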

A warning will appear saying that this Colab notebook is from GitHub (it was not authored by Google), with a button to click to run anyway. Users will need to have a Google account and to sign in to it via a browser. Once that is done, the sign-in may be remembered depending on the user's browser settings; it is not kept in the ChimeraX preferences. See the example video for an explanation of the images/plots from ColabFold that appear in the Colab window and where to find downloaded files.

The Options button shows/hides additional options:

Results directory (default ~/Downloads/ChimeraX/AlphaFold/prediction_[N]) – the pathname (name and location) of a folder or directory in which to store prediction results. Clicking Browse brings up a file browser window for choosing it interactively. The directory does not need to exist already, as it will be created by running the prediction. As shown in the default, the pathname can include [N] to indicate substitution with the smallest positive integer that makes a new directory. If the specified pathname does not include [N] but a directory of that name and location already exists and contains a results.zip file, _[N] will be appended automatically to avoid overwriting the existing directory.
Use PDB templates when predicting structures (default off) – AlphaFold can use up to four structures as templates; when this option is on, ColabFold will search the PDB sequences for similarity to the target and report in the Colab log which entries (if any) are used as templates
Energy-minimize predicted structures (default off) – whether to energy-minimize the best predicted structure; leaving this off allows for faster job completion and avoids failures that may occur during minimization
Trim fetched structure to the aligned structure sequence (default on) – whether to trim a fetched structure to the same residue range as the structure to which it is aligned
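The [N] naming rule for the Results directory can be written out as a short function. This is a sketch of the behavior described above, not ChimeraX's code; the function name resolve_results_dir is hypothetical.

```python
from pathlib import Path


def resolve_results_dir(pattern: str) -> Path:
    """Pick the results directory for a new prediction.

    [N] is replaced by the smallest positive integer giving a directory that
    does not exist yet. A name without [N] is used as-is unless it already
    contains results.zip, in which case _[N] is appended.
    """
    pattern = str(Path(pattern).expanduser())
    if "[N]" not in pattern:
        path = Path(pattern)
        if not (path / "results.zip").exists():
            return path
        pattern += "_[N]"  # avoid overwriting earlier results
    n = 1
    while Path(pattern.replace("[N]", str(n))).exists():
        n += 1
    return Path(pattern.replace("[N]", str(n)))
```

With the default pattern, the first run goes to prediction_1, the next to prediction_2, and so on; a plain name is reused unless it already holds a results.zip.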

The model for a sequence that was specified by structure chain will be superimposed on that chain and assigned structure-comparison attributes for further analysis.

Please note the following caveats of running a prediction:

Results may be lost if the local computer goes to sleep. Google intends Colab to be for interactive use. Even if the Colab job completes, the results may fail to download to the local computer if it has gone to sleep. It is recommended to turn off the option to enter sleep mode (meant to conserve power after some amount of idle time) before running a prediction.
A prediction may take a long time. The process includes installing various software packages on a virtual machine, searching sequence databases, generating a multiple sequence alignment, predicting atomic coordinates, and optionally, energy-minimizing the best structure. In addition, predicting a multimer (complex) structure may take longer than predicting the structure of a monomer with the same total number of residues. The free version of Colab limits jobs to 12 hours and may terminate them at shorter times at Google's discretion (see the FAQ). Those who want to run longer and/or more frequent calculations may need to sign up for one of the paid Colab plans.
Each chain must contain at least 16 residues. Shorter sequences are not accepted because they cannot be used to generate a reliable multiple sequence alignment.
Total sequence length cannot be very large. AlphaFold runs out of graphics memory for long sequences (~1200 amino acids on old Google Colab GPUs with 16 GB memory). Multimer predictions face the same limit on the total number of residues, so only smaller complexes can be predicted. As mentioned above, paid Colab plans provide more computational resources than the free plan. Structures with up to 3000 amino acids can be predicted using an Nvidia A100 GPU on Google Colab, costing about $1.50 for a 2000-residue prediction (May 2023); this video explains how.
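The two size constraints above can be checked before submitting a job. This sketch hard-codes the 16-residue minimum stated above and uses the ~1200-residue figure only as an adjustable rough limit (it depends on the GPU, so the default value is an assumption); the function name is hypothetical.

```python
MIN_CHAIN_LENGTH = 16  # minimum residues per chain, as stated above


def check_prediction_input(chains, approx_max_total=1200):
    """Return a list of problems for a proposed prediction (empty if none found)."""
    problems = []
    for i, seq in enumerate(chains, start=1):
        if len(seq) < MIN_CHAIN_LENGTH:
            problems.append(
                f"chain {i} has {len(seq)} residues; at least {MIN_CHAIN_LENGTH} are required"
            )
    # Multimers share one memory budget, so the limit applies to the total length.
    total = sum(len(seq) for seq in chains)
    if total > approx_max_total:
        problems.append(
            f"total length {total} may exceed GPU memory (rough limit {approx_max_total})"
        )
    return problems
```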

Coloring shows the AlphaFold Coloring dialog for applying different color schemes to the predicted structures, as well as hiding, showing, and selecting their residues based on attribute value.

Error plot draws the AlphaFold Error Plot, in which color gradations show (for each pairwise combination of residues) the expected error in position of one residue when the true and predicted structures are aligned based on the other residue.

AlphaFold Coloring Dialog

Clicking the Coloring button on the main AlphaFold tool shows the AlphaFold Coloring dialog for applying different color schemes to the predicted structures, as well as hiding, showing, and selecting their residues based on attribute value.

When first opened, AlphaFold-predicted structures are automatically colored by the pLDDT confidence measure in the B-factor field:

- 100 to 90 – high accuracy expected
- 90 to 70 – backbone expected to be modeled well
- 70 to 50 – low confidence, caution
- 50 to 0 – should not be interpreted, may be disordered

In other words, this default coloring is equivalent to the command:

color bfactor palette alphafold

The Color Key graphical interface or a command can be used to draw a corresponding color key, for example:

key red:low orange: yellow: cornflowerblue: blue:high [other-key-options]
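Since the confidence value is stored in the B-factor field, the pLDDT bands listed above can also be applied outside ChimeraX. This sketch reads one value per residue from the B-factor columns (61-66) of PDB ATOM records and maps it to a band; assigning boundary values (e.g. exactly 90) to the higher band, and taking the first atom's value for each residue, are assumptions.

```python
def residue_plddt_from_pdb(lines) -> dict:
    """Map (chain ID, residue number) -> B-factor of the residue's first ATOM record."""
    scores = {}
    for line in lines:
        if line.startswith("ATOM"):
            chain, resnum = line[21], int(line[22:26])  # columns 22 and 23-26
            scores.setdefault((chain, resnum), float(line[60:66]))  # columns 61-66
    return scores


def plddt_category(plddt: float) -> str:
    """Return the confidence band for a pLDDT value between 0 and 100."""
    if not 0 <= plddt <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if plddt >= 90:
        return "high accuracy expected"
    if plddt >= 70:
        return "backbone expected to be modeled well"
    if plddt >= 50:
        return "low confidence, caution"
    return "should not be interpreted, may be disordered"
```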

In the AlphaFold Coloring dialog, the Residues to act on are specified by using the menus to choose an AlphaFold-predicted model and one of the following:

all – all residues
confidence below [N] – based on the bfactor atom attribute (the confidence value is read from the B-factor field of the PDB file)
C-alpha distance greater than [d] – based on the c_alpha_distance residue attribute of AlphaFold models fetched by existing structure chain; Cα distance between corresponding positions of the predicted and existing chains after their automatic superposition
missing structure – based on the missing_structure residue attribute of AlphaFold models fetched by existing structure chain; positions missing from the coordinates of the existing chain
different sequence – based on the same_sequence residue attribute of AlphaFold models fetched by existing structure chain; positions with different residue types from the existing chain
confidence above [N] – based on the bfactor atom attribute (the confidence value is read from the B-factor field of the PDB file)
C-alpha distance less than [d] – based on the c_alpha_distance residue attribute of AlphaFold models fetched by existing structure chain; Cα distance between corresponding positions of the predicted and existing chains after their automatic superposition
paired structure – based on the missing_structure residue attribute of AlphaFold models fetched by existing structure chain; positions present in the coordinates of the existing structure chain
same sequence – based on the same_sequence residue attribute of AlphaFold models fetched by existing structure chain; positions with the same residue types as the existing chain
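Each Residues to act on choice is a simple test on one attribute. The sketch below pairs each menu choice with a predicate over a hypothetical residue record; whether the thresholds are strict (< and >) and that c_alpha_distance is undefined (None) for positions missing from the existing chain are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PredictedResidue:
    """Hypothetical stand-in for a residue of a fetched AlphaFold model."""
    number: int
    plddt: float  # bfactor value
    c_alpha_distance: Optional[float] = None
    missing_structure: bool = False
    same_sequence: bool = True


CRITERIA = {
    "all": lambda r, _: True,
    "confidence below": lambda r, n: r.plddt < n,
    "confidence above": lambda r, n: r.plddt > n,
    "C-alpha distance greater than": lambda r, d: r.c_alpha_distance is not None and r.c_alpha_distance > d,
    "C-alpha distance less than": lambda r, d: r.c_alpha_distance is not None and r.c_alpha_distance < d,
    "missing structure": lambda r, _: r.missing_structure,
    "paired structure": lambda r, _: not r.missing_structure,
    "different sequence": lambda r, _: not r.same_sequence,
    "same sequence": lambda r, _: r.same_sequence,
}


def residues_to_act_on(residues, criterion, value=None):
    """Residue numbers satisfying the chosen criterion."""
    test = CRITERIA[criterion]
    return [r.number for r in residues if test(r, value)]
```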

Buttons act on the designated residues:

Color buttons:
- Custom – choose a color interactively using the system color picker
- a series of square buttons for specific colors
Hide – hide the specified residues
Show – show the specified residues
Select – select the specified residues

The AlphaFold Coloring dialog does not color continuously along a gradient to show the attribute values. For coloring along a gradient, see Render by Attribute and/or the commands color bfactor (for the confidence value, which is read from the B-factor field of the PDB file) and color byattribute (for other numerical attributes).


AlphaFold Error Plot

Besides the per-residue pLDDT confidence measure, AlphaFold gives for each pair of structural entities (X,Y) the expected position error at entity X if the predicted and true structures were aligned on Y. Structural entities include standard biopolymer residues as well as the individual atoms of other types of residues: ligands, ions, glycans, and post-translationally modified residues. Only AlphaFold 3 (not earlier versions) generates predictions that include these other types of residues. The “predicted aligned error” or PAE values can be shown with AlphaFold Error Plot, which can be opened from the Structure Prediction section of the Tools menu and manipulated like other panels. PAE and other pairwise metrics associated with ModelArchive entries can also be plotted (see modelcif pae). See also: the AlphaFold Error Estimates example and video, alphafold contacts

Choosing the corresponding AlphaFold structure from the menu of open atomic models associates it with the plot. This association allows the structure to be colored as described below and allows selections on the plot to highlight the corresponding parts of the structure.

The PAE values can be:

fetched from the AlphaFold Database by giving the UniProt name or accession number of an entry in that database
read from a file in one of the following formats: json, pkl, npy (from Chai-1), or npz (from Boltz-1)
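A PAE file is essentially a square matrix of distances in Å. The sketch below reads a JSON PAE file under two assumed layouts: a list holding one object with the key "predicted_aligned_error" (as in recent AlphaFold Database files) or an object with the key "pae". Other layouts, including the pkl, npy, and npz formats, are not handled, and this is not ChimeraX's reader.

```python
import json


def read_pae_json(path):
    """Return the PAE matrix (list of rows of floats) from a JSON file.

    Assumed layouts: [{"predicted_aligned_error": [[...], ...], ...}]
    or {"pae": [[...], ...], ...}. Anything else raises ValueError.
    """
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, list) and data:
        data = data[0]  # unwrap a single-entry list
    for key in ("predicted_aligned_error", "pae"):
        if isinstance(data, dict) and key in data:
            matrix = [[float(v) for v in row] for row in data[key]]
            break
    else:
        raise ValueError("unrecognized PAE JSON layout")
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("PAE matrix is not square")
    return matrix
```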

The PAE plot can also be shown by clicking the Error plot button on the AlphaFold dialog or by using the command alphafold pae, the command alphafold fetch or alphafold match with the option pae true, or the open command.

When the mouse cursor is over the plot, the residue pair and PAE value at its current position are reported in the bottom right corner of the window.

Clicking Color PAE Domains clusters the entities into coherent domains (sets with relatively low PAE values) and uses randomly chosen colors to distinguish these domains in the structure. Clicking Color pLDDT returns the structure to the default confidence coloring.
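The idea of grouping entities into sets with relatively low PAE can be illustrated with a much simpler method than the one ChimeraX uses: single-linkage grouping with union-find, joining two entities whenever their PAE is below a cutoff. The 5 Å cutoff is an arbitrary assumption, and because PAE is not symmetric (error at X when aligned on Y), this sketch requires a low value in both directions.

```python
def pae_domains(pae, max_pae=5.0):
    """Label each entity with a domain index using single-linkage grouping on PAE."""
    n = len(pae)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if max(pae[i][j], pae[j][i]) < max_pae:  # low error in both directions
                parent[find(i)] = find(j)

    labels, domain_of_root = [], {}
    for i in range(n):
        labels.append(domain_of_root.setdefault(find(i), len(domain_of_root)))
    return labels
```

Each label could then be given a random color. Single linkage can chain loosely connected regions into one domain, which is one reason real tools use more careful clustering.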

The plot's context menu includes:

Dragging box colors structure (initial default checked on) – whether dragging a box on the plot highlights the corresponding parts of the 3D structure with bright colors and makes everything else gray; if this option is unchecked, highlighting will be done with selection instead of coloring
Color plot from structure – color the plot to match the 3D structure where the pair of entities represented by an X,Y point have the same ribbon color; show the rest of the plot in shades of gray
Color plot rainbow – use the pae palette (default) to color the plot, with colors assigned to PAE values from 0 to 30 Å in 5-Å steps (0, 5, 10, 15, 20, 25, 30)
Color plot green – use the paegreen palette to color the plot, with colors assigned to the same PAE values (0 to 30 Å in 5-Å steps)
Show chain divider lines (initial default checked on) – for multimer predictions, draw lines on the plot demarcating the end of one chain and the start of another; the lines may obscure a few chain-terminal residues in the plot, and can be hidden if this is problematic. For predictions that include nonstandard residues and/or covalent modifications, divider lines also segregate the entire set of such entities from the biopolymer chain(s).
Save image – save the plot as a PNG file
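The Color plot from structure option can be sketched as a per-cell rule: a cell takes the ribbon color when both entities of the pair share it, and otherwise gets a gray level. In this sketch the gray level scales with PAE (darker for lower error, capped at 30 Å), which is an assumed mapping; colors are plain RGB tuples and the function name is hypothetical.

```python
def color_plot_from_structure(ribbon_colors, pae, max_pae=30.0):
    """Return an n x n grid of RGB tuples for the plot.

    ribbon_colors[i] is the ribbon color of entity i; pae[i][j] is the PAE value
    shown in plot cell (i, j).
    """
    n = len(ribbon_colors)
    image = []
    for i in range(n):
        row = []
        for j in range(n):
            if ribbon_colors[i] == ribbon_colors[j]:
                row.append(ribbon_colors[i])  # pair shares a ribbon color
            else:
                # assumed gray scale: 0 (black) at PAE 0, 255 (white) at max_pae or above
                level = int(255 * min(pae[i][j], max_pae) / max_pae)
                row.append((level, level, level))
        image.append(row)
    return image
```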

The Color Key graphical interface or a command can be used to draw (in the main graphics window) a color key for the PAE plot. For example, to make a color key that matches the pae or paegreen scheme, respectively:

key pae :0 : : :15 : : :30 showTool true
key paegreen :0 : : :15 : : :30 showTool true

A title for the color key (e.g., “Predicted Aligned Error (Å)”) would need to be created separately with 2dlabels.

UCSF Resource for Biocomputing, Visualization, and Informatics / May 2025