AlphaFold is an artificial intelligence method for predicting protein structures that has been highly successful in recent tests. The method is described in:
Highly accurate protein structure prediction with AlphaFold. Jumper J, Evans R, Pritzel A, et al. Nature. 2021 Aug;596(7873):583-589.
Protein complex prediction with AlphaFold-Multimer. Evans R, O'Neill M, Pritzel A, et al. bioRxiv 2021.
The alphafold command:
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Varadi M, Anyango S, Deshpande M, et al. Nucleic Acids Res. 2022 Jan 7;50(D1):D439-D444.
The database contains models for protein sequences in UniProt (single chains only, not complexes); see the version option.
ColabFold: making protein folding accessible to all. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. Nat Methods. 2022 Jun;19(6):679-682.
AlphaFold-predicted structures vary in confidence levels (see coloring) and should be interpreted with caution. The alphafold command is also implemented as the tools AlphaFold and AlphaFold Error Plot. Several ChimeraX presentations and videos show modeling with AlphaFold and related analyses. See also: esmfold, blastprotein, modeller, swapaa
Usage: alphafold fetch uniprot-id [ alignTo chain-spec [ trim true | false ]] [ colorConfidence true | false ] [ ignoreCache true | false ] [ pae true | false ] [ version 1 | 2 | 3 | 4 ]
Usage: alphafold match sequence [ search true | false ] [ trim true | false ] [ colorConfidence true | false ] [ ignoreCache true | false ] [ pae true | false ]
Usage: alphafold search sequence [ matrix similarity-matrix ] [ cutoff evalue ] [ maxSequences M ] [ version 1 | 2 | 3 | 4 ]
alphafold fetch p29474
– OR – (equivalent)
open p29474 from alphafold
alphafold match #1
alphafold match #3/B,D trim false
Alternatively, the sequence can be given as any of the following:
For a specified structure chain, a model is obtained for its exact UniProt entry if available, otherwise the single top hit identified by K-mer search of the AlphaFold Database (details...). For each model with a corresponding structure chain from the alphafold match command or the alignTo option of alphafold fetch:
The matrix option indicates which amino acid similarity-matrix to use for scoring the hits (uppercase or lowercase can be used): BLOSUM45, BLOSUM50, BLOSUM62 (default), BLOSUM80, BLOSUM90, PAM30, PAM70, PAM250, or IDENTITY. The cutoff evalue is the maximum or least significant expectation value needed to qualify as a hit (default 1e-3). Results can also be limited with the maxSequences option (default 100); this is the maximum number of unique sequences to return.
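The effect of cutoff and maxSequences on a hit list can be sketched in Python. This is only an illustration of the two limits described above: the function name, the (sequence, evalue) tuple layout, and the choice to rank by E-value are assumptions, not a description of how the search service works internally.

```python
def filter_hits(hits, cutoff=1e-3, max_sequences=100):
    """Keep hits with evalue <= cutoff, one entry per unique sequence.

    hits: iterable of (sequence, evalue) pairs (hypothetical layout).
    Returns at most max_sequences pairs, most significant first.
    """
    best = {}
    for sequence, evalue in hits:
        if evalue > cutoff:
            continue  # less significant than the cutoff, not a hit
        if sequence not in best or evalue < best[sequence]:
            best[sequence] = evalue  # keep the best E-value per sequence
    ranked = sorted(best.items(), key=lambda item: item[1])
    return ranked[:max_sequences]
```

With the defaults, a hit with E-value 0.5 is discarded, and two hits with the same sequence count once toward the maxSequences limit.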
When results are returned, the hits are listed in a Blast Protein window. Double-clicking a hit uses alphafold fetch to retrieve the model, or multiple chosen hits can be retrieved at once by using the results panel context menu or Load Structures button (details...).
alignTo chain-spec
Superimpose the predicted structure from alphafold fetch onto a single chain in an already-open structure, and make its chain ID the same as that chain's. See also the trim option.
colorConfidence true | false
Whether to color the predicted structures by the pLDDT confidence measure in the B-factor field (default true):
- 100 to 90: high accuracy expected
- 90 to 70: backbone expected to be modeled well
- 70 to 50: low confidence, caution
- 50 to 0: should not be interpreted, may be disordered
...in other words, using
color bfactor palette alphafold
The Color Key graphical interface or a command can be used to draw a corresponding color key, for example:
key red:low orange: yellow: cornflowerblue: blue:high [other-key-options]
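For scripts that read pLDDT values from the B-factor field, the confidence bands listed above under colorConfidence can be expressed as a small lookup. This is a minimal sketch: the function name is hypothetical, and assigning each boundary value (90, 70, 50) to the higher band is an assumption, since the text does not say which band a boundary value belongs to.

```python
def plddt_band(plddt: float) -> str:
    """Return the interpretation band for a pLDDT value in [0, 100]."""
    if not 0 <= plddt <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if plddt >= 90:  # boundary handling is an assumption
        return "high accuracy expected"
    if plddt >= 70:
        return "backbone expected to be modeled well"
    if plddt >= 50:
        return "low confidence, caution"
    return "should not be interpreted, may be disordered"
```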
directory results-folder
For alphafold predict, specify a results-folder pathname (name and location) or the word browse to specify it interactively in a file browser window. The directory or folder does not need to exist already, as it will be created by the command. The pathname can include [N] to indicate substitution with the smallest positive integer that makes a new directory (default ~/Downloads/ChimeraX/AlphaFold/prediction_[N]). If the specified pathname does not include [N] but a directory of that name and location already exists and contains a results.zip file, _[N] will be appended automatically to avoid overwriting the existing directory.
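The [N] substitution rule can be illustrated with a short Python sketch. It mirrors the behavior as described in the paragraph above and is not ChimeraX's own code; the function name is hypothetical.

```python
from pathlib import Path


def resolve_results_dir(pathname: str) -> Path:
    """Return the directory a prediction would use for a given pathname.

    [N] is replaced by the smallest positive integer that yields a path
    that does not exist yet. Without [N], "_[N]" is appended only if the
    directory already exists and contains a results.zip file.
    """
    path_text = str(Path(pathname).expanduser())
    if "[N]" not in path_text:
        existing = Path(path_text)
        if not (existing.is_dir() and (existing / "results.zip").exists()):
            return existing  # nothing to protect from overwriting
        path_text += "_[N]"
    n = 1
    while Path(path_text.replace("[N]", str(n))).exists():
        n += 1
    return Path(path_text.replace("[N]", str(n)))
```

For example, if prediction_1 and prediction_2 already exist under the default location, the default pathname would resolve to prediction_3.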
ignoreCache true | false
The fetched models are stored locally in ~/Downloads/ChimeraX/AlphaFold/, where ~ indicates a user's home directory. If a file specified for opening is not found in this local cache or ignoreCache is set to true, the file will be fetched and cached.
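The caching rule amounts to "download when the file is missing or when ignoreCache is true, otherwise reuse the local copy." A minimal Python sketch of that rule follows; the function name and the download callable are hypothetical stand-ins, not part of ChimeraX.

```python
from pathlib import Path
from typing import Callable


def fetch_cached(filename: str, cache_dir: str,
                 download: Callable[[str], bytes],
                 ignore_cache: bool = False) -> Path:
    """Return the cached file, downloading it first when needed.

    download is a caller-supplied function (hypothetical) that returns
    the file contents for a given filename.
    """
    folder = Path(cache_dir).expanduser()
    target = folder / filename
    if ignore_cache or not target.exists():
        folder.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download(filename))  # fetch, then (re)cache
    return target
```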
search true | false
When fetching models with alphafold match, whether to search the database for the most similar sequence if the UniProt accession number for a chain is not provided in the experimental structure's input file, or is provided but not found in the AlphaFold Database (true, default). A fast but low-sensitivity (requiring high % identity) K-mer search of the database is performed. The closest sequence match for which a model is available will be retrieved. With search false, only the experimental structure's input file will be used as a potential source of UniProt accession numbers. When present, these are given in DBREF records in PDB format and in struct_ref and struct_ref_seq tables in mmCIF.
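To give a feel for why a K-mer search is fast but needs high sequence identity, here is a toy Python version that ranks database entries by the number of exact K-mers they share with the query. It is a simplified stand-in, not the search that ChimeraX actually runs; the function names, the dictionary layout, and k=5 are all assumptions.

```python
def kmers(sequence: str, k: int = 5) -> set:
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def closest_by_kmers(query: str, database: dict, k: int = 5):
    """Return the ID of the entry sharing the most k-mers with the query.

    database maps entry IDs to sequences (hypothetical layout).
    Returns None when no entry shares any k-mer with the query.
    """
    query_kmers = kmers(query, k)
    best_id, best_shared = None, 0
    for entry_id, sequence in database.items():
        shared = len(query_kmers & kmers(sequence, k))
        if shared > best_shared:
            best_id, best_shared = entry_id, shared
    return best_id
```

Because only exact K-mer matches count, a single substitution removes up to k shared K-mers, which is why divergent homologs are easily missed by this kind of search.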
minimize true | false
This option allows skipping energy-minimization of the result from alphafold predict, for faster job completion and/or to avoid failures that may occur during minimization.
templates true | false
Whether a calculation run with alphafold predict should use known structures from the PDB as templates. AlphaFold can use up to four structures as templates; when this option is true, ColabFold will search the PDB sequences for similarity to the target and report in the Colab log which entries (if any) are used as templates.
trim true | false
Whether to trim a predicted protein structure to the same residue range as the corresponding experimental structure given with the alphafold match command or the alignTo option of alphafold fetch. With trim true (default):
- Predictions with UniProt identifier determined by alphafold match from the experimental structure's input file will be trimmed to the same residue ranges as used in the experiment. These ranges are given in DBREF records in PDB format and in struct_ref and struct_ref_seq tables in mmCIF.
- Predictions retrieved with alphafold fetch or found by alphafold match searching for similar sequences in the AlphaFold Database will be trimmed to start and end with the first and last aligned positions in the sequence alignment calculated by matchmaker as part of the superposition step.
Using trim false indicates retaining the full-length models for the UniProt sequences, which could be longer.
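The second trimming case (first and last aligned positions) can be sketched in Python for a pairwise alignment given as two gapped strings. The function name and the gapped-string input are assumptions for illustration; matchmaker's own data structures are not described here.

```python
def trim_range(model_aligned: str, reference_aligned: str, gap: str = "-"):
    """Return (first, last) 1-based model sequence positions to keep.

    Both arguments are rows of one pairwise alignment (equal length).
    A column counts as aligned when neither row has a gap there.
    Returns None if no column is aligned.
    """
    if len(model_aligned) != len(reference_aligned):
        raise ValueError("alignment rows must have equal length")
    position = 0  # running index into the ungapped model sequence
    first = last = None
    for model_char, reference_char in zip(model_aligned, reference_aligned):
        if model_char == gap:
            continue
        position += 1
        if reference_char != gap:
            if first is None:
                first = position
            last = position
    return None if first is None else (first, last)
```

For a model row MKTAYIAKQR aligned to a reference row --TAYIAK--, the sketch keeps model positions 3 through 8. These are positions in the model sequence, which need not equal residue numbers.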
version 1 | 2 | 3 | 4
The AlphaFold Database contains single-chain models for protein sequences in UniProt. This option specifies which version of the database to use with alphafold fetch, alphafold pae, or alphafold search (as well as blastprotein with database alphafold):
- version 1 (Jul 2021): ~360,000 sequences, reference proteomes of 21 species including Homo sapiens; used by ChimeraX 1.3, which does not have a version option
- version 2 (Dec 2021 and Jan 2022 releases combined): ~1 million sequences, v1 + most of the manually curated entries (SwissProt) + sequences relevant to neglected tropical disease or antimicrobial resistance; default in ChimeraX 1.4
- version 3 (Jul 2022): >200 million sequences
- version 4 (Nov 2022): bugfix of version 3, updating the coordinates of ~4% of the entries; default in ChimeraX 1.5 and later
The alphafold match command always uses version 4 and does not have this option.
The alphafold predict command runs a calculation on Google Colab using ColabFold, an open-source, optimized version of AlphaFold 2. Users should cite:
ColabFold: making protein folding accessible to all. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. Nat Methods. 2022 Jun;19(6):679-682.
For monomer prediction:
Usage: alphafold predict sequence [ minimize true | false ] [ templates true | false ] [ directory results-folder ]
The protein sequence to predict can be given as any of the following:
These methods specify the entire sequence only, although a truncated version could be pasted.
For multimer (protein complex) prediction:
Usage: alphafold predict chain-spec [ minimize true | false ] [ templates true | false ] [ directory results-folder ]
– or – Usage: alphafold predict sequence1,sequence2[,sequence3...,sequenceN] [ minimize true | false ] [ templates true | false ] [ directory results-folder ]
The sequences of two or more protein chains can be specified either collectively as a chain-spec (for atomic-structure chains already open in ChimeraX), or individually within a comma-separated list using any combination of specifier types #2-4 listed above. The comma-separated list should not contain any spaces. If the same protein chain occurs multiple times in the complex, its sequence should be repeated that number of times. For example, to predict a homodimer, the same sequence (or its specifier) would need to be given twice. Prediction may only be feasible for smaller complexes (details...).
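When scripting multimer predictions, the comma-separated argument can be assembled programmatically. The Python sketch below only builds the command text according to the rules above (no spaces, one entry per chain copy); the function name is hypothetical, and the sequences used in any example are placeholders.

```python
def predict_command(chains) -> str:
    """Build an 'alphafold predict' command string.

    chains: list of (sequence_or_specifier, copies) pairs. Each entry is
    repeated once per copy of that chain in the complex.
    """
    parts = []
    for spec, copies in chains:
        if copies < 1:
            raise ValueError("each chain needs at least one copy")
        if "," in spec or any(ch.isspace() for ch in spec):
            raise ValueError("entries may not contain commas or spaces")
        parts.extend([spec] * copies)
    return "alphafold predict " + ",".join(parts)
```

For a homodimer, `predict_command([("MKTAYIAK", 2)])` gives `alphafold predict MKTAYIAK,MKTAYIAK`.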
A warning will appear saying that this Colab notebook is from github (was not authored by Google), with a button to click to run anyway. Users will need to have a Google account and to sign into it via a browser. Once that is done, the sign-in may be remembered depending on the user's browser settings; it is not kept in the ChimeraX preferences. See the example video for an explanation of the images/plots from ColabFold that appear in the Colab log and where to find downloaded files.
The model will be opened automatically and colored by confidence value. The model for a sequence that was specified by structure chain will be superimposed on that chain and assigned structure-comparison attributes for further analysis (details...).
Besides the per-residue pLDDT confidence measure, AlphaFold gives for each pair of residues (X,Y) the expected position error at residue X if the predicted and true structures were aligned on residue Y. These residue-residue “predicted aligned error” or PAE values can be shown as a 2D plot using the command alphafold pae (details below), the AlphaFold Error Plot tool, or alphafold fetch or alphafold match with the option pae true. See also the AlphaFold Error Estimates example and video.
Usage: alphafold pae [ model-spec ] ( uniprotId uniprot | file filename ) [ palette palette ] [ range low,high | full ] [ plot true | false ] [ colorDomains true | false ] [ minSize M ] [ connectMaxPae N ] [ cluster resolution ] [ dividerLines true | false ] [ version 1 | 2 | 3 | 4 ]
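With alphafold pae, the matrix of PAE values can be either fetched from the AlphaFold Database for the entry given with uniprotId, or read from a local file given with file.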
The corresponding AlphaFold structure (already open) can be given as a model-spec to associate it with the plot. This association allows coloring by domain as described below, and for selections on the plot to highlight the corresponding parts of the structure.
By default, the PAE plot is drawn when domain coloring is not done (plot is default true when colorDomains is false) and vice versa.
Setting colorDomains to true clusters the residues into coherent domains (sets of residues with relatively low pairwise PAE values) and uses randomly chosen colors to distinguish these domains in the structure. The residues are assigned an integer domain identifier (starting with 1) as an attribute named pae_domain that can be used to specify them in commands (for example, to recolor or select specific domains). Residues not grouped into any domain are assigned a pae_domain value of None. The clustering uses the NetworkX greedy_modularity_communities algorithm with parameters:
The dividerLines option (default true) indicates whether, for multimer predictions, to draw lines on the plot demarcating the end of one chain and the start of another. The lines may obscure a few chain-terminal residues in the plot, and dividerLines false can be used if this is problematic.
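Where the divider lines fall follows directly from the chain lengths: each line sits at the cumulative residue count at the end of a chain, except the last. A minimal Python sketch, with a hypothetical function name and residue counts as the plot coordinate (an assumption about how the plot is indexed):

```python
from itertools import accumulate


def divider_positions(chain_lengths):
    """Return residue-count offsets where one chain ends and the next starts."""
    return list(accumulate(chain_lengths))[:-1]
```

For chains of 100, 50, and 25 residues this gives dividers after residues 100 and 150; a single-chain prediction has none.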
The default palette for coloring the PAE plot is pae, with colors assigned to values as follows:
[color key for palette pae, spanning PAE values 0, 5, 10, 15, 20, 25, 30]
Another palette with value range suitable for PAE plots is paegreen:
[color key for palette paegreen, spanning PAE values 0, 5, 10, 15, 20, 25, 30]
The plot window has a context menu with options for recoloring the plot and associated structure, hiding/showing the divider lines, and saving the plot as an image, as well as buttons for recoloring the structure:
The Color Key graphical interface or a command can be used to draw (in the main graphics window) a color key for the PAE plot. For example, to make a color key that matches the pae or paegreen scheme, respectively:
key pae :0 : : :15 : : :30 showTool true
key paegreen :0 : : :15 : : :30 showTool true
A title for the color key (e.g., “Predicted Aligned Error (Å)”) would need to be created separately with 2dlabels.
Residue-residue PAE values can also be shown with colored pseudobonds in the predicted structure:
Usage: alphafold contacts res-spec1 [ toResidues res-spec2 ] [ flip true | false ] [ distance d ] [ palette palette ] [ range low,high | full ] [ radius r ] [ dashes N ] [ name model-name ] [ replace true | false ] [ outputFile pae-file ]
See the AlphaFold Contacts example and video. See also: size, style, contacts, crosslinks, rename
A PAE plot containing the specified residues must already be shown. The PAE matrix is not symmetrical. The first specification res-spec1 gives the aligned residues, whereas toResidues res-spec2 gives the residues whose error values are reported, except that using flip true swaps the meaning of res-spec1 and res-spec2. If one set of residues is higher-confidence (higher in pLDDT) than the other, it is usually best to specify them as the aligned residues so that the coloring will show the error values of the lower-confidence set.
Omitting the toResidues option defines res-spec2 as all residues covered by the PAE plot except for those in res-spec1; however, if toResidues is omitted and res-spec1 includes all residues in the plot, res-spec2 will also be defined as all residues in the plot.
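The default for res-spec2 is a small piece of set logic, sketched here in Python with residues represented as arbitrary hashable identifiers. The function name is hypothetical.

```python
def default_to_residues(res_spec1, plot_residues):
    """Return the residues used as res-spec2 when toResidues is omitted."""
    first = set(res_spec1)
    everything = set(plot_residues)
    if first >= everything:
        # res-spec1 covers the whole plot: compare all residues to all
        return everything
    return everything - first
```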
The distance option allows limiting the number of pseudobonds by only drawing them between pairs of residues with any interresidue distance ≤ d Å (default 3.0). These pseudobonds are drawn between α-carbons regardless of which atoms were within the distance cutoff.
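The distance filter and the asymmetry of the PAE matrix can be illustrated together in Python. This is a sketch under stated assumptions, not the ChimeraX implementation: residues are given as dictionaries from residue index to a list of atom coordinates, and pae[i][j] is taken to mean the error at residue j when aligned on residue i (the indexing convention is an assumption).

```python
import math


def contact_pairs(aligned, scored, pae, max_distance=3.0):
    """List (aligned_index, scored_index, pae_value) for close residue pairs.

    aligned, scored: dict of residue index -> list of (x, y, z) atom
    coordinates (hypothetical layout). A pair qualifies when any two
    atoms, one from each residue, are within max_distance.
    """
    pairs = []
    for i, atoms_i in aligned.items():
        for j, atoms_j in scored.items():
            if i == j:
                continue
            closest = min(math.dist(a, b) for a in atoms_i for b in atoms_j)
            if closest <= max_distance:
                pairs.append((i, j, pae[i][j]))
    return pairs
```

Swapping the two residue sets, as flip true does, reports pae[j][i] instead of pae[i][j] for the same pair, and the two values generally differ.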
The default palette for coloring the pseudobonds by PAE value is paecontacts, with colors assigned to values as follows:
[color key for palette paecontacts, spanning PAE values 0, 5, 10, 15, 20]
Although this palette includes value-color pairs, it may be helpful to give a value range if a colors-only palette is used instead. A range can also be used to override the values in a value-color palette, instead spacing the colors evenly across the specified range.
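Spacing colors evenly across a range is simple arithmetic, sketched here in Python. The function name is hypothetical, and colors are plain names rather than ChimeraX color objects.

```python
def spread_palette(colors, low, high):
    """Pair each color with a value, evenly spaced from low to high."""
    if len(colors) < 2:
        raise ValueError("need at least two colors to span a range")
    step = (high - low) / (len(colors) - 1)
    return [(low + index * step, color) for index, color in enumerate(colors)]
```

With palette blue:red and range 1,5, blue is assigned to the value 1 and red to the value 5; a three-color palette over 0,20 would place its middle color at 10.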
The pseudobond stick radius (default 0.2 Å) and number of dashes (default 1, meaning a solid stick) can also be specified.
The name option allows specifying the pseudobond model-name (default PAE Contacts). If a model by that name already exists, any pre-existing pseudobonds will be removed from that model and replaced by the new ones (replace true, default) unless replace false is used.
The outputFile option allows saving a list of the residue pairs (those meeting the distance criterion) and their PAE values to a plain text file. The pae-file argument is the output file pathname, enclosed in quotation marks if it includes spaces, or the word browse to specify it interactively in a file browser window.
alphafold contacts #1
alphafold contacts /A to /B distance 8
alphafold contacts sel palette blue:red range 1,5
The following would select all pseudobonds and label them with the names and numbers of the residues that they connect:
sel pbonds
label sel pseudobonds text "{0.atoms[0].residue.name} {0.atoms[0].residue.number} to {0.atoms[1].residue.name} {0.atoms[1].residue.number}"