CryoDRGN
CryoDRGN: reconstruction of heterogeneous cryo-EM structures using
neural networks.
Zhong ED, Bepler T, Berger B, Davis JH.
Nat Methods. 2021 Feb;18(2):176-185.
- Cryo-EM single-particle analysis has proven powerful in determining
the structures of rigid macromolecules. However, many complexes
exhibit conformational and compositional heterogeneity that poses
a major challenge to existing 3D reconstruction methods.
- cryoDRGN uses deep neural networks to reconstruct continuous distributions
of 3D density maps reflecting the heterogeneity within single-particle
cryo-EM datasets.
- CryoDRGN is open-source software freely available from
http://cryodrgn.csail.mit.edu
Existing Methods: Homogeneous vs. Heterogeneous
- Homogeneous reconstruction considers the data to represent
a single conformational (and compositional) state
- Many methods require classifying particles into quasi-homogeneous subsets,
then using the different classes for separate reconstructions
Problems with this approach:
- The number of states/classes is not known a priori and may
be difficult to determine accurately
- Using a discrete number of classes does not account for continuous
variations in conformation
More-Advanced Heterogeneous Methods
- RELION multibody refinement
– models the structure as the sum of user-defined rigid bodies
that can rotate relative to one another
- cryoSPARC 3D variability analysis
– PCA-based, represents variability within a linear subspace
(which may be misleading when structural deformations are not
well described by linear interpolations between basis vectors)
- manifold embedding
– binning particles along the data manifold followed by
traditional homogeneous reconstruction
- others...
CryoDRGN (Deep Reconstructing Generative Networks) Method
- unsupervised learning by two neural networks, encoder and decoder
- encodes 2D images into a latent space from which 3D maps are reconstructed
- assumes structures fall along a low-dimensional manifold in the latent space
- dimensionality specified by the user, 10 dimensions generally effective
- density maps can be generated from arbitrary values of the latent variable
- regions of latent space rich in impurities or artifacts can be filtered out
- initial filtering run can be done on downsampled data (Fourier clipping)
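The Fourier-clipping downsampling mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not the cryoDRGN implementation: the function name fourier_crop is hypothetical, and it assumes square images with even box sizes.

```python
import numpy as np

def fourier_crop(image: np.ndarray, new_size: int) -> np.ndarray:
    """Downsample a square, even-sized image by keeping only the central
    (low-frequency) new_size x new_size block of its Fourier transform."""
    n = image.shape[0]
    if image.shape != (n, n) or n % 2 or new_size % 2 or new_size > n:
        raise ValueError("expected a square image with even sizes, new_size <= n")
    # After fftshift the zero-frequency term sits at index n // 2.
    ft = np.fft.fftshift(np.fft.fft2(image))
    start = n // 2 - new_size // 2
    cropped = ft[start:start + new_size, start:start + new_size]
    # Undo the shift, invert, and rescale so the mean intensity is preserved.
    small = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    return small * (new_size / n) ** 2

# Example: shrink a 64 x 64 image to 32 x 32 (pixel size doubles).
rng = np.random.default_rng(0)
particle = rng.normal(size=(64, 64))
small_particle = fourier_crop(particle, 32)
```

Discarding the high-frequency terms lowers the achievable resolution but makes each training pass cheaper, which is why it suits an initial filtering run.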
What does CryoDRGN allow researchers to do?
- get latent-space encoding of particle distribution
(which can be embedded in 2D for visualization, e.g. with PCA or UMAP)
- generate density maps for exploratory analysis
- extract particle subsets for use with other tools
- generate trajectories of molecular motions (as maps)
- mainly, to characterize cryoEM dataset heterogeneity and
generate maps of the various states without having to classify the
particles ahead of time
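As a rough illustration of the 2D embedding step, the sketch below projects a latent encoding onto its first two principal components using plain NumPy. The array z is random stand-in data, not real cryoDRGN output, and the helper name pca_project is hypothetical; UMAP would need a separate library and is not shown.

```python
import numpy as np

def pca_project(z: np.ndarray, n_components: int = 2):
    """Project latent encodings (n_particles x zdim) onto their top
    principal components; also return the axes and explained variance."""
    centered = z - z.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    axes = vt[:n_components]
    return centered @ axes.T, axes, explained[:n_components]

# Stand-in for a 10-D latent encoding of 500 particles (assumed data),
# with most of the variance along the first latent dimension.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 10)) * np.array([5.0] + [0.5] * 9)
coords, axes, explained = pca_project(z)
```

Each row of coords is one particle's position in the 2D plot; clusters or continuous streaks in that plot are the starting point for picking particle subsets or representative maps.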
“Homogeneous” High-Resolution Data
EMPIAR (Electron Microscopy Public Image ARchive)
– raw data for some EMDB entries. 2022-08-02: 975 datasets, 2.0 PB.
For RAG1-RAG2 (EMPIAR-10049), P. falciparum 80S ribosome
(EMPIAR-10028):
- A. cryoDRGN (1024 x 3, "no latent variable input")
vs. traditional (cryoSPARC)
- B. FSC (Fourier shell correlation) curves for NN architectures (E),
resolution < 4 Å at FSC = 0.5
- C. FSC evolution over training epochs, D. similar for loss
- E. Choice of architecture is a tradeoff between resolution and speed
- F. CryoDRGN map with RAG1-RAG2 atomic structure (PDB 3jbx)
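For readers unfamiliar with FSC, the following is a minimal NumPy sketch of how a Fourier shell correlation curve and a threshold resolution can be computed for two cubic maps. The function names are hypothetical, shells are one voxel wide, and no masking or noise correction is applied, so this is an illustration of the definition rather than the procedure used in the paper.

```python
import numpy as np

def fsc(vol1: np.ndarray, vol2: np.ndarray) -> np.ndarray:
    """Fourier shell correlation between two cubic volumes of equal size.
    Returns one value per integer-radius shell, from 0 up to Nyquist - 1."""
    n = vol1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(vol1))
    f2 = np.fft.fftshift(np.fft.fftn(vol2))
    # Integer shell index of every Fourier voxel (distance from the origin).
    c = np.arange(n) - n // 2
    x, y, z = np.meshgrid(c, c, c, indexing="ij")
    shell = np.rint(np.sqrt(x**2 + y**2 + z**2)).astype(int)
    curve = np.zeros(n // 2)
    for r in range(n // 2):
        m = shell == r
        num = np.sum(f1[m] * np.conj(f2[m])).real
        den = np.sqrt(np.sum(np.abs(f1[m]) ** 2) * np.sum(np.abs(f2[m]) ** 2))
        curve[r] = num / den if den > 0 else 0.0
    return curve

def resolution_at_threshold(curve, box_size, pixel_size, threshold=0.5):
    """Resolution (same units as pixel_size) at the first shell past the
    origin where the curve drops below the threshold, or None if it never does."""
    below = np.nonzero(np.asarray(curve)[1:] < threshold)[0]
    if below.size == 0:
        return None
    return box_size * pixel_size / (below[0] + 1)

rng = np.random.default_rng(0)
vol = rng.normal(size=(16, 16, 16))
self_fsc = fsc(vol, vol)  # a map compared with itself correlates perfectly
```

A shell index k in a box of N voxels with pixel size p corresponds to a resolution of N * p / k, so a curve that stays above 0.5 out to higher shells indicates finer agreement between the two maps.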
Simulated Heterogeneity
- A. Conformational, three datasets:
Uniform: sampling along a 1D reaction coordinate (rotation of
a single dihedral); Cooperative: biased toward specific states;
Noncontiguous: transition states unobserved
- B. Compositional from mixing bacterial 30S, 50S,
and 70S ribosomal complexes
- C. for the Uniform data, sample of cryoDRGN maps
and per-image FSCs (output maps vs. ground-truth maps
across the reaction coordinate)
- D. CryoDRGN maps from the Compositional data
- E-H. Latent-space encodings (using 1D model)
consistent with ground truth
Heterogeneity in “Homogeneous” Data
- A. RAG1-RAG2 maps from original authors based on 2 particle classes:
signal end (only core resolved) and paired
(other parts visible)
- B,C. In the signal end data, cryoDRGN (10D) finds
positions of formerly unresolved parts (ribbons show the previous
paired structure for comparison) possibly important for
function [video]
- D,E. In the Pf80S ribosome data,
cryoDRGN (10D) identifies a rotated-subunit substate
Highly Heterogeneous Data
- E. coli large ribosomal subunit (50S) undergoing assembly
(EMPIAR-10076)
- Original analysis found 13 classes grouped into 4 major assembly states
- A-D. CryoDRGN 1D and 10D latent space representations and filtering
(details in Supplementary Fig. 5)
- E-G. CryoDRGN finds the 4 states without prior classification
(dashed outline shows mature subunit)
- F,G. Latent space UMAP colored/labeled by states from
original analysis
- H. Some particles ("A" in panel G)
are whole ribosome (70S = 30S + 50S)
- I. C4 is previously unreported assembly intermediate
Large Continuous Conformational Changes
- Pre-catalytic spliceosome
(EMPIAR-10180)
- Original expert classification suggested a continuum of conformations
with large motions of the SF3b subcomplex
- A,B. CryoDRGN (10D) finds several clusters;
filter to remove those with likely imaging artifacts (iii),
missing the SF3b subcomplex (iv), or that have the additional U2 core (v)
- C,D. Maps generated along principal components show large
movements of the SF3b subcomplex relative to the rest of the structure
[video]
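One way to picture "maps generated along principal components" is as a set of evenly spaced latent points on a principal axis, each of which would then be passed to the trained decoder to produce a map. The sketch below only computes those latent points; the function name pc_trajectory, the 5th to 95th percentile range, and the random stand-in encoding z are all assumptions for illustration, and the decoder step is not shown.

```python
import numpy as np

def pc_trajectory(z, component=0, n_points=10, lo=5.0, hi=95.0):
    """Evenly spaced latent-space points along one principal component of z,
    spanning the lo-th to hi-th percentile of the particles' scores."""
    mean = z.mean(axis=0)
    # Rows of vt are principal axes of the centered latent encodings.
    _, _, vt = np.linalg.svd(z - mean, full_matrices=False)
    axis = vt[component]
    scores = (z - mean) @ axis
    steps = np.linspace(np.percentile(scores, lo),
                        np.percentile(scores, hi), n_points)
    # Each row is one latent vector to hand to the decoder (not shown here).
    return mean + steps[:, None] * axis

# Stand-in for a 10-D latent encoding of 500 particles (assumed data).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 10)) * np.array([5.0] + [0.5] * 9)
traj = pc_trajectory(z, component=0, n_points=5)
```

Restricting the range to percentiles of the observed scores keeps the trajectory inside the populated part of latent space, where decoded maps are more likely to be meaningful.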
Can We Use CryoDRGN in ChimeraX?
- ...or its outputs after training?
- related ticket #6106: Efficiently select maps by visual inspection