ScanNet

ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Tubiana J, Schneidman-Duhovny D, Wolfson HJ. Nat Methods. 2022 Jun;19(6):730-739.

ScanNet: A web server for structure-based prediction of protein binding sites with geometric deep learning. Tubiana J, Schneidman-Duhovny D, Wolfson HJ. J Mol Biol. 2022 Oct 15;434(19):167758.

  • ScanNet (Spatio-Chemical Arrangement of Neighbors Network) is an interpretable deep learning model for predicting protein-protein binding sites in atomic structures.

  • ScanNet web server: http://bioinfo3d.cs.tau.ac.il/ScanNet/


Method Overview

  • each atom is characterized by the spatial arrangement (in local coords) and properties of its neighbors
  • atom vector: how well the atom fits each “filter” (trained salient pattern)
  • atom vectors for a residue are pooled and combined with residue properties (residue type, sequence conservation, ...)
  • each residue is characterized by the spatial arrangement (in local coords) and properties of its neighbors
  • residue vector: how well the residue fits each trained residue filter
  • values are locally smoothed and converted to probabilities
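The idea of matching a neighborhood against a "filter" in local coordinates can be sketched in plain Python. This is a toy illustration under stated assumptions, not ScanNet's implementation. The local frame is built by Gram-Schmidt from the directions to two reference atoms. A filter is a hypothetical dict with one preferred local position, per-atom-type weights, and a Gaussian width. The final step is a plain neighbor average followed by a logistic function. In ScanNet the filters are learned, and each atom or residue vector holds the responses to many filters. All function names here are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def local_frame(center, ref1, ref2):
    """Right-handed orthonormal axes at `center`, built by Gram-Schmidt from the
    directions to two reference atoms (e.g. covalently bonded neighbors)."""
    x = unit(sub(ref1, center))
    v = sub(ref2, center)
    y = unit(sub(v, tuple(dot(v, x) * xi for xi in x)))
    return x, y, cross(x, y)


def to_local(point, center, frame):
    """Coordinates of `point` in the local frame; unchanged if the whole
    structure is rotated or translated, because the frame moves with it."""
    d = sub(point, center)
    return tuple(dot(d, axis) for axis in frame)


def filter_response(neighbors, center, frame, filt):
    """Toy spatio-chemical filter: each (position, atom_type) neighbor adds a
    Gaussian of its local-frame distance to the filter's preferred position,
    weighted by the filter's weight for that atom type."""
    total = 0.0
    for pos, atom_type in neighbors:
        p = to_local(pos, center, frame)
        d2 = sum((pi - ci) ** 2 for pi, ci in zip(p, filt["position"]))
        weight = filt["weights"].get(atom_type, 0.0)
        total += weight * math.exp(-d2 / (2 * filt["sigma"] ** 2))
    return total


def smooth_and_sigmoid(scores, neighbor_lists):
    """Average each residue's score with its neighbors' scores, then map to (0, 1)."""
    probs = []
    for i, nbrs in enumerate(neighbor_lists):
        group = [scores[i]] + [scores[j] for j in nbrs]
        mean = sum(group) / len(group)
        probs.append(1.0 / (1.0 + math.exp(-mean)))
    return probs
```

For example, a filter that prefers an oxygen one unit along the local x axis responds with 1.0 to an O exactly there and 0.0 to an N at the same spot. The response depends on both where the neighbor sits and what kind of atom it is.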

Atom Patterns and Exemplars

  • a. NH at center, O in quadrant opposite the -NH- covalent bonds (N-H-O hydrogen bond)
  • c. C near methyl group and aromatic ring
  • e. Backbone C or O without nearby NH (gray means absence)
  • f. Solvent-exposed methionine sidechain
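Pattern a can be written as an explicit geometric test, roughly the arrangement the learned filter responds to. The sketch below is a hand-written approximation and not a ScanNet filter. It averages the unit vectors of N's covalent bonds to heavy atoms (for a backbone N, the bonds to C and Cα). It then checks that the O lies against that direction. The function name is hypothetical, and both the 3.5 Å N–O distance cutoff and the cosine threshold of -0.5 are illustrative assumptions.

```python
import math


def _vec(a, b):
    """Vector from point a to point b."""
    return tuple(bi - ai for ai, bi in zip(a, b))


def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def o_opposite_nh_bonds(n_pos, bonded_positions, o_pos, max_dist=3.5, max_cos=-0.5):
    """Hand-written version of pattern a: is the O within max_dist (Å) of the N
    and on the side opposite N's covalent bonds?

    The mean of the unit bond vectors points 'into' the covalent bonds; the O
    must lie against that direction (cosine <= max_cos)."""
    to_o = _vec(n_pos, o_pos)
    dist = _norm(to_o)
    if dist == 0.0 or dist > max_dist:
        return False
    mean_dir = [0.0, 0.0, 0.0]
    for bp in bonded_positions:
        b = _vec(n_pos, bp)
        nb = _norm(b)
        for i in range(3):
            mean_dir[i] += b[i] / nb
    m = _norm(mean_dir)
    if m == 0.0:
        return False  # bonds cancel out, so there is no defined "opposite" side
    cos = sum(mean_dir[i] * to_o[i] for i in range(3)) / (m * dist)
    return cos <= max_cos
```

The ScanNet filter encodes this kind of geometry softly, as a learned preference over positions and atom types. The sketch uses hard cutoffs instead.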

Residue Patterns and Exemplars

  • Sequence logo letter heights scale with the strength of match to the pattern
  • Residues colored by category: negative-charge red, hydrophobic black, aromatic gold, etc.
  • Oval-to-circular pie chart: blue exposed (a. hydrophobic surface, positive correlate of binding), gray buried (b. hydrophobic core, negative correlate of binding)

Predicting Protein-Protein Binding Sites

  • Investigated four splits of 20K proteins with annotated binding sites from Dockground DB (a–d: ablated datasets in which each test protein has at least one training-set example with at least 70% sequence identity, or in the same CATH “H” group, or in the same “T” group, or none of those)
  • Comparator 1: machine learning on a handcrafted set of geometric, chemical, and evolutionary features for each amino acid (training takes minutes, vs. 1–2 hours for ScanNet)
  • Comparator 2: structural homology, aggregating probabilities over structurally superimposed homologs (1 month!)
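The four-way split can be expressed as a cascade of tests that assigns each test chain to the closest relationship it has with the training set. This is a minimal sketch. It assumes the maximum sequence identity and the CATH codes have already been computed, the function names are hypothetical, and the precedence order (identity first, then H, then T) is an assumption based on the list above. CATH codes have the form Class.Architecture.Topology.Homologous-superfamily, so the “T” and “H” levels correspond to the first three and first four fields.

```python
def cath_prefix(code, level):
    """First `level` fields of a CATH code "C.A.T.H" (3 = topology, 4 = superfamily)."""
    return tuple(code.split(".")[:level])


def assign_split(max_identity, test_codes, train_codes):
    """Assign one test chain to the most similar category it qualifies for.

    max_identity: highest % sequence identity to any training chain (precomputed).
    test_codes / train_codes: CATH domain codes for the test chain and training set.
    """
    if max_identity >= 70.0:
        return "70% identity"

    def shares(level):
        train = {cath_prefix(c, level) for c in train_codes}
        return any(cath_prefix(c, level) in train for c in test_codes)

    if shares(4):
        return "same H"
    if shares(3):
        return "same T"
    return "none"
```

For example, `assign_split(30.0, ["3.40.50.300"], ["3.40.50.720"])` returns `"same T"`: the codes agree in the first three fields but not the fourth.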

Predicting Protein-Protein Binding Sites

  • Results for full dataset
  • ScanNet is best overall
  • Comparator 3: MaSIF, a surface-based geometric deep learning model trained on a different dataset
    [paper: Nat Methods 2020] [github]

Predicting B-Cell Epitopes

  • SAbDab (Structural Antibody Database): 3,756 protein chains with annotated B-cell epitopes (BCEs), forming 796 clusters at 95% sequence identity
  • applied to a molecular dynamics (MD) snapshot of the SARS-CoV-2 spike trimer with one monomer in the RBD-open conformation
  • representative antibodies from solved complexes are superimposed
  • RBD-open monomer surface colored white → blue by increasing BCE probability
  • high probability is assigned to the 8 previously described epitopes, but also around Asn residues expected to be glycosylated: N657 (mislabeled as Asp in the figure), plus two others when predicting for the NTD alone (not shown)

ScanNet vs. AlphaFold-Multimer

  • AlphaFold-Multimer outperformed ScanNet when given the partner protein's sequence, but performed comparably when that information was omitted (upper right: 6pnq colored blue → red by increasing ScanNet probability, AF predictions in green)
  • ScanNet identified all the main epitopes of the SARS-CoV-2 spike RBD, whereas AlphaFold-Multimer predictions for six antibody complexes identified only one epitope (lower right, AF predictions in green)

ScanNet Server Options

  • Fetch a structure from the PDB or AlphaFold DB, or upload a local file (PDB or mmCIF format)
  • Choose one of three binding-site types, each with its own separately trained model:
    • protein-protein
    • protein-antibody
    • protein-disordered protein
  • Monomer or multimer prediction
  • Whether to use a multiple sequence alignment (MSA)
    (the slowest step, but recommended except for epitope prediction or designed proteins)

Examples are colored blue → red by increasing probability.

ScanNet Server Results

  • Results are emailed to the user as a zip file containing:
    • a PDB file with binding-site probabilities in the B-factor field
    • a CSV file of the binding-site probabilities
    • ChimeraX command script (.cxc)
    • Chimera Python script (.py)
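Because the probabilities sit in the standard PDB B-factor field (columns 61–66 of ATOM/HETATM records), they can be read without any structural biology library. This is a minimal sketch with hypothetical helper names. It assumes every atom of a residue carries that residue's value and takes the maximum over atoms to be safe. The threshold is left to the user, since the numeric scale the server writes should be checked against your own file. The output is a ChimeraX atom spec such as `/A:10,12`.

```python
def residue_probabilities(pdb_lines):
    """Map (chain ID, residue number, insertion code) -> highest B-factor among
    that residue's atoms, using the fixed PDB columns:
    chain = col 22, resSeq = cols 23-26, iCode = col 27, B-factor = cols 61-66."""
    probs = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = (line[21], int(line[22:26]), line[26].strip())
        value = float(line[60:66])
        probs[key] = max(value, probs.get(key, value))
    return probs


def chimerax_spec(probs, threshold):
    """ChimeraX atom spec listing residues whose value is >= threshold, per chain."""
    by_chain = {}
    for (chain, num, icode), p in sorted(probs.items()):
        if p >= threshold:
            by_chain.setdefault(chain, []).append(f"{num}{icode}")
    return " ".join(f"/{chain}:{','.join(res)}" for chain, res in by_chain.items())
```

For example, `chimerax_spec(residue_probabilities(open(path)), 0.5)` yields a spec that could follow `select` in ChimeraX. `path` and the 0.5 cutoff are placeholders.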


ScanNet Server Results

  • Prediction for 121p (Ras) took <15 seconds, and the results email reported a runtime of 0.0 s
  • ScanNet finds the expected switch regions for effector binding
  • results zip, the ChimeraX session I made for the figure on the right
  • monomer #1.1, biounit #1.1 and #1.2
  • surf hide
  • example complex 1bkd: chain R is Ras, chain S is Son of Sevenless (SOS)
  • open 1bkd; mm #2 to #1.1 (mm is short for the matchmaker command)
  • select using the chain table in the Log, or the sequence feature "short sequence motif"

Application Paper

Reduced B cell antigenicity of Omicron lowers host serologic response. Tubiana J et al., Cell Rep. 2022 Oct 18;41(3):111512.
  • 15 sequence differences (Omicron vs. Wuhan) modeled individually and in combination(s)
  • most of the mutations decreased antigenicity
  • agreement with experiments (antibody titers, pseudovirus neutralization)