Overview
- RESTful web service receives the job request as JSON via HTTP POST and runs parKVFinder;
the client sends HTTP GET with the job ID to get status/results
- web server module developed in Rust using the Actix framework
- queue module uses Ocypod
- each module is packaged in a Docker container
- web portal developed using R Shiny
- also provided: HTTP client with example Python3 script,
PyMOL2 plugin allowing (among other things)
interactively drawing a rectangular box to define the search space
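The POST-then-GET job flow described above can be sketched with the Python standard library. This is only an illustration of the request shapes: the base URL, the endpoint paths, and the JSON field names (`pdb`, `settings`, `probe_out`) are assumptions made for the example, not the service's documented API. Nothing is sent over the network; the functions only build the requests.

```python
import json
import urllib.request

# Hypothetical base URL; the real service address is not given in these notes.
BASE_URL = "http://localhost:8081"


def build_submit_request(pdb_text, probe_out=4.0):
    """Build (but do not send) an HTTP POST carrying the job as JSON.

    The payload layout is an assumption made for illustration only.
    """
    payload = {"pdb": pdb_text, "settings": {"probe_out": probe_out}}
    return urllib.request.Request(
        f"{BASE_URL}/jobs",  # hypothetical endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(job_id):
    """Build (but do not send) an HTTP GET asking for a job's status/results."""
    return urllib.request.Request(f"{BASE_URL}/jobs/{job_id}", method="GET")


submit = build_submit_request("ATOM ...", probe_out=4.0)
status = build_status_request("abc123")
```

Passing either request to `urllib.request.urlopen` would actually send it; a real client would poll with the GET request until the job reports completion.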
Web Service Input
- PDB file or entry, click Load to detect nonstandard residues
- default or customized parameters
- inner (smaller) "rolling sphere" probe defines molecular surface
- outer (larger) probe defines the exterior
- a pocket is where the smaller probe can go but the larger cannot,
except trimmed further from the exterior by the removal distance
- volume cutoff is minimum size (smaller pockets ignored)
- around target molecule – within specified distance of ligand
- around target residues – limit calculation to a box around
specified residues + padding distance
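The two-probe idea above can be illustrated with a toy 2D grid. This is a simplified sketch, not KVFinder's algorithm: it works in 2D, treats all atoms as equal-radius discs, tests only probe centers for the inner probe, and ignores the removal distance and volume cutoff. The function name, the U-shaped "protein", and all numeric values are made up for the example, and the coarse grid makes points near the boundary approximate.

```python
import math


def find_cavity_points(atoms, atom_radius, probe_in, probe_out, spacing, lo, hi):
    """Return 2D grid points the small probe can occupy but the large probe cannot sweep."""
    n = int(round((hi - lo) / spacing)) + 1
    grid = [(lo + i * spacing, lo + j * spacing) for i in range(n) for j in range(n)]

    def clearance(p):
        # Distance from p to the nearest atom surface.
        return min(math.hypot(p[0] - ax, p[1] - ay) for ax, ay in atoms) - atom_radius

    # Grid points where each probe's center can sit without clashing with atoms.
    small_ok = [p for p in grid if clearance(p) >= probe_in]
    large_centers = [p for p in grid if clearance(p) >= probe_out]

    def swept_by_large_probe(p):
        # p counts as exterior if some allowed large-probe position covers it.
        return any(math.hypot(p[0] - cx, p[1] - cy) <= probe_out
                   for cx, cy in large_centers)

    return {p for p in small_ok if not swept_by_large_probe(p)}


# Toy U-shaped wall of atoms: left side, right side, and floor.
coords = [0.0, 1.5, 3.0, 4.5, 6.0]
atoms = ([(0.0, y) for y in coords] + [(6.0, y) for y in coords]
         + [(x, 0.0) for x in coords])

spacing = 0.5
pocket = find_cavity_points(atoms, atom_radius=1.0, probe_in=0.5, probe_out=3.0,
                            spacing=spacing, lo=-5.0, hi=11.0)
area = len(pocket) * spacing ** 2  # 2D analogue of volume = point count x spacing^3
```

The point (3.0, 3.0) inside the U ends up in `pocket`, while (-3.0, 3.0) outside the left wall does not, because the large probe can reach it from the exterior.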
Web Service Output
- the viewer uses the NGL engine for R
- "Download Results" gives a TOML file containing run parameters,
pocket statistics, and lists of the pocket-defining residues
- "Download Structures" gives a zipped folder containing the input PDB and
a pocket-dots PDB
Web Service Performance
- time increases approximately linearly with the number of atoms
- larger "Probe Out" increases calculation time
- no clear relationship with "Removal Distance" since its effect on
time reflects number/shape of cavities rather than the number of atoms
- using smaller grid spacing would give higher-quality results
but take longer to run (presumably; time vs. grid spacing was not presented)...
I'm guessing the benchmarking results are for grid spacing of 0.6 Å
- their website doesn't expose the grid spacing,
and the TOML file only says resolution_mode = 'Low',
but I measured the distance between pocket dots in ChimeraX as 0.6 Å...
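The grid-spacing measurement mentioned above (done by hand in ChimeraX) could also be scripted: on a regular grid, the smallest distance between any two pocket dots equals the spacing. A minimal sketch, assuming the dot coordinates have already been read into a list of (x, y, z) tuples; the function name and the sample coordinates are made up for the example.

```python
import math


def estimate_grid_spacing(points):
    """Return the smallest distance between any two points (brute force, O(n^2))."""
    best = math.inf
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            best = min(best, math.dist(points[i], points[j]))
    return best


# Made-up dots lying on a 0.6 Å grid.
dots = [(0.0, 0.0, 0.0), (0.6, 0.0, 0.0), (0.6, 0.6, 0.0), (1.2, 0.0, 0.0)]
spacing = estimate_grid_spacing(dots)
```

The brute-force loop is fine for a few thousand dots; much larger files would call for a spatial index.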
KVFinder Results in ChimeraX
(PDB 2gbp)
- pockets are output as a PDB file of H and HA "atoms" representing
grid points within the pockets, with a different residue name for each pocket
- B-factor column: depth, the shortest distance from the point to the
pocket-bulk boundary
- occupancy column: Eisenberg & Weiss amino acid hydropathy
- unfortunately all the residue names have the same residue number
- I manually added MODEL/ENDMDL records to the PDB file to make each
residue (pocket) a separate model so that it would be easier to work with
in ChimeraX (rainbow structures, show, hide, label attr name, etc.)
[ChimeraX session]
- size H atomrad 0.2; surf close; surf #2
- color bfactor #2
- color byattr occupancy #2 palette lipophilicity
- show #1 & :kae :<4
- color byattr occupancy #2 palette ^lipophilicity
- mlp #1 surf #2
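The manual MODEL/ENDMDL edit described above could be scripted along these lines. This is a sketch under stated assumptions: the pocket PDB has no existing MODEL records, each pocket's atoms are contiguous, and the residue name sits in the standard PDB columns 18-20. The function name is made up for the example.

```python
def split_pockets_into_models(pdb_lines):
    """Wrap each run of atoms sharing a residue name in its own MODEL/ENDMDL pair."""
    out = []
    current = None  # residue name of the pocket currently being written
    model_no = 0
    for line in pdb_lines:
        line = line.rstrip("\n")
        if line.startswith(("ATOM", "HETATM")):
            resname = line[17:20]  # PDB columns 18-20
            if resname != current:
                if current is not None:
                    out.append("ENDMDL")
                model_no += 1
                out.append(f"MODEL     {model_no:4d}")  # serial in columns 11-14
                current = resname
            out.append(line)
        elif line.startswith("END"):
            continue  # drop the original END; a new one is written below
        else:
            out.append(line)
    if current is not None:
        out.append("ENDMDL")
    out.append("END")
    return out


# Tiny stand-in for a pocket file: only the columns this function reads are filled in.
def fake_atom(resname):
    return "HETATM" + " " * 11 + resname


models = split_pockets_into_models(
    [fake_atom("KAA"), fake_atom("KAA"), fake_atom("KAB"), "END"]
)
```

Writing `"\n".join(models)` to a new file gives one model per pocket, which ChimeraX opens as separate submodels.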
Area/Volume Are Relative!
Quantitative comparisons are only valid within-method because
different methods define the boundaries differently.
- KVFinder results, ChimeraX display, ChimeraX surface area/volume
- SES with H default radius:
KAB area 163 vol 161, KAE area 211 vol 266
- SES with radius 0.4 Å:
KAB area 106 vol 77, KAE area 150 vol 155
- SES with radius 0.3 Å:
KAB area 97 vol 67, KAE area 140 vol 140
- SES with radius 0.2 Å:
KAB area 89 vol 56, KAE area 167 vol 106
- CASTp uses α-shapes (Voronoi tessellation);
Chimera does not show the CASTp boundaries, just the SES of CASTp pocket atoms
Seems OK on Nucleic Acids
- finds the minor-groove pocket in PDB 6bna,
similar to the location of netropsin,
but only the "floor" (probably need to ↓ removal distance and/or
↑ outer probe size to cover such a shallow pocket)
[ChimeraX session]
- their viewer prematurely truncates the ribbon
(1bna doesn't have this problem)
KVFinder and CASTp Find the Same Pockets Qualitatively
- ChimeraX with the 8 pockets found with KVFinder defaults on 2gbp
- Chimera with 8 biggest CASTp pockets shown at once
parKVFinder Architecture
- written in C
- input via PyMOL GUI or command-line interface
- multithreaded parallelization implemented with OpenMP
- uses customizable dictionary of atomic VDW radii
- grid spacing can be 0.25, 0.5, or 0.6 Å (web service uses the last of these)
- [Github Linux/Mac] [Github Windows] [docs]
parKVFinder vs. Other Methods Across MD Trajectory
- HIV protease 200-ns MD trajectory
- in this comparison, parKVFinder and GHECOM were the most
accurate/successful at describing the conformational dynamics
of the active site
parKVFinder Performance
- Runtimes of the various programs on the 2001 frames from the HIV
protease trajectory
- 6-core 3.4 GHz AMD Ryzen 5 2600 processor running Ubuntu 19.04
- parKVFinder ProbeOut 12 Å, search space <5Å from inhibitor
in original structure, PDB 1hvr
- grid spacing?? supp. data only says
"All parameters that were not cited were kept at their default values"...
if I had to guess, 0.6 Å.
- Fpocket is a Voronoi tessellation method; faster than parKVFinder
in this test case, but with worse results (e.g., see previous slide)
- parKVFinder runtimes vs. protein size (dataset of 1000 unique domains)
and number of threads
pyKVFinder Architecture
- python-C parallel KVFinder (pyKVFinder) applies
SWIG to extend grid operations written in C to Python
- stores results in NumPy arrays for better
interoperability with other programs
- dependencies: python ≥ 3.7 (tested on Linux and macOS),
swig ≥ 4.0.1, toml ≥ 0.10.2, numpy ≥ 1.20.3, matplotlib ≥ 3.3.3
- has the features of parKVFinder and more: assigns depth and AA
hydrophobicity (choice of 6 scales), runs faster, etc.
- GPL v3.0, "no restrictions to use by non-academics"
[Github]
[docs]
[PyPI]
pyKVFinder Test Case
PDB 6wen:
ADP-ribose phosphatase (ADRP) domain of SARS-CoV-2 NSP3 in the unbound form
- in b, "conservation" is not sequence conservation but the degree to
which the corresponding void area is present in a series of related structures
listed in c
- c shows the distribution of the hydropathies of the pocket dots
for each of these related structures, as well as a hierarchical clustering
of the pockets based on the frequencies of their surrounding residues,
done with SciPy and shown with matplotlib
pyKVFinder Benchmarking
Calculations on 600 frames from a 600-ns MD trajectory of the unbound NSP3 ADRP domain:
- a shows results for the frame with the smallest RMSD
from the original structure with ADPR bound, 6w02 chain B
- b gives total pocket volume (there could be >1 pocket)
across the trajectory
- the volumes calculated by pyKVFinder (346.8 ± 78.7 Å³) and
parKVFinder (346.5 ± 79.3 Å³) are close to the molecular-surface volume
of ADPR (351.1 Å³) in 6w02 chain B from YASARA
(ChimeraX surf + measure volume 426.9 Å³, Chimera 409.4 Å³)
- pyKVFinder "Standard workflow" calculates pocket area and volume,
"Full workflow" also calculates depth and hydropathy
- par/pyKVFinder with grid spacing 0.8 Å, outer probe 6 Å,
volume cutoff 25 Å³, all pockets ≥50% in the box
enclosing ADPR in 6w02 chain B
- desktop computer with 6-core 3.4 GHz AMD Ryzen 5 2600 processor, 32GB
RAM, Ubuntu 20.04.2, max # threads available
- parKVFinder was faster than fpocket in this benchmark
(fpocket was faster in the parKVFinder paper's benchmark)
- CASTp on 6w02 chain B gives biggest pocket MS vol 844, area 532,
SAS volume 238, area 338 [Chimera session] ...2020 structure not in db, must
upload to server, wait for results, then download files "for Pymol"
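The "mean ± SD of total pocket volume across the trajectory" figures above can be reproduced in form with a few lines of Python. A minimal sketch, assuming per-frame results are available as dictionaries of pocket name to volume; the pocket names and volumes below are made-up values, and using the sample standard deviation is an assumption (the notes do not say which form the paper used).

```python
from statistics import mean, stdev


def summarize_total_volume(frames):
    """frames: list of dicts mapping pocket name -> volume (Å^3) for one frame.

    Returns (mean, sample standard deviation) of the per-frame total volume.
    """
    totals = [sum(pockets.values()) for pockets in frames]
    return mean(totals), stdev(totals)


# Made-up per-frame results; a frame can contain more than one pocket.
frames = [
    {"KAA": 300.0, "KAB": 40.0},
    {"KAA": 350.0},
    {"KAA": 280.0, "KAB": 80.0},
]
avg, sd = summarize_total_volume(frames)
```

Summing before averaging matters here: a frame whose site splits into two pockets contributes their combined volume, matching the "total pocket volume" plotted in panel b.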
pyKVFinder vs. parKVFinder
- both are parallelized, but pyKVFinder is (even) faster due to "additional
possibility to parallelize routines"
- 2021 paper says that
"pyKVFinder will undergo continuous improvements and updates"
and its github shows very recent activity (2024);
parKVFinder also shows fairly recent github activity but less so (a year ago)
- "experienced users requiring scripting routines are encouraged to use
pyKVFinder due to its improved performance, while newcomers should prioritize
parKVFinder due to its simplicity of installation and execution"