SPOKE Nodes and Edges
SPOKE
(Scalable Precision Medicine Knowledge Engine)
is a very large network containing multiple types of biological data.
Pooling such diverse data into a single knowledge environment
makes it possible to identify new connections, with implications for
biomedical applications such as personalized medicine,
for example, suggesting which drugs may be effective for a specific patient.
An earlier version of the network was used to suggest new uses for existing
drugs (Himmelstein et al., 2017).
SPOKE is a heterogeneous network,
meaning that different nodes (points) within the network
can represent different types of data.
The edges between pairs of nodes represent known connections.
Paths that follow a series of edges may connect nodes not
previously known to be related.
Node and edge
types and their data sources are given below. Except as noted,
data are updated weekly on a rotating basis (different types on different days).
See also: source licenses
Node Types
- Anatomy – anatomical structures; all entries in Uberon, along with directed edges between them to reflect the hierarchy
- AnatomyCellType – Anatomy–Cell Type combinations with expression data in the Human Protein Atlas
- BiologicalProcess – biological process terms from Gene Ontology with 2–1000 annotated genes
- CellType – cell types from Cell Ontology for AnatomyCellType nodes
- CellularComponent – cellular component terms from Gene Ontology with 2–1000 annotated genes
- Compound – the union of the following:
  - approved small-molecule compounds in DrugBank with documented chemical structures
  - all compounds in ChEMBL
  - glycans from the Kyoto Encyclopedia of Genes and Genomes (KEGG)
- Disease – all entries in Disease Ontology, along with directed edges between them to reflect the hierarchy
- EC – Enzyme Commission numbers from pathway sources
- Food – from FoodOn
- Gene – protein-coding genes of select organisms from Entrez Gene
- MolecularFunction – molecular function terms from Gene Ontology with 2–1000 annotated genes
- Organism – all taxonomic levels from NCBI Taxonomy for Homo sapiens, bacteria with Pathway data, and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
- Pathway – human pathways from:
... human and canonical bacterial pathways from:
... bacterial pathways not represented in KEGG from:
- PharmacologicClass – from DrugCentral, the following annotation types:
  - FDA Chemical/Ingredient
  - FDA Mechanism of Action
  - FDA Physiologic Effect
- Protein – proteins of select organisms in UniProtKB
- ProteinDomain – from Pfam, domains in proteins
- ProteinFamily – from Pfam, families (clans) of protein domains
- Reaction – reactions in pathways
- SARSCov2 – SARS-CoV-2 proteins studied in Gordon et al., 2020
- SideEffect – all entries in SIDER
- Symptom – Medical Subject Headings (MeSH) terms from:
  - MeSH subtree C23.888 (Diseases / Pathological Conditions, Signs and Symptoms / Signs and Symptoms)
  - Human Phenotype Ontology disease-symptom data
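The three Gene Ontology node types keep only terms with 2–1000 annotated genes. The sketch below shows one way such a filter could be applied. It assumes annotations arrive as hypothetical (gene, term) pairs, counts each gene once per term, treats both bounds as inclusive, and counts direct annotations only; whether SPOKE propagates annotations up the GO hierarchy before counting is not stated on this page.

```python
from collections import defaultdict

def select_go_terms(annotations, min_genes=2, max_genes=1000):
    """Return the GO terms annotated to between min_genes and max_genes distinct genes.

    `annotations` is an iterable of (gene, term) pairs (hypothetical input format).
    Both bounds are treated as inclusive, which is an assumption.
    """
    genes_per_term = defaultdict(set)
    for gene, term in annotations:
        genes_per_term[term].add(gene)  # a set ignores duplicate annotations
    return {
        term
        for term, genes in genes_per_term.items()
        if min_genes <= len(genes) <= max_genes
    }

# Toy data with placeholder identifiers, not real GO annotations.
toy = [("gene1", "termA"), ("gene2", "termA"), ("gene1", "termB"), ("gene1", "termB")]
print(select_go_terms(toy))  # {'termA'}
```

Here termB is dropped because its two records name the same single gene.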
Edge Types
- Anatomy→contains→Anatomy – indicates a relationship of increasing specificity between Anatomy nodes from Uberon, for example, “sense organ” contains “eye”
- Anatomy-downregulates-Gene – from Bgee, edges for genes with log2FC (log2 fold change in transcripts per million) ≤ –10 in a tissue (Anatomy) vs. the average over all human adult datasets; annotated with values for log2FC, p-value, and false discovery rate according to the Benjamini-Hochberg procedure
- Anatomy-upregulates-Gene – from Bgee, edges for genes with log2FC (log2 fold change in transcripts per million) ≥ 10 in a tissue (Anatomy) vs. the average over all human adult datasets; annotated with values for log2FC, p-value, and false discovery rate according to the Benjamini-Hochberg procedure
- Anatomy-expresses-Gene – from Bgee, each gene-Anatomy (gene-tissue) pair with more records marked “present” than “absent”
- Anatomy→isa→Anatomy – indicates a relationship of increasing generality between Anatomy nodes from Uberon, for example, “eye” is a “sense organ”
- Anatomy→partof→Anatomy – edges between Anatomy nodes from Uberon indicating physical inclusion, for example, “brain” is part of “central nervous system”
- AnatomyCellType-expresses-Gene – gene expression in a specific cell type in a specific tissue from the Human Protein Atlas
- AnatomyCellType-isin-Anatomy – link between corresponding AnatomyCellType and Anatomy nodes
- AnatomyCellType-isin-CellType – link between corresponding AnatomyCellType and CellType nodes
- Compound-affects-(mutant)Gene – evidence for functional interaction of SPOKE compounds with mutant genes, from:
- Compound-binds-Protein – compound-protein binding relationships from the following sources:
  - BindingDB – compound binding to single proteins, annotated with affinity (preferring Kd, then Ki, then IC50, ignoring EC50, and taking the geometric mean when there were multiple values of the same measure)
  - DrugCentral targets
  - ChEMBL single-protein targets
- Compound-binds-ProteinDomain – from the Protein Common Interface Database (ProtCID) (Xu and Dunbrack, 2020)
- Compound-causes-SideEffect – from SIDER
- Compound-contraindicates-Disease – DrugCentral contraindications (more accurately, the compound is contraindicated by the disease)
- Compound-downregulates-Gene – the compound downregulates the gene according to consensus transcriptional profiles calculated from LINCS L1000 data
- Compound-upregulates-Gene – the compound upregulates the gene according to consensus transcriptional profiles calculated from LINCS L1000 data
- Compound-treats-Disease
- CoronavirusProtein-interacts-Protein – SARS-CoV-2–human protein-protein interactions from Gordon et al., 2020, as well as the known interaction between the SARS-CoV-2 spike protein and ACE2_HUMAN
- Disease-associates-Gene
  - Online Mendelian Inheritance in Man (OMIM) – see omim_spoke.pptx
    - keep relationships with the highest level of evidence (phenotype mapping key = 3) and known inheritance patterns
    - encode modifiers like “susceptibility for”
    - use inheritance patterns and modifiers to classify each relationship as somatic, non-Mendelian (low confidence), or Mendelian (moderate or high confidence)
    - keep only high-confidence relationships
  - DISEASES – annotated with source type:
  - GWAS Catalog – disease-gene pairs with data from at least 1000 samples and p-value < 5 × 10⁻⁸
- Disease→contains→Disease – indicates a relationship of increasing specificity between Disease nodes from Disease Ontology, for example, “cell type cancer” contains “melanoma”
- Disease→isa→Disease – indicates a relationship of increasing generality between Disease nodes from Disease Ontology, for example, “melanoma” is a “cell type cancer”
- Disease-localizes-Anatomy – MeSH term co-occurrence in MEDLINE papers at p-value < 0.005
- Disease-presents-Symptom – from:
- Disease-resembles-Disease – MeSH term co-occurrence in MEDLINE papers at p-value < 0.005
- EC-catalyzes-Reaction – from pathway sources
- EC→isa→EC – indicates a relationship of increasing generality from ExplorEnz
- Food-contains-Compound – from FooDB and Australian Food Composition Database
- Food→isa→Food – indicates a relationship of increasing generality from FoodOn
- Gene-encodes-Protein – from UniProt
- Gene-participates-BiologicalProcess – from Gene Ontology
- Gene-participates-CellularComponent – from Gene Ontology
- Gene-participates-MolecularFunction – from Gene Ontology
- Gene-participates-Pathway – see the first set of Pathway sources
- GeneProduct→downregulates→Gene – exogenous application of a protein or peptide gene product downregulates another gene according to consensus transcriptional profiles calculated from LINCS L1000 data (connects two Gene nodes)
- GeneProduct→upregulates→Gene – exogenous application of a protein or peptide gene product upregulates another gene according to consensus transcriptional profiles calculated from LINCS L1000 data (connects two Gene nodes)
- KGene→downregulates→Gene – relationships in which knockdown or knockout (using short hairpin RNA or CRISPR) of one gene downregulates another according to consensus transcriptional profiles calculated from LINCS L1000 data
- KGene→upregulates→Gene – relationships in which knockdown or knockout (using short hairpin RNA or CRISPR) of one gene upregulates another according to consensus transcriptional profiles calculated from LINCS L1000 data
- OGene→downregulates→Gene – relationships in which overexpression of one gene downregulates another according to consensus transcriptional profiles calculated from LINCS L1000 data
- OGene→upregulates→Gene – relationships in which overexpression of one gene upregulates another according to consensus transcriptional profiles calculated from LINCS L1000 data
- Organism-causes-Disease – human-reviewed relationships from PathoPhenoDB
- Organism→isa→Organism – from NCBI Taxonomy, indicates a relationship of increasing generality between organisms (e.g., a species belongs to a genus)
- Organism-encodes-Protein
- Organism-includes-Pathway – from pathway sources and the Pathosystems Resource Integration Center (PATRIC)
- Pathway→contains→Pathway – indicates a relationship of increasing specificity between pathways
- Pathway→isa→Pathway – indicates a relationship of increasing generality between pathways
- PharmacologicClass-includes-Compound – see the PharmacologicClass node sources
- Protein-decreasedin-Disease – from the Institute for Systems Biology, human proteins decreased in COVID-19 (Su et al., 2020)
- Protein-increasedin-Disease – from the Institute for Systems Biology, human proteins increased in COVID-19 (Su et al., 2020)
- Protein-interacts-Protein – protein-protein interactions from:
- Protein-has-EC – from UniProtKB
- ProteinDomain-interacts-ProteinDomain – from the Protein Common Interface Database (ProtCID) (Xu and Dunbrack, 2020)
- ProteinDomain-memberof-ProteinFamily – from Pfam
- ProteinDomain-partof-Protein – from InterPro
- Reaction-consumes-Compound – from pathway sources
- Reaction-produces-Compound – from pathway sources
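Several node types carry paired hierarchy edges: X→isa→Y states increasing generality and Y→contains→X states increasing specificity, as in “melanoma” is a “cell type cancer” and “cell type cancer” contains “melanoma”. The sketch below assumes the contains edges are simply the inverse of the isa edges (the page shows this only by example), derives one direction from the other, and collects every ancestor of a term. The edge list is toy data and the function names are hypothetical.

```python
from collections import defaultdict

def contains_from_isa(isa_edges):
    """Invert (child, parent) isa edges into (parent, child) contains edges."""
    return [(parent, child) for child, parent in isa_edges]

def ancestors(term, isa_edges):
    """Return every term reachable from `term` by following isa edges upward."""
    parents = defaultdict(set)
    for child, parent in isa_edges:
        parents[child].add(parent)
    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:  # avoid revisiting ancestors shared by several paths
                seen.add(parent)
                stack.append(parent)
    return seen

# Toy hierarchy for illustration only.
isa = [("melanoma", "cell type cancer"), ("cell type cancer", "cancer")]
print(contains_from_isa(isa))
# [('cell type cancer', 'melanoma'), ('cancer', 'cell type cancer')]
print(sorted(ancestors("melanoma", isa)))
# ['cancer', 'cell type cancer']
```

An explicit stack keeps the traversal iterative, so deep ontologies such as NCBI Taxonomy do not hit Python's recursion limit.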
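The Bgee up- and downregulation edges combine a log2 fold-change cutoff with Benjamini-Hochberg false discovery rates. The sketch below illustrates both steps under stated assumptions: per-gene TPM in one tissue and the mean TPM over all human adult datasets are already computed, p-values are supplied from elsewhere (the test that produces them is not described on this page), and a pseudocount of 1 TPM is added to avoid taking the log of zero. The pseudocount and all function names are hypothetical; the ±10 cutoff comes from the edge descriptions.

```python
import math

def log2_fold_change(tissue_tpm, mean_tpm, pseudocount=1.0):
    """log2 ratio of a gene's TPM in one tissue to its mean TPM over all adult datasets.

    The pseudocount is an assumption added to avoid log(0); it is not from the source.
    """
    return math.log2((tissue_tpm + pseudocount) / (mean_tpm + pseudocount))

def regulation_edge(log2fc, cutoff=10.0):
    """Map a log2FC to an Anatomy-*-Gene edge type, or None if within the cutoff."""
    if log2fc >= cutoff:
        return "upregulates"
    if log2fc <= -cutoff:
        return "downregulates"
    return None

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(regulation_edge(log2_fold_change(2047.0, 1.0)))  # upregulates
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.20])])
# [0.04, 0.0533, 0.0533, 0.2]
```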
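The BindingDB affinity rule prefers Kd, then Ki, then IC50, ignores EC50, and takes the geometric mean of repeated values of the same measure. A minimal sketch of that rule, assuming measurements arrive as (measure, value) pairs in a common unit such as nM with any qualifiers like “>” already removed; the input format and function name are hypothetical.

```python
from statistics import geometric_mean

# Most to least preferred; EC50 (and any other measure) is ignored.
PREFERENCE = ("Kd", "Ki", "IC50")

def summarize_affinity(measurements):
    """Return (measure, geometric-mean value) for the best available measure, or None."""
    measurements = list(measurements)  # allow generators, since we scan more than once
    for measure in PREFERENCE:
        # The geometric mean is only defined for positive values.
        values = [v for m, v in measurements if m == measure and v > 0]
        if values:
            return measure, geometric_mean(values)
    return None

pairs = [("IC50", 5.0), ("Ki", 10.0), ("Ki", 1000.0), ("EC50", 1.0)]
measure, value = summarize_affinity(pairs)
print(measure, round(value, 6))  # Ki 100.0
```

The IC50 value is ignored here because a Ki is available, and the two Ki values (10 and 1000) average geometrically to 100.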
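Two of the edge filters above reduce to simple predicates: a GWAS Catalog disease-gene pair is kept when it has data from at least 1000 samples and p < 5 × 10⁻⁸, and an Anatomy-expresses-Gene edge requires more “present” than “absent” Bgee records. The sketch assumes per-pair sample sizes, p-values, and lists of presence/absence calls are already extracted; the function names are hypothetical.

```python
GENOME_WIDE_P = 5e-8  # p-value cutoff from the GWAS Catalog filter
MIN_SAMPLES = 1000    # minimum number of samples from the GWAS Catalog filter

def keep_gwas_pair(sample_size, p_value):
    """True if a GWAS Catalog disease-gene pair passes both filters."""
    return sample_size >= MIN_SAMPLES and p_value < GENOME_WIDE_P

def anatomy_expresses_gene(calls):
    """True if a gene-tissue pair has more 'present' than 'absent' records."""
    present = sum(1 for call in calls if call == "present")
    absent = sum(1 for call in calls if call == "absent")
    return present > absent

print(keep_gwas_pair(1000, 1e-9), keep_gwas_pair(999, 1e-9))  # True False
print(anatomy_expresses_gene(["present", "absent", "present"]))  # True
```

A tie between “present” and “absent” records yields no edge, matching the “more … than” wording.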
Technical Notes
MeSH term co-occurrence.
The following were calculated for each pair of
Medical Subject Headings (MeSH) terms:
- co-occurrence – the number of articles in which the two terms co-occurred
- expected – the number of co-occurrences expected by chance, based on each term's marginal frequency
- enrichment – co-occurrence divided by expected
- p-value – the p-value from Fisher's exact test for whether the observed co-occurrence exceeded that expected by chance
Disease-localizes-Anatomy and
Disease-presents-Symptom calculations used only
papers in which the disease term was a major topic and the other term (anatomy or
symptom) was used directly (no “explosion” via the
MeSH tree). Disease and anatomy MeSH terms were taken from
Disease Ontology
and Uberon, respectively.
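The four co-occurrence statistics can be computed from article counts alone. The sketch below assumes counts for a single pair of MeSH terms: articles with both terms, articles with each term, and the total number of articles considered. It treats Fisher's exact test as one-sided (“exceeded that expected”), which for a 2×2 table equals the upper tail of a hypergeometric distribution. The function name and argument layout are hypothetical.

```python
from math import comb

def cooccurrence_stats(n_both, n_term1, n_term2, n_total):
    """Co-occurrence, expected, enrichment, and one-sided Fisher p-value for two terms."""
    expected = n_term1 * n_term2 / n_total  # from each term's marginal frequency
    enrichment = n_both / expected if expected else float("nan")
    # P(X >= n_both) for X ~ Hypergeometric(n_total, n_term1, n_term2),
    # i.e. the one-sided ("greater") Fisher's exact test on the 2x2 table.
    upper = min(n_term1, n_term2)
    tail = sum(
        comb(n_term1, k) * comb(n_total - n_term1, n_term2 - k)
        for k in range(n_both, upper + 1)
    )
    p_value = tail / comb(n_total, n_term2)
    return {
        "cooccurrence": n_both,
        "expected": expected,
        "enrichment": enrichment,
        "p_value": p_value,
    }

stats = cooccurrence_stats(n_both=3, n_term1=5, n_term2=5, n_total=20)
print(stats["expected"], round(stats["enrichment"], 3), round(stats["p_value"], 4))
# 1.25 2.4 0.0726
```

Exact big-integer binomials become slow at MEDLINE scale; `scipy.stats.hypergeom.sf(n_both - 1, n_total, n_term1, n_term2)` returns the same upper-tail probability, as does `scipy.stats.fisher_exact` with `alternative="greater"` on the table [[both, term1 only], [term2 only, neither]].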
References
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing.
Gordon DE, Jang GM, [...] Shoichet BK, Krogan NJ.
Nature. 2020 Apr 30.
Systematic integration of biomedical knowledge prioritizes drugs for repurposing.
Himmelstein DS, Lizee A, Hessler C, Brueggeman L, Chen SL, Hadley D, Green A, Khankhanian P, Baranzini SE.
eLife. 2017 Sep 22;6. pii: e26726.
Multi-omics resolves a sharp disease-state shift between mild and moderate COVID-19.
Su Y, Chen D, [...] Goldman JD, Heath JR.
Cell. 2020 Dec 10;183(6):1479-1495.e20.
ProtCID: a data resource for structural information on protein interactions.
Xu Q, Dunbrack RL Jr.
Nat Commun. 2020 Feb 5;11(1):711.