Representing Pathways in SPOKE

The thoughts here are suggestions for the SPOKE team to take or leave as it sees fit. Sui may be entering his recommendations in the SPOKE github, issue #69. Not discussed here are pathway boundaries or the precise definition of a pathway; presumably we would take those from the chosen data source(s).

Representing pathways with explicit edges introduces many complications, as discussed below. Note that most (all?) of these edges should already be present as Protein-interacts-Protein. In my opinion, it's more important to first restore the missing Gene-participates-Pathway edges (github #51).

Meaning of Edges
Meaning of Nodes
Convergence
Context

Meaning of Edges
If we represent pathways with explicit edges, then what the edges mean becomes important. For metabolic pathways, perhaps it boils down to:

sequential processing of metabolites

However, for signaling pathways, there are several possibilities, including:

binding activates
binding inhibits
modification (phosphorylation, acetylation, proteolysis, ...) activates
modification inhibits
binding or modification enables some other binding or modification... but is “enables” the same as “activates”? Think of a scaffolding protein, for example, or a specific phosphorylation event that does not yet activate the protein but enables a subsequent phosphorylation that does activate the protein.

Maybe the following types of edges would be enough to capture both adjacency and directionality (that is, if NE search honored directionality, github #23):

upstreamof
downstreamof

...with annotation to indicate: activates, inhibits, maybe others.
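
As a rough illustration of the scheme above, the following Python sketch stores each relationship once as a directed edge with an effect annotation, and answers both "upstream of" and "downstream of" queries from that single record. The class name, field names, and the entities A through D are all hypothetical placeholders, not part of the SPOKE schema or any real pathway.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PathwayEdge:
    """One directed pathway edge (hypothetical structure, not SPOKE's schema)."""
    source: str   # the upstream entity
    target: str   # the downstream entity
    effect: str   # annotation such as "activates" or "inhibits"
    pathway: str  # pathway the edge belongs to


def upstream_of(edges: List[PathwayEdge], node: str) -> List[Tuple[str, str]]:
    """Return (entity, effect) for every edge that points into `node`."""
    return [(e.source, e.effect) for e in edges if e.target == node]


def downstream_of(edges: List[PathwayEdge], node: str) -> List[Tuple[str, str]]:
    """Return (entity, effect) for every edge that leaves `node`."""
    return [(e.target, e.effect) for e in edges if e.source == node]


# Placeholder entities only; no real biology is implied.
edges = [
    PathwayEdge("A", "B", "activates", "toy pathway"),
    PathwayEdge("C", "B", "inhibits", "toy pathway"),
    PathwayEdge("B", "D", "activates", "toy pathway"),
]
```

Storing one directed edge and deriving the reverse view avoids keeping paired upstreamof and downstreamof records in sync, but it only helps if the search tool honors edge direction.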

Meaning of Nodes
The functional entity within a pathway is often a macromolecular complex rather than a single gene product. We don't have a way to represent that currently; could this bioentities ontology (Pico group, Gladstone) help?

Bioentities (github) – a namespace encoding hierarchical relationships between proteins, protein families, and protein complexes

Convergence
Signaling pathways frequently have a nonlinear structure: for example, multiple binding and/or modification steps may be required to activate a single biological entity. Multiple edges would then need to converge on a single node, where no incoming edge alone is sufficient for activation, only their collective presence (or more accurately, some boolean combination of the presence or activity of some upstream actors and the absence or inactivity of others).
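
To make the convergence problem concrete, here is a minimal Python sketch in which a node carries an activation rule rather than relying on independent incoming edges. It covers only the simplest case (all of one set active AND none of another set active); a general boolean combination would need a richer rule representation. The function name and the entity names are hypothetical.

```python
from typing import Callable, Dict, Iterable

# Maps entity name -> True (present/active) or False (absent/inactive).
State = Dict[str, bool]


def make_rule(required_active: Iterable[str],
              required_inactive: Iterable[str]) -> Callable[[State], bool]:
    """Build a rule that holds only when every entity in `required_active`
    is active and no entity in `required_inactive` is active.
    Entities missing from the state are treated as inactive."""
    active = list(required_active)
    inactive = list(required_inactive)

    def rule(state: State) -> bool:
        return (all(state.get(name, False) for name in active)
                and not any(state.get(name, False) for name in inactive))

    return rule


# Hypothetical target that needs two upstream actors and the absence of a third.
target_rule = make_rule(required_active=["kinase_1", "scaffold"],
                        required_inactive=["phosphatase"])
```

Under this rule, neither kinase_1 nor scaffold alone activates the target, which is exactly the information that a set of independent "activates" edges would lose.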

Context
A biological entity may participate in multiple pathways, depending on its state or localization. Further, an upstream modulator, by changing the state or localization of the downstream entity, could activate its role in one pathway but inhibit its role in a different pathway.
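
One way to picture the context problem is to key pathway participation on the entity together with its state and localization, rather than on the entity alone. The Python sketch below does that with a plain dictionary; the entity X, the state and localization labels, and the pathway names are hypothetical placeholders, and nothing here reflects how SPOKE currently stores data.

```python
from typing import Dict, List, Tuple

# (entity, state, localization) -> pathways the entity participates in
Context = Tuple[str, str, str]

participation: Dict[Context, List[str]] = {
    ("X", "phosphorylated", "nucleus"): ["pathway_1"],
    ("X", "unmodified", "cytoplasm"): ["pathway_2"],
}


def pathways_for(entity: str, state: str, localization: str) -> List[str]:
    """Pathways in which the entity participates in the given context."""
    return participation.get((entity, state, localization), [])


def effect_of_shift(entity: str,
                    before: Tuple[str, str],
                    after: Tuple[str, str]) -> Dict[str, List[str]]:
    """Compare pathway roles before and after a modulator changes the
    entity's (state, localization)."""
    old = set(pathways_for(entity, *before))
    new = set(pathways_for(entity, *after))
    return {"activated": sorted(new - old), "inhibited": sorted(old - new)}
```

For example, shifting X from ("unmodified", "cytoplasm") to ("phosphorylated", "nucleus") reports pathway_1 as activated and pathway_2 as inhibited, so a single upstream modulator has opposite effects on the two pathways.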

Elaine Meng / March 2021