The SPOKE team can take or leave any of the thoughts here as it sees fit. Sui may be entering his recommendations in the SPOKE GitHub, issue #69. Not discussed here are pathway boundaries or the precise definition of a pathway; presumably we would take those from the chosen data source(s).
Representing pathways with explicit edges introduces many complications, as discussed below. Note that most (all?) of these edges should already be present as Protein-interacts-Protein. In my opinion, it's more important to first restore the missing Gene-participates-Pathway edges (GitHub #51).
Meaning of Edges
Meaning of Nodes
Convergence
Context
Meaning of Edges
If we represent pathways with explicit edges,
then what the edges mean becomes important.
For metabolic pathways, perhaps it boils down to:
However, for signaling pathways, there are several possibilities, including:
Maybe the following types of edges would be enough to capture both adjacency and directionality (assuming NE search honored directionality, GitHub #23):
Meaning of Nodes
The functional entity within a pathway is often a macromolecular complex
rather than a single gene product. We don't have a way to represent
that currently; could this bioentities ontology (Pico group, Gladstone) help?
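
One way to picture what representing a complex might involve is the rough Python sketch below. It is only an illustration: the ComplexNode class, the MEMBER_OF edge label, and the gene and complex identifiers are all hypothetical, and none of them are existing SPOKE node or edge types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexNode:
    """Hypothetical node type for a macromolecular complex (not an existing SPOKE type)."""
    identifier: str
    name: str
    members: frozenset  # identifiers of the gene products that make up the complex


def member_edges(complex_node):
    """Return (gene, edge_label, complex) triples linking each member to the complex."""
    return [
        (gene, "MEMBER_OF", complex_node.identifier)  # MEMBER_OF is an assumed label
        for gene in sorted(complex_node.members)
    ]


# Placeholder identifiers, for illustration only.
example_complex = ComplexNode(
    identifier="COMPLEX:0001",
    name="Example complex",
    members=frozenset({"GeneA", "GeneB", "GeneC"}),
)
example_edges = member_edges(example_complex)
```

With something like this, pathway edges could point at the complex node rather than at any single member, while the member edges keep the link back to the individual gene products.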
Convergence
Signaling pathways frequently have a nonlinear structure: for example,
multiple binding and/or modification steps may be required to activate
a single biological entity. Multiple edges would then need to converge on a
single node, and no incoming edge alone would be sufficient
for activation, only their collective presence (or, more accurately,
some boolean combination of the presence or activity of some upstream actors
and the absence or inactivity of others).
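
To make the convergence problem concrete, here is a minimal Python sketch of an activation rule attached to a node. It assumes the simplest useful case, a conjunction in which every required upstream actor is active and every inhibiting actor is inactive; a real pathway might need a more general boolean expression. The ActivationRule class and the actor names are hypothetical and are not part of SPOKE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivationRule:
    """Hypothetical rule: the target is active only if all required upstream
    actors are active and none of the inhibiting actors are active."""
    required_active: frozenset
    required_inactive: frozenset

    def is_satisfied(self, active_actors):
        # Every required actor must be present in the active set ...
        all_present = self.required_active <= active_actors
        # ... and no inhibiting actor may be present in it.
        none_blocking = not (self.required_inactive & active_actors)
        return all_present and none_blocking


# Placeholder actor names, for illustration only.
rule = ActivationRule(
    required_active=frozenset({"KinaseA", "AdaptorB"}),
    required_inactive=frozenset({"PhosphataseC"}),
)

both_present = rule.is_satisfied({"KinaseA", "AdaptorB"})
one_missing = rule.is_satisfied({"KinaseA"})
inhibitor_present = rule.is_satisfied({"KinaseA", "AdaptorB", "PhosphataseC"})
```

The point of the sketch is that the rule belongs to the target node rather than to any one incoming edge, which is why a set of independent pairwise edges cannot express it by themselves.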