I'm running into a similar issue discussed in PATHS query - getting the absolute shortest path to a variable node regarding when the path end is specified via a variable binding or a pattern. Namely, instead of enforcing the shortest path condition independently for each pair of start and end nodes (which can result in paths that are continuations of prior paths with a different end node), it is in many cases more useful to return only the shortest paths from the start node(s) to the first node(s) that meets the end pattern/variable, even if there are other nodes that match the end pattern/variable if the path continued to be traversed. For example, given the following graph where S*
is a start node and E*
matches the end pattern/variable:
S1 → A → B → E1 → E2
↓
S2 → E3 → E4
The current behavior would return six results- four for S1
and two for S2
. I'm only interested in three of those six: S1 → A → B → E1
, S1 → A → E3
, and S2 → E3
, as these are all the paths from the start nodes to the first matching end node.
I'm curious if this behavior could be added, perhaps by adding a keyword to start/end clauses in the query to enable this behavior. I'm curious about technical issues with doing so as discussed in the linked thread. For example:
- The linked thread says that "the semantics ensures that paths can be returned in a streaming fashion (even traversed concurrently and independently) for different start nodes". Assuming that you don't allow applying this "terminate at first match" behavior to both start and end patterns at the same time, at present I can't think of a reason why this would prevent the ability to run different start nodes concurrently or independently.
- The linked thread says that "the semantics ensures... that the engine can evaluate paths from start to end or from end to start, i.e. it’s completely symmetric. If you make the shortest path condition over all end nodes (i.e. only one path for every start node) or vice versa, it’d break it". Two comments about this one:
- This is not the case for this variation of the issue, as pairwise shortest paths between start and end are not a requirement.
- I agree that the ability to evaluate from end to start would have to be changed if this behavior was applied, but that simply means that whichever end that doesn't have the behavior applied would have to be the end from which evaluation starts. If you don't allow this behavior to be applied at both start and end, I can't think of an issue with this at present.
- The linked thread notes about a future addition: "If there's a
group by
on the start variable then it's the obvious choice for the engine to evaluate paths forward so they all come out grouped naturally. Then, assumed we still use BFS, returning only the shortest over all valid end nodes would be trivial". This would still be applicable in this case and would be a great addition but would only work efficiently iff grouping was applied to the to the same end to which this behavior was applied. If this were applied to the example provided above, it would result in two paths:S1 → A → E3
andS2 → E3
, eliminatingS1 → A → B → E1
.
Additionally, adding this behavior as an option should improve performance to a degree compared to the current behavior. As your documentation says:
However, when the query uses start or end patterns, it may return paths of different lengths for different start or end nodes. For example, a path from
:A
to:B
may be longer in the results than a path from:A
to:C
. This has implications for performance since the engine cannot avoid exploring paths which are longer than those already found because they may end at a different node. We recommend using theMAX LENGTH
keyword in path queries with start or end patterns.
Given that the goal here is to explicitly avoid exploring paths longer than those already found, the engine can avoid exploring paths which are longer than those already found regardless of whether they end at a different node that also matches the end pattern/variable. BFS would simply stop for that branch of the graph on the first match it hits.
Curious what your thoughts are.