PATHS query - getting the absolute shortest path to a variable node

Hi.

I’m trying to get the shortest paths using the paths query. However it’s returning all the shortests paths to anything that matches the end node, instead of the shortest og those again.

Shortest path from A to ?node.

Aproximate results:
(A) -> (B)
(A) -> (B) -> (C)

So I can use limit(1), which returns the shortest of those again. However I would like to expand my query to be something of ther order of:

Shortest path from all nodes that match a query to any node that match a different query.

My usecase is that our users make requests to view a document. I need to forward that request to the correct organisation. A document can be linked to an organisation, which is a child of another organisation, which again is a child of yet another organisation which has an organisation number that I can use to send the document to. So I would like to get a result of documents for the user with the corresponding organisation number for the first organisation when traversing the inverse child (eg. parent) predicate from organisation to organisation.

I can do this with a fairly complex !EXISTS filter on the end node, but perfomance on that isn’t very good.

Cheers,
Håvard

Hi Håvard,

It sounds like your problem is that Stardog enforces the shortest path condition independently for each pair of start and end nodes. I.e. if your query’s start and end predicates match multiple nodes, you may get paths of different lengths for different pairs. I.e. (A) -> (B) -> (C) is a valid shortest path from A to C even though it goes through B and B is also a valid end (of a different path).

There’s no easy way around that. The semantics ensures that paths can be returned in a streaming fashion (even traversed concurrently and independently) for different start nodes. It also ensures that the engine can evaluate paths from start to end or from end to start, i.e. it’s completely symmetric. If you make the shortest path condition over all end nodes (i.e. only one path for every start node) or vice versa, it’d break it.

Now back to your use case: do you want to write a query for paths which start at a set of documents (matched by a pattern) and end at organizations s.t. there’s only one (shortest) path (i.e. an org) returned for each document? Or do you want a single (shortest) path for all documents and organizations?

Cheers,
Pavel

Hi Pavel

Thank you for replying.

I would like for each node that matches start, exactly 1 node that matches end. Where the distance between those nodes is the shortest possible.

Lets assume START: {S1, S2} and END: {E1, E2, E3, E4} and the domain is {S1, S2, E1, E2, E3, E4, A, B, C} and we have the following paths

S1 -> A -> B -> E1 -> E2
S2 -> E3 -> E4 -> C

A query would return the following
S1 -> A -> B -> E1
S1 -> A -> B -> E1 -> E2
S2 -> E3
S2 -> E3 -> E4

Of which is want the following:
S1 -> A -> B -> E1
S2 -> E3

So that I can get S1 : E1 and S2 : E3.

Håvard

Yeah, this is what I meant by “there’s only one (shortest) path (i.e. an org) returned for each document”.

We don’t have a good server-side solution within the current model. You’d need to do filtering on the client.

You can use ORDER BY either start or end variable to get paths grouped so filtering is easy and cheap for the client (assuming not very many organizations per document). ORDER BY itself will add server overhead though.

Later we’ll add support for path patterns which you’d be able to use in normal select queries. That, combined with aggregation (min function on path length grouped by the start variable) will give the results that you want.

If we come up with a better workaround meanwhile, I’ll let you know.

Cheers,
Pavel

Thanks Pavel.

At the moment we’re doing multiple queries with limit 1 instead. Though the order by approach might be better, need to do some testing since we support up to 100 documents as START. Will the semantics of the paths query always return the shortest of the shortest paths first? (Like breadth-first-search)

For performance it would be great if your plans for supporting paths in select queries together with min aggregation would be optimised so that it doesn’t generate all the shortest paths under the hood.

Cheers,
Håvard

Will the semantics of the paths query always return the shortest of the shortest paths first? (Like breadth-first-search)

This isn't formally guaranteed. Indeed Stardog runs BFS under the hood but the engine chooses the direction. It tries to determine which pattern is more selective (start pattern or end pattern) and start from there. So if you have A -> ... -> B path (shortest for {A, B}) and A -> ... -> B ->...-> C path (shortest for {A,C}) you may, in theory, get the latter first if the engine decided to traverse backwards. Then BFS would start from C, reach B but won't stop (since B isn't a start node) and get to A. Then, at a later iteration, it'd start from B and get to A via a shorter path.

If your start is a constant but end is a pattern, then it'll always be evaluated in the forward direction (the opposite is also true). We also think about a query hint to specify direction (but hint is a hint, it's not usually a great idea to make hints a part of the semantics).

For performance it would be great if your plans for supporting paths in select queries together with min aggregation would be optimised so that it doesn’t generate all the shortest paths under the hood.

Definitely. If there's a group by on the start variable then it's the obvious choice for the engine to evaluate paths forward so they all come out grouped naturally. Then, assumed we still use BFS, returning only the shortest over all valid end nodes would be trivial.

Cheers,
Pavel

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.