Bind join algorithm

Hi,

I was wondering what the “bind join” algorithm is in the new release? Does it help with nested optionals by any chance?

And example of when the bind join algorithm is the most efficient choice would be great :slight_smile:

Cheers,
Håvard

Hi Håvard,

Right, this is a new algorithm in Stardog 6.1.1. It's a variation of the loop join algorithm but instead of doing a (nested) loop over the right relation for each left tuple, it "binds" the join key variables on the right to their values in the left tuple and then re-evaluates the resulting relation. It's expected to be useful when the left join operator produces only few tuples, the right operator produces a lot of tuples, and also neither of the operators produces results sorted by the join key (otherwise more efficient options are available).

Actually, it's motivated by the recent thread here, and the bind join (i believe) will be used in this version of the query:

SELECT ?predicate
WHERE
{  
   { select distinct ?predicate { [] ?predicate [] } }
    ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://example.org/SomeClass> .
    ?instance ?predicate ?any.
}

Here the number of predicates is (usually) low, but the number of instances of SomeClass is very high. The only algorithm which avoids iteration over all instances of SomeClass (e.g. sorting it or hashing it) is the bind join which simply iterates over all predicates and for each value binds it to ?predicate in ?instance ?predicate ?any. and re-evaluates the pattern

    ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://example.org/SomeClass> .
    ?instance :some-predicate ?any.

It's still not as efficient as the EXISTS formulation I suggested in that thread, but that has to do with handling of DISTINCT which is something I have in my list for this spring.

I don't think it specifically helps with nested optionals but it might in specific cases. Also since it's a new feature the optimiser may be overly cautious for the time being and avoid the bind join where it might be faster than the more mature alternatives. We'll tune the cost model as we gather more experience. Also, it's possible to disable the new join algorithm with #pragma join.bind off hint in case you suspect it's a wrong thing for a particular query. It's not possible to force the optimiser to choose it over other join algorithms though (except indirectly by disabling other algorithms).

Cheers,
Pavel

PS. The "bind join" name isn't our invention, it's not the most well-known join algorithm but can be found in literature, cf. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.7606&rep=rep1&type=pdf

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.