Bind join algorithm

hmottestad · February 13, 2019, 5:45pm

Hi,

I was wondering what the “bind join” algorithm is in the new release? Does it help with nested optionals by any chance?

And example of when the bind join algorithm is the most efficient choice would be great

Cheers,
Håvard

pavel · February 13, 2019, 8:12pm

Hi Håvard,

Right, this is a new algorithm in Stardog 6.1.1. It's a variation of the loop join algorithm but instead of doing a (nested) loop over the right relation for each left tuple, it "binds" the join key variables on the right to their values in the left tuple and then re-evaluates the resulting relation. It's expected to be useful when the left join operator produces only few tuples, the right operator produces a lot of tuples, and also neither of the operators produces results sorted by the join key (otherwise more efficient options are available).

Actually, it's motivated by the recent thread here, and the bind join (i believe) will be used in this version of the query:

SELECT ?predicate
WHERE
{  
   { select distinct ?predicate { [] ?predicate [] } }
    ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://example.org/SomeClass> .
    ?instance ?predicate ?any.
}

Here the number of predicates is (usually) low, but the number of instances of SomeClass is very high. The only algorithm which avoids iteration over all instances of SomeClass (e.g. sorting it or hashing it) is the bind join which simply iterates over all predicates and for each value binds it to ?predicate in ?instance ?predicate ?any. and re-evaluates the pattern

    ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://example.org/SomeClass> .
    ?instance :some-predicate ?any.

It's still not as efficient as the EXISTS formulation I suggested in that thread, but that has to do with handling of DISTINCT which is something I have in my list for this spring.

I don't think it specifically helps with nested optionals but it might in specific cases. Also since it's a new feature the optimiser may be overly cautious for the time being and avoid the bind join where it might be faster than the more mature alternatives. We'll tune the cost model as we gather more experience. Also, it's possible to disable the new join algorithm with #pragma join.bind off hint in case you suspect it's a wrong thing for a particular query. It's not possible to force the optimiser to choose it over other join algorithms though (except indirectly by disabling other algorithms).

Cheers,
Pavel

PS. The "bind join" name isn't our invention, it's not the most well-known join algorithm but can be found in literature, cf. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.7606&rep=rep1&type=pdf

system · February 27, 2019, 8:12pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Query with wrong results using rules Support	20	568	January 31, 2020
BIND of a variable with CONCATs Bug	4	869	July 24, 2018
Insert query always using loop join Support	10	621	February 14, 2018
Forcing join order in StarDog Feature Request	1	352	February 28, 2022
Follow up on stale reads when using reasoning Bug	2	459	February 14, 2017

Bind join algorithm

Related topics