In Stardog Studio I tried pulling up the visualisation for a specific resource, but it eventually timed out after five minutes. The dataset is large, so I was expecting to see some performance issues as I evaluated things.
Having looked at the query generated in Studio and played around a bit, I've narrowed it down to a specific part which seems to be causing undue overhead for this dataset. However, I'm not sure what the best route forward is, either for this type of query pattern in general or for applying it to visualising items in such a dataset, so I'm interested in some input.
If you run the query without the two OPTIONAL clauses, it runs in under 400 ms and returns 17 rows. Adding either OPTIONAL on its own runs in about the same time and may add a few rows because the properties are used repeatedly. Things go bad when both are used together. The query plan for that query is shown below; as you can see, combining the two clauses with the primary match on the IRI involves hash joins with 3B in the first instance and 450M in the second. This is with Stardog 7.6.4.
Thanks, Tony. This is a well-spotted issue indeed, and one we need to look into. I cannot offer any immediate workaround that would work with the Viz feature in Studio, but I'd appreciate it if you could try the following hint when you run the query directly:
Gave that one a try and it didn't finish in time either. Query plan below. Would any other part of the statistics generation affect the query planner's decision?
Well, kind of: we don't have accurate cardinality estimations when the variable in the predicate position is not bound (?predicate). But it's mostly about choosing the wrong join algorithms here. OK, can you also add #pragma join.bind off on the line after the previous pragma? I am just trying to verify that it can run faster with proper joins...
I went back to the larger query and added in the other two OPTIONAL clauses from the first part of the UNION in the visualisation query, and that then fails to return. I can play with those and the other related hints and see if some combination works; I wasn't sure whether I might need to place them on a different line to affect certain levels of the query.
I went back to the full visualisation query and got it to work; I had to reorder the OPTIONAL clauses. The one with "tag:stardog:studio:label" was causing a different join (HashJoinOuter), but moving it last made it use a merge join throughout.
Right, this query should run without any hash joins. It's very possible that adding #pragma join.hash off as well would have worked too, and you wouldn't need to reorder optionals (that generally doesn't guarantee any specific plan).
I have made a note about this. Obviously you shouldn't need to use these hints for this query; we will look into the issue.
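For anyone landing here later, this is a minimal sketch of how the two hints named above might be combined when running the query directly. The query shape is reconstructed from the 8.0 plan further down (IRI, variable names, LIMIT), not from the actual Studio-generated query, which isn't reproduced in this thread. Putting the hints at the top of the WHERE group is an assumption about how hint scoping works, so treat it as a starting point rather than a confirmed fix.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?predicate ?object ?object_type ?object_label_1
WHERE {
  # Hints from this thread, one per line (placement here is an assumption).
  #pragma join.hash off
  #pragma join.bind off

  # Illustrative resource taken from the Yago2s plan in this thread.
  BIND(<http://yago-knowledge.org/resource/wordnet_person_100007846> AS ?iri)
  ?iri ?predicate ?object .

  # The two OPTIONALs that were slow when used together.
  OPTIONAL { ?object rdf:type ?object_type }
  OPTIONAL { ?object rdfs:label ?object_label_1 }
}
LIMIT 1000
```

Comparing the query plan with and without the pragma lines should show whether the hash joins were actually replaced; as noted above, reordering the OPTIONALs alone doesn't guarantee any specific plan.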
I know it's been a long time, but I just came across this old ticket and decided to check how the newly released Stardog 8.0 does on it. It works as expected (I used the Yago2s KG):
From local
From named local named
Slice(offset=0, limit=1000) [#21]
`─ Projection(?subject, ?predicate, ?predicate_2, ?object, ?object_type, ?object_label_1) [#21]
   `─ DirectHashJoinOuter(?object) [#21]
      +─ DirectHashJoinOuter(?object) [#21]
      │  +─ Bind(<http://yago-knowledge.org/resource/wordnet_person_100007846> AS ?iri) sortedBy=?iri [#21]
      │  │  `─ Scan[SPOC](<http://yago-knowledge.org/resource/wordnet_person_100007846>, ?predicate, ?object) [#21]
      │  `─ Scan[PSOC](?object, rdf:type, ?object_type) [#9.0M]
      `─ Scan[PSOC](?object, rdfs:label, ?object_label_1) [#7.7M]