I have a query that has multiple FROM clauses and a property path expression using the ‘+’ (one or more) operator, but is otherwise fairly innocuous:
SELECT ?sampleIRI
FROM <tag:stardog:api:context:default>
FROM <http://purl.org/net/grafli#tbox>
WHERE {
    ?sampleIRI a <http://purl.org/net/grafli#CollectedSample> ;
        <http://purl.org/net/grafli#wasDerivedFrom>/<http://purl.org/net/grafli#isClassifiedBy> <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688> .
    ?a a <http://purl.org/net/grafli#Analysis> ;
        <http://purl.org/net/grafli#analysisSummary> ?qpureSummary ;
        <http://purl.org/net/grafli#dateCreated> ?qpureDate ;
        <http://purl.org/net/grafli#hasAnalysisType> <http://purl.org/net/grafli/analysistype#qpure> ;
        <http://purl.org/net/grafli#wasDerivedFrom>+ ?sampleIRI .
}
On a 5.0.3 database with 5M triples, the query runs in under 200 ms and returns 260 results, as expected. Against the same dataset on version 5.0.4, the query never returns; I gave up after 10 minutes. Further, after I ran query kill on it, the query stayed in the output of query list with Terminating status for an hour, until I restarted the server.
The query plans are essentially the same (they differ only in the generated variable name and the order of the scans under NaryJoin). For version 5.0.3:
From <http://purl.org/net/grafli#tbox>
From default
Projection(?sampleIRI) [#296165708383.9M]
`─ MergeJoin(?sampleIRI) [#296165708383.9M]
   +─ HashJoin(?tinsuyke) [#1276.4M]
   │  +─ MergeJoin(?sampleIRI) [#18K]
   │  │  +─ Scan[POSC](?sampleIRI, rdf:type, <http://purl.org/net/grafli#CollectedSample>) [#1]
   │  │  `─ Scan[PSOC](?sampleIRI, <http://purl.org/net/grafli#wasDerivedFrom>, ?tinsuyke) [#1]
   │  `─ Scan[POS](?tinsuyke, <http://purl.org/net/grafli#isClassifiedBy>, <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688>) [#1]
   `─ Sort(?sampleIRI) [#7.4K]
      `─ MergeJoin(?a) [#7.4K]
         +─ PropertyPath(?a -> ?sampleIRI, minLength=1, sorted by=?a) [#2]
         │  `─ Scan[PSOC](?a, <http://purl.org/net/grafli#wasDerivedFrom>, ?sampleIRI) [#1]
         `─ NaryJoin(?a) [#1.9K]
            +─ Scan[PSC](?a, <http://purl.org/net/grafli#dateCreated>, _) [#1]
            +─ Scan[POSC](?a, rdf:type, <http://purl.org/net/grafli#Analysis>) [#1]
            +─ Scan[PSC](?a, <http://purl.org/net/grafli#analysisSummary>, _) [#1]
            `─ Scan[POSC](?a, <http://purl.org/net/grafli#hasAnalysisType>, <http://purl.org/net/grafli/analysistype#qpure>) [#1]
and 5.0.4:
From <http://purl.org/net/grafli#tbox>
From default
Projection(?sampleIRI) [#296165708383.9M]
`─ MergeJoin(?sampleIRI) [#296165708383.9M]
   +─ HashJoin(?ekztyovu) [#1276.4M]
   │  +─ MergeJoin(?sampleIRI) [#18K]
   │  │  +─ Scan[POSC](?sampleIRI, rdf:type, <http://purl.org/net/grafli#CollectedSample>) [#1]
   │  │  `─ Scan[PSOC](?sampleIRI, <http://purl.org/net/grafli#wasDerivedFrom>, ?ekztyovu) [#1]
   │  `─ Scan[POS](?ekztyovu, <http://purl.org/net/grafli#isClassifiedBy>, <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688>) [#1]
   `─ Sort(?sampleIRI) [#7.4K]
      `─ MergeJoin(?a) [#7.4K]
         +─ PropertyPath(?a -> ?sampleIRI, minLength=1, sorted by=?a) [#2]
         │  `─ Scan[PSOC](?a, <http://purl.org/net/grafli#wasDerivedFrom>, ?sampleIRI) [#1]
         `─ NaryJoin(?a) [#1.9K]
            +─ Scan[POSC](?a, <http://purl.org/net/grafli#hasAnalysisType>, <http://purl.org/net/grafli/analysistype#qpure>) [#1]
            +─ Scan[POSC](?a, rdf:type, <http://purl.org/net/grafli#Analysis>) [#1]
            +─ Scan[PSC](?a, <http://purl.org/net/grafli#dateCreated>, _) [#1]
            `─ Scan[PSC](?a, <http://purl.org/net/grafli#analysisSummary>, _) [#1]
Additional info:
- Both the 5.0.3 and 5.0.4 databases have query.pp.contexts = false.
- Removing the second FROM clause (which is redundant for this particular query because none of the matching triples live in that named graph) makes the query run fast again on 5.0.4.
Hypothesis:
- Some bad interaction between the multiple FROM clauses and the property path expression, perhaps caused by the 5.0.4 changes to query.pp.contexts that are mentioned in the release notes.
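For reference, here is a minimal Python sketch of the standard SPARQL 1.1 semantics involved, not Stardog's implementation: with several FROM clauses the default graph is the merge of the listed graphs, and a `+` path returns every node reachable through one or more edges. All IRIs and triples below are hypothetical placeholders, not the real data. Under these semantics, dropping a FROM graph that contains no wasDerivedFrom triples cannot change the path results, so only the execution, not the answer, should differ.

```python
from collections import deque

# Hypothetical shortened predicate name standing in for grafli:wasDerivedFrom.
DERIVED = "grafli:wasDerivedFrom"


def default_graph(*graphs):
    """With multiple FROM clauses, the default graph is the RDF merge of the
    listed graphs; for ground triples (no blank nodes) that is their union."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def one_or_more(triples, predicate, start):
    """Evaluate `start predicate+ ?x`: every node reachable from `start` via
    one or more `predicate` edges (breadth-first search, safe on cycles)."""
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, set()).add(o)
    reached = set()
    queue = deque(edges.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in reached:
            continue  # already visited; this is what stops cycles
        reached.add(node)
        queue.extend(edges.get(node, ()))
    return reached


# Hypothetical toy data: an analysis derived from an extract derived from a sample.
default_ctx = {
    ("ex:analysis1", DERIVED, "ex:extract1"),
    ("ex:extract1", DERIVED, "ex:sample1"),
    ("ex:sample1", "rdf:type", "grafli:CollectedSample"),
}
# Hypothetical tbox graph with no wasDerivedFrom triples.
tbox = {("grafli:CollectedSample", "rdfs:subClassOf", "grafli:Sample")}

with_tbox = one_or_more(default_graph(default_ctx, tbox), DERIVED, "ex:analysis1")
without_tbox = one_or_more(default_graph(default_ctx), DERIVED, "ex:analysis1")
print(sorted(with_tbox))           # ['ex:extract1', 'ex:sample1']
print(with_tbox == without_tbox)   # True
```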