Query time blows out > 1000x moving from 5.0.3 => 5.0.4

I have a query that has multiple FROM clauses and a property path expression with a ‘+’ wildcard in it, but is otherwise fairly innocuous:

SELECT
        ?sampleIRI
FROM
        <tag:stardog:api:context:default>
FROM
        <http://purl.org/net/grafli#tbox>
WHERE {
        ?sampleIRI a <http://purl.org/net/grafli#CollectedSample> ;
                <http://purl.org/net/grafli#wasDerivedFrom>/<http://purl.org/net/grafli#isClassifiedBy> <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688>
        .
        ?a a <http://purl.org/net/grafli#Analysis> ;
                <http://purl.org/net/grafli#analysisSummary> ?qpureSummary ;
                <http://purl.org/net/grafli#dateCreated> ?qpureDate ;
                <http://purl.org/net/grafli#hasAnalysisType> <http://purl.org/net/grafli/analysistype#qpure> ;
                <http://purl.org/net/grafli#wasDerivedFrom>+ ?sampleIRI .
}

On a 5.0.3 database with 5M triples the query runs in < 200ms and returns 260 results, as I expect. Against the same dataset on version 5.0.4 the query never returns; at least, it hadn’t returned after 10 minutes, when I gave up. Further, after I ran query kill on it, the query stayed in the query list results with Terminating status for an hour, until I restarted the server.

The query plan looks basically the same for version 5.0.3:

From <http://purl.org/net/grafli#tbox>
From default
Projection(?sampleIRI) [#296165708383.9M]
`─ MergeJoin(?sampleIRI) [#296165708383.9M]
   +─ HashJoin(?tinsuyke) [#1276.4M]
   │  +─ MergeJoin(?sampleIRI) [#18K]
   │  │  +─ Scan[POSC](?sampleIRI, rdf:type, <http://purl.org/net/grafli#CollectedSample>) [#1]
   │  │  `─ Scan[PSOC](?sampleIRI, <http://purl.org/net/grafli#wasDerivedFrom>, ?tinsuyke) [#1]
   │  `─ Scan[POS](?tinsuyke, <http://purl.org/net/grafli#isClassifiedBy>, <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688>) [#1]
   `─ Sort(?sampleIRI) [#7.4K]
      `─ MergeJoin(?a) [#7.4K]
         +─ PropertyPath(?a -> ?sampleIRI, minLength=1, sorted by=?a) [#2]
         │  `─ Scan[PSOC](?a, <http://purl.org/net/grafli#wasDerivedFrom>, ?sampleIRI) [#1]
         `─ NaryJoin(?a) [#1.9K]
            +─ Scan[PSC](?a, <http://purl.org/net/grafli#dateCreated>, _) [#1]
            +─ Scan[POSC](?a, rdf:type, <http://purl.org/net/grafli#Analysis>) [#1]
            +─ Scan[PSC](?a, <http://purl.org/net/grafli#analysisSummary>, _) [#1]
            `─ Scan[POSC](?a, <http://purl.org/net/grafli#hasAnalysisType>, <http://purl.org/net/grafli/analysistype#qpure>) [#1]

and 5.0.4:

From <http://purl.org/net/grafli#tbox>
From default
Projection(?sampleIRI) [#296165708383.9M]
`─ MergeJoin(?sampleIRI) [#296165708383.9M]
   +─ HashJoin(?ekztyovu) [#1276.4M]
   │  +─ MergeJoin(?sampleIRI) [#18K]
   │  │  +─ Scan[POSC](?sampleIRI, rdf:type, <http://purl.org/net/grafli#CollectedSample>) [#1]
   │  │  `─ Scan[PSOC](?sampleIRI, <http://purl.org/net/grafli#wasDerivedFrom>, ?ekztyovu) [#1]
   │  `─ Scan[POS](?ekztyovu, <http://purl.org/net/grafli#isClassifiedBy>, <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688>) [#1]
   `─ Sort(?sampleIRI) [#7.4K]
      `─ MergeJoin(?a) [#7.4K]
         +─ PropertyPath(?a -> ?sampleIRI, minLength=1, sorted by=?a) [#2]
         │  `─ Scan[PSOC](?a, <http://purl.org/net/grafli#wasDerivedFrom>, ?sampleIRI) [#1]
         `─ NaryJoin(?a) [#1.9K]
            +─ Scan[POSC](?a, <http://purl.org/net/grafli#hasAnalysisType>, <http://purl.org/net/grafli/analysistype#qpure>) [#1]
            +─ Scan[POSC](?a, rdf:type, <http://purl.org/net/grafli#Analysis>) [#1]
            +─ Scan[PSC](?a, <http://purl.org/net/grafli#dateCreated>, _) [#1]
            `─ Scan[PSC](?a, <http://purl.org/net/grafli#analysisSummary>, _) [#1]

Additional info:

  • Both the 5.0.3 and 5.0.4 databases have query.pp.contexts = false.
  • Removing the second FROM clause (which for this particular query happens to be redundant because none of the triples live in the second named graph) makes it run fast again on 5.0.4.
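
For reference, the variant that runs fast again on 5.0.4 can be sketched as the original query with the second FROM clause removed. It is only equivalent when, as here, none of the matching triples live in <http://purl.org/net/grafli#tbox>; on other data it may return fewer results.

```sparql
# Workaround sketch: same pattern as the original query, but with a single
# FROM clause. Assumes no relevant triples are stored in the tbox graph.
SELECT
        ?sampleIRI
FROM
        <tag:stardog:api:context:default>
WHERE {
        ?sampleIRI a <http://purl.org/net/grafli#CollectedSample> ;
                <http://purl.org/net/grafli#wasDerivedFrom>/<http://purl.org/net/grafli#isClassifiedBy> <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688>
        .
        ?a a <http://purl.org/net/grafli#Analysis> ;
                <http://purl.org/net/grafli#analysisSummary> ?qpureSummary ;
                <http://purl.org/net/grafli#dateCreated> ?qpureDate ;
                <http://purl.org/net/grafli#hasAnalysisType> <http://purl.org/net/grafli/analysistype#qpure> ;
                <http://purl.org/net/grafli#wasDerivedFrom>+ ?sampleIRI .
}
```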

Hypothesis:

  • Some badness between the multiple FROM clauses and the property path expression. Perhaps because of changes in 5.0.4 to query.pp.contexts, which I see mentioned in release notes.

Yes, it’s likely that the changes related to the interaction between property paths and contexts caused this. You may try to set query.pp.contexts = true with 5.0.4 to see if it resolves the problem. In the past couple of weeks I saw some cases where it made a difference.

If, by any chance, you are able to provide the data today (possibly in the obfuscated form), we’ll make sure this is resolved before 5.0.5 comes out (which might be later today). If you can do that, we can discuss it off list.

Thanks,
Pavel

Hi Pavel;
Here is the obfuscated dataset:
https://s3-ap-southeast-2.amazonaws.com/public-obfuscated-data-only/db-obfsc.trig
And the query on it that succeeds quickly (260 results) on 5.0.3 but times out on 5.0.4:
https://s3-ap-southeast-2.amazonaws.com/public-obfuscated-data-only/wildcard-multiple-FROM-clauses.sparql

Obviously too late for 5.0.5, sorry…

An update…
I tried this query on the same dataset with the new version 5.0.5, and I get a different result again: it now does return but it takes about 1 minute (compared to < 1s on 5.0.3) and has 299 results instead of 260.

Thanks, Conrad.

I can reproduce the behavior and am looking into it. I also see that removing the TBox graph makes the query quick.

One question: are you confident that 260 is the right answer, not 299? I can verify it too but you might be able to tell faster (since you have the original data).

Thanks,
Pavel

OK, I see that the difference is duplicates, and I think they should not be there.

Thanks again for the data,
Pavel
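
One way to check whether extra rows are repeats of samples already in the result is to count rows and distinct samples in a single query, then compare the two numbers across versions. This is a sketch that keeps the original FROM clauses and WHERE pattern, not necessarily the check used in this thread. The names ?rows and ?distinctSamples are made up for this illustration. The two counts can legitimately differ even on a correct engine (for example, if several analyses derive from the same sample), so it is the comparison between versions that is informative.

```sparql
# Duplicate check sketch: ?rows counts every solution, ?distinctSamples
# counts each sample IRI once. Run on both versions and compare.
SELECT
        (COUNT(?sampleIRI) AS ?rows)
        (COUNT(DISTINCT ?sampleIRI) AS ?distinctSamples)
FROM
        <tag:stardog:api:context:default>
FROM
        <http://purl.org/net/grafli#tbox>
WHERE {
        ?sampleIRI a <http://purl.org/net/grafli#CollectedSample> ;
                <http://purl.org/net/grafli#wasDerivedFrom>/<http://purl.org/net/grafli#isClassifiedBy> <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688>
        .
        ?a a <http://purl.org/net/grafli#Analysis> ;
                <http://purl.org/net/grafli#analysisSummary> ?qpureSummary ;
                <http://purl.org/net/grafli#dateCreated> ?qpureDate ;
                <http://purl.org/net/grafli#hasAnalysisType> <http://purl.org/net/grafli/analysistype#qpure> ;
                <http://purl.org/net/grafli#wasDerivedFrom>+ ?sampleIRI .
}
```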

Hi Conrad,

We released 5.0.5.1 yesterday, which should resolve this issue. Let us know if you hit any other issues with property paths.

Cheers,
Pavel

Thanks Pavel; query runs great on 5.0.5.1.
