Query time blows out > 1000x moving from 5.0.3 => 5.0.4

conradL · November 1, 2017, 6:25am

I have a query that has multiple FROM clauses and a property path expression with the ‘+’ (one-or-more) operator in it, but is otherwise fairly innocuous:

SELECT
        ?sampleIRI
FROM
        <tag:stardog:api:context:default>
FROM
        <http://purl.org/net/grafli#tbox>
WHERE {
        ?sampleIRI a <http://purl.org/net/grafli#CollectedSample> ;
                <http://purl.org/net/grafli#wasDerivedFrom>/<http://purl.org/net/grafli#isClassifiedBy> <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688>
        .
        ?a a <http://purl.org/net/grafli#Analysis> ;
                <http://purl.org/net/grafli#analysisSummary> ?qpureSummary ;
                <http://purl.org/net/grafli#dateCreated> ?qpureDate ;
                <http://purl.org/net/grafli#hasAnalysisType> <http://purl.org/net/grafli/analysistype#qpure> ;
                <http://purl.org/net/grafli#wasDerivedFrom>+ ?sampleIRI .
}
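For readers less familiar with the operator: in SPARQL 1.1, `p+` matches any pair of nodes connected by a chain of one or more `p` edges, i.e. the transitive closure of `p`. Below is a minimal Python sketch of that semantics, assuming the `wasDerivedFrom` edges have already been pulled out as (subject, object) pairs. The node names are hypothetical, and this says nothing about how Stardog evaluates paths internally:

```python
from collections import defaultdict


def one_or_more(edges, start):
    """Return every node reachable from `start` by one or more edges.

    This is the set of ?o such that `start p+ ?o` holds, where `edges`
    holds the (subject, object) pairs of predicate p.
    """
    adjacency = defaultdict(set)
    for s, o in edges:
        adjacency[s].add(o)

    reached = set()
    stack = list(adjacency[start])  # the first step is mandatory for `+`
    while stack:
        node = stack.pop()
        if node in reached:
            continue  # already visited: guards against cycles
        reached.add(node)
        stack.extend(adjacency[node])
    return reached


# Hypothetical derivation chain: analysis -> aliquot -> sample
edges = [("analysis1", "aliquot1"), ("aliquot1", "sample1")]
print(sorted(one_or_more(edges, "analysis1")))  # ['aliquot1', 'sample1']
```

Because `reached` is a set, each reachable node is reported once no matter how many distinct chains lead to it, which matches the SPARQL 1.1 semantics for `+`.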

On a 5.0.3 database with 5M triples, the query runs in < 200ms and returns 260 results, as I expect. Against the same dataset on version 5.0.4, the query never returns: after 10 minutes it still hadn't returned, and I gave up. Further, after I ran query kill on it, the query stayed in the query list with Terminating status for an hour, until I restarted the server.

The query plan looks basically the same in both versions. Here it is for 5.0.3:

From <http://purl.org/net/grafli#tbox>
From default
Projection(?sampleIRI) [#296165708383.9M]
`─ MergeJoin(?sampleIRI) [#296165708383.9M]
   +─ HashJoin(?tinsuyke) [#1276.4M]
   │  +─ MergeJoin(?sampleIRI) [#18K]
   │  │  +─ Scan[POSC](?sampleIRI, rdf:type, <http://purl.org/net/grafli#CollectedSample>) [#1]
   │  │  `─ Scan[PSOC](?sampleIRI, <http://purl.org/net/grafli#wasDerivedFrom>, ?tinsuyke) [#1]
   │  `─ Scan[POS](?tinsuyke, <http://purl.org/net/grafli#isClassifiedBy>, <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688>) [#1]
   `─ Sort(?sampleIRI) [#7.4K]
      `─ MergeJoin(?a) [#7.4K]
         +─ PropertyPath(?a -> ?sampleIRI, minLength=1, sorted by=?a) [#2]
         │  `─ Scan[PSOC](?a, <http://purl.org/net/grafli#wasDerivedFrom>, ?sampleIRI) [#1]
         `─ NaryJoin(?a) [#1.9K]
            +─ Scan[PSC](?a, <http://purl.org/net/grafli#dateCreated>, _) [#1]
            +─ Scan[POSC](?a, rdf:type, <http://purl.org/net/grafli#Analysis>) [#1]
            +─ Scan[PSC](?a, <http://purl.org/net/grafli#analysisSummary>, _) [#1]
            `─ Scan[POSC](?a, <http://purl.org/net/grafli#hasAnalysisType>, <http://purl.org/net/grafli/analysistype#qpure>) [#1]

and 5.0.4:

From <http://purl.org/net/grafli#tbox>
From default
Projection(?sampleIRI) [#296165708383.9M]
`─ MergeJoin(?sampleIRI) [#296165708383.9M]
   +─ HashJoin(?ekztyovu) [#1276.4M]
   │  +─ MergeJoin(?sampleIRI) [#18K]
   │  │  +─ Scan[POSC](?sampleIRI, rdf:type, <http://purl.org/net/grafli#CollectedSample>) [#1]
   │  │  `─ Scan[PSOC](?sampleIRI, <http://purl.org/net/grafli#wasDerivedFrom>, ?ekztyovu) [#1]
   │  `─ Scan[POS](?ekztyovu, <http://purl.org/net/grafli#isClassifiedBy>, <http://purl.org/net/grafli/study#8702a342-c58a-40c5-81ea-65e466211688>) [#1]
   `─ Sort(?sampleIRI) [#7.4K]
      `─ MergeJoin(?a) [#7.4K]
         +─ PropertyPath(?a -> ?sampleIRI, minLength=1, sorted by=?a) [#2]
         │  `─ Scan[PSOC](?a, <http://purl.org/net/grafli#wasDerivedFrom>, ?sampleIRI) [#1]
         `─ NaryJoin(?a) [#1.9K]
            +─ Scan[POSC](?a, <http://purl.org/net/grafli#hasAnalysisType>, <http://purl.org/net/grafli/analysistype#qpure>) [#1]
            +─ Scan[POSC](?a, rdf:type, <http://purl.org/net/grafli#Analysis>) [#1]
            +─ Scan[PSC](?a, <http://purl.org/net/grafli#dateCreated>, _) [#1]
            `─ Scan[PSC](?a, <http://purl.org/net/grafli#analysisSummary>, _) [#1]

Additional info:

Both the 5.0.3 and 5.0.4 databases have query.pp.contexts = false.
Removing the second FROM clause (which, for this particular query, happens to be redundant because none of the relevant triples live in the second named graph) makes it run fast again on 5.0.4.
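Why a redundant FROM should be harmless: per the SPARQL 1.1 spec, multiple FROM clauses build the query's default graph as the RDF merge of the listed graphs, so adding a graph with no `wasDerivedFrom` triples should not change which edges the path can follow. A small Python sketch of that reasoning, modelling graphs as sets of (s, p, o) tuples with shortened IRIs. The triples are made up for illustration, and blank-node renaming (part of a real RDF merge) is ignored:

```python
DEFAULT = "tag:stardog:api:context:default"
TBOX = "http://purl.org/net/grafli#tbox"
DERIVED = "grafli:wasDerivedFrom"  # shortened IRI for readability


def merged_default_graph(dataset, from_graphs):
    """Union of the graphs listed in FROM clauses (assumes no blank nodes)."""
    merged = set()
    for graph_iri in from_graphs:
        merged |= dataset.get(graph_iri, set())
    return merged


def edges_for(graph, predicate):
    """(subject, object) pairs that the path `predicate+` can traverse."""
    return {(s, o) for s, p, o in graph if p == predicate}


# Hypothetical data: derivation triples in the default context, schema in the TBox.
dataset = {
    DEFAULT: {("analysis1", DERIVED, "sample1")},
    TBOX: {("grafli:Analysis", "rdfs:subClassOf", "grafli:Activity")},
}

one_from = merged_default_graph(dataset, [DEFAULT])
two_from = merged_default_graph(dataset, [DEFAULT, TBOX])
print(edges_for(one_from, DERIVED) == edges_for(two_from, DERIVED))  # True
```

Since the merge is a set union, a triple that happens to live in both graphs also counts only once, so a second FROM should never multiply path results either.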

Hypothesis:

Some bad interaction between the multiple FROM clauses and the property path expression, perhaps caused by the 5.0.4 changes to query.pp.contexts mentioned in the release notes.

pavel · November 1, 2017, 8:04am

Yes, it’s likely that the changes related to the interaction between property paths and contexts caused this. You may try to set query.pp.contexts = true with 5.0.4 to see if it resolves the problem. In the past couple of weeks I saw some cases where it made a difference.

If, by any chance, you are able to provide the data today (possibly in obfuscated form), we’ll make sure this is resolved before 5.0.5 comes out (which might be later today). If you can do that, we can discuss it off list.

Thanks,
Pavel

conradL · November 3, 2017, 1:00am

Hi Pavel,
Here is the obfuscated dataset:
https://s3-ap-southeast-2.amazonaws.com/public-obfuscated-data-only/db-obfsc.trig
And the query on it that succeeds quickly (260 results) on 5.0.3 but times out on 5.0.4:
https://s3-ap-southeast-2.amazonaws.com/public-obfuscated-data-only/wildcard-multiple-FROM-clauses.sparql

Obviously too late for 5.0.5, sorry.

conradL · November 3, 2017, 3:04am

An update…
I tried this query on the same dataset with the new version 5.0.5 and got a different result again: it now returns, but it takes about 1 minute (compared to < 1s on 5.0.3) and returns 299 results instead of 260.

pavel · November 3, 2017, 8:57am

Thanks, Conrad.

I can reproduce the behavior and am looking into it. I also see that removing the TBox graph makes the query quick.

One question: are you confident that 260 is the right answer, not 299? I can verify it too but you might be able to tell faster (since you have the original data).

Thanks,
Pavel

pavel · November 3, 2017, 11:17am

OK, I see that the difference is duplicates, and I think they should not be there.

Thanks again for the data,
Pavel
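Why duplicates point to a bug: SPARQL 1.1 gives `+` set semantics, so each (?a, ?sampleIRI) pair it matches is produced once, even when several derivation chains connect the two nodes. (Repeated ?sampleIRI values can still legitimately appear in the output when different analyses ?a share a sample, since the query has no DISTINCT.) The Python sketch below contrasts enumerating every path, which yields one row per chain, with the set-based result the spec requires. The diamond-shaped data is hypothetical, and the sketch does not claim to show where Stardog's extra rows actually came from:

```python
from collections import Counter


def every_path_end(edges, start):
    """Yield the end node of each distinct path of length >= 1 from `start`.

    Naive path enumeration: one result per path, so nodes reachable by
    several routes repeat. Terminates only on acyclic data.
    """
    for s, o in edges:
        if s == start:
            yield o
            yield from every_path_end(edges, o)


# Hypothetical diamond: the analysis reaches the sample via two aliquots.
edges = [
    ("analysis1", "aliquot1"),
    ("analysis1", "aliquot2"),
    ("aliquot1", "sample1"),
    ("aliquot2", "sample1"),
]

per_path = Counter(every_path_end(edges, "analysis1"))
print(per_path["sample1"])  # 2: one row per chain (path counting)
print(sorted(set(per_path)))  # ['aliquot1', 'aliquot2', 'sample1']: the SPARQL `+` answer
```

Counting paths can also grow exponentially with the number of alternative routes, which is one reason SPARQL 1.1 settled on set semantics for `*` and `+`.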

pavel · November 7, 2017, 6:12pm

Hi Conrad,

We released 5.0.5.1 yesterday which should resolve this issue. Let us know if you hit any other issues with property paths.

Cheers,
Pavel

conradL · November 8, 2017, 12:19am

Thanks, Pavel; the query runs great on 5.0.5.1.
