Federated query returns empty result when LIMIT is increased

perevalov_a · March 24, 2023, 5:04pm

Hi, I'm trying to run the following federated query on our internal triplestore:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX oa: <http://www.w3.org/ns/openannotation/core/>
    PREFIX qa: <http://www.wdaqua.eu/qa#>
    prefix fqaac: <urn:fqaac:> 
    prefix prov: <http://www.w3.org/ns/prov#>
    prefix qado: <urn:qado#>

    SELECT DISTINCT ?answerCandidate ?qId ?verbalizedText ?isCorrect ?questionText
    WHERE {
        fqaac:experiment:qanswer:qald-9-plus-wikidata:test:en prov:generated ?answerCandidate .
        ?answerCandidate fqaac:hasNaturalLanguageRepresentation ?nl ;
            fqaac:qaF1Score ?qaF1score ;
            fqaac:relatedTo ?qId . 
        ?nl fqaac:algorithm "2022" ;
            fqaac:text ?verbalizedText .
        
        SERVICE <http://user:pass@host:40100/RDFized-datasets/query> {
            VALUES ?hasQuestion { qado:correctedQuestion qado:hasQuestion qado:questionEng qado:questionText }

            ?qId ?hasQuestion ?questionText .
            FILTER(LANG(?questionText) = 'en')
        }  

        BIND (IF(?qaF1score = 1.0, "True", "False") as ?isCorrect) .
        FILTER(LANG(?verbalizedText) = "en") 
    }
    LIMIT 500

This query works fine with LIMIT 500. However, if I increase the LIMIT to 5000, it fails. Here are the query plans for both cases:

LIMIT 500

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix oa: <http://www.w3.org/ns/openannotation/core/>
prefix qa: <http://www.wdaqua.eu/qa#>
prefix fqaac: <urn:fqaac:>
prefix prov: <http://www.w3.org/ns/prov#>
prefix qado: <urn:qado#>

Slice(offset=0, limit=500) [#500]
`─ Distinct [#500]
   `─ Projection(?answerCandidate, ?qId, ?verbalizedText, ?isCorrect, ?questionText) [#500]
      `─ Bind(IF(?qaF1score = "1.0"^^xsd:decimal, "True", "False") AS ?isCorrect) [#500]
         `─ ServiceJoin [#500]
            +─ Service <http://user:pass@host:40100/RDFized-datasets/query>  {
            │  +─ Filter("en" = Lang(?questionText))
            │  +─ `─ {
            │  +─    `─ Scan[SPO](?qId, ?hasQuestion, ?questionText)
            │  +─    `─ VALUES (?hasQuestion) {
            │  +─       +─ ( <urn:qado#correctedQuestion> )
            │  +─       +─ ( <urn:qado#hasQuestion> )
            │  +─       +─ ( <urn:qado#questionEng> )
            │  +─       `─ ( <urn:qado#questionText> )
            │  +─       }
            │  +─    }
            │  }
            `─ Filter("en" = Lang(?verbalizedText)) [#1]
               `─ MergeJoin(?answerCandidate) [#1.1K]
                  +─ Scan[SPOC](<urn:fqaac:experiment:qanswer:qald-9-plus-wikidata:test:en>, prov:generated, ?answerCandidate) [#8.1K]
                  `─ MergeJoin(?answerCandidate) [#2.6K]
                     +─ Scan[PSOC](?answerCandidate, fqaac:qaF1Score, ?qaF1score) [#17K]
                     `─ BindJoin(?nl) [#1.3K]
                        +─ MergeJoin(?answerCandidate) [#1.7K]
                        │  +─ Scan[PSOC](?answerCandidate, fqaac:relatedTo, ?qId) [#16K]
                        │  `─ Scan[PSOC](?answerCandidate, fqaac:hasNaturalLanguageRepresentation, ?nl) [#2.1K]
                        `─ MergeJoin(?nl) [#1.7K]
                           +─ Scan[POSC](?nl, fqaac:algorithm, "2022") [#1.7K]
                           `─ Scan[PSOC](?nl, fqaac:text, ?verbalizedText) [#1.7K]

and LIMIT 5000

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix oa: <http://www.w3.org/ns/openannotation/core/>
prefix qa: <http://www.wdaqua.eu/qa#>
prefix fqaac: <urn:fqaac:>
prefix prov: <http://www.w3.org/ns/prov#>
prefix qado: <urn:qado#>

Slice(offset=0, limit=5000) [#1]
`─ Distinct [#1]
   `─ Projection(?answerCandidate, ?qId, ?verbalizedText, ?isCorrect, ?questionText) [#1]
      `─ Bind(IF(?qaF1score = "1.0"^^xsd:decimal, "True", "False") AS ?isCorrect) sortedBy=?answerCandidate [#1]
         `─ MergeJoin(?answerCandidate) [#1]
            +─ MergeJoin(?answerCandidate) [#8.5K]
            │  +─ Scan[SPOC](<urn:fqaac:experiment:qanswer:qald-9-plus-wikidata:test:en>, prov:generated, ?answerCandidate) [#8.1K]
            │  `─ Scan[PSOC](?answerCandidate, fqaac:qaF1Score, ?qaF1score) [#17K]
            `─ Sort(?answerCandidate) [#1]
               `─ Filter("en" = Lang(?verbalizedText)) [#1]
                  `─ MergeJoin(?nl) [#82]
                     +─ MergeJoin(?nl) [#1.7K]
                     │  +─ Scan[POSC](?nl, fqaac:algorithm, "2022") [#1.7K]
                     │  `─ Scan[PSOC](?nl, fqaac:text, ?verbalizedText) [#1.7K]
                     `─ Sort(?nl) [#1.0K]
                        `─ HashJoin(?qId) [#1.0K]
                           +─ Service <http://user:pass@host:40100/RDFized-datasets/query>  {
                           │  +─ Filter("en" = Lang(?questionText))
                           │  +─ `─ {
                           │  +─    `─ Scan[SPO](?qId, ?hasQuestion, ?questionText)
                           │  +─    `─ VALUES (?hasQuestion) {
                           │  +─       +─ ( <urn:qado#correctedQuestion> )
                           │  +─       +─ ( <urn:qado#hasQuestion> )
                           │  +─       +─ ( <urn:qado#questionEng> )
                           │  +─       `─ ( <urn:qado#questionText> )
                           │  +─       }
                           │  +─    }
                           │  }
                           `─ MergeJoin(?answerCandidate) [#1.7K]
                              +─ Scan[PSOC](?answerCandidate, fqaac:relatedTo, ?qId) [#16K]
                              `─ Scan[PSOC](?answerCandidate, fqaac:hasNaturalLanguageRepresentation, ?nl) [#2.1K]

I noticed in the visual representation of the plan that some of the steps are marked in red when using LIMIT 5000:

In comparison, here is LIMIT 500 (the plan is a little different, though):

Lars_Heling · March 24, 2023, 7:18pm

Hi Aleksandr,

What do you mean by "it fails"? Does it reach the timeout you specified for the query?
Could you also share the profiling results for the two queries?

Thanks and best regards
Lars

perevalov_a · March 28, 2023, 8:47am

By "it fails" I mean that the query simply returns no results when I increase the limit. What do you mean by profiling?

Lars_Heling · March 28, 2023, 8:58am

Ok. You can profile any query to get information on its runtime behavior (results per operator, runtime, memory consumption, etc.). You can read more about this in the docs. In Studio, you can find the "Run Profiler" button in the dropdown next to the "Show Plan" button.

Best regards,
Lars

perevalov_a · March 30, 2023, 11:13am

Hello @Lars_Heling, thanks for the hint! The profiling results are attached here.
query-ok.txt (33.5 KB)
query-failed.txt (3.4 KB)

Lars_Heling · April 2, 2023, 5:33pm

Thanks for providing the profiling results. The second plan produces no results because the number of results obtained from non-Stardog SPARQL services is limited to 1000 by default. You can set the service.sparql.result.limit option (see docs page) to a negative value, which disables the limit. Alternatively, adding the following query hint should also work: #pragma join.choice.strategy streaming.
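
For illustration, here is a sketch of the original query with the hint added. The query itself is unchanged. Placing the hint as the first line inside the WHERE block is an assumption, based on hints being written as SPARQL comments, so please check the query hints documentation for the exact placement. Judging from the two plans above, the intent would be to get a plan like the LIMIT 500 one (ServiceJoin) rather than the HashJoin over the standalone SERVICE result, which appears to be where the 1000-result cap cuts off the remote data.

    PREFIX prov: <http://www.w3.org/ns/prov#>
    PREFIX fqaac: <urn:fqaac:>
    PREFIX qado: <urn:qado#>

    SELECT DISTINCT ?answerCandidate ?qId ?verbalizedText ?isCorrect ?questionText
    WHERE {
        # Assumed placement of the hint; verify against the docs.
        #pragma join.choice.strategy streaming

        fqaac:experiment:qanswer:qald-9-plus-wikidata:test:en prov:generated ?answerCandidate .
        ?answerCandidate fqaac:hasNaturalLanguageRepresentation ?nl ;
            fqaac:qaF1Score ?qaF1score ;
            fqaac:relatedTo ?qId .
        ?nl fqaac:algorithm "2022" ;
            fqaac:text ?verbalizedText .

        # Same remote pattern as in the original query.
        SERVICE <http://user:pass@host:40100/RDFized-datasets/query> {
            VALUES ?hasQuestion { qado:correctedQuestion qado:hasQuestion qado:questionEng qado:questionText }

            ?qId ?hasQuestion ?questionText .
            FILTER(LANG(?questionText) = 'en')
        }

        BIND (IF(?qaF1score = 1.0, "True", "False") AS ?isCorrect) .
        FILTER(LANG(?verbalizedText) = "en")
    }
    LIMIT 5000

If the hint takes effect, "Show Plan" should no longer show the HashJoin(?qId) directly above the Service node for LIMIT 5000.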

Best regards
Lars
