Federated query returns empty result when LIMIT is increased

Hi, I'm trying to run the following federated query on our internal triplestore:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX oa: <http://www.w3.org/ns/openannotation/core/>
    PREFIX qa: <http://www.wdaqua.eu/qa#>
    prefix fqaac: <urn:fqaac:> 
    prefix prov: <http://www.w3.org/ns/prov#>
    prefix qado: <urn:qado#>

    SELECT DISTINCT ?answerCandidate ?qId ?verbalizedText ?isCorrect ?questionText
    WHERE {
        fqaac:experiment:qanswer:qald-9-plus-wikidata:test:en prov:generated ?answerCandidate .
        ?answerCandidate fqaac:hasNaturalLanguageRepresentation ?nl ;
            fqaac:qaF1Score ?qaF1score ;
            fqaac:relatedTo ?qId . 
        ?nl fqaac:algorithm "2022" ;
            fqaac:text ?verbalizedText .
        
        SERVICE <http://user:pass@host:40100/RDFized-datasets/query> {
            VALUES ?hasQuestion { qado:correctedQuestion qado:hasQuestion qado:questionEng qado:questionText }

            ?qId ?hasQuestion ?questionText .
            FILTER(LANG(?questionText) = 'en')
        }  

        BIND (IF(?qaF1score = 1.0, "True", "False") as ?isCorrect) .
        FILTER(LANG(?verbalizedText) = "en") 
    }
    LIMIT 500

This query works fine. However, if I increase the LIMIT to 5000, it fails. Here are the query plans for both cases:

LIMIT 500

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix oa: <http://www.w3.org/ns/openannotation/core/>
prefix qa: <http://www.wdaqua.eu/qa#>
prefix fqaac: <urn:fqaac:>
prefix prov: <http://www.w3.org/ns/prov#>
prefix qado: <urn:qado#>

Slice(offset=0, limit=500) [#500]
`─ Distinct [#500]
   `─ Projection(?answerCandidate, ?qId, ?verbalizedText, ?isCorrect, ?questionText) [#500]
      `─ Bind(IF(?qaF1score = "1.0"^^xsd:decimal, "True", "False") AS ?isCorrect) [#500]
         `─ ServiceJoin [#500]
            +─ Service <http://user:pass@host:40100/RDFized-datasets/query>  {
            │  +─ Filter("en" = Lang(?questionText))
            │  +─ `─ {
            │  +─    `─ Scan[SPO](?qId, ?hasQuestion, ?questionText)
            │  +─    `─ VALUES (?hasQuestion) {
            │  +─       +─ ( <urn:qado#correctedQuestion> )
            │  +─       +─ ( <urn:qado#hasQuestion> )
            │  +─       +─ ( <urn:qado#questionEng> )
            │  +─       `─ ( <urn:qado#questionText> )
            │  +─       }
            │  +─    }
            │  }
            `─ Filter("en" = Lang(?verbalizedText)) [#1]
               `─ MergeJoin(?answerCandidate) [#1.1K]
                  +─ Scan[SPOC](<urn:fqaac:experiment:qanswer:qald-9-plus-wikidata:test:en>, prov:generated, ?answerCandidate) [#8.1K]
                  `─ MergeJoin(?answerCandidate) [#2.6K]
                     +─ Scan[PSOC](?answerCandidate, fqaac:qaF1Score, ?qaF1score) [#17K]
                     `─ BindJoin(?nl) [#1.3K]
                        +─ MergeJoin(?answerCandidate) [#1.7K]
                        │  +─ Scan[PSOC](?answerCandidate, fqaac:relatedTo, ?qId) [#16K]
                        │  `─ Scan[PSOC](?answerCandidate, fqaac:hasNaturalLanguageRepresentation, ?nl) [#2.1K]
                        `─ MergeJoin(?nl) [#1.7K]
                           +─ Scan[POSC](?nl, fqaac:algorithm, "2022") [#1.7K]
                           `─ Scan[PSOC](?nl, fqaac:text, ?verbalizedText) [#1.7K]

and LIMIT 5000

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix oa: <http://www.w3.org/ns/openannotation/core/>
prefix qa: <http://www.wdaqua.eu/qa#>
prefix fqaac: <urn:fqaac:>
prefix prov: <http://www.w3.org/ns/prov#>
prefix qado: <urn:qado#>

Slice(offset=0, limit=5000) [#1]
`─ Distinct [#1]
   `─ Projection(?answerCandidate, ?qId, ?verbalizedText, ?isCorrect, ?questionText) [#1]
      `─ Bind(IF(?qaF1score = "1.0"^^xsd:decimal, "True", "False") AS ?isCorrect) sortedBy=?answerCandidate [#1]
         `─ MergeJoin(?answerCandidate) [#1]
            +─ MergeJoin(?answerCandidate) [#8.5K]
            │  +─ Scan[SPOC](<urn:fqaac:experiment:qanswer:qald-9-plus-wikidata:test:en>, prov:generated, ?answerCandidate) [#8.1K]
            │  `─ Scan[PSOC](?answerCandidate, fqaac:qaF1Score, ?qaF1score) [#17K]
            `─ Sort(?answerCandidate) [#1]
               `─ Filter("en" = Lang(?verbalizedText)) [#1]
                  `─ MergeJoin(?nl) [#82]
                     +─ MergeJoin(?nl) [#1.7K]
                     │  +─ Scan[POSC](?nl, fqaac:algorithm, "2022") [#1.7K]
                     │  `─ Scan[PSOC](?nl, fqaac:text, ?verbalizedText) [#1.7K]
                     `─ Sort(?nl) [#1.0K]
                        `─ HashJoin(?qId) [#1.0K]
                           +─ Service <http://user:pass@host:40100/RDFized-datasets/query>  {
                           │  +─ Filter("en" = Lang(?questionText))
                           │  +─ `─ {
                           │  +─    `─ Scan[SPO](?qId, ?hasQuestion, ?questionText)
                           │  +─    `─ VALUES (?hasQuestion) {
                           │  +─       +─ ( <urn:qado#correctedQuestion> )
                           │  +─       +─ ( <urn:qado#hasQuestion> )
                           │  +─       +─ ( <urn:qado#questionEng> )
                           │  +─       `─ ( <urn:qado#questionText> )
                           │  +─       }
                           │  +─    }
                           │  }
                           `─ MergeJoin(?answerCandidate) [#1.7K]
                              +─ Scan[PSOC](?answerCandidate, fqaac:relatedTo, ?qId) [#16K]
                              `─ Scan[PSOC](?answerCandidate, fqaac:hasNaturalLanguageRepresentation, ?nl) [#2.1K]

I noticed in the visual representation of the plan that some of the steps are marked in red (when using LIMIT 5000):

In comparison to LIMIT 500 (the plan is a little different, though):

Hi Aleksandr,

What do you mean by "it fails"? Does it reach the timeout you specified for the query?
Could you also share the profiling results for the two queries?

Thanks and best regards
Lars

By "it fails" I mean that the query simply returns no results when I increase the limit. What do you mean by profiling?

Ok. You can profile any query, which provides information on its runtime behavior (results of the operators, runtime, memory consumption, etc.). You can also read more in the docs. In Studio, the "Run Profiler" button is in the dropdown next to the "Show Plan" button.

Best regards,
Lars

Hello @Lars_Heling, thanks for the hint! The profiling results are attached here.
query-ok.txt (33.5 KB)
query-failed.txt (3.4 KB)

Thanks for providing the profiling results. The second plan produces no results because the number of results obtained from non-Stardog SPARQL services is limited to 1000 by default. You can set the service.sparql.result.limit option (see docs page) to a negative value, in which case no such limit is applied. Alternatively, adding the following query hint should also work: #pragma join.choice.strategy streaming.

Best regards
Lars
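
For reference, a sketch of the original query with the suggested hint and LIMIT 5000. The hint text is taken verbatim from the answer above; placing it as a comment at the top of the WHERE block is an assumption based on how Stardog query hints are usually written, so check the docs if it has no effect. The unused prefixes are dropped for brevity.

```sparql
PREFIX fqaac: <urn:fqaac:>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX qado: <urn:qado#>

SELECT DISTINCT ?answerCandidate ?qId ?verbalizedText ?isCorrect ?questionText
WHERE {
    # Hint from the answer above; its position here is an assumption.
    #pragma join.choice.strategy streaming

    fqaac:experiment:qanswer:qald-9-plus-wikidata:test:en prov:generated ?answerCandidate .
    ?answerCandidate fqaac:hasNaturalLanguageRepresentation ?nl ;
        fqaac:qaF1Score ?qaF1score ;
        fqaac:relatedTo ?qId .
    ?nl fqaac:algorithm "2022" ;
        fqaac:text ?verbalizedText .

    SERVICE <http://user:pass@host:40100/RDFized-datasets/query> {
        VALUES ?hasQuestion { qado:correctedQuestion qado:hasQuestion qado:questionEng qado:questionText }

        ?qId ?hasQuestion ?questionText .
        FILTER(LANG(?questionText) = 'en')
    }

    BIND (IF(?qaF1score = 1.0, "True", "False") AS ?isCorrect)
    FILTER(LANG(?verbalizedText) = "en")
}
LIMIT 5000
```

If the hint is picked up, the plan for LIMIT 5000 should no longer show the remote service being fetched on its own and capped at about 1000 rows ([#1.0K] in the failing plan) before the join.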
