I have the following query, which compares character offsets in text extracted by two different tools. I load everything extracted by tool1 and tool2 into separate databases and run a federated query from the tool1 database (tool1 extracts many more details and therefore has far more triples):
prefix foo: <http://purl.org/foo#>
prefix doco: <http://purl.org/spar/doco/>
select distinct ?sent1 ?sent2 ?offset
where {
  { #pragma group.join
    ?doc1 rdfs:label "someLabel"^^xsd:string .
    ?doc1 foo:contains ?sent1 .
    ?sent1 a doco:Sentence ; foo:char_offset ?offset
  }
  { SERVICE <http://localhost:5820/tool2/query>
    { SELECT ?sent2 where {
        ?doc2 rdfs:label "someLabel"^^xsd:string .
        ?doc2 foo:contains ?sent2 .
        ?sent2 a doco:Sentence ; foo:char_offset ?offset .
    }}
  }
} ORDER BY ?offset
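I get a binding set for EVERY sentence from tool1 paired with EVERY sentence from tool2. This looks wrong to me, since ?offset is the same variable in both patterns and should therefore be bound to the same value.
However, if I change the query to bind the tool2 offset to a separate variable, project it from the subquery, and add a filter, then the results are correct: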
prefix foo: <http://purl.org/foo#>
prefix doco: <http://purl.org/spar/doco/>
select distinct ?sent1 ?sent2 ?offset
where {
  { #pragma group.join
    ?doc1 rdfs:label "someLabel"^^xsd:string .
    ?doc1 foo:contains ?sent1 .
    ?sent1 a doco:Sentence ; foo:char_offset ?offset
  }
  { SERVICE <http://localhost:5820/tool2/query>
    { SELECT ?sent2 ?offset2 where {
        ?doc2 rdfs:label "someLabel"^^xsd:string .
        ?doc2 foo:contains ?sent2 .
        ?sent2 a doco:Sentence ; foo:char_offset ?offset2 .
    }}
  }
  filter (?offset = ?offset2)
} ORDER BY ?offset
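One difference between the two queries is that the first subquery projects only ?sent2, while the second also projects its offset variable. In SPARQL, only the variables a subquery projects are visible outside it, so this may be what prevents the join on ?offset in the first query. As a sketch under that assumption (whether the federated engine then joins on ?offset without the filter is untested here), the first query with ?offset added to the subquery's projection would be:

```sparql
prefix foo: <http://purl.org/foo#>
prefix doco: <http://purl.org/spar/doco/>
select distinct ?sent1 ?sent2 ?offset
where {
  { #pragma group.join
    ?doc1 rdfs:label "someLabel"^^xsd:string .
    ?doc1 foo:contains ?sent1 .
    ?sent1 a doco:Sentence ; foo:char_offset ?offset
  }
  { SERVICE <http://localhost:5820/tool2/query>
    # Assumption: projecting ?offset here exposes it to the outer query,
    # so both groups share the variable and can be joined on it.
    { SELECT ?sent2 ?offset where {
        ?doc2 rdfs:label "someLabel"^^xsd:string .
        ?doc2 foo:contains ?sent2 .
        ?sent2 a doco:Sentence ; foo:char_offset ?offset .
    }}
  }
} ORDER BY ?offset
```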
Andrea