Hello everyone,
This section was created a while ago but has remained fairly dormant. We're going to try to change that! Over the years we have seen many queries which, if executed according to the SPARQL spec, won't do what their authors wanted. We think this section is a great place to show some examples that help SPARQL users avoid those pitfalls. We might push some longer explanations to the Stardog Labs blog (working on one already!), but shorter ones can go here.
I'm not very good at naming things, so I'll just dub it "SPARQL Fun".
We'll start with a very basic thing: scoping, i.e. group graph patterns, the things that go inside {}. In most cases, adding extra curly brackets (e.g. to visually group things together in a query) does not change the results. For example:
SELECT ?name ?mbox
WHERE {
  ?x foaf:name ?name .
  ?x foaf:mbox ?mbox .
}
returns exactly the same results as
SELECT ?name ?mbox
WHERE {
  { ?x foaf:name ?name . }
  { ?x foaf:mbox ?mbox . }
}
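Why are the brackets harmless here? In SPARQL's algebra, each group evaluates to a multiset of solution mappings, and adjacent groups are combined with a join on their shared variables, which is equivalent to how the triple patterns inside a single basic graph pattern are combined. The toy Python model below illustrates this on made-up data. The helper names (`match_pattern`, `join`, `bgp`, `canon`) are hypothetical, and this is only a sketch of the spec-level semantics, not how Stardog or any other engine evaluates queries.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]  # variable name -> bound term


def match_pattern(pattern: Triple, data: Set[Triple]) -> List[Solution]:
    """Solution mappings for one triple pattern ('?'-prefixed terms are variables)."""
    results: List[Solution] = []
    for triple in data:
        mu: Solution = {}
        for p_term, d_term in zip(pattern, triple):
            if p_term.startswith("?"):
                if mu.setdefault(p_term, d_term) != d_term:
                    break  # same variable would need two different values
            elif p_term != d_term:
                break  # constant term does not match
        else:  # no break: every term matched
            results.append(mu)
    return results


def compatible(a: Solution, b: Solution) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(b[v] == t for v, t in a.items() if v in b)


def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Merge every compatible pair of solutions (the algebra's Join)."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def bgp(patterns: List[Triple], data: Set[Triple]) -> List[Solution]:
    """Evaluate a basic graph pattern by joining its triple patterns."""
    solutions: List[Solution] = [{}]
    for pattern in patterns:
        solutions = join(solutions, match_pattern(pattern, data))
    return solutions


def canon(solutions: List[Solution]) -> List[Tuple[Tuple[str, str], ...]]:
    """Order-independent view of a multiset of solutions, for comparison."""
    return sorted(tuple(sorted(mu.items())) for mu in solutions)


DATA: Set[Triple] = {
    (":alice", "foaf:name", "Alice"),
    (":alice", "foaf:mbox", "mailto:alice@example.org"),
    (":bob", "foaf:name", "Bob"),  # Bob has no mailbox
}

# WHERE { ?x foaf:name ?name . ?x foaf:mbox ?mbox . }
flat = bgp([("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")], DATA)

# WHERE { { ?x foaf:name ?name . } { ?x foaf:mbox ?mbox . } }
nested = join(bgp([("?x", "foaf:name", "?name")], DATA),
              bgp([("?x", "foaf:mbox", "?mbox")], DATA))

print(canon(flat) == canon(nested))  # True
```

Because join is commutative and associative, splitting a basic graph pattern into sibling groups like this cannot change the result.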
But be careful! There are patterns in SPARQL where extra brackets can change the results in rather unexpected ways. Good examples are BINDs and FILTERs. Let's take a look at the latter:
SELECT * {
  ?person :lives ?city
  OPTIONAL {
    ?person :worksFor ?company .
    ?company :hq ?hqCity .
    FILTER (?city = ?hqCity)
  }
}
This query returns every person with their home city, plus company information for those who work for a company headquartered in that city. The query plan looks as expected:
`─ Projection(?person, ?city, ?company, ?hqCity) [#1]
   `─ MergeJoinOuter(?person)(?city = ?hqCity) [#1]
      +─ Scan[PSO](?person, :lives, ?city) [#1]
      `─ Sort(?person) [#1]
         `─ MergeJoin(?company) [#1]
            +─ Scan[POS](?person, :worksFor, ?company) [#1]
            `─ Scan[PSO](?company, :hq, ?hqCity) [#1]
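Semantically, this OPTIONAL is a left join whose condition is the filter expression, which matches the `MergeJoinOuter(?person)(?city = ?hqCity)` node in the plan. Each `?person :lives ?city` solution is extended with company data only where the combined solution passes the filter, and is kept as-is otherwise. Below is a toy Python model of that behavior on made-up data. The helper names are hypothetical, an unbound variable inside `=` is modeled as an evaluation error that FILTER treats as false, and the sketch says nothing about how Stardog actually executes the join.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]          # variable name -> bound term
Expr = Callable[[Solution], bool]  # a FILTER expression


def match_pattern(pattern: Triple, data: Set[Triple]) -> List[Solution]:
    """Solution mappings for one triple pattern ('?'-prefixed terms are variables)."""
    results: List[Solution] = []
    for triple in data:
        mu: Solution = {}
        for p_term, d_term in zip(pattern, triple):
            if p_term.startswith("?"):
                if mu.setdefault(p_term, d_term) != d_term:
                    break
            elif p_term != d_term:
                break
        else:  # no break: every term matched
            results.append(mu)
    return results


def compatible(a: Solution, b: Solution) -> bool:
    """Mappings are compatible if they agree on all shared variables."""
    return all(b[v] == t for v, t in a.items() if v in b)


def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def bgp(patterns: List[Triple], data: Set[Triple]) -> List[Solution]:
    solutions: List[Solution] = [{}]
    for pattern in patterns:
        solutions = join(solutions, match_pattern(pattern, data))
    return solutions


def equals(x: str, y: str) -> Expr:
    """FILTER(?x = ?y); an unbound variable is an error, which FILTER treats as false."""
    return lambda mu: x in mu and y in mu and mu[x] == mu[y]


def left_join(left: List[Solution], right: List[Solution], cond: Expr) -> List[Solution]:
    """OPTIONAL with a condition: extend where cond holds, otherwise keep the left row."""
    out: List[Solution] = []
    for a in left:
        extended = [m for m in ({**a, **b} for b in right if compatible(a, b)) if cond(m)]
        out.extend(extended or [a])
    return out


DATA: Set[Triple] = {
    (":alice", ":lives", ":Paris"),
    (":alice", ":worksFor", ":acme"),
    (":acme", ":hq", ":Paris"),
    (":bob", ":lives", ":Berlin"),
    (":bob", ":worksFor", ":acme"),
}

lives = bgp([("?person", ":lives", "?city")], DATA)
works = bgp([("?person", ":worksFor", "?company"), ("?company", ":hq", "?hqCity")], DATA)

# (leftjoin lives works (= ?city ?hqCity)): the condition sees ?city from the left side
result = left_join(lives, works, equals("?city", "?hqCity"))
for mu in sorted(result, key=lambda m: m["?person"]):
    print(mu)  # :alice gets :acme and :Paris; :bob keeps only his city
```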
The filter is part of the outer join condition ("outer join" or "left join" just means OPTIONAL). Now let's see what happens if we add seemingly innocuous curly brackets inside that OPTIONAL:
SELECT * {
  ?person :lives ?city
  OPTIONAL {
    {
      ?person :worksFor ?company .
      ?company :hq ?hqCity .
      FILTER (?city = ?hqCity)
    }
  }
}
Now the plan looks strange:
`─ Projection(?person, ?city, ?company, ?hqCity) [#1]
   `─ Scan[POS](?person, :lives, ?city) [#1]
Where did the OPTIONAL go? How is this a valid optimization?
The answer is: the optimizer figured out that the OPTIONAL can never match anything and dropped it altogether. If you look at the algebra of the second query, for example with the SPARQLer Query Validator (vendor-independent and a great tool), you'll see the following:
(leftjoin
  (bgp (triple ?person :lives ?city))
  (filter (= ?city ?hqCity)
    (bgp
      (triple ?person :worksFor ?company)
      (triple ?company :hq ?hqCity)
    )))
That is, the filter is no longer part of the OPTIONAL join condition (the "leftjoin"). It is evaluated only over the results of the inner group itself and has no access to variables bound outside the OPTIONAL. That means ?city is always unbound when the filter is evaluated, so the equality check is never true (comparing an unbound variable is an error, which FILTER treats as false). The inner pattern therefore returns no results, the OPTIONAL can never extend any solution, and so it can be removed from the query without changing the results.
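A toy Python model of the second query's algebra makes this concrete. The filter runs on the inner group's own solutions, where `?city` is never bound, so it rejects all of them; the left join then has nothing to add, and the result equals a plain scan of `?person :lives ?city`. The helper names and sample data are hypothetical, and the sketch only mirrors the spec-level semantics, not Stardog's optimizer.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]          # variable name -> bound term
Expr = Callable[[Solution], bool]  # a FILTER expression


def match_pattern(pattern: Triple, data: Set[Triple]) -> List[Solution]:
    """Solution mappings for one triple pattern ('?'-prefixed terms are variables)."""
    results: List[Solution] = []
    for triple in data:
        mu: Solution = {}
        for p_term, d_term in zip(pattern, triple):
            if p_term.startswith("?"):
                if mu.setdefault(p_term, d_term) != d_term:
                    break
            elif p_term != d_term:
                break
        else:  # no break: every term matched
            results.append(mu)
    return results


def compatible(a: Solution, b: Solution) -> bool:
    return all(b[v] == t for v, t in a.items() if v in b)


def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def bgp(patterns: List[Triple], data: Set[Triple]) -> List[Solution]:
    solutions: List[Solution] = [{}]
    for pattern in patterns:
        solutions = join(solutions, match_pattern(pattern, data))
    return solutions


def equals(x: str, y: str) -> Expr:
    """FILTER(?x = ?y); an unbound variable is an error, which FILTER treats as false."""
    return lambda mu: x in mu and y in mu and mu[x] == mu[y]


def filter_(cond: Expr, solutions: List[Solution]) -> List[Solution]:
    return [mu for mu in solutions if cond(mu)]


def left_join(left: List[Solution], right: List[Solution],
              cond: Expr = lambda mu: True) -> List[Solution]:
    """OPTIONAL: extend where cond holds, otherwise keep the left row unchanged."""
    out: List[Solution] = []
    for a in left:
        extended = [m for m in ({**a, **b} for b in right if compatible(a, b)) if cond(m)]
        out.extend(extended or [a])
    return out


def canon(solutions: List[Solution]) -> List[Tuple[Tuple[str, str], ...]]:
    return sorted(tuple(sorted(mu.items())) for mu in solutions)


DATA: Set[Triple] = {
    (":alice", ":lives", ":Paris"),
    (":alice", ":worksFor", ":acme"),
    (":acme", ":hq", ":Paris"),
    (":bob", ":lives", ":Berlin"),
    (":bob", ":worksFor", ":acme"),
}

lives = bgp([("?person", ":lives", "?city")], DATA)

# (filter (= ?city ?hqCity) (bgp ...)): evaluated inside the group, where ?city is never bound
inner = filter_(equals("?city", "?hqCity"),
                bgp([("?person", ":worksFor", "?company"), ("?company", ":hq", "?hqCity")], DATA))

# (leftjoin lives inner): no condition is left for the join itself
result = left_join(lives, inner)

print(inner)                          # []
print(canon(result) == canon(lives))  # True
```

Moving the FILTER out of the inner braces, so it sits directly in the OPTIONAL group as in the first query, turns it back into the left join condition.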
A similar thing can happen with BIND, since its semantics are scoped too. Try to come up with an example that exhibits it, and mind the curly brackets!
Let us know what you think in the comments, and feel free to drop any SPARQL examples you find bizarre; we're happy to discuss them here.
Cheers,
Pavel