[SPARQL Fun]: mind the curly brackets

Hello everyone,

This section was created a while ago but has remained fairly dormant. We're going to try to change that! Over the years we have seen many queries which, if executed according to the SPARQL spec, won't do what their authors wanted. We think this section is a great place to show some examples to help SPARQL users avoid those pitfalls. We might push some longer explanations to the Stardog Labs blog (working on one already!) but shorter ones can go here.

I am not very good at naming things so I'll just dub it "SPARQL Fun" :)

We'll start with a very basic thing: scoping (or Group Graph Patterns), i.e. the things that go inside {}. In most cases adding extra curly brackets, e.g. to visually group things together in a query, does not change the results. For example:

SELECT ?name ?mbox
WHERE  {
          ?x foaf:name ?name .
          ?x foaf:mbox ?mbox .
       }

returns exactly the same results as

SELECT ?name ?mbox
WHERE  { { ?x foaf:name ?name . }
         { ?x foaf:mbox ?mbox . }
       }

But be careful! There are patterns in SPARQL where extra brackets can change the results in a rather unexpected way. Good examples are BINDs and FILTERs. Let's take a look at the latter:

SELECT * {
    ?person :lives ?city
    OPTIONAL { 
        ?person :worksFor ?company .
        ?company :hq ?hqCity .
        FILTER (?city = ?hqCity)
    }
}

This query returns all people, plus company information for those who work for a company headquartered in their home city. The plan looks as expected:

`─ Projection(?person, ?city, ?company, ?hqCity) [#1]
   `─ MergeJoinOuter(?person)(?city = ?hqCity) [#1]
      +─ Scan[PSO](?person, :lives, ?city) [#1]
      `─ Sort(?person) [#1]
         `─ MergeJoin(?company) [#1]
            +─ Scan[POS](?person, :worksFor, ?company) [#1]
            `─ Scan[PSO](?company, :hq, ?hqCity) [#1]

The filter is part of the outer join condition ("outer join" or "left join" just means "optional"). Now let's see what happens if we add seemingly innocuous curly brackets inside that OPTIONAL:

SELECT * {
    ?person :lives ?city
    OPTIONAL { 
      {  
        ?person :worksFor ?company .
        ?company :hq ?hqCity .
        FILTER (?city = ?hqCity)
      }
    }
}

Now the plan looks strange:

`─ Projection(?person, ?city, ?company, ?hqCity) [#1]
   `─ Scan[POS](?person, :lives, ?city) [#1]

Where did the OPTIONAL go? How is this a valid optimization?

The answer is: the optimizer figured out that the OPTIONAL can never match anything and dropped it altogether. If you look at the algebra of the 2nd query, for example with the SPARQLer Query Validator (a great, vendor-independent tool), you will see the following:

      (leftjoin
        (bgp (triple ?person :lives ?city))
        (filter (= ?city ?hqCity)
          (bgp
            (triple ?person :worksFor ?company)
            (triple ?company :hq ?hqCity)
          )))

That is: the filter is no longer part of the OPTIONAL join condition (the third argument of "leftjoin"). It is evaluated only over the results of the inner pattern and has no access to the variables bound outside of the OPTIONAL. That means ?city is always unbound at the time the filter is evaluated, so the equality check raises an error, which a FILTER treats the same as false. The filter therefore rejects every solution, the OPTIONAL pattern cannot return results, and it can be removed from the query without changing the results.
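
To see this without loading any data, here is a small self-contained sketch that puts both forms side by side and uses VALUES in place of the triple patterns. The `http://example.org/` prefix, the :alice / :acme / :berlin names and the numbered variables (?company1, ?hq1, ?company2, ?hq2) are made up for illustration; they are not from the queries above.

```sparql
PREFIX : <http://example.org/>

SELECT * {
    # Stands in for: ?person :lives ?city (hypothetical data)
    VALUES (?person ?city) { (:alice :berlin) }

    # FILTER directly inside OPTIONAL: it becomes the left-join
    # condition, so it is evaluated over the joined solution and sees ?city.
    OPTIONAL {
        VALUES (?person ?company1 ?hq1) { (:alice :acme :berlin) }
        FILTER (?city = ?hq1)
    }

    # Same thing with extra brackets: the FILTER is scoped to the inner
    # group, where ?city is unbound, so it rejects every solution.
    OPTIONAL {
      {
        VALUES (?person ?company2 ?hq2) { (:alice :acme :berlin) }
        FILTER (?city = ?hq2)
      }
    }
}
```

Under spec-compliant evaluation I would expect a single row in which ?company1 and ?hq1 are bound while ?company2 and ?hq2 stay unbound, although it is worth checking on your own engine since not every implementation follows the spec here.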

A similar thing can happen if you use BIND, since a BIND expression also only sees the variables bound inside its own group. Try to come up with an example which exhibits it, and mind the curly brackets!
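
One possible answer, as a minimal sketch rather than the definitive example: the same BIND expression written once in the outer group and once inside extra brackets. As before, the prefix, the VALUES data and the variable names ?flat and ?nested are assumptions made up for this illustration.

```sparql
PREFIX : <http://example.org/>

SELECT * {
    # Stands in for: ?person :lives ?city (hypothetical data)
    VALUES (?person ?city) { (:alice :berlin) }

    # BIND in the same group as the pattern: ?city is in scope here.
    BIND (STR(?city) AS ?flat)

    # BIND inside extra brackets: the inner group is evaluated on its
    # own, ?city is unbound there, and the expression raises an error.
    { BIND (STR(?city) AS ?nested) }
}
```

Per the spec, an error in a BIND expression leaves the target variable unbound but keeps the solution, so I would expect one row with ?flat bound to the string form of the city IRI and ?nested unbound. Unlike the FILTER case, no rows disappear, which makes this one easy to miss.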

Let us know what you think in the comments and feel free to drop any SPARQL examples you find bizarre; we're happy to discuss those here.

Cheers,
Pavel


It doesn't make sense to me that the filter would be part of the join condition when the extra braces are missing. I usually think in terms of evaluating what's inside the curly brackets in isolation and then joining those intermediate results as I move up the plan. I see that the filter is inside those brackets, so I assume it's going to behave like the query with the extra brackets. I would have thought the only way to have the cities be part of the join condition would be to name them the same:

SELECT * {
    ?person :lives ?city
    OPTIONAL { 
        ?person :worksFor ?company .
        ?company :hq ?city .
    }
}

Good lesson!

Right, see the example in SPARQL 1.1 Query Language for how filters behave inside OPTIONALs. I guess they could have provided a better example to make it clear that the filter is evaluated over joined solutions. Instead it's defined formally in section 18.5 (see the definition of LeftJoin).
