SPARQL UNION only working with inferencing enabled?

I have a situation where a SPARQL query does not return the results that I would expect with reasoning disabled. With reasoning enabled it works just fine.

Consider the following SPARQL query:

SELECT *
FROM <http://example.org/test>
WHERE
{
  ?s_ ?p_ ?o_ .
  ?s_ <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
  
  {
    {
      {
        SELECT DISTINCT ?s_ ( SAMPLE ( ?o0 ) AS ?o0_sample )
        WHERE
        {
          ?s_ <http://xmlns.com/foaf/0.1/knows> ?o0 .
          ?o0 <http://xmlns.com/foaf/0.1/firstName> ?o1 .
          
          FILTER ( ?o1 = 'Alice' )
        }
        GROUP BY ?s_
      }
    }
  }
  UNION
  {
    {
      {
        SELECT DISTINCT ?s_ ( SAMPLE ( ?o0 ) AS ?o0_sample )
        WHERE
        {
          ?s_ <http://xmlns.com/foaf/0.1/knows> ?o0 .
          ?o0 <http://xmlns.com/foaf/0.1/firstName> ?o1 .
          
          FILTER ( ?o1 = 'Eve' )
        }
        GROUP BY ?s_
      }
    }
  }
}

And the following dataset:

ex:Alice a foaf:Person ; foaf:firstName 'Alice' .
ex:Bob a foaf:Person ; foaf:firstName 'Bob' ; foaf:knows ex:Alice .

I want to select all persons who know at least one person named ‘Alice’ or ‘Eve’.

My problem:
With inferencing disabled, the query does not return the single solution that I expect.

My observations:

  • If I remove the lower UNION part of the query, the query returns one result with inferencing disabled and one result with inferencing enabled. As UNION combines alternatives in SPARQL, I would expect the whole query to return a single result with inferencing disabled.

  • If I execute the query without inferencing it returns 0 results. But if I turn inferencing on, it returns the single result that I expect.

Does anyone know what I am missing here?

Thank you! :slight_smile:

Thanks for this report. I can confirm that in 5.3.0 this query is optimized differently with and without reasoning enabled, and that without reasoning it returns no results. We’ll look into this. In the meantime, you can retrieve the desired results with much simpler queries:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT *
FROM <http://example.org/test>
WHERE
{
  {
    SELECT DISTINCT ?s_ WHERE {
      ?s_ a foaf:Person ; foaf:knows [ foaf:firstName 'Alice' ] .
    }
  }
  UNION
  {
    SELECT DISTINCT ?s_ WHERE {
      ?s_ a foaf:Person ; foaf:knows [ foaf:firstName 'Eve' ] .
    }
  }
}

Or even:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT *
FROM <http://example.org/test>
WHERE
{
  ?s_ a foaf:Person ; foaf:knows [ foaf:firstName ?o1 ] .
  VALUES ?o1 { 'Alice' 'Eve' } 
}

Dear Stephan,

thank you for your quick response. The query is part of a LINQ to SPARQL generator and hence not optimal. We will take a look at query optimization in future releases.

However, we require it to return triples and therefore we need the enclosed s_ p_ o_ query at root level. I know that there is DESCRIBE, but that has issues with other triplestores that provide fast ODBC bindings and we currently have no support to generate storage-specific queries.

With best regards,

Sebastian
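
A minimal sketch of how the two requirements might be combined: the root-level ?s_ ?p_ ?o_ pattern stays, and the VALUES form suggested above moves into a subquery that selects only ?s_. The variable name ?name is an assumption made for illustration, and this form has not been checked against 5.3.0 with reasoning disabled.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?s_ ?p_ ?o_
FROM <http://example.org/test>
WHERE
{
  # Root-level pattern: returns every triple of each matching person.
  ?s_ ?p_ ?o_ .

  {
    # The subquery only decides which subjects qualify.
    SELECT DISTINCT ?s_
    WHERE
    {
      ?s_ a foaf:Person ; foaf:knows [ foaf:firstName ?name ] .
      VALUES ?name { 'Alice' 'Eve' }
    }
  }
}
```

Because ?name is not projected by the subquery, it is not visible to the outer pattern.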

So are you saying you have no control over the actual SPARQL that is being generated at the moment? Or can you tweak this existing query but not its overall structure?

We do have control over the actual SPARQL that is being generated, but we are limited to the way the LINQ query is visited by the parser. Which means that we can tweak the patterns that are generated - but not the details of a specific query.

So if we encounter SubQueryExpressions which are combined with OrElseBinaryExpression, then we implement the sub query and combine it using UNION. Which is perfectly fine with SPARQL. Certainly not optimal, but it should work.

To reduce the expression to the form you suggested, we actually need to analyze the query before generating the SPARQL and realize that the sub-queries all have the same structure or that certain variables are not selected after we generated them… This kind of optimization we currently cannot do.

So, we cannot tweak this specific query but the pattern that is used to generate it.

Can I ask what the SAMPLE() and GROUP BY are meant to accomplish? Do you ultimately even want ?o0_sample selected?

This is the LINQ query:

persons = from person in Model.AsQueryable<Person>()
          where person.KnownPeople.Any(p => p.FirstName == "Alice")
             || person.KnownPeople.Any(p => p.FirstName == "Eve")
          select person;

So every time you do .Any() we generate SAMPLE and select the value, just in case it might be used in the outer query. In this specific situation it is not used and could be omitted. The GROUP BY is there because SAMPLE is an aggregate and requires it.
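
For comparison, an existence test like .Any() can also be written with FILTER EXISTS, which needs neither SAMPLE nor GROUP BY. This is a different technique from the one the generator in this thread uses. It is only a sketch: the variable names ?k0 and ?k1 are assumptions, and it has not been tested against the engine version discussed here.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?s_ ?p_ ?o_
FROM <http://example.org/test>
WHERE
{
  ?s_ ?p_ ?o_ .
  ?s_ a foaf:Person .

  # Each .Any() becomes one EXISTS test; || mirrors the C# operator.
  FILTER (
    EXISTS { ?s_ foaf:knows ?k0 . ?k0 foaf:firstName 'Alice' } ||
    EXISTS { ?s_ foaf:knows ?k1 . ?k1 foaf:firstName 'Eve' }
  )
}
```

Variables bound only inside an EXISTS block are not visible outside it, so this form cannot hand a sampled value to the outer query.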

I just saw that when I remove the SAMPLE and the GROUP BY, the query returns the expected results, even with reasoning disabled. :slight_smile: However, when I create a query that actually selects the sampled value the same issue arises:

SELECT ?s_ ?p_ ?o_
FROM <http://example.org/test>
WHERE
{
  ?s_ ?p_ ?o_ .
  
  {
    {
      {
        SELECT DISTINCT ?s_ ( SAMPLE ( ?o0 ) AS ?o0_sample )
        WHERE
        {
          ?s_ <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
          ?s_ <http://xmlns.com/foaf/0.1/knows> ?o0 .
          ?o0 <http://xmlns.com/foaf/0.1/firstName> ?o1 .
          FILTER ( ?o1 = 'Alice' )
        }
        GROUP BY ?s_
      }
    }
  }
  UNION
  {
    {
      {
        SELECT DISTINCT ?s_ ( SAMPLE ( ?o2 ) AS ?o0_sample )
        WHERE
        {
          ?s_ <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
          ?s_ <http://xmlns.com/foaf/0.1/knows> ?o2 .
          ?o2 <http://xmlns.com/foaf/0.1/firstName> ?o3 .
          FILTER ( ?o3 = 'Eve' )
        }
        GROUP BY ?s_
      }
    }
  }
  
  FILTER (?o0_sample != <http://example.org/test/Bob>)
}

I just realized that the variables inside .Any() are locally scoped and cannot be referenced from outer queries. Which means that we can omit the SAMPLE and GROUP BY in our case. That solves the issue for us. Yay! :smiley:
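
A sketch of what the generated query might look like once SAMPLE and GROUP BY are dropped. It is reconstructed from the description above and is not the generator's actual output; the foaf: prefix and the a shorthand are used only to keep it short.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?s_ ?p_ ?o_
FROM <http://example.org/test>
WHERE
{
  ?s_ ?p_ ?o_ .

  {
    # First .Any(): persons who know someone named 'Alice'.
    SELECT DISTINCT ?s_
    WHERE
    {
      ?s_ a foaf:Person .
      ?s_ foaf:knows ?o0 .
      ?o0 foaf:firstName ?o1 .
      FILTER ( ?o1 = 'Alice' )
    }
  }
  UNION
  {
    # Second .Any(): persons who know someone named 'Eve'.
    SELECT DISTINCT ?s_
    WHERE
    {
      ?s_ a foaf:Person .
      ?s_ foaf:knows ?o2 .
      ?o2 foaf:firstName ?o3 .
      FILTER ( ?o3 = 'Eve' )
    }
  }
}
```

Each subquery projects only ?s_, so no aggregate is involved and the inner variables stay local to their branch.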

Works. Thanks for your help @stephen! :slight_smile: