Strange query plan

This seems like a very strange query plan. Any ideas on how to keep this massive join from being done twice?

Explaining Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX event: <http://ontology.causeex.com/ontology/odps/Event#>
PREFIX general: <http://ontology.causeex.com/ontology/odps/GeneralConcepts#>
select (COUNT(?s) as ?count)
# location IRIs are returned in ?relatedIRI to facilitate getting events that involve a particular location
FROM <http://graph.causeex.com>
FROM <http://ontology.causeex.com>
where { 
   ?s a ?type . ?type rdfs:subClassOf* event:Event .
    ?s ?p ?relatedIRI .
       FILTER ( ?p IN ( general:located_at, general:located_near, 
                event:has_origin, event:has_destination ) ) .
       ?relatedIRI ?p1 ?label .
       FILTER ( ?p1 IN ( rdfs:label, general:has_canonical_label ) ) .
       BIND ( 'event-relatedLocation' as ?textFrom ) .
       BIND ( ?label as ?relatedText ) 
   }

The Query Plan:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix event: <http://ontology.causeex.com/ontology/odps/Event#>
prefix general: <http://ontology.causeex.com/ontology/odps/GeneralConcepts#>

From <http://ontology.causeex.com>
From <http://graph.causeex.com>
Projection(?count) [#1]
`─ Group(aggregates=[(COUNT(*) AS ?count)]) [#1]
   `─ HashJoin(?p1) [#1]
      +─ HashJoin(?p1) [#1]
      │  +─ Bind("event-relatedLocation" AS ?textFrom) (?label AS ?relatedText) [#433]
      │  │  `─ MergeJoin(?relatedIRI) [#433]
      │  │     +─ Sort(?relatedIRI) [#433]
      │  │     │  `─ MergeJoin(?s) [#433]
      │  │     │     +─ Sort(?s) [#69]
      │  │     │     │  `─ MergeJoin(?type) [#69]
      │  │     │     │     +─ Sort(?type) [#214]
      │  │     │     │     │  `─ PropertyPath(event:Event -> ?type, minLength=0) [#214]
      │  │     │     │     │     `─ Scan[POSC](?type, rdfs:subClassOf, event:Event) [#107]
      │  │     │     │     `─ Scan[POSC](?s, rdf:type, ?type) [#3.6M]
      │  │     │     `─ Scan[SPOC](?s, ?p, ?relatedIRI) [#20.1M]
      │  │     `─ Scan[SPOC](?relatedIRI, ?p1, ?label) [#20.1M]
      │  `─ VALUES (?p1) {
      │     +─ ( rdfs:label )
      │     `─ ( general:has_canonical_label )
      │     }
      `─ HashJoin(?p) [#1]
         +─ Bind("event-relatedLocation" AS ?textFrom) (?label AS ?relatedText) [#433]
         │  `─ MergeJoin(?relatedIRI) [#433]
         │     +─ Sort(?relatedIRI) [#433]
         │     │  `─ MergeJoin(?s) [#433]
         │     │     +─ Sort(?s) [#69]
         │     │     │  `─ MergeJoin(?type) [#69]
         │     │     │     +─ Sort(?type) [#214]
         │     │     │     │  `─ PropertyPath(event:Event -> ?type, minLength=0) [#214]
         │     │     │     │     `─ Scan[POSC](?type, rdfs:subClassOf, event:Event) [#107]
         │     │     │     `─ Scan[POSC](?s, rdf:type, ?type) [#3.6M]
         │     │     `─ Scan[SPOC](?s, ?p, ?relatedIRI) [#20.1M]
         │     `─ Scan[SPOC](?relatedIRI, ?p1, ?label) [#20.1M]
         `─ VALUES (?p) {
            +─ ( general:located_at )
            +─ ( general:located_near )
            +─ ( event:has_origin )
            `─ ( event:has_destination )
            }

Thanks.
Andrea

Hi Andrea,

This plan doesn’t look terrible assuming the join estimates are roughly accurate. Query optimization may result in what looks like redundant computation if it reduces the overall cost of the query. How long does the query take to execute? How long does it take to execute if you remove this second half of the query:

       ?relatedIRI ?p1 ?label .
       FILTER ( ?p1 IN ( rdfs:label, general:has_canonical_label ) ) .
       BIND ( 'event-relatedLocation' as ?textFrom ) .
       BIND ( ?label as ?relatedText ) 

Jess

The query times out.
Execution time exceeded query timeout 1800000

Overall, though, this query is part of a larger UNION (hence the BINDs) that I was trying to debug. So I broke it down and ran just a count to compare the results with another database (which did not time out ☹).

Removing just the ?relatedIRI pattern returned:
Query returned 1 results in 00:00:15.277

Removing both the ?relatedIRI pattern and the BINDs returned:
Query returned 1 results in 00:00:01.808

I need to influence the execution order so that the ?relatedIRI ?p1 ?label lookup is restricted to the results of the ?s ?p ?relatedIRI pattern.

Andrea

Can you try wrapping the first four lines of the WHERE clause with a hint (new set of curly braces required here):

{ #pragma group.joins
   ?s a ?type . ?type rdfs:subClassOf* event:Event .
    ?s ?p ?relatedIRI .
       FILTER ( ?p IN ( general:located_at, general:located_near, 
                event:has_origin, event:has_destination ) ) .
}
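
For reference, here is a sketch of how the full count query might look with the hint applied. It assumes the rest of the WHERE clause stays exactly as in the original; only the inner braces and the hint comment are new:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX event: <http://ontology.causeex.com/ontology/odps/Event#>
PREFIX general: <http://ontology.causeex.com/ontology/odps/GeneralConcepts#>
SELECT (COUNT(?s) AS ?count)
FROM <http://graph.causeex.com>
FROM <http://ontology.causeex.com>
WHERE {
   # Inner group: the hint asks the optimizer to join these patterns together
   # first, so that ?relatedIRI is already narrowed to event locations.
   { #pragma group.joins
      ?s a ?type . ?type rdfs:subClassOf* event:Event .
      ?s ?p ?relatedIRI .
      FILTER ( ?p IN ( general:located_at, general:located_near,
                       event:has_origin, event:has_destination ) ) .
   }
   # Outer patterns: the label lookup, unchanged from the original query.
   ?relatedIRI ?p1 ?label .
   FILTER ( ?p1 IN ( rdfs:label, general:has_canonical_label ) ) .
   BIND ( 'event-relatedLocation' AS ?textFrom ) .
   BIND ( ?label AS ?relatedText )
}
```

The intent is that the unbound scan over ?relatedIRI ?p1 ?label is joined against the small set of event locations rather than evaluated as part of the duplicated subtree. It is worth checking the new plan to confirm the optimizer actually did this.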

Looks like it did the trick. Executed in 00:00:02.594.

Thanks.
Andrea