Stardog rule is making a 50 ms query take 500 ms

Hi,

I have the following query:

PREFIX arkiv: <http://www.arkivverket.no/standarder/noark5/arkivstruktur/>

select * WHERE {
    <http://data.einnsyn.no/bruker/39372e59-d8f9-47cb-a04d-281de6942216> <http://data.einnsyn.no/brukermeta/lagretSak> / <http://data.einnsyn.no/brukermeta/sak> ?sak .

    ?sak arkiv:offentligTittel_SENSITIV ?offentligTittel_SENSITIV .
    ?sak arkiv:saksaar ?saksaar .
}

And the following rule:

[ rdf:type <tag:stardog:api:rule:SPARQLRule> ;
   <tag:stardog:api:rule:content> """

 PREFIX arkiv: <http://www.arkivverket.no/standarder/noark5/arkivstruktur/>

 IF {
 	?a arkiv:mappeID ?mappeID.
        BIND(REPLACE(?mappeID, \"\\\\/.*\", \"\") as ?year)
 }
 THEN {
          ?a arkiv:saksaar ?year.
 }

"""
 ] .

Commenting out ?sak arkiv:saksaar ?saksaar brings the query from ~500 ms to around 50 ms.

Inlining the rule brings the query back down to ~50 ms:

PREFIX arkiv: <http://www.arkivverket.no/standarder/noark5/arkivstruktur/>

select * WHERE {
    <http://data.einnsyn.no/bruker/39372e59-d8f9-47cb-a04d-281de6942216> <http://data.einnsyn.no/brukermeta/lagretSak> / <http://data.einnsyn.no/brukermeta/sak> ?sak .

    ?sak arkiv:offentligTittel_SENSITIV ?offentligTittel_SENSITIV .
    # ?sak arkiv:saksaar ?saksaar .

    ?sak arkiv:mappeID ?mappeID .
    BIND(REPLACE(?mappeID, "\\/.*", "") AS ?year)
}

Why is it so slow, and how can I make it faster?

Cheers,
Håvard M. Ottestad

The purpose of the rule is to extract the year from the following string format: “2017/123”
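
As a minimal sketch of what the rule computes, the standalone query below applies the same REPLACE call to a literal sample value. The VALUES row is only an illustration of the string format, not data taken from the database.

```sparql
SELECT ?mappeID ?year WHERE {
    # Illustrative sample value in the "year/number" format described above
    VALUES ?mappeID { "2017/123" }

    # Remove the first "/" and everything after it, leaving only the year
    BIND(REPLACE(?mappeID, "\\/.*", "") AS ?year)
}
```

For the sample value, ?year is bound to "2017".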

Query plan

Explaining Query:

select * WHERE { 
    <http://data.einnsyn.no/bruker/39372e59-d8f9-47cb-a04d-281de6942216> <http://data.einnsyn.no/brukermeta/lagretSak> / <http://data.einnsyn.no/brukermeta/sak> ?sak.
  
    ?sak arkiv:offentligTittel_SENSITIV ?offentligTittel_SENSITIV.
     ?sak    arkiv:saksaar ?saksaar .
 
 
 
}

The Query Plan:

From all
Distinct [#4]
`─ Projection(?sak, ?offentligTittel_SENSITIV, ?saksaar) [#4]
   `─ Union [#4]
      +─ MergeJoin(?sak) [#2]
      │  +─ MergeJoin(?sak) [#2]
      │  │  +─ Sort(?sak) [#2]
      │  │  │  `─ MergeJoin(?pxtthzsd) [#2]
      │  │  │     +─ Scan[SPOC](<http://data.einnsyn.no/bruker/39372e59-d8f9-47cb-a04d-281de6942216>, <http://data.einnsyn.no/brukermeta/lagretSak>, ?pxtthzsd) [#2]
      │  │  │     `─ Scan[PSOC](?pxtthzsd, <http://data.einnsyn.no/brukermeta/sak>, ?sak) [#5]
      │  │  `─ Scan[PSOC](?sak, <http://www.arkivverket.no/standarder/noark5/arkivstruktur/saksaar>, ?saksaar) [#8.9K]
      │  `─ Scan[PSOC](?sak, <http://www.arkivverket.no/standarder/noark5/arkivstruktur/offentligTittel_SENSITIV>, ?offentligTittel_SENSITIV) [#24K]
      `─ MergeJoin(?sak) [#2]
         +─ MergeJoin(?sak) [#2]
         │  +─ Bind(REPLACE(?hdzqvacu, "\\/.*", "") AS ?saksaar) [#2]
         │  │  `─ MergeJoin(?sak) [#2]
         │  │     +─ MergeJoin(?sak) [#2]
         │  │     │  +─ Sort(?sak) [#2]
         │  │     │  │  `─ MergeJoin(?pxtthzsd) [#2]
         │  │     │  │     +─ Scan[SPOC](<http://data.einnsyn.no/bruker/39372e59-d8f9-47cb-a04d-281de6942216>, <http://data.einnsyn.no/brukermeta/lagretSak>, ?pxtthzsd) [#2]
         │  │     │  │     `─ Scan[PSOC](?pxtthzsd, <http://data.einnsyn.no/brukermeta/sak>, ?sak) [#5]
         │  │     │  `─ Scan[PSOC](?sak, <http://www.arkivverket.no/standarder/noark5/arkivstruktur/mappeID>, ?hdzqvacu) [#9.1K]
         │  │     `─ Scan[PSOC](?sak, <http://www.arkivverket.no/standarder/noark5/arkivstruktur/offentligTittel_SENSITIV>, ?offentligTittel_SENSITIV) [#24K]
         │  `─ Sort(?sak) [#2]
         │     `─ MergeJoin(?pxtthzsd) [#2]
         │        +─ Scan[SPOC](<http://data.einnsyn.no/bruker/39372e59-d8f9-47cb-a04d-281de6942216>, <http://data.einnsyn.no/brukermeta/lagretSak>, ?pxtthzsd) [#2]
         │        `─ Scan[PSOC](?pxtthzsd, <http://data.einnsyn.no/brukermeta/sak>, ?sak) [#5]
         `─ MergeJoin(?sak) [#9.1K]
            +─ Scan[PSOC](?sak, <http://www.arkivverket.no/standarder/noark5/arkivstruktur/offentligTittel_SENSITIV>, ?offentligTittel_SENSITIV) [#24K]
            `─ Scan[PSOC](?sak, <http://www.arkivverket.no/standarder/noark5/arkivstruktur/mappeID>, ?hdzqvacu) [#9.1K]


The culprit seems to be the last merge join. I’ve compared this plan to the one created by inlining the SPARQL rule, and the biggest difference is that last merge join, which joins the full scans of the two predicates arkiv:offentligTittel_SENSITIV and arkiv:mappeID (24K and 9.1K triples here) instead of restricting them to the saved cases first. For our production database that will be on the order of 10 million.

If I remove the pattern ?sak arkiv:offentligTittel_SENSITIV ?offentligTittel_SENSITIV . from the query, it runs considerably faster. Here is the plan:

From all
Distinct [#4]
`─ Projection(?sak, ?saksaar) [#4]
   `─ Union [#4]
      +─ MergeJoin(?sak) [#2]
      │  +─ Sort(?sak) [#2]
      │  │  `─ MergeJoin(?vdacbbip) [#2]
      │  │     +─ Scan[SPOC](<http://data.einnsyn.no/bruker/39372e59-d8f9-47cb-a04d-281de6942216>, <http://data.einnsyn.no/brukermeta/lagretSak>, ?vdacbbip) [#2]
      │  │     `─ Scan[PSOC](?vdacbbip, <http://data.einnsyn.no/brukermeta/sak>, ?sak) [#5]
      │  `─ Scan[PSOC](?sak, <http://www.arkivverket.no/standarder/noark5/arkivstruktur/saksaar>, ?saksaar) [#8.9K]
      `─ MergeJoin(?sak) [#2]
         +─ MergeJoin(?sak) [#2]
         │  +─ Bind(REPLACE(?hqjxxplb, "\\/.*", "") AS ?saksaar) [#2]
         │  │  `─ MergeJoin(?sak) [#2]
         │  │     +─ Sort(?sak) [#2]
         │  │     │  `─ MergeJoin(?vdacbbip) [#2]
         │  │     │     +─ Scan[SPOC](<http://data.einnsyn.no/bruker/39372e59-d8f9-47cb-a04d-281de6942216>, <http://data.einnsyn.no/brukermeta/lagretSak>, ?vdacbbip) [#2]
         │  │     │     `─ Scan[PSOC](?vdacbbip, <http://data.einnsyn.no/brukermeta/sak>, ?sak) [#5]
         │  │     `─ Scan[PSOC](?sak, <http://www.arkivverket.no/standarder/noark5/arkivstruktur/mappeID>, ?hqjxxplb) [#9.1K]
         │  `─ Sort(?sak) [#2]
         │     `─ MergeJoin(?vdacbbip) [#2]
         │        +─ Scan[SPOC](<http://data.einnsyn.no/bruker/39372e59-d8f9-47cb-a04d-281de6942216>, <http://data.einnsyn.no/brukermeta/lagretSak>, ?vdacbbip) [#2]
         │        `─ Scan[PSOC](?vdacbbip, <http://data.einnsyn.no/brukermeta/sak>, ?sak) [#5]
         `─ Scan[PSOC](?sak, <http://www.arkivverket.no/standarder/noark5/arkivstruktur/mappeID>, ?hqjxxplb) [#9.1K]

Hi Håvard,
This is a known issue when using BIND() in rules. The ticket number is #3046. It will be fixed in a future version of Stardog.
Jess

Well, that’s bad. Maybe consider mentioning that in the documentation and removing the example rule that uses BIND.

Can you give me a time frame for a fix? There’s not that much time left in the project, so I would like to know if I have to drop rules altogether and rewrite all my SPARQL queries instead.

I think I’ll just add a new data transform to my pipeline to enrich the data instead of relying on query time computation.
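
A minimal sketch of what that transform could look like, assuming the pipeline can run a SPARQL UPDATE against the database after each load. It reuses the pattern and REPLACE expression from the rule; the FILTER NOT EXISTS guard is my own addition so that re-running the update does not do redundant work.

```sparql
PREFIX arkiv: <http://www.arkivverket.no/standarder/noark5/arkivstruktur/>

# Materialize arkiv:saksaar as stored triples instead of deriving it at query time.
INSERT {
    ?a arkiv:saksaar ?year .
}
WHERE {
    ?a arkiv:mappeID ?mappeID .

    # Same expression as in the rule: keep only the part before the first "/"
    BIND(REPLACE(?mappeID, "\\/.*", "") AS ?year)

    # Assumption: skip resources that already have this value stored
    FILTER NOT EXISTS { ?a arkiv:saksaar ?year }
}
```

With the values stored, the original query can keep its ?sak arkiv:saksaar ?saksaar pattern unchanged. The rule itself would presumably have to be removed from the database as well, since otherwise the reasoner would still add the rule branch to the plan.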