ICV slow when graph without ICV has 30 million triples

hmottestad · April 27, 2017, 4:58pm

Thanks for letting me know Stephen.

hmottestad · May 9, 2017, 11:29am

I’ve been doing some digging myself and found that the general problem seems to be qualifiedCardinality restrictions combined with reasoning. I’ve rewritten some of them to be unqualified, which helps, but for the restrictions that use datatype properties this isn’t possible.

For the following axiom: SubClassOf(virksomhetmeta:Enhet,cardinality(virksomhetmeta:opprettetDato,1,rdfs:Literal))

Stardog creates this query:

SELECT DISTINCT *
WHERE {
   ?x0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.einnsyn.no/virksomhetmeta/Enhet> .
   {
      ?x0 <http://data.einnsyn.no/virksomhetmeta/opprettetDato> ?dx28 .
      FILTER (isLiteral(?dx28))
      ?x0 <http://data.einnsyn.no/virksomhetmeta/opprettetDato> ?dx29 .
      FILTER (isLiteral(?dx29))
      FILTER (?dx28 != ?dx29)
   }
   UNION
   {
      FILTER NOT EXISTS {
         ?x0 <http://data.einnsyn.no/virksomhetmeta/opprettetDato> ?dx30 .
         FILTER (isLiteral(?dx30))
      }
      ?x0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
   }
}

This runs fine without reasoning, but with reasoning it takes close to a minute on our test server.
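
To make the two branches of the generated query concrete, here is a minimal Python sketch that applies the same logic to a handful of in-memory triples. The data, the `Literal` wrapper and the `violations` helper are made up for illustration and are not part of Stardog. The sketch ignores reasoning and leaves out the `owl:Thing` type pattern from the second branch.

```python
from collections import defaultdict
from typing import NamedTuple

RDF_TYPE = "rdf:type"
ENHET = "virksomhetmeta:Enhet"
OPPRETTET = "virksomhetmeta:opprettetDato"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal (anything else counts as an IRI)."""
    value: str


def violations(triples, cls=ENHET, prop=OPPRETTET):
    """Return subjects of type `cls` that do not have exactly one literal `prop` value."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop and isinstance(o, Literal):
            values[s].add(o)
    # First UNION branch: at least two distinct literal values.
    too_many = {s for s in instances if len(values[s]) >= 2}
    # Second UNION branch: no literal value at all (FILTER NOT EXISTS).
    too_few = {s for s in instances if len(values[s]) == 0}
    return too_many | too_few


# Toy data: :a is valid, :b has no date, :c has two different dates.
triples = [
    (":a", RDF_TYPE, ENHET), (":a", OPPRETTET, Literal("2017-01-01")),
    (":b", RDF_TYPE, ENHET),
    (":c", RDF_TYPE, ENHET), (":c", OPPRETTET, Literal("2016-05-01")),
    (":c", OPPRETTET, Literal("2016-06-01")),
]
print(sorted(violations(triples)))
```

With this toy data only `:b` and `:c` are reported, one per branch of the UNION.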

Query plan without reasoning

From all
Distinct [#65]
`─ Projection(?x0, ?dx28, ?dx29, ?dx30) [#65]
   `─ MergeJoin(?x0) [#65]
      +─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Enhet>) [#2]
      `─ Union [#210]
         +─ Filter(((IsLiteral(?dx28) && IsLiteral(?dx29)) && ?dx28 != ?dx29)) [#194]
         │  `─ MergeJoin(?x0) [#388]
         │     +─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx28) [#388]
         │     `─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx29) [#388]
         `─ Filter(!(Bound(?dx30))) [#16]
            `─ MergeJoinOuter(?x0) [#32]
               +─ Scan[POSC](?x0, rdf:type, owl:Thing) [#0]
               `─ Filter(IsLiteral(?dx30)) [#194]
                  `─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx30) [#388]

Query plan with reasoning

From all
Distinct [#1.3K]
`─ Projection(?x0, ?dx28, ?dx29, ?dx30) [#1.3K]
   `─ HashJoin(?x0) [#1.3K]
      +─ Union [#4.1M]
      │  +─ Filter(((IsLiteral(?dx28) && IsLiteral(?dx29)) && ?dx28 != ?dx29)) [#194]
      │  │  `─ MergeJoin(?x0) [#388]
      │  │     +─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx28) [#388]
      │  │     `─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx29) [#388]
      │  `─ Filter(!(Bound(?dx30))) [#4.1M]
      │     `─ HashJoinOuter(?x0) [#8.2M]
      │        +─ Top(?x0)
      │        `─ Filter(IsLiteral(?dx30)) [#194]
      │           `─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx30) [#388]
      `─ Union [#390]
         +─ Union [#219]
         │  +─ Union [#214]
         │  │  +─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/AdministrativEnhet>) [#199]
         │  │  `─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Bydel>) [#15]
         │  `─ Union [#5]
         │     +─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/DummyEnhet>) [#3]
         │     `─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Enhet>) [#2]
         `─ Union [#171]
            +─ Union [#3]
            │  +─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Kommune>) [#1]
            │  `─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Organ>) [#2]
            `─ Union [#168]
               +─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Utvalg>) [#37]
               `─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Virksomhet>) [#131]

stephen · May 10, 2017, 11:54am

Hi,

I had noticed that there was a bit of a speedup when I changed the qualifiedCardinality to unqualified, but thought that to be an insufficient answer, as I shouldn’t have to tell you to change your ontology. We’re able to see the query plans generated by ICV and can hopefully use the difference between qualified and unqualified to figure out what’s happening.

I apologize again for the delay in answering this question. With 5.0-beta out the door we should hopefully be able to look into it shortly.

hmottestad · May 10, 2017, 2:25pm

I also found out that changing my “exactly 1” to “(min 1) and (max 1)” made a huge difference.
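
The rewrite works because “exactly 1” is logically the same as “(min 1) and (max 1)”, so the set of violating individuals should stay the same; presumably only the shape of the generated queries changes. A small Python sketch of that equivalence over made-up value counts (the dictionary and helper names are hypothetical and say nothing about Stardog's internals):

```python
# Number of literal opprettetDato values per individual (toy data).
value_counts = {":a": 1, ":b": 0, ":c": 2}


def violates_exactly(count, n=1):
    """True if the individual does not have exactly n values."""
    return count != n


def violates_min(count, n=1):
    """True if the individual has fewer than n values."""
    return count < n


def violates_max(count, n=1):
    """True if the individual has more than n values."""
    return count > n


exactly = {s for s, c in value_counts.items() if violates_exactly(c)}
# Violating "(min 1) and (max 1)" means violating at least one of the two parts.
min_and_max = {
    s for s, c in value_counts.items() if violates_min(c) or violates_max(c)
}

assert exactly == min_and_max
print(sorted(exactly))
```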

hmottestad · May 15, 2017, 10:30am

After rewriting my ICV constraints to avoid qualified restrictions and “exactly n” restrictions, we now manage about 1 transaction every 2-3 seconds with 44 635 763 triples in our database, even for transactions that only read (no writes).

It is still very strange that uploading data to the default graph or another graph seems to trigger ICV validation when we have not listed those graphs in the ICV named graphs list.

Do all the ICV queries run at the end of every transaction, for all the data in the database, regardless of whether the transaction contains updates and whether those updates are to graphs in the ICV named graphs list?

