ICV slow when graph without ICV has 30 million triples

Thanks for letting me know Stephen.

I’ve been doing some digging myself and found that the general problem seems to be qualifiedCardinality restrictions combined with reasoning. I’ve rewritten some of them to be unqualified, which helps, but for the restrictions that use datatype properties this isn’t possible.

For the following axiom: SubClassOf(virksomhetmeta:Enhet,cardinality(virksomhetmeta:opprettetDato,1,rdfs:Literal))

Stardog creates this query:

SELECT DISTINCT *
WHERE {
   ?x0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.einnsyn.no/virksomhetmeta/Enhet> .
   {
      ?x0 <http://data.einnsyn.no/virksomhetmeta/opprettetDato> ?dx28 .
      FILTER (isLiteral(?dx28))
      ?x0 <http://data.einnsyn.no/virksomhetmeta/opprettetDato> ?dx29 .
      FILTER (isLiteral(?dx29))
      FILTER (?dx28 != ?dx29)
   }
   UNION
   {
      FILTER NOT EXISTS {
         ?x0 <http://data.einnsyn.no/virksomhetmeta/opprettetDato> ?dx30 .
         FILTER (isLiteral(?dx30))
      }
      ?x0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
   }
}

This runs fine without reasoning, but with reasoning it takes close to a minute on our test server.

Query plan without reasoning

From all
Distinct [#65]
`─ Projection(?x0, ?dx28, ?dx29, ?dx30) [#65]
   `─ MergeJoin(?x0) [#65]
      +─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Enhet>) [#2]
      `─ Union [#210]
         +─ Filter(((IsLiteral(?dx28) && IsLiteral(?dx29)) && ?dx28 != ?dx29)) [#194]
         │  `─ MergeJoin(?x0) [#388]
         │     +─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx28) [#388]
         │     `─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx29) [#388]
         `─ Filter(!(Bound(?dx30))) [#16]
            `─ MergeJoinOuter(?x0) [#32]
               +─ Scan[POSC](?x0, rdf:type, owl:Thing) [#0]
               `─ Filter(IsLiteral(?dx30)) [#194]
                  `─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx30) [#388]

Query plan with reasoning

From all
Distinct [#1.3K]
`─ Projection(?x0, ?dx28, ?dx29, ?dx30) [#1.3K]
   `─ HashJoin(?x0) [#1.3K]
      +─ Union [#4.1M]
      │  +─ Filter(((IsLiteral(?dx28) && IsLiteral(?dx29)) && ?dx28 != ?dx29)) [#194]
      │  │  `─ MergeJoin(?x0) [#388]
      │  │     +─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx28) [#388]
      │  │     `─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx29) [#388]
      │  `─ Filter(!(Bound(?dx30))) [#4.1M]
      │     `─ HashJoinOuter(?x0) [#8.2M]
      │        +─ Top(?x0)
      │        `─ Filter(IsLiteral(?dx30)) [#194]
      │           `─ Scan[PSOC](?x0, <http://data.einnsyn.no/virksomhetmeta/opprettetDato>, ?dx30) [#388]
      `─ Union [#390]
         +─ Union [#219]
         │  +─ Union [#214]
         │  │  +─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/AdministrativEnhet>) [#199]
         │  │  `─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Bydel>) [#15]
         │  `─ Union [#5]
         │     +─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/DummyEnhet>) [#3]
         │     `─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Enhet>) [#2]
         `─ Union [#171]
            +─ Union [#3]
            │  +─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Kommune>) [#1]
            │  `─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Organ>) [#2]
            `─ Union [#168]
               +─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Utvalg>) [#37]
               `─ Scan[POSC](?x0, rdf:type, <http://data.einnsyn.no/virksomhetmeta/Virksomhet>) [#131]
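
Comparing the two plans, the main difference appears to be in the second UNION branch. Without reasoning, the owl:Thing pattern is a scan estimated at [#0]. With reasoning, it becomes Top(?x0) feeding a HashJoinOuter estimated at [#8.2M], and the result is joined with the class scans only afterwards. To time that branch on its own, a hand-written sketch like the one below could be run with reasoning on and off. It is not a query Stardog generates, and it assumes the owl:Thing pattern can be dropped because ?x0 is already typed as Enhet.

```sparql
# Hand-written diagnostic, not generated by Stardog.
# Finds Enhet instances that have no literal opprettetDato value,
# i.e. the "missing value" branch of the generated query.
SELECT DISTINCT ?x0
WHERE {
   ?x0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.einnsyn.no/virksomhetmeta/Enhet> .
   FILTER NOT EXISTS {
      ?x0 <http://data.einnsyn.no/virksomhetmeta/opprettetDato> ?dx30 .
      FILTER (isLiteral(?dx30))
   }
}
```

If this query alone is slow with reasoning enabled, that would point at the missing-value branch rather than the duplicate-value branch.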


Hi,

I had noticed that there was a bit of a speedup when I changed the qualifiedCardinality to unqualified, but thought that to be an insufficient answer, as I shouldn’t have to tell you to change your ontology. We’re able to see the query plans generated by ICV and can hopefully use the difference between qualified and unqualified to figure out what’s happening.

I apologize again for the delay in answering this question. With 5.0-beta out the door we should hopefully be able to look into it shortly.

I also found out that changing my “exactly 1” to “(min 1) and (max 1)” made a huge difference.
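
For reference, here is a hand-written sketch of the “max 1” half on its own, in unqualified form (no isLiteral filter). It is an illustration derived from the generated query earlier in the thread, not output from Stardog. The “min 1” half corresponds to the FILTER NOT EXISTS branch of that query.

```sparql
# Hand-written illustration, not generated by Stardog.
# Finds Enhet instances with more than one distinct opprettetDato value.
SELECT DISTINCT ?x0
WHERE {
   ?x0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.einnsyn.no/virksomhetmeta/Enhet> ;
       <http://data.einnsyn.no/virksomhetmeta/opprettetDato> ?dx28 , ?dx29 .
   FILTER (?dx28 != ?dx29)
}
```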

After rewriting my ICV constraints to not use qualified restrictions or “exactly n” restrictions, we are now managing about 1 transaction every 2-3 seconds with 44 635 763 triples in our database, even for transactions that only read (no writes).

It is still very strange that uploading data to the default graph or another graph seems to trigger ICV validation when we have not listed those graphs in the ICV named graphs list.

Do all the ICV queries run at the end of every transaction against all the data in the database, regardless of whether the transaction contains updates and whether those updates touch graphs in the ICV named graphs list?
