ICV slow when graph without ICV has 30 million triples

Hi,

I’ve split all my data into graphs, one main graph with 30 million triples, one for the ontology and one for our user data.

I’ve enabled ICV on the ontology graph and on the user data graph.

Insert queries take around 20 seconds for creating a user in the user data graph :frowning:

When I drop the main graph (the one with 30 million triples) then inserting a user takes 500 ms :slight_smile:

Is this performance abnormal? Can I somehow debug what the ICV engine is doing to find out why the performance drops when I add data to a non-ICV graph?

Cheers,
Håvard M. Ottestad

Bump.

Community requires 20 character min :stuck_out_tongue:

Hi,

Can you share with us your ICV configuration, and also how many constraints you’re validating?

I can’t upload them here, so I’ll email them to you.

Thanks, I got them. Can I also get an example of an insert query for the user data graph?

Håvard,

On top of the insert query example, can I also get the ICV configuration you’re using? This includes your stardog.properties file and the output of stardog-admin metadata get myDb.

Thanks!

We haven’t modified our stardog.properties file. It’s just the default 200 lines of commented examples.

Here is the command we use for creating the database:

  ./stardog-admin db create -n testSesame \
    -o security.named.graphs=true  \
    icv.enabled=true \
    icv.reasoning.enabled=true \
    icv.active.graphs=http://data.einnsyn.no/osloKommuneVirksomheterGraph,http://data.einnsyn.no/virksomheterGraph,http://data.einnsyn.no/brukereGraph,http://data.einnsyn.no/innsynskravGraph,http://www.arkivverket.no/standarder/noark5/arkivstruktur/ontologyGraph   \
    preserve.bnode.ids=true \
    reasoning.consistency.automatic=true \
    query.all.graphs=true \
    reasoning.type=SL \
    reasoning.schema.graphs=http://www.arkivverket.no/standarder/noark5/arkivstruktur/ontologyGraph \
    --

Our ontology: https://gist.github.com/hmottestad/8af129b79e59d3e69dcbd1150d205b01

bash-4.3# ./stardog-admin metadata get testSesame
+-------------------------------------------+----------------------------------------------------------------------------------+
|                  Option                   |                                      Value                                       |
+-------------------------------------------+----------------------------------------------------------------------------------+
| database.archetypes                       |                                                                                  |
| database.connection.timeout               | 1h                                                                               |
| database.creator                          | admin                                                                            |
| database.name                             | testSesame                                                                       |
| database.namespaces                       | =http://api.stardog.com/,                                                        |
|                                           | arkiv=http://www.arkivverket.no/standarder/noark5/arkivstruktur/,                |
|                                           | bruker=http://data.einnsyn.no/brukermeta/, owl=http://www.w3.org/2002/07/owl#,   |
|                                           | rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#,                                 |
|                                           | rdfs=http://www.w3.org/2000/01/rdf-schema#, stardog=tag:stardog:api:,            |
|                                           | virksomhet=http://data.einnsyn.no/virksomhetmeta/,                               |
|                                           | xsd=http://www.w3.org/2001/XMLSchema#                                            |
| database.online                           | true                                                                             |
| database.time.creation                    | 2017-04-20T13:41:03.396+02:00                                                    |
| database.time.modification                | 2017-04-20T13:52:00.404+02:00                                                    |
| docs.default.rdf.extractors               | tika                                                                             |
| docs.default.text.extractors              | tika                                                                             |
| docs.filesystem.uri                       | file:///                                                                         |
| docs.path                                 | docs                                                                             |
| icv.active.graphs                         | http://data.einnsyn.no/osloKommuneVirksomheterGraph,                             |
|                                           | http://data.einnsyn.no/virksomheterGraph, http://data.einnsyn.no/brukereGraph,   |
|                                           | http://data.einnsyn.no/innsynskravGraph,                                         |
|                                           | http://www.arkivverket.no/standarder/noark5/arkivstruktur/ontologyGraph          |
| icv.consistency.automatic                 | false                                                                            |
| icv.enabled                               | true                                                                             |
| icv.reasoning.enabled                     | true                                                                             |
| index.differential.enable.limit           | 500000                                                                           |
| index.differential.merge.limit            | 20000                                                                            |
| index.differential.size                   | 0                                                                                |
| index.disk.page.count.total               | 161                                                                              |
| index.disk.page.count.used                | 60                                                                               |
| index.disk.page.fill.ratio                | 0.5297159830729167                                                               |
| index.last.tx                             | 164f70d9-ead2-47d9-ad6f-e92717d44c9f                                             |
| index.literals.canonical                  | true                                                                             |
| index.named.graphs                        | true                                                                             |
| index.persist                             | true                                                                             |
| index.persist.sync                        | true                                                                             |
| index.size                                | 4726                                                                             |
| index.statistics.update.automatic         | true                                                                             |
| index.type                                | Disk                                                                             |
| preserve.bnode.ids                        | true                                                                             |
| progress.monitor.enabled                  | true                                                                             |
| query.all.graphs                          | true                                                                             |
| query.plan.reuse                          | ALWAYS                                                                           |
| query.timeout                             | 5m                                                                               |
| reasoning.approximate                     | false                                                                            |
| reasoning.classify.eager                  | true                                                                             |
| reasoning.consistency.automatic           | true                                                                             |
| reasoning.punning.enabled                 | false                                                                            |
| reasoning.sameas                          | OFF                                                                              |
| reasoning.schema.graphs                   | http://www.arkivverket.no/standarder/noark5/arkivstruktur/ontologyGraph          |
| reasoning.schema.timeout                  | 1m                                                                               |
| reasoning.type                            | SL                                                                               |
| reasoning.virtual.graph.enabled           | true                                                                             |
| search.default.limit                      | 100                                                                              |
| search.enabled                            | false                                                                            |
| search.index.datatypes                    | http://www.w3.org/2001/XMLSchema#string,                                         |
|                                           | http://www.w3.org/1999/02/22-rdf-syntax-ns#langString                            |
| search.reindex.mode                       | sync                                                                             |
| search.wildcard.search.enabled            | false                                                                            |
| security.named.graphs                     | true                                                                             |
| spatial.enabled                           | false                                                                            |
| spatial.index.version                     | 1                                                                                |
| spatial.precision                         | 11                                                                               |
| strict.parsing                            | true                                                                             |
| transaction.isolation                     | SNAPSHOT                                                                         |
| transaction.logging                       | false                                                                            |
| transaction.logging.ignore.startup.errors | true                                                                             |
| transaction.logging.rotation.remove       | true                                                                             |
| transaction.logging.rotation.size         | 524288000                                                                        |
| versioning.directory                      | versioning                                                                       |
| versioning.enabled                        | false                                                                            |
+-------------------------------------------+----------------------------------------------------------------------------------+

Insert query

insert data {
  graph <http://data.einnsyn.no/brukereGraph> {
    <http://example.com/user1> a bruker:Sluttbruker ;
      bruker:passord "password" ;
      bruker:epost "test@example.com" ;
      bruker:brukernavn "username1" .
  }
}

My current db size is: 4730

Insert query with ICV takes ~500 ms. Disable ICV and it takes ~20 ms.

I've now added a lot of data:

And the insert query takes ~20 seconds to complete with ICV. Disable ICV and it's back down to ~20 ms.
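For anyone who wants to repeat the measurement, here is a rough sketch of how the insert could be timed from the command line. The `elapsed_ms` and `user_insert_query` helpers are hypothetical names made up for this post, the `STARDOG` and `DB` variables are assumptions about the local install, and `date +%s%N` assumes GNU date.

```shell
# Sketch only: time a single update query in milliseconds.
# STARDOG and DB are assumptions; point them at your own install and database.
STARDOG="${STARDOG:-./stardog}"
DB="${DB:-testSesame}"

# Print the wall-clock time, in milliseconds, of running the given command.
# Relies on GNU date for nanosecond resolution (%N).
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" > /dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Build an INSERT DATA update for a made-up test user with number $1.
# The bruker: prefix is resolved by the namespaces stored in the database.
user_insert_query() {
  cat <<EOF
insert data {
  graph <http://data.einnsyn.no/brukereGraph> {
    <http://example.com/user$1> a bruker:Sluttbruker ;
      bruker:passord "password" ;
      bruker:epost "test$1@example.com" ;
      bruker:brukernavn "username$1" .
  }
}
EOF
}
```

With the server running, something like `elapsed_ms "$STARDOG" query "$DB" "$(user_insert_query 2)"` should print the time for one insert, assuming `stardog query` accepts the update the same way it accepts a select. The number includes the CLI's own startup time, so it is only useful for comparing runs against each other, not as an absolute figure.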

Some stats from http://newGraph.com/

Instances of class:

Predicate usage:

Håvard,

Two questions:

  1. What are your ICV constraints (the output of stardog icv export)? Or are you validating axioms from the ontology?
  2. Do you have a generator of some kind for that newgraph data, or is it just a big dataset you have?

icv export:

https://gist.github.com/hmottestad/406bc64456f939ebe8fefcf79e8083b7

The data in newGraph is real data from the client. It's 1/17th the amount of data we expect in production. And we also expect a substantial growth in the years to come.

I've tried adding just about anything to the newGraph and it still slows down the ICV validation. I downloaded and added this file http://rdf.geospecies.org/geospecies.rdf.gz and the insert query goes from 500 ms to 600-800 ms.

Hi Håvard,

I seem unable to reproduce your issue in Stardog 4.2.4, though granted the data I’m using probably isn’t hitting ICV the same way yours is. I inserted a user into the <http://data.einnsyn.no/brukermeta/> graph on the testSesame db as created by your pasted db create command. I then loaded the geospecies.rdf.gz data multiple times into separate graphs. I loaded it once, then 5 more times, then 15 more times, and after each load I was able to insert one more user into the <http://data.einnsyn.no/brukermeta/> graph in about the same amount of time.
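The repeated loads described above could be scripted roughly like this. It is a sketch, not the exact commands that were run: the `load_copies` helper and the `http://example.com/geospecies/N` graph names are made up for illustration, and `STARDOG` is an assumption about where the CLI lives.

```shell
# Sketch only: load the same file into several distinct named graphs.
# STARDOG is an assumption; point it at your own stardog CLI.
STARDOG="${STARDOG:-./stardog}"

# load_copies DB FILE FIRST LAST
# Adds FILE once per number in FIRST..LAST, each time into its own
# (hypothetical) named graph http://example.com/geospecies/<number>.
load_copies() {
  local db="$1" file="$2" i="$3" last="$4"
  while [ "$i" -le "$last" ]; do
    "$STARDOG" data add --named-graph "http://example.com/geospecies/$i" \
      "$db" "$file" || return 1
    i=$((i + 1))
  done
}
```

For example, `load_copies testSesame geospecies.rdf.gz 1 1` followed by `load_copies testSesame geospecies.rdf.gz 2 6` would give the "once, then 5 more times" steps, with a timed user insert in between.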

Are you on Stardog 4.2.4? If you try setting reasoning.consistency.automatic or icv.reasoning.enabled to false, do you see any difference?
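In case it helps, flipping one of those options on an existing database might look something like the sketch below. The `set_db_option` helper is a made-up name, `STARDOG_ADMIN` is an assumption about the install path, and taking the database offline first is a precaution on the assumption that the option cannot be changed while it is online.

```shell
# Sketch only: change one database option, taking the db offline around it.
# STARDOG_ADMIN is an assumption; point it at your own stardog-admin CLI.
STARDOG_ADMIN="${STARDOG_ADMIN:-./stardog-admin}"

# set_db_option DB OPTION=VALUE
# Stops at the first failing step so the db is not brought back online
# after a failed change without you noticing.
set_db_option() {
  local db="$1" option="$2"
  "$STARDOG_ADMIN" db offline "$db" &&
  "$STARDOG_ADMIN" metadata set -o "$option" -- "$db" &&
  "$STARDOG_ADMIN" db online "$db"
}
```

So `set_db_option testSesame reasoning.consistency.automatic=false` would be one run, and `set_db_option testSesame icv.reasoning.enabled=false` the other, timing the same insert after each. The `--` mirrors the way the options list is terminated in the db create command earlier in the thread.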

Did you insert into the http://data.einnsyn.no/brukermeta/ graph or the http://data.einnsyn.no/brukereGraph graph?

Btw.

Here are the timings for me:

1x geospecies

5x geospecies

Turning off automatic consistency doesn’t seem to affect performance with this little data, but it does make it faster when I have 10x the amount.

Turning off ICV reasoning doesn’t work though, because there is data that depends on reasoning being enabled.

I know there is some data that you don’t have, so I’ll simply make a dump of the database for you.