ICV slow when graph without ICV has 30 million triples

Ah sorry, I was looking at the prefix declaration, not the graph. It's <http://data.einnsyn.no/brukereGraph>

Is each of those queries an individual insert into <http://data.einnsyn.no/brukereGraph>, or an insert of multiple users?

Inserting the same user. Exact same query running multiple times.

Before I can give you the rest of the data I need to clean up a lot of my literals.

Hi Stephen,

Thanks for your patience :slight_smile:

Here is a dump (obscured) of the entire database: temp/export (6).trig at master · hmottestad/temp · GitHub

If you add the geospecies file multiple times in separate graphs it should slow down insertions.

I think the big difference here might be that I have a graph with all the organisations (metadata) delivering data to the solution, and all of those are validated by ICV rules, since admins from each org can edit the data online. So there is a lot more data here for the ICV engine to handle.

Just to confirm, is this happening with Stardog 4.2.4?

Looks like we might still be on 4.2.3. Are you not able to reproduce it in 4.2.4?

No, unless I’m doing something wrong here:

# Create DB with your trig file and options
Stephens-iMac:community stephen$ stardog-admin db create -n testSesame \
    -o security.named.graphs=true \
       icv.enabled=true \
       icv.reasoning.enabled=true \
       icv.active.graphs=http://data.einnsyn.no/osloKommuneVirksomheterGraph,http://data.einnsyn.no/virksomheterGraph,http://data.einnsyn.no/brukereGraph,http://data.einnsyn.no/innsynskravGraph,http://www.arkivverket.no/standarder/noark5/arkivstruktur/ontologyGraph \
       preserve.bnode.ids=true \
       reasoning.consistency.automatic=true \
       query.all.graphs=true \
       reasoning.type=SL \
       reasoning.schema.graphs=http://www.arkivverket.no/standarder/noark5/arkivstruktur/ontologyGraph \
    -- havard.trig
Bulk loading data to new database testSesame.
Loaded 4,142 triples to testSesame from 1 file(s) in 00:00:00.266 @ 15.6K triples/sec.
Successfully created database 'testSesame'.

# Insert one user into the user graph
Stephens-iMac:community stephen$ time sq testSesame havardInsert.rq
Update query processed successfully in 00:00:00.087.

real	0m1.789s
user	0m3.451s
sys	0m0.260s

# Shell script that adds geospecies data 10 times
Stephens-iMac:community stephen$ ./havard.sh
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:11.618
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:12.404
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:13.574
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:13.485
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:14.008
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:14.264
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:17.401
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:22.053
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:25.347
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:28.208

# Same insert query
Stephens-iMac:community stephen$ time sq testSesame havardInsert.rq
Update query processed successfully in 00:00:00.097.

real	0m1.784s
user	0m3.195s
sys	0m0.265s

If I’m doing something horribly wrong, please let me know. However, the second insert takes essentially the same amount of time as the first one.
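Since the original report was about the exact same query being run multiple times, a small helper along these lines could make the before/after comparison easier to read. This is only a sketch: `time_runs` is a hypothetical name, it relies on the bash `time` keyword and `TIMEFORMAT`, and it measures the whole command (including client start-up), not just the server-side update time.

```shell
#!/bin/bash
# time_runs N CMD [ARGS...]
# Hypothetical helper: run CMD N times and print the wall-clock seconds of each run.
time_runs() {
    local n=$1
    shift
    local TIMEFORMAT='%R'   # make the bash `time` keyword print only real (wall-clock) seconds
    local i elapsed
    for i in $(seq "$n"); do
        # The command's own output is discarded; only the timing line is captured.
        elapsed=$( { time "$@" >/dev/null 2>&1; } 2>&1 )
        echo "run $i: ${elapsed}s"
    done
}
```

For example, `time_runs 5 sq testSesame havardInsert.rq` before and after loading the extra graphs would give two sets of timings to compare, assuming `sq` is available as a real command on the PATH (a shell alias would not be expanded inside the function).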

Could I get the scripts?

Sure, they’re nothing fancy:

Stephens-iMac:community stephen$ cat havard.sh
#!/bin/bash

# Load the same file into 10 distinct named graphs
for i in $(seq 10); do
    stardog data add --named-graph "http://www.newgraph.org/$i/" testSesame geospecies.rdf.gz
done

We are using an enterprise trial license. Would that make a difference?

With that code, I don’t see you actually adding any of the ICV constraints.

If you run these two queries, then the second one should fail.

insert data {
  graph <http://data.einnsyn.no/brukereGraph> {
    <http://example.com/user1> a bruker:Sluttbruker ;
        bruker:passord "password" ;
        bruker:epost "test@example.com" ;
        bruker:brukernavn "username1" .
  }
}

Second one:

insert data {
  graph <http://data.einnsyn.no/brukereGraph> {
    <http://example.com/user2> a bruker:Sluttbruker ;
        bruker:passord "password" ;
        bruker:epost "test@example.com" ;
        bruker:brukernavn "username1" .
  }
}

We have the following in our script to add the ICV constraints:

  ./stardog-admin icv drop testSesame
  ./stardog-admin icv add testSesame /tmp/virksomhet_constraints.ttl
  ./stardog-admin icv add testSesame /tmp/user_constraints.ttl

I suppose that could have something to do with it. :flushed:

Yeah, let me try again with the constraints. Sorry about that!

I put them here in case you can't find the email: GitHub - hmottestad/temp

Yes, now that I’m trying the correct way, I can reproduce this issue in 4.2.4. I’ll look into what’s going on. Thanks!
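For anyone following along, the corrected reproduction order can be sketched roughly as below. The constraint file paths, graph IRIs, and file names are the ones quoted in this thread; the `run` and `reproduce` function names and the `DRY_RUN` switch are hypothetical, and `stardog query <db> <file>` is assumed here as a stand-in for the `sq` shortcut used earlier. The database is assumed to exist already, created with the options shown above.

```shell
#!/bin/bash
# Sketch only: ordering of the reproduction steps discussed in this thread.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reproduce DB QUERY_FILE [COPIES]
reproduce() {
    local db=$1 query=$2 copies=${3:-10} i

    # Register the ICV constraints first; the earlier attempt without
    # them showed no slow-down.
    run stardog-admin icv drop "$db"
    run stardog-admin icv add "$db" /tmp/virksomhet_constraints.ttl
    run stardog-admin icv add "$db" /tmp/user_constraints.ttl

    # Baseline insert against the small database.
    run stardog query "$db" "$query"

    # Grow the database with unrelated data in separate named graphs.
    for i in $(seq "$copies"); do
        run stardog data add --named-graph "http://www.newgraph.org/$i/" "$db" geospecies.rdf.gz
    done

    # Same insert again; compare its timing with the baseline.
    run stardog query "$db" "$query"
}
```

Running `DRY_RUN=1 reproduce testSesame havardInsert.rq` prints the command sequence without touching the server, which is a cheap way to check the order before running it for real.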

Great to know. Hope you can figure something out for us.

Hi Håvard, I’ve got nothing solid to report as of yet, but I just wanted to pop in and assure you that we’re still looking into this issue!