ICV slow when graph without ICV has 30 million triples

stephen · April 21, 2017, 7:05pm

Ah sorry, I was looking at the prefix declaration, not the graph. It's <http://data.einnsyn.no/brukereGraph>

stephen · April 21, 2017, 7:06pm

Is each of those queries an individual insert into <http://data.einnsyn.no/brukereGraph>, or does each one insert multiple users?

hmottestad · April 21, 2017, 7:07pm

Inserting the same user. Exact same query running multiple times.

hmottestad · April 21, 2017, 7:08pm

Before I can give you the rest of the data I need to clean up a lot of my literals.

hmottestad · April 21, 2017, 7:25pm

Hi Stephen,

Thanks for your patience.

Here is a dump (obscured) of the entire database: temp/export (6).trig at master · hmottestad/temp · GitHub

If you add the geospecies file multiple times in separate graphs it should slow down insertions.

hmottestad · April 21, 2017, 7:28pm

I think the big difference here might be that I have a graph with all the organisations (metadata) delivering data to the solution, and all of those are validated by ICV rules, since admins from each org can edit the data online. So there is a lot more data here for the ICV engine to handle.

stephen · April 24, 2017, 11:33am

Just to confirm, is this happening with Stardog 4.2.4?

hmottestad · April 24, 2017, 2:56pm

Looks like we might still be on 4.2.3. Are you not able to reproduce it in 4.2.4?

stephen · April 24, 2017, 3:07pm

No, unless I’m doing something wrong here:

# Create DB with your trig file and options
Stephens-iMac:community stephen$ stardog-admin db create -n testSesame \
    -o security.named.graphs=true \
       icv.enabled=true \
       icv.reasoning.enabled=true \
       icv.active.graphs=http://data.einnsyn.no/osloKommuneVirksomheterGraph,http://data.einnsyn.no/virksomheterGraph,http://data.einnsyn.no/brukereGraph,http://data.einnsyn.no/innsynskravGraph,http://www.arkivverket.no/standarder/noark5/arkivstruktur/ontologyGraph \
       preserve.bnode.ids=true \
       reasoning.consistency.automatic=true \
       query.all.graphs=true \
       reasoning.type=SL \
       reasoning.schema.graphs=http://www.arkivverket.no/standarder/noark5/arkivstruktur/ontologyGraph \
    -- havard.trig
Bulk loading data to new database testSesame.
Loaded 4,142 triples to testSesame from 1 file(s) in 00:00:00.266 @ 15.6K triples/sec.
Successfully created database 'testSesame'.

# Insert one user into the user graph
Stephens-iMac:community stephen$ time sq testSesame havardInsert.rq
Update query processed successfully in 00:00:00.087.

real	0m1.789s
user	0m3.451s
sys	0m0.260s

# Shell script that adds geospecies data 10 times
Stephens-iMac:community stephen$ ./havard.sh
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:11.618
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:12.404
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:13.574
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:13.485
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:14.008
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:14.264
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:17.401
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:22.053
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:25.347
Adding data from file: geospecies.rdf.gz
Added 2,201,532 triples in 00:00:28.208

# Same insert query
Stephens-iMac:community stephen$ time sq testSesame havardInsert.rq
Update query processed successfully in 00:00:00.097.

real	0m1.784s
user	0m3.195s
sys	0m0.265s

If I’m doing something horribly wrong, please let me know. However, the second insert takes essentially the same amount of time as the first one.

hmottestad · April 24, 2017, 3:31pm

Could I get the scripts?

stephen · April 24, 2017, 3:33pm

Sure, they’re nothing fancy:

Stephens-iMac:community stephen$ cat havard.sh
#!/bin/bash

for i in `seq 10`; do
	 stardog data add --named-graph http://www.newgraph.org/$i/ testSesame geospecies.rdf.gz
done

hmottestad · April 24, 2017, 3:41pm

We are using an enterprise trial license. Would that make a difference?

hmottestad · April 24, 2017, 3:44pm

With that code, I don’t see you actually adding any of the ICV constraints.

hmottestad · April 24, 2017, 3:45pm

If you run these two queries, then the second one should fail.


insert data {
  graph <http://data.einnsyn.no/brukereGraph> {
    <http://example.com/user1> a bruker:Sluttbruker ;
        bruker:passord "password" ;
        bruker:epost "test@example.com" ;
        bruker:brukernavn "username1" .
  }
}

Second one

insert data {
  graph <http://data.einnsyn.no/brukereGraph> {
    <http://example.com/user2> a bruker:Sluttbruker ;
        bruker:passord "password" ;
        bruker:epost "test@example.com" ;
        bruker:brukernavn "username1" .
  }
}

hmottestad · April 24, 2017, 3:49pm

We have the following in our script to add the ICV constraints:

  ./stardog-admin icv drop testSesame
  ./stardog-admin icv add testSesame /tmp/virksomhet_constraints.ttl
  ./stardog-admin icv add testSesame /tmp/user_constraints.ttl
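
For anyone trying to reproduce this, the order of steps discussed in the thread can be sketched as a single script: add the constraints, time the insert, load the geospecies file into separate named graphs, then time the same insert again. This is only a rough sketch under some assumptions: the database has already been created with the options shown earlier, `stardog` and `stardog-admin` are on the PATH, and `stardog query <db> <file>` is what the `sq` shorthand above stands for (that last point is a guess, not confirmed in the thread). The `run` helper and the `DRY_RUN` and `DB` variables are made up for illustration. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/bash
# Sketch of the reproduction order discussed in this thread.
# Prints the commands by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch, not a Stardog option
DB="${DB:-testSesame}"    # database name used earlier in the thread

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: replace any existing constraints with the two constraint files.
add_constraints() {
  run stardog-admin icv drop "$DB"
  run stardog-admin icv add "$DB" /tmp/virksomhet_constraints.ttl
  run stardog-admin icv add "$DB" /tmp/user_constraints.ttl
}

# Step 2: load the geospecies file into N separate named graphs.
add_bulk_graphs() {
  local n="$1"
  local i
  for i in $(seq "$n"); do
    run stardog data add --named-graph "http://www.newgraph.org/$i/" "$DB" geospecies.rdf.gz
  done
}

# Step 3: time the single-user insert.
# Assumption: `sq` in the transcript is shorthand for `stardog query`.
time_insert() {
  time run stardog query "$DB" havardInsert.rq
}

add_constraints      # constraints first, so ICV validation is actually active
time_insert          # baseline timing on the small database
add_bulk_graphs 10   # roughly 22 million extra triples outside the ICV graphs
time_insert          # timing to compare against the baseline
```

The point of the ordering is that the constraints are in place before either timing is taken, which is the step that was missing from the earlier attempt.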

stephen · April 24, 2017, 3:49pm

I suppose that could have something to do with it.

Yeah, let me try again with the constraints. Sorry about that!

hmottestad · April 24, 2017, 3:50pm

I put them here in case you can't find the email: GitHub - hmottestad/temp

stephen · April 24, 2017, 6:27pm

Yes, now that I’m trying the correct way, I can reproduce this issue in 4.2.4. I’ll look into what’s going on. Thanks!

hmottestad · April 24, 2017, 6:28pm

Great to know. Hope you can figure something out for us.

stephen · April 27, 2017, 2:10pm

Hi Håvard, I’ve got nothing solid to report as of yet, but I just wanted to pop in and assure you that we’re still looking into this issue!
