Stardog add is too slow (loading split files)

I am trying to load ~2 billion triples into Stardog. The compressed file is 19GB.

Here are some things I tried:

  1. I'm in a rural area with a somewhat unreliable power supply. I can't count on continuous power for many consecutive days, so I split the 19GB file into ~150 files of ~100MB each and load them into Stardog one at a time (see the split sketch just after this list). The split files total about 15GB; I redid the split several times with different settings just to make sure, and the total never comes close to 19GB.
  2. I've set memory.mode=bulk_load and strict.parsing=false in the properties file.
  3. Exported STARDOG_SERVER_JAVA_ARGS="-Xmx6g -Xms6g -XX:MaxDirectMemorySize=10g".
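
In case anyone wonders how the file can be split without breaking the N-Triples syntax: since N-Triples is one statement per line, something along these lines works (wdump.nt.gz, the 10,000,000 lines-per-chunk value, and the wdump- output prefix are just placeholders, not exactly what I used):

# decompress, split on line boundaries, and gzip each chunk separately
zcat wdump.nt.gz | split -l 10000000 -d -a 3 --filter='gzip > $FILE.nt.gz' - wdump-

Comparing line counts before and after the split (zcat wdump.nt.gz | wc -l versus the sum over the chunks) is a more reliable sanity check than comparing compressed sizes, since the gzip output size depends on how the data is chunked and on the compression settings used.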

But the add speed has degraded to the point where I don't have much hope of finishing this anymore :P. I loaded the first file in less than 15 minutes; after a while it crept up to 30 minutes, then around an hour, and today a single file is taking 2+ hours. Creating a new database from scratch is way faster, but as I said, I can't count on the power staying on long enough for that. What are my options here? Or is there another tool better suited to this (e.g. Jena)?

I'm adding to an existing database with:
stardog data add -f NTRIPLES --compression gzip database_name wdump-xxx.nt.gz
and creating a new database with:
stardog-admin db create -n database_name wdump-xxx.nt.gz
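
For completeness, the per-file adds can be wrapped in a small loop so that after a power cut the load resumes from the first file that hasn't finished; this is only a sketch, and the loaded.txt bookkeeping file and the wdump-*.nt.gz pattern are illustrative:

# add each split file not yet recorded as loaded;
# a power cut only costs the file that was in flight
for f in wdump-*.nt.gz; do
  grep -qxF "$f" loaded.txt 2>/dev/null && continue   # skip files already loaded
  stardog data add -f NTRIPLES --compression gzip database_name "$f" \
    && echo "$f" >> loaded.txt                        # record success only
done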

While Stardog should be able to handle 2B triples, that's a non-trivial amount of data, and your memory allocations are much lower than the recommended settings. See the capacity planning section of the docs.
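
Purely as an illustration (the right numbers depend on your hardware and on the capacity planning guidance, so treat these as placeholders), a box with around 32GB of RAM would usually be configured with something closer to:

export STARDOG_SERVER_JAVA_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=20g"

Bulk loading leans heavily on off-heap (direct) memory, so MaxDirectMemorySize is generally where most of the RAM should go.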

My best recommendation would be to invest in a UPS to address your more immediate problem of having an unreliable power supply.

Have you tried turning off the automatic statistics calculation? That should also help.
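
If I remember the option name correctly, that would be along these lines; the database has to be offline to change it, and the option name and command may differ between Stardog versions, so please double-check against the docs:

stardog-admin db offline database_name
stardog-admin metadata set -o index.statistics.update.automatic=false database_name
stardog-admin db online database_name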
