Stardog add is too slow (loading split files)

I am trying to load ~2 billion triples into Stardog. The compressed file is 19GB.

Here are some things I tried:

  1. I'm in a rural area with a somewhat unreliable power supply. I can't count on continuous power for many consecutive days, so I split the 19GB file into ~150 files of ~100MB each and load them into Stardog one at a time (see the split sketch just after this list). The split files total about 15GB; I redid the split several times with different settings just to make sure, and the total never comes close to 19GB.
  2. I've set memory.mode=bulk_load and strict.parsing=false in the properties file.
  3. Exported STARDOG_SERVER_JAVA_ARGS="-Xmx6g -Xms6g -XX:MaxDirectMemorySize=10g".
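
In case anyone wonders how the file can be split without breaking the N-Triples syntax: since N-Triples is one statement per line, something along these lines works (wdump.nt.gz, the 10,000,000 lines-per-chunk value, and the wdump- output prefix are just placeholders, not exactly what I used):

# decompress, split on line boundaries, and gzip each chunk separately
zcat wdump.nt.gz | split -l 10000000 -d -a 3 --filter='gzip > $FILE.nt.gz' - wdump-

Comparing line counts before and after the split (zcat wdump.nt.gz | wc -l versus the sum over the chunks) is a more reliable sanity check than comparing compressed sizes, since the gzip output size depends on how the data is chunked and on the compression settings used.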

But the add speed has degraded to the point where I don't have much hope of finishing this anymore :P. I loaded the first file in less than 15 minutes; after a while it crept up to 30 minutes, then around an hour, and today a single file is taking 2+ hours. Creating a new database from scratch is way faster, but as I said, I can't count on the power staying on long enough for that. What are my options here? Or is there another tool better suited to this (e.g. Jena)?

I'm adding to an existing database with:
stardog data add -f NTRIPLES --compression gzip database_name wdump-xxx.nt.gz
and creating a new database with:
stardog-admin db create -n database_name wdump-xxx.nt.gz
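
For completeness, the per-file adds can be wrapped in a small loop so that after a power cut the load resumes from the first file that hasn't finished; this is only a sketch, and the loaded.txt bookkeeping file and the wdump-*.nt.gz pattern are illustrative:

# add each split file not yet recorded as loaded;
# a power cut only costs the file that was in flight
for f in wdump-*.nt.gz; do
  grep -qxF "$f" loaded.txt 2>/dev/null && continue   # skip files already loaded
  stardog data add -f NTRIPLES --compression gzip database_name "$f" \
    && echo "$f" >> loaded.txt                        # record success only
done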

While Stardog should be able to handle 2B triples, that's a non-trivial amount of data, and your memory allocations are much lower than the recommended settings. See the capacity planning section of the docs.
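
Purely as an illustration (the right numbers depend on your hardware and on the capacity planning guidance, so treat these as placeholders), a box with around 32GB of RAM would usually be configured with something closer to:

export STARDOG_SERVER_JAVA_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=20g"

Bulk loading leans heavily on off-heap (direct) memory, so MaxDirectMemorySize is generally where most of the RAM should go.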

My best recommendation would be to invest in a UPS to address your more immediate problem of having an unreliable power supply.

Have you tried turning off the automatic statistics calculation? That should also help.
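
If I remember the option name correctly, that would be along these lines; the database has to be offline to change it, and the option name and command may differ between Stardog versions, so please double-check against the docs:

stardog-admin db offline database_name
stardog-admin metadata set -o index.statistics.update.automatic=false database_name
stardog-admin db online database_name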
