I am trying to load ~2 billion triples into Stardog. The compressed file size is 19GB.
Here are some things I tried:
- I'm in a rural area with a somewhat unreliable power supply, so I can't count on continuous power for many consecutive days. Instead of loading the 19GB file in one go, I split it into ~150 gzipped files of ~100MB each and load them into Stardog one at a time (the split command is sketched after this list). Curiously, the pieces only sum to about 15GB; I repeated the split with different settings just to make sure, and it never adds up to anything close to 19GB.
- I've set memory.mode = bulk_load and strict.parsing = false in the properties file.
- Exported STARDOG_SERVER_JAVA_ARGS="-Xmx6g -Xms6g -XX:MaxDirectMemorySize=10g".
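
For reference, the split was done with something along these lines (a rough sketch assuming GNU coreutils; wdump.nt.gz is my original dump and the 1G uncompressed chunk size is just the knob I tuned until the gzipped pieces came out around 100MB):

# --line-bytes keeps whole lines together, which is safe for line-based N-Triples;
# --filter re-gzips each chunk as wdump-000.nt.gz, wdump-001.nt.gz, ...
zcat wdump.nt.gz \
  | split --line-bytes=1G -d -a 3 --filter='gzip > $FILE.nt.gz' - wdump-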
But the add speed keeps dropping, to the point where I don't have much hope of finishing this :P. The first file loaded in under 15 minutes; after a while each one took 30 minutes to an hour, and today a single file is taking 2+ hours. Creating a new database from scratch is way faster, but as I said, I can't count on the power staying on long enough for that. What are my options here? Or is there another tool better suited for this (e.g. Jena)?
I'm adding to an existing database with:
stardog data add -f NTRIPLES --compression gzip database_name wdump-xxx.nt.gz
and creating a new database with:
stardog-admin db create -n database_name wdump-xxx.nt.gz
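
For completeness, the per-file adds are driven by a loop roughly like this (a sketch; loaded.txt is just an ad-hoc progress log I keep so a power cut doesn't force me to start over):

for f in wdump-*.nt.gz; do
  # skip chunks that have already been loaded successfully
  grep -qxF "$f" loaded.txt 2>/dev/null && continue
  stardog data add -f NTRIPLES --compression gzip database_name "$f" \
    && echo "$f" >> loaded.txt   # record the chunk only if the add succeeded
done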