Prevent deteriorating load speed and swapping during bulk load

Hi,

I'm trying to load PubChem 'core' (~7B triples) on a Ubuntu 16.04 machine with 64GB memory and 1TB ssd disk space with a Stardog server 5.2.1 installation.
Following the memory management and capacity planning tips, I also set the envvar STARDOG_SERVER_JAVA_ARGS to -Xmx8g -Xms8g -XX:MaxDirectMemorySize=48g and specified memory.mode = bulk_load in stardog.properties, before creating and filling the named graphs using stardog data add. I loaded a couple of ~0.33G, ~0.1G and ~16M triple batches each (mostly the same graph).

The first 2.7B triples took 4 full days to load. I was wondering if you have tips to optimize loading speed and capacity planning. Two observations:

  • Physical memory usage went up to ~35G while the amount of swap memory also went up to ~10G.
  • Loading speed decreased significantly after the first few batches:

Any advice?

Are these files being loaded with the --server-side flag, or are the file contents being pushed across a network?

The files are stored on the same ssd, i.e. as close to STARDOG_HOME as possible. They are not being loaded with the --server-side flag. Should I use the flag?

Additionally, 16.5G Mem and 15G swap are not released after loading only the first 2.5 Gtriples to the same named graph in 8 consecutive stardog data add steps. Is this expected behaviour?

Hi Stephen,

Having started creating the second named graph with the --server-side flag, 51.8 G Mem and >3 G swap (and increasing) is quickly occupied. Now the server log also shows memory exceptions:

...
Stardog server is listening on all network interfaces.
HTTP server available at http://localhost:5820.

STARDOG_HOME=/data/db/stardog

INFO  2018-03-08 10:05:06,590 [main] com.complexible.stardog.cli.impl.ServerStart:call(233): Memory options
INFO  2018-03-08 10:05:06,590 [main] com.complexible.stardog.cli.impl.ServerStart:call(234): Memory mode: BULK_LOAD
INFO  2018-03-08 10:05:06,591 [main] com.complexible.stardog.cli.impl.ServerStart:call(235): Min Heap Size: 8.0G
INFO  2018-03-08 10:05:06,591 [main] com.complexible.stardog.cli.impl.ServerStart:call(236): Max Heap Size: 7.7G
INFO  2018-03-08 10:05:06,591 [main] com.complexible.stardog.cli.impl.ServerStart:call(237): Max Direct Mem: 48G
INFO  2018-03-08 10:05:06,591 [main] com.complexible.stardog.cli.impl.ServerStart:call(238): Max Mapped Mem: 471M
INFO  2018-03-08 10:05:06,592 [main] com.complexible.stardog.cli.impl.ServerStart:call(239): System Memory: 63G
INFO  2018-03-08 10:08:01,495 [stardog-user-2] com.complexible.stardog.index.Index:printInternal(314): Preparing cache: 3% complete in 00:00:06
INFO  2018-03-08 10:08:03,005 [stardog-user-2] com.complexible.stardog.index.Index:printInternal(314): Preparing cache: 4% complete in 00:00:08
INFO  2018-03-08 10:08:04,541 [stardog-user-2] com.complexible.stardog.index.Index:printInternal(314): Preparing cache: 5% complete in 00:00:09
INFO  2018-03-08 10:08:06,064 [stardog-user-2] com.complexible.stardog.index.Index:printInternal(314): Preparing cache: 6% complete in 00:00:11
...
INFO  2018-03-08 10:09:58,837 [stardog-user-2] com.complexible.stardog.index.Index:printInternal(314): Preparing cache: 55% complete in 00:02:04
INFO  2018-03-08 10:10:00,472 [stardog-user-2] com.complexible.stardog.index.Index:printInternal(314): Preparing cache: 56% complete in 00:02:05
INFO  2018-03-08 10:10:02,098 [stardog-user-2] com.complexible.stardog.index.Index:printInternal(314): Preparing cache: 57% complete in 00:02:07
INFO  2018-03-08 10:10:03,787 [stardog-user-2] com.complexible.stardog.index.Index:printInternal(314): Preparing cache: 58% complete in 00:02:09
INFO  2018-03-08 10:10:05,538 [stardog-user-2] com.complexible.stardog.index.Index:printInternal(314): Preparing cache: 59% complete in 00:02:10
INFO  2018-03-08 10:10:07,423 [stardog-user-2] com.complexible.stardog.index.Index:printInternal(314): Preparing cache: 60% complete in 00:02:12
WARN  2018-03-08 10:10:40,560 [stardog-user-2] com.complexible.stardog.dht.impl.PagedDiskHashTable:createPage(1127): Available direct memory is low, will try to reduce memory usage
INFO  2018-03-08 10:10:40,564 [stardog-user-2] com.complexible.stardog.dht.impl.PagedDiskHashTable:createPage(1135): Direct Mem: 51,539,607,552 -> 51,539,607,552 Page Mem: 51,539,447,808 ->
 51,539,447,808 (0)
INFO  2018-03-08 10:10:40,564 [stardog-user-2] com.complexible.stardog.dht.impl.PagedDiskHashTable:createPage(1135): Direct Mem: 51,539,607,552 -> 51,539,607,552 Page Mem: 51,539,4[36/1875]
 51,539,447,808 (0)
WARN  2018-03-08 10:10:48,057 [stardog-user-2] com.complexible.stardog.dht.impl.PagedDiskHashTable:createPage(1127): Available direct memory is low, will try to reduce memory usage
INFO  2018-03-08 10:10:48,058 [stardog-user-2] com.complexible.stardog.dht.impl.PagedDiskHashTable:createPage(1135): Direct Mem: 51,539,607,552 -> 51,539,607,552 Page Mem: 51,539,451,904 ->
 51,539,451,904 (0)
WARN  2018-03-08 10:10:51,862 [stardog-user-2] com.complexible.stardog.dht.impl.PagedDiskHashTable:createPage(1127): Available direct memory is low, will try to reduce memory usage
INFO  2018-03-08 10:10:51,863 [stardog-user-2] com.complexible.stardog.dht.impl.PagedDiskHashTable:createPage(1135): Direct Mem: 51,539,607,552 -> 51,539,607,552 Page Mem: 51,539,456,000 ->
 51,539,456,000 (0)
WARN  2018-03-08 10:10:56,332 [stardog-user-2] com.complexible.stardog.dht.impl.PagedDiskHashTable:fillUpCache(844): Error while filling up the dictionary cache
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:695) ~[?:1.8.0_151]
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:1.8.0_151]
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:1.8.0_151]
        at com.complexible.stardog.dht.impl.ByteBufferCompactPage.<init>(ByteBufferCompactPage.java:48) ~[stardog-5.2.1.jar:?]
        at com.complexible.stardog.dht.impl.ByteBufferFixedKeyPage.<init>(ByteBufferFixedKeyPage.java:52) ~[stardog-5.2.1.jar:?]
        at com.complexible.stardog.dht.impl.ByteBufferFixedKeyPage$Factory.create(ByteBufferFixedKeyPage.java:31) ~[stardog-5.2.1.jar:?]
        at com.complexible.stardog.dht.impl.ByteBufferFixedKeyPage$Factory.create(ByteBufferFixedKeyPage.java:22) ~[stardog-5.2.1.jar:?]
...
        at com.complexible.stardog.StardogKernel$NotifyingDatabaseConnection.add(StardogKernel.java:3323) ~[stardog-5.2.1.jar:?]
        at java.util.stream.Streams$StreamBuilderImpl.forEachRemaining(Streams.java:419) ~[?:1.8.0_151]
        at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) ~[?:1.8.0_151]
        at com.complexible.stardog.protocols.http.server.TransactionService.processUpdate(TransactionService.java:205) ~[stardog-protocols-http-server-5.2.1.jar:?]
        at com.complexible.stardog.protocols.http.server.TransactionService.add(TransactionService.java:174) ~[stardog-protocols-http-server-5.2.1.jar:?]
        at com.stardog.http.server.undertow.jaxrs.ExtractRoutes.lambda$handleIt$5(ExtractRoutes.java:196) ~[stardog-protocols-http-server-5.2.1.jar:?]
        at org.apache.shiro.subject.support.SubjectRunnable.doRun(SubjectRunnable.java:120) ~[shiro-core-1.2.3.jar:1.2.3]
        at org.apache.shiro.subject.support.SubjectRunnable.run(SubjectRunnable.java:108) ~[shiro-core-1.2.3.jar:1.2.3]
        at com.stardog.http.server.undertow.ErrorHandling.lambda$safeDispatch$1(ErrorHandling.java:71) ~[stardog-protocols-http-server-5.2.1.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
INFO  2018-03-08 10:12:06,366 [Stardog.Executor-2] com.complexible.stardog.index.Index:printInternal(314): Parsing triples: 0% complete in 00:01:09 (1.0M triples - 14.4K triples/sec)
INFO  2018-03-08 10:12:19,371 [Stardog.Executor-5] com.complexible.stardog.index.Index:printInternal(314): Parsing triples: 1% complete in 00:01:23 (1.2M triples - 13.9K triples/sec)
INFO  2018-03-08 10:13:30,975 [Stardog.Executor-5] com.complexible.stardog.index.Index:printInternal(314): Parsing triples: 1% complete in 00:02:34 (2.0M triples - 12.9K triples/sec)
INFO  2018-03-08 10:14:04,380 [Stardog.Executor-0] com.complexible.stardog.index.Index:printInternal(314): Parsing triples: 2% complete in 00:03:08 (2.3M triples - 12.1K triples/sec)
INFO  2018-03-08 10:15:58,882 [Stardog.Executor-0] com.complexible.stardog.index.Index:printInternal(314): Parsing triples: 2% complete in 00:05:02 (3.0M triples - 9.9K triples/sec)
INFO  2018-03-08 10:16:57,699 [Stardog.Executor-5] com.complexible.stardog.index.Index:printInternal(314): Parsing triples: 3% complete in 00:06:01 (3.4M triples - 9.3K triples/sec)
INFO  2018-03-08 10:18:41,679 [Stardog.Executor-2] com.complexible.stardog.index.Index:printInternal(314): Parsing triples: 3% complete in 00:07:45 (4.0M triples - 8.6K triples/sec)
...

Any suggestions?

Loading 7B is best done by passing all the files to the db create which is done outside the transaction system and much more efficient. Doing it in batches will either not work or take a very long time (days instead of hours).

In case you are trying to load triples into named graphs you’ll need to write a small Java program like this instead of the db create command.

I must say that the numbers given in the capacity planning section are close to the minimum required values. If you have a machine with more memory where you can increase the heap size you’ll see improved performance.

Best,
Evren

1 Like

Thanks @evren. I more or less assumed that the memory-tweaking arguments and property would have the same effect (I was pointed to your gist already in this post). I will try out your suggestion!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.