Stardog Server 5.3.3 crashed while loading a file larger than the configured MaxDirectMemorySize.
WARN 2018-09-04 05:26:11,700 [Stardog.Executor-3] com.complexible.stardog.dht.impl.PagedDiskHashTable:createPage(1123): Available direct memory is low, will try to reduce memory usage
WARN 2018-09-04 05:26:13,817 [Stardog.Executor-3] com.complexible.common.rdf.rio.RDFStreamProcessor:setException(586): Error during loading /data/Downloads/yago3.1_entire_ttl/yagoSources.ttl: java.lang.OutOfMemoryError: Direct buffer memory
java.lang.RuntimeException: java.lang.OutOfMemoryError: Direct buffer memory
then
INFO 2018-09-04 05:26:13,842 [stardog-user-1] com.complexible.stardog.index.Index:stop(326): Parsing triples: 100% complete in 11:37:25 (475.0M triples - 11.4K triples/sec)
INFO 2018-09-04 05:26:13,842 [stardog-user-1] com.complexible.stardog.index.Index:stop(329): Parsing triples finished in 11:37:25.777
ERROR 2018-09-04 05:26:13,856 [stardog-user-1] com.stardog.http.server.undertow.ErrorHandling:writeError(138): Unexpected error on the server
java.lang.NoClassDefFoundError: Could not initialize class io.netty.util.internal.Cleaner0
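For reference, an error like this is usually addressed by raising the JVM's direct memory limit before retrying the load. A minimal sketch, assuming the server picks up its JVM flags from the STARDOG_SERVER_JAVA_ARGS environment variable; the sizes and the database name yago are placeholders, not settings from this thread:

# Sketch: raise heap and direct memory before restarting the server.
# Sizes are illustrative; heap + direct memory + JVM overhead must fit in physical RAM.
export STARDOG_SERVER_JAVA_ARGS="-Xms8g -Xmx8g -XX:MaxDirectMemorySize=16g"

# Restart so the new JVM arguments take effect, then retry the bulk load.
./stardog-admin server stop
./stardog-admin server start
./stardog-admin db create -n yago /data/Downloads/yago3.1_entire_ttl/yagoSources.ttl

On a 32GB machine like the one described below, whatever split is chosen also has to leave room for the OS and file cache.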
Can we get some more details on what you’re trying to do? Can you also let us know how big the file is and how much direct/total memory there is on the machine?
The biggest file (yagoSources.ttl) is 30GB.
The computer has 32GB of RAM plus 16GB of swap. I try not to use swap; it is there to prevent OOM killing.
The primary drive is a 256GB NVMe; the data drive is a 2TB spinning disk.
Alex, just reporting my experience with YAGO and other large KBs:
Yes (to your original question), it's definitely possible. In experimental (= non-production) mode, I at one point loaded both a substantial subset of YAGO and the entirety of Freebase (the last publicly available version) on a machine with 48GB of RAM. Stardog is not an in-memory DB (by default, at least), and "loading" doesn't mean that every bit of deserialized data needs to be in memory at the same time. I wasn't interested in seeing how low you can go on RAM, so I don't know whether it would have worked with 32 or 16GB, but there you have it.
In my case, I found that 48GB was barely enough for loading all that data. Queries, on the other hand, were much less happy (but still perfectly possible once you tweaked the environment parameters Stephen is pointing to). For actual usage, I eventually expanded the RAM on the virtual machine to 92GB and set about 3/4 of that as direct memory.
Data loading was much more efficient after changing the default memory mode parameter to the value for "optimized for updates" (it's in the docs), and that was really helpful in practical terms.
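For concreteness, the memory mode is a server property set in stardog.properties under STARDOG_HOME and read at startup. A minimal sketch of how that might look, with the value names taken from the Stardog 5.x documentation rather than from this thread:

# Sketch: select a memory mode in stardog.properties (read at server startup).
# Documented values in Stardog 5.x: default, bulk_load, read_optimized, write_optimized.
echo "memory.mode = bulk_load" >> "$STARDOG_HOME/stardog.properties"

# Restart the server so the property is picked up.
./stardog-admin server stop
./stardog-admin server start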
Hi Stephen, I use 12GB for heap and 12GB for external (direct) memory… somehow it ends up using 29GB.
I also use bulk_load.
Should I use write_optimized instead? It looks like it loads the first 30% very fast, and then it becomes non-stop memory management.
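As a rough accounting (an assumption about general JVM behavior, not something measured in this thread), 12GB of heap plus 12GB of direct memory does not bound the process footprint: metaspace, thread stacks, the code cache, and GC bookkeeping sit on top of both, which is consistent with seeing roughly 29GB of resident memory:

# Back-of-the-envelope for the observed ~29GB resident size (numbers illustrative):
#   heap (-Xmx)                         12 GB
#   direct (-XX:MaxDirectMemorySize)    12 GB
#   metaspace, threads, code cache, GC  ~3-5 GB
#                                       --------
#   total resident                      ~27-29 GB
# Check the actual resident size of the server process (substitute the real PID):
ps -o rss,command -p <stardog-server-pid>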