Stardog is running out of RAM with 16GB RAM and 15 millions triples

We are running stardog version 7.1.0 at the customer deployment. The configuration is as follows:

# Default options
query.all.graphs = true
security.named.graphs = true
# Base 64 compatible passwords are a requirement for DataPlatform
password.length.max = 64
password.regex = [\\w+\\/=]+

logging.access.enabled = true
logging.access.type = text
logging.access.file = ...snip.../access.log
logging.access.rotation.type = size
logging.access.rotation.limit = 1000000

logging.slow_query.enabled = true
logging.slow_query.time = 30s
logging.slow_query.type = text

backup.location = ...snip...
export.dir = ...snip...

javax.net.ssl.keyStore = ...snip...
javax.net.ssl.keyStorePassword = ...snip...
javax.net.ssl.keyStoreType = ...snip...

sql.server.enabled = true
sql.server.port = 5806
sql.server.commit.invalidates.schema = true

The memory configuration is as follows:

stardog_java_heap_memory: "3500M"
stardog_direct_memory: "13500M"

According to documentation, this should be enough to store ~1 billion triples (Home | Stardog Documentation Latest), we have around 15 millions triples in the database right now.

The RAM is fully consumed by Stardog and Stardog writes this to the logs:

INFO  2020-03-18 11:26:36,108 [memory-monitor] com.complexible.stardog.api.NativeMemoryMonitor:reportStatus(118): Memory usage 99% - initiating detailed analysis - will check again in 1m
INFO  2020-03-18 11:26:36,109 [memory-monitor] com.complexible.stardog.api.NativeMemoryMonitor:reportStatus(119): Stardog JVM memory usage
OSMemory{ virtualSize=21G, residentSize=14G, residentSizePeak=15G, regionCount=1, pinnedSize=0B, vmTotal=15G, vmAvailable=98M, pageSize=4.0K}
INFO  2020-03-18 11:26:36,109 [memory-monitor] com.complexible.stardog.api.NativeMemoryMonitor:reportStatus(124): Stardog Metrics
dbms.memory.blockcache.data.add.count = 12,511,364
dbms.memory.blockcache.data.add.failure.count = 0
dbms.memory.blockcache.data.cachesIndexBlocks = false
dbms.memory.blockcache.data.capacity = 2,348,154,624
dbms.memory.blockcache.data.hits = 4,622,787
dbms.memory.blockcache.data.misses = 12,511,364
dbms.memory.blockcache.data.pinnedUsage = 274,752
dbms.memory.blockcache.data.ratio = 0.26979959497263684
dbms.memory.blockcache.data.read = 291,374,723,568
dbms.memory.blockcache.data.strictCapacity = true
dbms.memory.blockcache.data.usage = 2,345,750,848
dbms.memory.blockcache.data.written = 808,825,115,616
dbms.memory.blockcache.dictionary.add.count = 5,633
dbms.memory.blockcache.dictionary.add.failure.count = 0
dbms.memory.blockcache.dictionary.cachesIndexBlocks = true
dbms.memory.blockcache.dictionary.capacity = 1,174,077,312
dbms.memory.blockcache.dictionary.hits = 58,886,693
dbms.memory.blockcache.dictionary.misses = 5,633
dbms.memory.blockcache.dictionary.pinnedUsage = 0
dbms.memory.blockcache.dictionary.ratio = 0.9999043508656799
dbms.memory.blockcache.dictionary.read = 1,967,363,663,296
dbms.memory.blockcache.dictionary.strictCapacity = true
dbms.memory.blockcache.dictionary.usage = 211,034,832
dbms.memory.blockcache.dictionary.written = 363,894,208
dbms.memory.blockcache.total.capacity = 4,109,270,592
dbms.memory.blockcache.total.data.hits = 0
dbms.memory.blockcache.total.data.misses = 0
dbms.memory.blockcache.total.data.ratio = 0.0
dbms.memory.blockcache.total.filter.hits = 0
dbms.memory.blockcache.total.filter.misses = 0
dbms.memory.blockcache.total.filter.ratio = 0.0
dbms.memory.blockcache.total.index.hits = 0
dbms.memory.blockcache.total.index.misses = 0
dbms.memory.blockcache.total.index.ratio = 0.0
dbms.memory.blockcache.total.pinnedUsage = 274,752
dbms.memory.blockcache.total.usage = 2,556,918,912
dbms.memory.blockcache.txn.add.count = 48
dbms.memory.blockcache.txn.add.failure.count = 0
dbms.memory.blockcache.txn.cachesIndexBlocks = true
dbms.memory.blockcache.txn.capacity = 587,038,656
dbms.memory.blockcache.txn.hits = 3,020,889
dbms.memory.blockcache.txn.misses = 48
dbms.memory.blockcache.txn.pinnedUsage = 0
dbms.memory.blockcache.txn.ratio = 0.999984110890098
dbms.memory.blockcache.txn.read = 25,757,433,968
dbms.memory.blockcache.txn.strictCapacity = true
dbms.memory.blockcache.txn.usage = 133,232
dbms.memory.blockcache.txn.written = 138,208
dbms.memory.config.dict_index_cache = 703,489,638
dbms.memory.config.dict_value_cache = 703,489,638
dbms.memory.config.query_mem_blocks = 4,034,396,160
dbms.memory.config.starrocks_size = 9,413,591,040
dbms.memory.direct.buffer.used = 4,048,343,474
dbms.memory.heap.max = 3,649,044,480
dbms.memory.heap.query.blocks.max = 0
dbms.memory.heap.query.blocks.used = 0
dbms.memory.heap.used = 2,172,871,688
dbms.memory.management = NATIVE
dbms.memory.mapped.used = 92,343,033
dbms.memory.mode = DEFAULT
dbms.memory.monitor.interval = 60000
dbms.memory.monitor.report.threshold = 0.9
dbms.memory.native.max = 14,155,776,000
dbms.memory.native.query.blocks.max = 4,034,396,160
dbms.memory.native.query.blocks.used = 3,083,952
dbms.memory.system.pageSize = 4,096
dbms.memory.system.pinnedSize = 0
dbms.memory.system.regionCount = 1
dbms.memory.system.rss = 15,545,507,840
dbms.memory.system.rss.peak = 15,831,801,856
dbms.memory.system.usageRatio = 0.9938290708006107
dbms.memory.system.virtual = 22,415,282,176
INFO  2020-03-18 11:26:36,109 [memory-monitor] com.complexible.stardog.api.NativeMemoryMonitor:reportStatus(126): System mem
Total Memory: 16637079552
Available Memory: 102666240
INFO  2020-03-18 11:26:36,109 [memory-monitor] com.complexible.stardog.api.NativeMemoryMonitor:reportStatus(127): init = 3.4G used = 2.0G committed = 3.4G max = 3.4G
INFO  2020-03-18 11:26:36,109 [memory-monitor] com.complexible.stardog.api.NativeMemoryMonitor:reportStatus(128): 
DETAILED MEMORY INFO
Heap Memory Usage: init = 3670016000(3584000K) used = 2172990472(2122061K) committed = 3649044480(3563520K) max = 3649044480(3563520K)
Non-Heap Memory Usage: init = 2555904(2496K) used = 158539512(154823K) committed = 164823040(160960K) max = -1(-1K)
Name: PS Scavenge
	Collection count: 211
	Collection time: 15011
	Memory Pools: [PS Eden Space, PS Survivor Space]
Name: PS MarkSweep
	Collection count: 3
	Collection time: 218
	Memory Pools: [PS Eden Space, PS Survivor Space, PS Old Gen]

Memory Pools Info
Name: Code Cache
	Usage: init = 2555904(2496K) used = 75559552(73788K) committed = 76218368(74432K) max = 251658240(245760K)
	Collection Usage: null
	Peak Usage: init = 2555904(2496K) used = 75569920(73798K) committed = 76218368(74432K) max = 251658240(245760K)
	Type: Non-heap memory
	Memory Manager Names: [CodeCacheManager]
Name: Metaspace
	Usage: init = 0(0K) used = 74446432(72701K) committed = 79167488(77312K) max = -1(-1K)
	Collection Usage: null
	Peak Usage: init = 0(0K) used = 74446432(72701K) committed = 79167488(77312K) max = -1(-1K)
	Type: Non-heap memory
	Memory Manager Names: [Metaspace Manager]
Name: Compressed Class Space
	Usage: init = 0(0K) used = 8533528(8333K) committed = 9437184(9216K) max = 1073741824(1048576K)
	Collection Usage: null
	Peak Usage: init = 0(0K) used = 8533528(8333K) committed = 9437184(9216K) max = 1073741824(1048576K)
	Type: Non-heap memory
	Memory Manager Names: [Metaspace Manager]
Name: PS Eden Space
	Usage: init = 918028288(896512K) used = 562539384(549354K) committed = 1182269440(1154560K) max = 1184366592(1156608K)
	Collection Usage: init = 918028288(896512K) used = 0(0K) committed = 1182269440(1154560K) max = 1184366592(1156608K)
	Peak Usage: init = 918028288(896512K) used = 1197473792(1169408K) committed = 1197473792(1169408K) max = 1197473792(1169408K)
	Type: Heap memory
	Memory Manager Names: [PS MarkSweep, PS Scavenge]
Name: PS Survivor Space
	Usage: init = 152567808(148992K) used = 3842576(3752K) committed = 19922944(19456K) max = 19922944(19456K)
	Collection Usage: init = 152567808(148992K) used = 3842576(3752K) committed = 19922944(19456K) max = 19922944(19456K)
	Peak Usage: init = 152567808(148992K) used = 259507648(253425K) committed = 332398592(324608K) max = 332398592(324608K)
	Type: Heap memory
	Memory Manager Names: [PS MarkSweep, PS Scavenge]
Name: PS Old Gen
	Usage: init = 2446852096(2389504K) used = 1606608512(1568953K) committed = 2446852096(2389504K) max = 2446852096(2389504K)
	Collection Usage: init = 2446852096(2389504K) used = 51879424(50663K) committed = 2446852096(2389504K) max = 2446852096(2389504K)
	Peak Usage: init = 2446852096(2389504K) used = 1606608512(1568953K) committed = 2446852096(2389504K) max = 2446852096(2389504K)
	Type: Heap memory
	Memory Manager Names: [PS MarkSweep]

The stardog instance is not working at the moment, stalling the work of couple of people as well as the application which depends on it.

1 Like

There was this error earlier yesterday (RAM consumption was around 96% at the time):

ERROR 2020-03-17 16:21:03,201 [stardog-user-67] com.complexible.stardog.db.DatabaseConnectionImpl:apply(776): There was an error adding data
com.stardog.starrocks.StarrocksException: std::bad_alloc
	at com.stardog.starrocks.index.NativePipelineBuffer.submitInternal(Native Method) ~[stardog-7.1.0.jar:?]
	at com.stardog.starrocks.index.NativePipelineBuffer.submit(NativePipelineBuffer.java:97) ~[stardog-7.1.0.jar:?]
	at com.stardog.starrocks.index.NativePipelineWriter.writeInternal(NativePipelineWriter.java:117) ~[stardog-7.1.0.jar:?]
	at com.stardog.starrocks.index.NativePipelineWriter.write(NativePipelineWriter.java:57) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.index.mvcc.rocks.NativeSingleThreadedUpdater.flush(NativeSingleThreadedUpdater.java:127) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.index.mvcc.rocks.NativeSingleThreadedUpdater.values(NativeSingleThreadedUpdater.java:101) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.index.IndexUpdaters.update(IndexUpdaters.java:191) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.index.Indexes.removeContext(Indexes.java:165) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.db.index.ConnectableIndexRWConnectionImpl$IndexResourceTransaction.apply(ConnectableIndexRWConnectionImpl.java:662) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.db.index.ConnectableIndexRWConnectionImpl.apply(ConnectableIndexRWConnectionImpl.java:379) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.db.ConnectableConnectionCollection.apply(ConnectableConnectionCollection.java:193) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.db.DatabaseConnectionImpl.apply(DatabaseConnectionImpl.java:769) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.db.DatabaseConnectionImpl.clearContext(DatabaseConnectionImpl.java:969) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.ClearOperatorImpl.computeNext(ClearOperatorImpl.java:53) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.ClearOperatorImpl.computeNext(ClearOperatorImpl.java:26) ~[stardog-7.1.0.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:147) ~[stardog-utils-common-7.1.0.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:134) ~[stardog-utils-common-7.1.0.jar:?]
	at com.complexible.common.base.CloseableIterator.size(CloseableIterator.java:200) ~[stardog-utils-common-7.1.0.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.UpdateSequenceOperatorImpl.computeNext(UpdateSequenceOperatorImpl.java:76) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.UpdateSequenceOperatorImpl.computeNext(UpdateSequenceOperatorImpl.java:29) ~[stardog-7.1.0.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:147) ~[stardog-utils-common-7.1.0.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:134) ~[stardog-utils-common-7.1.0.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.AutoCloseOperator.computeNext(AutoCloseOperator.java:119) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.AutoCloseOperator.computeNext(AutoCloseOperator.java:26) ~[stardog-7.1.0.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:147) ~[stardog-utils-common-7.1.0.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:134) ~[stardog-utils-common-7.1.0.jar:?]
	at com.google.common.collect.Iterators.size(Iterators.java:163) ~[guava-26.0-jre.jar:?]
	at com.complexible.stardog.plan.eval.QueryEngine.executeUpdate(QueryEngine.java:330) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.query.DefaultQueryFactory$UpdateQueryImpl.execute(DefaultQueryFactory.java:316) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.query.DefaultQueryFactory$UpdateQueryImpl.execute(DefaultQueryFactory.java:295) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.query.DelegatingUpdateQuery.execute(DelegatingUpdateQuery.java:141) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.db.DatabaseImpl$TimerPauseUpdate.execute(DatabaseImpl.java:1718) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.db.DatabaseImpl$TimerPauseUpdate.execute(DatabaseImpl.java:1705) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.query.DelegatingUpdateQuery.execute(DelegatingUpdateQuery.java:141) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.StardogKernel$SecuredUpdateQuery.execute(StardogKernel.java:4570) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.StardogKernel$SecuredUpdateQuery.execute(StardogKernel.java:4557) ~[stardog-7.1.0.jar:?]
	at com.complexible.stardog.protocols.http.server.ProtocolUtils.executeUpdateQuery(ProtocolUtils.java:586) ~[stardog-protocols-http-server-7.1.0.jar:?]
	at com.complexible.stardog.protocols.http.server.SPARQLUpdate.executeQuery(SPARQLUpdate.java:80) ~[stardog-protocols-http-server-7.1.0.jar:?]
	at com.complexible.stardog.protocols.http.server.SPARQLUpdate.updatePost(SPARQLUpdate.java:66) ~[stardog-protocols-http-server-7.1.0.jar:?]
	at com.stardog.http.server.undertow.jaxrs.ExtractRoutes.lambda$handleIt$5(ExtractRoutes.java:192) ~[stardog-protocols-http-server-7.1.0.jar:?]
	at org.apache.shiro.subject.support.SubjectRunnable.doRun(SubjectRunnable.java:120) [shiro-core-1.3.0.jar:1.3.0]
	at org.apache.shiro.subject.support.SubjectRunnable.run(SubjectRunnable.java:108) [shiro-core-1.3.0.jar:1.3.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]

Any SELECT queries work. INSERT query does not work, for example this will fail:

INSERT DATA { GRAPH <urn:test> {<urn:test> <urn:test> <urn:test> .} }

we have restarted the server and the problem is gone for now, but there might be something underneath (memory leak?)