Query timeouts and query killing fail

Hi,

When using Stardog with the recommended memory parameters for a dataset of over 1 billion triples (10 GB Java direct memory and 6 GB Java heap), we experienced several issues while running a sample of 40 queries (see below). None of the queries succeeded within the database's internal timeout (240 s), and every query kept running even after the timeout should have been triggered.

When we simply ignored this and moved on to the next query once our application's client-side timeout (400 s) expired, the database entered an undefined state after a number of queries (usually 23): the process was still running, but it no longer responded to any command or activity and consumed no CPU time. We then tried checking, after each client-side timeout, how many queries were still active, but this call blocked for as long as the database was struggling with the old query (indicated by very high CPU usage). Killing all queries after the client-side timeout had a similar effect: the kill call blocked for anywhere from several minutes to hours. Neither procedure prevented Stardog from reaching the undefined state described above.

Inspecting the database logs, we found several garbage-collection-related and out-of-memory error messages. After increasing the memory settings to 32 GB each for heap and off-heap memory, while still applying the query-killing strategy after every query, Stardog processed all 40 queries in more than 14 hours. Since 40 queries at a 400-second client timeout should take less than 4.5 hours, more than 10 hours in total were spent trying to kill the previous query before executing the next one. For 7 of the 40 queries, Stardog was able to cancel the query before the internal timeout was triggered. Still, not a single query returned a result. We can run these same queries without issues, in under 10 seconds each, in Virtuoso.
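For reference, our client-side timeout handling is roughly the following (a simplified Python sketch; slow_query stands in for the actual HTTP query call to Stardog, and the timeout is shortened for the demo):

```python
# Sketch of the client-side timeout strategy described above.
# Note: abandoning the future does NOT stop the server-side execution,
# which is exactly the gap we ran into.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def run_with_client_timeout(run_query, query, timeout):
    """Run run_query(query) but stop waiting after `timeout` seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(run_query, query)
    try:
        return future.result(timeout=timeout)
    except TimeoutError:
        return None  # we gave up, but the query is still running server-side
    finally:
        pool.shutdown(wait=False)

# Demo with a stand-in "query" that is too slow (400 s in our real setup):
def slow_query(q):
    time.sleep(5)
    return "rows"

result = run_with_client_timeout(slow_query, "SELECT ...", timeout=1)
print(result)  # None: the client moved on while the work continued
```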

The queries were executed on an Ubuntu 14.04 server with a 3.19.0-33 kernel, Oracle Java 1.8.0_66, a 1.8 GHz Intel Xeon E5-2630L CPU, 256 GB RAM, and a 3.6 TB hard disk drive.
We used Stardog version 4.2.3.

A lot of questions arise:
How is the kill command implemented?
What does it mean if it returns nothing?
Is there a timeout for the kill command as well?
Why does it not kill all the queries in our case?
What are direct memory and heap memory used for in Stardog, and how would you configure them?
Why are the queries so slow?

But the major question is: Is this caused by Java 8? Is there really no reliable way to cancel queries, or is it just a bug?

I appreciate any hints, ideas, suggestions, and explanations.

Cheers,

======================
40 queries with the same structure as this one (but different dbo:industry objects):
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?company1
WHERE { {
?company1 rdf:type dbo:Company .
?company2 rdf:type dbo:Company .
?company1 dbo:industry dbo:Advertising .
?company2 dbo:industry dbo:Advertising .
} OPTIONAL{
?company1 rdfs:label ?label1 .
?company2 rdfs:label ?label2 .
} OPTIONAL{
?company1 dbo:locationCity ?city1 .
?company2 dbo:locationCity ?city2 .
} OPTIONAL{
?company1 dbo:locationCountry ?country1 .
?company2 dbo:locationCountry ?country2 .
}
FILTER(?company1!=?company2 && ( (?label1=?label2 && STRLEN(?label1)>3) || ?city1=?city2 || ?country1=?country2 ) )
}

Log excerpt from the 6 GB heap / 10 GB direct memory setup:

[StardogServer.WorkerGroup-16] com.complexible.stardog.protocols.http.server.HttpMessageEncoder:write(171): There was an error writing the HTTP response
org.openrdf.query.QueryEvaluationException: com.complexible.stardog.plan.eval.operator.OperatorException: Query execution cancelled: Execution time exceeded query timeout 240000
	at com.complexible.common.rdf.query.IteratorAsTupleQueryResult.hasNext(IteratorAsTupleQueryResult.java:81) ~[stardog-utils-rdf-4.2.3.jar:?]
	at com.complexible.stardog.protocols.server.ConnectionFinishQueryResult.hasNext(ConnectionFinishQueryResult.java:75) ~[stardog-protocols-spec-server-4.2.3.jar:?]
	at org.openrdf.query.QueryResults.report(QueryResults.java:158) ~[sesame-query-4.0.0.jar:?]
	at org.openrdf.query.resultio.QueryResultIO.writeTuple(QueryResultIO.java:449) ~[sesame-queryresultio-api-4.0.0.jar:?]
	at com.complexible.stardog.protocols.http.server.io.ResponseBodyEncoders$13.encode(ResponseBodyEncoders.java:470) ~[stardog-protocols-http-server-4.2.3.jar:?]
	at com.complexible.stardog.protocols.http.server.io.ResponseBodyEncoders$2.encode(ResponseBodyEncoders.java:101) ~[stardog-protocols-http-server-4.2.3.jar:?]
	at com.complexible.stardog.protocols.http.server.HttpMessageEncoder$SingletonMessageResponseBodyEncoder.encode(HttpMessageEncoder.java:518) ~[stardog-protocols-http-server-4.2.3.jar:?]
	at com.complexible.stardog.protocols.http.server.HttpMessageEncoder.write(HttpMessageEncoder.java:158) [stardog-protocols-http-server-4.2.3.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
Caused by: com.complexible.stardog.plan.eval.operator.OperatorException: Query execution cancelled: Execution time exceeded query timeout 240000
	at com.complexible.stardog.plan.eval.operator.impl.AbstractOperator.checkCanceled(AbstractOperator.java:139) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.IndexScan.computeNext(IndexScan.java:377) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.IndexScan.computeNext(IndexScan.java:41) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.LoopJoin.computeNext(LoopJoin.java:196) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.LoopJoin.computeNext(LoopJoin.java:25) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.HashJoinOp.buildHashTable(HashJoinOp.java:271) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.HashJoinOp._init(HashJoinOp.java:249) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.HashJoinOp.computeNext(HashJoinOp.java:149) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.HashJoinOp.computeNext(HashJoinOp.java:32) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.FilterOp.computeNext(FilterOp.java:83) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.FilterOp.computeNext(FilterOp.java:28) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.SingleProjectionOp.computeNext(SingleProjectionOp.java:83) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.SingleProjectionOp.computeNext(SingleProjectionOp.java:33) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.LazyUnsortedGroupify$1.computeNext(GroupifyAlgorithms.java:139) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.LazyUnsortedGroupify$1.computeNext(GroupifyAlgorithms.java:130) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.AbstractGroupOp.computeNext(AbstractGroupOp.java:141) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.AbstractGroupOp.computeNext(AbstractGroupOp.java:34) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.DistinctOp.computeNext(DistinctOp.java:53) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.impl.DistinctOp.computeNext(DistinctOp.java:21) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.AutoCloseOperator.computeNext(AutoCloseOperator.java:115) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.AutoCloseOperator.computeNext(AutoCloseOperator.java:25) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.OpBasedBindingSetIteration.computeNext(OpBasedBindingSetIteration.java:110) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.OpBasedBindingSetIteration.computeNext(OpBasedBindingSetIteration.java:34) ~[stardog-4.2.3.jar:?]
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-18.0.jar:?]
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-18.0.jar:?]
	at com.complexible.common.rdf.query.IteratorAsTupleQueryResult.hasNext(IteratorAsTupleQueryResult.java:77) ~[stardog-utils-rdf-4.2.3.jar:?]
	... 15 more
ERROR 2017-02-15 02:31:15,219 [SPEC-Server-1-18] com.complexible.common.protocols.server.rpc.ServerHandler:exceptionCaught(401): exceptionCaughtServerHandler
java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR 2017-02-15 02:31:15,219 [StardogServer.WorkerGroup-17] com.complexible.stardog.protocols.http.server.HttpMessageEncoder:createErrorResponse(293): The result encoder
ERROR 2017-02-15 05:23:29,643 [StardogServer.WorkerGroup-37] com.complexible.stardog.protocols.http.server.HttpMessageEncoder:write(161): There was an error while encoding the response
com.complexible.common.protocols.EncodingException: com.complexible.stardog.plan.eval.operator.OperatorException: Query execution cancelled: Execution time exceeded query timeout 240000
	at com.complexible.stardog.protocols.http.server.io.ResponseBodyEncoders$13.encode(ResponseBodyEncoders.java:473) ~[stardog-protocols-http-server-4.2.3.jar:?]
	at com.complexible.stardog.protocols.http.server.io.ResponseBodyEncoders$2.encode(ResponseBodyEncoders.java:101) ~[stardog-protocols-http-server-4.2.3.jar:?]
	at com.complexible.stardog.protocols.http.server.HttpMessageEncoder$SingletonMessageResponseBodyEncoder.encode(HttpMessageEncoder.java:518) ~[stardog-protocols-http-server-4.2.3.jar:?]
	at com.complexible.stardog.protocols.http.server.HttpMessageEncoder.write(HttpMessageEncoder.java:158) [stardog-protocols-http-server-4.2.3.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
Caused by: org.openrdf.query.QueryEvaluationException: com.complexible.stardog.plan.eval.operator.OperatorException: Query execution cancelled: Execution time exceeded query timeout 240000
	at com.complexible.common.rdf.query.IteratorAsTupleQueryResult.hasNext(IteratorAsTupleQueryResult.java:81) ~[stardog-utils-rdf-4.2.3.jar:?]
	at com.complexible.stardog.protocols.server.ConnectionFinishQueryResult.hasNext(ConnectionFinishQueryResult.java:75) ~[stardog-protocols-spec-server-4.2.3.jar:?]
	at org.openrdf.query.QueryResults.report(QueryResults.java:158) ~[sesame-query-4.0.0.jar:?]
	at org.openrdf.query.resultio.QueryResultIO.writeTuple(QueryResultIO.java:449) ~[sesame-queryresultio-api-4.0.0.jar:?]
	at com.complexible.stardog.protocols.http.server.io.ResponseBodyEncoders$13.encode(ResponseBodyEncoders.java:470) ~[stardog-protocols-http-server-4.2.3.jar:?]
	... 11 more
Caused by: com.complexible.stardog.plan.eval.operator.OperatorException: Query execution cancelled: Execution time exceeded query timeout 240000
	at com.complexible.stardog.plan.eval.operator.impl.AbstractOperator.checkCanceled(AbstractOperator.java:139) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.AutoCloseOperator.computeNext(AutoCloseOperator.java:129) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.AutoCloseOperator.computeNext(AutoCloseOperator.java:25) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.OpBasedBindingSetIteration.computeNext(OpBasedBindingSetIteration.java:110) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.OpBasedBindingSetIteration.computeNext(OpBasedBindingSetIteration.java:34) ~[stardog-4.2.3.jar:?]
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-18.0.jar:?]
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-18.0.jar:?]
	at com.complexible.common.rdf.query.IteratorAsTupleQueryResult.hasNext(IteratorAsTupleQueryResult.java:77) ~[stardog-utils-rdf-4.2.3.jar:?]
	at com.complexible.stardog.protocols.server.ConnectionFinishQueryResult.hasNext(ConnectionFinishQueryResult.java:75) ~[stardog-protocols-spec-server-4.2.3.jar:?]
	at org.openrdf.query.QueryResults.report(QueryResults.java:158) ~[sesame-query-4.0.0.jar:?]
	at org.openrdf.query.resultio.QueryResultIO.writeTuple(QueryResultIO.java:449) ~[sesame-queryresultio-api-4.0.0.jar:?]
	at com.complexible.stardog.protocols.http.server.io.ResponseBodyEncoders$13.encode(ResponseBodyEncoders.java:470) ~[stardog-protocols-http-server-4.2.3.jar:?]
	... 11 more
ERROR 2017-02-15 05:27:55,192 [StardogServer.WorkerGroup-37] com.complexible.stardog.protocols.http.server.HttpMessageEncoder:write(171): There was an error writing the HTTP response
org.openrdf.query.QueryEvaluationException: com.complexible.stardog.plan.eval.operator.OperatorException: Query execution cancelled: Execution time exceeded query timeout 240000
	at com.complexible.common.rdf.query.IteratorAsTupleQueryResult.hasNext(IteratorAsTupleQueryResult.java:81) ~[stardog-utils-rdf-4.2.3.jar:?]
	at com.complexible.stardog.protocols.server.ConnectionFinishQueryResult.hasNext(ConnectionFinishQueryResult.java:75) ~[stardog-protocols-spec-server-4.2.3.jar:?]
	at org.openrdf.query.QueryResults.report(QueryResults.java:158) ~[sesame-query-4.0.0.jar:?]
	at org.openrdf.query.resultio.QueryResultIO.writeTuple(QueryResultIO.java:449) ~[sesame-queryresultio-api-4.0.0.jar:?]
	at com.complexible.stardog.protocols.http.server.io.ResponseBodyEncoders$13.encode(ResponseBodyEncoders.java:470) ~[stardog-protocols-http-server-4.2.3.jar:?]
	at com.complexible.stardog.protocols.http.server.io.ResponseBodyEncoders$2.encode(ResponseBodyEncoders.java:101) ~[stardog-protocols-http-server-4.2.3.jar:?]
	at com.complexible.stardog.protocols.http.server.HttpMessageEncoder$SingletonMessageResponseBodyEncoder.encode(HttpMessageEncoder.java:518) ~[stardog-protocols-http-server-4.2.3.jar:?]
	at com.complexible.stardog.protocols.http.server.HttpMessageEncoder.write(HttpMessageEncoder.java:158) [stardog-protocols-http-server-4.2.3.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) [netty-all-4.0.32.Final.jar:4.0.32.Final]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
Caused by: com.complexible.stardog.plan.eval.operator.OperatorException: Query execution cancelled: Execution time exceeded query timeout 240000
	at com.complexible.stardog.plan.eval.operator.impl.AbstractOperator.checkCanceled(AbstractOperator.java:139) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.AutoCloseOperator.computeNext(AutoCloseOperator.java:129) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.AutoCloseOperator.computeNext(AutoCloseOperator.java:25) ~[stardog-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.tryToComputeNext(AbstractSkippingIterator.java:143) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.common.collect.AbstractSkippingIterator.hasNext(AbstractSkippingIterator.java:130) ~[stardog-utils-common-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.OpBasedBindingSetIteration.computeNext(OpBasedBindingSetIteration.java:110) ~[stardog-4.2.3.jar:?]
	at com.complexible.stardog.plan.eval.operator.util.OpBasedBindingSetIteration.computeNext(OpBasedBindingSetIteration.java:34) ~[stardog-4.2.3.jar:?]
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-18.0.jar:?]
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-18.0.jar:?]
	at com.complexible.common.rdf.query.IteratorAsTupleQueryResult.hasNext(IteratorAsTupleQueryResult.java:77) ~[stardog-utils-rdf-4.2.3.jar:?]
	... 15 more
ERROR 2017-02-15 05:48:14,058 [SPEC-Server-1-8] com.complexible.common.protocols.server.rpc.ServerHandler:exceptionCaught(401): exceptionCaughtServerHandler
java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR 2017-02-15 05:48:14,059 [StardogServer.WorkerGroup-39] com.complexible.stardog.protocols.http.server.HttpMessageEncoder:createErrorResponse(293): The result encoder received an error message it could not encode, error was:
java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR 2017-02-15 06:05:35,184 [SPEC-Server-1-10] com.complexible.common.protocols.server.rpc.ServerHandler:exceptionCaught(401): exceptionCaughtServerHandler
java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR 2017-02-15 06:05:35,184 [StardogServer.WorkerGroup-41] com.complexible.stardog.protocols.http.server.HttpMessageEncoder:createErrorResponse(293): The result encoder received an error message it could not encode, error was:
java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR 2017-02-15 06:23:01,097 [SPEC-Server-1-12] com.complexible.common.protocols.server.rpc.ServerHandler:exceptionCaught(401): exceptionCaughtServerHandler
java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR 2017-02-15 06:23:01,097 [StardogServer.WorkerGroup-43] com.complexible.stardog.protocols.http.server.HttpMessageEncoder:createErrorResponse(293): The result encoder received an error message it could not encode, error was:
java.lang.OutOfMemoryError: GC overhead limit exceeded
WARN  2017-02-15 06:43:05,447 [StardogServer.BossGroup-0] io.netty.channel.DefaultChannelPipeline:warn(151): An exception was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR 2017-02-15 06:45:04,759 [SPEC-Server-1-14] com.complexible.common.protocols.server.rpc.ServerHandler:exceptionCaught(401): exceptionCaughtServerHandler
java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR 2017-02-15 06:45:04,760 [StardogServer.WorkerGroup-45] com.complexible.stardog.protocols.http.server.HttpMessageEncoder:createErrorResponse(293): The result encoder received an error message it could not encode, error was:
java.lang.OutOfMemoryError: GC overhead limit exceeded
2017-02-15 14:42:31,905 Stardog.Shutdown.Hook FATAL Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Cannot add new shutdown hook as this is not started. Current state: STOPPED
	at org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
	at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
	at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:240)
	at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:201)
	at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
	at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
	at org.apache.logging.log4j.LogManager.getContext(LogManager.java:177)
	at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103)
	at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
	at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
	at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
	at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:284)
	at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:304)
	at com.complexible.common.io.MMapUtil.<clinit>(MMapUtil.java:37)
	at com.complexible.common.io.AbstractSegmentedFile.close(AbstractSegmentedFile.java:152)
	at com.complexible.stardog.dht.dictionary.MMapValueFile.close(MMapValueFile.java:143)
	at com.complexible.stardog.dht.dictionary.HashDictionary.close(HashDictionary.java:439)
	at com.complexible.stardog.index.dictionary.DelegatingMappingDictionary.close(DelegatingMappingDictionary.java:52)
	at com.complexible.stardog.index.AbstractIndex.close(AbstractIndex.java:509)
	at com.complexible.stardog.index.disk.DiskIndex.close(DiskIndex.java:83)
	at com.complexible.stardog.RdfSystemInfo._close(RdfSystemInfo.java:259)
	at com.complexible.stardog.AbstractSystemInfo.close(AbstractSystemInfo.java:169)
	at com.complexible.stardog.StardogKernel.shutdown(StardogKernel.java:2081)
	at com.complexible.stardog.KernelShutdown.shutdown(KernelShutdown.java:61)
	at com.complexible.stardog.KernelShutdown.access$000(KernelShutdown.java:27)
	at com.complexible.stardog.KernelShutdown$1.run(KernelShutdown.java:42)

Hi Max,

If the query is causing an OOM issue, then killing it would not work, which seems to be what's happening here. The upcoming Stardog 5 contains native memory management and would avoid this altogether. That said, the way the query is written makes it fairly difficult to optimize, which could be the cause of the slowness, the timeouts, and the OOM.

Could you post the query plan for one of these queries (stardog query explain myDb "myQuery")? That would help us figure out how the query could be tweaked for better performance.

Hi Stephen,

Thanks a lot for the fast reply. I'm really excited about Stardog 5 then. I thought Stardog already uses its own memory management (leveraging off-heap memory)? I ran the command you mentioned on the query from my first post, but replaced some URIs with prefixed names in the output (because of this forum's two-link restriction).

The Query Plan:

prefix dbo: <http://dbpedia.org/ontology/>
prefix dbr: <http://dbpedia.org/resource/>

From all
Distinct [#2.1K]
`─ Projection(?company1) [#2.1K]
   `─ Filter((?company1 != ?company2 && ((?label1 = ?label2 && STRLEN(?label1) > "3"^^xsd:integer) || (?city1 = ?city2 || ?country1 = ?country2)))) [#2.1K]
      `─ HashJoinOuter(?company1) [#4.2K]
         +─ HashJoinOuter(?company2) [#4.2K]
         │  +─ DirectHashJoinOuter(?company2) [#4.2K]
         │  │  +─ LoopJoin(_) [#4.2K]
         │  │  │  +─ MergeJoin(?company2) [#65]
         │  │  │  │  +─ Scan[POS](?company2, dbo:industry, dbr:Advertising) [#65]
         │  │  │  │  `─ Scan[POS](?company2, rdf:type, dbo:Company) [#37K]
         │  │  │  `─ MergeJoin(?company1) [#65]
         │  │  │     +─ Scan[POS](?company1, dbo:industry, dbr:Advertising) [#65]
         │  │  │     `─ Scan[POS](?company1, rdf:type, dbo:Company) [#37K]
         │  │  `─ LoopJoin(_) [#37021.2M]
         │  │     +─ Scan[PSO](?company1, rdfs:label, ?label1) [#192K]
         │  │     `─ Scan[PSO](?company2, rdfs:label, ?label2) [#192K]
         │  `─ LoopJoin(_) [#1148.3M]
         │     +─ Scan[PSO](?company1, dbo:locationCity, ?city1) [#34K]
         │     `─ Scan[PSO](?company2, dbo:locationCity, ?city2) [#34K]
         `─ LoopJoin(_) [#899.3M]
            +─ Scan[PSO](?company2, dbo:locationCountry, ?country2) [#30K]
            `─ Scan[PSO](?company1, dbo:locationCountry, ?country1) [#30K]

Cheers,
Max

Stardog does use off-heap memory at present, but in Stardog 5 there will be no GC at all during query evaluation. See https://blog.stardog.com/saying-goodbye-to-garbage/ for details.

Hi Max,

As Stephen mentioned, the slowness is due to the way the query is written, and the query plan reflects this. See this blog post about the details of query plans. The problem here is OPTIONAL blocks of this kind:

OPTIONAL{
?company1 rdfs:label ?label1 .
?company2 rdfs:label ?label2 .
}

There are no join variables between the two triple patterns inside the OPTIONAL, which results in very expensive loop joins that compute cartesian products. If there were two OPTIONALs instead of one, each pattern would instead be joined with the other, more selective patterns in the query. We have an open ticket to perform this kind of transformation automatically in the query planner, but it is also very easy to do manually.
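To see the scale of the problem, here is a rough back-of-the-envelope calculation using the cardinality estimates from the plan Max posted (the numbers come from its [#...] annotations; the "split cost" figure is only an illustrative order of magnitude):

```python
# Cost of the OPTIONAL without a join variable, using the cardinalities
# printed in the query plan ([#192K] per rdfs:label scan).
label_scan = 192_000  # matches for ?companyN rdfs:label ?labelN

# With no shared variable between the two patterns, the join degenerates
# into a cartesian product:
cross_product = label_scan * label_scan
print(f"{cross_product:,}")  # 36,864,000,000 -- close to the ~37021.2M estimate

# With split OPTIONALs, each label pattern instead joins on ?companyN
# against the ~65 advertising companies, so the intermediate results
# stay tiny (two independent, index-backed lookups):
selective = 65
split_cost = 2 * selective
print(split_cost)  # 130
```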

I did the rewriting for your query and ran it against the DBpedia database I have (100M triples). It returned 83 results in under a second. Here is the full rewritten query (note that bound() filters were added when splitting the OPTIONALs):

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?company1 ?company2
WHERE { 

?company1 rdf:type dbo:Company .
?company2 rdf:type dbo:Company .
?company1 dbo:industry dbr:Advertising .
?company2 dbo:industry dbr:Advertising .
 
OPTIONAL{ ?company1 rdfs:label ?label1 }
OPTIONAL{ ?company2 rdfs:label ?label2 } 
OPTIONAL{ ?company1 dbo:locationCity ?city1 }
OPTIONAL{ ?company2 dbo:locationCity ?city2 }
OPTIONAL{ ?company1 dbo:locationCountry ?country1 }
OPTIONAL{ ?company2 dbo:locationCountry ?country2 }

FILTER (?company1!=?company2)
FILTER ((?label1=?label2 && STRLEN(?label1)>3) || 
        (?city1=?city2 && bound(?city1)) || 
        (?country1=?country2 && bound(?country1))) 
}
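As a toy illustration of why those bound() guards are useful once the OPTIONALs are split (this models only the intent of the filter, not full SPARQL semantics, where comparing an unbound variable raises a type error): with one OPTIONAL per variable, ?city1 can now be bound while ?city2 is not, or both can be unbound, and neither case should count as a match.

```python
# Toy model of the guarded filter branch: ?city1 = ?city2 && bound(?city1).
# UNBOUND stands in for a SPARQL variable left unbound by its OPTIONAL.
UNBOUND = None

def city_branch(city1, city2):
    # Only a genuine value-to-value match passes the filter.
    return city1 is not UNBOUND and city2 is not UNBOUND and city1 == city2

solutions = [
    {"city1": "Berlin", "city2": "Berlin"},   # genuine match
    {"city1": "Berlin", "city2": UNBOUND},    # only one side bound
    {"city1": UNBOUND,  "city2": UNBOUND},    # neither side bound
]
print([city_branch(s["city1"], s["city2"]) for s in solutions])
# [True, False, False]
```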

Best,
Evren