If we copy all the data from the data.resc.info/kro graph into a separate database the performance is fine.
I've looked at the query plans, and they look the same for both DBs.
Any hints on how to improve the performance are welcome.
I suspect there's a difference in the query plans, specifically in the scan index: one scan uses SC and the other SPO. Could you check this, and send the plans along if that's not the case?
Is it true that your named graph contains most of the 19M triples?
No, the database consists of 12 named graphs, holding between 15k and 4.3M triples each.
The graph we select from contains about 4.3M triples; moving these to a separate DB helps with performance.
This is indeed strange. Usually such large performance differences manifest themselves in the plans. Any chance we can get the data (possibly in obfuscated form) to reproduce this on our end? How long does the query take on the separate database?
Also, depending on what the data looks like (e.g. the number of distinct subject IRIs), it might make sense to push the DISTINCT into a subquery to first get the unique subjects before the expensive filtering.
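A rough sketch of that rewrite (the original query wasn't posted, so the graph IRI, pattern, and filter here are placeholders):

```sparql
# Before: DISTINCT applied after the full pattern and filter
# SELECT DISTINCT ?s
# FROM <http://data.resc.info/kro>
# WHERE { ?s ?p ?o . FILTER(...) }

# After: deduplicate subjects first in a subquery, so the
# expensive filtering runs once per unique subject
SELECT ?s
FROM <http://data.resc.info/kro>
WHERE {
  { SELECT DISTINCT ?s WHERE { ?s ?p ?o } }
  ?s ?p2 ?o2 .
  FILTER(...)
}
```

Whether this helps depends on how many duplicate subject bindings the original pattern produces.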
OK, but that query isn't equivalent to the original one, which only asks for subjects. If you only need subjects, then the second ?s ?p ?o isn't needed and things would be faster. If you also need predicates and objects, then you need it (you can also use DESCRIBE, by the way).
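For comparison, the DESCRIBE form mentioned above could look like the following (again with a placeholder pattern, since the original query wasn't shown):

```sparql
# DESCRIBE returns all triples about each matching subject,
# so no second ?s ?p ?o pattern is needed in the query body
DESCRIBE ?s
FROM <http://data.resc.info/kro>
WHERE { ?s ?p ?o . FILTER(...) }
```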
Sure, drop me a line at pavel@stardog.com and we can discuss how to do it, I can also explain the obfuscation thing. I'm on the road today so might be some delays in responding, sorry about that!
You are right, the original query was a DESCRIBE, but we broke it down to get a minimal query that replicates the problem.
Would there be a performance benefit in using DESCRIBE?
If I omit the GRAPH clause, the query runs into problems again, eventually causing unrecoverable memory errors where the only resolution is to completely reboot the graph server. This is obviously bothering us.
Any advice is greatly appreciated.
Log output:
Exception in thread "XNIO-1 task-5" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.zip.InflaterInputStream.<init>(InflaterInputStream.java:88)
at java.util.zip.ZipFile$ZipFileInflaterInputStream.<init>(ZipFile.java:393)
at java.util.zip.ZipFile.getInputStream(ZipFile.java:374)
at java.util.jar.JarFile.getManifestFromReference(JarFile.java:199)
at java.util.jar.JarFile.getManifest(JarFile.java:180)
at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:981)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:450)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.logging.log4j.core.impl.MutableLogEvent.getThrownProxy(MutableLogEvent.java:338)
at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:61)
at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:38)
at org.apache.logging.log4j.core.layout.PatternLayout$PatternSerializer.toSerializable(PatternLayout.java:333)
at org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:232)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:217)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:57)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:177)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:170)
at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:161)
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:448)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:433)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
Exception in thread "XNIO-1 I/O-2" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap.newNode(HashMap.java:1742)
at java.util.HashMap.putVal(HashMap.java:630)
at java.util.HashMap.put(HashMap.java:611)
at java.util.HashSet.add(HashSet.java:219)
at sun.nio.ch.EPollSelectorImpl.updateSelectedKeys(EPollSelectorImpl.java:131)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:98)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:522)
Exception in thread "XNIO-1 task-9" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "XNIO-1 task-2" java.lang.OutOfMemoryError: GC overhead limit exceeded