Increasing the Entity Expansion Limit


(Nolan Nichols) #1

Hi,

I am trying to load the Biosamples dataset into Stardog 5.3.2 and am running into the error below at the step "Adding data from file: v20160912/bibo.owl". The error points to a limit on something called entity expansions.

There was a fatal failure during preparation of 95435032-d10f-40bf-8f3f-40de05199b62 org.openrdf.rio.RDFParseException: The parser has encountered more than “100,000” entity expansions in this document; this is the limit imposed by the application. [line 147019, column 53]

Is it possible to configure this setting or is this a limit of Stardog?

Cheers,

Nolan


(Jess Balint) #2

Hi Nolan,

Stardog uses the JDK XML parser, and this is a frequent issue with it. The entity expansion limit guards against some forms of DoS attack. You can lift the limit by adding -Djdk.xml.entityExpansionLimit=0 (a value of 0 means no limit) to your STARDOG_SERVER_JAVA_ARGS env var and restarting the server.
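
A minimal sketch of that change, assuming a POSIX shell and the STARDOG_SERVER_JAVA_ARGS variable name that appears later in this thread. It appends the flag to whatever JVM options are already set instead of overwriting them, and does nothing if the flag is already there. The restart commands are left as a comment because they need a Stardog installation on the PATH.

```shell
# Flag that disables the JDK entity expansion limit (0 or less means "no limit").
flag="-Djdk.xml.entityExpansionLimit=0"

# Append the flag unless it is already present, keeping existing JVM options.
case " ${STARDOG_SERVER_JAVA_ARGS:-} " in
  *" $flag "*) ;;  # already set, nothing to do
  *) STARDOG_SERVER_JAVA_ARGS="${STARDOG_SERVER_JAVA_ARGS:+$STARDOG_SERVER_JAVA_ARGS }$flag" ;;
esac
export STARDOG_SERVER_JAVA_ARGS

echo "STARDOG_SERVER_JAVA_ARGS=$STARDOG_SERVER_JAVA_ARGS"

# The server only reads this at startup, so restart it afterwards, e.g.:
#   stardog-admin server stop && stardog-admin server start
```

Running it twice leaves a single copy of the flag, so it is safe to keep in a shell profile.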

Jess


(Nolan Nichols) #3

Gotcha, so I did this:

export STARDOG_HOME=/data/stardog
export STARDOG_SERVER_JAVA_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=24g -Djdk.xml.entityExpansionLimit=0"

But now I'm seeing a separate error during parsing:

INFO 2018-08-03 23:06:16,744 [Stardog.Executor-7] com.complexible.stardog.index.Index:printInternal(314): Parsing triples: 19% complete in 00:01:25 (67.0M triples - 781.1K triples/sec)
WARN 2018-08-03 23:06:16,829 [Stardog.Executor-13] com.complexible.common.rdf.rio.RDFStreamProcessor:setException(586): Error during loading /data/v20160912/biosd_36.ttl: com.complexible.stardog.index.IndexException: com.complexible.stardog.index.IndexException: com.complexible.stardog.index.IndexException: Write error: 4088 != 8191
org.openrdf.rio.RDFHandlerException: com.complexible.stardog.index.IndexException: com.complexible.stardog.index.IndexException: com.complexible.stardog.index.IndexException: Write error: 4088 != 8191
at com.complexible.stardog.index.IndexUpdaterHandler.handleStatements(IndexUpdaterHandler.java:94) ~[stardog-5.3.2.jar:?]
at com.complexible.stardog.index.IndexUpdaterHandler.handleStatements(IndexUpdaterHandler.java:44) ~[stardog-5.3.2.jar:?]
at com.complexible.common.rdf.rio.RDFStreamProcessor$Consumer.work(RDFStreamProcessor.java:1028) [stardog-utils-rdf-5.3.2.jar:?]
at com.complexible.common.rdf.rio.RDFStreamProcessor$Worker.call(RDFStreamProcessor.java:784) [stardog-utils-rdf-5.3.2.jar:?]
at com.complexible.common.rdf.rio.RDFStreamProcessor$Worker.call(RDFStreamProcessor.java:773) [stardog-utils-rdf-5.3.2.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: com.complexible.stardog.index.IndexException: com.complexible.stardog.index.IndexException: com.complexible.stardog.index.IndexException: Write error: 4088 != 8191
at com.complexible.stardog.index.impl.ConcurrentIndexUpdater.values(ConcurrentIndexUpdater.java:74) ~[stardog-5.3.2.jar:?]
at com.complexible.stardog.index.IndexUpdaterHandler.handleStatements(IndexUpdaterHandler.java:88) ~[stardog-5.3.2.jar:?]
… 8 more


(Jess Balint) #4

Is your disk full…?
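
One way to check, sketched for a Linux shell: look at both free space and free inodes on the filesystem that holds STARDOG_HOME, then try a small test write there. The probe file name is made up for this example, the 8191-byte size only mirrors the number in the error (assuming it is a byte count), and the fallback to the temp directory exists only so the snippet runs when STARDOG_HOME is unset.

```shell
# Directory to inspect; falls back to the temp dir if STARDOG_HOME is unset.
dir="${STARDOG_HOME:-${TMPDIR:-/tmp}}"

df -h "$dir"                                      # free space on the filesystem backing $dir
df -i "$dir" || echo "df -i is not supported here"  # free inodes; exhausting them also breaks writes

# Try to write 8191 bytes and verify that all of them landed on disk.
probe="$dir/.disk_probe.$$"   # hypothetical scratch file name
if dd if=/dev/zero of="$probe" bs=8191 count=1 2>/dev/null; then
  written=$(wc -c < "$probe" | tr -d ' ')
else
  written=0
fi
rm -f "$probe"

if [ "$written" -eq 8191 ]; then
  probe_result="ok"
else
  probe_result="short write: $written of 8191 bytes"
fi
echo "write probe in $dir: $probe_result"
```

A passing probe does not rule out quota limits or problems that only show up under sustained load, so treat it as a first check only.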


(Nolan Nichols) #5

Not from what I see:

ubuntu@stardog:~$ df -h
/dev/vdc 739G 152G 550G 22% /data

ubuntu@stardog:~$ echo $STARDOG_HOME
/data/stardog

Can you confirm that you are able to download this public dataset and load it into one of your test instances?

ubuntu@stardog:~$ wget ftp://ftp.ebi.ac.uk/pub/databases/RDF/biosamples/biosd_rdf_v20*160912.tar.bz2


(Nolan Nichols) #6

Any additional thoughts? Has anybody else tried to load this dataset and been able to reproduce my error?


(Jess Balint) #7

Hi Nolan,
Apologies for the delayed response. We’re going to try to load this dataset. Will let you know when we have an update.

Jess


(stephen) #8

Hi Nolan,

I was just able to load your dataset in about 24 minutes on a local 5.3.3 instance. My $STARDOG_SERVER_JAVA_ARGS is set to -Xms8g -Xmx8g -XX:MaxDirectMemorySize=16g -Djdk.xml.entityExpansionLimit=0.

The main difference between our setups is probably that my stardog.properties file has memory.mode=bulk_load. This is what we recommend when loading lots of data; you can comment it out after the load to give more priority to read operations.
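
As a rough sketch of that before/after toggle, the two helper functions below (names invented for this example) switch the setting on in a properties file and comment it out again. The demo runs against a scratch file; for a real load you would pass $STARDOG_HOME/stardog.properties, and the server presumably needs a restart to pick up either change.

```shell
# Turn bulk-load mode on in the given properties file (created if missing).
enable_bulk_load() {
  props="$1"
  touch "$props"
  # Drop any existing memory.mode line, active or commented out, then add ours.
  grep -v '^#* *memory\.mode=' "$props" > "$props.tmp" || true
  echo 'memory.mode=bulk_load' >> "$props.tmp"
  mv "$props.tmp" "$props"
}

# Comment the setting out again once the load has finished.
disable_bulk_load() {
  props="$1"
  sed 's/^memory\.mode=bulk_load$/# memory.mode=bulk_load/' "$props" > "$props.tmp" &&
    mv "$props.tmp" "$props"
}

# Example run against a scratch file rather than a real STARDOG_HOME.
demo_props="$(mktemp)"
enable_bulk_load "$demo_props"
cat "$demo_props"
disable_bulk_load "$demo_props"
cat "$demo_props"
```

Other lines in the file are left untouched, so the helpers can be pointed at a stardog.properties that already holds unrelated settings.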

As for the Write error: 4088 != 8191, that does seem like a disk space or corrupted index issue. If you try the load in a fresh STARDOG_HOME with these options set, are you able to load everything?
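
A sketch of what such a fresh-home attempt could look like, using the options quoted above. The temp directory is only a harmless stand-in (for the real dataset, point fresh_home at a disk with enough room), the database name biosamples is made up, and the Stardog commands are left as comments since they need an installation on the PATH and the old server stopped first.

```shell
# Previous home, if any, so the license file can be carried over.
old_home="${STARDOG_HOME:-}"

# Hypothetical location for the fresh home; a temp dir keeps this sketch harmless.
fresh_home="$(mktemp -d)"

# Stardog looks for its license in STARDOG_HOME; copy it if we have one.
if [ -n "$old_home" ] && [ -f "$old_home/stardog-license-key.bin" ]; then
  cp "$old_home/stardog-license-key.bin" "$fresh_home/"
fi

# Enable bulk-load mode for the duration of the load.
printf 'memory.mode=bulk_load\n' > "$fresh_home/stardog.properties"

export STARDOG_HOME="$fresh_home"
export STARDOG_SERVER_JAVA_ARGS="-Xms8g -Xmx8g -XX:MaxDirectMemorySize=16g -Djdk.xml.entityExpansionLimit=0"

echo "fresh STARDOG_HOME: $STARDOG_HOME"
ls -A "$STARDOG_HOME"

# Next steps, left as comments because they need a Stardog installation:
#   stardog-admin server start
#   stardog-admin db create -n biosamples /data/v20160912/*
```

If the load succeeds here but not in the original home, that would point toward a problem with the old index rather than with the data.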