I am trying to load the Biosamples dataset into Stardog 5.3.2 and am running into the error below when adding data from the file v20160912/bibo.owl. It indicates a limit on something called entity expansions.
There was a fatal failure during preparation of 95435032-d10f-40bf-8f3f-40de05199b62 org.openrdf.rio.RDFParseException: The parser has encountered more than "100,000" entity expansions in this document; this is the limit imposed by the application. [line 147019, column 53]
Is it possible to configure this setting or is this a limit of Stardog?
Stardog uses the JDK XML parser, and this is a common issue with it. The entity expansion limit exists to guard against certain denial-of-service (DoS) attacks. You can remove the limit by adding -Djdk.xml.entityExpansionLimit=0 to your STARDOG_JAVA_ARGS environment variable (a value of 0 disables the limit) and restarting the server.
I was just able to load your dataset in about 24 minutes on a local 5.3.3 instance. My $STARDOG_SERVER_JAVA_ARGS is set to -Xms8g -Xmx8g -XX:MaxDirectMemorySize=16g -Djdk.xml.entityExpansionLimit=0.
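In case it's useful, here is roughly how that setup looks from the shell. This is only a sketch: it assumes a standard install with stardog-admin on the PATH, and the heap and direct-memory sizes are the ones from my run above, so adjust them for your machine.

    # Server-side JVM settings: heap, direct memory, and no XML entity expansion limit (0 = unlimited)
    export STARDOG_SERVER_JAVA_ARGS="-Xms8g -Xmx8g -XX:MaxDirectMemorySize=16g -Djdk.xml.entityExpansionLimit=0"

    # The CLI tools run in their own JVM, so mirroring the flag there does no harm
    export STARDOG_JAVA_ARGS="-Djdk.xml.entityExpansionLimit=0"

    # Restart the server so the new settings take effect
    stardog-admin server stop
    stardog-admin server start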
The main difference on my end is probably that my stardog.properties file has memory.mode=bulk_load. This is what we recommend when loading a lot of data; you can comment it out after the load so that read operations get more priority.
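For reference, that is a single line in $STARDOG_HOME/stardog.properties, something like this, which you would comment back out once the bulk load is done:

    # stardog.properties -- lives in $STARDOG_HOME
    # Tune memory management for bulk loading; comment out after the load finishes
    memory.mode=bulk_load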
As for the "Write error: 4088 != 8191", that does look like a disk space or corrupted index issue. If you try the load in a fresh STARDOG_HOME with these options set, are you able to load everything?
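If you want to try the fresh STARDOG_HOME route, the steps would look roughly like the sketch below. The directory, license path, and database name are placeholders, so substitute your own.

    # Point the server at an empty home directory on a disk with plenty of free space
    export STARDOG_HOME=/path/to/fresh-stardog-home        # placeholder path
    mkdir -p "$STARDOG_HOME"
    cp /path/to/stardog-license-key.bin "$STARDOG_HOME"    # the license must be in STARDOG_HOME
    cp /path/to/stardog.properties "$STARDOG_HOME"         # with memory.mode=bulk_load set

    # Start the server against the new home, then create the database and bulk load the files
    stardog-admin server start
    stardog-admin db create -n biosamples /path/to/biosamples/*.owl   # placeholder db name and file paths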