Running out of space when loading Wikidata

Hi,

I am loading Wikidata into Stardog using a Docker container created as follows:

$ docker run -it \
  -v ~/data/wikidata/stardog/stardog-home/:/var/opt/stardog \
  -v ~/data/wikidata/wikidata-20230522-all-BETA-splitted/:/wikidata \
  -p 5820:5820 \
  -e STARDOG_SERVER_JAVA_ARGS="-Xmx60g -Xms60g -XX:MaxDirectMemorySize=160g" \
  stardog/stardog

The /wikidata directory contains a Wikidata dump split into several small ttl.gz files.

I also created a stardog.properties file in the /var/opt/stardog directory with the following lines:

memory.mode=bulk_load
strict.parsing=false

Then, I started loading in the container:

$ /opt/stardog/bin/stardog-admin db create -n wikidata /wikidata/x*.nt.gz

After starting the load, I got the following error message:

java.io.IOException: No space left on device

So, I checked the space in the container, and it says:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
overlay               3.9G  1.9G  1.8G  53% /
tmpfs                  64M     0   64M   0% /dev
shm                    64M     0   64M   0% /dev/shm
/dev/mapper/vg0-lv01  5.4T  2.2T  3.1T  42% /wikidata
/dev/mapper/vg0-lv02  3.9G  1.9G  1.8G  53% /etc/hosts
tmpfs                 126G     0  126G   0% /proc/acpi
tmpfs                 126G     0  126G   0% /proc/scsi
tmpfs                 126G     0  126G   0% /sys/firmware

My first question is why /var/opt/stardog does not appear as a mounted directory in the df output. My second question is which disk is actually running out of space. Since /var/opt/stardog corresponds to a folder on the host system, it should have plenty of space. My guess is that the problem is related to a temporary directory Stardog uses while loading data. Perhaps it is the temporary directory Java uses for IO, described in the link below, which can be set with the java.io.tmpdir property. If that is the case, I could simply mount a volume over that directory. Am I right?
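If the Java temp directory is the culprit, it would be filling the container's small overlay root, since java.io.tmpdir defaults to /tmp on Linux. One quick way to watch this during a load (a sketch using standard df, nothing Stardog-specific):

```shell
# If /tmp (the JVM's default temp dir) is filling the 3.9G overlay root,
# the root filesystem usage will climb toward 100% during the load
df -h /tmp
df -h /
```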

Try adding the following to your stardog.properties file and rerun.

storage.wal.total.size = 104857600
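Assuming the value is in bytes, that is exactly a 100 MiB cap:

```shell
# 104857600 bytes = 100 * 1024 * 1024, i.e. 100 MiB
# (assumes storage.wal.total.size is specified in bytes)
echo $((100 * 1024 * 1024))
# prints 104857600
```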

Matthew

I got the same error:

INFO  2023-08-13T10:59:01,748+0000 [stardog-user-1] com.complexible.stardog.protocols.http.server.StardogUndertowErrorHandler:accept(71): [OPERATION_FAILURE] Server encountered an error
java.io.IOException: No space left on device
        at sun.nio.ch.FileDispatcherImpl.pwrite0(Native Method) ~[?:?]
        at sun.nio.ch.FileDispatcherImpl.pwrite(FileDispatcherImpl.java:68) ~[?:?]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:109) ~[?:?]
        at sun.nio.ch.IOUtil.write(IOUtil.java:79) ~[?:?]
        at sun.nio.ch.FileChannelImpl.writeInternal(FileChannelImpl.java:850) ~[?:?]
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:836) ~[?:?]
        at com.complexible.common.nio.Channels2.writeFullyAtPosition(Channels2.java:48) ~[stardog-utils-common-9.1.0.jar:?]
        at com.complexible.common.io.disk.ChannelFile.writeInternal(ChannelFile.java:195) ~[stardog-utils-common-9.1.0.jar:?]
        at com.complexible.common.io.disk.ChannelFile.write(ChannelFile.java:191) ~[stardog-utils-common-9.1.0.jar:?]
        at com.complexible.common.io.impl.SinglePage$Writer.flush(SinglePage.java:228) ~[stardog-utils-common-9.1.0.jar:?]
        at com.complexible.stardog.index.disk.compression.AbstractDataCompressor$2.flush(AbstractDataCompressor.java:273) ~[stardog-9.1.0.jar:?]
        at com.complexible.common.io.impl.PageObjectWriterImpl.flushPage(PageObjectWriterImpl.java:109) ~[stardog-utils-common-9.1.0.jar:?]
        at com.complexible.common.io.impl.PageObjectWriterImpl.write(PageObjectWriterImpl.java:72) ~[stardog-utils-common-9.1.0.jar:?]
        at com.complexible.common.io.impl.AbstractObjectWriter.write(AbstractObjectWriter.java:25) ~[stardog-utils-common-9.1.0.jar:?]
        at com.complexible.stardog.index.disk.btree.impl.DiskExternalSorter.writeBucket(DiskExternalSorter.java:191) ~[stardog-9.1.0.jar:?]
        at com.complexible.stardog.index.disk.btree.impl.DiskExternalSorter.flushNow(DiskExternalSorter.java:176) ~[stardog-9.1.0.jar:?]
        at com.complexible.stardog.index.disk.btree.impl.DiskExternalSorter.lambda$flush$0(DiskExternalSorter.java:150) ~[stardog-9.1.0.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74) ~[guava-31.1-jre.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) ~[guava-31.1-jre.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]

Would you obtain a diagnostics report and attach it to a direct message?

Matthew

https://docs.stardog.com/stardog-admin-cli-reference/diagnostics/diagnostics-report

Daniel,

I read the original email this morning before coffee. Missed your direct questions in the last paragraph. My apologies.

I have looked at the diagnostics report and believe you are correct about java.io.tmpdir.

It might be easier to:

  1. create a java_tmp directory within /var/opt/stardog
  2. then add "-Djava.io.tmpdir=/var/opt/stardog/java_tmp" to your existing STARDOG_SERVER_JAVA_ARGS
  3. restart the Stardog server
  4. try db create again
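Put together, the steps look roughly like this (a sketch; the paths follow the volume layout from the docker run command earlier in the thread):

```shell
# Step 1, on the host (the volume maps this into /var/opt/stardog):
#   mkdir -p ~/data/wikidata/stardog/stardog-home/java_tmp
#
# Step 2: extend the existing JVM args with the tmpdir flag
STARDOG_SERVER_JAVA_ARGS="-Xmx60g -Xms60g -XX:MaxDirectMemorySize=160g"
STARDOG_SERVER_JAVA_ARGS="$STARDOG_SERVER_JAVA_ARGS -Djava.io.tmpdir=/var/opt/stardog/java_tmp"
echo "$STARDOG_SERVER_JAVA_ARGS"
#
# Steps 3-4: restart the server and retry the load
#   /opt/stardog/bin/stardog-admin server stop
#   /opt/stardog/bin/stardog-admin server start
#   /opt/stardog/bin/stardog-admin db create -n wikidata /wikidata/x*.nt.gz
```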

The stardog.properties setting I sent you earlier can stay. It may still help if total disk usage gets high once the database is loading properly.

You are not seeing /var/opt/stardog in the df -h output because it is backed by the same device as /wikidata. Here is a portion of the output from the "mount" command:

/dev/mapper/vg0-lv01 /wikidata ext4 rw,relatime,errors=remount-ro 0 0
/dev/mapper/vg0-lv02 /etc/resolv.conf ext4 rw,relatime 0 0
/dev/mapper/vg0-lv02 /etc/hostname ext4 rw,relatime 0 0
/dev/mapper/vg0-lv02 /etc/hosts ext4 rw,relatime 0 0
/dev/mapper/vg0-lv01 /var/opt/stardog ext4 rw,relatime,errors=remount-ro 0 0

Both /var/opt/stardog and /wikidata are mounted from the same device, /dev/mapper/vg0-lv01, so df -h lists only one of them as the label. The same is true for the three mounts of vg0-lv02: only one appears in the df -h output.
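A quick way to confirm which filesystem backs a given directory is to pass the path to df directly (standard df behavior, nothing Stardog-specific):

```shell
# df with an explicit path reports the filesystem backing that path, even
# when the mount point has no line of its own in plain `df -h` output.
# Inside the container you would use /var/opt/stardog; `.` stands in here.
df -h .
# findmnt (util-linux) can also list every mount point for one device:
#   findmnt -S /dev/mapper/vg0-lv01
```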

Matthew

Thanks Matthew! It seems to be working: the java_tmp directory is growing. I will let you know whether the loading succeeds.

Daniel

I confirm that Stardog finished loading the Wikidata dump.

Loaded 17,954,260,186 triples to wikidata from 22,156 file(s) in 17:15:41.580 @ 288.9K triples/sec.
Successfully created database 'wikidata'.
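As a quick sanity check on the reported rate (plain awk arithmetic, nothing Stardog-specific):

```shell
# 17:15:41.580 is 62,141.58 seconds; dividing triples by seconds
# reproduces the reported rate
awk 'BEGIN {
  secs = 17*3600 + 15*60 + 41.580
  printf "%.1fK triples/sec\n", 17954260186 / secs / 1000
}'
# prints 288.9K triples/sec
```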

Congratulations and thank you.

Matthew