Bulk Loading with Docker and Stardog

Hello,

I decided to try the Docker version of Stardog for a smoother experience (especially around the Java version, which is tied to 8 because of JVM things that Stardog does for performance reasons). Some things are good and some need a bit of polish.

Since my dataset is fairly static, I tend to make changes to TRiG files and then do a bulk load when something changes. This is fine for the most part, but when doing this with Docker there are a couple of things you need to know, and there are some problems. First, the tips. If you want to bulk load into Stardog running in a Docker container, you need to execute a command like this:

docker exec -it <container id> /opt/stardog/bin/stardog data add <db name> <files>

However, if you have filenames with accents in them (like I do), you will run into the following error:

Malformed input or input contains unmappable characters: <directory and file name>

This seems to be partly because the container does not set its LANG environment variable:

[root@.... /]# locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

You can set the LANG variable when you use docker exec by passing the -e flag, like so:

docker exec -e LANG="en_US.UTF-8" -it <container id> /opt/stardog/bin/stardog data add <db name> <files>

This gets me past most of the issues, but I am now getting the malformed input error further along in the processing, so I will continue to investigate. I hope this helps someone else, and maybe someone can look into this and see whether I have missed something simple.
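One thing that may be worth ruling out at this point: passing LANG only helps if the named locale is actually installed in the image. If it is not, programs typically stay in the POSIX locale and the JVM will most likely still see an ASCII default encoding. Below is a minimal sketch of such a check. has_locale is a hypothetical helper name, and the sketch assumes the image ships the standard locale utility.

```shell
# Hypothetical helper: succeeds if the locale named in $1 appears in a
# `locale -a` style listing read from stdin. glibc usually lists UTF-8
# locales as e.g. "en_US.utf8", so compare case-insensitively and
# ignore the dash.
has_locale() {
  wanted=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-')
  tr '[:upper:]' '[:lower:]' | tr -d '-' | grep -qxF "$wanted"
}

# Usage against the container (the container id is a placeholder):
#   docker exec <container id> locale -a | has_locale en_US.UTF-8 \
#     && echo "locale present" || echo "locale missing"
```

If the locale turns out to be missing, a name that does appear in the listing (some images ship C.UTF-8, for example) might be worth trying instead.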

You can try adding -e STARDOG_JAVA_ARGS=" -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"

I'm not sure why you're having this problem, though. I'm also not sure which JVM it's running, so the second argument might not be applicable, but it shouldn't hurt.

Thanks for this!

Sadly, I got the same error as before with this. Using docker exec to get into the container, I ran /usr/bin/java -version and got the following:

[root@... /]# /usr/bin/java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

I forgot to add that I was able to create a db and load my data with the normal non-Docker (zip file) version, so I believe this is specific to the way the Docker container is set up.
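Since java -version only reports the build, it may also help to check which encodings the JVM inside the container actually resolves. The sketch below filters the output of java -XshowSettings:properties (a standard launcher option on Java 8) down to the two properties discussed in this thread. encoding_settings is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the file.encoding and sun.jnu.encoding
# lines from `java -XshowSettings:properties -version` output on stdin.
# The pattern deliberately skips related keys such as file.encoding.pkg.
encoding_settings() {
  grep -E '^[[:space:]]*(file|sun\.jnu)\.encoding[[:space:]]*='
}

# Usage against the container (the container id is a placeholder).
# The settings are printed on stderr, hence the 2>&1:
#   docker exec -e LANG="en_US.UTF-8" <container id> \
#     /usr/bin/java -XshowSettings:properties -version 2>&1 | encoding_settings
```

If both properties still report something like ANSI_X3.4-1968 with LANG set, the locale is probably not being picked up by the JVM at all.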

Can you try adding -e LC_ALL="en_US.UTF-8"?

Yes, I ran this as well:

docker exec -e LANG="en_US.UTF-8" -e LC_ALL="en_US.UTF-8" -e STARDOG_JAVA_ARGS=" -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"

You might want to copy the STARDOG_JAVA_ARGS and also add them to STARDOG_SERVER_JAVA_ARGS. I don't know whether they're additive or exclusive for client and server, i.e. whether S_J_A is just for the client and S_S_J_A is just for the server, or whether S_J_A is for the client and the server gets S_J_A + S_S_J_A.

It doesn't sound like the problem is on the server side but I'm just pokin' round here.

EDIT: It looks like if you don't set the server args, the server gets whatever you set for java args. If you do set server args, your server runs with exactly what you set server args to be. It uses java args as a default but isn't additive. That makes sense. I don't know why I would think it would work any other way.
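The fallback behaviour described in the EDIT above can be sketched with the shell's default-value expansion. This only illustrates the described semantics; it is not Stardog's actual launcher script, and effective_server_args is a hypothetical helper name.

```shell
# Illustration only: $1 stands for the client java args, $2 for the
# server args. The server args win outright when set; otherwise the
# java args are used as the default. Nothing is concatenated.
effective_server_args() {
  java_args=$1
  server_args=$2
  # ${var:-default} expands to default when var is unset or empty
  printf '%s\n' "${server_args:-$java_args}"
}

# Server args unset: falls back to the java args.
effective_server_args "-Dfile.encoding=UTF-8" ""
# Server args set: the java args are ignored, not appended.
effective_server_args "-Dfile.encoding=UTF-8" "-Xmx4g"
```

Under these semantics, any encoding flags would need to be repeated in the server args whenever server args are set at all.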

Sorry, I have not had a chance to look into this more, but I have had problems like this before with Stardog. I will try to sort this out at some future point, but if anyone else has this problem, they will at least have something to go on.

Thanks for your help.