Creating a database with .gz or .zip does not load any/all the triples

I created two compressed files (archive.zip and archive.ttl.gz) with the same contents: the two files TestClasses5.ttl and TestDisjoint.ttl. If I double-click either compressed file, it expands correctly into the two files.

However, creating a db in Stardog (4.2.4 or 5.0.1) by loading the .gz file succeeds, but nothing is loaded:

./stardog-admin db create -n foo /test/resources/classTestFiles/archive.ttl.gz
Bulk loading data to new database foo.
Errors were encountered during loading:
/test/resources/classTestFiles/archive.ttl.gz: Object for statement missing [line 1]
Loaded 0 triples to foo from 1 file(s) in 00:00:00.246 @ 0.0K triples/sec.
Successfully created database 'foo'.

Creating a db by loading the .zip file also succeeds, but only the first file is loaded:

./stardog-admin db create -n foo /test/resources/classTestFiles/archive.zip
Bulk loading data to new database foo.
Errors were encountered during loading:
/test/resources/classTestFiles/archive.zip: Expected an RDF value here, found '' [line 1]
Loaded 9 triples to foo from 1 file(s) in 00:00:00.276 @ 0.0K triples/sec.
Successfully created database 'foo'.

I am attaching the files that I used to create the two compressed formats.
TestClasses5.ttl (793 Bytes)
TestDisjoint.ttl (533 Bytes)

Maybe I am doing something stupid??
Thanks for any info.
Andrea

I don’t see anything immediately wrong. I was able to load the two files successfully, and I also loaded both compressed versions in 5.0.1. It could be something wrong with how you created the compressed files. Can you include the compressed files that you’re having trouble with? It seems like there is some sort of parsing error. For the gz file, the most likely cause is that you concatenated the files. There aren’t any blank nodes, so you’re safe there, but depending on the order and whether there are prefixes, you can run into trouble. I don’t have any guesses about the problem with the zip file, other than the files being in a subdirectory. I don’t remember whether zip files are loaded recursively or whether the files need to be at the top level.

I can't attach the files on Stardog Union since their extensions aren't allowed. I uploaded them to Google Drive. Here are the links:

archive.ttl.gz - created on the command line with: tar -cvzf archive.ttl.gz TestClasses5.ttl TestDisjoint.ttl
https://drive.google.com/file/d/0B_YmYND84ZBpakU4WUV5RjZMbXM/view?usp=sharing

archive.zip - created by compressing the 2 files in a Finder window on the Mac
https://drive.google.com/file/d/0B_YmYND84ZBpbjN5bko3Wmwwb1E/view?usp=sharing
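One detail in the commands above is worth spelling out: tar -cvzf builds a gzip-compressed tar archive, so un-gzipping archive.ttl.gz yields tar headers plus file contents rather than plain Turtle. Assuming Stardog treats a .ttl.gz file as a single gzip-compressed Turtle document (which fits the parse error at line 1), the parser would see a tar header first. The sketch below reproduces the difference in a scratch directory; A.ttl and B.ttl are hypothetical stand-ins, not the attached files.

```shell
#!/bin/sh
# Sketch: compare what "tar -czf" produces with a gzip-compressed Turtle file.
# A.ttl and B.ttl are hypothetical stand-ins for the real Turtle files.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

printf '@prefix ex: <http://example.org/> .\nex:A a ex:Class .\n' > A.ttl
printf '@prefix ex: <http://example.org/> .\nex:B a ex:Class .\n' > B.ttl

# Same flags as in the thread (minus -v): a gzip-compressed *tar* archive.
tar -czf archive.ttl.gz A.ttl B.ttl
# What a Turtle parser expects from a .ttl.gz: gzipped Turtle text.
gzip -c A.ttl > A.ttl.gz

# After gunzipping, the tar stream starts with a header whose first field is
# the member name ("A.ttl", NUL-padded), not with "@prefix".
tar_start=$(gzip -dc archive.ttl.gz | head -c 5)
ttl_start=$(gzip -dc A.ttl.gz | head -c 7)
echo "archive.ttl.gz starts with: $tar_start"
echo "A.ttl.gz starts with:       $ttl_start"

# List the tar members; on macOS, look here for ._ entries as well.
tar -tzf archive.ttl.gz
```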

Thanks.

Andrea Westerinen
T: 425.891.8407
arwesterinen@gmail.com or andreaw@ninepts.com
organizingknowledge.blogspot.com

I see the problem. You’ve picked up some hidden OS X files, specifically:

TestClasses5.ttl
__MACOSX/
__MACOSX/._TestClasses5.ttl
TestDisjoint.ttl
__MACOSX/._TestDisjoint.ttl

Stardog does recursively descend the directory structure to load any files it can find, and it’s picking these files up. They look like binary files, and the load bails as soon as it hits one of them. That explains the zip file. The gz file is failing because these binary files appear to have been cat’ed together, starting with one of the binary files, so the load bails before loading anything.
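To check an archive for these entries before loading it, a small filter over the entry list is enough. This is a sketch that assumes one entry name per line on standard input (for example from zipinfo -1 archive.zip or tar -tzf archive.ttl.gz); find_macos_metadata is a hypothetical helper name, not a Stardog tool. Here it is run on the entry list quoted above.

```shell
#!/bin/sh
# Sketch: flag archive entries that are macOS metadata rather than RDF.
# Reads one entry name per line on stdin, e.g. from
#   zipinfo -1 archive.zip     or     tar -tzf archive.ttl.gz
# find_macos_metadata is a hypothetical helper, not part of Stardog.
find_macos_metadata() {
  # Matches the __MACOSX/ folder and anything under it, plus any "._name"
  # sidecar file, whether or not it sits inside __MACOSX/.
  grep -E '(^|/)__MACOSX(/|$)|(^|/)\._[^/]*$'
}

# The entry list of archive.zip from this thread:
flagged=$(find_macos_metadata <<'EOF'
TestClasses5.ttl
__MACOSX/
__MACOSX/._TestClasses5.ttl
TestDisjoint.ttl
__MACOSX/._TestDisjoint.ttl
EOF
)
printf '%s\n' "$flagged"
```

Anything this prints is a candidate for removal before handing the archive to stardog-admin.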


Here’s some additional information for you and anyone coming across this in the future. The __MACOSX folders hold resource forks and other additional metadata that OS X uses when decompressing the files.

You can fix this from the command line after the fact with the following:

zip -d archive.zip "__MACOSX*"

Alternatively, create the archive with the command-line version of zip in the first place. It should not be aware of resource forks, so they shouldn’t be included.
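For the gz route, one option that sidesteps both the tar headers and the concatenation question is to gzip each Turtle file on its own and pass all of them to db create. This is a sketch under assumptions: it creates stand-in files in a scratch directory so it runs anywhere, and the stardog-admin call is left as a comment. On macOS, setting COPYFILE_DISABLE=1 keeps tar from adding ._ files, but the result would still be a tar archive rather than a single Turtle document.

```shell
#!/bin/sh
# Sketch: one gzip-compressed Turtle file per source file, instead of a
# tar.gz, so only Turtle text reaches the parser.
# Stand-in files are created here; use the real files in practice.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
printf '@prefix ex: <http://example.org/> .\nex:A a ex:Class .\n' > TestClasses5.ttl
printf '@prefix ex: <http://example.org/> .\nex:B a ex:Class .\n' > TestDisjoint.ttl

for f in TestClasses5.ttl TestDisjoint.ttl; do
  gzip -c "$f" > "$f.gz"   # keeps the original; writes e.g. TestClasses5.ttl.gz
  # Round-trip check: the decompressed bytes must equal the source file.
  gzip -dc "$f.gz" | cmp -s - "$f" && echo "ok: $f.gz"
done

# Then pass every file to Stardog in one call:
#   ./stardog-admin db create -n foo TestClasses5.ttl.gz TestDisjoint.ttl.gz
```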

Thanks. Should have looked closer.

Indeed, that was something stupid!
