Import compressed CSV file

I am trying to import a large CSV file into Stardog 5.3.5. I compressed the data as recommended in the documentation (https://www.stardog.com/docs/#_loading_compressed_data), but the import fails with:

Invalid variable: XYZ
The detailed stack trace for the error is:
java.lang.IllegalArgumentException: Invalid variable: XYZ
	at com.complexible.stardog.virtual.api.RDFGenerator$TemplateParser.valueProviderFor(RDFGenerator.java:114)
[...]

I assume that the TemplateParser tries to scan the CSV file for the header but cannot handle the compressed version. It works with the uncompressed CSV file, so the SMS definition seems to be correct.

I don't think that Stardog supports compressed data for a virtual import. The section in the documentation that you referenced is for loading RDF data directly into the database.

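You could probably do it from the command line with something like the following (note `-p` rather than `-c`, since `unzip -c` also prints the name of each extracted file to stdout):

stardog-admin virtual import myDB cars.ttl <(unzip -p cars.csv.zip)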

You are correct. The compressed data is only mentioned for direct import. But the virtual-import is not really "virtual". The docs state "unlike RDBMS tables, CSV files are supported only for importing into Stardog". It might be beneficial to read from a compressed CSV file during import. The issue seems to be the matching of the SMS template to the columns described in the header of the compressed file.

I'm a little confused. You seem to be saying that reading a compressed CSV during virtual graph materialization works, except that the file header isn't read, but then you suggest that it would be beneficial to be able to read from a compressed CSV during import. The documentation doesn't specifically say that reading compressed files for virtual import is supported, but I'm not surprised that it works.

I'm guessing the header isn't read because there could be several CSV files in the compressed file, so it has no way of knowing which one has the headers. If you know that there is only one file in there, you could just use the bash command I gave; otherwise I think you'd have to use positional values for your columns.
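
If process substitution is not an option, the same idea can be sketched as a small shell function that extracts the archive to a temporary file first. This is only a sketch: `import_zipped_csv` is a hypothetical helper name, not a Stardog command, and it assumes the Info-ZIP `unzip` tool is installed and that the archive holds exactly one CSV file.

```shell
# Sketch: import a CSV that sits alone inside a zip archive.
# import_zipped_csv is a hypothetical helper, not part of Stardog.
import_zipped_csv() {
  db=$1
  mapping=$2
  archive=$3

  # unzip -Z1 lists one entry name per line; refuse ambiguous archives.
  entries=$(unzip -Z1 "$archive" | wc -l | tr -d ' ')
  if [ "$entries" -ne 1 ]; then
    echo "expected exactly one file in $archive, found $entries" >&2
    return 1
  fi

  # Extract to a real .csv file so the import sees a plain, uncompressed file.
  tmpdir=$(mktemp -d) || return 1
  unzip -p "$archive" > "$tmpdir/data.csv" &&
    stardog-admin virtual import "$db" "$mapping" "$tmpdir/data.csv"
  status=$?

  # Clean up the temporary copy whether or not the import succeeded.
  rm -rf "$tmpdir"
  return "$status"
}

# Usage: import_zipped_csv myDB cars.ttl cars.csv.zip
```

The entry count check is there because of the ambiguity mentioned above: with several files in one archive, `unzip -p` would concatenate them and only the first header row would be at the top.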

Sorry for the confusion. I successfully imported an uncompressed CSV file using an SMS template. But trying the same with a compressed CSV file fails. So, this works

$ stardog-admin virtual import myDB cars.ttl cars.csv

and this does not

$ stardog-admin virtual import myDB cars.ttl cars.csv.zip

Since compression is currently not referenced in the section about virtual import of CSV files, it is not a bug but might be a missing feature.

The exception Invalid variable: XYZ seems to indicate that the import fails because the com.complexible.stardog.virtual.api.RDFGenerator$TemplateParser cannot read the compressed CSV file to figure out which template expression matches which column in the file.
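
One way to test that assumption outside Stardog is to check whether the variable from the error message appears in the header row once the data is decompressed. The following is a rough diagnostic sketch, not anything Stardog provides: `has_column` is a hypothetical helper, and it assumes a comma-separated header without quoted commas and an exact, case-sensitive name match.

```shell
# Sketch: check that a template variable names a column in the CSV header.
# has_column is a hypothetical helper; it reads CSV text on stdin.
has_column() {
  # Take the first line, drop a trailing CR, split on commas,
  # then look for an exact whole-line match of the given name.
  head -n 1 | tr -d '\r' | tr ',' '\n' | grep -Fxq -- "$1"
}

# Usage on the uncompressed file:
#   has_column XYZ < cars.csv && echo "XYZ found"
# Usage on the zipped file (unzip -p writes only the file contents):
#   unzip -p cars.csv.zip | has_column XYZ && echo "XYZ found"
```

If the check succeeds for the decompressed stream, the header itself is fine and the failure is more likely in how the import reads the `.zip` file.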

Reading multiple files seems to be unrelated to this question, but will be my next step :)
