So ideally, the " would have been percent-encoded in the source data. I'm wondering if those lines have simply been ignored, but I don't know if that is possible in Stardog.
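For reference, the percent-encoded form would use %22 for the double quote, e.g. .../Leroy_%22Twist%22_Casey instead of the raw .../Leroy_"Twist"_Casey.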
If you know that some data is there, maybe you can first try to find the resource backwards, i.e. based on some relation + value? Maybe via some label (rdfs:label or similar)?
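Something along these lines, for example (just a sketch - the label property and the "Casey" substring are only placeholders, not your actual data):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?p ?o WHERE {
  ?s rdfs:label ?label .                     # or any other relation + value you know
  FILTER(CONTAINS(STR(?label), "Casey"))     # placeholder value
  ?s ?p ?o .
}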
And I am quite sure the loading process went smoothly (no errors in the logs when adding the data).
I have attached a simple example ttl file if someone wants to reproduce this problem. test.ttl (777 Bytes)
Querying the data using a different query is not really an option for my use case...
Removing all these special characters from the data would be the better option for me, but it is still rather an ugly solution (the data is loaded from the yago2 benchmark dataset).
Well, I tested your test dataset with the Apache Jena toolkit from the CLI via riot --output=N-Triples test.ttl
and it fails as expected with:
09:23:45 ERROR riot :: [line: 20, col: 9 ] Illegal character in IRI (codepoint 0x22, '"'): <Leroy_["]...>
so I'm wondering why the Stardog loader does not fail here. But yes, maybe I'm missing something, so I'd wait until the smarter people here and the Stardog devs help you - it shouldn't take that long, those people are fast and great.
Ok, thanks for testing.
You have set the output argument to N-Triples, but the example file is in Turtle format - I guess the error will be the same, though.
I have loaded the dataset directly using the command line (and a second time using Stardog Studio)
I will wait for the Stardog people to help
--output just sets the serialization riot writes after it has successfully parsed the input; the input file is still read as Turtle.
Well, maybe Apache Jena is too strict (which I doubt), or maybe Stardog is too relaxed. Or perhaps it does percent encoding for you, I don't know. But at least the parsing of the SPARQL query fails when you use quotes in URIs, and thus you can't query it.
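To make that concrete (taking the IRI from your test file): a query like the one below should be rejected by the SPARQL parser, because an unescaped " is not allowed inside <...>. The percent-encoded %22 would parse, but that would then be a different IRI than the one that was loaded.

SELECT ?p ?o WHERE {
  <http://yago-knowledge.org/resource/Leroy_"Twist"_Casey> ?p ?o .
}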
I'm interested in the solution as well, always happy to learn.
I'm running 7.2.0 and can successfully load the test file with the double quotes
$> stardog data add test test.ttl
$> stardog query test 'select * { ?s ?p ?o }'
and there it is, quotes and all...
+--------------------------------------------------------+----------------------------------------------+-----------------------------------------+
| s                                                      | p                                            | o                                       |
+--------------------------------------------------------+----------------------------------------------+-----------------------------------------+
| http://yago-knowledge.org/resource/Leroy_"Twist"_Casey | http://yago-knowledge.org/resource/hasGender | http://yago-knowledge.org/resource/male |
+--------------------------------------------------------+----------------------------------------------+-----------------------------------------+
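If you need to hit that specific resource directly, one thing that might work (untested sketch) is comparing the IRI as a string instead of writing it in angle brackets, since the quotes can be escaped inside a string literal:

SELECT ?p ?o WHERE {
  ?s ?p ?o .
  FILTER(STR(?s) = "http://yago-knowledge.org/resource/Leroy_\"Twist\"_Casey")
}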
Stardog definitely does some checks, because if you change the double quotes to spaces it complains, but if you change them to single quotes it loads as well.
I suspect it complains about the space not because it's an invalid URL character but because it causes a parse error, and that it's probably not validating IRIs because doing so would be very resource intensive when loading data.
Honestly, for me this is more like a workaround, and I don't understand why the query parser doesn't fail at this point. That looks like inconsistent behaviour, doesn't it?
I mean, ideally, you shouldn't be able to load ill-formed data (ok, ideally there would be no ill-formed data at all). I understand that the parsing is expensive, but here we have a case where loading works and querying then doesn't (without a workaround). So one could ask why the query parser isn't as forgiving as the parser used during loading. We don't have a proper round trip here.
Any ideas/comments on this?
Yeah, I agree with your statement.
There is a mismatch between the query parser and the data parser somewhere (which imo should be handled more appropriately).