Yago TTL loading and strict.parsing

SD 5.0.2 here. I have created a new DB for Yago content with strict.parsing=false, as Yago is rife with IRI encoding issues. Using stardog-admin db create … appears to work as intended: parsing errors are noted but the loading proceeds, e.g.:


/opt/data/cg/yago/taxo/yagoTypes.ttl: IRI includes string escapes: ‘\92’ [line 474592]
/opt/data/cg/yago/core/yagoLabels.ttl: IRI includes string escapes: ‘\92’ [line 30384]
/opt/data/cg/yago/taxo/yagoTaxonomy.ttl: IRI includes string escapes: ‘\110’ [line 475602]
/opt/data/cg/yago/taxo/yagoSimpleTypes.ttl: IRI includes string escapes: ‘\92’ [line 995714]
Loaded 12,956,423 triples to CGDEV from 10 file(s) in 00:02:32.296 @ 85.1K triples/sec.
Successfully created database ‘CGDEV’.

But loading additional files after db creation fails, like so:

pierluigi@elastic-eda-1:/opt/data/cg/yago/geon$ ~/sd/bin/stardog data add CGDEV -u cgdev -P yagoGeonamesClasses.ttl yagoGeonamesClassIds.ttl yagoGeonamesEntityIds.ttl yagoGeonamesGlosses.ttl yagoGeonamesTypes.ttl

Adding data from file: yagoGeonamesTypes.ttl
Adding data from file: yagoGeonamesEntityIds.ttl
Adding data from file: yagoGeonamesGlosses.ttl
Adding data from file: yagoGeonamesClasses.ttl
Adding data from file: yagoGeonamesClassIds.ttl
There was a fatal failure during preparation of 585dc5fd-04b1-4d3e-84fd-ab4330ec15cf org.openrdf.rio.RDFParseException: IRI includes string escapes: ‘\92’ [line 1170651]

I am not entirely sure how strict.parsing is intended to work, but the difference in behavior between these 2 ops is not something I would expect.

Thanks!
Pierluigi

Can you confirm that there are any triples in the database after your db create? If I try to create a db with just yagoLabels.ttl, I see the parse error(s) and get an empty db. This would at least be consistent with the data add behavior you’re seeing.

For what it’s worth, I have traced the cause not only to Yago’s bad encoding but also to the Turtle parser we use. The version we are using does not contain a workaround for this particular problem.
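
For anyone who wants to locate the offending lines before loading, here is a minimal Python sketch, not anything Stardog ships. It relies on the Turtle 1.1 grammar, where the only escapes allowed inside an IRI are \uXXXX and \UXXXXXXXX. The ‘\92’ and ‘\110’ in the messages above look like character codes (92 is a backslash, 110 is the letter n), which would point to `\\` and `\n` sequences inside IRIs, but that reading is an assumption. The function name `bad_iri_escapes` is made up for illustration, and the scan is line-based and naive: it does not parse Turtle, so an angle-bracketed token inside a string literal could be reported by mistake.

```python
import re

# Candidate IRI tokens: text between angle brackets that contains no
# whitespace, double quotes, or nested angle brackets (a naive heuristic).
IRI_TOKEN = re.compile(r'<([^<>"\s]*)>')

# The only escapes Turtle 1.1 permits inside an IRI.
VALID_ESCAPE = re.compile(r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}')


def bad_iri_escapes(lines):
    """Yield (line_number, iri) for each IRI token that still contains a
    backslash once every valid \\uXXXX / \\UXXXXXXXX escape is removed."""
    for number, line in enumerate(lines, start=1):
        for match in IRI_TOKEN.finditer(line):
            iri = match.group(1)
            if '\\' in VALID_ESCAPE.sub('', iri):
                yield number, iri
```

Since an open text file iterates line by line, it can be passed in directly, e.g. `for n, iri in bad_iri_escapes(open(path, encoding="utf-8")): print(n, iri)`, where `path` is whichever .ttl file failed to load.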

As a side note, strict.parsing=false is not intended to ignore invalid IRIs; it ignores datatype errors such as { :Foo :hasProp "Forty-two"^^xsd:integer }.
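
To make the distinction concrete: a datatype error means the literal is syntactically fine Turtle, but its text is not a legal value for the declared datatype. The sketch below checks the xsd:integer case only and is an illustration, not how Stardog or its parser implements the check. The helper name `is_valid_xsd_integer` is hypothetical, and the pattern assumes the lexical form has already had surrounding whitespace removed.

```python
import re

# Lexical form of xsd:integer: an optional sign followed by decimal digits.
XSD_INTEGER = re.compile(r'[+-]?[0-9]+')


def is_valid_xsd_integer(lexical):
    """Return True if the string is a legal xsd:integer lexical form."""
    return XSD_INTEGER.fullmatch(lexical) is not None
```

Under this check "42" and "-7" pass while "Forty-two" does not, which is the kind of error the option is meant to tolerate. An IRI containing an illegal escape fails earlier, at the syntax level, so the option does not cover it.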
