Same blank node generated for multiple JSON-LD datasets

While working with data in JSON-LD and Turtle, I found that SPARQL queries return incorrect results for the JSON-LD data but correct results for the same data in Turtle.

The following query, over triples inferred by a SWRL rule, returns 12 rows when the 4 separate datasets are loaded as JSON-LD, but 9 rows when the same datasets are loaded in Turtle syntax. For example, among the 12 rows, the triple <http://sampledata.com/id_3> ex:back <http://sampledata.com/id_2> should not be inferred.

PREFIX schema: <https://schema.org/>
PREFIX ex: <http://example.org/>
SELECT *
WHERE {
  ?data1 ex:back ?data2 .
}

One possible cause is incorrect generation of blank nodes for the JSON-LD source data. For example, the following SPARQL query shows that, instead of 4 distinct blank nodes, the same blank node 'b0' is generated for each of the distinct datasets.

PREFIX schema: <https://schema.org/>
PREFIX ex: <http://example.org/>
SELECT *
WHERE {
  ?data ex:generatedBy ?node .
}

All the test queries were run on stardog-7.0.1 and stardog-studio.

I have attached the ontology, the 4 distinct datasets in both JSON-LD and Turtle format, and 4 screenshots of the results of the queries above: testcases-jsonld-vs-turtle.zip (129.1 KB)

Hi,

After looking into this a bit, it seems that this is a general shortcoming of the JSON-LD spec itself. Our parser is using the specified algorithm to generate bnode IDs; it just happens that the spec doesn't account for additional JSON-LD documents being loaded into the same graph, so it recommends that the counter start at 0 for each load.
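To illustrate the collision described above, here is a minimal Python sketch. It is not Stardog's parser or the JSON-LD algorithm itself: `load_document`, `load_all`, and the `unique_per_load` flag are hypothetical names, and the sketch assumes generated labels are kept as-is when documents are added to the same graph. Under that assumption, a counter that restarts at 0 gives every document the label `_:b0`, while a per-load prefix keeps the four nodes apart.

```python
# Hypothetical sketch, NOT Stardog's implementation: simulates a blank node
# label counter that restarts at 0 for every document that is loaded.

def load_document(local_nodes, prefix="b"):
    """Relabel a document's blank nodes; the counter starts at 0 per call."""
    mapping = {}
    for name in local_nodes:
        if name not in mapping:
            mapping[name] = f"_:{prefix}{len(mapping)}"
    return mapping


def load_all(documents, unique_per_load=False):
    """Load several documents into one graph and return its blank node labels."""
    graph_nodes = set()
    for i, doc in enumerate(documents):
        # Assumed workaround: make labels unique by adding a per-load prefix.
        prefix = f"doc{i}_b" if unique_per_load else "b"
        graph_nodes.update(load_document(doc, prefix).values())
    return graph_nodes


# Four datasets, each containing a single blank node.
documents = [["n"], ["n"], ["n"], ["n"]]

colliding = load_all(documents)                       # labels merge by name
distinct = load_all(documents, unique_per_load=True)  # labels stay separate

print(sorted(colliding))
print(sorted(distinct))
```

With the shared prefix, the four blank nodes collapse into one label, which would match the single 'b0' seen in the second query; with a per-load prefix, four distinct labels remain.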

I've opened up a ticket internally to look into this, but in the short term I would suggest loading the Turtle data whenever possible.

Thanks for the report!