Hello,
I use CONSTRUCT queries to dump triples from my database, and the resulting graphs include a lot of duplicate triples. Aren't graphs supposed to be sets of triples, and thus free of duplicates?
I am used to seeing duplicates with SELECT queries, where I use the DISTINCT modifier to avoid them.
I never had this issue with other triple stores I have used so far.
Thanks for your help.
Laurent.
The main reason might simply be that deduplication is expensive, especially for large graphs. Yes, duplicates might be counter-intuitive, but the result means the same thing with and without them, and that is what matters most. Also, none of the RDF languages requires triples to be unique, so parsers will work as expected (as long as they maintain the triples internally as sets).
Please see the open discussion of this issue for the SPARQL 1.2 standard: CONSTRUCT DISTINCT & REDUCED · Issue #86 · w3c/sparql-12 · GitHub
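To make the cost argument concrete, here is a minimal Python sketch of removing duplicates from a dump on the client side. It assumes the CONSTRUCT result was serialised as N-Triples with one statement per line and a consistent spelling of each statement; the function name dedupe_ntriples and the example.org data are made up for illustration. Note that the set of seen statements grows with the number of distinct triples, which is exactly the memory a streaming engine avoids by not deduplicating.

```python
def dedupe_ntriples(lines):
    """Yield each N-Triples statement once, in first-seen order.

    Assumption: statements are compared as plain strings, so two lines that
    encode the same triple with different whitespace or escaping would both
    be kept.
    """
    seen = set()  # grows with the number of distinct statements
    for line in lines:
        statement = line.strip()
        # Skip blank lines and comment lines.
        if not statement or statement.startswith("#"):
            continue
        if statement not in seen:
            seen.add(statement)
            yield statement


# Hypothetical dump containing one repeated statement.
dump = [
    '<http://example.org/a> <http://example.org/p> "1" .',
    '<http://example.org/a> <http://example.org/p> "1" .',
    '<http://example.org/b> <http://example.org/p> "2" .',
]

unique = list(dedupe_ntriples(dump))
print(len(dump), "statements in,", len(unique), "statements out")
```

Both the original and the deduplicated list describe the same graph, since a graph is the set of its statements.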
+1 to what Lorenz said. CONSTRUCT outputs some serialisation of an RDF graph and that serialisation may contain duplicate triples. Discussions of that point go back to (at least) 2007: Re: Every CONSTRUCT is DISTINCT? from Lee Feigenbaum on 2007-10-15 (public-rdf-dawg@w3.org from October to December 2007)
The spec is right to not force uniqueness since that (typically) breaks streaming execution. You might get rid of some duplicates by wrapping your WHERE patterns in SELECT DISTINCT {variables occurring in the CONSTRUCT template} but, in general, not all.
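Why distinct solutions can still produce duplicate triples is easy to simulate. The Python sketch below assumes a hypothetical template { ?s a :Person . ?s :knows ?o } and two made-up solutions, as a SELECT DISTINCT ?s ?o might return them; the instantiate helper only illustrates template instantiation and is not how any particular triple store implements it.

```python
from collections import Counter

# Hypothetical CONSTRUCT template: { ?s a :Person . ?s :knows ?o }
TEMPLATE = [
    ("?s", "rdf:type", ":Person"),
    ("?s", ":knows", "?o"),
]


def instantiate(template, solutions):
    """Emit one triple per template pattern per solution, without deduplication."""
    for solution in solutions:
        for pattern in template:
            # Replace variables with their bindings; constants pass through.
            yield tuple(solution.get(term, term) for term in pattern)


# Two distinct solutions: the rows differ in ?o, so DISTINCT keeps both.
solutions = [
    {"?s": ":alice", "?o": ":bob"},
    {"?s": ":alice", "?o": ":carol"},
]

triples = list(instantiate(TEMPLATE, solutions))
duplicates = {t: n for t, n in Counter(triples).items() if n > 1}
print(duplicates)
```

The first pattern uses only ?s, so every solution sharing the same ?s emits the same triple again, even though no two solutions are equal.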
Of course we can look into your particular query result more closely if you can provide a minimal reproducible test case.
Best,
Pavel
Thank you all for the feedback and the pointers.
It makes sense. I was surprised since I did not experience this behaviour with other triple stores so far.