Duplicate triples from CONSTRUCT queries

larry · December 1, 2020, 7:08pm

Hello,
I use CONSTRUCT queries to dump triples from my database, and the resulting graphs include a lot of duplicate triples. Aren't graphs supposed to be sets of triples, and thus include no duplicates?
I am used to this with SELECT queries and use the DISTINCT modifier to avoid them.
I never had this issue with other triple stores I have used so far.
Thanks for your help.
Laurent.

LorenzB · December 2, 2020, 7:35am

The main reason is probably just that deduplication is expensive, especially for large graphs. Yes, duplicates might be counter-intuitive, but the meaning of the result is the same with and without duplicates, and that's basically all that matters in the first place. Also, none of the RDF languages require uniqueness of triples, so parsers will work as expected (as long as they maintain the triples internally as sets).

Please see the open discussion for the SPARQL 1.2 standard regarding this issue: CONSTRUCT DISTINCT & REDUCED · Issue #86 · w3c/sparql-12 · GitHub
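
If the duplicates are a problem downstream, they can be dropped on the client side after the dump. Here is a minimal sketch, assuming the CONSTRUCT result was saved as N-Triples (one statement per line); the function name and the sample data are made up for illustration. It compares statements as text, so two spellings of the same triple (different whitespace or escaping) would not be merged. The `seen` set grows with the number of distinct triples, which hints at why doing this inside the store is costly for large graphs.

```python
from typing import Iterable, Iterator


def dedupe_ntriples(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct N-Triples statement once, in first-seen order."""
    seen = set()  # memory use grows with the number of distinct statements
    for line in lines:
        statement = line.strip()
        # Skip blank lines and comment lines.
        if not statement or statement.startswith("#"):
            continue
        if statement not in seen:
            seen.add(statement)
            yield statement


# Hypothetical dump containing one repeated statement.
dump = [
    '<http://example.org/a> <http://example.org/p> "1" .',
    '<http://example.org/a> <http://example.org/p> "1" .',
    '<http://example.org/b> <http://example.org/p> "2" .',
]
unique = list(dedupe_ntriples(dump))
print(len(dump), "->", len(unique))
```

For a file on disk, passing the open file handle as `lines` should work the same way, since the function only iterates over it.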

pavel · December 2, 2020, 7:56am

+1 to what Lorenz said. CONSTRUCT outputs some serialisation of an RDF graph and that serialisation may contain duplicate triples. Discussions of that point go back to (at least) 2007: Re: Every CONSTRUCT is DISTINCT? from Lee Feigenbaum on 2007-10-15 (public-rdf-dawg@w3.org from October to December 2007)

The spec is right to not force uniqueness since that (typically) breaks streaming execution. You might get rid of some duplicates by wrapping your WHERE patterns in SELECT DISTINCT {variables occurring in the CONSTRUCT template} but, in general, not all.
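
To illustrate why the SELECT DISTINCT wrapper helps but does not remove everything, here is a small simulation in plain Python rather than a real SPARQL engine. The solutions, the template, and the helper names are all invented for the example. The template uses only ?person and ?city, while the WHERE clause also binds ?order.

```python
# Hypothetical solutions of a WHERE clause binding ?person, ?city and ?order.
solutions = [
    {"person": ":alice", "city": ":paris", "order": ":o1"},
    {"person": ":alice", "city": ":paris", "order": ":o2"},
    {"person": ":bob", "city": ":paris", "order": ":o3"},
]

# Stand-in for: CONSTRUCT { ?person :livesIn ?city . ?city a :City }
template = [("?person", ":livesIn", "?city"), ("?city", "a", ":City")]
template_vars = ("person", "city")


def instantiate(patterns, rows):
    """Emit one triple per template pattern per solution, with no deduplication."""
    triples = []
    for row in rows:
        for pattern in patterns:
            triples.append(
                tuple(row[t[1:]] if t.startswith("?") else t for t in pattern)
            )
    return triples


def select_distinct(rows, variables):
    """Project rows onto the given variables and drop repeated rows."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[v] for v in variables)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(variables, key)))
    return result


plain = instantiate(template, solutions)
wrapped = instantiate(template, select_distinct(solutions, template_vars))

print(len(plain), len(set(plain)))      # triples emitted vs. distinct triples
print(len(wrapped), len(set(wrapped)))  # same, with the DISTINCT wrapper
```

In this toy case the wrapper removes the repeats caused by ?order, but the triple stating that :paris is a :City is still produced once for alice and once for bob, because two distinct solutions can instantiate the same template triple.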

Of course we can look into your particular query result more closely if you can provide a minimal reproducible test case.

Best,
Pavel

larry · December 2, 2020, 8:02am

Thank you all for the feedback and the pointers.
It makes sense. I was surprised since I did not experience this behaviour with other triple stores so far.

