Odd result with short form construct query

Hi,
I don’t know if this is a bug or if I just don’t properly understand how to use CONSTRUCT queries. I have a database that includes these triples (among many others):

<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasNCT> "NCT01661400" .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasPrimaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400PrimaryOutcome1> .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasPrimaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400PrimaryOutcome2> .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasSecondaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400SecondaryOutcome1> .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasSecondaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400SecondaryOutcome2> .

When I run this query

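# prefix inferred from the IRIs in the data above
PREFIX fct: <http://stjohns.edu/fullclinicaltrial/>

construct where {
    ?t fct:hasNCT "NCT01661400" .
    ?t fct:hasPrimaryOutcome ?pout .
    ?t fct:hasSecondaryOutcome ?sout .
}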

I get this odd result

<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasNCT> "NCT01661400" .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasPrimaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400PrimaryOutcome1> .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasSecondaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400SecondaryOutcome1> .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasPrimaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400PrimaryOutcome2> .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasPrimaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400PrimaryOutcome1> .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasSecondaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400SecondaryOutcome2> .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasPrimaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400PrimaryOutcome2> .

PrimaryOutcome1 and 2 are listed twice.

If I run this query

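# prefix inferred from the IRIs in the data above
PREFIX fct: <http://stjohns.edu/fullclinicaltrial/>

construct {
    ?t fct:hasPrimaryOutcome ?pout .
    ?t fct:hasSecondaryOutcome ?sout .
}
where {
    ?t fct:hasNCT "NCT01661400" .
    ?t fct:hasPrimaryOutcome ?pout .
    ?t fct:hasSecondaryOutcome ?sout .
}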

then I get

<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasSecondaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400SecondaryOutcome1> .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasPrimaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400PrimaryOutcome2> .
<http://stjohns.edu/fullclinicaltrial/NCT01661400> <http://stjohns.edu/fullclinicaltrial/hasSecondaryOutcome> <http://stjohns.edu/fullclinicaltrial/NCT01661400SecondaryOutcome2> .

with no repetition, which is what I expect given the data.

What am I doing wrong?

You’re doing nothing wrong. Given that the result of a CONSTRUCT query is in fact an RDF graph, i.e. a set of RDF triples, it’s simply a bug to get duplicate triples in the result.

In fact, CONSTRUCT in SPARQL does not guarantee that there will be no duplicate triples in the result (in spite of RDF being defined as a set of triples). This is fairly counterintuitive and caused some pretty lengthy discussions on W3C mailing lists during the original standardization process. Please see here:

https://lists.w3.org/Archives/Public/public-rdf-dawg/2007OctDec/0030.html
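
To see where the repeats can come from, here is a small sketch in plain Python of naive CONSTRUCT template instantiation. It is not Stardog's implementation, only an illustration. The WHERE pattern in the question matches every pairing of a primary and a secondary outcome, so two of each gives four solutions, and each solution instantiates the whole template. The shortened names (`fct:P1`, `fct:S1`, and so on) and the function `instantiate` are made up for this example.

```python
from itertools import product

TRIAL = "fct:NCT01661400"
primary = ["fct:P1", "fct:P2"]    # stand-ins for PrimaryOutcome1/2
secondary = ["fct:S1", "fct:S2"]  # stand-ins for SecondaryOutcome1/2

# The two outcome patterns are independent of each other, so the
# solutions are the cross product of the primary and secondary matches.
solutions = [
    {"t": TRIAL, "pout": p, "sout": s}
    for p, s in product(primary, secondary)
]


def instantiate(solution):
    """Fill in the three-triple template of the short form query for one solution."""
    return [
        (solution["t"], "fct:hasNCT", '"NCT01661400"'),
        (solution["t"], "fct:hasPrimaryOutcome", solution["pout"]),
        (solution["t"], "fct:hasSecondaryOutcome", solution["sout"]),
    ]


# Concatenating the per-solution triples without any pruning repeats triples.
raw = [triple for s in solutions for triple in instantiate(s)]
# Treating the result as a set, as an RDF graph would, removes the repeats.
distinct = set(raw)

print(len(solutions), len(raw), len(distinct))  # prints: 4 12 5
```

An engine that prunes only some of those repeats would return something between 5 and 12 triples, which is consistent with the 7 lines in the question.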

Stardog applies a kind of REDUCED filter to prune out some duplicates, but that gives no guarantees either. We have an open ticket, #5681, to substantially improve it.

We'll also look at this example specifically to make sure there's no separate bug. However, in general, an application cannot assume that CONSTRUCT result sets are duplicate-free.
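
If the application needs a duplicate-free result, one option is to drop repeats on the client side. Below is a minimal sketch in Python, assuming the result arrives as N-Triples text with one triple per line and that identical triples are serialized identically (it compares lines as strings and does not parse RDF). The names `dedupe_triples` and `outcome` are hypothetical helpers, not part of any Stardog API.

```python
def dedupe_triples(lines):
    """Return the N-Triples lines with exact repeats removed, keeping first-seen order."""
    seen = set()
    unique = []
    for line in lines:
        triple = line.strip()
        # Skip blank lines and any triple that has already been emitted.
        if not triple or triple in seen:
            continue
        seen.add(triple)
        unique.append(triple)
    return unique


BASE = "http://stjohns.edu/fullclinicaltrial/"
TRIAL = f"<{BASE}NCT01661400>"


def outcome(kind, n):
    """Build one outcome triple in the same shape as the data in this thread."""
    return f"{TRIAL} <{BASE}has{kind}Outcome> <{BASE}NCT01661400{kind}Outcome{n}> ."


# The seven lines returned by the short form query, in the order reported.
result = [
    f'{TRIAL} <{BASE}hasNCT> "NCT01661400" .',
    outcome("Primary", 1),
    outcome("Secondary", 1),
    outcome("Primary", 2),
    outcome("Primary", 1),
    outcome("Secondary", 2),
    outcome("Primary", 2),
]

unique = dedupe_triples(result)
print(len(result), len(unique))  # prints: 7 5
```

Loading the result into an RDF library's graph object would have the same effect, since a graph is a set of triples; the string-based version above just avoids any extra dependency.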

Cheers,
Pavel

OK, interesting. I am new to SPARQL, so I did not realize this is an issue. I am using the short form mainly because the query specifies a large graph (I pruned most of it out of my examples), and it is really ugly if I have to include it in both the construct clause and the where clause. Thanks.

Hi Bonnie,

Which version of Stardog are you using? I get the same results (with duplicates) for both versions of the query using 5.3.3.

I don’t have an explanation for why PrimaryOutcome1 isn’t returned for your 2nd query; it looks like a difference in the data.

Thanks,
Pavel
