I’m trying to use a virtual graph import from SQL Server with parent-child relationships. I have one table that describes the parents, which should all become subjects with some predicate-object maps. A second table holds the children, which also have to be added as subjects, each with some singular predicate-object maps plus one predicate-object map that uses a parentTriplesMap to point to the parent.
The join works and I get one triple for every parent-child relationship. However, the child's other predicate-object maps are also added once per parent-child relationship.
From an SQL perspective this is normal behavior: a one-to-many join returns one row per match, repeating the columns of the "one" side for each of the "many". For triples, however, this shouldn't be necessary and is actually counterintuitive. Why would I ever want the exact same triple in my graph multiple times?

Is this something I can configure in Stardog or in the R2RML mapping? (Sorry, I haven't used SMS, the Stardog Mapping Syntax, yet. If it provides a solution, I will definitely switch.)

Example data:
Parent = sample container. Child = sample. A sample can be present in multiple containers. A container can only contain a single sample.

The parent works fine; only the correct triples are loaded. The child TriplesMap has:
predicateObjectMap --> samplename
predicateObjectMap --> samplebarcode
predicateObjectMap --> hasOwner --> Parent
If a sample is present in 5 containers, its samplename and samplebarcode triples are also replicated 5 times in the graph.
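To make the replication concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for SQL Server. The table names (`sample`, `container`), column names, and subject strings like `sample/1` are hypothetical, chosen only to mirror the example above. It joins one sample to five containers and then evaluates every predicate-object map against the joined rows without deduplication. This is an illustration of the join behavior, not of Stardog's actual SQL generation.

```python
import sqlite3

# Hypothetical schema: table/column names are illustrative, not the real SQL Server tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT, barcode TEXT);
    CREATE TABLE container (container_id INTEGER PRIMARY KEY,
                            sample_id INTEGER REFERENCES sample(sample_id));
    INSERT INTO sample VALUES (1, 'S-001', 'BC-001');
""")
# One sample present in five containers.
conn.executemany("INSERT INTO container VALUES (?, 1)", [(i,) for i in range(1, 6)])

# Child rows joined to their parents: one output row per (sample, container) pair,
# so the sample's name and barcode columns repeat on every row.
rows = conn.execute("""
    SELECT s.sample_id, s.name, s.barcode, c.container_id
    FROM sample AS s JOIN container AS c ON c.sample_id = s.sample_id
""").fetchall()

# Evaluating every predicate-object map against the joined rows and collecting
# the output without deduplication (a list, not a set) repeats the singular triples.
triples = []
for sample_id, name, barcode, container_id in rows:
    subject = f"sample/{sample_id}"
    triples.append((subject, "samplename", name))
    triples.append((subject, "samplebarcode", barcode))
    triples.append((subject, "hasOwner", f"container/{container_id}"))

print(len(rows))                                        # 5
print(sum(1 for t in triples if t[1] == "samplename"))  # 5
print(len(set(triples)))                                # 7 distinct triples
```

Only 7 of the 15 emitted triples are distinct: one samplename, one samplebarcode, and five hasOwner links.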

I know I can simply create a totally separate <#TriplesMap> based on the same table that adds only the parent-child relationships, and keep just samplename and samplebarcode in the current <#TriplesMap>. As I understand the R2RML syntax, though, this shouldn't be required.
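The workaround of two TriplesMaps over the same child table can be sketched the same way. Again, sqlite3, the `sample`/`container` tables and the subject strings are hypothetical stand-ins, and the Python loops only emulate what each TriplesMap would emit. The point is that the singular triples come from a query over the child table alone, while only the relationship map goes through the join.

```python
import sqlite3

# Hypothetical schema, same shape as the example data (one sample, five containers).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT, barcode TEXT);
    CREATE TABLE container (container_id INTEGER PRIMARY KEY,
                            sample_id INTEGER REFERENCES sample(sample_id));
    INSERT INTO sample VALUES (1, 'S-001', 'BC-001');
""")
conn.executemany("INSERT INTO container VALUES (?, 1)", [(i,) for i in range(1, 6)])

# TriplesMap A (hypothetical): logical table is the sample table alone,
# so each sample yields its singular triples exactly once.
singular = []
for sample_id, name, barcode in conn.execute(
        "SELECT sample_id, name, barcode FROM sample"):
    subject = f"sample/{sample_id}"
    singular.append((subject, "samplename", name))
    singular.append((subject, "samplebarcode", barcode))

# TriplesMap B (hypothetical): same child table, but it only carries the
# parent join, so one triple per (sample, container) pair is expected.
links = [
    (f"sample/{sample_id}", "hasOwner", f"container/{container_id}")
    for sample_id, container_id in conn.execute(
        "SELECT s.sample_id, c.container_id "
        "FROM sample AS s JOIN container AS c ON c.sample_id = s.sample_id")
]

triples = singular + links
print(len(triples), len(set(triples)))  # 7 7
```

Every emitted triple is now distinct, so nothing relies on the store or a DISTINCT to clean up afterwards.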

Hope anyone can help.
Kr, Tim

You can't have the exact same triple in a graph multiple times. Adding the same triple multiple times only results in a single triple. What the system actually does internally, and what the performance impact might be, is another story. I think I see what you're getting at, and I've often wondered what the difference is between providing an explicit join and simply mapping the values and letting the query planner do it. My guess is that the query planner would technically come up with the exact same query plan, and the explicit join acts more like a query hint, but that's just a guess.
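In RDF a graph is formally a set of triples, which is why re-adding an identical triple has no effect. A tiny sketch of that distinction, using a Python set to model the graph and a list to model a result stream that was not deduplicated (the triple values are hypothetical):

```python
# RDF graph modelled as a Python set of (subject, predicate, object) tuples.
graph = set()
triple = ("sample/1", "samplename", "S-001")
for _ in range(5):
    graph.add(triple)  # re-adding an identical triple changes nothing
print(len(graph))  # 1

# By contrast, a plain list (a bag) keeps every copy, which is what a
# query result without DISTINCT can look like.
results = [triple] * 5
print(len(results))  # 5
```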

I don't think SMS provides any more expressiveness than R2RML; it's mostly easier to read and write. (It might be somewhat less expressive, as I believe there were some issues with mappings and blank nodes, but that may have been fixed.)

Hi Zachary,

Thanks for your answer. The issue I described actually occurred with a virtual add, not a virtual import. With a virtual import the triple duplication is indeed avoided, because each duplicate triple simply overwrites the existing one.
I have discussed this with support, and not including DISTINCT in this situation is a deliberate performance decision. I will simply split this part into two TriplesMaps, separating the predicate-object maps that are supposed to be singular from the others.
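For comparison, here is a sketch of what the omitted DISTINCT would do on the joined child query. It again uses sqlite3 and hypothetical `sample`/`container` tables; it is not the SQL Stardog actually generates for a virtual graph. With DISTINCT the five joined rows collapse to one, but the database has to do extra work (sorting or hashing the rows) to find the repeats. That cost is presumably what the performance decision avoids.

```python
import sqlite3

# Hypothetical schema: one sample in five containers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT, barcode TEXT);
    CREATE TABLE container (container_id INTEGER PRIMARY KEY,
                            sample_id INTEGER REFERENCES sample(sample_id));
    INSERT INTO sample VALUES (1, 'S-001', 'BC-001');
""")
conn.executemany("INSERT INTO container VALUES (?, 1)", [(i,) for i in range(1, 6)])

# Project only the sample's own columns over the join, with and without DISTINCT.
query = """
    SELECT {distinct} s.sample_id, s.name, s.barcode
    FROM sample AS s JOIN container AS c ON c.sample_id = s.sample_id
"""
plain = conn.execute(query.format(distinct="")).fetchall()
deduped = conn.execute(query.format(distinct="DISTINCT")).fetchall()
print(len(plain), len(deduped))  # 5 1
```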

Kr, Tim
