I am wondering what the best approach is to validate data in cases where a value that is meant to be a unique identifier is not unique in the data. Take the example data below, where Person_3 has more than one UID. This violation is easy to detect in SHACL, using sh:minCount and sh:maxCount for the sh:targetClass :Person.
:Person_1
    a :Person ;
    :hasUniqueID :UID_A .
:Person_2
    a :Person ;
    :hasUniqueID :UID_B .
:Person_3
    a :Person ;
    :hasUniqueID :UID_C ;
    :hasUniqueID :UID_D .
:Person_4
    a :Person ;
    :hasUniqueID :UID_A .
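For reference, the cardinality check mentioned above could look roughly like the following shape. This is a minimal sketch: the `http://example.org/` namespace for the `:` prefix and the shape name `:PersonUIDCountShape` are assumptions, not taken from the original data.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix :   <http://example.org/> .   # assumed namespace for the example data

# Hypothetical shape name; flags any :Person with zero or more than one UID.
:PersonUIDCountShape
    a sh:NodeShape ;
    sh:targetClass :Person ;
    sh:property [
        sh:path :hasUniqueID ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .
```

Against the data above, this should report Person_3 (two UIDs) but not Person_1 or Person_4, since each of those has exactly one UID.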
The case for Person_1 and Person_4 sharing the same UID is more difficult.
Possible Solutions:
OPTION 1. SHACL-SPARQL
I am having no success translating this SPARQL query, which correctly detects UID_A as the offender, into SHACL-SPARQL. Where am I going wrong?
SELECT $this (COUNT($this) AS ?count)
WHERE {
    ?personIRI a :Person ;
        :hasUniqueID $this .
}
GROUP BY $this
HAVING (?count > 1)
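One possible translation is sketched below, with caveats. In SHACL-SPARQL, `$this` is pre-bound to each focus node in turn, so the shape needs a target that selects the UIDs, and the query only has to ask whether the current UID is attached to two different subjects. The namespace `http://example.org/`, the shape name `:SharedUIDShape`, and the message text are assumptions; full IRIs are used inside the query string so that it does not depend on `sh:prefixes` declarations.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix :   <http://example.org/> .   # assumed namespace for the example data

# Hypothetical shape name; every object of :hasUniqueID becomes a focus node.
:SharedUIDShape
    a sh:NodeShape ;
    sh:targetObjectsOf :hasUniqueID ;
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:message "This UID is assigned to more than one person." ;
        sh:select """
            SELECT DISTINCT $this
            WHERE {
                ?person1 <http://example.org/hasUniqueID> $this .
                ?person2 <http://example.org/hasUniqueID> $this .
                FILTER (?person1 != ?person2)
            }
        """ ;
    ] .
```

With the example data, the only focus node that should produce a result row is UID_A, because it is the only UID attached to two distinct subjects. Note that this version does not restrict the subjects to instances of :Person.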
OPTION 2. Use a reasoner?
Define :UIDAssignedTo as the owl:inverseOf of :hasUniqueID, turn on the reasoner, and set sh:maxCount 1 on the sh:path :UIDAssignedTo, detecting that:
I shouldn't take credit because it was @evren who pointed out the most elegant solution here: just define sh:maxCount 1 on the path :hasUniqueID/^:hasUniqueID for the target class :Person. This is basically equivalent to declaring :hasUniqueID an inverse functional object property and using that as an ICV constraint (prior to SHACL).
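In SHACL's Turtle syntax, the path :hasUniqueID/^:hasUniqueID is written as a sequence path whose second step is an inverse path. A minimal sketch, assuming the `http://example.org/` namespace and a hypothetical shape name `:PersonSharedUIDShape`:

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix :   <http://example.org/> .   # assumed namespace for the example data

# Hypothetical shape name. From each :Person, follow :hasUniqueID forward and
# then backward; the value set is every subject sharing one of that person's UIDs.
:PersonSharedUIDShape
    a sh:NodeShape ;
    sh:targetClass :Person ;
    sh:property [
        sh:path ( :hasUniqueID [ sh:inversePath :hasUniqueID ] ) ;
        sh:maxCount 1 ;
    ] .
```

For Person_1 the path reaches both Person_1 and Person_4 through UID_A, so the count is 2 and the constraint is violated; the same holds for Person_4. Person_3 reaches only itself through UID_C and UID_D, so this shape does not flag it, and the separate sh:maxCount 1 check on :hasUniqueID is still needed for that case.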
One final note: the performance of SHACL validation was substantially improved in Stardog 6.2+. In 6.1, SHACL support is still in beta. Depending on the size of your data and/or the number of constraints, you may want to consider an upgrade.