Reification on Inferred Triples

Hello all,

I'm looking for strategies to preserve reified statements when inferring new triples via reasoning. The data model I'm working with requires strict provenance tracking on all statements, which Stardog's native reification functionality handles well. I'm hoping to find a way to associate inferred triples with the reified statements of the asserted triples from which they were inferred.

E.g., given the following toy ontology

  :Person a owl:Class .
  :Company a owl:Class .
  :shareholderOf a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range :Company ;
    rdfs:label "Shareholder Of" .
  :hasShareholder owl:inverseOf :shareholderOf ;
    rdfs:label "Has Shareholder" .

  :personA :shareholderOf :companyB .

and inserting the following provenance data

INSERT { ?statement prov:wasAttributedTo :userC }
WHERE { BIND(stardog:identifier(:personA, :shareholderOf, :companyB) as ?statement) }

I can retrieve both the original asserted :shareholderOf and, using reasoning, the inferred :hasShareholder triples, e.g. via select * where { ?s :shareholderOf|:hasShareholder ?o }.

Also, I can retrieve the provenance data for the original asserted :shareholderOf statement with

  :personA :shareholderOf ?o .
  BIND(stardog:identifier(:personA, :shareholderOf, ?o) as ?statement)
  ?statement prov:wasAttributedTo ?attribution .

> ?o :companyB, ?attribution :userC

However, not surprisingly, asking the same question of the inferred :hasShareholder statement does not retrieve anything

  :companyB :hasShareholder ?o .
  BIND(stardog:identifier(:companyB, :hasShareholder, ?o) as ?statement)
  ?statement prov:wasAttributedTo ?attribution .

I'm wondering if there's a way to, generally, match inferred statements w/ the reified statements from which they were derived. Perhaps by building a proof tree to discover the original statements. Curious if anyone has any thoughts about the approach. Open to using any of the APIs.

I think what you’re looking for is what the “stardog reasoning explain” command provides.

I’d take a look at the Javadocs to see if there is an API call for that.

Yes, it looks like using Proof Trees is sufficient to collect all reified statements about inferred triples. Thanks @zachary.whitley

The workflow is roughly:

  • issue query with reasoning enabled
  • for every statement returned, request its proof tree
  • proof trees contain node statements that are either inferred or asserted; for all asserted statements, query for all reified statements
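
The last step of the workflow above can be sketched roughly as follows. This is a minimal Python sketch that assumes each proof tree has already been fetched and parsed into nested dicts; the keys `statement`, `asserted`, and `children` are hypothetical names chosen for illustration and are not Stardog's actual response format.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def asserted_statements(node: Dict) -> Iterator[Triple]:
    """Yield every asserted statement in a proof tree node and its descendants."""
    if node.get("asserted") and "statement" in node:
        yield tuple(node["statement"])
    for child in node.get("children", []):
        yield from asserted_statements(child)


def unique_asserted(trees: List[Dict]) -> List[Triple]:
    """Collect asserted statements across many proof trees, dropping duplicates
    while keeping first-seen order, so reified data can be fetched in one query."""
    seen: Dict[Triple, None] = {}
    for tree in trees:
        for triple in asserted_statements(tree):
            seen.setdefault(triple, None)
    return list(seen)


# Hypothetical parsed proof tree for the inferred
# :companyB :hasShareholder :personA statement.
proof = {
    "statement": (":companyB", ":hasShareholder", ":personA"),
    "asserted": False,
    "children": [
        {
            "statement": (":hasShareholder", "owl:inverseOf", ":shareholderOf"),
            "asserted": True,
            "children": [],
        },
        {
            "statement": (":personA", ":shareholderOf", ":companyB"),
            "asserted": True,
            "children": [],
        },
    ],
}
```

Calling `unique_asserted([proof])` would give the two asserted triples, which can then be used to build the single final query for reified statements.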

The above works reliably. The only drawback is that, for the Network API at least, it requires a number of separate requests: the initial search, one Proof Tree request per result statement, and a final request for all reified statements. So, a search that returns 10 statements will require a total of 12 requests. As far as I can tell, the SNARL API will require the same number of requests, though w/ less overhead.
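
The request count described above is just a sum of three parts. As a small sketch (the function name is hypothetical):

```python
def total_requests(result_statements: int) -> int:
    """Requests needed for one search, following the workflow described here:
    one initial search, one proof tree request per result statement,
    and one final request for all reified statements."""
    if result_statements < 0:
        raise ValueError("result_statements must be non-negative")
    return 1 + result_statements + 1
```

For a search returning 10 statements this gives 12, so the cost grows linearly with the size of the result set.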

Decreasing the number of requests would require

  • being able to request Proof Trees for multiple statements in one request
  • being able to request a Proof Tree w/i a SPARQL query

I've only run this against a toy dataset. Will need to test against a larger set to see if the above is a scalable strategy.

Looking back over your question, I think you might have better luck using named graphs rather than statement identifiers. I think the problem is that you're trying to preserve semantics even after you've reified the statement with the identifier, but as far as the reasoner is concerned, it's now just an opaque identifier. Leaving the statement in a named graph would allow you to preserve the semantics.

You also might try something like

  BIND(stardog:identifier(:personA, :shareholderOf, :companyB) as ?asserted)
  BIND(stardog:identifier(:companyB, :hasShareholder, :personA) as ?inferred)
  ?inferred owl:sameAs ?asserted .

You'll need to attribute an inferred statement to every asserted statement in the proof tree, since each one possibly contributes to that statement being inferred. In fact, you might as well attribute it to the proof tree itself. It's not a great solution, since it's not guaranteed to be consistent, but you might get away with just taking the SHA-1 of the proof tree and using that as the identifier.
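
A rough sketch of the hashing idea, in Python. It assumes the asserted statements of a proof tree are already available as (subject, predicate, object) string tuples; the function name and the canonical form (one space-joined triple per line) are assumptions for illustration.

```python
import hashlib
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def proof_identifier(asserted: Iterable[Triple]) -> str:
    """Return a SHA-1 hex digest over the asserted statements of a proof tree.

    De-duplicating and sorting first makes the digest independent of the
    order in which the statements happen to be listed.
    """
    canonical = "\n".join(" ".join(triple) for triple in sorted(set(asserted)))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```

Sorting only removes ordering differences. If the reasoner returns a different proof for the same inferred statement, the digest will still differ, which is the consistency caveat mentioned above.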

Hope that helps. :slight_smile:

Yeah, using named graphs might work. I haven't tried it yet; I'd be curious to know how it scales, given that every statement of mine will have reified provenance data. I'd imagine the context index (or indices?) would hold as many entries as there are non-reified statements, which would make it significantly heavier than the other indices (it might nonetheless perform fine; I'll have to test). It would also be easier to serialize (as N-Quads or TriG).

Aliasing inferred statements via owl:sameAs would be harder, though, as it would essentially force me to materialize all inferred statements on write, which would make future writes harder to keep consistent.

So the next step, I guess, is perf testing...