Not able to fetch path

Hi,

I have created a Stardog database with OpenNLP and entity extraction for persons. Then I loaded a text document into the document store of the database.

stardog-admin db create -o docs.opennlp.models.path=D:\Setups\Stardog\stardog-5.2.3\opennlp -n testDB1

stardog doc put --rdf-extractors tika,entities testDB1 D:\Sample\article1.txt

The text file has the following data:

Navratan knows Mukesh as both are colleagues in ABC. They also have lunch together and are working on the same project for cleint.

Now I am trying to check the path between the entities, but I am not getting any path as the output.

stardog query -f text testDB1 "PATHS START ?x = :Navratan END ?y VIA ?p"

+-------+-------+-------+
|   x   |   p   |   y   |
+-------+-------+-------+
+-------+-------+-------+

Any idea?

Hi Navratan,

Before we dig into the actual paths query, let’s debug which entities are being extracted from the text. Two questions:

  • what is the content of the D:\Setups\Stardog\stardog-5.2.3\opennlp folder?
  • what is the output of the following query: select * where { graph ?g { ?s ?p ?o }}

-pedro

Hi pedro,

D:\Setups\Stardog\stardog-5.2.3\opennlp folder has the following nlp models:

en-ner-person.bin
en-sent.bin
en-token.bin

Following is the output of the query “select * where { graph ?g { ?s ?p ?o }}” when run on database testDB1:

g | s | p | o
stardog:docs:testDB1:article1.txt | stardog:docs:testDB1:article1.txt | rdf:type | stardog:docs:Document
stardog:docs:testDB1:article1.txt | stardog:docs:testDB1:article1.txt | rdf:type | owl:Thing

Kindly suggest

Thanks
Navratan

Hi Navratan,

That last query returns all the data in the database and, as you can see, there is nothing there besides simple metadata about the document itself.
The issue here is that the en-ner-person model doesn't recognise either of the two names in your example (it's specialized in English names). We provide other, more general person-identification models, e.g., built from DBpedia, but can't guarantee that they will work with most non-English names, since they are trained on English-language texts.
Models for other languages and domains are easy to train, and I can provide some pointers if needed.

For now, I would recommend changing the content of your example and executing that select * query again. That will give you an idea of what kind of information is being extracted.

-pedro

Hey Pedro,

I changed the content of the file and then executed the select * query again. Now it has detected the person names as per the NLP model:

g | s | p | o
stardog:docs:pathdb:article1.txt | stardog:docs:pathdb:article1.txt | stardog:docs:hasEntity | stardog:docs:entity:f06574bbbfa1a5b474f276714e769027
stardog:docs:pathdb:article1.txt | stardog:docs:pathdb:article1.txt | stardog:docs:hasEntity | stardog:docs:entity:679a56e43cd3beace9e4ba690824b055
stardog:docs:pathdb:article1.txt | stardog:docs:pathdb:article1.txt | DCMI: Format | text/plain; charset=ISO-8859-1
stardog:docs:pathdb:article1.txt | stardog:docs:pathdb:article1.txt | stardog:docs:fileSize | 132
stardog:docs:pathdb:article1.txt | stardog:docs:pathdb:article1.txt | DCMI: Identifier | article1.txt
stardog:docs:pathdb:article1.txt | stardog:docs:pathdb:article1.txt | rdfs:label | article1.txt
stardog:docs:pathdb:article1.txt | stardog:docs:entity:f06574bbbfa1a5b474f276714e769027 | rdfs:label | Nick
stardog:docs:pathdb:article1.txt | stardog:docs:entity:679a56e43cd3beace9e4ba690824b055 | rdfs:label | Mike
stardog:docs:pathdb:article1.txt | stardog:docs:entity:f06574bbbfa1a5b474f276714e769027 | rdf:type | stardog:docs:ner:person
stardog:docs:pathdb:article1.txt | stardog:docs:entity:679a56e43cd3beace9e4ba690824b055 | rdf:type | stardog:docs:ner:person
stardog:docs:pathdb:article1.txt | stardog:docs:pathdb:article1.txt | rdf:type | FOAF Vocabulary Specification
stardog:docs:pathdb:article1.txt | stardog:docs:pathdb:article1.txt | rdf:type | stardog:docs:Document
stardog:docs:pathdb:article1.txt | stardog:docs:pathdb:article1.txt | rdf:type | owl:Thing
stardog:docs:pathdb:article1.txt | stardog:docs:entity:f06574bbbfa1a5b474f276714e769027 | rdf:type | owl:Thing
stardog:docs:pathdb:article1.txt | stardog:docs:entity:679a56e43cd3beace9e4ba690824b055 | rdf:type | owl:Thing

Now I ran the path query again:

stardog query -f text pathdb "PATHS START ?x = :Navratan END ?y VIA ?p"

but it's returning 0 paths.

Hi Navratan,

As you can see in the results of the select * query, there is no :Navratan object in the database, therefore the PATHS query won’t be able to return any results.

I’m not sure what you are trying to achieve with the paths query, especially since entities are leaf nodes in the graph. If you simply want to find which entities are present in the same document, a select query like this will work:

select * where {
    graph ?doc {
        ?doc stardog:docs:hasEntity [ rdfs:label ?label ].
    }
}

Pedro,

I want to see the relationships between the various entities in my data, which is why I was trying to do that with a path query. Can you suggest how I can check the relationships between entities extracted from my text?

The only relationship you can extract is that two entities are in the same document, which is what the query I previously shared does.
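
To make "in the same document" concrete, here is a minimal Python sketch of how the rows from that query could be turned into co-occurrence pairs on the client side. This is plain Python, not Stardog code; the function name cooccurring_pairs is made up for illustration, and the rows are simply the document and labels from the output earlier in this thread.

```python
from collections import defaultdict
from itertools import combinations


def cooccurring_pairs(rows):
    """Group (doc, label) rows by document and pair up labels that share one."""
    by_doc = defaultdict(set)
    for doc, label in rows:
        by_doc[doc].add(label)
    pairs = []
    for doc in sorted(by_doc):
        # Sorting makes the pair order deterministic.
        for a, b in combinations(sorted(by_doc[doc]), 2):
            pairs.append((doc, a, b))
    return pairs


# Rows shaped like the (?doc, ?label) results of the select query above.
rows = [
    ("stardog:docs:pathdb:article1.txt", "Nick"),
    ("stardog:docs:pathdb:article1.txt", "Mike"),
]

for doc, a, b in cooccurring_pairs(rows):
    print(f"{a} and {b} are both mentioned in {doc}")
```

Note that this only tells you the two names appear together; it says nothing about how they are related.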

If you wanted to automatically extract semantic relationships from your text, such as A knows B and A works at ABC, that task is called relation extraction, and is something that we don't support at the moment.
Your only option here would be to implement a custom extractor with the logic to extract such relationships.

Ok Pedro, understood for unstructured data.
Is it the same for structured data that is saved in the Stardog db in triples format?

If your data is structured as triples, you have a graph, and therefore can write all kinds of queries to find relationships between entities, including path queries.

Ok

Below is the link to a ttl file which I have loaded into the Stardog db.

https://raw.githubusercontent.com/stardog-union/stardog-examples/develop/examples/docs/blog/person_movie.ttl

Now I want to find the relationships between the entities in this file. Please guide me on what the query for that should be.

Thanks

Not sure if I understand correctly, but shouldn’t you do entity linking before you can use the mentioned entities in the knowledge graph? Entity extraction is just finding parts of the text that denote entities like persons, places, etc.

Once you have done this, a SPARQL query like

SELECT ?e1 ?p ?e2 {
 ?e1 ?p ?e2 .
}

is all you need. Of course, you might first need to get the entities themselves from a particular graph:

select ?mention ?entity where {
  graph <tag:stardog:api:docs:movies:article.txt> {
    ?s rdfs:label ?mention ;
       <http://purl.org/dc/terms/references> ?entity .
  }
}

and finally you could wrap this in a combined query.

Basically, we are looking to find relationships like A knows B and B works for organization XYZ within both structured and unstructured data. For example, if we have added only the following ttl file to the Stardog DB and want to find the above-mentioned relationships, what would the query be?

https://raw.githubusercontent.com/stardog-union/stardog-examples/develop/examples/docs/blog/person_movie.ttl

Also, if we have uploaded only the following article file and want to find the relationship between George Clooney and Matt Charman, please provide a complete example query:

https://raw.githubusercontent.com/stardog-union/stardog-examples/develop/examples/docs/blog/article.txt

We are stuck on these points, so if you can provide complete examples and queries for both structured and unstructured data, it will be really helpful.

Thanks,
Mukesh Gupta

Hi Mukesh,

To retrieve relationships from person_movie.ttl you will first have to describe the relationships you’re looking to find. If you’re looking for explicit relationships such as :actor, :author, :director, you can just write a SPARQL query:

SELECT ?title WHERE {
  ?tom a :Person ;
    rdfs:label "Tom Hanks" .
  ?movie :actor ?tom ;
    rdfs:label ?title
}
ORDER BY ?title

If, however, you’re looking to infer a relationship, you need to define it so the reasoner can find it. For example, if I wanted to define :20sActor as an actor who starred in a 1920s movie, I can do that with a rule:

IF {
  ?movie :actor ?actor ;
    :copyrightYear ?year .
  FILTER(?year >= 1920 && ?year < 1930)
}
THEN {
  ?actor a :20sActor
}

Once that rule is inserted into the DB, I can query (with reasoning) to find instances of :20sActor without that data needing to be stored explicitly in my DB:

SELECT ?name WHERE {
  ?actor a :20sActor ;
    rdfs:label ?name
}
ORDER BY ?name
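
In case it helps to see what the rule and the query compute together, here is a rough Python equivalent over made-up data. The function name twenties_actors and the (actor, year) pairs are my own illustration; they are not taken from person_movie.ttl and this is not how the reasoner is implemented.

```python
def twenties_actors(casting):
    """Return sorted names of actors with at least one movie from 1920 to 1929."""
    # Same condition as the rule's FILTER: year >= 1920 and year < 1930.
    return sorted({actor for actor, year in casting if 1920 <= year < 1930})


# Hypothetical (actor label, copyrightYear) pairs, one per :actor edge.
casting = [
    ("Actor A", 1925),
    ("Actor B", 1931),
    ("Actor A", 1934),
    ("Actor C", 1920),
    ("Actor D", 1930),
]

print(twenties_actors(casting))
```

The point is the same as with the rule: the ":20sActor" membership is derived from existing facts at query time rather than stored.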

As for the unstructured data, you will need to do as Pedro suggested and get/use/create extractors that can retrieve the data you’re looking for. For example, I loaded article.txt with the English tika,entities extractors, and can now query over what they found:

select ?type ?label { 
  graph <tag:stardog:api:docs:rtfm:article.txt> { 
  ?s <tag:stardog:api:docs:hasEntity> [ rdf:type ?type; rdfs:label ?label ] 
  }
}

+---------------------------------------+------------------+
|                 type                  |      label       |
+---------------------------------------+------------------+
| tag:stardog:api:docs:ner:organization | "Watergate"      |
| tag:stardog:api:docs:ner:person       | "George Clooney" |
| tag:stardog:api:docs:ner:date         | "last year"      |
| tag:stardog:api:docs:ner:person       | "Grant Heslov"   |
| tag:stardog:api:docs:ner:person       | "Matt Charman"   |
+---------------------------------------+------------------+

As far as I understood, the use-case is to have

  • a Turtle file which contains entities with triples about them and
  • a text file which might contain mentions of those entities.

This needs two steps:

  • entity extraction, which finds the parts of the text that mention entities
  • entity linking, i.e. mapping those entity mentions to the RDF entities in the loaded KB

Once both steps are done, a SPARQL query could be used to get relationships between entities mentioned in the given text.
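
As a very rough illustration of the linking step, here is a Python sketch that maps extracted mentions to KB entities by case-insensitive label match. The mentions are the person labels from the extractor output above; the urn:example: IRIs and the function name link_mentions are hypothetical, since the real IRIs depend on the loaded Turtle file. Real entity linking also has to handle ambiguity and name variants, which this ignores.

```python
def link_mentions(mentions, kb_labels):
    """Map each mention to a KB IRI whose label matches, ignoring case."""
    index = {label.casefold(): iri for iri, label in kb_labels.items()}
    # Mentions without a matching label map to None.
    return {m: index.get(m.casefold()) for m in mentions}


# Person mentions as found by the entity extractor in article.txt.
mentions = ["George Clooney", "Grant Heslov", "Matt Charman"]

# Hypothetical IRIs and labels standing in for the loaded KB.
kb_labels = {
    "urn:example:GeorgeClooney": "George Clooney",
    "urn:example:GrantHeslov": "Grant Heslov",
}

links = link_mentions(mentions, kb_labels)
for mention, iri in links.items():
    print(mention, "->", iri or "(no match in KB)")
```

Only the mentions that get an IRI can then be used in a SPARQL query over the structured data.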

Hi Stephen,

Thanks for sharing your input. Can you please suggest how to find relationships between entities, for example that A and B work for organization XYZ and both live in the same location? Basically we want to get the links between nodes like in the following diagram -

If we have structured data (saved in triples format in the Stardog DB), should we use PATHS queries, GraphQL, or something else to achieve this? Please also share a complete example that finds relationships like those in the image above.

Also, for unstructured data, we are trying to create a custom extractor. So, please suggest -

  1. Can we create a custom extractor in a language other than Java?

  2. Please share the steps to create a custom extractor. We were following the example and facing issues, so it would be great if you could share the complete steps to implement the below example -

Thanks,
Mukesh Gupta

Mukesh,

If you want the actual paths from one node to another, a PATHS query is most appropriate. GraphQL will only return you objects matching your specified criteria, so in effect, you’d hard code graph structure into your GraphQL template.

Regarding the extractor, you don’t have to use Java, but it would have to be a language that runs on the JVM, such as Kotlin. If you wanted to use something native, or a web service, you’d want a thin wrapper that calls out to the service.

Regarding the example, it’s hard to suggest solutions to whatever problems you’re facing without knowing what issues you’re running into. The example you reference is self-contained, so there’s nothing more to it than what is outlined there.

Hi Mike,

For the example, will I have to first run the build.gradle file and then the java code?

I am trying to run the build file from gradle, but getting the following error:

D:\Setups\gradle\gradle-4.7\bin>gradle -q

FAILURE: Build failed with an exception.

  • Where:
    Build file 'D:\Setups\gradle\gradle-4.7\bin\build.gradle' line: 4

  • What went wrong:
    A problem occurred evaluating root project 'bin'.

Could not find method compile() for arguments [com.complexible.stardog:server:5.2.3] on object of type org.gradle.api.internal.artifacts.dsl.dependencies.DefaultDependencyHandler.

Or is there any other step that I need to implement first? Am I missing out on something?

Thanks
Navratan

Hey Mike,

We are trying to find the relationships within the following file (saved in the Stardog DB) -

http://www.learningsparql.com/2ndeditionexamples/ex069.ttl

For example, we want to know all the relationships of "Richard" (first name). What should the path query be if we want to find all paths and the shortest path?

Also, regarding the following example, which finds all the people Alice is connected to and how she is connected to them -

Can you please share the TTL file of this example, so that we can correlate our data with it?

Thanks,
Mukesh Gupta

@navratan22jan

You should be running gradle from the checkout of the stardog-examples repo, not from the gradle installation directory.

@Mukesh

Here’s the snippet of data from the example:

@prefix : <http://api.stardog.com/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix stardog: <tag:stardog:api:> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix paths: <urn:paths:> .

<urn:paths:Alice> <urn:paths:knows> <urn:paths:Bob> .

<urn:paths:Bob> <urn:paths:knows> <urn:paths:David> ;
    <urn:paths:worksWith> <urn:paths:Charlie> .

<urn:paths:Charlie> <urn:paths:parentOf> <urn:paths:Eve> .

<urn:paths:Eve> <urn:paths:knows> <urn:paths:David> .
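
If it helps to see what "shortest path from Alice to everyone she is connected to" means on this data, here is a small Python sketch that runs a breadth-first search over the same five edges. It is only an illustration of the idea, not how Stardog evaluates a PATHS query; the function name shortest_paths is made up, and the urn:paths: prefix is dropped for readability.

```python
from collections import deque

# The five triples from the snippet above, without the urn:paths: prefix.
EDGES = [
    ("Alice", "knows", "Bob"),
    ("Bob", "knows", "David"),
    ("Bob", "worksWith", "Charlie"),
    ("Charlie", "parentOf", "Eve"),
    ("Eve", "knows", "David"),
]


def shortest_paths(edges, start):
    """Breadth-first search: one shortest directed path to each reachable node."""
    outgoing = {}
    for s, p, o in edges:
        outgoing.setdefault(s, []).append((p, o))
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for p, o in outgoing.get(node, []):
            if o not in paths:  # first visit is via a shortest path
                paths[o] = paths[node] + [(node, p, o)]
                queue.append(o)
    del paths[start]
    return paths


for target, path in shortest_paths(EDGES, "Alice").items():
    print(target + ":", " , ".join(f"{s} {p} {o}" for s, p, o in path))
```

On this data David is reached in two hops via Bob; the longer route through Charlie and Eve also exists but is not a shortest path.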