Lucene search -> correct triple


(Matthias Autrata) #1

When doing a full-text search with Lucene, the typical pattern seems to be:

SELECT ?stuff...
 WHERE { 
   ?s ?p ?l. 
   ?s rdf:type ex:WhateverClass.
   (?l ?score) <tag:stardog:api:property:textMatch> 'mac'.
   ...more...
}

This means that once a matching string is found, all triples that could contain it have to be found and then filtered. The queries turn out to be quite slow, especially in situations where inference is also needed.

It would be much easier, and probably faster, if one could retrieve the triple containing the matched literal directly instead.

In our specific situation, we used this query as an example:

SELECT ?matches (count(?concept) as ?count)
WHERE {
   ux:physical skos:narrower* ?concept .
   ?type ont:coveredByConcept ?concept .
   ?subject rdf:type ?type .
   ?subject ?predicate ?matches .
   (?matches ?score) <tag:stardog:api:property:textMatch> ('vsin*' 0.8 10) .
}
group by ?matches

It runs in 27 s. It finds resources that match a certain string, where those resources are instances of a group of classes covered by a term in a taxonomy. This is about one order of magnitude too slow for our needs.


(zachary.whitley) #2

I'm not quite sure what you're proposing here. You should be able to write your query for the logical result you're looking for, and the query optimizer should worry about how best to answer it physically. Sometimes it needs a little help, which is why there are query hints.

Can you include a copy of the query plan by running stardog query explain ...?
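
For reference, a minimal sketch of how that could look from a shell. The database name myDb and the file name query.rq are placeholders, not taken from your setup. It also assumes that the prefixes (ux:, ont:, skos:, rdf:) are stored as namespaces in the database and that explain accepts a path to a query file in place of an inline query string.

```shell
# Save the query from the first post to a file.
# query.rq is a placeholder file name.
cat > query.rq <<'EOF'
SELECT ?matches (count(?concept) as ?count)
WHERE {
   ux:physical skos:narrower* ?concept .
   ?type ont:coveredByConcept ?concept .
   ?subject rdf:type ?type .
   ?subject ?predicate ?matches .
   (?matches ?score) <tag:stardog:api:property:textMatch> ('vsin*' 0.8 10) .
}
group by ?matches
EOF

# Print the query plan if the Stardog CLI is on the PATH.
# myDb is a placeholder database name.
if command -v stardog >/dev/null 2>&1; then
  stardog query explain myDb query.rq
else
  echo "stardog CLI not found on PATH" >&2
fi
```

The output to paste here is the plan tree printed by the explain command.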