Lucene search -> correct triple

matthias · November 13, 2018, 8:49pm

When doing a full-text search with Lucene, the typical pattern seems to be

SELECT ?stuff...
WHERE {
   ?s ?p ?l .
   ?s rdf:type ex:WhateverClass .
   (?l ?score) <tag:stardog:api:property:textMatch> 'mac' .
   ...more...
}

This means that once a matching string is found, every triple that has it as its object must be retrieved and then filtered. The queries turn out to be quite slow, especially when inference is also needed.

It would be much easier, and probably faster, if the search could return the matching triple itself (subject and predicate included) rather than only the literal.

In our specific situation, we used this query as an example:

SELECT ?matches (count(?concept) as ?count)
WHERE {
    ux:physical skos:narrower* ?concept .
    ?type ont:coveredByConcept ?concept .
    ?subject rdf:type ?type .
    ?subject ?predicate ?matches .
    (?matches ?score) <tag:stardog:api:property:textMatch> ('vsin*' 0.8 10)
}
group by ?matches

It runs in 27 s. It finds resources with a literal that matches the search string, restricted to instances of classes covered by a concept in a taxonomy. This is about an order of magnitude too slow for our needs.
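
For comparison, one variation that might be worth timing is sketched below. It is only a sketch under assumptions, not a confirmed fix: it pins the predicate to rdfs:label (an assumption; the matched literals may hang off a different property in this data), lists the text match first, and presumes the same prefixes as the query above. Whether it is actually faster depends on the plan the optimizer picks.

```sparql
SELECT ?matches (count(?concept) as ?count)
WHERE {
    # Full-text match: literals matching 'vsin*' with score above 0.8, at most 10
    (?matches ?score) <tag:stardog:api:property:textMatch> ('vsin*' 0.8 10) .
    # Assumption: only rdfs:label values are of interest, so the predicate is fixed
    ?subject rdfs:label ?matches .
    ?subject rdf:type ?type .
    ?type ont:coveredByConcept ?concept .
    ux:physical skos:narrower* ?concept .
}
group by ?matches
```

Fixing the predicate narrows the join between the matched literals and their subjects to a single property instead of all of them, at the cost of missing matches stored under other properties.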

zachary.whitley · November 13, 2018, 9:21pm

I'm not quite sure what you're proposing here. You should be able to write your query for the logical result you're looking for, and the query optimizer should worry about how best to answer it physically. Obviously it sometimes needs a little help, which is why there are query hints.

Can you include a copy of the query plan by running stardog query explain ...?
