About textMatch (Lucene) usage

oreore · June 11, 2018, 6:59am

Hi Stardog team,

Suppose I have some instances with different labels:
insert {
    <http://test.com/TEST16> rdfs:Label "username" .
    <http://test.com/TEST17> rdfs:Label "user_name" .
    <http://test.com/TEST18> rdfs:Label "user-name" .
    <http://test.com/TEST19> rdfs:Label "user" .
}
where {}

What I need is to retrieve all instances with textMatch, ordered by text similarity score.
However, the following query returns only "user-name" and "user", both with the same score:

select *
where {
    ?s rdfs:Label ?o .
    # textMatch arguments: query string, minimum score threshold, result limit
    (?o ?score) <tag:stardog:api:property:textMatch> ("user~" 0 100) .
}

s	o	score
:TEST18	user-name	2.54044508934021
:TEST19	user	2.54044508934021

So I added a wildcard (*):

select *
where {
    ?s rdfs:Label ?o .
    (?o ?score) <tag:stardog:api:property:textMatch> ("user*~" 0 100) .
}

Now it returns all of them, but again with the same score:

s	o	score
:TEST16	username	1.0
:TEST17	user_name	1.0
:TEST18	user-name	1.0
:TEST19	user	1.0

Is there a proper way to rank them with different scores?

Thank you in advance.

lorenz_b · June 11, 2018, 7:37am

Given that the full-text index is based on Lucene, its default score is just the common information-retrieval score, which only considers term frequency and document frequency (plus some boosting). String similarity measures such as edit distance are not taken into account; computing them in addition to the index lookup would be too expensive.
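To make this concrete, here is a minimal Python sketch of a term-frequency / inverse-document-frequency score over the four labels from the question. It is a simplified model loosely based on Lucene's classic TF-IDF similarity, not Lucene's exact formula and not Stardog's analyzer: the tokenizer (`re.findall(r"\w+", ...)`, which splits on hyphens but keeps underscores inside a token) and the function names are assumptions made for illustration. Because the score depends only on whether and how often the query term occurs as a whole token, "user" and "user-name" tie, while "username" and "user_name" score zero even though they are spelled almost identically.

```python
import math
import re

LABELS = ["username", "user_name", "user-name", "user"]


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: \w+ splits on '-' but keeps '_' inside a token.
    return re.findall(r"\w+", text.lower())


def tf_idf_score(term: str, doc: str, corpus: list[str]) -> float:
    """Simplified TF-IDF: no length norm, no boosts, no fuzzy matching."""
    tokens = tokenize(doc)
    tf = tokens.count(term)
    if tf == 0:
        return 0.0
    # Document frequency: how many labels contain the term as a whole token.
    df = sum(1 for d in corpus if term in tokenize(d))
    idf = 1.0 + math.log(len(corpus) / (df + 1))
    return math.sqrt(tf) * idf


if __name__ == "__main__":
    for label in LABELS:
        print(f"{label:10} {tf_idf_score('user', label, LABELS):.4f}")
```

Nothing in the score measures how close the two strings are; real Lucene scoring also applies length normalization and boosts, so actual numbers will differ, but it is still driven by term statistics rather than string similarity.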

pedro · June 11, 2018, 9:59am

Hi Hwang,

As Lorenz mentioned, the Lucene score is not a proper text similarity score; it's just a value Lucene uses to decide whether a result is relevant to a query.
If you need an actual similarity score, you can pass the results through a similarity metric, such as the ones provided by kibbles-string-metric, referred to in this post. Just add the release jar to Stardog's classpath and restart the server; several distance metrics will then be available in SPARQL.
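A minimal sketch of this post-processing step, done client-side in Python rather than inside Stardog: compute a normalized Levenshtein similarity between the query string and each label returned by textMatch, then sort by it. The function names (`levenshtein`, `similarity`, `rank_labels`) and the normalization (1 minus distance divided by the longer length) are choices made for this example; they are not the kibbles-string-metric API, whose SPARQL function names are not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def rank_labels(query: str, labels: list[str]) -> list[tuple[str, float]]:
    """Sort candidate labels by similarity to the query, best first."""
    scored = [(label, similarity(query, label)) for label in labels]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Labels returned by the "user*~" query in the question.
    hits = ["username", "user_name", "user-name", "user"]
    for label, score in rank_labels("user", hits):
        print(f"{label:10} {score:.3f}")
```

This ranks "user" first (1.000), then "username" (0.500), then "user_name" and "user-name" (0.444 each). The last two still tie because each is five insertions away from "user"; a different metric (for example one that treats separators specially) would be needed to break that tie.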

-pedro

system · June 25, 2018, 9:59am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
