Lucene search with hyphen (dash) (-)

We're having trouble figuring out how to escape the hyphen character when using Lucene search. We're using a double backslash (\\-) to escape:

  1. the backslash itself, for SPARQL (so the string literal contains a single backslash)
  2. the hyphen, for Lucene

In the query below, everything after the hyphen is ignored.

We'd appreciate any advice on how to include the dash in the search term, or alternatively how to ignore it.

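PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# PREFIX declarations for tm: (text match) and linkedct: omitted here
SELECT DISTINCT * WHERE {
    {
        ?text   tm: "empa\\-reg"^^xsd:string .
        ?trial  a linkedct:trial ;
                ?predicate1 ?text .
    }
    UNION
    {
        ?text   tm: "empa\\-reg"^^xsd:string .
        ?trial  a linkedct:trial ;
                ?predicate1 [ ?predicate2 ?text ] .
    }
}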
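For reference, here is a minimal Python sketch of the two escaping layers described above. The helper names escape_lucene and to_sparql_literal are hypothetical; the special-character set is the one Lucene's classic query parser escapes (QueryParser.escape). It suggests that the literal "empa\\-reg" in the query is escaped correctly, so the escaping itself is probably not why the rest of the term is ignored.

```python
# Characters escaped by Lucene's classic query parser (QueryParser.escape).
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(term: str) -> str:
    """Layer 2: backslash-escape Lucene query-syntax characters such as '-'."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def to_sparql_literal(text: str) -> str:
    """Layer 1: wrap text in a SPARQL string literal.
    Backslashes are doubled first, then double quotes are escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


lucene_query = escape_lucene("empa-reg")
print(lucene_query)                     # empa\-reg
print(to_sparql_literal(lucene_query))  # "empa\\-reg"
```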

You'll probably have to write a custom analyser. I don't remember which analyser Stardog uses by default (it should be in the documentation), but it's probably StandardAnalyzer, and with that analyser the dash isn't going to be part of the indexed term: "empa-reg" is indexed as the two terms "empa" and "reg". Escaping the dash in the query doesn't change what was indexed. There is an example of writing a custom analyser in the stardog-examples GitHub repository.
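To illustrate, here is a rough Python approximation of what a standard analyser does to "empa-reg". It is not Lucene's StandardTokenizer (which follows the Unicode UAX #29 word-break rules); the function standard_like_tokens is hypothetical and simply splits on anything that isn't a letter or digit and lowercases. That is enough to show that no indexed term contains the dash.

```python
import re


def standard_like_tokens(text: str) -> list[str]:
    """Rough stand-in for a standard analyser: split on any character that is
    not a letter or digit, then lowercase. Not Lucene's real UAX #29 tokenizer."""
    return [tok.lower() for tok in re.findall(r"[^\W_]+", text)]


indexed_terms = set(standard_like_tokens("EMPA-REG OUTCOME trial"))
print(sorted(indexed_terms))             # ['empa', 'outcome', 'reg', 'trial']
print("empa-reg" in indexed_terms)       # False: the dash never reaches the index
print({"empa", "reg"} <= indexed_terms)  # True: both parts are still searchable
```

So if ignoring the dash is acceptable, searching for the two parts (for example as the phrase "empa reg") is likely to work with the default analyser; keeping the dash inside a single term requires a different analyser.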

Followup:

It looks like StandardAnalyzer is the default. A quick explanation is in the documentation here.

An example of implementing a custom analyzer can be found here.
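As a language-neutral illustration (not Stardog's or Lucene's API), the Python sketch below mimics what such a custom analyzer could do: keep hyphen-joined words as single lowercase terms. The token pattern and the matches helper are hypothetical; a real Stardog analyzer would be a Java Lucene Analyzer like the one in the linked example. The same analysis is applied to documents and queries, as it should be at index and query time, so the index would presumably need rebuilding after switching analyzers.

```python
import re

# Hypothetical token pattern: runs of letters/digits, optionally joined by single hyphens.
HYPHEN_WORD = re.compile(r"[^\W_]+(?:-[^\W_]+)*")


def hyphen_preserving_tokens(text: str) -> list[str]:
    """Lowercased tokens that keep inner hyphens, e.g. 'EMPA-REG,' -> 'empa-reg'."""
    return [tok.lower() for tok in HYPHEN_WORD.findall(text)]


def matches(document: str, query: str) -> bool:
    """True if every query term is among the document's terms.
    Both sides go through the same analysis, as at index and query time."""
    doc_terms = set(hyphen_preserving_tokens(document))
    return all(term in doc_terms for term in hyphen_preserving_tokens(query))


doc = "Results of the EMPA-REG OUTCOME trial."
print(hyphen_preserving_tokens(doc))  # ['results', 'of', 'the', 'empa-reg', 'outcome', 'trial']
print(matches(doc, "empa-reg"))       # True
print(matches(doc, "empa"))           # False: the trade-off of keeping the dash
```

The last line shows the trade-off: a plain search for "empa" no longer matches. Lucene's WordDelimiterGraphFilter can keep the original hyphenated token alongside its parts (its PRESERVE_ORIGINAL option), which may be worth considering in a real analyzer.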
