Performance issue with textmatch

Hi,

I want to query Stardog (6.1.2) on a BSBM dataset of ~35M triples with a faceted-browsing-style query, e.g.

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>



SELECT  distinct ?o ?label ?desc {

    
    FILTER EXISTS {
        ?resource a bsbm:Product.
        ?resource bsbm:productFeature ?feature.
        ?resource bsbm:productPropertyTextual1 ?pf1.
        ?resource bsbm:productPropertyTextual2 ?pf2.
        ?review bsbm:reviewFor ?resource.

    }
    ?resource ?p ?o.
    
    FILTER (isiri(?o))
    optional {
        ?o rdfs:label ?label
    }

    optional {
        ?o rdfs:comment ?comment

    }

    ?o rdfs:label ?osearch.
    ?osearch <tag:stardog:api:property:textMatch> "da*".
    #FILTER (contains(?osearch,"da"))

}

limit 10000

Performance drops dramatically when using textMatch instead of contains:

  • contains ~1.1 s; 1600 results
  • textMatch ~400s; 437 results

The results appear to be what is expected in both cases, but the performance penalty is quite high.
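
To illustrate why the two result counts can legitimately differ, here is a rough sketch in Python (not Stardog code). It assumes that contains is a case-sensitive substring test and that a full-text pattern like "da*" matches lowercased word tokens starting with "da"; the tokenizer and the sample labels are made up for illustration and only approximate what a Lucene-style analyzer would do.

```python
import re


def contains_match(label: str, needle: str) -> bool:
    # Mirrors SPARQL CONTAINS: case-sensitive substring test.
    return needle in label


def prefix_token_match(label: str, prefix: str) -> bool:
    # Assumed stand-in for a full-text "da*" query: split the label into
    # lowercase word tokens and check whether any token starts with the prefix.
    tokens = re.findall(r"\w+", label.lower())
    return any(tok.startswith(prefix.lower()) for tok in tokens)


# Hypothetical labels, not taken from the BSBM dataset.
labels = ["sundays", "dated entries", "Dalmatian", "update"]

contains_hits = [l for l in labels if contains_match(l, "da")]
text_hits = [l for l in labels if prefix_token_match(l, "da")]

print(contains_hits)
print(text_hits)
```

Under these assumptions the substring test picks up "da" in the middle of a word, while the prefix test only picks up words that begin with it (regardless of case), so the two filters return overlapping but different sets.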

The problem is much less pronounced when the text filtering is issued without other graph patterns, e.g.

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select distinct * {

    ?o rdfs:label ?oSearch.
    ?oSearch <tag:stardog:api:property:textMatch> "da*".
    #FILTER(contains(?oSearch,"da"))
}
limit 100000 

Note:
The double use of rdfs:label might seem odd, but because the query is generated, this can easily happen.

Hi Jörg,

Welcome to the forum. Thanks for the detailed report. Can you please share the query plans for the textMatch vs contains queries?

Jess