Hi,
I want to query Stardog (6.1.2) on a ~35M BSBM dataset with a faceted-browsing-like query, e.g.
PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT distinct ?o ?label ?desc {
FILTER EXISTS {
?resource a bsbm:Product.
?resource bsbm:productFeature ?feature.
?resource bsbm:productPropertyTextual1 ?pf1.
?resource bsbm:productPropertyTextual2 ?pf2.
?review bsbm:reviewFor ?resource.
}
?resource ?p ?o.
FILTER (isiri(?o))
optional {
?o rdfs:label ?label
}
optional {
?o rdfs:comment ?comment
}
?o rdfs:label ?osearch.
?osearch <tag:stardog:api:property:textMatch> "da*".
#FILTER (contains(?osearch,"da"))
}
limit 10000
Performance drops dramatically when using textMatch compared to contains:
- contains ~1.1 s; 1600 results
- textMatch ~400s; 437 results
The results appear to be what is expected here; however, the performance penalty is quite high.
The problem is much less pronounced when the text filtering is issued without other graph patterns, e.g.
PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select distinct * {
?o rdfs:label ?oSearch.
?oSearch <tag:stardog:api:property:textMatch> "da*".
#FILTER(contains(?oSearch,"da"))
}
limit 100000
Note:
The double use of rdfs:label might strike you as odd, but since the query is generated, this can easily happen.