Fast query on properties for large number of nodes

Leo0909 · May 16, 2018, 11:43pm

Hi, I’m new to Stardog and am running some preliminary tests for my company. I imported a huge product graph into Stardog (more than 100 million triples). I have a very expensive query, described as follows:
The graph contains a set of products; each product has a type and a set of attribute name-value pairs.
Given a product type, for example “Phone”, I want to find all products of this type that have an attribute whose name contains a given string (for example “abcdef”). The worst case is when the query has no results: we still have to scan all “Phone” nodes and their attributes, which is very expensive since I have 1 million attributes for phones.
Is there a good way to run this kind of query, or any index that can support it?
Let me know if my description is not clear.

Best~

lorenz_b · May 17, 2018, 7:01am

Shouldn't you simply write the SPARQL query and let Stardog do the work? Indexing is usually done by default in most triple stores. In general, up to 6 permutations of s, p, o will be indexed separately.
For text containment, a full-text index should be used, together with the built-in predicate <tag:stardog:api:property:textMatch> for text lookup in literals.

pavel · May 17, 2018, 7:01am

You may want to enable full-text search [1] for attribute name matching and express the rest using SPARQL. If the keyword is reasonably selective the query shouldn't be particularly slow.

Once you have a query you may share the query plan here (output of stardog query explain command) in case of performance issues.

Best,
Pavel

[1] Home | Stardog Documentation Latest

Leo0909 · May 17, 2018, 6:09pm

Thanks Pavel, the full-text search really helps and makes the query around 20 times faster.
However, it still takes 3 seconds to answer the query (when no match is found), and we want to get this under 100 ms for our online service. The query is listed below:

SELECT ?s ?p ?o WHERE { ?s ?p ?o .
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "ProductAttributes" .
(?p) <tag:stardog:api:property:textMatch> 'hasWidth' . }

It seems that query explain cannot give me the query plan: I got "No driver was found which supports the connection string "SELECT ......"". Am I doing something wrong with query explain?

Best~
Qi

Leo0909 · May 17, 2018, 6:10pm

Full-text index actually helps!

pavel · May 17, 2018, 6:39pm

Here's the description of query explain with examples: Command Line Interface | Stardog Documentation Latest
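
For reference, a minimal sketch of invoking the command from Python. It assumes the argument order shown in the CLI docs (database name first, then the whole query as a single argument); the database name `products` is only a placeholder. The "No driver was found" error above would be consistent with the query string landing where the database name is expected, though that is a guess from the message alone.

```python
import subprocess
from typing import List


def build_explain_command(database: str, query: str) -> List[str]:
    """Build the argv for `stardog query explain <database> <query>`."""
    # The database (or connection string) comes first; the query text is
    # passed as one argument so the shell does not split it.
    return ["stardog", "query", "explain", database, query]


def explain(database: str, query: str) -> str:
    """Run the command and return the printed plan.

    Assumes the `stardog` CLI is on PATH and a server is reachable.
    """
    result = subprocess.run(
        build_explain_command(database, query),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "products" is a placeholder database name.
    print(build_explain_command("products", "SELECT * WHERE { ?s ?p ?o }"))
```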

How many results does ?p <tag:stardog:api:property:textMatch> 'hasWidth' return? If the number of properties is reasonably small for typical keywords (say p1,...,pn), it'd make sense to cache them and rewrite your query to:

SELECT ?s ?o WHERE {
?s :p1 | :p2 | ... | :pn ?o .
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "ProductAttributes" . }

It's possible to preserve the values of ?p in the results, but that would take explicit BIND patterns.
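
A rough sketch of the cache-and-rewrite idea in Python, under a few assumptions: `PropertyCache`, `build_path_query` and `build_bind_query` are made-up names, the `lookup` callable stands in for whatever runs the full-text query and returns matching property IRIs, and IRIs and type values are trusted (no escaping is done). The second builder is one possible reading of "explicit bind patterns": one UNION branch per property, each with its own BIND.

```python
from typing import Callable, Dict, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class PropertyCache:
    """Remembers which property IRIs matched each keyword."""

    def __init__(self, lookup: Callable[[str], List[str]]) -> None:
        # `lookup` is hypothetical: it would run the full-text query once
        # per keyword and return the matching property IRIs.
        self._lookup = lookup
        self._cache: Dict[str, List[str]] = {}

    def get(self, keyword: str) -> List[str]:
        if keyword not in self._cache:
            self._cache[keyword] = self._lookup(keyword)
        return self._cache[keyword]


def _type_pattern(type_value: str) -> str:
    # Mirrors the query in this thread, where the type is a string literal.
    return f'?s <{RDF_TYPE}> "{type_value}" .'


def build_path_query(property_iris: List[str], type_value: str) -> str:
    """Rewrite using a property-path alternation; ?p is not returned."""
    if not property_iris:
        raise ValueError("no matching properties, so the query has no results")
    path = " | ".join(f"<{iri}>" for iri in property_iris)
    return (
        "SELECT ?s ?o WHERE {\n"
        f"  ?s {path} ?o .\n"
        f"  {_type_pattern(type_value)}\n"
        "}"
    )


def build_bind_query(property_iris: List[str], type_value: str) -> str:
    """Rewrite with one UNION branch per property so ?p stays in the results."""
    if not property_iris:
        raise ValueError("no matching properties, so the query has no results")
    branches = [
        f"{{ ?s <{iri}> ?o . BIND(<{iri}> AS ?p) }}" for iri in property_iris
    ]
    union = "\n  UNION\n  ".join(branches)
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  {_type_pattern(type_value)}\n"
        f"  {union}\n"
        "}"
    )
```

If `cache.get(keyword)` comes back empty, the no-match case can probably be answered without querying the product data at all, which is the case that took 3 seconds above.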

Cheers,
Pavel

