Hi, I’m new to Stardog and am running some preliminary tests for my company. I imported a huge product graph into Stardog (more than 100 million triples). I have a very expensive query, described as follows:
The graph contains a set of products; each product has a type and a set of attribute name-value pairs.
Given a product type, for example “Phone”, I want to find all products of this type that have an attribute whose name contains a given string (for example “abcdef”). In the worst case there is no result for this query, yet we still have to scan all “Phone” nodes and their attributes, which is very expensive since I have 1 million attributes for phones.
Is there a good way to run this kind of query, or any index that can support it?
Let me know if my description is not clear.
Shouldn't you simply write the SPARQL query and let Stardog do the work? Indexing is usually done by default in most triple stores. In general, up to 6 permutations of s, p, o (subject, predicate, object) are indexed separately.
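To make the "6 permutations" remark concrete, here is a small Python sketch. It is a generic illustration of how a triple store could pick an index for a triple pattern, not a description of Stardog's actual index layout; the helper name `usable_indexes` is made up for this example.

```python
from itertools import permutations

# All orderings of (subject, predicate, object) that a triple store could index.
INDEX_ORDERS = ["".join(p) for p in permutations("spo")]


def usable_indexes(bound):
    """Return the orderings whose leading positions are exactly the bound ones.

    `bound` is a string such as "po" naming the positions of a triple pattern
    that are constants rather than variables (hypothetical helper).
    """
    bound = set(bound)
    return [order for order in INDEX_ORDERS if set(order[:len(bound)]) == bound]


print(INDEX_ORDERS)
# A pattern like "?s rdf:type ex:Phone" binds predicate and object.
print(usable_indexes("po"))
```

With predicate and object bound, an index ordered predicate-object-subject (or object-predicate-subject) lets the store jump straight to the matching subjects instead of scanning every triple.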
For text containment, use a full-text index together with the built-in predicate <tag:stardog:api:property:textMatch>, which performs text lookup in literals.
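As a rough sketch of what such a query might look like, the Python snippet below assembles a SPARQL string that uses the textMatch predicate. The `ex:` vocabulary (`ex:hasAttribute`, `ex:name`) and the example IRIs are assumptions, since the thread does not show the actual schema, and the function names are made up for this example. Note also that full-text indexes usually match on tokens rather than arbitrary substrings, so whether "abcdef" matches inside a longer name depends on how the index analyzes the text.

```python
# Predicate named in this thread for full-text lookup in literals.
TEXT_MATCH = "<tag:stardog:api:property:textMatch>"


def escape_literal(text):
    """Escape backslashes and quotes for a single-quoted SPARQL string literal."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def build_query(product_type_iri, keyword):
    """Build a query for products of a type having a matching attribute name.

    ex:hasAttribute and ex:name are hypothetical; substitute the real schema.
    """
    return f"""PREFIX ex: <http://example.org/>
SELECT DISTINCT ?product WHERE {{
  ?name {TEXT_MATCH} '{escape_literal(keyword)}' .
  ?attr ex:name ?name .
  ?product ex:hasAttribute ?attr ;
           a <{product_type_iri}> .
}}"""


print(build_query("http://example.org/Phone", "abcdef"))
```

Putting the text match first in the pattern reflects the idea that the keyword is the most selective part of the query; the optimizer may reorder the patterns anyway.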
You may want to enable full-text search [1] for attribute name matching and express the rest using SPARQL. If the keyword is reasonably selective, the query shouldn't be particularly slow.
Once you have a query, you may share the query plan here (the output of the stardog query explain command) in case of performance issues.
Thanks Pavel, the full-text search really helps and improves performance by around 20 times.
However, it still takes 3 seconds to answer the query (when no match is found), and we want to bring this below 100 ms for our online service. I list the query below.
It seems that query explain cannot give me the query plan; I got "No driver was found which supports the connection string "SELECT ......"". Am I doing something wrong with query explain?
How many results does ?p tag:stardog:api:property:textMatch 'hasWidth' return? If the number of matching properties is reasonably small for typical keywords (say p1,...,pn), it would make sense to cache them and rewrite your query to: