Will Stardog be faster than Neo4j in this typical case?

I want to run this kind of query against a graph: find all products whose type is "SmartPhone" and whose screen resolution is "2000p". In Neo4j, there is a relationship "has_attributes" between Product and Attributes. The query is slow in Neo4j because it has to go through every smart phone in the graph and, for each one, check whether its resolution matches '2000p'. When the graph contains many smart phones this gets quite slow: it takes 9 seconds in my graph. An index has been created on the Product property that holds 'SmartPhone', so finding all smart phones among the products doesn't take much time. What takes time is iterating through all the smart phones and checking their resolution attributes.

    MATCH (s:Product {type:'SmartPhone'})-[r:has_attributes]->(o:Attributes {Resolution:'2000p'})
    return s, o limit 2

How would a similar query perform in Stardog? It would be great if someone could shed some light on this.

It's very hard to answer these types of questions in a general way because there is more than one way to model things in Neo4j. Neo4j is a property graph. The most performant way to model this in Neo4j would be to put a SmartPhone label on the Product node, give the node a Resolution property, and create an index for that property on that label - but don't take Stardog's advice on how to best model your data in Neo4j; you should reach out to their forum for that. Generally, it's less complicated in Stardog. There is no mechanism for indicating which properties are indexed and which are not, nor does it make a difference to query performance whether SmartPhone is a type or a property of the Product node. Stardog will optimize the query so that it performs well. Why not download an eval and try it?

Paul Jackson

Hi, Jackson:

I made SmartPhone a property for other reasons, and I do also have a SmartPhone node type in my graph. It's hard to describe my model in a few words.

Based on your comments, it seems that Stardog will automatically optimize any query without requiring the user to explicitly create indexes. Is that right?

Hi Martin,

Yes, that is correct.

Regards,

-Paul

Hi, Paul:

I can provide a few details on how I am using the query in Neo4j, so that hopefully you can better see how a similar situation would be dealt with in Stardog.

My typical query would look like this:

    public String productSku_attrValue_exact_match(String skuid, String attrValue) {

        String query = "MATCH (s:Product {sku_id:'" + skuid + "'})-[r]-> (o:ProductAttributes)" +
                " WHERE any(key in keys(o) WHERE o[key]= '" + attrValue + "') " +
                "return o";
        return query;
    } 

This is one of the functions that produce an actual Cypher query from parameters passed in by my frontend code. Because of the variables used, some indexes cannot be used effectively in Neo4j, and that may be what slows the query down.
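For comparison, here is a minimal sketch of the same function with the two values passed as Cypher parameters (`$skuid`, `$attrValue`) instead of being concatenated into the query text. The class and method names are hypothetical, and the sketch only builds the text and the parameter map; handing them to a Neo4j client is left out. This avoids quoting problems in the values, but it does not by itself make the `keys(o)` scan index-backed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of any Neo4j API.
public class ProductQueries {

    // Query text with placeholders; no user input is concatenated into it.
    public static final String SKU_ATTR_VALUE_EXACT_MATCH =
            "MATCH (s:Product {sku_id: $skuid})-[r]->(o:ProductAttributes) " +
            "WHERE any(key IN keys(o) WHERE o[key] = $attrValue) " +
            "RETURN o";

    // Builds the parameter map that accompanies the query text above.
    public static Map<String, Object> params(String skuid, String attrValue) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("skuid", skuid);
        p.put("attrValue", attrValue);
        return p;
    }

    public static void main(String[] args) {
        System.out.println(SKU_ATTR_VALUE_EXACT_MATCH);
        System.out.println(params("222", "3 in"));
    }
}
```

The query text stays constant across calls, so only the parameter map changes from one request to the next.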

Is this a common use case for Stardog users and how is this dealt with in Stardog?

Thanks.

Martin

It looks like a fairly straightforward query, so I'd say it's quite common. The way to deal with it in Stardog is to let Stardog deal with it. The query optimizer should try to arrange things so that they execute in the fastest way possible. Indexes aren't an issue, since basically everything is an index already. I would expect the query to perform quite well, but the best way to tell is to download a copy of Stardog and see for yourself. It shouldn't be too difficult to load a minimal data set that fits the query you've shared, and Stardog supports TinkerPop, so you could theoretically execute the query you've got.

Hi Martin,

My understanding is:

  • You have Products that have sku_id properties and relationships with ProductAttributes
  • There is more than one type of relationship that can connect Products to ProductAttributes
  • ProductAttributes has many different properties describing some aspect of a Product. For example, Length, Width, Depth could all be keys, and 3 inches could be the value of any of them.

In your query you want to supply the sku_id of a product and an attrValue for one of its attributes, find products with that sku that are connected to ProductAttributes nodes having any property value matching the supplied attrValue, and, if found, return those ProductAttributes nodes. For example, given the sku_id of an iPhone and 3 inches, the system could return a ProductAttributes node that has attributes {width=3, length=4.5}.

If this is the case, this is not a good way to model it in Neo4j. To find 3 inches quickly you need to put it somewhere Neo4j can find it quickly. One way to do that is to store each attribute in its own node with fixed properties for the attribute name and value (AttrName="Width", AttrValue="3 inches") and index on AttrValue. Call that an Attribute (singular) node. Tie the group of Attribute nodes to an Attributes node, which in turn connects to the product node. Neo4j should be able to do that quickly.
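A rough sketch of that model, written as Cypher statements held in Java constants so they can be passed to whatever client is in use. The relationship types `hasAttrs` and `hasAttr`, the index name, the direction of the relationships, and the sample values are assumptions made for illustration; the index statement uses the `CREATE INDEX ... FOR ... ON` form from Neo4j 4.x and later. Whether the planner actually starts from the AttrValue index depends on the data and the Neo4j version.

```java
// Hypothetical holder class for the Cypher statements; names are illustrative only.
public class AttributeModelSketch {

    // Index on the fixed value property shared by every Attribute node.
    public static final String CREATE_INDEX =
            "CREATE INDEX attribute_value FOR (a:Attribute) ON (a.AttrValue)";

    // One Attribute node per name/value pair, grouped under one Attributes node.
    public static final String CREATE_SAMPLE =
            "CREATE (p:Product {sku_id: '222'})-[:hasAttrs]->(g:Attributes), " +
            "(g)-[:hasAttr]->(:Attribute {AttrName: 'Width', AttrValue: '3 inches'}), " +
            "(g)-[:hasAttr]->(:Attribute {AttrName: 'Length', AttrValue: '4.5 inches'})";

    // Lookup by sku and value: the value is compared against a fixed property
    // name, so the attribute name no longer has to be resolved with keys(o).
    public static final String FIND_BY_VALUE =
            "MATCH (p:Product {sku_id: $skuid})-[:hasAttrs]->(g:Attributes)" +
            "-[:hasAttr]->(a:Attribute {AttrValue: $attrValue}) " +
            "RETURN g, a";

    public static void main(String[] args) {
        System.out.println(CREATE_INDEX);
        System.out.println(CREATE_SAMPLE);
        System.out.println(FIND_BY_VALUE);
    }
}
```

To restrict the match to a particular attribute, `AttrName: $attrName` could be added to the same property map in `FIND_BY_VALUE`.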

You can also do it efficiently in Stardog. The SPARQL query (in long form) would look something like this:

Select ?attrs ?attr ?attrName ?attrValue {
  # Find product with correct sku
  ?s a :Product .
  ?s :sku_id '222' .
  ?s :hasAttrs ?attrs .
  # Find the Attributes node that has a matching value (binds ?attrs)
  ?attrs :hasAttr ?attrTemp .
  ?attrTemp :attrValue '3 in' .
  # Find all attributes given ?attrs matched above
  ?attrs :hasAttr ?attr .
  ?attr :attrName ?attrName .
  ?attr :attrValue ?attrValue .
}

Here’s my test data:

Insert data {
    :P-1 a :Product .
    :P-1 :sku_id '111' .
    :P-2 a :Product .
    :P-2 :sku_id '222' ;
        :hasAttrs :Attrs-2a , :Attrs-2b .
    :Attrs-2a :hasAttr :Attr-2a-len , :Attr-2a-wid .
    :Attrs-2b :hasAttr :Attr-2b-len , :Attr-2b-wid .
    :Attr-2a-len :attrName 'Length' ;
        :attrValue '4.5 in' .
    :Attr-2a-wid :attrName 'Width' ;
        :attrValue '3 in' .
    :Attr-2b-len :attrName 'Height' ;
        :attrValue '4.5 in' .
    :Attr-2b-wid :attrName 'Width' ;
        :attrValue '3.5 in' .
}
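To connect this back to the Java function earlier in the thread, here is a minimal sketch that builds the same SPARQL pattern from a sku and an attribute value. The class and method names are hypothetical, and the `literal` helper is a hand-written escape for single-quoted SPARQL string literals; if the client library in use offers its own parameter binding, that would be the better choice. Like the query above, it relies on the default `:` namespace.

```java
// Hypothetical helper class; it only builds query text and calls no Stardog API.
public class SparqlQuerySketch {

    // Wraps a value in single quotes, escaping backslashes, quotes and line breaks.
    public static String literal(String value) {
        String escaped = value
                .replace("\\", "\\\\")
                .replace("'", "\\'")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
        return "'" + escaped + "'";
    }

    // Same graph pattern as the query above, with the two constants substituted.
    public static String productSkuAttrValueExactMatch(String skuid, String attrValue) {
        return String.join("\n",
                "SELECT ?attrs ?attr ?attrName ?attrValue {",
                "  ?s a :Product ;",
                "     :sku_id " + literal(skuid) + " ;",
                "     :hasAttrs ?attrs .",
                "  ?attrs :hasAttr ?attrTemp .",
                "  ?attrTemp :attrValue " + literal(attrValue) + " .",
                "  ?attrs :hasAttr ?attr .",
                "  ?attr :attrName ?attrName ;",
                "        :attrValue ?attrValue .",
                "}");
    }

    public static void main(String[] args) {
        System.out.println(productSkuAttrValueExactMatch("222", "3 in"));
    }
}
```

Called with "222" and "3 in" against the test data above, the pattern should bind ?attrs only to :Attrs-2a, since that is the only group under that sku containing an attribute whose value is '3 in'.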

Hi, Paul:

"One way to do that is to store each attribute in its own node with a fixed properties for the attribute names and values (AttrName=“Width”, AttrValue=“3 inches”) and index on AttrValue. Call that an Attribute (singular) node. Tie the group to an Attributes node which in turn connects to the product node. "

Do you mean this?
Let's say I have at least length, width and height attributes, so I could have 3 node types:
Length(attrName:xxx; attrValue:xxx)
Width(attrName:xxx; attrValue:xxx)
Height(attrName:xxx; attrValue:xxx)

If this is the case, the 3 should simply be collapsed into one node type, since they all share the same properties.

Then what do you mean by "Call that an Attribute (singular) node"? I could use multiple labels, so in addition to Length, Width and Height, I can add an Attribute label to each:
create (a:Length:Attribute …)
create (a:Width:Attribute …)
create (a:Height:Attribute …)

I think that even then I can't effectively use the index on the attribute nodes, because of the variables $attrKey and $attrValue, which have to be used in the WHERE clause in Neo4j. I think that's the fundamental problem.

I can create indexes on various properties of the ProductAttribute node in the current graph, and they speed things up a lot. For example, in this query the 'resolution' property is indexed and the index is used through 'USING INDEX':

    match (s:Product {type:'Phone'})-[r]->(o:Attributes {resolution:'2000'})
    USING INDEX o:Attributes(resolution)
    return s, o limit 2

However, since my query has to accept variables, it has to change to something like this:

    match (s:Product {type:'Phone'})-[r]->(o:Attributes)
    // USING INDEX o:Attributes(resolution)
    WHERE any(key in keys(o) WHERE key=$attrName AND o[key] contains $attrValue)
    return s, o limit 2

Now USING INDEX is invalid in the above query, and that slows it down significantly.

If Stardog can solve this problem in a straightforward way, that would be great. I am not really sure whether changing the graph model in Neo4j would improve things.

Stardog can do it. Did my sample SPARQL queries make sense?

For Neo4j, I was describing 3 nodes with the same type (Neo4j calls this a "Label"). You would create an AttrValue index on the Attribute label, if I understand your use case correctly.

-Paul
