Would the performance be an issue in production?

martin · November 18, 2019, 7:03am

Some of the example queries in this tutorial seem to take a long time:
https://www.stardog.com/tutorials/getting-started-2/

For example:

SELECT ?actor ?name (count(?movie) as ?numMovies) 
WHERE { 
    ?actor :hasName ?name .
    ?actor :actedIn ?movie .
    }
GROUP BY ?actor ?name
ORDER BY DESC(?numMovies)

This query takes over 10k ms, and most of the path queries take over 1k ms. In a production system, that would be too slow to be useful. Ideally, a single query should take no more than a few ms; several hundred milliseconds is usually not tolerable.

Am I missing something in terms of performance?

stephen · November 18, 2019, 12:26pm

You can try restarting your Stardog server with more memory allocated (the defaults are somewhat low). Set the $STARDOG_SERVER_JAVA_ARGS variable to something like -Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g (or higher) depending on the memory limitations of your machine, and you should see increased performance.

martin · November 18, 2019, 9:00pm

Is there a configuration file under the root directory of the installation? Where do I set $STARDOG_SERVER_JAVA_ARGS? Thanks.

stephen · November 19, 2019, 12:34pm

That is set as an environment variable in the shell in which Stardog is started. If Stardog is running under systemd, you would set it in /etc/stardog.env.sh.
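For the plain shell case, a minimal sketch might look like the following. It assumes the `stardog-admin` CLI from the Stardog `bin` directory is on your PATH, and it uses the memory sizes suggested earlier in this thread, which you should adjust to your machine. The restart is skipped if the CLI is not found.

```shell
# Set the JVM options in the same shell that will launch the server.
# Sizes are the ones suggested earlier in the thread; tune for your machine.
export STARDOG_SERVER_JAVA_ARGS="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"

# Confirm the variable is visible in this shell before starting the server.
echo "$STARDOG_SERVER_JAVA_ARGS"

# Restart so the new options are picked up (assumes stardog-admin is on PATH).
if command -v stardog-admin >/dev/null 2>&1; then
    stardog-admin server stop
    stardog-admin server start
fi
```

A server that is already running keeps the options it was started with, so the restart has to happen from a shell where the variable is set.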

martin · November 19, 2019, 7:35pm

I added the following to my ~/.bashrc on Ubuntu and it doesn't seem to take effect:

export STARDOG_JAVA_ARGS="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"

zachary.whitley · November 19, 2019, 7:39pm

Is it set in the shell that you're running stardog from?

echo $STARDOG_JAVA_ARGS

and did you restart Stardog? You probably want to be setting STARDOG_SERVER_JAVA_ARGS and not STARDOG_JAVA_ARGS.

martin · November 19, 2019, 8:21pm

Yes. I changed it to STARDOG_SERVER_JAVA_ARGS and restarted the server. With the movie data, this query's time dropped from over 10k ms to over 6k ms:

CONSTRUCT {
    ?domain ?prop ?range
}
WHERE {
    ?subject ?prop ?object .
    ?subject a ?domain .
    optional {
        ?object a ?oClass .
    }
    bind(if(bound(?oClass), ?oClass, datatype(?object)) as ?range)
    filter (?prop != rdf:type && ?prop != rdfs:domain && ?prop != rdfs:range)
}

But the following query still takes over 10k ms to complete.

PATHS 
    START ?x {?x :hasName "Kevin Bacon"} 
    END ?y {?y :hasName "Nick Offerman"}
VIA {
    ?movie a :Movie ;
      :hasTitle ?title ;
      :hasYear ?year .
    ?x :actedIn ?movie ;
        :hasName ?xName .
    ?y :actedIn ?movie ;
        :hasName ?yName .
    FILTER (?year >= 2010)
} LIMIT 1

Both queries seem slow. My Ubuntu machine has 64G of memory and 8 cores. Is this performance expected? Could you test them on your computer? Thanks.

I am testing in Stardog Studio. After I restart the server from the command line and relaunch Studio, is there anything special I need to do to reconfigure Studio to work with the newly started database? I just relaunch Studio and click 'connect' to reconnect to the database.

zachary.whitley · November 19, 2019, 8:29pm

I'm seeing 27s and 21s for the two queries with the same 4G/8G memory allocation you're running with.

You can look over the query plan to see what's happening.
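Outside of Studio, the plan can also be printed from the command line with `stardog query explain`. The sketch below assumes the tutorial data is loaded in a database called `movies` (a hypothetical name, substitute your own) and that the `:` prefix is registered in that database's namespaces.

```shell
# Hypothetical database name; replace with the one holding the tutorial data.
DB="movies"

# The grouping query from the first post, kept in a variable for reuse.
QUERY='SELECT ?actor ?name (count(?movie) as ?numMovies)
WHERE {
    ?actor :hasName ?name .
    ?actor :actedIn ?movie .
}
GROUP BY ?actor ?name
ORDER BY DESC(?numMovies)'

# Print the query plan only if the Stardog CLI is available on PATH.
if command -v stardog >/dev/null 2>&1; then
    stardog query explain "$DB" "$QUERY"
fi
```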

martin · November 19, 2019, 9:22pm

Reduced [#3.8M]
`─ Projection(?domain AS ?subject, ?prop AS ?predicate, ?range AS ?object, <tag:stardog:api:context:default> AS ?context) [#3.8M]
   `─ Bind(IF(Bound(?oClass), ?oClass, Datatype(?object)) AS ?range) [#3.8M]
      `─ HashJoinOuter(?object) [#3.8M]
         +─ MergeJoin(?subject) [#3.8M]
         │  +─ Scan[PSOC](?subject, rdf:type, ?domain) [#921K]
         │  `─ Filter((?prop != rdf:type && (?prop != rdfs:domain && ?prop != rdfs:range))) [#3.8M]
         │     `─ Scan[SPO](?subject, ?prop, ?object) [#3.8M]
         `─ Scan[PSOC](?object, rdf:type, ?oClass) [#921K]

This is the query plan for the CONSTRUCT query. HashJoinOuter was highlighted in red in the plan panel.

pavel · November 20, 2019, 10:42am

Hi Martin,

I'd say that the SELECT and the CONSTRUCT queries probably can't run <1 sec because they process a lot of data (pretty much all of the database). The SELECT query also returns >500K results so there's a noticeable ORDER BY and HTTP overhead, too. But the PATHS query I'd expect to be faster and we'll take a look at what's going on. One thing to notice there is that the problem is related to querying for attributes of intermediate nodes in paths, i.e. this reduced version:

PATHS 
    START ?x {?x :hasName "Kevin Bacon"} 
    END ?y {?y :hasName "Nick Offerman"}
VIA {
    ?movie a :Movie ;
      :hasYear ?year .
    ?x :actedIn ?movie .
    ?y :actedIn ?movie .
    FILTER (?year >= 2010)
}
LIMIT 1

completes in <1s for me.

Thanks for the report,
Pavel

martin · November 20, 2019, 6:24pm

Hi, Pavel:

I know 1ms is too demanding. My problem seems to be the fact that I can't get the performance you and Zachary get by running the two queries. Zachary said he got 27ms and 21ms for the two queries above as shown. And for your reduced version, I still get 2925 ms (see the screenshot). I did set the variable as below:
export STARDOG_HOME="/home/martin/stardog-7.0.3"
export PATH="/home/martin/stardog-7.0.3/bin:$PATH"
export PATH="/opt/gradle-6.0.1/bin:$PATH"
export STARDOG_SERVER_JAVA_ARGS="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"

$ echo $STARDOG_SERVER_JAVA_ARGS
-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g

I am working on Kubuntu 18.04, with 64G RAM and 8 cores.

pavel · November 20, 2019, 6:32pm

Well, Zach said 21s, not 21ms. It's about half that for me, but in the same ballpark. 3s for one path on your machine is definitely too much compared to 500ms on my OSX laptop (16G RAM). Is Stardog running locally for you? Does the time change if you run the query multiple times?
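One rough way to check whether the first run is an outlier is to time the same command several times in a row. The helper below is only a sketch: `run_timed` is a made-up name, and it relies on GNU `date` for nanosecond timestamps, so it works on Linux but not on stock macOS. It measures wall-clock time for the whole command, including CLI startup, which is not the same as the server-side query time that Studio reports.

```shell
# run_timed N CMD...: run CMD N times and print the elapsed wall-clock
# time of each run in milliseconds. Requires GNU date (%N = nanoseconds).
run_timed() {
    local n="$1"; shift
    local i start end
    for i in $(seq 1 "$n"); do
        start=$(date +%s%N)
        "$@" >/dev/null
        end=$(date +%s%N)
        echo "run $i: $(( (end - start) / 1000000 )) ms"
    done
}

# Example usage (hypothetical database name and query file):
# run_timed 5 stardog query execute movies "$(cat paths-query.rq)"
```

If later runs are much faster than the first, the slow number probably reflects a cold start and not the steady-state cost of the query.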

pavel · November 20, 2019, 6:35pm

Oh, also I have an SSD disk which is quite important. If your home is on an HDD it could affect the results.

martin · November 20, 2019, 6:45pm

I restarted the Studio and the reduced version takes 389ms. I am running stardog locally. Is this normal? But you still get 1s, much lower.

pavel · November 21, 2019, 9:24am

Again, I think there's some confusion re: units here. 389ms is consistent with my experience of <1s, that is, 1000ms. I normally get the path back in about 500ms.

As I said, we created a ticket to look into performance for the full version of this query. Generally querying for properties of intermediate nodes in paths comes at a cost when many paths go through the same nodes (so not just the nodes are repeated but their associated properties too). But it doesn't seem to be the case here since there's only a single path.

Cheers,
Pavel

martin · November 21, 2019, 5:47pm

Hi, Pavel:

Based on your customers' feedback, is it a real problem when a single query takes about 300-500 ms in a production system? If the KG is used as part of an analysis tool, it may be OK for an analyst to wait several hundred milliseconds or even a few seconds for a query to complete. But if the KG is used to support a real-time system, e.g. a chatbot interacting with many customers, would that be too slow? (In such a case, assume that the KG query time alone should be limited to < 100ms.) It would be great if you could share some information on this based on your customers' experience and feedback.

pavel · November 21, 2019, 5:58pm

Well, I don't think people would use path queries to provide data for real-time tasks, e.g. for UI stuff. Searching for paths is naturally seen as more of an analytic task which requires possibly deep graph traversals (unless you constrain it in some specific ways, e.g. with the MAX LENGTH keyword). We have customers who use Stardog to power UIs, but those queries are typically more like finding properties/connections of specific nodes in the graph, i.e. low latency requires selective patterns. And then there's typically stuff between a database and the UI layer, e.g. caches, to ensure responsiveness.

Cheers,
Pavel
