SELECT ?actor ?name (COUNT(?movie) AS ?numMovies)
WHERE {
  ?actor :hasName ?name .
  ?actor :actedIn ?movie .
}
GROUP BY ?actor ?name
ORDER BY DESC(?numMovies)
This query takes over 10,000 ms, and most of the path queries take over 1,000 ms. In a production system, that would be too slow to be useful. Ideally, a single query should take no more than a few ms. Even several hundred milliseconds is usually not tolerable.
You can try restarting your Stardog server with more memory allocated (the defaults are somewhat low). Set the $STARDOG_SERVER_JAVA_ARGS variable to something like -Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g (or higher) depending on the memory limitations of your machine, and you should see increased performance.
That is set as an environment variable in the shell in which Stardog is started. If Stardog is running under systemd, you would set it in /etc/stardog.env.sh.
Both queries seem slow. My Ubuntu machine has 64G of memory and 8 cores. Is this performance expected? How long do the queries take on your computer? Thanks.
I am testing in Stardog Studio. After I restarted the server from the command line and relaunched Studio, is there anything special I need to do to reconfigure Studio to work with the newly started server? I just relaunched Studio and clicked 'connect' to reconnect to the database.
I'd say that the SELECT and the CONSTRUCT queries probably can't run <1 sec because they process a lot of data (pretty much all of the database). The SELECT query also returns >500K results so there's a noticeable ORDER BY and HTTP overhead, too. But the PATHS query I'd expect to be faster and we'll take a look at what's going on. One thing to notice there is that the problem is related to querying for attributes of intermediate nodes in paths, i.e. this reduced version:
I know 1ms is too demanding. My problem is that I can't reproduce the performance you and Zachary get when running the two queries. Zachary said he got 27ms and 21ms for the two queries shown above. And for your reduced version, I still get 2925 ms (see the screenshot). I set the variable as below:
export STARDOG_HOME="/home/martin/stardog-7.0.3"
export PATH="/home/martin/stardog-7.0.3/bin:$PATH"
export PATH="/opt/gradle-6.0.1/bin:$PATH"
export STARDOG_SERVER_JAVA_ARGS="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"
Well, Zach said 21s, not 21ms. It takes about half that for me, but that's in the same ballpark. 3s for one path on your machine is definitely too much compared to 500ms on my OSX laptop (16G RAM). Is Stardog running locally for you? Does the time change if you run the query multiple times?
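To check whether the time changes across runs, something like the following Python sketch could help separate the first (possibly cold) run from later warm runs. It is only an illustration: `run_query` is a hypothetical callable that you would supply yourself (for example, a function that sends the query to your server), and nothing here is Stardog-specific.

```python
import statistics
import time
from typing import Callable, List


def time_runs(run_query: Callable[[], object], runs: int = 5) -> List[float]:
    """Call run_query `runs` times and return each wall-clock time in ms."""
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # hypothetical: whatever executes the query end to end
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarize(timings_ms: List[float]) -> dict:
    """Report the first run separately from the median of the later runs."""
    warm = timings_ms[1:] or timings_ms  # fall back if there was only one run
    return {
        "first_ms": timings_ms[0],
        "warm_median_ms": statistics.median(warm),
    }
```

If the first run is much slower than the warm median, the gap is likely caching or warm-up rather than the query plan itself. If all runs are similarly slow, the problem is more likely in the query or the server configuration.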
Again, I think there's some confusion re: units here. 389ms is consistent with my experience of <1s, that is, 1000ms. I normally get the path back in about 500ms.
As I said, we created a ticket to look into performance for the full version of this query. Generally, querying for properties of intermediate nodes in paths comes at a cost when many paths go through the same nodes (so not only the nodes but also their associated properties are repeated). But that doesn't seem to be the case here since there's only a single path.
Based on your current customer feedback, is it a real problem when a single query takes about 300-500 ms in a production system? If the KG is used as part of an analysis tool, it may be OK for an analyst to wait several hundred milliseconds or even a few seconds for a query to complete. But if the KG is used to support a real-time system, e.g. a chatbot interacting with many customers, would that be too slow? (In such a case, assume that the KG query time alone should be limited to < 100ms.) It would be great if you could share some information on this based on your current customer experience and feedback.
Well, I don't think people would use path queries to provide data for real-time tasks, e.g. for UI stuff. Searching for paths is naturally seen as more of an analytic task which may require deep graph traversals (unless you constrain it in some specific way, e.g. with the MAX LENGTH keyword). We have customers who use Stardog to power UIs, but those queries are typically more like finding properties/connections of specific nodes in the graph, i.e. low latency requires selective patterns. And then there's typically stuff between the database and the UI layer, e.g. caches, to ensure responsiveness.
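As a rough illustration of the kind of cache that can sit between a database and a UI layer, here is a minimal Python sketch of an in-memory cache with a time-to-live. It is an assumption about how such a layer might look, not a description of any customer's setup: `TTLQueryCache` and the `execute` function are hypothetical names, and `execute` stands in for whatever actually runs the query.

```python
import time
from typing import Callable, Dict, Tuple


class TTLQueryCache:
    """Tiny in-memory cache keyed by query text, with a time-to-live."""

    def __init__(self, execute: Callable[[str], object],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._execute = execute  # hypothetical function that runs the query
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, query: str) -> object:
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # fresh entry: skip the database entirely
        result = self._execute(query)
        self._entries[query] = (now, result)
        return result
```

Repeated requests for the same query within the TTL are served from memory, so only the first request pays the database latency. The trade-off is that results can be stale for up to `ttl_seconds`, which may or may not be acceptable depending on how often the graph changes.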