I'm having performance issues with queries that yield many results. My database contains 11 million triples in two named graphs. All queries against the larger graph, which contains 7 million triples, take 400 milliseconds or more when limited to 1000 results in Stardog Studio, and several seconds without the limit in my application.
My use case is a semantic web portal that queries a SPARQL endpoint and loads all instances in a dataset into a map. A demo website of the framework I'm using, SampoUI, achieves acceptable performance using Fuseki instead of Stardog, on a dataset that is larger than mine: https://sampo-ui.demo.seco.cs.aalto.fi/en/perspective3/faceted-search/map
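Roughly speaking, the portal issues one SPARQL query for all coordinates and plots them on the map. The sketch below is not the actual SampoUI code; the endpoint URL and the wgs84 coordinate properties are placeholders, but it shows the shape of the request:

```typescript
// Minimal sketch of how the portal loads coordinates into the map.
// The endpoint URL and the wgs84 property names are placeholders, not the real configuration.
const ENDPOINT = "https://example.org/stedsnavn/query"; // hypothetical SPARQL endpoint

interface Point { lat: number; long: number; }

async function loadAllCoordinates(): Promise<Point[]> {
  const query = `
    SELECT ?lat ?long FROM <http://data.stadnamn.uib.no/stedsnavn/rygh> WHERE {
      ?place <http://www.w3.org/2003/01/geo/wgs84_pos#lat>  ?lat ;
             <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
    }`;
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      "Accept": "application/sparql-results+json",
    },
    body: query,
  });
  const json = await res.json();
  // SPARQL JSON results: each binding is { lat: { value }, long: { value } }
  return json.results.bindings.map((b: any) => ({
    lat: Number(b.lat.value),
    long: Number(b.long.value),
  }));
}
```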
Can you share a couple of queries with poor performance? You can use the query profiler (from the CLI, from your application, or directly in Studio with a larger LIMIT) and share its output with us.
I thought it was an issue with any SPARQL query, even the most basic ones:
SELECT * FROM <http://data.stadnamn.uib.no/stedsnavn/rygh> WHERE { ?s ?p ?o }
When querying the same data in Fuseki, however, I found that Stardog performs better than Fuseki as the limit increases. I suspect the Fuseki server behind the demo website I linked to caches the results of the query that returns more than 300 000 coordinates for the cluster map. Is there some way to accomplish this with Stardog?
Fuseki does, however, perform better for queries that return few results: with LIMIT 100 added to the query above, it takes 29 ms in Fuseki and 390 ms in Stardog.
In my application, I would like the following query to perform well when getting 180 000 coordinates:
> When querying the same data in Fuseki, however, I found that Stardog performs better than Fuseki as the limit increases. I suspect the Fuseki server behind the demo website I linked to caches the results of the query that returns more than 300 000 coordinates for the cluster map. Is there some way to accomplish this with Stardog?
No, Stardog doesn't provide any functionality for caching query results. I imagine it can be set up externally with something like Memcached.
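If an external cache is an option for you, a thin layer in front of the SPARQL endpoint could look roughly like this. This is a minimal sketch assuming a Node/TypeScript backend; the in-memory Map stands in for Memcached or Redis, and the endpoint URL is a placeholder:

```typescript
// Minimal sketch of caching SPARQL query results outside Stardog.
// The in-memory Map stands in for an external cache such as Memcached or Redis;
// the endpoint URL is a placeholder.
const ENDPOINT = "https://example.org/stedsnavn/query"; // hypothetical endpoint
const TTL_MS = 10 * 60 * 1000; // keep cached results for 10 minutes

const cache = new Map<string, { expires: number; body: string }>();

async function cachedSelect(query: string): Promise<string> {
  const hit = cache.get(query);
  if (hit && hit.expires > Date.now()) {
    return hit.body; // served from the cache, no round trip to Stardog
  }
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      "Accept": "application/sparql-results+json",
    },
    body: query,
  });
  const body = await res.text();
  cache.set(query, { expires: Date.now() + TTL_MS, body });
  return body;
}
```

Since the coordinate query for the map is presumably the same for every visitor, even a coarse cache like this would mean only the first request pays the full query cost.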
> Fuseki does, however, perform better for queries that return few results: with LIMIT 100 added to the query above, it takes 29 ms in Fuseki and 390 ms in Stardog.
390 ms definitely sounds like a lot for reading 100 triples. How exactly did you measure it? Did you warm up the system or average over multiple runs? Did you measure on the client side (e.g. in Studio) or use the profiler to measure on the server side?
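To rule out noise, something along these lines on the client side would give more comparable numbers. This is a minimal sketch: the endpoint URL is a placeholder and the warm-up/iteration counts are arbitrary:

```typescript
// Minimal sketch of client-side timing with warm-up runs and averaging.
// The endpoint URL is a placeholder; warm-up and iteration counts are arbitrary.
const ENDPOINT = "https://example.org/stedsnavn/query"; // hypothetical endpoint

async function runOnce(query: string): Promise<number> {
  const start = performance.now();
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/sparql-query",
      "Accept": "application/sparql-results+json",
    },
    body: query,
  });
  await res.text(); // include result transfer in the measurement
  return performance.now() - start;
}

async function benchmark(query: string, warmups = 3, runs = 10): Promise<number> {
  for (let i = 0; i < warmups; i++) await runOnce(query); // warm up caches
  const times: number[] = [];
  for (let i = 0; i < runs; i++) times.push(await runOnce(query));
  return times.reduce((a, b) => a + b, 0) / times.length; // average wall-clock time in ms
}
```

Comparing that average against the profiler's server-side execution time separates query evaluation from network and result transfer.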
With a limit of 200 000, the query has taken between 5 and 12 seconds to run in Stardog Studio.
First of all, 5 to 12 seconds is a very large variance, and we need to establish its source. I suggest running this query several times with query explain --profile (or the built-in Studio profiler) and sharing all of the outputs with us. You can also access the profiler through the Java API from your application. That will tell us what is happening on the server side. At this point, there's not enough evidence that this issue has much in common with the 390 ms it took to run the simple query.
Well "normal" depends on environment, particularly hardware. I don't see any particular problems in the query plan, other than spending ~25% of execution time on sorting data (which is a CPU-bound activity).
For now, let's get back to the time variance on the client side. Do I understand correctly that when you use the CLI and the profiler, the server-side execution time is consistently around 2.5 s, but when you measure in Studio, i.e. on the client, it varies between 5 and 12 s?
In Stardog Studio it is now consistently around 5s. I had a few execution times above 10s yesterday, before it dropped to 5s. Unfortunately I didn't look at the query profiler when it happened.
If you run a traceroute from your current location to Stardog Cloud, I believe you'll find either a few significantly slow network hops along the way or a very long series of hops, since Stardog Cloud is currently hosted on US West cloud servers. We expect to offer something more local for our EU friends in the not too distant future.
If there is demand on your side for a potential enterprise deal, enterprise prospects may be approved for a trial license that would let you run the Stardog server locally. Otherwise, Stardog Cloud typically provides a good customer journey from learning on the free tier, to prototyping on Essentials, to larger enterprise environments later on. Note that the front-end tools you are using in Stardog Cloud are all single-page applications that run in your browser, so running the server locally would keep the traffic local instead of sending it around the world.
A very low-cost, short-term solution may be to try a VPN client on your machine and see whether any of the VPNs offer a better series of hops.