From the Stardog mailing list. The original thread is on Google Groups (too many emails to copy over here).
Hi there,
I'm new to using Stardog and have been playing around with a very small number of triples (<100,000), yet I am still encountering problems with query performance.
Running this query to return all triples takes about 8 seconds: select * where {?s ?p ?o .}
Running more complex queries causes the server to time out.
I'm wondering whether the issue might be in my database settings, or perhaps in memory allocation. I have 4 GB of heap and 4 GB of non-heap memory allocated. When I reduce these values to 2 GB, I still have the same problem.
I am using reasoning type EL and sameAs reasoning "Full." Query All Graphs is set to "On." All index settings are set to the default.
Stardog server is running on an AWS instance with 7.5 GB of memory and 4 cores (c4.xlarge).
Another thing: I tried to generate a query plan to determine whether the problem was in my queries, not the software. However when I run "stardog-admin query list -u username" and enter my password in order to get the query ID, it returns "0 queries running," even when I have a query running which has been running for about 10 seconds (so before the timeout hits).
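For the baseline "select * where {?s ?p ?o .}" test mentioned above, a counting variant can help separate evaluation time from the time spent shipping every row back to the client. This is only a minimal sketch of that idea, not something suggested in the thread itself; the variable name ?triples is arbitrary.

```sparql
# Count the matching triples instead of returning each one.
# If this is fast while "select *" is slow, result transfer is the
# likely cost rather than query evaluation.
select (count(*) as ?triples)
where { ?s ?p ?o }
```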
First of all, let me thank you all for the amazing support on this problem that I’ve received over the last several days.
Over the last few days, my team and I have been able to improve Stardog’s performance by following a few steps.
We pared down the ontology file we were using to include only the axioms we actually needed to run the query.
We eliminated inefficient functions such as “regex,” replacing them with the SPARQL “contains” function.
This allowed us to run a few queries which had previously timed out, including the one which I posted previously. However, we are still having trouble running some of our more complicated queries.
For instance, the following query, which is meant to return all patient codes (CRIDs) for patients who have not taken any of a set of specific medications before their “Recruit Date,” times out when run on our database.
select (count(distinct ?crid) as ?count) where {
  ?crid a obib:CRID .
  ?medRec obib:hasPart ?crid .
  ?crid obib:denotes ?person .
  ?person a obib:homoSapien .
  ?medRec a obib:medicalRecord .
  minus {
    ?medRec obib:hasPart ?dataItem .
    ?dataItem a obib:dataItem .
    ?dataItem obib:hasPart ?medNameURI .
    ?medNameURI a carnival:medname .
    ?dataItem obib:hasPart ?orderDateURI .
    ?orderDateURI a obib:dateOfDataEntry .
    ?orderDateURI obib:hasSpecifiedValue ?orderDate .
    ?person obib:participatesIn ?formFilling .
    ?formFilling a obib:formFilling .
    ?recruitDateURI obib:isAbout ?formFilling .
    ?recruitDateURI a obib:dateOfDataEntry .
    ?recruitDateURI obib:hasSpecifiedValue ?recruitDate .
    ?medNameURI obib:hasSpecifiedValue ?medName .
    values ?medsfilter {"med1" "med2" "med3"}
    filter (contains(lcase(?medName), ?medsfilter))
    filter (?recruitDate - ?orderDate < "P0D"^^xsd:dayTimeDuration)
  }
}
Any ideas what could be causing the slowdown? It’s a relatively small number of triples and we’ve reduced the number of axioms to only the essential ones. At first I thought replacing the terms from terms.ttl with OBIB ontology IDs would help, but the query still timed out when I tried this approach.
From my experience with the ontology (no data), the ?person a obib:homoSapien . BGP was the bottleneck in the reasoner, presumably from the sheer number of axioms related to it.
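One way to check whether that single pattern is the bottleneck is to run it on its own and compare timings with reasoning enabled and disabled. The query below is a minimal sketch under assumptions: the prefix IRI is a placeholder (the thread never shows the real OBIB namespace), and ?n is an arbitrary name.

```sparql
# Placeholder namespace: substitute the OBIB prefix IRI your database uses.
prefix obib: <http://example.org/obib#>

# Isolate the suspect pattern from the larger query.
select (count(distinct ?person) as ?n)
where {
  ?person a obib:homoSapien .
}
```

If this alone is slow only when reasoning is on, the axioms attached to that class are a plausible culprit. If it is fast either way, the cost more likely sits in the joins or filters of the full query.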
Did you try the odd workaround I suggested in the previous post? If you try that and the query still times out, are you able to run a stardog query explain -r command to see the query plan?
I followed your steps and loaded the exported data into a new graph using the TriG format. I tested the new graph with "select * where {?s ?p ?o .}", which returned 0 results. I then switched on “Query All Graphs” and repeated the query, which returned all triples in the graph.
Next I ran the query which I posted above. Query still timed out after 1 minute.
Not sure why Query All Graphs needed to be turned on here. It seems like there should only be one graph, which is automatically set as the default for queries to run against?
Without "Query All Graphs," your SPARQL query will only be run on the "default" graph (i.e., triples that aren't in a named graph or otherwise have no context), unless you specify a graph in your query. More info about this can be found in the docs.
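To illustrate the difference, a query can reach named graphs explicitly with the standard SPARQL graph keyword, regardless of the "Query All Graphs" setting. This is a minimal sketch; the limit is only there to keep the output short.

```sparql
# Match triples stored in any named graph and report which graph
# each one lives in. A bare { ?s ?p ?o } pattern outside a graph
# block is evaluated against the default graph only.
select ?g ?s ?p ?o
where {
  graph ?g { ?s ?p ?o }
}
limit 10
```

To target a single graph, replace ?g with that graph's IRI in angle brackets. The ?g column in the results above shows which IRIs are actually present.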
Although the query times out, do you get any other output from stardog query explain -r after "The Query Plan:"?