Query performance issues

From the Stardog mailing list. The original thread is at https://groups.google.com/a/clarkparsia.com/forum/?hl=en#!topic/stardog/JCvzubSWCl4 (too many emails to copy over here).


Hi there,

I’m new to using Stardog and have been playing around with a very small number of triples (<100,000), yet I am still encountering problems with query performance.
Running this query to return all triples takes about 8 seconds: select * where {?s ?p ?o .}
Running more complex queries causes the server to time out.
I’m wondering whether the issue might be in my database settings, or perhaps in memory allocation. I have 4 GB of heap and 4 GB of non-heap memory allocated. When I reduce these values to 2 GB, I still have the same problem.
I am using reasoning type EL and sameAs reasoning “Full.” Query All Graphs is set to “On.” All index settings are set to the default.
Stardog server is running on an AWS instance with 7.5 GB of memory and 4 cores (c4.xlarge).

Another thing: I tried to generate a query plan to determine whether the problem was in my queries rather than the software. However, when I run “stardog-admin query list -u username” and enter my password in order to get the query ID, it returns “0 queries running,” even when I have a query that has been running for about 10 seconds (so before the timeout hits).

Hi,

This may sound a little strange, but if you try this, do you get the same result, or does your query finish?

  1. Do a stardog data export of your database (the one that contains the obib ontology, the terms.ttl, and the data).
  2. Create a new database from that exported RDF file: stardog-admin db create -n myNewAwesomeDb myExport.ttl
  3. Try running the query (with reasoning) on that new database.

First of all, let me thank you all for the amazing support on this problem that I’ve received over the last several days.

Over the last few days, my team and I have been able to improve Stardog’s performance by following a few steps.

  1. We pared down the ontology file we were using to include only the axioms we actually needed to run the query.
  2. We eliminated inefficient functions such as “regex,” replacing them with the “contains” string function.
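
As a rough illustration of the substitution described in step 2, here is a minimal Python sketch (not Stardog code) showing that a case-insensitive regex search for a literal string and a lowercase substring test agree on simple ASCII inputs. The sample strings are made up, and “med1” is only the placeholder name used in this thread.

```python
import re

def matches_with_regex(value: str, needle: str) -> bool:
    # Case-insensitive regex search; the needle is escaped so it is matched literally.
    return re.search(re.escape(needle), value, flags=re.IGNORECASE) is not None

def matches_with_contains(value: str, needle: str) -> bool:
    # Plain substring test on lowercased text, analogous to contains(lcase(?x), "needle").
    return needle.lower() in value.lower()

# Hypothetical medication strings, for illustration only.
samples = ["Med1 10mg tablet", "MED2 oral", "placebo"]
for sample in samples:
    assert matches_with_regex(sample, "med1") == matches_with_contains(sample, "med1")
```

The substring test does not need to compile or run a pattern, which is presumably why it is cheaper when only a literal match is needed.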

This allowed us to run a few queries which had previously timed out, including the one which I posted previously. However, we are still having trouble running some of our more complicated queries.

For instance, the following query times out when run on our database. It is meant to return the patient codes (CRIDs) of people who have not taken any of a set of specific medications before their “Recruit Date.”

select (count(distinct ?crid) as ?count) where {
?crid a obib:CRID .
?medRec obib:hasPart ?crid .
?crid obib:denotes ?person .
?person a obib:homoSapien .
?medRec a obib:medicalRecord .
minus {
?medRec obib:hasPart ?dataItem .
?dataItem a obib:dataItem .
?dataItem obib:hasPart ?medNameURI .
?medNameURI a carnival:medname .
?dataItem obib:hasPart ?orderDateURI .
?orderDateURI a obib:dateOfDataEntry .
?orderDateURI obib:hasSpecifiedValue ?orderDate .
?person obib:participatesIn ?formFilling .
?formFilling a obib:formFilling .
?recruitDateURI obib:isAbout ?formFilling .
?recruitDateURI a obib:dateOfDataEntry .
?recruitDateURI obib:hasSpecifiedValue ?recruitDate .
?medNameURI obib:hasSpecifiedValue ?medName .
values ?medsfilter {"med1" "med2" "med3"}
filter (contains(lcase(?medName), ?medsfilter))
filter (?recruitDate-?orderDate < "P0D"^^xsd:dayTimeDuration)
}}
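
For anyone checking the last filter in the query: taken literally, “?recruitDate - ?orderDate” is less than a zero duration exactly when the order date is later than the recruit date. A minimal Python sketch of the same arithmetic, using made-up dates and a hypothetical function name, assuming the two values are plain calendar dates:

```python
from datetime import date, timedelta

def difference_is_negative(recruit_date: date, order_date: date) -> bool:
    # Mirrors: filter (?recruitDate - ?orderDate < "P0D"^^xsd:dayTimeDuration)
    return recruit_date - order_date < timedelta(0)

recruit = date(2015, 6, 1)  # hypothetical recruit date

# Order placed after recruitment: the difference is negative, so the filter passes.
assert difference_is_negative(recruit, date(2015, 7, 1))
# Order placed before recruitment, or on the same day: the filter does not pass.
assert not difference_is_negative(recruit, date(2015, 5, 1))
assert not difference_is_negative(recruit, recruit)
```

It may be worth comparing this direction against the intended “before their Recruit Date” condition.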

Any ideas what could be causing the slowdown? It’s a relatively small number of triples and we’ve reduced the number of axioms to only the essential ones. At first I thought replacing the terms from terms.ttl with OBIB ontology IDs would help, but I still got the timeout when I tried this approach.

Hi,

From my experience with the ontology (no data), the ?person a obib:homoSapien . BGP was the bottleneck in the reasoner, presumably from the sheer number of axioms related to it.

Did you try the odd workaround I suggested in the previous post? If you try that and the query still times out, are you able to run a stardog query explain -r command to see the query plan?

Trying your workaround suggestion: When I try to export using the command: stardog data export database_name db_data.ttl -g ALL -u admin

I get the error: Cannot export as Turtle, it does not support contexts but the database contains one or more named graphs

I also attempted to specify RDF/XML and N3 output formats, and received a similar error message.

On a side note, all of my data is in named graphs: is your workaround getting at the idea that this structure is inefficient?

Apologies, I am just used to working with Turtle format, which doesn’t support named graphs.

Try exporting the data in N-Quads (.nq) or TriG (.trig) format.

I followed your steps and entered the exported data into a new graph using the TriG format. I tested the new graph with “select * where {?s ?p ?o .}” and it returned 0 results. I switched on “Query All Graphs” and repeated the query, which then showed me all triples in the graph.

Next I ran the query which I posted above. Query still timed out after 1 minute.

Not sure why Query All Graphs needed to be turned on here. It seems like there should only be one graph, which is automatically set as the default for queries to run against?

Without “Query All Graphs,” your SPARQL query will only be run on the “default” graph (i.e., triples that aren’t in a named graph or otherwise have no context), unless you specify a graph in your query. More info about this can be found in the docs.
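
The behavior described here can be pictured with a toy model in Python. This is only a sketch of the idea, not how Stardog stores or queries data; the graph names, subjects, and the query_all_graphs parameter are hypothetical.

```python
DEFAULT_GRAPH = None  # marker for triples that are not in any named graph

# Every statement sits in a named graph, as in the database discussed in this thread.
quads = [
    ("ex:crid1", "rdf:type", "obib:CRID", "ex:graphA"),
    ("ex:crid2", "rdf:type", "obib:CRID", "ex:graphB"),
]

def select_all(quads, query_all_graphs=False):
    # Toy version of: select * where {?s ?p ?o .}
    # With the flag off, only triples in the default graph are visible.
    return [
        (s, p, o)
        for (s, p, o, graph) in quads
        if query_all_graphs or graph is DEFAULT_GRAPH
    ]

assert len(select_all(quads)) == 0                         # flag off: nothing matches
assert len(select_all(quads, query_all_graphs=True)) == 2  # flag on: all triples match
```

Since an export in TriG or N-Quads keeps the graph names, the re-created database has an empty default graph as well, which is consistent with the 0 results reported above.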

Even though the query times out, do you get any other output from stardog query explain -r after “The Query Plan:”?
