Hi there. I have a database of events that have occurred, and each event has a datetime associated with it, and is in its own named graph. The following query runs hourly to build a report of what has happened in the last hour. The problem I'm having is that the more data in the database, the longer this query takes (for instance, it took 15 seconds this time). Is there a more efficient way to say, "Only bring back objects from the last hour"?
select distinct ?username where {
graph ?g {
?user a <tag:champtc:enterprise#Employee> .
?user <tag:champtc:enterprise#hasAccountName> ?username .
?t <tag:champtc:core#hasEventTime> ?time .
BIND( (NOW() - "P0DT1H"^^xsd:dayTimeDuration) AS ?beginTime ) .
FILTER( ?time >= ?beginTime )
}
} GROUP BY ?username
Thanks for the reply, Pavel. There are multiple connected objects in the graph that is being searched, and the object that has the time has an unknown type (it could be any number of things) and is not necessarily the same object that the user property is on. The query is trying to say, "Return the unique usernames of any named graphs that have a time within the last hour."
Even if I can make that part of the query more specific, is there a better way to do the "in the last hour" part without having to first select everything and then filter out the ones I don't want?
I hope you don't mind my jumping on your question but it's related to some things I've been thinking about lately. I don't think Stardog has any sort of indexing for datatype. Is that correct? I think you're going to have a similar problem filtering with dates that you have filtering on strings that is solved for the latter by full text search. I had asked a question a while ago about hooking into bulk loading because I had wanted to explore timeseries indexes and possibly see if you could use http://www.chronix.io/ for time.
Would it work better if you translated dateTime into epoc and do a numeric filter? I thought you might be able to do a xsd:integer(now()) but it looks like that doesn't work so you'd need to write a custom function.
Well, the "within the last hour" condition is tricky because Stardog does not maintain indexes over dates. There're, of course, tricks you can do but they'd require changes in modeling, e.g. storing the date and the hour for each time instances (as integer literals), and then using it in the query, e.g.
OK, would it be easy for you to try 5.3.4 and see how it performs? There's a query optimization issue which we recently fixed and which is relevant to your case (the top join should be on the graph variable and not a loop join).
Just to be clear: it won't completely solve your problem, i.e. query runtime will increase as the amount of data grows. But it should be better than now, especially if named graphs are relatively small compared to the size of the database.
Upgrading to 5.3.4 helped, and what really speeds it up is knowing that our named graphs are all named after the primary type of object they are, so I added a ?g a ... statement in there so it wasn't looking through all of the subjects.