Speed up sampling query?

I have a DB with 63 categories and more than 5 million entities that are classified by these categories. I tried to generate a list of every category with a single example entity for each one using this query:

select ?category (sample(?entity) as ?example) where {
   ?category a :Category .
   ?entity :categorizedBy ?category
} group by ?category

This query takes 10+ seconds; the query profile shows it scanning all 5+ million entries just to return 63 rows. It would be faster to run a primary query to get the list of categories and then issue 63 'LIMIT 1' queries for the examples. Is there a performant way to do this in a single query? It seems that nothing should prevent eager aggregation from short-circuiting most of that work.
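
For reference, the two-step workaround mentioned above could be sketched roughly as follows. This is only an illustration: `:SomeCategory` is a hypothetical placeholder for each category IRI returned by the first query, and the `:` prefix is assumed to be declared as in the original query.

```sparql
# Step 1: fetch the 63 categories (cheap, no entity scan).
SELECT ?category {
    ?category a :Category
}

# Step 2: run once per category returned by step 1.
# :SomeCategory is a hypothetical placeholder; substitute each category IRI in turn.
# LIMIT 1 lets the engine stop after the first matching entity.
SELECT ?example {
    ?example :categorizedBy :SomeCategory
}
LIMIT 1
```

The trade-off is 64 round trips to the server instead of one, in exchange for never scanning the full set of entities.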

Hi Boris,

Right, your suggested alternative yields the same desired output (provided the randomness of SAMPLE is not relevant). You should be able to do this using the Stored Query Service. The stored query would look something like this:

SELECT ?example {
    ?example :categorizedBy ?category
}
LIMIT 1

and the main query:

PREFIX sqs: <tag:stardog:api:sqs:>
SELECT ?category ?example {
    ?category a :Category .
    SERVICE <query://ExampleOfCategory> {
       [] sqs:var:category ?category
    }
}

However, note that you will (most likely) always get the same example instance per category, since the scans access the data in sorted order. If that's not an issue, then this could be a viable alternative.

Best regards
Lars

@Lars_Heling I actually tried this out, and unfortunately it doesn't work as I hoped it would (i.e., invoking the stored sub-query N times, once for each bound ?category). Instead, it's just a shortcut for not including the sub-query text, so it generates a single categorized item.
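
A minimal sketch of why this happens, assuming the service call behaves like the stored query inlined as a standard SPARQL subquery. Adding ?category to the subquery's projection is an assumption made here so that the join with the outer pattern is visible.

```sparql
SELECT ?category ?example {
    ?category a :Category .
    # Standard SPARQL evaluates a subquery on its own, before joining it
    # with the outer pattern, so LIMIT 1 applies to the whole subquery:
    # it returns one row in total, not one row per ?category.
    {
        SELECT ?category ?example {
            ?example :categorizedBy ?category
        }
        LIMIT 1
    }
}
```

Under that assumption the join can match at most one category, which is consistent with getting a single categorized item back.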

I think the main query should be using sqs:input (cf. Correlated Subqueries):

PREFIX sqs: <tag:stardog:api:sqs:>
SELECT ?category ?example {
    ?category a :Category .
    SERVICE <query://ExampleOfCategory> {
       [] sqs:input ?category
    }
}

Did you try this as well?

Yes @Lars_Heling, I tried this, but I am not satisfied with the result.
Next I am trying the approach described in "How To Sample Rows in SQL 273X Faster" (Sisense).