As basic as it may sound, you cannot do it in pure SPARQL with a single query. A LIMIT requires a subquery, but subqueries in SPARQL are uncorrelated, i.e. they're executed once, while you need the (say, authors) subquery to execute once per blog post (so you get 2 authors per post, not 2 authors in total).
Again, you need correlated subqueries, which in Stardog means the Stored Query Service (SQS). Something like:
select * {
  { select * { ?post a :Post } # some pattern to select posts, e.g. by date range, etc.
    limit 3 }
  service <query://authors> { [] sqs:input ?post ; sqs:vars ?author } # select 2 authors max per post
  service <query://topics> { [] sqs:input ?author ; sqs:vars ?topic } # select 1 topic per author
}
I haven't tried it, so the syntax could be slightly off, but hopefully you get the idea.
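To make the uncorrelated vs. correlated distinction concrete outside SPARQL, here is a minimal Python sketch. It is only an illustration of the evaluation semantics, not Stardog code; the data and the function names (`uncorrelated`, `correlated`) are made up for the example.

```python
from itertools import islice

# Hypothetical sample data: post -> authors (names are illustrative only).
AUTHORS_BY_POST = {
    "post1": ["alice", "bob", "carol"],
    "post2": ["dave"],
    "post3": ["erin", "frank", "grace"],
}


def uncorrelated(posts, limit):
    """Evaluate the author subquery once, then join: `limit` authors in total."""
    all_pairs = [(p, a) for p, authors in AUTHORS_BY_POST.items() for a in authors]
    limited = list(islice(all_pairs, limit))  # LIMIT applies to the whole subquery result
    return [(p, a) for (p, a) in limited if p in posts]


def correlated(posts, limit):
    """Evaluate the author subquery once per post: up to `limit` authors per post."""
    return [(p, a) for p in posts for a in islice(AUTHORS_BY_POST.get(p, []), limit)]


posts = ["post1", "post2", "post3"]
print(uncorrelated(posts, 2))  # only two rows overall, both for post1
print(correlated(posts, 2))    # up to two rows for each of the three posts
```

The second function is what the SERVICE calls above are meant to achieve: the inner query runs again for every ?post binding.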
Yeah, it's an old post which we wrote at the time when we first realised that supporting path queries as subqueries (in, say, SELECT queries) was going to be difficult because their results do not quite fit into the SPARQL notion of a "solution" (a fixed-size set of variable bindings). That was before SQS. SQS works a bit differently than was envisioned in the post: the solution is still fixed-size but the actual values can be arrays. If you execute path (sub)queries and project ?path, you may see those arrays in their internal representation. There are some functions to handle them: str, stardog:length, stardog:any, and stardog:all.
Yes, the SPARQL 1.2 issue is exactly there to add support for correlated subqueries (actually, "lateral" as in Postgres would be more accurate because they're not limited to a single column value). As you discovered, its use cases can be very basic. It'd be a pretty major extension though, so we decided not to wait and instead added support to SQS. Supporting correlated subqueries without service (i.e. beyond SQS) would require extensions to SPARQL syntax, and we decided to avoid that for the time being.
Hmm, I wonder if I can make this generic enough to work for any combination of classes and properties without having to predefine Stored Queries?
It also needs to tap into the language fallback functionality:
So a complete, more generic example would be:
The class :Country has properties :name, :headOfState.
The class :Person has properties :name, :birthDate.
We can determine a priori that :name is an rdf:langString and want to use that information to apply a fallback.
We want to get the top 5 Countries ranked by population and we want to get 3 Heads of State ranked by birthDate.
So basically:
?country a :Country ;
    :population ?population ;
    :name ?name . # Apply Fallback
ORDER BY DESC(?population)
LIMIT 5
# Correlated Query (once per ?country):
?country :headOfState ?person .
?person :birthDate ?birthDate ;
    :name ?personName . # Apply Fallback
ORDER BY ?birthDate
LIMIT 3
Can this be achieved? I.e. how customizable is the SQS? Can it accept entire graph patterns?
(We have 1000s of variations on this theme, so a generic solution is important.)
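For reference, the "Apply Fallback" step could be sketched like this in Python. This assumes that fallback means "try the preferred language tags in order, otherwise take whatever value exists"; that reading, the function name `pick_with_fallback`, and the sample names are assumptions for illustration, not a description of any built-in Stardog feature.

```python
def pick_with_fallback(values, preferred):
    """Return the first value whose language tag matches `preferred`, in order.

    `values` is a list of (text, lang) pairs, as for rdf:langString literals.
    Falls back to the first available value, or None when there are no values.
    """
    by_lang = {}
    for text, lang in values:
        by_lang.setdefault(lang.lower(), text)  # keep the first value per language
    for lang in preferred:
        if lang.lower() in by_lang:
            return by_lang[lang.lower()]
    return values[0][0] if values else None


# Hypothetical :name values for one country.
names = [("Deutschland", "de"), ("Germany", "en")]
print(pick_with_fallback(names, ["fr", "en"]))  # no "fr" value, so the "en" one is used
```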
Just one more thought ... if SQS is the only option ... is it possible to create those temporarily on the fly without much penalty?
Ex:
"Scan Query For Correlated Subqueries"
Create SQS
Run query
Clean up SQS if not needed
(It seems to take too many milliseconds using stardog-admin for this workaround approach to be valid in the wild ... But trying to think of generic solutions ...)
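The create / run / clean up steps above could be wrapped so that cleanup always happens. This is a rough Python sketch; `InMemoryQueryStore` is a hypothetical stand-in for whatever API actually stores and removes queries, not a real Stardog client.

```python
from contextlib import contextmanager


class InMemoryQueryStore:
    """Hypothetical stand-in for a stored-query API; not a real Stardog client."""

    def __init__(self):
        self.queries = {}

    def store(self, name, text):
        self.queries[name] = text

    def remove(self, name):
        self.queries.pop(name, None)


@contextmanager
def temporary_stored_query(store, name, text):
    """Store a query for the duration of a `with` block, then clean it up."""
    store.store(name, text)
    try:
        yield f"query://{name}"  # the query://name form used with SERVICE in this thread
    finally:
        store.remove(name)  # runs even if the main query raises


store = InMemoryQueryStore()
with temporary_stored_query(store, "authors", "<subquery text>") as iri:
    print(iri, "authors" in store.queries)  # the main query would run here
print("authors" in store.queries)
```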
SQS is currently the only way to use correlated subqueries in Stardog. We're definitely interested in supporting correlated execution for general subqueries but we aren't yet ready to offer a syntactic extension for that. I guess it's technically possible to provide a SERVICE similar to SQS but where the query would be represented explicitly in the body (rather than referenced by name), and evaluate that in the correlated way... but we don't support it yet.
SQS is very flexible since you can use arbitrary graph patterns in your stored queries.
I understand the inconvenience of having to store queries on the fly. However if you're concerned with latency, you should do it programmatically using one of our supported APIs (Java or directly over HTTP) since invoking stardog-admin requires a launch of a client JVM which is probably where the milliseconds are spent.
"We're definitely interested in supporting correlated execution for general subqueries but we aren't yet ready to offer a syntactic extension for that."
Thanks! Many +1 on this one
"I guess it's technically possible to provide a SERVICE similar to SQS but where the query would be represented explicitly in the body (rather than referenced by name)"
Hehe, that was what I was thinking. Good to know.
"However if you're concerned with latency, you should do it programmatically using one of our supported APIs"
Nice, yes, I tried it and storing a query takes a decent 100-200 ms. So we could perhaps split these queries into 2 steps:
Hash each type of subquery on the fly and upsert it by hashed ID to SQS
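A rough Python sketch of the hash-and-upsert idea, in case it helps. The `registry` dict is a hypothetical stand-in for the set of queries already stored on the server, and the `sq_` prefix and 16-character digest length are arbitrary choices. Note that the whitespace normalization is naive and would also collapse whitespace inside string literals.

```python
import hashlib


def stored_query_name(query_text, prefix="sq_"):
    """Derive a stable name from the query text, ignoring whitespace differences."""
    normalized = " ".join(query_text.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return prefix + digest[:16]


def upsert(registry, query_text):
    """Store the query under its hashed name unless it is already there."""
    name = stored_query_name(query_text)
    created = name not in registry
    if created:
        registry[name] = query_text  # a real implementation would call the server here
    return name, created


registry = {}  # hypothetical stand-in for the server-side stored queries
first = upsert(registry, "SELECT ?author { ?post :author ?author } LIMIT 2")
second = upsert(registry, "SELECT ?author  {\n ?post :author ?author }\n LIMIT 2")
print(first, second)  # same name both times; only the first call stores anything
```

With this, the 100-200 ms cost is only paid the first time a given subquery shape is seen.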
LIMIT just works on the number of bindings; it doesn't know the semantics you might be interested in. If there are multiple authors for a single blog post, the number of possible bindings for that post alone might already exceed the LIMIT. If you have 5 authors, this will lead to 5 bindings (aka rows).
In that case the way to go is to get the blog posts in a subquery first, then fetch the data for those blog posts:
SELECT * {
  { SELECT * { ?blogPost a :BlogPost } LIMIT 5 } # get 5 blog posts here
  ?blogPost :author ?author .
}
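The same effect can be shown with a small Python sketch, where each (post, author) pair stands for one binding. The data and function names are invented for illustration only.

```python
from itertools import islice

# Hypothetical data: each blog post with its authors.
AUTHORS = {"p1": ["a1", "a2", "a3", "a4", "a5"], "p2": ["b1"], "p3": ["c1", "c2"]}


def join_rows(posts):
    """One row (binding) per post/author pair, like `?blogPost :author ?author`."""
    return [(p, a) for p in posts for a in AUTHORS[p]]


def limit_after_join(n):
    """LIMIT applied to the joined rows: counts rows, not posts."""
    return list(islice(join_rows(AUTHORS), n))


def limit_posts_then_join(n):
    """The subquery limits the posts first; the join then adds all their authors."""
    return join_rows(list(islice(AUTHORS, n)))


print(limit_after_join(5))       # five rows, all for p1
print(limit_posts_then_join(2))  # two posts, with every author of each
```

Note that this approach limits the number of posts but not the number of authors per post.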