Limit and sort per level?

Perhaps a basic SPARQL question but ...

We have:
:BlogPost :author :Author
:Author :interest :Topic

  1. How do we retrieve max 3 Blog posts with max 2 Authors each and max 1 Topic each? (Ideally with sorting possible at each level.)

  2. How would we retrieve the above so that nesting it (into something like the below) would be facilitated:

items = [
  {
    id: "my-blog-post",
    type: "BlogPost",
    author: [
      {
        id: "john-stevenson",
        type: "Author",
        interest: [
          {
            id: "economy",
            type: "Topic"
          }
        ]
      },
      ... 1 more
    ]
  },
  ... 2 more
]

Thanks!

As basic as it may sound, you cannot do it in pure SPARQL with a single query. A LIMIT requires a subquery, but subqueries in SPARQL are uncorrelated, i.e. they're executed once, whereas you need the (say, authors) subquery to execute once per blog post (so you get 2 authors per post, not 2 authors in total).
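To make the difference concrete, here is a small Python simulation over made-up data. It is not SPARQL and says nothing about how an engine is implemented; it only mimics where the LIMIT ends up being applied. The names `post_authors`, `uncorrelated_limit`, and `correlated_limit` are illustrative assumptions.

```python
# Toy data: each post has three authors (made-up identifiers).
post_authors = {
    "post1": ["a1", "a2", "a3"],
    "post2": ["b1", "b2", "b3"],
}

def uncorrelated_limit(post_authors, limit):
    """Mimic a subquery evaluated once: LIMIT caps the author rows overall."""
    rows = [(post, author)
            for post, authors in post_authors.items()
            for author in authors]
    return rows[:limit]

def correlated_limit(post_authors, limit):
    """Mimic a subquery evaluated once per post: LIMIT caps the rows per post."""
    return [(post, author)
            for post, authors in post_authors.items()
            for author in authors[:limit]]

print(uncorrelated_limit(post_authors, 2))  # 2 rows in total
print(correlated_limit(post_authors, 2))    # up to 2 rows for each post
```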

In other words, you need correlated subqueries, which in Stardog means the Stored Query Service (SQS). Something like:

select * {
  { select * { ?post a :BlogPost } # some pattern to select posts, e.g. by date range, etc.
    limit 3 }
  service <query://authors> { [] sqs:input ?post ; sqs:vars ?author } # select 2 authors max per post
  service <query://topics> { [] sqs:input ?author ; sqs:vars ?topic } # select 1 topic per author
}

I haven't tried it so syntax could be slightly off but hopefully you get the idea.
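On the second question (nesting): a query of this shape returns flat rows, one per post/author/topic combination, so the nesting would most likely happen on the client. A minimal Python sketch, assuming each row is a dict with the keys `post`, `author`, and `topic` holding plain string ids. The function name `nest_rows`, the row shape, and the hard-coded type labels are assumptions for illustration; real results carry IRIs that would need mapping first.

```python
def nest_rows(rows):
    """Group flat post/author/topic rows into nested items, keeping row order."""
    posts = {}
    for row in rows:
        post = posts.setdefault(
            row["post"], {"id": row["post"], "type": "BlogPost", "author": {}})
        author_id = row.get("author")
        if author_id is None:
            continue  # row carries no author binding
        author = post["author"].setdefault(
            author_id, {"id": author_id, "type": "Author", "interest": {}})
        topic_id = row.get("topic")
        if topic_id is not None:
            author["interest"].setdefault(
                topic_id, {"id": topic_id, "type": "Topic"})

    # Turn the id-keyed helper dicts into the lists used in the target shape.
    items = []
    for post in posts.values():
        authors = [dict(a, interest=list(a["interest"].values()))
                   for a in post["author"].values()]
        items.append(dict(post, author=authors))
    return items

# Made-up rows in the shape assumed above.
rows = [
    {"post": "my-blog-post", "author": "john-stevenson", "topic": "economy"},
    {"post": "my-blog-post", "author": "jane-doe", "topic": "politics"},
    {"post": "other-post", "author": "john-stevenson", "topic": "economy"},
]
items = nest_rows(rows)
```

Keying the intermediate structure by id means repeated rows for the same post or author collapse into one entry instead of duplicating it.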

Cheers,
Pavel

Aha! Thank you for the clear explanation!

After some more digging I also found this:

Is the array() an option to SQS?

Was also mentioned here:

Yeah, it's an old post which we wrote at the time when we first realised that supporting path queries as subqueries (in, say, SELECT queries) is going to be difficult because their results do not quite fit into the SPARQL notion of "solution" (a fixed-size set of variable bindings). That was before SQS. SQS works a bit differently than was envisioned in the post: the solution is still fixed-size but actual values could be arrays. If you execute path (sub)queries and project ?path, you may see those arrays in their internal representation. There're some functions to handle them: str, stardog:length, stardog:any, and stardog:all.

Yes, the SPARQL 1.2 issue is exactly there to add support for correlated subqueries (actually "lateral" in Postgres would be more accurate because they're not limited to a single column value). As you discovered, its use cases could be very basic. It'd be a pretty major extension though, so we decided not to wait but instead added support to SQS. Supporting correlated subqueries without service (i.e. beyond SQS) would require extensions to SPARQL syntax, and we decided to avoid that for the time being.

Thank you.

Hmm, I wonder if I can make this generic enough to work for any combination of classes and properties without having to predefine Stored Queries?

It also needs to tap into the language fallback functionality:

So a complete, more generic example would be:

  1. The class :Country has properties :name, :headOfState.

  2. The class :Person has properties :name, :birthDate

  3. We can determine a priori that :name is an rdf:langString and want to use that information to apply a fallback.

  4. We want to get the top 5 Countries ranked by population and we want to get 3 Heads of State ranked by birthDate.

So basically:

?country a :Country ;
   :population ?population ;
   :name ?name . # Apply Fallback

   ORDER BY ?population
   LIMIT 5

# Correlated Query:

?country :headOfState ?person .
?person :birthDate ?birthDate ;
   :name ?name . # Apply Fallback

   ORDER BY ?birthDate
   LIMIT 3

Can this be achieved? I.e. how customizable is the SQS? Can it accept entire graph patterns?
(We have 1000s of variations on this theme so a generic solution is important :sweat_smile:)

Thanks!

Just one more thought ... if SQS is the only option ... is it possible to create those temporarily on the fly without much penalty?

Ex:

  1. "Scan Query For Correlated Subqueries"
  2. Create SQS
  3. Run query
  4. Clean up SQS if not needed

(It seems to take too many milliseconds using stardog-admin for this workaround approach to be valid in the wild ... But trying to think of generic solutions ...)

SQS is currently the only way to use correlated subqueries in Stardog. We're definitely interested in supporting correlated execution for general subqueries but we aren't yet ready to offer a syntactic extension for that. I guess it's technically possible to provide a SERVICE similar to SQS but where the query would be represented explicitly in the body (rather than referenced by name), and evaluate that in the correlated way... but we don't support it yet.

SQS is very flexible since you can use arbitrary graph patterns in your stored queries.

I understand the inconvenience of having to store queries on the fly. However, if you're concerned with latency, you should do it programmatically using one of our supported APIs (Java or directly over HTTP), since invoking stardog-admin requires launching a client JVM, which is probably where the milliseconds are spent.

Best,
Pavel

"We're definitely interested in supporting correlated execution for general subqueries but we aren't yet ready to offer a syntactic extension for that."

Thanks! Many +1 on this one :wink:

"I guess it's technically possible to provide a SERVICE similar to SQS but where the query would be represented explicitly in the body (rather than referenced by name)"

Hehe, that was what I was thinking. Good to know.

"However if you're concerned with latency, you should do it programmatically using one of our supported APIs"

Nice, yes, I tried and it seems to give me a decent 100-200ms performance to store a query. So we could perhaps split these queries into 2 steps:

  1. Hash each type of subquery on the fly and upsert it by hashed ID to SQS
  2. Run the query using said hashes
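The hashing step could look roughly like the Python sketch below. It only derives a stable name from the subquery text and builds the `query://` IRI used in the SERVICE clause; the actual store/upsert call is left out. The helper names `stored_query_name` and `service_iri`, the `sq_` prefix, and the 16-character digest length are all assumptions, not anything prescribed by Stardog.

```python
import hashlib

def stored_query_name(query_text, prefix="sq_"):
    """Derive a deterministic stored-query name from the subquery text.

    The text is hashed as-is (minus surrounding whitespace), so two queries
    that differ only in formatting get different names.
    """
    digest = hashlib.sha256(query_text.strip().encode("utf-8")).hexdigest()
    return prefix + digest[:16]

def service_iri(name):
    """Build the IRI for referencing the stored query in a SERVICE clause."""
    return "query://" + name

# Hypothetical subquery text for the "authors" level.
authors_query = "select ?author { ?post :author ?author } order by ?author limit 2"
name = stored_query_name(authors_query)
print(service_iri(name))
```

Because the name depends only on the text, the same subquery always maps to the same stored query, so step 1 can be skipped whenever that name is already known to exist.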

ps. I was able to use the POST to add a new SQS. But the PUT doesn't update the query body for me (still returning 204 as if the query worked). Could it be a bug?
https://stardog-union.github.io/http-docs/#operation/updateStoredQuery

Thanks!

One more related question ...

Does this limitation also prevent us from getting the correct number of results at the top level?

For example, if we want to get exactly 5 blog posts but also fetch all related authors in the same request?

SELECT * {
   
   ?blogPost a :BlogPost ;
       :author ?author .
}
LIMIT 5

How could we ensure to always get at most 5 blog posts here? Would this use a "normal subquery"?

LIMIT just works on the number of bindings (result rows); it doesn't know the semantics you might be interested in. If there are multiple authors for a single blog post, the bindings for that one post might already use up the LIMIT. If a post has 5 authors, this will lead to 5 bindings, aka rows.

In that case the way to go is to get the blog posts in a subquery first, then get for those blog posts the data:

SELECT * {
  { SELECT * { ?blogPost a :BlogPost } LIMIT 5 } # get 5 blog posts here
  ?blogPost :author ?author .
}
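The effect of moving the LIMIT into the subquery can be mimicked in plain Python over made-up data. This is only a sketch of the row counting, with hypothetical names (`authors_by_post`, `limit_on_rows`, `limit_on_posts`); it relies on dict order, whereas SPARQL gives no ordering guarantee without ORDER BY.

```python
# Seven made-up posts with three authors each.
authors_by_post = {
    f"post{i}": [f"author{i}a", f"author{i}b", f"author{i}c"]
    for i in range(1, 8)
}

def limit_on_rows(authors_by_post, limit):
    """LIMIT on the joined pattern: caps (post, author) rows."""
    rows = [(post, author)
            for post, authors in authors_by_post.items()
            for author in authors]
    return rows[:limit]

def limit_on_posts(authors_by_post, limit):
    """LIMIT inside the subquery: caps posts, then fetches all their authors."""
    posts = list(authors_by_post)[:limit]
    return [(post, author)
            for post in posts
            for author in authors_by_post[post]]

rows_flat = limit_on_rows(authors_by_post, 5)
rows_sub = limit_on_posts(authors_by_post, 5)
print(len({post for post, _ in rows_flat}), "posts in", len(rows_flat), "rows")
print(len({post for post, _ in rows_sub}), "posts in", len(rows_sub), "rows")
```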

Thank you! That clarifies this part :pray:t2:
