Binds in parallel

Samur_Araujo · September 6, 2017, 1:13pm

Hi all, I have a query like this:

select ?s ?value where {
?s <http://geophy.io/ontologies/system#value> ?o .
bind(concat(?o, "%") as ?value) .
}

The problem with this query is that all the bind + concat calls are executed sequentially, and this is slow. (I want to use more complex functions than concat; I use concat only as an example.)

Is there a way for the bind and concat to run in parallel for all possible values of ?o?

Best,

stephen · September 6, 2017, 2:16pm

Hi,

How many results is your query returning, on average? How long does the query take to complete? Is ?o ever bound to any non-literal values?

Samur_Araujo · September 7, 2017, 7:46am

Hi Stephen, we have around 1000 resources and all ?o values are strings (which may represent a URI as well).

It becomes slow when I use a complex function in the bind, e.g.:

select ?s ?value where {
?s <http://geophy.io/ontologies/system#value> ?o .
bind(restcall(?o) as ?value) .
}

restcall should make an HTTP request and return a number to bind to ?value.

Since an HTTP request is slow, this only performs well if the bind(restcall(?o) as ?value) calls are executed in parallel.

Is this possible somehow?

Best,

Samur_Araujo · September 7, 2017, 7:47am

Note that by parallel I mean all the restcall invocations running on multiple threads, as I assume all ?o values are already resolved at that point.

Samur_Araujo · September 7, 2017, 8:04am

At the moment, the query takes 1 minute to complete.

pavel · September 7, 2017, 8:41am

Hi Samur,

No, this is currently not possible. We might consider something like that in the future, for really expensive UDFs, but for now your best course of action is to run multiple queries in parallel, each reading some non-overlapping part of the relevant data (i.e. ?s <http://geophy.io/ontologies/system#value> ?o) and doing the HTTP requests.
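A rough client-side sketch of that idea in Python, in case it helps. It assumes the bindings can be split with an ordered subquery plus LIMIT/OFFSET, that the total count (around 1000 here) is known up front, and that `run_query` is a hypothetical function you supply which sends one query string to the endpoint and returns its rows. None of these names come from Stardog itself.

```python
from concurrent.futures import ThreadPoolExecutor

# Each query reads one slice of the ?s/?o bindings and applies restcall to it.
# The inner ORDER BY keeps the slices stable and non-overlapping.
QUERY_TEMPLATE = """select ?s ?value where {
  { select ?s ?o where { ?s <http://geophy.io/ontologies/system#value> ?o . }
    order by ?s ?o limit %d offset %d }
  bind(restcall(?o) as ?value) .
}"""


def build_partition_queries(total, chunk_size):
    """Return one query string per non-overlapping slice of the data."""
    return [
        QUERY_TEMPLATE % (chunk_size, offset)
        for offset in range(0, total, chunk_size)
    ]


def run_in_parallel(queries, run_query, max_workers=8):
    """Run every query on a thread pool and concatenate the rows.

    run_query is an assumption: any callable taking a query string and
    returning a list of rows (e.g. a thin wrapper around your HTTP client).
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input queries.
        for rows in pool.map(run_query, queries):
            results.extend(rows)
    return results
```

With 1000 resources and a chunk size of 250 this produces four queries, so four slices of restcall invocations run concurrently. The slices only stay consistent if the data does not change between queries.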

Alternatively you may try to collect all those 1000 resources in one query and batch them for HTTP requests, if possible. That’s likely to scale better than doing 1000 parallel HTTP requests.
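A minimal Python sketch of the batching variant, under the assumption that the REST service can accept several values per request. `call_batch` is hypothetical: a function you would write that takes a list of ?o strings and returns the corresponding numbers in the same order. The `rows` argument stands for the (?s, ?o) pairs returned by a single plain query without the bind.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def resolve_in_batches(rows, call_batch, batch_size=100):
    """Attach a value to every (s, o) row using one request per batch.

    call_batch is an assumption: it receives a list of ?o strings and
    must return one number per input, in the same order.
    """
    resolved = []
    for batch in chunked(rows, batch_size):
        numbers = call_batch([o for _, o in batch])
        # Pair each subject with the number computed for its ?o value.
        resolved.extend((s, n) for (s, _), n in zip(batch, numbers))
    return resolved
```

With 1000 rows and a batch size of 100 this makes 10 HTTP requests instead of 1000.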

Cheers,
Pavel

lorenz_b · September 7, 2017, 11:04am

Hi Pavel,

just a question from my side. I know that in general databases are able to parallelize the UNION operator. Does this also hold for Stardog? If so, one could also compute n subqueries for each chunk of the data and combine those results by the UNION operator. Indeed, that’s more or less the same as executing n separate queries.

Kind regards,
Lorenz

pavel · September 7, 2017, 11:10am

Yeah, that’s a good point. We looked into it at some point in the context of reasoning (since query rewriting transforms the original query into a UCQ, a union of conjunctive queries). But at this point this isn’t enabled in Stardog and queries run single-threaded. It’s certainly something we plan to return to in the not too distant future.

Cheers,
Pavel
