Query Throughput in Stardog

Hi all, I would like to reach a throughput of 100k queries per second for queries of the type:

DESCRIBE <http://geophy.io/buildings/ID>

where ID ranges from 1 to 100M.

Currently I can reach a max throughput of 1,000 queries per second. The queries return 26 triples on average.

This translates to about 26K triples per second, which seems very slow.

My task is to return all triples for a large number of specific resources.

We have tried these queries as well:
0 - Triples: 31186, time elapsed (1000 runs): 1.73s: `DESCRIBE <http://geophy.io/buildings/id>`
1 - Triples: 31250, time elapsed (1000 runs): 1.48s: `SELECT * WHERE { <http://geophy.io/buildings/id> ?p ?o }`
2 - Triples: 31250, time elapsed (1000 runs): 2.75s: `CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o FILTER (?s = <http://geophy.io/buildings/id>) }`
3 - Triples: 31250, time elapsed (1000 runs): 2.69s: `CONSTRUCT { <http://geophy.io/buildings/id> ?p ?o } WHERE { <http://geophy.io/buildings/id> ?p ?o }`

Is there any strategy I could use to speed up the query throughput?

My suggestion would be to batch resources on the client side and run fewer queries of the form

SELECT * WHERE {
?s ?p ?o  .
VALUES ?s { IRI_1 IRI_2 ... IRI_n }
}
ORDER BY ?s

This should improve throughput by reducing the number of client/server exchanges. You can run experiments for different values of n.
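For illustration, here is a minimal Python sketch of assembling such a batched query client-side. The `build_batch_query` helper and the example IDs are hypothetical, not part of any Stardog client API; it just produces the SPARQL string you would then send to the server:

```python
def build_batch_query(iris):
    """Build a single SELECT that fetches all triples for a batch of subject IRIs."""
    values = " ".join(f"<{iri}>" for iri in iris)
    return (
        "SELECT * WHERE {\n"
        "  ?s ?p ?o .\n"
        f"  VALUES ?s {{ {values} }}\n"
        "}\n"
        "ORDER BY ?s"
    )

# Example: a batch of n = 3 building IRIs (hypothetical IDs)
batch = [f"http://geophy.io/buildings/{i}" for i in (1, 2, 3)]
print(build_batch_query(batch))
```

ORDER BY ?s keeps each subject's triples contiguous in the result set, which makes it easy to regroup them per resource on the client.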

Cheers,
Pavel

Note that you can also combine VALUES with DESCRIBE:

DESCRIBE ?s VALUES ?s { IRI_1 IRI_2 ... IRI_n }

Other possible ways to improve performance:

  1. Execute queries in parallel. Depending on how many cores you have on the client and the server, you can use 4, 8, or more threads, which should improve throughput.
  2. Experiment with different result formats. N-Triples vs. Turtle might make a difference, but which one is faster depends on your triples.
  3. Increasing the server heap size might improve caching efficiency, depending on your data size.
  4. Alternatively, a caching layer between Stardog and your app could help too.
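Point 1 (parallel execution) can be sketched with Python's `concurrent.futures`. `run_query` below is a placeholder for whatever client call you actually use (e.g. an HTTP request to the Stardog SPARQL endpoint); it is stubbed out here so the sketch is self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

def run_query(query):
    """Placeholder: replace with your real client call, e.g. an HTTP POST to
    Stardog's SPARQL endpoint. Here it just returns the query's length."""
    return len(query)

def run_queries_in_parallel(queries, max_workers=8):
    """Fan batched queries out over a thread pool. Since the real work is
    I/O-bound (HTTP round-trips), threads overlap well even under the GIL."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_query, queries))

results = run_queries_in_parallel([
    "DESCRIBE <http://geophy.io/buildings/1>",
    "DESCRIBE <http://geophy.io/buildings/2>",
])
```

Benchmark with different `max_workers` values; past the point where the server's cores are saturated, adding threads stops helping.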

If none of these suggestions works, you might change your data layout so that the resources you want to export are stored in a specific named graph; then you can use `data export -g named-graph` or the equivalent SPARQL query.

Best,
Evren

Thank you all. This definitely improved the performance.

Running without transactions will also speed things up: if you are calling .begin() and .commit() around your queries, removing those calls will help.
