A little baffled about results of performance on large databases using Stardog BI

I was experimenting with using Stardog BI as a federation tool on databases with large tables, with the goal of seeing if Stardog might be a more scalable federation engine than some of the dedicated federation engines, like Trino/Starburst .. especially when performing queries that involve joins on large tables.

  • I have a stack of the form: Stardog BI => Stardog VKG => Postgres.
  • The VKG, and corresponding model, were set up to produce an essentially "isomorphic mapping" of the underlying database.
  • The thinking was that stardog's delegation-oriented approach might be more performant when large tables are involved, because, by using a re-write and delegate strategy, it can benefit from the capabilities of SQL server that is feeding the VKG.

tl;dr:
The BI server seemed to partially deliver on this hope. The BI Server does produce SPARQL that works well when executed directly by the Stardog engine. However, in the particular test that I performed, the results never seemed to find their way back to the BI server. It's unclear why this is so ..

Details
For the test, I was using a Postgres server hosted by RNACentral (hh-pgsql-public.ebi.ac.uk). I focused on the pfmegrnargs.rnacen.rna table as a model organism, since it has about 39 million rows.

I made a simple VKG, and a corresponding model on top of a handful of the pfmegrnargs.rnacen tables, including the rna table.

Then I used DBeaver to attempt the following query on our Stardog BI server (this query timed out):

select * from 
rnacentral.`pfmegrnargs.rnacen.rna` as rna1
inner join
(SELECT upi
FROM rnacentral.`pfmegrnargs.rnacen.rna` where seq_long is not NULL limit 100) as rna2
on rna1.upi = rna2.upi

To test that there was not something intrinsically wrong with the arrangement, I ran the following two queries:

-- test inner query .. about 5 seconds
SELECT upi
FROM rnacentral.`pfmegrnargs.rnacen.rna` where seq_long is not NULL limit 100
-- cousin of the left side of join .. about 750ms
select * from 
rnacentral.`pfmegrnargs.rnacen.rna` limit 100 

But here's where it seems to get interesting ..
I took the SPARQL that Stardog BI had generated, and ran it in Stardog Studio. That only took about 2 seconds to yield a result. I took the SQL from Stardog's query plan, and executed it directly on the Postgres server, and it was only a few hundred ms faster.

That leaves me puzzled as to why the BI Server timed out. Given that the VKG was quick to poduce a fairlycompact result, I would have expected the BI server to have succeeded too, and to have added a "small but reasonable" overhead in order to transform the results of the SPARQL query into a SQL Result Table.

# SPARQL generated using STARDOG BI, when fed the following SQL
#SQL:
#/* ApplicationName=DBeaver 23.2.1 - SQLEditor <Script-18.sql> */ select * from 
#rnacentral.`pfmegrnargs.rnacen.rna` as rna1
#inner join
#(SELECT upi
#FROM rnacentral.`pfmegrnargs.rnacen.rna` where seq_long is not NULL limit 100) as rna2
#on rna1.upi = rna2.upi

#SPARQL:
prefix : <http://api.stardog.com/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix stardog: <tag:stardog:api:>
prefix so: <https://schema.org/>
SELECT ?pfmegrnargs_rnacen_rna_id ?pfmegrnargs_rnacen_rna_crc64 ?pfmegrnargs_rnacen_rna_len ?pfmegrnargs_rnacen_rna_md5 ?pfmegrnargs_rnacen_rna_seq_long ?pfmegrnargs_rnacen_rna_seq_short ?pfmegrnargs_rnacen_rna_timestamp ?pfmegrnargs_rnacen_rna_upi ?pfmegrnargs_rnacen_rna_userstamp ?pfmegrnargs_rnacen_rna_0_upi 
WHERE {

  {

    {
      ?pfmegrnargs_rnacen_rna_id rdf:type <tag:stardog:designer:test_rnacentral:model:pfmegrnargs.rnacen.rna> . 
    }
    OPTIONAL {
      ?pfmegrnargs_rnacen_rna_id <tag:stardog:designer:test_rnacentral:model:crc64> ?pfmegrnargs_rnacen_rna_crc64 . 
    }
    OPTIONAL {
      ?pfmegrnargs_rnacen_rna_id <tag:stardog:designer:test_rnacentral:model:len> ?pfmegrnargs_rnacen_rna_len . 
    }
    OPTIONAL {
      ?pfmegrnargs_rnacen_rna_id <tag:stardog:designer:test_rnacentral:model:md5> ?pfmegrnargs_rnacen_rna_md5 . 
    }
    OPTIONAL {
      ?pfmegrnargs_rnacen_rna_id <tag:stardog:designer:test_rnacentral:model:seq_long> ?pfmegrnargs_rnacen_rna_seq_long . 
    }
    OPTIONAL {
      ?pfmegrnargs_rnacen_rna_id <tag:stardog:designer:test_rnacentral:model:seq_short> ?pfmegrnargs_rnacen_rna_seq_short . 
    }
    OPTIONAL {
      ?pfmegrnargs_rnacen_rna_id <tag:stardog:designer:test_rnacentral:model:timestamp> ?pfmegrnargs_rnacen_rna_timestamp . 
    }
    OPTIONAL {
      ?pfmegrnargs_rnacen_rna_id <tag:stardog:designer:test_rnacentral:model:upi> ?pfmegrnargs_rnacen_rna_upi . 
    }
    OPTIONAL {
      ?pfmegrnargs_rnacen_rna_id <tag:stardog:designer:test_rnacentral:model:userstamp> ?pfmegrnargs_rnacen_rna_userstamp . 
    }
    {
      SELECT ?pfmegrnargs_rnacen_rna_0_upi 
      WHERE {

        {
          ?pfmegrnargs_rnacen_rna_0_id rdf:type <tag:stardog:designer:test_rnacentral:model:pfmegrnargs.rnacen.rna> . 
        }
        OPTIONAL {
          ?pfmegrnargs_rnacen_rna_0_id <tag:stardog:designer:test_rnacentral:model:crc64> ?pfmegrnargs_rnacen_rna_0_crc64 . 
        }
        OPTIONAL {
          ?pfmegrnargs_rnacen_rna_0_id <tag:stardog:designer:test_rnacentral:model:len> ?pfmegrnargs_rnacen_rna_0_len . 
        }
        OPTIONAL {
          ?pfmegrnargs_rnacen_rna_0_id <tag:stardog:designer:test_rnacentral:model:md5> ?pfmegrnargs_rnacen_rna_0_md5 . 
        }
        OPTIONAL {
          ?pfmegrnargs_rnacen_rna_0_id <tag:stardog:designer:test_rnacentral:model:seq_long> ?pfmegrnargs_rnacen_rna_0_seq_long . 
        }
        OPTIONAL {
          ?pfmegrnargs_rnacen_rna_0_id <tag:stardog:designer:test_rnacentral:model:seq_short> ?pfmegrnargs_rnacen_rna_0_seq_short . 
        }
        OPTIONAL {
          ?pfmegrnargs_rnacen_rna_0_id <tag:stardog:designer:test_rnacentral:model:timestamp> ?pfmegrnargs_rnacen_rna_0_timestamp . 
        }
        OPTIONAL {
          ?pfmegrnargs_rnacen_rna_0_id <tag:stardog:designer:test_rnacentral:model:upi> ?pfmegrnargs_rnacen_rna_0_upi . 
        }
        OPTIONAL {
          ?pfmegrnargs_rnacen_rna_0_id <tag:stardog:designer:test_rnacentral:model:userstamp> ?pfmegrnargs_rnacen_rna_0_userstamp . 
        }
        FILTER (Bound(?pfmegrnargs_rnacen_rna_0_seq_long))
      }
      LIMIT 100
    }
  }
  FILTER (?pfmegrnargs_rnacen_rna_upi = ?pfmegrnargs_rnacen_rna_0_upi)
}