Stardog Database Size

Good afternoon,

Is there a command to get the size of a database and its triples in bytes?
I tried using the size of the database folder, but after a big data load the folder is still the same size (20 KB).

Thanks.

You can run stardog data size --exact myDb on the CLI

Hi Stephen, that command only shows the triple count. I need the size in bytes.

:~$ stardog data size --exact myDb

Size: 6,776 triples.

Thanks.

Sorry, you can't do it. Stardog 7 manages data for all databases in a single RocksDB instance with multiple column families and there's no user-exposed way to calculate the size for just one. If there's only one database, then the size of the data dir in $STARDOG_HOME won't be far off.
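
For the single-database case, the directory size can be measured with a few lines of Python. This is only a sketch: it sums file sizes under whatever path you pass in, and knows nothing about Stardog or RocksDB internals. The function name dir_size_bytes and the example path are illustrative assumptions, not part of any Stardog tooling.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under `path`.

    Symlinks are skipped so linked files are not counted twice, and files
    that vanish mid-walk (e.g. during compaction) are ignored.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # File was removed between listing and stat; skip it.
                continue
    return total


# Example call (the path is an assumption; point it at your own data dir):
# print(dir_size_bytes(os.path.join(os.environ["STARDOG_HOME"], "data")))
```

Note that this reports the sum of file sizes, which can differ slightly from what the filesystem actually allocates on disk.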

Why do you need to know it?

Best,
Pavel

Hello Pavel,
I need to work out, from the number of triples, how much disk space I have to buy to host my databases.
If I have a database with one billion triples that is expected to reach 10 billion, how much disk space is required?
Is there a formula based on the number of triples?

OK, that makes sense. The thing about size estimation is that it's highly dependent on two things:

  1. the number of distinct RDF terms in your data (roughly, how sparse the graph is)
  2. the average length of IRIs and literals in your data

These things vary a lot across datasets. Some datasets contain very long literals, for example DNA sequences. So your best course of action is to measure the size of your $STARDOG_HOME for increasingly large subsets of your data and then extrapolate to 10B or whatever your target is. Such experiments are typically easy to run in the cloud, e.g. on EC2.

Note that during bulk loading Stardog can use extra space for partially sorted indexes, which is cleaned up when the process completes.
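
The measure-and-extrapolate approach could be sketched roughly as below. It assumes growth is close to linear in the triple count, which may not hold for your data (the ratio of distinct terms to triples can change as the dataset grows), so check the fit against your largest sample before trusting it. The function names, the headroom factor, and the sample numbers are all assumptions made up for illustration; they are not Stardog measurements.

```python
def fit_line(points):
    """Ordinary least-squares fit of size_bytes = intercept + slope * triples.

    `points` is a list of (triple_count, size_bytes) pairs with at least
    two distinct triple counts.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_bytes(points, target_triples, headroom=1.0):
    """Extrapolate the fitted line to `target_triples`.

    `headroom` is a hypothetical safety multiplier, e.g. to leave room for
    temporary space used during bulk loading.
    """
    slope, intercept = fit_line(points)
    return (intercept + slope * target_triples) * headroom


# PLACEHOLDER samples, invented only so the example runs. Replace them with
# (triple count, measured $STARDOG_HOME size in bytes) from your own loads.
samples = [
    (10_000_000, 1_200_000_000),
    (50_000_000, 5_600_000_000),
    (100_000_000, 11_000_000_000),
]

estimate = estimate_bytes(samples, 10_000_000_000, headroom=1.5)
print(f"Estimated disk needed: {estimate / 1e9:,.0f} GB")
```

Using several subset sizes rather than two makes it easier to spot whether bytes per triple is drifting as the data grows, in which case a straight line will under- or overestimate.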

Cheers,
Pavel
