Incremental server backups

In Stardog 7.0 we introduced server backups, which differ from the existing database backups in two important ways:

  1. Server backups contain all the databases, including the system database, so you do not need to back up each database separately.
  2. Server backups are incremental: each new backup stores only the information that has changed since the last backup.

A server backup can be performed with the following command:

$ stardog-admin server backup

The first time this command is executed, it will create the directory $STARDOG_HOME/.backup and create a full backup of all databases. Subsequent executions of the command will incrementally update the backup directory with the changes that have occurred since the previous backup command. The backup location can be changed by passing a directory as the command argument.

A natural step after creating the backup is to ship it to another machine or to an external storage service like S3 or Google Cloud Storage, so that a catastrophic disk failure does not cause data loss. Uploading the whole backup directory every time would defeat the benefits of incremental backups. Luckily, you can use a tool like rclone, which can incrementally synchronize the contents of a directory with over 40 different storage systems. Running the following command after each backup copies only the incremental changes:

$ rclone sync $STARDOG_HOME/.backup $TARGET_LOCATION
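
If you want to run both steps from a scheduler such as cron, they can be chained in a small wrapper. The following is only a minimal sketch, not an official script: the function names and the DRY_RUN switch are our own inventions for illustration, and it assumes that STARDOG_HOME and TARGET_LOCATION are set in the environment and that both stardog-admin and rclone are on the PATH.

```shell
#!/bin/sh
# Sketch: take an incremental server backup, then ship only the changes.
# DRY_RUN is a hypothetical switch for this example: when it is set to 1,
# commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_and_ship() {
    # Both values are required; the shell aborts with a message if one is missing.
    backup_dir="${STARDOG_HOME:?STARDOG_HOME is not set}/.backup"
    target="${TARGET_LOCATION:?TARGET_LOCATION is not set}"

    # Incrementally update the default backup directory. Stop here on failure
    # so that a failed backup is never synchronized to the remote.
    run stardog-admin server backup || return 1

    # Transfer only new or changed files to the remote location.
    run rclone sync "$backup_dir" "$target"
}

# Run only when asked, so the file can also be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    backup_and_ship
fi
```

Keep in mind that rclone sync makes the destination identical to the source, which includes deleting destination files that no longer exist in the source, so it is worth pointing it at a location dedicated to the backup. Setting DRY_RUN=1 first lets you check the commands before anything is executed.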

If you are running a Stardog HA cluster and shipping your backups to an external storage system, it is unnecessary for each node to create a backup and upload its own copy to the same central location. There are two possible ways to avoid this redundancy:

  1. Create the backup on a standby node. Standby nodes do not replicate backup requests to cluster nodes, so a single backup directory will be created on the standby node.
  2. Disable the replication of backup commands by setting pack.backups.replicated.scheme=none in stardog.properties.

Either of these options ensures that the Stardog node receiving the backup request creates the backup locally and does not replicate the request to other nodes in the cluster. In both cases, however, you still need to make sure that the same Stardog node receives every backup request, because otherwise there won't be a backup to incrementally update. This means you cannot send the backup command to the load balancer, which would forward each request to a randomly chosen Stardog node.
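
One way to pin the requests is to address the chosen node directly in the script that triggers the backup. The sketch below assumes the global --server option of stardog-admin can be used to select the target node (please verify this against the CLI help for your version). BACKUP_NODE_URL, LOAD_BALANCER_URL, DRY_RUN, and the default host name are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch: always send the backup request to the same node, never to the
# load balancer. All variable names and the default URL are placeholders.

backup_on_fixed_node() {
    node_url="${BACKUP_NODE_URL:-http://stardog-standby.example.com:5820}"

    # Guard against accidentally pointing the script at the load balancer,
    # which could route each request to a different node.
    if [ -n "${LOAD_BALANCER_URL:-}" ] && [ "$node_url" = "$LOAD_BALANCER_URL" ]; then
        echo "refusing to back up through the load balancer: $node_url" >&2
        return 1
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of executing it.
        echo "stardog-admin --server $node_url server backup"
    else
        stardog-admin --server "$node_url" server backup
    fi
}

# Run only when asked, so the file can also be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    backup_on_fixed_node
fi
```

The backup directory is then created and updated only on that node, so the upload step should run on the same machine.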

Of course, there are other ways to get the backups off the disk. For example, cloud providers like AWS and Azure provide services that create snapshots of a volume (and even do so incrementally), so you can create the server backup on a dedicated volume and snapshot that volume.
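
On AWS, for example, this could look roughly like the sketch below, which writes the backup to a directory on a dedicated EBS volume and then requests a snapshot with the AWS CLI. It is an illustration under assumptions rather than a tested recipe: BACKUP_DIR, VOLUME_ID, and DRY_RUN are hypothetical names, the AWS CLI must be installed and configured, and you should validate for your own setup that flushing with sync is enough to get a consistent snapshot.

```shell
#!/bin/sh
# Sketch: back up to a dedicated volume, then snapshot that volume.
# BACKUP_DIR is a directory on the dedicated volume and VOLUME_ID is the
# EBS volume ID; both are placeholders supplied through the environment.

run() {
    # With DRY_RUN=1 the command is printed (without shell quoting)
    # instead of being executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_and_snapshot() {
    backup_dir="${BACKUP_DIR:?BACKUP_DIR is not set}"
    volume_id="${VOLUME_ID:?VOLUME_ID is not set}"

    # Pass the directory as the command argument to change the backup location.
    run stardog-admin server backup "$backup_dir" || return 1

    # Flush filesystem buffers before asking for the snapshot.
    run sync || return 1

    run aws ec2 create-snapshot --volume-id "$volume_id" \
        --description "Stardog server backup"
}

# Run only when asked, so the file can also be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    backup_and_snapshot
fi
```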

We hope you find this information useful. Please let us know in the forum if you have any questions or comments about backups. You can also find more information about backups and restores in the docs.

Best,
Evren