Backing Up & Restoring the Cluster

Overview

Kinetica provides two means of backing up data:

Database Backup

SQL commands can be used to initiate full, incremental, & differential hot backups and restorations of schema objects and data within the database.

Objects Backed Up

  • Contexts
  • Credentials
  • Data Sinks
  • Data Sources
  • Roles
  • SQL Procedures
  • Streams
  • Tables
  • Users
  • Views

Objects Not Backed Up

  • Function Environments
  • Graphs
  • KiFS files
  • ML models/containers
  • Resource groups
  • Symbols
  • UDFs

Backup Types

Three hot backup types are supported for database objects & data:

  • full - snapshot of the given database objects & data
  • incremental - changes in the database objects & data since the last backup of any kind
  • differential - changes in the database objects & data since the last full backup
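
To make the difference between the three types concrete, the following Python sketch works out which backups would be needed to rebuild the state at a given point in a backup history, using only the definitions above. It is an illustration of those definitions, not Kinetica's restore implementation; the Backup class, the restore_chain function, and the t0..t4 labels are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    timestamp: str  # hypothetical label; the real timestamp format is not specified here
    kind: str       # "full", "incremental", or "differential"


def restore_chain(history, target_index):
    """Return, oldest first, the backups needed to rebuild history[target_index].

    Follows the definitions in the text:
      * full         - self-contained
      * incremental  - depends on the backup immediately before it (of any kind)
      * differential - depends only on the most recent full backup before it
    """
    chain = []
    i = target_index
    while i >= 0:
        backup = history[i]
        chain.append(backup)
        if backup.kind == "full":
            return chain[::-1]
        if backup.kind == "incremental":
            i -= 1  # step back to the previous backup of any kind
        elif backup.kind == "differential":
            i -= 1  # skip back to the most recent full backup
            while i >= 0 and history[i].kind != "full":
                i -= 1
        else:
            raise ValueError(f"unknown backup kind: {backup.kind!r}")
    raise ValueError("no full backup precedes the target")


history = [
    Backup("t0", "full"),
    Backup("t1", "incremental"),
    Backup("t2", "incremental"),
    Backup("t3", "differential"),
    Backup("t4", "incremental"),
]
print([b.timestamp for b in restore_chain(history, 4)])  # ['t0', 't3', 't4']
```

In this model, a differential backup shortens the chain (t1 and t2 are not needed to reach t4), while a run of incrementals must be replayed in order.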

Backup Storage

Database backup files are transferred to the target specified in the given data sink. There, they are stored under two levels of directories: the top-level directory is named after the database backup, and each subdirectory is named after the timestamp at which a backup was taken; e.g.:

/<backup_name>/ki_backup_info.json
/<backup_name>/<backup_name>.mdb
/<backup_name>/<backup_timestamp>/<backup_timestamp>.mdb
/<backup_name>/<backup_timestamp>/rank-<rank_number>/tom-<TOM_number>/*

A full backup will result in the creation of a directory with the corresponding backup name, as well as a backup timestamp directory with the full backup files. An incremental or differential backup for a given baseline backup will result in the creation of a timestamp directory (under the existing baseline backup's directory) containing all the files for that backup.
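
As a rough aid for locating files on the remote store, the layout above can be expressed as a small Python helper. This is a sketch under stated assumptions: the function name, the dictionary keys, and the timestamp string in the usage line are hypothetical placeholders, and the actual timestamp format written by the database is not specified here.

```python
from pathlib import PurePosixPath


def backup_paths(backup_name, backup_timestamp, rank_number, tom_number):
    """Build the paths from the layout shown above for one backup timestamp.

    The dictionary keys are hypothetical labels chosen for this example.
    """
    root = PurePosixPath("/") / backup_name
    timestamp_dir = root / backup_timestamp
    return {
        "info_json": root / "ki_backup_info.json",
        "backup_mdb": root / f"{backup_name}.mdb",
        "timestamp_mdb": timestamp_dir / f"{backup_timestamp}.mdb",
        "tom_dir": timestamp_dir / f"rank-{rank_number}" / f"tom-{tom_number}",
    }


# "TIMESTAMP_PLACEHOLDER" stands in for whatever timestamp the backup directory carries.
paths = backup_paths("daily_backup", "TIMESTAMP_PLACEHOLDER", 1, 0)
for label, path in paths.items():
    print(f"{label}: {path}")
```

Each incremental or differential backup taken against the same baseline would add another timestamp directory under the same top-level directory, so only the second argument changes between calls.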

Backup Use Case

A typical usage of the backup feature is:

  • take an initial full backup
  • schedule an incremental or differential backup

Initial Full Backup

To create the initial full backup, run a SQL CREATE BACKUP statement that specifies:

  • the name to use for the backed-up database object set
  • the data sink that will be used to transfer the backed-up files to the remote store (e.g., S3)
  • the set of database objects to back up

For example, to create an initial backup with the following parameters:

  • daily_backup - name of the backup file set
  • backup_ds - data sink targeting the remote file service
  • example_backup - name of the schema to back up

Create Full Backup Example
CREATE BACKUP daily_backup
DATA SINK = backup_ds
OBJECTS (ALL = example_backup)
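
If the same statement has to be issued for several schemas, it may be convenient to generate it from parameters. The Python sketch below only renders the statement shape shown in the example above; it does not connect to the database. The function name and the conservative identifier check are assumptions made for this illustration, not part of Kinetica's API or naming rules.

```python
import re

# Conservative identifier pattern (an assumption for this sketch, not Kinetica's rule set).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_backup_sql(backup_name, data_sink, schema):
    """Render a CREATE BACKUP statement in the same shape as the example above."""
    for value in (backup_name, data_sink, schema):
        if not _IDENTIFIER.match(value):
            raise ValueError(f"unexpected characters in identifier: {value!r}")
    return (
        f"CREATE BACKUP {backup_name}\n"
        f"DATA SINK = {data_sink}\n"
        f"OBJECTS (ALL = {schema})"
    )


print(create_backup_sql("daily_backup", "backup_ds", "example_backup"))
```

Rejecting anything other than plain identifiers keeps the generated text from being altered by unexpected input; names that legitimately need quoting would require a different check.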

Schedule Iterative Backups

To schedule incremental backups after the initial full backup is done, create a SQL procedure that specifies:

  • the name of the backed-up database object set (same as the full backup)
  • the data sink that will be used to transfer the backed-up files to the remote store (same as the full backup)
  • the schedule for running the incremental backups

For example, to schedule incremental backups with the following parameters:

  • daily_backup - name of the backup file set
  • backup_ds - data sink targeting the remote file service
  • 1 DAY - daily backup interval
  • STARTING AT...2025-01-01 - starting at a date in the past causes the backup to happen at the next possible time interval
  • STARTING AT...00:00:00 - schedule the backup to occur at midnight

Schedule Incremental Backups Example
CREATE PROCEDURE scheduled_backup
BEGIN 
    BACKUP daily_backup
    DATA SINK = backup_ds
END
EXECUTE FOR EVERY 1 DAY STARTING AT '2025-01-01 00:00:00';
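
The effect of a start time in the past can be sketched in Python: the schedule is treated as a grid of run times anchored at the start time, and the next run is the first grid point at or after the current time. This models the behavior described above and is not taken from Kinetica's scheduler; the function name and the handling of the exact-boundary case are assumptions.

```python
from datetime import datetime, timedelta


def next_run(start, interval, now):
    """Return the first scheduled time at or after `now`.

    Scheduled times are start, start + interval, start + 2 * interval, ...
    """
    if now <= start:
        return start
    elapsed = now - start
    steps = -(-elapsed // interval)  # ceiling division on timedeltas
    return start + steps * interval


start = datetime(2025, 1, 1, 0, 0, 0)   # STARTING AT '2025-01-01 00:00:00'
interval = timedelta(days=1)            # EVERY 1 DAY
now = datetime(2025, 6, 15, 13, 30, 0)  # hypothetical time the procedure is created
print(next_run(start, interval, now))   # 2025-06-16 00:00:00
```

Because the grid is anchored at midnight, every run lands on midnight regardless of when the procedure is created.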

System Backup

You can back up Kinetica data through Workbench. Backups can be used to perform a complete data restoration from the point in time the backup was taken. A backup can be requested while the cluster is in a running state; there is no need to suspend it first. Restoring data from a backup, on the other hand, does require the cluster to be suspended first.

Clicking Snapshots displays two lists:

  • Activity - Backup jobs that either are in progress or have failed (not shown if none are in either state)
  • Backups - Completed backups, ready to be used to restore the cluster

To perform a complete backup of all data in the cluster, click Backup Now.

To cancel an in-progress or failed backup, click Cancel for that job's entry in the list.

To perform a complete restoration of all data from a backup, first make sure the cluster has been suspended, then click Restore for that backup's entry in the list. To delete a given backup, click its associated trash can icon.