Aggregate a database backup chain

Command

The aggregate command turns a chain of backup artifacts into a single full backup artifact.

Figure 1. Backup chain aggregation

Aggregating a backup chain has the following benefits:

  • Reduces the size of backup artifacts in a given backup folder.

  • Keeps the recovery time objective (RTO) low by generating a single backup artifact ready to be restored. As part of the aggregation, transactions contained in the differential backups are applied to the store contained in the full backup artifact. This operation is called recovery and can be costly.

  • Reduces the risk of losing links in the backup chain.

Syntax

neo4j-admin database aggregate-backup [-h] [--expand-commands]
                                      [--verbose] [--keep-old-backup[=true|false]]
                                      [--parallel-recovery[=true|false]]
                                      [--additional-config=<file>] --from-path=<path>
                                      [--temp-path=<path>] [<database>]

Description

Aggregates a chain of backup artifacts into a single artifact.

Parameters

Table 1. neo4j-admin database aggregate-backup parameters
Parameter Description

[<database>]

Name of the database for which to aggregate the artifacts. Can contain * and ? for globbing.
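The glob semantics of the <database> parameter can be illustrated with ordinary shell pattern matching. The helper below is our own sketch (the real filtering happens inside neo4j-admin, so quote the pattern on the command line to keep the shell from expanding it first):

```shell
#!/bin/sh
# Sketch of the glob semantics of the [<database>] parameter:
# '*' matches any run of characters, '?' matches exactly one.
# This helper only demonstrates the matching rules; neo4j-admin
# performs the actual database-name matching internally.
matches() {
  case "$1" in
    $2) echo yes ;;   # unquoted $2 is used as a glob pattern
    *)  echo no ;;
  esac
}

matches "neo4j"    "neo4j*"   # yes: '*' also matches the empty string
matches "neo4j-v2" "neo4j*"   # yes
matches "movies"   "neo4j?"   # no
```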

Options

Table 2. neo4j-admin database aggregate-backup options
Option Description Default

--additional-config=<file>[1]

Configuration file with additional configuration.

--expand-commands

Allow command expansion in config value evaluation.

--from-path=<path>

Accepts either a path to a single artifact file or a folder containing backup artifacts.

When a file is supplied, the <database> parameter should be omitted. It is possible to aggregate backup artifacts from AWS S3 buckets, Google Cloud Storage buckets, and Azure blob storage containers using the appropriate URI as the path.

-h, --help

Show this help message and exit.

--keep-old-backup[=true|false]

If set to true, the old backup chain is not removed.

false

--parallel-recovery[=true|false]

Allow multiple threads to apply pulled transactions to a backup in parallel. For some databases and workloads, this may reduce aggregate times significantly. Note: this is an EXPERIMENTAL option. Consult Neo4j support before use.

false

--temp-path=<path>

Introduced in 5.24. Provides a path to an empty temporary directory for storing backup files until the command is completed. The files are deleted once the command is finished.

--verbose

Enable verbose output.

1. See Tools → Configuration for details.

The --from-path=<path> option can also load backup artifacts from AWS S3 buckets (from Neo4j 5.19), Google Cloud Storage buckets (from Neo4j 5.21), and Azure blob storage containers (from Neo4j 5.24). For more information, see Aggregating a backup chain located in cloud storage.

Neo4j 5.24 introduces the --temp-path option to address potential issues related to disk space when performing backup-related commands, especially when cloud storage is involved.

If --temp-path is not set, a temporary directory is created inside the directory specified by the --from-path option.

If --temp-path is not set and the path provided to --from-path points to a cloud storage bucket, a temporary folder is created inside the current working directory for Neo4j. This fallback can cause issues because the local filesystem (or the partition where Neo4j is installed) may not have enough free disk space to accommodate the intermediate computation.

Therefore, it is strongly recommended to set the --temp-path option.
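The temporary-directory rules described above can be sketched as a small helper. This is our own illustration, not part of neo4j-admin; the cloud URI schemes are the ones used elsewhere on this page:

```shell
#!/bin/sh
# Sketch of the documented temporary-directory selection: use
# --temp-path when given; otherwise use the --from-path directory,
# unless it is a cloud URI, in which case fall back to the current
# working directory.
temp_dir_for() {
  temp_path=$1
  from_path=$2
  if [ -n "$temp_path" ]; then
    echo "$temp_path"
    return
  fi
  case "$from_path" in
    s3://*|gs://*|azb://*) echo "$PWD" ;;  # cloud URI: fall back to cwd
    *)                     echo "$from_path" ;;
  esac
}

temp_dir_for "/mnt/scratch" "s3://myBucket/myDirectory/"  # /mnt/scratch
temp_dir_for "" "/mnt/backups/"                           # /mnt/backups/
```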

Examples

Aggregating a backup chain located in a given folder

The following is an example of how to aggregate a set of backups located in a given folder for the neo4j database:

bin/neo4j-admin database aggregate-backup --from-path=/mnt/backups/ neo4j

The command first looks inside the /mnt/backups/ directory for a backup chain for the database neo4j. If found, it is then aggregated into a single backup artifact.

Aggregating a backup chain identified using a given backup file

The following is an example of how to aggregate a set of backups identified by a given backup file for the neo4j database:

bin/neo4j-admin database aggregate-backup --from-path=/mnt/backups/neo4j-2022-10-18T13-00-07.backup

The command checks the /mnt/backups/ directory for a backup chain including the file neo4j-2022-10-18T13-00-07.backup, for the database neo4j. If found, it is then aggregated into a single backup artifact. This option is only available in Neo4j 5.2 and later.

Aggregating a backup chain located in cloud storage

The following examples show how to aggregate a set of backups located in cloud storage.

Neo4j uses the AWS SDK v2 to call the APIs on AWS using AWS URLs. Alternatively, you can override the endpoints so that the AWS SDK can communicate with alternative storage systems, such as Ceph, MinIO, or LocalStack, using the system variables aws.endpointUrlS3 or aws.endpointUrl, or the environment variables AWS_ENDPOINT_URL_S3 or AWS_ENDPOINT_URL.
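As a sketch of the override, pointing the SDK at a LocalStack instance on localhost:4566 (an assumed example address) only requires exporting one of the variables above before running the command:

```shell
# Point the AWS SDK at an S3-compatible endpoint; LocalStack on
# localhost:4566 is an assumed example address.
export AWS_ENDPOINT_URL_S3="http://localhost:4566"

# The aggregation command itself is unchanged (shown commented out):
# bin/neo4j-admin database aggregate-backup --from-path=s3://myBucket/myDirectory/ neo4j
```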

  1. Install the AWS CLI by following the instructions in the AWS official documentation — Install the AWS CLI version 2.

  2. Create an S3 bucket and a directory to store the backup files using the AWS CLI:

    aws s3 mb --region=us-east-1 s3://myBucket
    aws s3api put-object --bucket myBucket --key myDirectory/

    For more information on how to create a bucket and use the AWS CLI, see the AWS official documentation — Use Amazon S3 with the AWS CLI and Use high-level (s3) commands with the AWS CLI.

  3. Verify that the ~/.aws/config file is correct by running the following command:

    cat ~/.aws/config

    The output should look like this:

    [default]
    region=us-east-1
  4. Configure the access to your AWS S3 bucket by setting the aws_access_key_id and aws_secret_access_key in the ~/.aws/credentials file and, if needed, using a bucket policy. For example:

    1. Use the aws configure set command to set your IAM credentials (aws_access_key_id and aws_secret_access_key) from AWS and verify that the ~/.aws/credentials file is correct:

      cat ~/.aws/credentials

      The output should look like this:

      [default]
      aws_access_key_id=this.is.secret
      aws_secret_access_key=this.is.super.secret
    2. Additionally, you can use a resource-based policy to grant access permissions to your S3 bucket and the objects in it. Create a policy document with the following content and attach it to the bucket. Note that both resource entries are important to be able to download and upload files.

      {
          "Version": "2012-10-17",
          "Id": "Neo4jBackupAggregatePolicy",
          "Statement": [
              {
                  "Sid": "Neo4jBackupAggregateStatement",
                  "Effect": "Allow",
                  "Action": [
                      "s3:ListBucket",
                      "s3:GetObject",
                      "s3:PutObject",
                      "s3:DeleteObject"
                  ],
                  "Resource": [
                      "arn:aws:s3:::myBucket/*",
                      "arn:aws:s3:::myBucket"
                  ]
              }
          ]
      }
  5. Then, use the following command to aggregate the backup chain located in a given folder in your AWS S3 bucket. The example assumes that you have a backup chain located in the myBucket/myDirectory folder identifiable by the file myBackup.backup:

    bin/neo4j-admin database aggregate-backup --from-path=s3://myBucket/myDirectory/myBackup.backup mydatabase
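The bucket policy from step 4 can also be attached from the command line. The sketch below writes the policy document to a local file and shows the AWS CLI call that would attach it (commented out, since it requires credentials and a real bucket):

```shell
#!/bin/sh
# Save the policy from step 4 to a local file.
cat > policy.json <<'EOF'
{
    "Version": "2012-10-17",
    "Id": "Neo4jBackupAggregatePolicy",
    "Statement": [
        {
            "Sid": "Neo4jBackupAggregateStatement",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::myBucket/*",
                "arn:aws:s3:::myBucket"
            ]
        }
    ]
}
EOF

# Attach it to the bucket (requires credentials, so shown commented out):
# aws s3api put-bucket-policy --bucket myBucket --policy file://policy.json
```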
  1. Ensure you have a Google account and a project created in the Google Cloud Platform (GCP).

    1. Install the gcloud CLI by following the instructions in the Google official documentation — Install the gcloud CLI.

    2. Create a service account and a service account key using Google official documentation — Create service accounts and Creating and managing service account keys.

    3. Download the JSON key file for the service account.

    4. Set the GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_CLOUD_PROJECT environment variables to the path of the JSON key file and the project ID, respectively:

      export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"
      export GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
    5. Authenticate the gcloud CLI with the e-mail address of the service account you have created, the path to the JSON key file, and the project ID:

      gcloud auth activate-service-account service-account@example.com --key-file=$GOOGLE_APPLICATION_CREDENTIALS --project=$GOOGLE_CLOUD_PROJECT

      For more information, see the Google official documentation — gcloud auth activate-service-account.

    6. Create a bucket in the Google Cloud Storage using Google official documentation — Create buckets.

    7. Verify that the bucket is created by running the following command:

      gcloud storage ls

      The output should list the created bucket.

  2. Then, use the following command to aggregate the backup chain located in a given folder in your Google Cloud Storage bucket. The example assumes that you have a backup chain located in the myBucket/myDirectory folder identifiable by the file myBackup.backup:

    bin/neo4j-admin database aggregate-backup --from-path=gs://myBucket/myDirectory/myBackup.backup mydatabase
  1. Ensure you have an Azure account, an Azure storage account, and a blob container.

    1. You can create a storage account using the Azure portal.
      For more information, see the Azure official documentation on Create a storage account.

    2. Create a blob container in the Azure portal.
      For more information, see the Azure official documentation on Quickstart: Upload, download, and list blobs with the Azure portal.

  2. Install the Azure CLI by following the instructions in the Azure official documentation — Install the Azure CLI.

  3. Authenticate the neo4j or neo4j-admin process against Azure using the default Azure credentials.
    See the Azure official documentation on default Azure credentials for more information.

    az login

    Then you should be ready to use Azure URLs in either neo4j or neo4j-admin.

  4. To validate that you have access to the container with your login credentials, run the following commands:

    # Upload a file:
    az storage blob upload --file someLocalFile --account-name accountName --container someContainer --name remoteFileName --auth-mode login
    
    # Download the file
    az storage blob download  --account-name accountName --container someContainer --name remoteFileName --file downloadedFile --auth-mode login
    
    # List container files
    az storage blob list --account-name accountName --container someContainer --auth-mode login
  5. Then, use the following command to aggregate the backup chain located in a given folder in your Azure blob storage container. The example assumes that you have a backup chain located in the myStorageAccount/myContainer/myDirectory folder identifiable by the file myBackup.backup:

    bin/neo4j-admin database aggregate-backup --from-path=azb://myStorageAccount/myContainer/myDirectory/myBackup.backup mydatabase
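The azb:// URI in the command above packs the storage account, container, and blob path into a single --from-path value. The helper below is our own illustration of how the components line up, not part of neo4j-admin:

```shell
#!/bin/sh
# Illustration only: split an azb:// URI into the storage account,
# container, and blob path that the --from-path value combines.
split_azb() {
  rest=${1#azb://}           # drop the scheme
  account=${rest%%/*}        # first segment: storage account
  rest=${rest#*/}
  container=${rest%%/*}      # second segment: container
  blob=${rest#*/}            # remainder: blob path inside the container
  echo "$account $container $blob"
}

split_azb "azb://myStorageAccount/myContainer/myDirectory/myBackup.backup"
# -> myStorageAccount myContainer myDirectory/myBackup.backup
```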