Back up, aggregate, and restore (online)
To perform backups, Neo4j uses the Admin Service, which is available only inside the Kubernetes cluster. Access to it should be guarded. For more information, see Accessing Neo4j.
Prepare to back up a database(s) to a cloud provider (AWS, GCP, and Azure) bucket
You can back up one or more Neo4j databases to any cloud provider (AWS, GCP, and Azure) bucket using the neo4j/neo4j-admin Helm chart. From Neo4j 5.10, the neo4j/neo4j-admin Helm chart also supports backing up multiple databases. From 5.13, it also supports workload identity integration for GCP, AWS, and Azure. From 5.14, it also supports MinIO (an AWS S3-compatible object storage API) for non-TLS/SSL endpoints.
Prerequisites
Before you can back up a database and upload it to your bucket, verify that you have the following:
- A cloud provider bucket (AWS, GCP, or Azure) with read and write access, so that the backup can be uploaded.
- Credentials to access the cloud provider bucket, such as a service account JSON key file for GCP, a credentials file for AWS, or storage account credentials for Azure.
- A service account with workload identity if you want to use workload identity integration to access the cloud provider bucket.
  - For more information on setting up a service account with workload identity on GCP and AWS, see Google Kubernetes Engine (GKE) → Use Workload Identity and Amazon EKS → Configuring a Kubernetes service account to assume an IAM role.
  - For more information on setting up an Azure storage account with workload identity, see Microsoft Azure → Use Microsoft Entra Workload ID with Azure Kubernetes Service (AKS).
- A Kubernetes cluster running on one of the cloud providers with the Neo4j Helm chart installed. For more information, see Quickstart: Deploy a standalone instance or Quickstart: Deploy a cluster.
- MinIO server (an AWS S3-compatible object storage API) if you want to push your backups to a MinIO bucket. For more information, see MinIO official documentation.
- The latest Neo4j Helm charts. You can update the repository to get the latest charts using helm repo update.
Create a Kubernetes secret
You can create a Kubernetes secret with the credentials that can access the cloud provider bucket using one of the following options:
GCP
Create the secret named gcpcreds using your GCP service account JSON key file. The JSON key file contains all the details of the service account that has access to the bucket.
kubectl create secret generic gcpcreds --from-file=credentials=/path/to/gcpcreds.json
AWS
- Create a credentials file in the following format:
[ default ]
region = us-east-1
aws_access_key_id = <your-aws_access_key_id>
aws_secret_access_key = <your-aws_secret_access_key>
- Create the secret named awscreds via the credentials file:
kubectl create secret generic awscreds --from-file=credentials=/path/to/your/credentials
Azure
- Create a credentials file in the following format:
AZURE_STORAGE_ACCOUNT_NAME=<your-azure-storage-account-name>
AZURE_STORAGE_ACCOUNT_KEY=<your-azure-storage-account-key>
- Create the secret named azurecred via the credentials file:
kubectl create secret generic azurecred --from-file=credentials=/path/to/your/credentials
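The credentials file can also be generated from environment variables, so that the key is never typed into an editor or left world-readable. The following is a minimal sketch, not part of the Neo4j tooling: the function names write_azure_credentials and create_and_check_secret are hypothetical, and the second function assumes kubectl is already configured for the target cluster.

```shell
# Write the Azure credentials file in the two-line format shown above.
# Values come from environment variables; the file is created with owner-only permissions.
write_azure_credentials() {
  local target="$1"
  (
    umask 077
    printf 'AZURE_STORAGE_ACCOUNT_NAME=%s\nAZURE_STORAGE_ACCOUNT_KEY=%s\n' \
      "${AZURE_STORAGE_ACCOUNT_NAME:?must be set}" \
      "${AZURE_STORAGE_ACCOUNT_KEY:?must be set}" > "$target"
  )
}

# Create the secret from the file, then list its keys and sizes (values are not printed).
create_and_check_secret() {
  local name="$1" file="$2"
  kubectl create secret generic "$name" --from-file=credentials="$file"
  kubectl describe secret "$name"
}

# Example (requires cluster access):
#   write_azure_credentials ./azure-credentials
#   create_and_check_secret azurecred ./azure-credentials
```

The describe output should list a single key named credentials, which is the value to use for secretKeyName.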
Configure the backup parameters
You can configure the backup parameters in the backup-values.yaml file either by using the secretName and secretKeyName parameters or by mapping the Kubernetes service account to the workload identity integration.
The following examples show the minimum configuration required to perform a backup to a cloud provider bucket. For more information about the available backup parameters, see Backup parameters.
Configure the backup-values.yaml file using the secretName and secretKeyName parameters
For GCP:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin" #This is the Neo4j Admin Service name.
  database: "neo4j,system"
  cloudProvider: "gcp"
  secretName: "gcpcreds"
  secretKeyName: "credentials"
consistencyCheck:
  enabled: true
For AWS:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"
consistencyCheck:
  enabled: true
For Azure:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "azure"
  secretName: "azurecred"
  secretKeyName: "credentials"
consistencyCheck:
  enabled: true
Configure the backup-values.yaml file using service account workload identity integration
In certain situations, it may be useful to assign a Kubernetes Service Account with workload identity integration to the Neo4j backup pod. This is particularly relevant when you want to improve security and have more precise access control for the pod. Doing so ensures that secure access to resources is granted based on the pod’s identity within the cloud ecosystem. For more information on setting up a service account with workload identity, see Google Kubernetes Engine (GKE) → Use Workload Identity, Amazon EKS → Configuring a Kubernetes service account to assume an IAM role, and Microsoft Azure → Use Microsoft Entra Workload ID with Azure Kubernetes Service (AKS).
To configure the Neo4j backup pod to use a Kubernetes service account with workload identity, set serviceAccountName to the name of the service account to use.
For Azure deployments, you also need to set the azureStorageAccountName parameter to the name of the Azure storage account where the backup files will be uploaded.
For example:
For GCP:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin" #This is the Neo4j Admin Service name.
  database: "neo4j,system"
  cloudProvider: "gcp"
  secretName: ""
  secretKeyName: ""
consistencyCheck:
  enabled: true
serviceAccountName: "demo-service-account"
For AWS:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: ""
  secretKeyName: ""
consistencyCheck:
  enabled: true
serviceAccountName: "demo-service-account"
For Azure:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.13.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "azure"
  azureStorageAccountName: "storageAccountName"
consistencyCheck:
  enabled: true
serviceAccountName: "demo-service-account"
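The service account referenced by serviceAccountName has to exist in the cluster and carry the annotation that the cloud provider uses to bind it to a cloud identity. The sketch below prints such a manifest. The function name render_service_account is hypothetical, the identity values are placeholders, and the annotation keys are the ones documented by GKE, EKS, and AKS respectively. The cloud-side bindings described in the linked provider guides are still required.

```shell
# Print a ServiceAccount manifest annotated for the chosen provider's workload identity.
# Usage: render_service_account <gcp|aws|azure> <identity>
render_service_account() {
  local provider="$1" identity="$2" annotation
  case "$provider" in
    gcp)   annotation="iam.gke.io/gcp-service-account" ;;    # identity: Google service account email
    aws)   annotation="eks.amazonaws.com/role-arn" ;;         # identity: IAM role ARN
    azure) annotation="azure.workload.identity/client-id" ;;  # identity: managed identity client ID
    *) echo "unknown provider: $provider" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-service-account
  annotations:
    ${annotation}: "${identity}"
EOF
}

# Example (requires cluster access; the role ARN is a placeholder):
#   render_service_account aws "arn:aws:iam::111122223333:role/neo4j-backup" | kubectl apply -f -
```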
The /backups mount created by default is an emptyDir type volume. This means that the data stored in this volume is not persistent and will be lost when the pod is deleted. To use a persistent volume for backups, add the following section to the backup-values.yaml file:
tempVolume:
  persistentVolumeClaim:
    claimName: backup-pvc
You need to create the persistent volume and persistent volume claim before installing the neo4j-admin Helm chart. For more information, see Volume mounts and persistent volumes.
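As an illustration of the claim that tempVolume refers to, the following sketch writes a PersistentVolumeClaim manifest named backup-pvc. The 10Gi size and the ReadWriteOnce access mode are placeholder assumptions, and the sketch assumes the cluster has a default StorageClass that provisions the volume dynamically. Otherwise, a matching persistent volume has to be created first.

```shell
# Write a PersistentVolumeClaim manifest for the claim name used in tempVolume.
# Size and access mode are assumptions; adjust them to the size of your backups.
cat > backup-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

# Apply it before installing the neo4j-admin chart (requires cluster access):
#   kubectl apply -f backup-pvc.yaml
#   kubectl get pvc backup-pvc
```

The claim must live in the same namespace as the backup release, because a pod can only mount claims from its own namespace.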
Configure the backup-values.yaml file for using MinIO
This feature is available from Neo4j 5.14.
MinIO is an AWS S3-compatible object storage API. You can specify the minioEndpoint parameter in the backup-values.yaml file to push your backups to your MinIO bucket. This endpoint must be an S3 API endpoint; otherwise, the backup Helm chart fails. Only non-TLS/SSL endpoints are supported.
For example:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.14.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  minioEndpoint: "http://demo.minio.svc.cluster.local:9000"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"
consistencyCheck:
  enabled: true
Prepare to back up a database(s) to on-premises storage
This feature is available from Neo4j 5.16.
You can back up one or more Neo4j databases to on-premises storage using the neo4j/neo4j-admin Helm chart.
When configuring the backup-values.yaml file, keep the cloudProvider field empty and provide a persistent volume in the tempVolume section to ensure that the backup files persist if the pod is deleted.
You need to create the persistent volume and persistent volume claim before installing the neo4j-admin Helm chart. For more information, see Volume mounts and persistent volumes.
For example:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.16.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: ""
consistencyCheck:
  enabled: true
tempVolume:
  persistentVolumeClaim:
    claimName: backup-pvc
Backup parameters
To see which options are configurable on the Helm chart, run helm show values against the neo4j/neo4j-admin Helm chart.
From Neo4j 5.10, the neo4j/neo4j-admin Helm chart also supports assigning your Neo4j pods to specific nodes using nodeSelector labels, and from Neo4j 5.11, using affinity/anti-affinity rules or tolerations.
For more information, see Assigning backup pods to specific nodes and the Kubernetes official documentation on Affinity and anti-affinity rules and Taints and Tolerations.
For example:
helm show values neo4j/neo4j-admin
## @param nameOverride String to partially override common.names.fullname
nameOverride: ""
## @param fullnameOverride String to fully override common.names.fullname
fullnameOverride: ""
# disableLookups will disable all the lookups done in the helm charts
# This should be set to true when using ArgoCD since ArgoCD uses helm template and the helm lookups will fail
# You can enable this when executing helm commands with --dry-run command
disableLookups: false
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.21.0"
  podLabels: {}
  #  app: "demo"
  #  acac: "dcdddc"
  podAnnotations: {}
  #  ssdvvs: "svvvsvs"
  #  vfsvswef: "vcfvgb"
  # define the backup job schedule . default is * * * * *
  jobSchedule: ""
  # default is 3
  successfulJobsHistoryLimit:
  # default is 1
  failedJobsHistoryLimit:
  # default is 3
  backoffLimit:
  #add labels if required
  labels: {}
backup:
  # Ensure the bucket is already existing in the respective cloud provider
  # In case of azure the bucket is the container name in the storage account
  # bucket: azure-storage-container
  bucketName: ""
  #address details of the neo4j instance from which backup is to be done (serviceName or ip either one is required)
  #ex: standalone-admin.default.svc.cluster.local:6362
  # admin service name - standalone-admin
  # namespace - default
  # cluster domain - cluster.local
  # port - 6362
  #ex: 10.3.3.2:6362
  # admin service ip - 10.3.3.2
  # port - 6362
  databaseAdminServiceName: ""
  databaseAdminServiceIP: ""
  #default name is 'default'
  databaseNamespace: ""
  #default port is 6362
  databaseBackupPort: ""
  #default value is cluster.local
  databaseClusterDomain: ""
  # specify minio endpoint ex: http://demo.minio.svc.cluster.local:9000
  # please ensure this endpoint is the s3 api endpoint or else the backup helm chart will fail
  # as of now it works only with non tls endpoints
  # to be used only when aws is used as cloudProvider
  minioEndpoint: ""
  #name of the database to backup ex: neo4j or neo4j,system (You can provide command separated database names)
  # In case of comma separated databases failure of any single database will lead to failure of complete operation
  database: ""
  # cloudProvider can be either gcp, aws, or azure
  # if cloudProvider is empty then the backup will be done to the /backups mount.
  # the /backups mount can point to a persistentVolume based on the definition set in tempVolume
  cloudProvider: ""
  # name of the kubernetes secret containing the respective cloud provider credentials
  # Ensure you have read,write access to the mentioned bucket
  # For AWS :
  # add the below in a file and create a secret via
  # 'kubectl create secret generic awscred --from-file=credentials=/demo/awscredentials'
  # [ default ]
  # region = us-east-1
  # aws_access_key_id = XXXXX
  # aws_secret_access_key = XXXX
  # For AZURE :
  # add the storage account name and key in below format in a file create a secret via
  # 'kubectl create secret generic azurecred --from-file=credentials=/demo/azurecredentials'
  # AZURE_STORAGE_ACCOUNT_NAME=XXXX
  # AZURE_STORAGE_ACCOUNT_KEY=XXXX
  # For GCP :
  # create the secret via the gcp service account json key file.
  # ex: 'kubectl create secret generic gcpcred --from-file=credentials=/demo/gcpcreds.json'
  secretName: ""
  # provide the keyname used in the above secret
  secretKeyName: ""
  # provide the azure storage account name
  # this to be provided when you are using workload identity integration for azure
  azureStorageAccountName: ""
  #setting this to true will not delete the backup files generated at the /backup mount
  keepBackupFiles: true
  #Below are all neo4j-admin database backup flags / options
  #To know more about the flags read here : https://neo4j.com/docs/operations-manual/current/backup-restore/online-backup/
  pageCache: ""
  includeMetadata: "all"
  type: "AUTO"
  keepFailed: false
  parallelRecovery: false
  verbose: true
  heapSize: ""
  # https://neo4j.com/docs/operations-manual/current/backup-restore/aggregate/
  # Performs aggregate backup. If enabled, NORMAL BACKUP WILL NOT BE DONE only aggregate backup
  # fromPath supports only s3 or local mount. For s3 , please set cloudProvider to aws and use either serviceAccount or creds
  aggregate:
    enabled: false
    verbose: true
    keepOldBackup: false
    parallelRecovery: false
    # Only AWS S3 or local mount paths are supported
    # For S3 provide the complete path , Ex: s3://bucket1/bucket2
    fromPath: ""
    # database name to aggregate. Can contain * and ? for globbing.
    database: ""
#Below are all neo4j-admin database check flags / options
#To know more about the flags read here : https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin/consistency-checker/
consistencyCheck:
  enable: false
  checkIndexes: true
  checkGraph: true
  checkCounts: true
  checkPropertyOwners: true
  #The database name for which consistency check needs to be done.
  #Defaults to the backup.database values if left empty
  #The database name here should match with one of the database names present in backup.database. If not , the consistency check will be ignored
  database: ""
  maxOffHeapMemory: ""
  threads: ""
  verbose: true
# Set to name of an existing Service Account to use if desired
# Follow the following links for setting up a service account with workload identity
# Azure - https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=go
# GCP - https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
# AWS - https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html
serviceAccountName: ""
# Volume to use as temporary storage for files before they are uploaded to cloud. For large databases local storage may not have sufficient space.
# In that case set an ephemeral or persistent volume with sufficient space here
# The chart defaults to an emptyDir, use this to overwrite default behavior
#tempVolume:
#  persistentVolumeClaim:
#    claimName: backup-pvc
# securityContext defines privilege and access control settings for a Pod. Making sure that we don't run Neo4j as root user.
securityContext:
  runAsNonRoot: true
  runAsUser: 7474
  runAsGroup: 7474
  fsGroup: 7474
  fsGroupChangePolicy: "Always"
# default ephemeral storage of backup container
resources:
  requests:
    ephemeralStorage: "4Gi"
    cpu: ""
    memory: ""
  limits:
    ephemeralStorage: "5Gi"
    cpu: ""
    memory: ""
# nodeSelector labels
# please ensure the respective labels are present on one of nodes or else helm charts will throw an error
nodeSelector: {}
#  label1: "true"
#  label2: "value1"
# set backup pod affinity
affinity: {}
#  podAffinity:
#    requiredDuringSchedulingIgnoredDuringExecution:
#      - labelSelector:
#          matchExpressions:
#            - key: security
#              operator: In
#              values:
#                - S1
#        topologyKey: topology.kubernetes.io/zone
#  podAntiAffinity:
#    preferredDuringSchedulingIgnoredDuringExecution:
#      - weight: 100
#        podAffinityTerm:
#          labelSelector:
#            matchExpressions:
#              - key: security
#                operator: In
#                values:
#                  - S2
#          topologyKey: topology.kubernetes.io/zone
#Add tolerations to the Neo4j pod
tolerations: []
#  - key: "key1"
#    operator: "Equal"
#    value: "value1"
#    effect: "NoSchedule"
#  - key: "key2"
#    operator: "Equal"
#    value: "value2"
#    effect: "NoSchedule"
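Before installing, a values file can be rendered without creating any resources, which catches indentation mistakes and misspelled keys early. The sketch below is an assumption-laden helper rather than part of the chart: the function name dry_run_backup_chart is hypothetical, and setting disableLookups=true follows the hint in the values comments above about running Helm commands with --dry-run.

```shell
# Render the chart with a values file without creating any resources.
# Usage: dry_run_backup_chart <release-name> <values-file>
dry_run_backup_chart() {
  local release="$1" values_file="$2"
  [ -f "$values_file" ] || { echo "values file not found: $values_file" >&2; return 1; }
  helm install "$release" neo4j/neo4j-admin \
    -f "$values_file" \
    --set disableLookups=true \
    --dry-run
}

# Example (requires the neo4j Helm repository to be added):
#   dry_run_backup_chart backup-name /path/to/your/backup-values.yaml
```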
Back up your database(s)
To back up your databases, install the neo4j-admin Helm chart using the configured backup-values.yaml file.
- Install the neo4j-admin Helm chart using the backup-values.yaml file:
helm install backup-name neo4j/neo4j-admin -f /path/to/your/backup-values.yaml
The neo4j/neo4j-admin Helm chart installs a CronJob that launches a pod according to the job schedule. This pod backs up one or more databases, runs a consistency check on the backup files, and uploads them to the cloud provider bucket.
- Monitor the backup pod logs using kubectl logs pod/<neo4j-backup-pod-name> to check the progress of the backup.
- Check that the backup files and the consistency check reports have been uploaded to the cloud provider bucket or on-premises storage.
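Because the CronJob creates a new job and pod on every run, the pod name changes each time. The following sketch looks up the newest job whose name contains a given substring and follows its logs. The function names are hypothetical, and the default substring backup is an assumption; check the actual job names with kubectl get jobs first.

```shell
# Print the newest Job in a namespace whose name contains the given pattern.
newest_job() {
  local namespace="$1" pattern="$2"
  kubectl get jobs -n "$namespace" --sort-by=.metadata.creationTimestamp -o name \
    | grep -- "$pattern" | tail -n 1
}

# Follow the logs of that job's pod.
follow_backup_logs() {
  local namespace="${1:-default}" pattern="${2:-backup}" job
  job="$(newest_job "$namespace" "$pattern")"
  [ -n "$job" ] || { echo "no job matching '$pattern' in namespace $namespace" >&2; return 1; }
  kubectl logs -n "$namespace" -f "$job"
}

# Example (requires cluster access):
#   follow_backup_logs default backup-name
```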
Aggregate a database backup chain
The aggregate backup command turns a backup chain into a single backup file. This is useful when you have a backup chain that you want to restore to a different cluster, or when you want to archive a backup chain. For more information on the benefits of the aggregate backup chain operation, its syntax and available options, see Aggregate a database backup chain.
The neo4j-admin Helm chart supports aggregating a backup chain stored in an AWS S3 bucket or a local mount. If aggregation is enabled, the chart performs only the aggregate backup and skips the normal backup.
- To aggregate a backup chain stored in an AWS S3 bucket or a local mount, provide the following information in your backup-values.yaml file:
If your backup chain is stored on AWS S3, set cloudProvider to aws and use either creds or serviceAccount to connect to your AWS S3 bucket. For example:
Connect to your AWS S3 bucket using the awscreds secret
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.21.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"
  aggregate:
    enabled: true
    verbose: false
    keepOldBackup: false
    parallelRecovery: false
    fromPath: "s3://bucket1/bucket2"
    # Database name to aggregate. Can contain * and ? for globbing.
    database: "neo4j"
resources:
  requests:
    ephemeralStorage: "4Gi"
  limits:
    ephemeralStorage: "5Gi"
Connect to your AWS S3 bucket using serviceAccount
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.21.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3
backup:
  cloudProvider: "aws"
  aggregate:
    enabled: true
    verbose: false
    keepOldBackup: false
    parallelRecovery: false
    fromPath: "s3://bucket1/bucket2"
    # Database name to aggregate. Can contain * and ? for globbing.
    database: "neo4j"
#The service account must already exist in your cloud provider account and have the necessary permissions to manage your S3 bucket, as well as to download and upload files. See the example policy below.
#{
#  "Version": "2012-10-17",
#  "Id": "Neo4jBackupAggregatePolicy",
#  "Statement": [
#    {
#      "Sid": "Neo4jBackupAggregateStatement",
#      "Effect": "Allow",
#      "Action": [
#        "s3:ListBucket",
#        "s3:GetObject",
#        "s3:PutObject",
#        "s3:DeleteObject"
#      ],
#      "Resource": [
#        "arn:aws:s3:::mybucket/*",
#        "arn:aws:s3:::mybucket"
#      ]
#    }
#  ]
#}
serviceAccountName: "my-service-account"
resources:
  requests:
    ephemeralStorage: "4Gi"
  limits:
    ephemeralStorage: "5Gi"
If your backup chain is stored on a local mount, set fromPath to the mount path and provide the volume in the tempVolume section. For example:
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.21.0"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  backoffLimit: 1
backup:
  aggregate:
    enabled: true
    verbose: false
    keepOldBackup: false
    parallelRecovery: false
    fromPath: "/backups"
    # Database name to aggregate. Can contain * and ? for globbing.
    database: "neo4j"
tempVolume:
  persistentVolumeClaim:
    claimName: aggregate-pv-pvc
resources:
  requests:
    ephemeralStorage: "4Gi"
  limits:
    ephemeralStorage: "5Gi"
- Install the neo4j-admin Helm chart using the configured backup-values.yaml file:
helm install backup-name neo4j/neo4j-admin -f /path/to/your/backup-values.yaml
- Monitor the pod logs using kubectl logs pod/<neo4j-aggregate-backup-pod-name> to check the progress of the aggregate backup operation.
- Verify that the aggregated backup file has replaced your backup chain in the cloud provider bucket or on-premises storage.
Restore a single database
To restore a single offline database or a database backup, you first need to delete the database that you want to replace, unless you want to restore the backup as an additional database in your DBMS.
Then, use the restore command of neo4j-admin to restore the database backup.
Finally, use the Cypher command CREATE DATABASE name to create the restored database in the system database.
Delete the database that you want to replace
Before you restore the database backup, you have to delete the database that you want to replace with that backup using the Cypher command DROP DATABASE name against the system database.
If you want to restore the backup as an additional database in your DBMS, then you can proceed to the next section.
For Neo4j cluster deployments, you run the Cypher command DROP DATABASE name only on one of the cluster servers.
- Connect to the Neo4j DBMS:
kubectl exec -it <release-name>-0 -- bash
- Connect to the system database using cypher-shell:
cypher-shell -u neo4j -p <password> -d system
- Drop the database you want to replace with the backup:
DROP DATABASE neo4j;
- Exit the Cypher Shell command-line console:
:exit;
Restore the database backup
You use the neo4j-admin database restore command to restore the database backup, and then the Cypher command CREATE DATABASE name to create the restored database in the system database.
For information about the command syntax, options, and usage, see Restore a database backup.
For Neo4j cluster deployments, restore the database backup on each cluster server.
- Run the neo4j-admin database restore command to restore the database backup:
neo4j-admin database restore neo4j --from-path=/backups/neo4j --expand-commands
- Connect to the system database using cypher-shell:
cypher-shell -u neo4j -p <password> -d system
- Create the neo4j database. For Neo4j cluster deployments, you run the Cypher command CREATE DATABASE name only on one of the cluster servers.
CREATE DATABASE neo4j;
- Open the browser at http://<external-ip>:7474/browser/ and check that all data has been successfully restored.
- Execute a Cypher command against the neo4j database, for example:
MATCH (n) RETURN n
If you have backed up your database with the option --include-metadata, you can manually restore the users and roles metadata. For more information, see Restore a database backup → Example.
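On a large database, returning every node is slow, so a node count is a cheaper first check that the restore succeeded. The sketch below runs the count through cypher-shell inside a Neo4j pod. The function name count_nodes is hypothetical, and the pod name and password are placeholders to replace with your own values.

```shell
# Count the nodes in a database by running cypher-shell inside a Neo4j pod.
# Usage: count_nodes <pod-name> <password> [database]
count_nodes() {
  local pod="$1" password="$2" database="${3:-neo4j}"
  kubectl exec "$pod" -- cypher-shell -u neo4j -p "$password" -d "$database" \
    --format plain "MATCH (n) RETURN count(n) AS nodes;"
}

# Example (requires cluster access):
#   count_nodes <release-name>-0 <password> neo4j
```

Comparing this number with the count taken on the source database before the backup gives a quick, though not exhaustive, sanity check.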
To restore the |