Restore backups and snapshots

Restoring your Neo4j database from a backup or snapshot is likely to cause data loss and data corruption in your CDC application. The following sections outline the issues you may run into if you use CDC and intend to restore a database backup, and how to mitigate them.

Unexpected change events

When you restore a Neo4j database from a previous snapshot, the changes made since the snapshot was taken are reverted in the database. There is no process for letting your CDC client application know that these changes "never happened". Effectively, your CDC client application has already processed changes that, according to your Neo4j database, never took place.

Some examples of problematic scenarios are:

  1. Entities being created "a second time"
    A node/relationship created between the snapshot and restore operation exists in your CDC application but not in the Neo4j database. If the node/relationship is (re-)created in your Neo4j database, your CDC application sees a duplicate node/relationship.

  2. "Missing" entities
    A node/relationship that was deleted between the snapshot and the restore operation does not exist in your CDC application but does exist in the Neo4j database (again). If the node/relationship is updated, your CDC application sees an update to a previously deleted entity.

  3. Out of sync attributes
    Changes on attributes may have been rolled back in the restore operation. For example, a Movie node in your Neo4j database may have a rating of 5 stars, whereas the latest change seen by your CDC application altered the rating from 5 stars to 3 stars.

If your CDC application relies on an internal state, you probably need to rebuild that state as if the application were working with a new database that already contains data. For details on this process, see Initialize CDC applications from an existing database.
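
The first two scenarios, and the stale attribute from the third, can be reproduced with a toy in-memory replica. This is only a sketch under stated assumptions: the `Replica` class, the `(operation, key, properties)` event shape, and the movie titles are hypothetical and are not part of any Neo4j API. The sketch shows how state built from change events drifts from the database once a restore reverts changes the application has already consumed.

```python
class Replica:
    """Toy stand-in for the internal state of a CDC application (hypothetical)."""

    def __init__(self):
        self.nodes = {}      # business key -> properties
        self.anomalies = []  # descriptions of events that contradict the state

    def apply(self, op, key, props=None):
        if op == "create":
            if key in self.nodes:
                self.anomalies.append(f"duplicate create: {key}")
            self.nodes[key] = dict(props or {})
        elif op == "update":
            if key not in self.nodes:
                self.anomalies.append(f"update to unknown entity: {key}")
                self.nodes[key] = {}
            self.nodes[key].update(props or {})
        elif op == "delete":
            if self.nodes.pop(key, None) is None:
                self.anomalies.append(f"delete of unknown entity: {key}")
        else:
            raise ValueError(f"unknown operation: {op}")


def simulate_restore():
    replica = Replica()
    # Consumed before the snapshot was taken.
    replica.apply("create", "Heat", {"rating": 5})
    replica.apply("create", "Alien", {"rating": 4})
    # Consumed after the snapshot; the restore reverts these in the database only.
    replica.apply("create", "Dune", {"rating": 4})
    replica.apply("delete", "Alien")
    replica.apply("update", "Heat", {"rating": 3})
    # Consumed after the restore. The database again holds Heat (5 stars)
    # and Alien, but not Dune.
    replica.apply("create", "Dune", {"rating": 4})   # scenario 1: created "a second time"
    replica.apply("update", "Alien", {"rating": 2})  # scenario 2: "missing" entity
    return replica


replica = simulate_restore()
print(replica.anomalies)
print(replica.nodes["Heat"])  # scenario 3: still 3 stars, the database says 5
```

In this sketch the replica records two anomalies and keeps the stale 3-star rating for Heat, which is why rebuilding the application state from the restored database is usually safer than trying to patch it.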

Loss of change events

The Neo4j restore procedures replace the existing database with a new database. This database has a new transaction log and any change identifier (cursor) generated from the old transaction log is not usable with the new database. Furthermore, ensure that the transaction log retention setting of the restored database matches that of your existing database.

The following steps outline how to mitigate this issue.

  1. Set the database into read-only mode.

  2. Take note of the CDC mode of your database.

  3. Verify that your CDC application is caught up and has consumed all changes in the database.

  4. Stop your CDC application.

  5. Restore your database from backup.

  6. Set the restored database into read-only mode.

  7. Ensure that the restored database has the same CDC mode value as your existing database.

  8. Restart your CDC application, using db.cdc.earliest as the starting change identifier.

  9. Enable writes to the restored database.

  10. Verify that changes are consumed by your CDC application.
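
The steps above can be written down as a checklist that pairs each step with the Cypher statement likely to be involved. This is a minimal sketch, not an official runbook: the `restore_plan` helper is hypothetical, it assumes a deployment where the `ALTER DATABASE` and `SHOW DATABASE` administration commands are available, and it assumes the CDC mode is stored in the `txLogEnrichment` database option. Steps that happen outside Cypher (stopping the application, running the restore itself) carry no statement.

```python
def restore_plan(db_name, cdc_mode):
    """Return the ten steps as (description, Cypher statement or None) pairs.

    db_name is interpolated as-is, so names with special characters would
    need backtick quoting. cdc_mode is the value noted in step 2.
    """
    return [
        ("Set the database to read-only mode",
         f"ALTER DATABASE {db_name} SET ACCESS READ ONLY"),
        ("Note the CDC mode of the database",
         f"SHOW DATABASE {db_name} YIELD name, options"),
        ("Verify the application is caught up (expect no rows for its last cursor)",
         "CALL db.cdc.query($lastCursor)"),
        ("Stop the CDC application", None),
        ("Restore the database from backup", None),
        ("Set the restored database to read-only mode",
         f"ALTER DATABASE {db_name} SET ACCESS READ ONLY"),
        ("Give the restored database the same CDC mode",
         f"ALTER DATABASE {db_name} SET OPTION txLogEnrichment '{cdc_mode}'"),
        ("Restart the application from the earliest change identifier",
         "CALL db.cdc.earliest()"),
        ("Enable writes to the restored database",
         f"ALTER DATABASE {db_name} SET ACCESS READ WRITE"),
        ("Verify that changes are consumed by the application", None),
    ]


for number, (description, statement) in enumerate(restore_plan("movies", "FULL"), start=1):
    print(f"{number}. {description}")
    if statement is not None:
        print(f"   {statement}")
```

Keeping the plan as data rather than executing it directly makes it easy to review before a restore. Check each statement against the documentation for your Neo4j version before running it.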

Inconsistent elementIds

The elementIds returned by db.cdc.query are not guaranteed to be stable across operations that change the database. After restoring a backup, new changes to existing elements are published under a new elementId.

You should use logical/business keys instead of elementId to identify elements. For details, see The role of elementIds and key properties.
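
As an illustration of identifying elements by key properties, the following sketch derives a lookup key from a change event without using its elementId. The event shape here is a simplified, hypothetical stand-in (a flat dictionary with `elementId`, `labels`, and `keys`), not the exact structure returned by db.cdc.query, and `logical_key` is an illustrative helper name.

```python
def logical_key(event):
    """Build a stable identity from labels and key properties.

    The elementId is deliberately ignored, because it may differ after a
    restore even though the event refers to the same logical element.
    """
    keys = event.get("keys") or {}
    if not keys:
        raise ValueError("event carries no key properties; cannot identify it logically")
    return (tuple(sorted(event.get("labels", []))), tuple(sorted(keys.items())))


# Two events for the same movie, observed before and after a restore.
# The elementId values are made up for the example.
before_restore = {"elementId": "4:aaaa:12", "labels": ["Movie"], "keys": {"title": "Heat"}}
after_restore = {"elementId": "4:bbbb:97", "labels": ["Movie"], "keys": {"title": "Heat"}}

print(before_restore["elementId"] == after_restore["elementId"])  # False
print(logical_key(before_restore) == logical_key(after_restore))  # True
```

Raising an error for events without key properties is a design choice in this sketch: it surfaces elements that cannot be tracked reliably across a restore instead of silently falling back to the elementId.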