CDC on Neo4j Aura

Neo4j extracts CDC information from the transaction log. By default, however, the transaction log does not contain information that CDC can use directly. For CDC to work, the transaction log needs to be enriched with further information, which is enabled through an extra configuration option on each database. As soon as CDC is enabled, the database is ready to answer CDC queries from client applications.

CDC has three working modes:

  • OFF — CDC is disabled (default).

  • DIFF — Changes are captured as the difference between before and after states of each changed entity (i.e. they only contain removals, updates and additions).

  • FULL — Changes are recorded as a complete copy of the before and after states of each changed entity (i.e. they contain the full node/relationship, regardless of the extent to which it was altered).
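
The contrast between DIFF and FULL can be illustrated with a toy model. The sketch below is not the actual CDC event format: the function names `diff_change` and `full_change`, the dictionary keys, and the sample properties are all hypothetical, chosen only to show what "difference" versus "complete copy" means for a single changed entity.

```python
# Toy model only: these dictionaries are NOT the real CDC event structure.

def diff_change(before, after):
    """DIFF-style summary: only what was removed, updated, or added."""
    return {
        "removed": {k: before[k] for k in before if k not in after},
        "updated": {
            k: {"old": before[k], "new": after[k]}
            for k in before
            if k in after and before[k] != after[k]
        },
        "added": {k: after[k] for k in after if k not in before},
    }


def full_change(before, after):
    """FULL-style summary: complete copies of both states."""
    return {"before": dict(before), "after": dict(after)}


# Hypothetical properties of one node before and after a transaction.
before = {"name": "Alice", "age": 41, "city": "Malmo"}
after = {"name": "Alice", "age": 42, "email": "alice@example.com"}

diff = diff_change(before, after)
full = full_change(before, after)
print(diff)  # the unchanged "name" property does not appear
print(full)  # both states appear in full, including "name"
```

In this model the unchanged `name` property is absent from the DIFF-style result but present twice in the FULL-style result, which is the trade-off between payload size and completeness.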

Enable/Toggle CDC mode

Admin users can change the CDC mode for a database through the Edit CDC Mode setting, accessible via the Aura instance options. Non-admin users may view the current CDC mode, but may not edit it.

Switching the CDC mode from DIFF to FULL or vice versa immediately changes the structure of captured changes. Your CDC application must be able to handle the change of format.

Disable CDC

Admin users can disable CDC for a database by setting the mode to OFF through the Edit CDC Mode setting, accessible via the Aura instance options. Non-admin users cannot disable CDC.

Disabling CDC immediately breaks the continuity of change events. Change identifiers generated before disabling can no longer be used and, even if CDC is re-enabled, the previously-generated change identifiers remain invalid. Disabling and then re-enabling CDC is equivalent to enabling it for the first time: there is no memory of previous changes.

CDC is automatically disabled for:

  • New instances

  • Cloned instances

  • Paused instances

  • Instances restored from a snapshot

Key considerations

Paused/Resumed databases

Although it is possible to pause a database that has CDC mode set to DIFF or FULL, the CDC mode is reset to OFF when the database is resumed.

When an instance is resumed, it behaves similarly to restoring a snapshot.

Security

CDC returns all changes in the database and is not limited to the entities that a given user is allowed to access. To prevent unauthorized access, the procedure db.cdc.query requires admin privileges, and access to it should be granted following the principle of least privilege.

For a regular user to be able to run db.cdc.query, the user must have been granted execute privileges as well as boosted execute privileges.

GRANT EXECUTE PROCEDURE db.cdc.query ON DBMS TO $role;
GRANT EXECUTE BOOSTED PROCEDURE db.cdc.query ON DBMS TO $role;

Non-boosted execute privileges are usually part of the PUBLIC role, in which case they do not need to be granted a second time.

Furthermore, a user can only query a database that they have been granted access to.

GRANT ACCESS ON DATABASE $database TO $role;

Usually the PUBLIC role already has access to the default database.

The procedures db.cdc.current and db.cdc.earliest do not require admin privileges. In order to execute these, access to the database and regular execution privileges are sufficient.

For more details regarding procedure privileges in Neo4j, see Operations Manual → Manage procedure and user-defined function permissions.

Transaction log retention

Since CDC information is stored in transaction log entries, the length of time for which the logs are retained dictates how far back in time your application may query for CDC data.

As new transactions come in, the server writes them to a log file. When that file exceeds 256MB, the server creates a new file and continues writing there. Transactions are never split across files, so if the current log is 255MB when a new 5MB transaction comes in, the file grows to 260MB before the server rotates to a new file.
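
The rotation rule can be checked with a small simulation. This is a minimal sketch of the behavior described above, not the server's implementation: the function name `write_transactions` is hypothetical, and the model assumes rotation happens lazily, just before the next transaction is written to a file that is already over the threshold.

```python
MB = 1024 * 1024
ROTATION_THRESHOLD = 256 * MB  # rotation threshold stated in the text


def write_transactions(tx_sizes, threshold=ROTATION_THRESHOLD):
    """Return the final size of each log file after writing tx_sizes in order.

    A transaction is always appended whole to the current file; a new file
    is started only once the current one has exceeded the threshold.
    """
    files = [0]
    for size in tx_sizes:
        if files[-1] > threshold:
            files.append(0)  # current file is over the threshold: rotate
        files[-1] += size    # the transaction is never split across files
    return files


# The example from the text: a 255MB log receives a 5MB transaction,
# followed by one more 10MB transaction that lands in the next file.
sizes = write_transactions([255 * MB, 5 * MB, 10 * MB])
print([s // MB for s in sizes])  # [260, 10]
```

The first file ends at 260MB rather than 256MB because the 5MB transaction is written in full before rotation takes place.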

Every 15 minutes or 100,000 transactions (whichever happens first), the server goes through the transaction log files from newest to oldest. Once the cumulative size of the scanned files exceeds 2GB, the file that pushed the sum over the limit and all files older than it are deleted, so that the total size is again below 2GB. Files older than 1 day are also deleted.

In other words, the server keeps at most 2GB worth of transaction logs, as long as they are all more recent than 1 day. Regarding size, the server keeps at least 2GB - 256MB worth of transaction logs (256MB are always needed for the current log file to grow into), although some may be deleted if they are older than 1 day. If large transactions make log files grow larger than 256MB, the minimum total retained size may be smaller than 2GB - 256MB.
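
The pruning pass can be sketched as follows. This is an illustrative model under stated assumptions, not the server's code: the `LogFile` type, the `prune` function, and the file names are hypothetical, and the model simply applies the size rule first and the age rule second.

```python
from dataclasses import dataclass

MB = 1024 * 1024
GB = 1024 * MB
SIZE_LIMIT = 2 * GB                # size limit stated in the text
AGE_LIMIT_SECONDS = 24 * 60 * 60   # 1 day


@dataclass
class LogFile:
    name: str          # hypothetical file name
    size: int          # bytes
    age_seconds: int   # age of the file when the pruning pass runs


def prune(files_newest_first, size_limit=SIZE_LIMIT, age_limit=AGE_LIMIT_SECONDS):
    """Return the files that survive one pruning pass."""
    kept = []
    scanned = 0
    for f in files_newest_first:
        scanned += f.size
        if scanned > size_limit:
            break  # this file and every older one are deleted
        kept.append(f)
    # Files older than the age limit are deleted as well.
    return [f for f in kept if f.age_seconds <= age_limit]


# Ten 260MB files, one per hour: the size rule decides.
hourly = [LogFile(f"log-{i}", 260 * MB, i * 3600) for i in range(1, 11)]
kept = prune(hourly)
print(len(kept), sum(f.size for f in kept) // MB)  # 7 1820

# The same files spaced five hours apart: the age rule removes more.
sparse = [LogFile(f"log-{i}", 260 * MB, i * 5 * 3600) for i in range(1, 11)]
kept_sparse = prune(sparse)
print(len(kept_sparse))  # 4
```

In the first case the eighth file pushes the scanned total to 2080MB, above the 2048MB limit, so seven files (1820MB) remain. In the second case the same seven files pass the size rule, but only the four that are at most 24 hours old are retained.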