Monitor replication status
Neo4j 5.24 introduces the dbms.cluster.statusCheck()
procedure to monitor the ability to replicate in clustered databases.
The procedure identifies which members of a clustered database are up-to-date and can participate in successful replication. Therefore, it is useful in determining the fault tolerance of a clustered database. Additionally, you can use the procedure to identify the leader of a clustered database within the cluster.
The procedure replicates a dummy transaction within the cluster and verifies that it can be replicated and applied. Since the status check does not replicate an actual transaction, it does not guarantee write availability, as other factors in the write path (e.g., database issues) may block transactions. However, a healthy status typically indicates write availability in most cases. |
Cluster status check
Syntax
CALL dbms.cluster.statusCheck(databases :: LIST<STRING>, timeoutMilliseconds = null :: INTEGER)
Input arguments
Name | Type | Description |
---|---|---|
|
List<String> |
Databases for which the status check should run. Providing an empty list runs the status check for all clustered databases on that server, i.e. it does not run on singles or secondaries. |
|
Integer |
How long to allow for replication, before returning it was unsuccessful. Default value is 1000 milliseconds. |
Return arguments
The procedure returns a row for all primary members of all the requested databases where each row consists of:
Name | Type | Description |
---|---|---|
|
String |
The database for which a |
|
String |
The UUID of the server, which did or did not participate in a successful replication of the |
|
String |
The friendly name of the server, or its UUID if no name is set. |
|
String |
The address of the Bolt port for the server. |
|
Boolean |
Indicates if the server (on which the procedure is run) can replicate a transaction. |
|
String |
The status of each primary member. |
|
String |
The server id of the perceived leader of each primary member. |
|
Integer |
The term of the perceived leader of each primary member. If the members report different leaders, the one with the highest term should be trusted. |
|
Boolean |
Whether a server is the requester or not. |
|
String |
Contains the error message if one is present. An example of an error is that one or more of the requested databases do not exist on the requester. |
Possible values of replicationSuccessful
-
TRUE
— if this server managed to replicate the dummy transaction to a majority of cluster members within the given timeout. -
FALSE
— if it failed to replicate within the timeout. The value is the same column-wise. A failed replication can either indicate a real issue in the cluster (e.g., no leader) or that this server is too far behind in applying updates and can’t replicate.
Possible values of memberStatus
-
APPLYING
means that the member can replicate and is actively applying transactions. -
REPLICATING
means that the member can participate in replicating but cannot apply. This state is uncommon, but may happen while waiting for the database to start and accept transactions. -
UNAVAILABLE
means that the member is either too far behind the leader or unreachable. They are unhealthy and cannot add to the fault-tolerance.
Possible values of requester
-
TRUE
— for the server on which the procedure is run. -
FALSE
— on the remaining servers.
In general, you can use the replicationSuccessful
field to determine overall write-availability, whereas the memberStatus
field can be checked in order to see whether the database is fault-tolerant or not.
Example
Running the status check
When running the cluster status check against a server, expect similar output to the following:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| database | serverId | serverName | address | replicationSuccessful | memberStatus | recognisedLeader | recognisedLeaderTerm | requester | error |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "neo4j" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "localhost:7682" | TRUE | "APPLYING" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | 4 | FALSE | NULL |
| "neo4j" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "localhost:7681" | TRUE | "APPLYING" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | 4 | TRUE | NULL |
| "neo4j" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "localhost:7683" | TRUE | "APPLYING" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | 4 | FALSE | NULL |
| "system" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "565130e8-b8f0-41ad-8f9d-c660bd8d5519" | "localhost:7681" | TRUE | "APPLYING" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | 1 | TRUE | NULL |
| "system" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "58c70f4b-910d-4d0e-b0f2-3084554079ec" | "localhost:7683" | TRUE | "APPLYING" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | 1 | FALSE | NULL |
| "system" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | "localhost:7682" | TRUE | "APPLYING" | "d3fe2e6a-494d-4ab8-81b1-7de2ce31ce11" | 1 | FALSE | NULL |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+