Deploy an analytics cluster
In analytic use cases, queries can be intensive enough to warrant running them on separate servers, away from the servers handling the transactional workload. This example shows how to set up a cluster to achieve that. For information on using the Neo4j Graph Data Science (GDS) library in a cluster, see Neo4j Graph Data Science Library Manual → GDS with Neo4j cluster.
The analytics cluster can be set up in two ways: with or without fault tolerance. Bear in mind that the GDS library does not support fault tolerance, and therefore GDS should only be deployed on servers configured for analytic queries, as described below.
With fault tolerance
Deploy the cluster
Starting with Neo4j 5.23, you can choose between two discovery services, v1 and v2, which allow servers to communicate with each other. Discovery service v2 is recommended for all new deployments, as v1 has been deprecated since Neo4j 5.23. For more details on the Neo4j discovery services, see Cluster server discovery.
In this example, three servers named server01.example.com, server02.example.com, and server03.example.com are configured as the transactional part of the cluster.
Two more servers, named server04.example.com and server05.example.com, are configured for the analytical queries.
Neo4j Enterprise Edition is installed on all five servers.
They are configured by preparing neo4j.conf on each server.
Key points:
- All servers include all members in their discovery list.
- The servers for analytics have mode constraints configured that restrict their hosting mode to secondary, which prevents them from participating in normal write operations.
- The example below uses the LIST resolver type (dbms.cluster.discovery.resolver_type=LIST).
To use the discovery service v2:
- Set dbms.cluster.discovery.version to V2_ONLY.
- Instead of dbms.cluster.discovery.endpoints, use dbms.cluster.discovery.v2.endpoints.
neo4j.conf on server01.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server01.example.com
dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000,server04.example.com:6000,server05.example.com:6000
dbms.cluster.discovery.version=V2_ONLY

neo4j.conf on server02.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server02.example.com
dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000,server04.example.com:6000,server05.example.com:6000
dbms.cluster.discovery.version=V2_ONLY

neo4j.conf on server03.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server03.example.com
dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000,server04.example.com:6000,server05.example.com:6000
dbms.cluster.discovery.version=V2_ONLY

neo4j.conf on server04.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server04.example.com
dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000,server04.example.com:6000,server05.example.com:6000
dbms.cluster.discovery.version=V2_ONLY
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY

neo4j.conf on server05.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server05.example.com
dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000,server04.example.com:6000,server05.example.com:6000
dbms.cluster.discovery.version=V2_ONLY
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY
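The five files differ only in the advertised address and in the two extra settings on the analytics servers, so they can be generated rather than written by hand. The following is a minimal Python sketch under that assumption; the helper name render_conf and the constants are hypothetical and not part of Neo4j, while the setting names and values are the ones shown above.

```python
# Hypothetical helper (not part of Neo4j): renders the neo4j.conf settings
# used in this example for each of the five servers.
TRANSACTIONAL = [
    "server01.example.com",
    "server02.example.com",
    "server03.example.com",
]
ANALYTICS = ["server04.example.com", "server05.example.com"]
DISCOVERY_PORT = 6000  # discovery service v2 port used in this example


def render_conf(host, all_hosts, analytics_hosts, port=DISCOVERY_PORT):
    """Return the neo4j.conf settings for one server as a single string."""
    endpoints = ",".join(f"{h}:{port}" for h in all_hosts)
    lines = [
        "server.default_listen_address=0.0.0.0",
        f"server.default_advertised_address={host}",
        f"dbms.cluster.discovery.v2.endpoints={endpoints}",
        "dbms.cluster.discovery.version=V2_ONLY",
    ]
    # Analytics servers are restricted to hosting databases in secondary mode.
    if host in analytics_hosts:
        lines += [
            "initial.server.mode_constraint=SECONDARY",
            "server.cluster.system_database_mode=SECONDARY",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    hosts = TRANSACTIONAL + ANALYTICS
    for host in hosts:
        print(f"# neo4j.conf on {host}")
        print(render_conf(host, hosts, ANALYTICS))
        print()
```

Running the script prints one block of settings per server, which can then be merged into each server's neo4j.conf.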
The Neo4j servers are ready to be started. The startup order does not matter.
After the cluster has started, it is possible to connect to any of the instances and run SHOW SERVERS to check the status of the cluster.
This shows information about each member of the cluster:
SHOW SERVERS;
+----------------------------------------------------------------------------------------------------------+
| name                                   | address         | state     | health      | hosting             |
+----------------------------------------------------------------------------------------------------------+
| "f3bd1199-bc6f-4a38-b25c-5f7588df5182" | "server01:7687" | "Enabled" | "Available" | ["system", "neo4j"] |
| "b331e481-c2ba-4b4e-82f3-bb51fe171483" | "server02:7687" | "Enabled" | "Available" | ["system"]          |
| "bd80e8fd-a51b-406a-9ed4-42daf4792aa6" | "server03:7687" | "Enabled" | "Available" | ["system"]          |
| "df3758b1-337f-4b8a-a9de-8e745ca96549" | "server04:7687" | "Free"    | "Available" | ["system"]          |
| "9207bfd9-aa1b-40c2-b965-edcd3955a20e" | "server05:7687" | "Free"    | "Available" | ["system"]          |
+----------------------------------------------------------------------------------------------------------+
For more extensive information about each server, use the SHOW SERVERS YIELD * command:
SHOW SERVERS YIELD *;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| serverId                               | name                                   | address         | state     | health      | hosting             | requestedHosting    | tags | allowedDatabases | deniedDatabases | modeConstraint | version |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "f3bd1199-bc6f-4a38-b25c-5f7588df5182" | "f3bd1199-bc6f-4a38-b25c-5f7588df5182" | "server01:7687" | "Enabled" | "Available" | ["system", "neo4j"] | ["system", "neo4j"] | []   | []               | []              | "NONE"         | "5.8.0" |
| "b331e481-c2ba-4b4e-82f3-bb51fe171483" | "b331e481-c2ba-4b4e-82f3-bb51fe171483" | "server02:7687" | "Enabled" | "Available" | ["system"]          | ["system"]          | []   | []               | []              | "NONE"         | "5.8.0" |
| "bd80e8fd-a51b-406a-9ed4-42daf4792aa6" | "bd80e8fd-a51b-406a-9ed4-42daf4792aa6" | "server03:7687" | "Enabled" | "Available" | ["system"]          | ["system"]          | []   | []               | []              | "NONE"         | "5.8.0" |
| "df3758b1-337f-4b8a-a9de-8e745ca96549" | "df3758b1-337f-4b8a-a9de-8e745ca96549" | "server04:7687" | "Free"    | "Available" | ["system"]          | []                  | []   | []               | []              | "SECONDARY"    | "5.8.0" |
| "9207bfd9-aa1b-40c2-b965-edcd3955a20e" | "9207bfd9-aa1b-40c2-b965-edcd3955a20e" | "server05:7687" | "Free"    | "Available" | ["system"]          | []                  | []   | []               | []              | "SECONDARY"    | "5.8.0" |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
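In the output above, server04 and server05 are in the Free state. Depending on how the cluster is configured, servers in that state may need to be enabled with the ENABLE SERVER command before they can host user databases. The following Python sketch shows one way to derive those statements from the SHOW SERVERS rows. It assumes the rows have already been fetched as dictionaries (for example through a driver, which is not shown here), and the function name enable_statements is hypothetical.

```python
# Hypothetical helper: builds ENABLE SERVER statements for servers that
# SHOW SERVERS reports in the "Free" state. Fetching the rows is out of scope.
def enable_statements(rows):
    """Return one ENABLE SERVER statement per row whose state is 'Free'."""
    return [
        f"ENABLE SERVER '{row['name']}'"
        for row in rows
        if row["state"] == "Free"
    ]


# Rows copied from the SHOW SERVERS output of this example.
rows = [
    {"name": "f3bd1199-bc6f-4a38-b25c-5f7588df5182", "state": "Enabled"},
    {"name": "b331e481-c2ba-4b4e-82f3-bb51fe171483", "state": "Enabled"},
    {"name": "bd80e8fd-a51b-406a-9ed4-42daf4792aa6", "state": "Enabled"},
    {"name": "df3758b1-337f-4b8a-a9de-8e745ca96549", "state": "Free"},
    {"name": "9207bfd9-aa1b-40c2-b965-edcd3955a20e", "state": "Free"},
]

if __name__ == "__main__":
    for statement in enable_statements(rows):
        print(statement)
```

The generated statements are meant to be run against the system database. Review them before executing, since enabling a server makes it eligible to host databases.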
To use the discovery service v1 instead, configure the same five servers with dbms.cluster.discovery.endpoints, as shown below. Starting the servers and checking the cluster with SHOW SERVERS works as described above.

neo4j.conf on server01.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server01.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000

neo4j.conf on server02.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server02.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000

neo4j.conf on server03.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server03.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000

neo4j.conf on server04.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server04.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY

neo4j.conf on server05.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server05.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY
Create new databases in the cluster
As mentioned in the Introduction, a server in a cluster can host a database in either primary or secondary mode. The primaries provide write operations and fault tolerance (if there is more than one of them). The secondaries provide somewhere for read queries to run, including the analytic queries for this example.
In the system database of the cluster from the previous example, execute the following Cypher command to create a new database:
CREATE DATABASE bar
TOPOLOGY 3 PRIMARIES 2 SECONDARIES
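Before running the command, it can help to check that the requested topology fits the servers in the cluster. Each server hosts at most one copy of a given database, and a server with the SECONDARY mode constraint can only host it in secondary mode. The following Python sketch encodes just those two rules. The function name topology_fits is hypothetical, and the check deliberately ignores other factors such as server state or allowed and denied databases.

```python
# Hypothetical helper: checks whether a requested topology can be placed on
# a set of servers, given only each server's mode constraint.
def topology_fits(primaries, secondaries, mode_constraints):
    """mode_constraints holds one of 'NONE', 'PRIMARY', 'SECONDARY' per server."""
    total = len(mode_constraints)
    # Servers constrained to SECONDARY cannot host a primary, and vice versa.
    can_be_primary = sum(1 for mode in mode_constraints if mode != "SECONDARY")
    can_be_secondary = sum(1 for mode in mode_constraints if mode != "PRIMARY")
    return (
        primaries + secondaries <= total
        and primaries <= can_be_primary
        and secondaries <= can_be_secondary
    )


# Mode constraints of the five servers in this example.
servers = ["NONE", "NONE", "NONE", "SECONDARY", "SECONDARY"]

if __name__ == "__main__":
    print(topology_fits(3, 2, servers))  # topology used in this example
    print(topology_fits(4, 1, servers))  # needs a fourth unconstrained server
```

With the five servers above, 3 primaries and 2 secondaries fit exactly: the three unconstrained servers take the primaries and the two analytics servers take the secondaries.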
Without fault tolerance
If fault tolerance is not a priority, it is possible to have a single server handling write operations for the DBMS. This means fewer servers are needed, but a failure can affect the entire DBMS.
If the single writer server or process fails, all databases will be unavailable for writes. They may be less available for reads as well, since that server manages the DBMS.
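The trade-off can be made concrete with the majority rule that applies to primaries: a database stays writable only while a majority of its primaries is available. The following Python sketch applies that rule; the function name is hypothetical.

```python
# Hypothetical helper: number of primary failures a database can tolerate
# while a majority of its primaries remains available.
def tolerated_primary_failures(primaries):
    """Return how many primaries can fail without losing the majority."""
    if primaries < 1:
        raise ValueError("a database needs at least one primary")
    return (primaries - 1) // 2


if __name__ == "__main__":
    for count in (1, 3, 5):
        failures = tolerated_primary_failures(count)
        print(f"{count} primaries tolerate {failures} failure(s)")
```

A single primary tolerates no failures, which is why the setup in this section trades availability for a smaller number of servers, whereas three primaries tolerate the loss of one.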
The following example shows how to set up a non-fault-tolerant analytics cluster with three members.
Deploy the cluster
In this example, three servers named server01.example.com, server02.example.com, and server03.example.com are configured.
Neo4j Enterprise Edition is installed on all three servers.
They are configured by preparing neo4j.conf on each server.
Note that server01.example.com is different from the others, and is the only server where write operations take place.
The other servers are able to execute read queries, and if using GDS, to write results back to the writing server.
Key points:
- The writer server only has itself in its discovery list. This means it does not seek out the other members when it starts; they have to discover it. This is required in order to have a cluster with only a single primary for the system database.
- The servers for analytics have mode constraints configured that restrict their hosting mode to secondary, which prevents them from participating in normal write operations.
- The example below uses the LIST resolver type (dbms.cluster.discovery.resolver_type=LIST).
To use the discovery service v2:
- Set dbms.cluster.discovery.version to V2_ONLY.
- Instead of dbms.cluster.discovery.endpoints, use dbms.cluster.discovery.v2.endpoints.
neo4j.conf on server01.example.com (the writer):
server.default_listen_address=0.0.0.0
server.default_advertised_address=server01.example.com
# Only has self in this list
dbms.cluster.discovery.v2.endpoints=server01.example.com:6000
dbms.cluster.discovery.version=V2_ONLY

neo4j.conf on server02.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server02.example.com
dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000
dbms.cluster.discovery.version=V2_ONLY
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY

neo4j.conf on server03.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server03.example.com
dbms.cluster.discovery.v2.endpoints=server01.example.com:6000,server02.example.com:6000,server03.example.com:6000
dbms.cluster.discovery.version=V2_ONLY
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY
The Neo4j servers are ready to be started. The startup order does not matter.
After the cluster has started, it is possible to connect to any of the instances and run SHOW SERVERS to check the status of the cluster.
This shows information about each member of the cluster:
SHOW SERVERS;
+----------------------------------------------------------------------------------------------------------+
| name                                   | address         | state     | health      | hosting             |
+----------------------------------------------------------------------------------------------------------+
| "d6fbe54b-0c6a-4959-9bcb-dcbbe80262a4" | "server01:7687" | "Enabled" | "Available" | ["system", "neo4j"] |
| "e56b49ea-243f-11ed-861d-0242ac120002" | "server02:7687" | "Free"    | "Available" | ["system"]          |
| "73e9a990-0a97-4a09-91e9-622bf0b239a4" | "server03:7687" | "Free"    | "Available" | ["system"]          |
+----------------------------------------------------------------------------------------------------------+
For more extensive information about each server, use the SHOW SERVERS YIELD * command:
SHOW SERVERS YIELD *;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| serverId                               | name                                   | address         | state     | health      | hosting             | requestedHosting    | tags | allowedDatabases | deniedDatabases | modeConstraint | version |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "d6fbe54b-0c6a-4959-9bcb-dcbbe80262a4" | "d6fbe54b-0c6a-4959-9bcb-dcbbe80262a4" | "server01:7687" | "Enabled" | "Available" | ["system", "neo4j"] | ["system", "neo4j"] | []   | []               | []              | "NONE"         | "5.8.0" |
| "e56b49ea-243f-11ed-861d-0242ac120002" | "e56b49ea-243f-11ed-861d-0242ac120002" | "server02:7687" | "Free"    | "Available" | ["system"]          | ["system"]          | []   | []               | []              | "SECONDARY"    | "5.8.0" |
| "73e9a990-0a97-4a09-91e9-622bf0b239a4" | "73e9a990-0a97-4a09-91e9-622bf0b239a4" | "server03:7687" | "Free"    | "Available" | ["system"]          | ["system"]          | []   | []               | []              | "SECONDARY"    | "5.8.0" |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
To use the discovery service v1 instead, configure the same three servers with dbms.cluster.discovery.endpoints, as shown below. Starting the servers and checking the cluster with SHOW SERVERS works as described above.

neo4j.conf on server01.example.com (the writer):
server.default_listen_address=0.0.0.0
server.default_advertised_address=server01.example.com
# Only has self in this list
dbms.cluster.discovery.endpoints=server01.example.com:5000

neo4j.conf on server02.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server02.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY

neo4j.conf on server03.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server03.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY
Create new databases in the cluster
As mentioned in the Introduction, a server in a cluster can host a database in either primary or secondary mode. For transactional workloads, a database topology with several primaries is preferred for fault tolerance and automatic failover. The database topology may prioritize secondaries over primaries if the workload is more analytical. Such a configuration is optimized for scalability, but it is not fault-tolerant and does not provide automatic failover.
In the system database on the writer from the previous example, execute the following Cypher command to create a new database:
CREATE DATABASE bar
TOPOLOGY 1 PRIMARY 2 SECONDARIES
Startup time
The instance may appear unavailable while it is joining the cluster. If you want to follow along with the startup, you can see the messages in neo4j.log.
Running analytic queries
If you run large regular Cypher queries, you can use server tags to identify the large servers and a routing policy to direct read queries to those servers. See Multi-data center routing for more details.
If using GDS, follow the guidance in Neo4j Graph Data Science Library Manual → GDS with Neo4j cluster.