Capacity Planning Example
Here is a back of the napkin example of capacity planning for a Neo4j workload for the following list of requirements:
Requirements
Requirement | Value |
---|---|
Number of total users |
100-200 (end users, most likely accessing via front end applications) |
Number of visits (read/queries) per day per user |
5 |
Number of Nodes |
50-75 MM |
Number of Relationships |
100 – 150 MM |
Number of Properties per Node |
Min 1, Max 50, Avg 5 |
Number # of Properties per Relationship |
Min 0, Max: 20, Avg: 2 |
Average request time |
500 ms |
Queries per second at peak |
200/second |
Frequency of batch inserts and updates |
4-5 times daily |
Batch size assume 10% of volumes provided |
~ 20 GB a day , 5 million nodes |
Max processing/ingest for delta volumes |
One hour |
RR |
in US + EU AWS |
DR |
DR In 2 US availability zones |
Analysis
1) Estimating an initial database size of about 38GB (see table below) - assuming:
-
20% for indexes
-
Max # of nodes and relationships with Avg props per node & relations
Number |
Bytes/Object |
Space(GBs) |
Properties subtotal |
|
Nodes |
75000000 |
15 |
1.048 |
|
Relationships |
150000000 |
34 |
4.750 |
|
Props / node |
5 |
41 |
14.319 |
|
Props / rel |
2 |
41 |
11.455 |
25.774 |
Index (percentage) |
20 |
6.314 |
||
Total |
37.886 |
2) Assuming daily loads of 5M nodes per day (or 10% - and let’s assume, we need to accommodate future growth of another 50% for the next year.)
3) We then arrive of an estimating about 100GB of total memory per instance [ 5 GB(OS) + 60 GB(data + indexes + 50%growth) + 30GB(Heap) ~ 100GB of total memory ]
4) Lastly, we estimate we will need about 10 CPU cores(or 20 vCPU cores) to accommodate peak demand of 200 queries/second with a response time of 500ms, among a cluster with 3 cores and 2 RRs(see below):
-
Number of concurrent requests per second 200/sec
-
Workload/Query processing time (w=0.50 sec)
-
CPU load factor 0.5 ( c=.5 ; Tha is CPU’s will be 50% busy on average)
-
Number of instance failures to design for F=1
-
Core count = r x w / c = 200 x .5 / .5 = 200
-
Cluster size = 3 cores + 2 RR
-
Core count per machine = 200/5 = 40 (or 80 vCPU cores)
-
Estimate:
-
Cluster configuration (5 Instances) x (80 vCPU cores per instance with 100GB of RAM)
-
Was this page helpful?