Neo4j Performance Tuning

Goals
This guide gives an overview of memory configuration, logical log configuration, and the Linux open file limit.
Prerequisites
You should know how to download and install Neo4j on your system. If you are a developer, you should be familiar with the graph data model and have written your Neo4j application.

Intermediate

Memory Configuration Guidelines

An in-depth description of this topic is available in the Neo4j Operations Manual.

There are three types of memory to consider: OS Memory, Page Cache and Heap Space.

OS Memory Sizing

Some memory must be reserved for all activities on the server that are not related to Neo4j. One gigabyte (1G) is enough in most cases when Neo4j is the only server running on the machine.

Make sure that the total configured memory (OS reserve, page cache and heap) does not exceed the available RAM. Otherwise the OS starts to swap to disk, which severely degrades performance.
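The rule above can be checked with a little arithmetic. The following is a minimal sketch, assuming sizes are written as a number plus a K, M or G suffix; the helper names parse_size and fits_in_ram are hypothetical and not part of Neo4j.

```python
# Hypothetical helpers (not part of Neo4j) for a quick memory budget check.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '16G' or '512M' to bytes (binary units)."""
    text = text.strip().upper()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # plain number of bytes


def fits_in_ram(total_ram, page_cache, heap, os_reserve="1G"):
    """Return True if OS reserve + page cache + heap fit into total RAM."""
    needed = parse_size(os_reserve) + parse_size(page_cache) + parse_size(heap)
    return needed <= parse_size(total_ram)


if __name__ == "__main__":
    # Example values only: a 32G machine with a 12G page cache and a 16G heap.
    print(fits_in_ram("32G", page_cache="12G", heap="16G"))
```

If the function returns False, reduce the page cache or the heap until the sum fits.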

Page Cache Sizing

Figure: Neo4j memory usage

The page cache is used to cache the Neo4j data as stored on disk. Keeping most of the graph data in memory helps avoid costly disk access. You can determine the total memory needed by summing up the sizes of the $NEO4J_HOME/data/graph.db/neostore.*.db files and adding, for example, 20% for growth.
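The sum described above can be scripted. This is a sketch only: the function names are hypothetical, the 20% growth margin is the example figure from the text, and the store directory must be adjusted to your installation.

```python
from pathlib import Path


def store_size_bytes(store_dir):
    """Sum the sizes of all neostore.*.db files in the store directory."""
    return sum(p.stat().st_size for p in Path(store_dir).glob("neostore.*.db"))


def page_cache_bytes(store_dir, growth_percent=20):
    """Store size plus a growth margin (integer arithmetic, rounded down)."""
    size = store_size_bytes(store_dir)
    return size + size * growth_percent // 100


if __name__ == "__main__":
    import os

    # Assumes NEO4J_HOME is set in the environment; falls back to ".".
    store = Path(os.environ.get("NEO4J_HOME", ".")) / "data" / "graph.db"
    print(page_cache_bytes(store) / 1024 ** 3, "GiB")
```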

To estimate the necessary page cache size for a dataset that is still to be imported, it is useful to run an import with a fraction (e.g. 1/100th) of the data and then multiply the resulting store size by the inverse of that fraction (here, x 100).
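As a worked example of this extrapolation, the sketch below scales a sample store size up to the full dataset. The function name and the 50 MiB sample size are assumptions made for illustration; the estimate also assumes that store size grows roughly linearly with the amount of data.

```python
def estimate_full_store_bytes(sample_store_bytes, scale_factor=100):
    """Extrapolate the full store size from an import of 1/scale_factor of the data."""
    return sample_store_bytes * scale_factor


if __name__ == "__main__":
    sample = 50 * 1024 ** 2  # hypothetical: the 1/100th import produced a 50 MiB store
    full = estimate_full_store_bytes(sample, scale_factor=100)
    print(full / 1024 ** 3, "GiB")
```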

The size of the page cache (e.g. 5G) is set with the parameter dbms.memory.pagecache.size in the file $NEO4J_HOME/conf/neo4j.conf. If it is not configured, Neo4j uses 50% of the available RAM in the system for its page cache by default.

Heap Sizing

The size of the available heap memory is an important aspect for the performance of Neo4j.

Generally speaking, it is beneficial to configure a large enough heap space to sustain concurrent operations. For many setups, a heap size between 8G and 16G is large enough to run Neo4j reliably.

The heap size is set in $NEO4J_HOME/conf/neo4j.conf with the parameters dbms.memory.heap.initial_size and dbms.memory.heap.max_size, each given as a number and a unit, for example 16G. It is recommended to set these two parameters to the same size.
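To illustrate what the resulting entries in neo4j.conf look like, here is a small sketch that renders the three memory settings named in this guide. The function name memory_settings is hypothetical, and the 16G and 5G values are only the example figures used in the text.

```python
def memory_settings(heap, page_cache):
    """Render neo4j.conf lines with initial and max heap set to the same size."""
    return "\n".join([
        f"dbms.memory.heap.initial_size={heap}",
        f"dbms.memory.heap.max_size={heap}",
        f"dbms.memory.pagecache.size={page_cache}",
    ])


if __name__ == "__main__":
    # Example values from the text; size them for your own machine.
    print(memory_settings(heap="16G", page_cache="5G"))
```

Passing a single heap value for both parameters keeps the initial and maximum heap size identical, as recommended above.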

For a more thorough discussion of this topic, refer to the heap memory configuration section in the Neo4j Operations Manual. That section also contains information about heap memory distribution and garbage collection tuning.

Logical Logs

Logical transaction logs in Neo4j are used when the database needs to be recovered after an unclean shutdown. They are also used for online backup operations, especially for incremental backups.

These transaction log files are rotated after surpassing a certain size (25 MB). The number of log files to keep, or the disk space they may use, can be configured.

It is recommended that the dbms.tx_log.rotation.retention_policy parameter is set to 7 days.
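One way to verify this recommendation is to read the setting back from the configuration file. The sketch below assumes neo4j.conf uses plain key=value lines with # comments; the helper names and the sample text are hypothetical.

```python
def parse_conf(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def retention_policy(text):
    """Return the configured retention policy, or None if it is not set."""
    return parse_conf(text).get("dbms.tx_log.rotation.retention_policy")


if __name__ == "__main__":
    sample = "# Transaction log settings\ndbms.tx_log.rotation.retention_policy=7 days\n"
    print(retention_policy(sample))
```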

Number of Open Files

The usual default of 1024 for the open file limit is often not enough, especially when many indexes are used or a server installation handles many connections (network sockets also count against that limit). Users are therefore encouraged to increase the limit to a realistic value of 40000 or more, depending on usage patterns. Setting this value with the ulimit command only affects the current session.

To set the value system-wide, follow the instructions for your platform.
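To see whether the current limit meets the recommendation, the limit can be queried from a script. This is a sketch using Python's standard resource module, which is only available on Unix-like systems. The function names are hypothetical, and the result reflects the limit of the process running the script, so run it as the user that starts Neo4j.

```python
import resource

RECOMMENDED_OPEN_FILES = 40000  # value suggested in the text above


def limit_is_sufficient(soft_limit, recommended=RECOMMENDED_OPEN_FILES):
    """Compare a soft limit against the recommendation; 'unlimited' always passes."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= recommended


def open_file_limit_ok(recommended=RECOMMENDED_OPEN_FILES):
    """Check the soft open file limit of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return limit_is_sufficient(soft, recommended)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)
    print("sufficient:", open_file_limit_ok())
```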