Analyzing a java heap dump
The purpose of this article is to help you go through the acquired heapdump with Eclipse MAT. It covers how to parse a large heap files and what to look for.
When you experience an OutOfMemory exception, it will produce a .hprof file if you have the below settings in the neo4j.conf file:
dbms.jvm.additional=-XX:+HeapDumpOnOutOfMemoryError
You can also add tweak the below settings to specify the directory path but ensure that you have enough disk space when such error occurs. |
dbms.jvm.additional=-XX:HeapDumpPath=/var/tmp/dumps
dbms.jvm.additional=-XX:OnOutOfMemoryError="tar cvzf /var/tmp/dump.tar.gz /var/tmp/dump;split -b 1G /var/tmp/dump.tar.gz;"
This file is the image of the heap part of the java process running on your system. The structure of the file depends on the JVM vendor you are running neo4j with.
Oracle JDK, Open JDK will produce hprof files and can be analyzed with most available tools. For IBM heap dumps, you need to parse it with IBM heap analyzer or other proprietary tool.
Change the settings in MemoryAnalyzer.ini
On your local environment
You need to allocate as much memory to the process as heap dump filesize you have.
IE: allocate 17GB if the heap is about 15GB.
For large heap dumps (> 25G), see next section.
Edit MemoryAnalyzer.ini (on macOS, it is located in /Applications/mat.app/Contents/Eclipse/MemoryAnalyzer.ini)
Add or change the settings:
-Xms10G -Xmx25G
On a remote machine
It’s better to upload it to an instance with a lot of disk and RAM on AWS/GCP/etc. If you choose AWS, use a spot instance.
Then you need to attach the EBS storage, create a 250GB volume, attach it to the EC2 instance. Format the volume and mount it on your Amazon Linux instance.
Note down both instanceid and storageid to make sure the ressource have properly been discarded after usage.
If the heap is about 61GB, you need twice as much disk space for parsing. As illustrated below:
$ du -ch java_pid19820*
116M java_pid19820.a2s.index
5.6G java_pid19820.domIn.index
17G java_pid19820.domOut.index
61G java_pid19820.hprof #original heap dump
256K java_pid19820.i2sv2.index
11G java_pid19820.idx.index
29G java_pid19820.inbound.index
197M java_pid19820.index
4.5G java_pid19820.o2c.index
12G java_pid19820.o2hprof.index
11G java_pid19820.o2ret.index
29G java_pid19820.outbound.index
988K java_pid19820.threads
68K java_pid19820_Component_Report_sel.zip
180G total
-
Pre-requisite Install java and make sure to have 250GB space available
-
Download MemoryAnalyzer tool for linux: download
-
Unzip it in a directory
-
Edit MemoryAnalyzer.ini to adjust both -Xms and -Xmx memory settings :
-startup plugins/org.eclipse.equinox.launcher_1.5.0.v20180512-1130.jar --launcher.library plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.700.v20180518-1200 -vmargs -Xms30G -Xmx100G
Parse the file on a remote machine
This step is optional if you run Eclipse MAT on your local machine and have enough resources. The index files will be created when opening the heapdump file if they are missing.
Run ./ParseHeapDump.sh heapdump.hprof
It is located in the folder mat of Eclipse Mat tar.gz installation file
Synchronize your local directory with the remote one
To speed up things, you can use rsync over ssh. The advantage is that you can recover if you have a crash and -z flag enables compression.
Example:
# on the remote machine
$ mkdir ${REMOTE_DIR}/parsed_files
$ mv *.index ${REMOTE_DIR}/parsed_files/
# on your local machine
$ rsync -P -e "ssh -i ${PATH_TO_KEY}" ec2-user@${REMOTE_IP}:${REMOTE_DIR}/heapdump.zip .
$ rsync -Prz -e "ssh -i ${PATH_TO_KEY} ec2-user@${REMOTE_IP}:${REMOTE_DIR}/parsed_files/ .
Open Eclipse MAT
To open the heapdump, go to File > Open Heap Dump (Not Acquire Heap Dump) and browse to your heapdump location.
No need to open an existing report, press cancel if you have a modal dialog.
In the Overview tab, left-click on the largest object(s)
Choose "list objects" > "with outgoing references".
It will open a new tab with the list of all the elements.
Expand the first level then expand everything at the second level.
Cypher query string
There are a lot of objects in a heap dump, no need to go through the Object[],byte[],Strings, etc.
You might want to filter for the class that contain PreParsed. Once found, list their outgoing references to cross check of the one that has the most instances. A new tab will open and you will be able to see the rawStatement of the Cypher queries.
Check the thread dumps
With thread dumps that has been taken before the heap dump
The garbage collector will not be able to collect the thread objects until the threading system also dereferences the object, which won’t happen if the thread is alive.
So if you have a large amount of memory in the heap, there should be a potentially long running thread associated to your large object.
To find it, look for the thread name in the thread dumps.
$ grep neo4j.BoltWorker-394 *
5913-tdump-201903291746.log:"neo4j.BoltWorker-394 [bolt]" #620 daemon prio=5 os_prio=0 tid=0x00007fb737619800 nid=0x8cec waiting on condition [0x00007fb38d00f000]
5913-tdump-201903291751.log:"neo4j.BoltWorker-394 [bolt] [/www.xxx.yyy.zzz:57570] " #620 daemon prio=5 os_prio=0 tid=0x00007fb737619800 nid=0x8cec runnable [0x00007fb38d00b000]
5913-tdump-201903291756.log:"neo4j.BoltWorker-394 [bolt] [/www.xxx.yyy.zzz:57570] " #620 daemon prio=5 os_prio=0 tid=0x00007fb737619800 nid=0x8cec runnable [0x00007fb38d00b000]
Note that the thread dumps are included in the heap dump. They are available in plain text in the file but you don’t have the STATE information in Eclipse Mat. You can have them with other tools such as VisualVM:
$ head -10 java_pid19820.threads
Thread 0x7fd64b0e1610
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter()Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node; (AbstractQueuedSynchronizer.java:1855)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(J)J (AbstractQueuedSynchronizer.java:2068)
at java.util.concurrent.LinkedBlockingQueue.poll(JLjava/util/concurrent/TimeUnit;)Ljava/lang/Object; (LinkedBlockingQueue.java:467)
at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run()V (CachedExecutorServiceDelegate.java:210)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:624)
at java.lang.Thread.run()V (Thread.java:748)
at com.hazelcast.util.executor.HazelcastManagedThread.executeRun()V (HazelcastManagedThread.java:76)
at com.hazelcast.util.executor.HazelcastManagedThread.run()V (HazelcastManagedThread.java:92)
Was this page helpful?