Neo4j Research

Welcome to Neo4j Research, where we turn science into technology.

At Neo4j, we build graph data products that delight our users and strive to always surpass customer expectations. Our products are built on a rich, collaborative history of computer science research and solid engineering.

Neo4j Research is where we explore future possibilities, both to enhance current products and to discover new opportunities. As multi-disciplinary computer scientists, we work with product teams at Neo4j as well as leading research institutions around the world, with the common goal of accelerating graph technology.

Projects

At Neo4j, we perform systems research on all parts of the graph data stack. We are currently working on a diverse set of projects that target temporal graph use-cases, leaderless transaction processing methods, and novel query runtimes based on dynamic programming languages.

Our aim is to understand how to build graph processing systems for modern cloud environments that are more capable than the current state of the art and that depart from the classic (relational) approach.

Current graph database runtimes are built using the same techniques and principles as relational databases, which can inhibit their performance and functionality. The fundamental issue is that graph runtimes have to handle a lot of irregularity, stemming both from schema optionality and from workloads and topologies that are irregular from the machine's point of view.

To solve these problems, we are building a next-generation query runtime inspired by dynamic programming language technology. It allows us to optimize schema-less graphs through dynamic code optimization and to scale processing by adopting new compute paradigms such as disaggregated compute or accelerated computing with specialized hardware.

Modern graph database management systems (DBMSs) allow users to model real-world interactions as a set of nodes and relationships at a billions-to-trillions scale. However, existing systems ignore the temporal dimension of data: how a graph evolves over time. Lacking native temporal support, ad-hoc strategies are implemented that perform well only for certain sizes of effective graph workload, for example local pattern matching or global graph algorithms.

To tackle this problem, we designed Aion, a transactional temporal graph DBMS that generalizes previous approaches for labeled property graphs (LPGs). Aion is built directly atop Neo4j and adopts a hybrid temporal storage approach. For point lookups and small subgraph queries, it uses LineageStore, which indexes graph updates by entity identifier. For queries that require full graph reconstruction at arbitrary time points, it uses TimeStore, which indexes updates by time.
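The split between the two indexes can be illustrated with a toy in-memory sketch. This is not Aion's implementation: the class `ToyTemporalStore`, its method names, and the use of plain dictionaries for properties are all hypothetical, and it assumes updates arrive in non-decreasing time order. It only shows why an index keyed by entity suits point lookups while a time-ordered log suits full reconstruction.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyTemporalStore:
    """Toy illustration only: two in-memory indexes over the same updates."""

    def __init__(self):
        # Lineage-style index (hypothetical layout): entity id -> [(time, props)]
        self._by_entity = defaultdict(list)
        # Time-style index (hypothetical layout): [(time, entity id, props)]
        self._by_time = []

    def apply(self, time, entity_id, props):
        """Record an update. Assumes non-decreasing `time` across calls."""
        self._by_entity[entity_id].append((time, dict(props)))
        self._by_time.append((time, entity_id, dict(props)))

    def entity_at(self, entity_id, time):
        """Point lookup: latest version of one entity at or before `time`."""
        history = self._by_entity.get(entity_id, [])
        times = [t for t, _ in history]
        i = bisect_right(times, time)
        return history[i - 1][1] if i else None

    def snapshot_at(self, time):
        """Full reconstruction: replay the time-ordered log up to `time`."""
        graph = {}
        for t, entity_id, props in self._by_time:
            if t > time:
                break
            graph[entity_id] = props
        return graph


store = ToyTemporalStore()
store.apply(1, "n1", {"name": "Ada"})
store.apply(2, "n2", {"name": "Bob"})
store.apply(3, "n1", {"name": "Ada L."})

print(store.entity_at("n1", 2))   # touches only n1's history
print(store.snapshot_at(2))       # replays every update up to time 2
```

The point lookup only scans the history of one entity, whereas the snapshot walks the whole log prefix, which is the trade-off the hybrid design is meant to balance.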

To enable incremental graph computations for improved latency, Aion introduces a compute-efficient in-memory LPG representation. Our experiments so far show that Aion achieves up to 7x higher throughput than existing non-transactional temporal systems and provides up to an order of magnitude speedup over Neo4j with minimal storage overhead.

Transaction protocols have historically been decoupled from the data models they support. Consequently, graph databases are left with one of two suboptimal choices: protocols that are too strict, which sacrifice performance to maintain correctness, or protocols that are too loose, which offer better performance but corrupt data in normal operation.

In the long term, we need better options. We are investigating one such approach called “Conjunction of Majorities”, where transaction messages carry metadata about their predecessors. Participants compare this metadata against their local state to determine compatibility. For single-shard transactions, if a majority of participants find that a transaction is compatible, it can proceed through a conventional two-phase consensus protocol. For multi-shard transactions, each shard must have a majority, hence “conjunction of majorities” in the general case.

We have undertaken a theoretical investigation to establish the limits of the approach with respect to correctness (specifically reciprocal consistency for graphs) and global constraints. We are also building a prototype system to evaluate the performance of the approach in real-world conditions.
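The majority rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the protocol itself: the text does not say what the predecessor metadata contains, so the sketch assumes it is a set of predecessor transaction ids and that a participant is compatible when it has already applied all of them. The function names are hypothetical, and the subsequent two-phase consensus step is not modeled.

```python
from typing import Dict, List, Set


def is_compatible(predecessors: Set[str], local_applied: Set[str]) -> bool:
    # ASSUMPTION: metadata is a set of predecessor transaction ids, and a
    # participant is compatible if it has applied every one of them.
    return predecessors <= local_applied


def shard_has_majority(votes: List[bool]) -> bool:
    # Strict majority: more than half of the shard's participants agree.
    return 2 * sum(votes) > len(votes)


def can_proceed(predecessors: Set[str],
                shards: Dict[str, List[Set[str]]]) -> bool:
    """Conjunction of majorities: every involved shard needs its own majority.

    `shards` maps a shard name to the local state of each participant,
    where a local state is the set of transaction ids it has applied.
    """
    if not shards:
        return False
    return all(
        shard_has_majority([is_compatible(predecessors, state)
                            for state in participants])
        for participants in shards.values()
    )


tx_predecessors = {"t1", "t2"}
shard_states = {
    "s1": [{"t1", "t2"}, {"t1", "t2"}, {"t1"}],  # 2 of 3 compatible
    "s2": [{"t1", "t2"}, {"t2"}, set()],         # 1 of 3 compatible
}

print(can_proceed(tx_predecessors, {"s1": shard_states["s1"]}))  # single shard
print(can_proceed(tx_predecessors, shard_states))                # multi-shard
```

With a single shard the check reduces to one majority vote; with several shards, a single shard lacking a majority is enough to stop the transaction from proceeding.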

Publications

Neo4j has a strong publication history, and often collaborates with universities and other industrial researchers.

2024

Hardware-Efficient Data Imputation through DBMS Extensibility
George Theodorakis et al

An Empirical Evaluation of Variable-length Record B+Trees on a Modern Graph Database System
George Theodorakis, James Clarkson, and Jim Webber

Seraph: Continuous Queries on Property Graph Streams
Stefan Plantikow and Hannes Voigt

Aion: Efficient Temporal Graph Data Management

George Theodorakis, James Clarkson, and Jim Webber 

BIFROST: A Future Graph Database Runtime

James Clarkson, George Theodorakis, and Jim Webber 

2023

Analysis of an Epoch Commit Protocol for Distributed Processing Systems

Jim Webber et al

2022

A Performance Study of Epoch-based Commit Protocols in Distributed OLTP Databases

Jack Waudby and Jim Webber

Pick & Mix Isolation Levels: Mixed Serialization Graph Testing

Jack Waudby and Jim Webber

2021

A GraphBLAS implementation in Pure Java

Florentin Dörre, Martin Junghanns et al

PG-Keys: Keys for Property Graphs

Keith Hare et al

The Future is Big Graphs! A Community View on Graph Processing Systems

Stefan Plantikow, Petra Selmer, Hannes Voigt et al

2020

Modeling the Gradual Degradation of Eventually-Consistent Distributed Graph Databases

Jim Webber et al

The Future is Big Graphs! A Community View on Graph Processing Systems

Stefan Plantikow, Petra Selmer, Hannes Voigt et al

2019

Big Graph Processing Systems

Hannes Voigt et al

Efficient Query Processing for Dynamically Changing Datasets

Hannes Voigt et al

Schema Validation and Evolution for Graph Databases

Peter Furniss, Alastair Green, Hannes Voigt et al

Period Index: A Learned 2D Hash Index for Range and Duration Queries

Hannes Voigt et al

Understanding Trolls with Efficient Analytics of Large Graphs in Neo4j

David Allen, Amy Hodler, Michael Hunger, William Lyon, Mark Needham, Hannes Voigt et al

Graph Query Languages

Hannes Voigt et al

Updating Graph Databases with Cypher

Alastair Green, Tobias Lindaaker, Stefan Plantikow, Mats Rydberg, Petra Selmer, Andrés Taylor et al

Approximate Querying for the Property Graph Language Cypher

Petra Selmer et al

2018

Cypher: An Evolving Query Language for Property Graphs

Alastair Green, Tobias Lindaaker, Stefan Plantikow, Mats Rydberg, Petra Selmer, Andrés Taylor et al

Declarative and distributed graph analytics with GRADOOP

Martin Junghanns, Max Kießling et al

openCypher: New Directions in Property Graph Querying

Alastair Green, Martin Junghanns, Max Kießling, Tobias Lindaaker, Stefan Plantikow, Petra Selmer

2017

ACTiCLOUD: Enabling the Next Generation of Cloud Applications

Jim Webber, Davide Grohmann et al

2016

Investigations on Path Indexing for Graph Databases

Jonathan Sumrall, Johan Svensson, Magnus Vejlstrup, Chris Vest, Jim Webber et al

2012

A Programmatic Introduction to Neo4j

Jim Webber

The Graph Traversal Pattern

Peter Neubauer et al

Funding

Neo4j is built upon a solid research foundation. Research is a collaborative endeavor, and we work alongside colleagues in academia to push forward the boundaries of graph data. We offer research funding across a range of activities, from master's level through to project and program funding.

M.Sc. Dissertations

Prospective master's students in computing science or allied disciplines are invited to contact us about thesis-level project opportunities. Students will be supported by our R&D team to build and evaluate real-world database implementations for their thesis.

Hannes Voigt + others
In the past, Neo4j has hosted students from TU Eindhoven, KTH, University of Leipzig, TU Munich, and LTH.

Ph.D. scholarships

Building on our successful track record of collaboration with leading research-intensive universities, Neo4j is able to offer a limited number of bursaries for Ph.D. studentships to investigate areas of research interest in graph databases and related fields. Available bursaries are announced through partner universities.

Previously, Neo4j has sponsored students from Newcastle University and Birkbeck, University of London.

Post-Doctoral funding

Neo4j researchers collaborate with leading research institutions on the most challenging graph database research problems. Funding is made available to partner universities for post-doctoral staff to work on medium-term systems research in graph databases.

Presently, Neo4j is working with Newcastle University, LIRIS, and UC Berkeley.

Collaborations

Examples of Neo4j’s current and past research collaborations can be found below.

Ongoing Academic Projects

Newcastle University (UK)

Neo4j is engaged with the team led by Dr Paul Ezhilchelvan and Prof Isi Mitrani, working on novel transaction protocols for graph databases. The work has involved the design, modeling, verification and implementation of new kinds of transaction processing protocols for scalable, fault-tolerant graph databases. The work is an ongoing collaboration, with Ph.D. students on the team having the opportunity to intern at Neo4j as part of their studies.

LIRIS (France)

Neo4j supports the ongoing work of Prof Angela Bonifati and her team on query languages for graphs, including both Neo4j’s Cypher and the forthcoming ISO GQL standard.

Multi-Institution Projects

ACTiCLOUD

An EC-funded H2020 research project to create elastic infrastructure for the cloud, including servers with large aggregate RAM and cores. As part of this work, Neo4j performed research to exploit the aggregate resources provided by the underlying platform by extending the Cypher runtime and query planner to execute queries in a parallel and NUMA-aware fashion.

Even on standard hardware, the results of this research mean that Cypher queries can be parallelized and have locality cost built into their query plans. Since Neo4j 5.13, users have been able to use the parallel runtime to efficiently run large graph analytics jobs in the database, without custom code, that were previously the domain of compute platforms.

LDBC

Neo4j was a founding member of the Linked Data Benchmark Council (LDBC). LDBC is an independent authority for specifying benchmarks and benchmarking procedures, and for verifying and publishing results for software systems designed to manage connected data. Since its foundation, other database vendors have joined the effort, including Oracle, IBM, AWS, and SAP.