Neo4j Research
Welcome to Neo4j Research, where we turn science into technology.
At Neo4j, we build graph data products that delight our users and strive to always surpass customer expectations. Our products are built on a rich, collaborative history of computer science research and solid engineering.
Neo4j Research is where we explore future possibilities, both to enhance current products and to discover new opportunities. As multi-disciplinary computer scientists, we work with product teams at Neo4j as well as leading research institutions around the world with the common goal of accelerating graph technology.
Projects
At Neo4j, we perform systems research on all parts of the graph data stack. We are currently working on a diverse set of projects that target temporal graph use-cases, leaderless transaction processing methods, and novel query runtimes based on dynamic programming languages.
Our aim is to understand how to build graph processing systems for modern cloud environments that are more capable than the current state of the art and that depart from the classic (relational) approach.
Current graph database runtimes are built using the same techniques and principles as relational databases, which can inhibit their performance and functionality. The fundamental issue is that graph runtimes have to handle a lot of irregularity, stemming both from schema optionality and from the irregularity of workload and topology from a machine point of view.
To solve these problems we are building a next-generation query runtime inspired by dynamic programming language technology. It allows us to optimize schema-less graphs through dynamic code optimization and to scale processing by adopting new compute paradigms such as disaggregated compute or accelerated computing with specialized hardware.
Modern graph database management systems (DBMSs) allow users to model real-world interactions as a set of nodes and relationships at a billions-to-trillions scale. However, existing systems ignore the temporal dimension of data: how a graph evolved over time. Lacking native temporal support, users fall back on ad-hoc strategies whose performance depends on how much of the graph the workload touches, for example local pattern matching versus global graph algorithms.
To tackle this problem, we designed Aion, a transactional temporal graph DBMS that generalizes previous approaches for labeled property graphs (LPGs). Aion is built directly atop Neo4j and adopts a hybrid temporal storage approach. For point lookups and small subgraph queries, it uses LineageStore that indexes graph updates by entity identifiers. For queries that require full graph reconstruction at arbitrary time points, it uses TimeStore that indexes updates by time.
To enable incremental graph computations for improved latency, Aion introduces a compute-efficient in-memory LPG representation. Our experiments so far show that Aion achieves up to 7x higher throughput than existing non-transactional temporal systems and provides up to an order of magnitude speedup over Neo4j with minimal storage overhead.
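The hybrid storage idea described above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not Aion's implementation: the `Update` record, the `HybridTemporalStore` class, and its method names are hypothetical, updates are assumed to arrive in commit-time order, and entities are reduced to plain dictionaries. One index keeps each entity's history for point lookups (the role the text gives to LineageStore), while a time-ordered log supports full reconstruction at an arbitrary time (the role given to TimeStore).

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Update:
    """Hypothetical update record: at `time`, `entity_id` takes `value` (None = deleted)."""
    time: int
    entity_id: str
    value: Optional[dict]


class HybridTemporalStore:
    """Toy stand-in for the two indexes; not Aion's actual data structures."""

    def __init__(self):
        # Entity-keyed index: entity id -> timestamps and values, in time order.
        self._entity_times = defaultdict(list)
        self._entity_values = defaultdict(list)
        # Time-keyed index: every update in global time order.
        self._log_times = []
        self._log = []

    def append(self, update):
        # Assumption: updates are appended in non-decreasing time order.
        if self._log_times and update.time < self._log_times[-1]:
            raise ValueError("updates must be appended in time order")
        self._entity_times[update.entity_id].append(update.time)
        self._entity_values[update.entity_id].append(update.value)
        self._log_times.append(update.time)
        self._log.append(update)

    def entity_at(self, entity_id, time):
        """Point lookup: binary-search a single entity's history."""
        times = self._entity_times.get(entity_id, [])
        idx = bisect_right(times, time)
        return self._entity_values[entity_id][idx - 1] if idx else None

    def snapshot_at(self, time):
        """Full reconstruction: replay the time-ordered log up to `time`."""
        graph = {}
        end = bisect_right(self._log_times, time)
        for update in self._log[:end]:
            if update.value is None:
                graph.pop(update.entity_id, None)
            else:
                graph[update.entity_id] = update.value
        return graph
```

In this sketch a point lookup touches only one entity's history, whereas a snapshot scans the log prefix, which mirrors why the text pairs an entity-keyed index with a time-keyed one.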
Transaction protocols have historically been decoupled from the data models they support. Consequently, graph databases are left with one of two suboptimal choices: protocols that are too strict, which sacrifice performance to maintain correctness, or protocols that are too loose, which offer better performance but corrupt data in normal operation.
In the long term we need better options. We are investigating one such approach called “Conjunction of Majorities”, in which transaction messages carry metadata about their predecessors. Participants compare this metadata against their local state to determine compatibility. For single-shard transactions, if a majority of participants find that a transaction is compatible, it can proceed through a conventional two-phase consensus protocol. For multi-shard transactions, each shard must reach its own majority, hence a “conjunction of majorities” in the general case.
We have undertaken a theoretical investigation to establish the limits of the approach with respect to correctness (specifically reciprocal consistency for graphs) and global constraints. We are also building a prototype system to evaluate the performance of the approach in real-world conditions.
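The majority rule described above can be sketched in a few lines of Python. This is an illustration only, not the protocol itself: the function names are hypothetical, and the compatibility rule used here (the predecessor a transaction names for each key must match the last writer the participant has recorded locally) is an assumption made for the example, since the text does not specify what the metadata contains or how it is compared.

```python
from typing import Mapping, Sequence


def is_compatible(predecessors: Mapping[str, str],
                  local_last_writer: Mapping[str, str]) -> bool:
    """Assumed rule: each named predecessor matches the participant's local last writer."""
    return all(local_last_writer.get(key) == pred
               for key, pred in predecessors.items())


def has_majority(votes: Sequence[bool]) -> bool:
    """True when strictly more than half of a shard's participants vote compatible."""
    return 2 * sum(votes) > len(votes)


def can_proceed(predecessors: Mapping[str, str],
                shards: Mapping[str, Sequence[Mapping[str, str]]]) -> bool:
    """Conjunction of majorities: every involved shard must reach its own majority.

    `shards` maps a shard name to the local states of its participants.
    """
    if not shards:
        return False
    return all(
        has_majority([is_compatible(predecessors, state) for state in states])
        for states in shards.values()
    )
```

A single-shard transaction is the special case where `shards` has one entry. Under these assumptions a positive result would only allow the transaction to move on to the commit protocol mentioned in the text; the commit itself is not modeled here.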
Publications
Neo4j has a strong publication history, and often collaborates with universities and other industrial researchers.
2024
Hardware-Efficient Data Imputation through DBMS Extensibility
George Theodorakis et al
An Empirical Evaluation of Variable-length Record B+Trees on a Modern Graph Database System
George Theodorakis, James Clarkson, and Jim Webber
Seraph: Continuous Queries on Property Graph Streams
Stefan Plantikow and Hannes Voigt
Aion: Efficient Temporal Graph Data Management
George Theodorakis, James Clarkson, and Jim Webber
BIFROST: A Future Graph Database Runtime
James Clarkson, George Theodorakis, and Jim Webber
2023
Analysis of an Epoch Commit Protocol for Distributed Processing Systems
Jim Webber et al
2022
A Performance Study of Epoch-based Commit Protocols in Distributed OLTP Databases
Jack Waudby and Jim Webber
Pick & Mix Isolation Levels: Mixed Serialization Graph Testing
Jack Waudby and Jim Webber
2021
A GraphBLAS implementation in Pure Java
Florentin Dörre, Martin Junghanns et al
PG-Keys: Keys for Property Graphs
Keith Hare et al
The Future is Big Graphs! A Community View on Graph Processing Systems
Stefan Plantikow, Petra Selmer, Hannes Voigt et al
2020
Modeling the Gradual Degradation of Eventually-Consistent Distributed Graph Databases
Jim Webber et al
The Future is Big Graphs! A Community View on Graph Processing Systems
Stefan Plantikow, Petra Selmer, Hannes Voigt et al
2019
Big Graph Processing Systems
Hannes Voigt et al
Efficient Query Processing for Dynamically Changing Datasets
Hannes Voigt et al
Schema Validation and Evolution for Graph Databases
Peter Furniss, Alastair Green, Hannes Voigt et al
Period Index: A Learned 2D Hash Index for Range and Duration Queries
Hannes Voigt et al
Understanding Trolls with Efficient Analytics of Large Graphs in Neo4j
David Allen, Amy Hodler, Michael Hunger, William Lyon, Mark Needham, Hannes Voigt et al
Graph Query Languages
Hannes Voigt et al
Updating Graph Databases with Cypher
Alastair Green, Tobias Lindaaker, Stefan Plantikow, Mats Rydberg, Petra Selmer, Andrés Taylor et al
Approximate Querying for the Property Graph Language Cypher
Petra Selmer et al
2018
Cypher: An Evolving Query Language for Property Graphs
Alastair Green, Tobias Lindaaker, Stefan Plantikow, Mats Rydberg, Petra Selmer, Andrés Taylor et al
Declarative and distributed graph analytics with GRADOOP
Martin Junghanns, Max Kießling et al
openCypher: New Directions in Property Graph Querying
Alastair Green, Martin Junghanns, Max Kießling, Tobias Lindaaker, Stefan Plantikow, Petra Selmer
2017
ACTiCLOUD: Enabling the Next Generation of Cloud Applications
Jim Webber, Davide Grohmann et al
2016
Investigations on Path Indexing for Graph Databases
Jonathan Sumrall, Johan Svensson, Magnus Vejlstrup, Chris Vest, Jim Webber et al
2012
A Programmatic Introduction to Neo4j
Jim Webber
The Graph Traversal Pattern
Peter Neubauer et al
Funding
Neo4j is built upon a solid research foundation. Research is a collaborative endeavor, and we work alongside colleagues in academia to push forward the boundaries of graph data. We offer research funding across a range of activities, from masters level through to project and program funding.
M.Sc. Dissertations
Prospective masters students in computing science or allied disciplines are invited to contact us about thesis-level project opportunities. Students will be supported by our R&D team to build and evaluate real-world database implementations for their thesis.
In the past, Neo4j has hosted students from TU Eindhoven, KTH, University of Leipzig, TU Munich, and LTH.
Ph.D. Scholarships
Building on our successful track record of collaboration with leading research-intensive universities, Neo4j is able to offer a limited number of bursaries for Ph.D. studentships to investigate areas of research interest in graph databases and related fields. Available bursaries are announced through partner universities.
We are looking to recruit a Ph.D. student for a jointly supervised thesis at the University of Surrey (UK) on using AI inside databases. Previously we have sponsored students from Newcastle University and Birkbeck, University of London.
Post-Doctoral Funding
Neo4j researchers collaborate with leading research institutions on the most challenging graph database research problems. Funding is made available to partner universities for post-doctoral staff to work on medium-term systems research in graph databases.
Presently Neo4j is working with Newcastle University, LIRIS, and UC Berkeley.
Collaborations
Examples of Neo4j’s current and past research collaborations can be found below.
Ongoing Academic Projects
Newcastle University (UK)
Neo4j is engaged with the team led by Dr Paul Ezhilchelvan and Prof Isi Mitrani working on novel transaction protocols for graph databases. The work has involved the design, modeling, verification and implementation of new kinds of transaction processing protocols for scalable, fault-tolerant graph databases. The collaboration is ongoing, and Ph.D. students on the team have the opportunity to intern at Neo4j as part of their studies.
LIRIS (France)
Neo4j supports the ongoing work of Prof Angela Bonifati and her team on query languages for graphs, including both Neo4j’s Cypher and the forthcoming ISO GQL standard.
Multi-Institution Projects
ACTiCLOUD
An EC-funded H2020 research project to create elastic infrastructure for the cloud, including servers with large aggregate RAM and cores. As part of this work, Neo4j performed research to exploit the aggregate resources provided by the underlying platform by extending the Cypher runtime and query planner to execute queries in a parallel and NUMA-aware fashion.
Even on standard hardware, the results of this research mean that Cypher queries can be parallelized and have locality cost built into their query plans. Since Neo4j 5.13, users have been able to use the parallel runtime to efficiently run large graph analytics jobs in the database, without custom code, that were previously the domain of compute platforms.
LDBC
Neo4j was a founding member of the Linked Data Benchmark Council (LDBC). LDBC is an independent authority for specifying benchmarks and benchmarking procedures, and for verifying and publishing results for software systems designed to manage connected data. Since its foundation, other database vendors have joined the effort, including Oracle, IBM, AWS, and SAP.
Contact Us
If you’d like to get in touch with us, please email us:
research@neo4j.com