In our world today, we seem to encounter crises and global struggles on a daily basis. But what if we could help others and improve life around us using the data and technology at our fingertips?
Graph databases like Neo4j help us do exactly that. By storing data together with the connections between data points, we can interrogate that data to uncover context and relationships that other data formats tend to miss.
We will look at the technical side of three project areas where graphs help solve complex, real-world problems. This article covers only three projects, but there are many more. To find out about current projects or submit your own, check out the Graphs4Good page.
Graphs are being used to bring disparate data stores together into a meaningful map for cancer research. Because medicine is constantly changing, it needs a flexible data model that can adapt to new entities and their relationships to existing data. Neo4j imposes few constraints on the model and captures relationships between many different types of entities, avoiding the costly JOIN operations that such queries require in traditional relational models.
Graphs also manage massive plant family trees to increase food sources and decrease natural resource consumption. In a relational store, hierarchical data often means recursive queries over entities of the same type. Modeling the same structure as a graph in Neo4j represents the tree naturally and avoids repeatedly iterating over the same dense paths.
Finally, graphs connect disparate data to identify and track space debris for future space travel and cleanup. Humans collect vast amounts of data, but extracting meaningful information from it is a challenge. Traditional data stores tend to force the data into a single model and require many-to-many relationships across entity types. Neo4j integrates all kinds of data, connecting the pieces of the puzzle and presenting a unified view that turns data into knowledge.
Let’s talk tech!
Working Towards a Cure
Medicine does amazing things, but humans still suffer from a number of illnesses. From survivable health conditions to deadly diseases, health sciences are a continuous field of study and improvement. How can we prevent health problems, minimize pain and symptoms, and cure illness?
Research scientists and doctors at the IRCCS use graphs for information management in advanced cancer research, both to provide treatments and to continue the search for a cure. Neo4j was initially brought in to maintain consistency between relational data stores, where complex entities and contexts made data integration and synchronization difficult. The graph became a way to connect disparate data and definitions across the entire system.
The project later expanded its use of Neo4j to combat changing data models, complex relationships, and sluggish query performance in the relational solution. Queries across many types of entities meant many JOIN operations in SQL, and keeping the data model in sync with the latest medical research made data coherence hard to maintain. By using graphs, the institute could integrate and analyze data from MySQL, MongoDB, and new data sources. Users now harness Neo4j’s flexible data model and optimized pattern matching to adapt to changes in the medical field, analyze experimental procedures, and model complex concepts as semantic knowledge.
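To make the integration idea concrete, here is a toy sketch in plain Python (not Neo4j, and not the institute's actual schema): rows as they might come from a relational table and documents as they might come from a document store are merged into one small property graph, after which a question spanning both sources becomes a two-hop traversal. The labels `Sample`, `Experiment`, and `Gene`, the relationship types, and all the data are hypothetical. In Neo4j itself, the load would typically use Cypher `MERGE`, and the query a pattern such as `(e:Experiment)-[:INCLUDES]->(:Sample)-[:HAS_MUTATION]->(g:Gene)`.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph: nodes keyed by (label, id), typed directed edges."""

    def __init__(self):
        self.nodes = {}                 # (label, id) -> property dict
        self.edges = defaultdict(set)   # (source node, rel_type) -> set of target nodes

    def merge_node(self, label, node_id, **props):
        """Create the node if it is new, otherwise update its properties (like MERGE)."""
        key = (label, node_id)
        self.nodes.setdefault(key, {}).update(props)
        return key

    def merge_rel(self, src, rel_type, dst):
        self.edges[(src, rel_type)].add(dst)

    def neighbors(self, src, rel_type):
        # .get avoids inserting empty entries into the defaultdict
        return self.edges.get((src, rel_type), set())


# Rows as a relational table might return them (hypothetical schema):
# (sample_id, experiment_id)
sample_rows = [("S1", "E1"), ("S2", "E1"), ("S3", "E2")]

# Documents as a document store might return them (hypothetical shape).
mutation_docs = [
    {"sample": "S1", "genes": ["KRAS", "TP53"]},
    {"sample": "S2", "genes": ["TP53"]},
    {"sample": "S3", "genes": ["BRAF"]},
]


def build_graph(rows, docs):
    graph = PropertyGraph()
    for sample_id, experiment_id in rows:
        sample = graph.merge_node("Sample", sample_id)
        experiment = graph.merge_node("Experiment", experiment_id)
        graph.merge_rel(experiment, "INCLUDES", sample)
    for doc in docs:
        # Same (label, id) key, so the document attaches to the node created from the row.
        sample = graph.merge_node("Sample", doc["sample"])
        for gene in doc["genes"]:
            graph.merge_rel(sample, "HAS_MUTATION", graph.merge_node("Gene", gene))
    return graph


def genes_in_experiment(graph, experiment_id):
    """Experiment -> samples -> genes: a two-hop traversal across both sources."""
    genes = set()
    for sample in graph.neighbors(("Experiment", experiment_id), "INCLUDES"):
        for gene in graph.neighbors(sample, "HAS_MUTATION"):
            genes.add(gene[1])
    return sorted(genes)


graph = build_graph(sample_rows, mutation_docs)
print(genes_in_experiment(graph, "E1"))  # ['KRAS', 'TP53']
```

Keying nodes by `(label, id)` is what lets records from both sources land on the same `Sample` node; in Cypher, `MERGE` plays that role.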
IRCCS is setting the stage for continued improvement, with the hope of eliminating human disease through a better understanding of complex biological data.
Feeding the World’s Growing Population
As the number of people on Earth continues to increase, many are looking for ways to maximize food output while reducing the consumption of other natural resources. Bayer is working on this problem with graph databases to help feed the expected 9.5 billion people worldwide by 2050.
They are studying plant ancestry to understand the breeding cycles that produce certain traits (drought tolerance, high yield, etc.). Two parent plants are crossed to produce several child plants, of which only a select few make it to the next breeding cycle. Because many parent plants produce hundreds of thousands of offspring or more, this quickly generates a massive “family tree.” All of this hierarchical data is extremely hard to navigate, yet researchers want to find the entire tree of ancestors for a single plant.
Doing this in the relational model required 11 tables and many recursive joins, and query times grew exponentially at and beyond nine hops. In contrast, Neo4j’s relationship traversal kept query times constant. This let researchers also query plant features and return genealogy for one million plants, rather than just one, within the same timeframe. Neo4j serves this information through a REST API that the group named “Ancestry-as-a-Service”, which integrates with Oracle and Kafka. The graph is also used for predictive analytics alongside Spark and HBase: predicting the traits of plant offspring narrows down the list of test subjects, reducing the resources consumed by testing plants in the field.
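The ancestry question above is a transitive traversal. As a minimal sketch, assuming a hypothetical pedigree held in memory as a child-to-parents mapping, a breadth-first walk collects every ancestor exactly once, even when lines of descent share ancestors. Each step reads a plant's parent list directly instead of re-joining a table, which is the intuition behind traversal cost tracking the number of ancestors visited rather than the total size of the data. This is not Bayer's implementation; in Cypher, a comparable query over a hypothetical `Plant` label and `CHILD_OF` relationship would be a variable-length pattern such as `MATCH (p:Plant {id: $id})-[:CHILD_OF*]->(a:Plant) RETURN DISTINCT a`.

```python
from collections import deque

# Hypothetical pedigree: each plant maps to its two parents.
# P1 and P2 appear on more than one line of descent.
parents = {
    "P7": ["P5", "P6"],
    "P5": ["P1", "P2"],
    "P6": ["P2", "P3"],
    "P3": ["P0", "P1"],
}


def all_ancestors(plant_id, parents):
    """Breadth-first walk up the child -> parent edges, returning every ancestor once."""
    seen = set()
    queue = deque([plant_id])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:  # shared ancestors are expanded only once
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(all_ancestors("P7", parents)))  # ['P0', 'P1', 'P2', 'P3', 'P5', 'P6']
```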
By navigating dense plant ancestry data, Bayer is planning for a future in which a growing population can be fed while consuming fewer natural resources.
Keeping Current and Future Space Exploration Safe
With over 70 years of global space exploration, there is a lot of space debris from dead satellites, exploded rockets, and unusable equipment floating around Earth and beyond. How can we protect future missions, both equipment and people, while also working on ways to clean up existing debris?
Moriba Jah has a project for exactly this purpose called ASTRIAGraph. It uses Neo4j to attribute space objects to the companies or countries that own them, match objects to registries, and track changes over time. Graphs help the team ingest widely varied data sources to track object details, predict collisions, and potentially fingerprint objects for identification. The project also draws on computer-aided design (CAD) models and predictive analytics to build a comprehensive picture of each object’s geospatial location and trajectory.
ASTRIAGraph consists of three technical layers: the landing zone for data ingestion, the mezzanine for interpreting information (the Neo4j knowledge graph), and the inquiry layer for querying answers to questions. In other data storage formats, this much diverse data produces a web of many-to-many relationships, which makes building a knowledge graph that connects the disparate entities cumbersome. In a graph, where the schema is stored alongside the data it organizes, these relationships become manageable and useful.
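The three layers can be pictured as a small pipeline. The following is a toy sketch in plain Python, not ASTRIAGraph's code: the source names, record fields, orbit labels, and relationship types are all hypothetical. Raw records stay as received in the landing zone, the mezzanine step interprets them into typed nodes and relationships, and the inquiry layer answers a question that needs both tracking and registry data.

```python
from collections import defaultdict

# Landing zone: raw records from heterogeneous sources, stored as received.
# Sources, fields, and values are hypothetical.
raw_records = [
    {"source": "tracking", "object": "OBJ-1", "orbit": "LEO"},
    {"source": "tracking", "object": "OBJ-2", "orbit": "GEO"},
    {"source": "registry", "object": "OBJ-1", "owner": "Country A"},
    {"source": "registry", "object": "OBJ-2", "owner": "Company B"},
    {"source": "registry", "object": "OBJ-3", "owner": "Country A"},
]


def mezzanine(records):
    """Interpret raw records into typed nodes, properties, and relationships."""
    edges = defaultdict(set)   # (node, rel_type) -> set of target nodes
    props = defaultdict(dict)  # node -> property dict
    for record in records:
        obj = ("SpaceObject", record["object"])
        if record["source"] == "tracking":
            props[obj]["orbit"] = record["orbit"]
        elif record["source"] == "registry":
            owner = ("Owner", record["owner"])
            edges[(obj, "OWNED_BY")].add(owner)
            edges[(owner, "OWNS")].add(obj)
    return edges, props


def inquiry(edges, props, owner_name, orbit):
    """Which objects owned by owner_name are tracked in the given orbit regime?"""
    owned = edges.get((("Owner", owner_name), "OWNS"), set())
    return sorted(obj[1] for obj in owned if props.get(obj, {}).get("orbit") == orbit)


edges, props = mezzanine(raw_records)
print(inquiry(edges, props, "Country A", "LEO"))  # ['OBJ-1']
```

`OBJ-3` is registered to Country A but has no tracking record in this toy data, so the inquiry leaves it out; connecting the two sources is exactly the mezzanine's job.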
Jah’s ASTRIAGraph can inspire us to do more with existing datasets to promote safety and efficiency in research and exploration.
What Other Problems Can Graphs Solve?
These three projects are only a small sample of the many projects currently in the works, and many more, as yet unknown or unimagined, are waiting to take advantage of the power of a graph database. From natural resources to healthcare to improving the lives of animals and people, storing and traversing data as nodes and relationships helps users better understand context, hierarchy, meaning, dependencies, and much more.
Many Graphs4Good projects are still active and looking for your help and input. If you have a project or know of one, don’t hesitate to contact us. New to graphs? Check out Neo4j’s Developer page to learn more and get started with the technology!
Graphs for Good — Where Graph Technology is Tackling Complex, Real-World Problems was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.