Turning Your Tabular Data Into a Graph Using Cypher


After more than seven years in the data science industry, I have become convinced of two things.

First, graphs (including graph databases, graph data science, and graph visualizations) are largely untapped powerhouses in the field of data science. Graphs can enhance analysis, improve machine learning algorithms, and ground LLMs on local data with no additional model tuning. Since I was introduced to the wonderful world of graphs in 2020, I have never looked back. I want to introduce more data professionals everywhere to the unique versatility graphs have to offer.

Image of a graph database, including different colored nodes that are connected to each other with labeled relationships.

Second, far too many training courses and online tutorials use the same datasets! I distinctly remember how exciting it was the first time I worked with datasets like Titanic, Iris, and MNIST. However, as I continued in my learning journey, seeing the same or similar datasets over and over again made it difficult to stay engaged.

The combination of these two things led me to writing this tutorial with one of my favorite datasets: the Scooby-Doo dataset. The original dataset was created by Amber Thomas, who sacrificed hours of her time to watch all the media in the Scooby-Doo universe and create a tabular dataset. As a huge Scooby-Doo fan myself, I have used this dataset for graph projects and assignments for several years now. I recently took Amber’s original dataset and made some substantial changes to her classification of monster “subtypes,” “types,” and “species” to enforce an ontological hierarchy (monster < subtype < type < species), as well as fill in a few null values on IMDB and engagement scores. Other than that, the dataset is preserved in its original form.

This dataset will walk you through the first step in creating your ScoobyGraph: how to turn your tabular data into a graph using Cypher commands. (Note: This step only focuses on developing Cypher queries to load the data straight into Neo4j.)

To complete this tutorial, you will need:

  • My edited version of the Scooby-Doo dataset, which you can find in my GitHub repo
  • A Neo4j database (you can spin up a free instance of our Neo4j Aura cloud software or download Neo4j Desktop to your computer — also free!)

Step 1: Decide How to Structure Your Graph

Once you have your data, the first step is deciding how you want to represent your data in a graph. Which should be a node? Which should be a relationship? Should the data be in a property or in its own node? The easiest way to do this is by creating a diagram called a data model (see Graph modeling guidelines for a more in-depth overview of how to create one).

Tabular data can be added to a graph in many ways, and choosing a graph model will depend on your specific use case and what parts of the data you are focused on. To me, the most interesting and fun part of this dataset is exploring the various monsters and their characteristics and interactions with the main characters. With that in mind, I created the following data model (using the handy Arrows.app website).

Data model for the Scooby-Doo data

The data model includes node labels, relationship labels, and properties. Example values of the properties are included. This data model now provides an important reference as we begin to craft Cypher queries.

Step 2: Create Cypher Queries to Load Your Graph

Now that we have a data model, we can begin loading data into Neo4j using Cypher, which is a graph query language similar to SQL. I won’t do a full Cypher tutorial here, but you can access free Cypher training in the GraphAcademy.

To load the data from your CSV into Neo4j, use the following command to access the GitHub file:

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row

Now that we can access the file, we can begin adding the nodes. For simplicity, compute, and easy troubleshooting, we will call to the file each time we load a section of the graph since the file is small. Let’s begin by loading in the series nodes by running the following code into the Neo4j command line:

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row
MERGE (s:Series {seriesName:row.series_name})

Now our graph includes the TV series nodes.

Now that we have the series nodes, we can add the episode nodes. We do this by going through each row, matching the existing series node, creating a new episode node, and connecting the two with the IN_SERIES relationship:

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row
MATCH (s:Series {seriesName:row.series_name})
MERGE (e:Episode {episodeTitle:row.title})
SET e.season = row.season, e.network = row.network,
e.imdb = row.imdb, e.engagement = row.engagement,
e.runtime = row.runtime, e.format = row.format
MERGE (s) <-[:IN_SERIES] - (e)

The graph now has episodes that are tied to the TV series in which they appeared.

Now that we have the series and episodes, let’s load in the monsters. The monsters have a subtype, a type, and a species. The following Cypher code loads in rows but filters them to where the monster name is not null because some episodes have no monsters. It creates a monster node, then creates new subtype, type, and species nodes if they don’t already exist, and finally creates the relationships between them. The code also looks for the episode the monster appears in and creates an APPEARS_IN relationship.

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row WHERE row.monster_name IS NOT NULL
MERGE (m:Monster {name:row.monster_name})
SET m.real = row.monster_real, m.gender = row.monster_gender
WITH m,row
MERGE (st:Subtype {name:row.monster_subtype})
MERGE (t:Type {name:row.monster_type})
MERGE (s:Species {name:row.monster_species})
WITH m,st,t,s,row
MATCH (e:Episode {episodeTitle:row.title})
MERGE (m) -[:APPEARS_IN] -> (e)
MERGE (m) -[:IS_SUBTYPE]->(st)
MERGE (st)-[:IS_TYPE]->(t)
MERGE (t)-[:IS_SPECIES]->(s)

Another aspect to add is the connections between monsters appearing in the same episode. This isn’t something derived from a row in the file but from the graph itself. We can capture this by running a simple Cypher statement that creates an APPEARS_WITH relationship between any two monsters connected to the same episode:

MATCH (m:Monster) - [:APPEARS_IN] -> (e:Episode) <- [:APPEARS_IN] - (m1:Monster)
MERGE (m) - [:APPEARS_WITH] -> (m1)

The database now contains monster information and almost all the nodes and relationships that were in the data model.

Some of the monster information in the graph

Referring back to the data model, the only remaining nodes that are not in the graph are the MainCharacter nodes. However, the main characters do not have a row in the data file like the monsters do, so instead of calling that information from the file, we will just create them using Cypher:

MERGE (:MainCharacter {name:"Fred"})
MERGE (:MainCharacter {name:"Daphne"})
MERGE (:MainCharacter {name:"Fred"}) MERGE (:MainCharacter {name:"Daphne"}) MERGE (:MainCharacter {name:"Shaggy"}) MERGE (:MainCharacter {name:"Scooby"}) MERGE (:MainCharacter {name:"Velma"})

Now that the main characters are in the graph, we need to add the UNMASKED, CAPTURED, and CAUGHT relationships between the main characters and the monsters.

Due to the structure of the data, this requires a more complex cipher, but since the size is so small, we can load the unmasked relationships:

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.unmask_fred ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Fred'})
MERGE (mc) -[:UNMASKED]->(m); 

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.unmask_daphnie ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Daphne'})
MERGE (mc) -[:UNMASKED]->(m);   

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.unmask_velma ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Velma'})
MERGE (mc) -[:UNMASKED]->(m);  

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.unmask_shaggy ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Shaggy'})
MERGE (mc) -[:UNMASKED]->(m);   

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.unmask_scooby ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Scooby'})
MERGE (mc) -[:UNMASKED]->(m); 

We can repeat the same process for the caught relationships:

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.caught_fred ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Fred'})
MERGE (mc) -[:CAUGHT]->(m); 

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.caught_daphnie ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Daphne'})
MERGE (mc) -[:CAUGHT]->(m);   

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.caught_velma ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Velma'})
MERGE (mc) -[:CAUGHT]->(m);   

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row
WITH row where row.caught_shaggy ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Shaggy'})
MERGE (mc) -[:CAUGHT]->(m);   

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.caught_scooby ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Scooby'})
MERGE (mc) -[:CAUGHT]->(m);

Finally, we can repeat the same process for the captured relationships, making sure that we point the relationship in the opposite direction per our data model — this time from monster to main character instead of the other way around:

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.captured_fred ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Fred'})
MERGE (m) -[:CAPTURED]->(mc);

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.captured_daphnie ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Daphne'})
MERGE (m) -[:CAPTURED]->(mc);

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.captured_velma ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Velma'})
MERGE (m) -[:CAPTURED]->(mc);

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.captured_shaggy ='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Shaggy'})
MERGE (m) -[:CAPTURED]->(mc);

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/CayleyCausey/ScoobyGraph/main/scoobydoo_new.csv' as row 
WITH row where row.captured_scooby='True'
MATCH (m:Monster {name:row.monster_name})
MATCH (mc:MainCharacter {name:'Scooby'})
MERGE (m) -[:CAPTURED]->(mc);

We have added all the data in our graph. We can call the function db.schema.visualization() to get confirmation that the data in our database looks like the data model we developed earlier.

Now you can explore your ScoobyGraph!


Turning Your Tabular Data Into a Graph Using Cypher was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.