Tutorial: Import data
This tutorial demonstrates how to import data from CSV files using LOAD CSV
.
With the combination of the Cypher® clauses LOAD CSV
, MERGE
, and CREATE
you can import data into Neo4j.
LOAD CSV
allows you to access the data values and perform actions on them.
|
The data files
In this tutorial, you import data from the following CSV files:
-
persons.csv
-
movies.csv
-
roles.csv
The content of the persons.csv file:
id,name
1,Charlie Sheen
2,Michael Douglas
3,Martin Sheen
4,Morgan Freeman
The persons.csv file contains two columns id
and name
.
Each row represents one person that has a unique id
and a name
.
The content of the movies.csv file:
id,title,country,year
1,Wall Street,USA,1987
2,The American President,USA,1995
3,The Shawshank Redemption,USA,1994
The movies.csv file contains the columns id
, title
, country
, and year
.
Each row represents one movie that has a unique id
, a title
, a country
of origin, and a release year
.
The content of the roles.csv file:
personId,movieId,role
1,1,Bud Fox
3,1,Carl Fox
2,1,Gordon Gekko
3,2,A.J. MacInerney
2,2,President Andrew Shepherd
4,3,Ellis Boyd 'Red' Redding
The roles.csv file contains the columns personId
, movieId
, and role
.
Each row represents one role with relationship data about the person id
(from the persons.csv file) and the movie id
(from the movies.csv file).
The graph model
The following data model shows what a graph model for this dataset could look like:
This is the resulting graph, based on the data from the CSV files:
Prerequisites
This tutorial uses the Linux or macOS tarball installation.
It assumes that your current work directory is the <neo4j-home> directory of the tarball installation, and the CSV files are placed in the default import directory.
|
Prepare the database
Before importing the data, you should prepare the database you want to use by creating indexes and constraints.
You should ensure that the Person
and Movie
nodes have unique id
properties by creating constraints on them.
Creating a unique constraint also implicitly creates an index.
By indexing the id
property, node lookup (e.g. by MATCH
) will be much faster.
Additionally, it is a good idea to index the country name
for a fast lookup.
1. Start neo4j.
Run the command:
bin/neo4j start
The default user name is |
2. Create a constraint so that each Person
node has a unique id
property.
You create a constraint on the id
property of Person
nodes to ensure that nodes with the Person
label will have a unique id
property.
Using Neo4j Browser, run the following Cypher:
CREATE CONSTRAINT personIdConstraint FOR (person:Person) REQUIRE person.id IS UNIQUE
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j "CREATE CONSTRAINT personIdConstraint FOR (person:Person) REQUIRE person.id IS UNIQUE"
3. Create a constraint so that each Movie
node has a unique id
property.
You create a constraint on the id
property of Movie
nodes to ensure that nodes with the Movie
label will have a unique id
property.
Using Neo4j Browser, run the following Cypher:
CREATE CONSTRAINT movieIdConstraint FOR (movie:Movie) REQUIRE movie.id IS UNIQUE
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j "CREATE CONSTRAINT movieIdConstraint FOR (movie:Movie) REQUIRE movie.id IS UNIQUE"
4. Create an index for Country
node for the name
property.
Create an index on the name
property of Country
nodes to ensure fast lookups.
When using |
Using Neo4j Browser, run the following Cypher:
CREATE INDEX FOR (c:Country) ON (c.name)
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j "CREATE INDEX FOR (c:Country) ON (c.name)"
Import data using LOAD CSV
1. Load the data from the persons.csv file.
You create nodes with the Person
label and the properties id
and name
.
Using Neo4j Browser, run the following Cypher:
LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS csvLine
CREATE (p:Person {id: toInteger(csvLine.id), name: csvLine.name})
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS csvLine CREATE (p:Person {id:toInteger(csvLine.id), name:csvLine.name})'
Output:
Added 4 nodes, Set 8 properties, Added 4 labels
|
2. Load the data from the movies.csv file.
You create nodes with the Movie
label and the properties id
, title
, and year
.
Also you create nodes with the Country
label.
Using MERGE
avoids creating duplicate Country
nodes in the case where multiple movies have the same country of origin.
The relationship with the type ORIGIN
will connect the Country
node and the Movie
node.
Using Neo4j Browser, run the following Cypher:
LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS csvLine
MERGE (country:Country {name: csvLine.country})
CREATE (movie:Movie {id: toInteger(csvLine.id), title: csvLine.title, year:toInteger(csvLine.year)})
CREATE (movie)-[:ORIGIN]->(country)
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS csvLine MERGE (country:Country {name:csvLine.country}) CREATE (movie:Movie {id:toInteger(csvLine.id), title:csvLine.title, year:toInteger(csvLine.year)}) CREATE (movie)-[:ORIGIN]->(country)'
Output:
Added 4 nodes, Created 3 relationships, Set 10 properties, Added 4 labels
3. Load the data from the roles.csv file
Importing the data from the roles.csv file is a matter of finding the Person
node and the Movie
node and then creating relationships between them.
For larger data files, good practice is to use the subquery |
Using Neo4j Browser, run the following Cypher:
:auto LOAD CSV WITH HEADERS FROM 'file:///roles.csv' AS csvLine
CALL {
WITH csvLine
MATCH (person:Person {id: toInteger(csvLine.personId)}), (movie:Movie {id: toInteger(csvLine.movieId)})
CREATE (person)-[:ACTED_IN {role: csvLine.role}]->(movie)
} IN TRANSACTIONS OF 2 ROWS
Using Cypher Shell, you can run the command:
bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///roles.csv" AS csvLine CALL {WITH csvLine MATCH (person:Person {id: toInteger(csvLine.personId)}), (movie:Movie {id: toInteger(csvLine.movieId)}) CREATE (person)-[:ACTED_IN {role: csvLine.role}]->(movie)} IN TRANSACTIONS OF 2 ROWS'
Output:
Created 5 relationships, Set 5 properties
Note, |
Validate the imported data
Check the resulting data set by finding all the nodes that have a relationship.
Using Neo4j Browser, run the following Cypher:
MATCH (n)-[r]->(m) RETURN n, r, m
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j 'MATCH (n)-[r]->(m) RETURN n, r, m'
Output:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | r | m |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Movie {id: 3, title: "The Shawshank Redemption", year: 1994}) | [:ORIGIN] | (:Country {name: "USA"}) |
| (:Movie {id: 2, title: "The American President", year: 1995}) | [:ORIGIN] | (:Country {name: "USA"}) |
| (:Movie {id: 1, title: "Wall Street", year: 1987}) | [:ORIGIN] | (:Country {name: "USA"}) |
| (:Person {name: "Martin Sheen", id: 3}) | [:ACTED_IN {role: "Carl Fox"}] | (:Movie {id: 1, title: "Wall Street", year: 1987}) |
| (:Person {name: "Charlie Sheen", id: 1}) | [:ACTED_IN {role: "Bud Fox"}] | (:Movie {id: 1, title: "Wall Street", year: 1987}) |
| (:Person {name: "Michael Douglas", id: 2}) | [:ACTED_IN {role: "Gordon Gekko"}] | (:Movie {id: 1, title: "Wall Street", year: 1987}) |
| (:Person {name: "Michael Douglas", id: 2}) | [:ACTED_IN {role: "President Andrew Shepherd"}] | (:Movie {id: 2, title: "The American President", year: 1995}) |
| (:Person {name: "Martin Sheen", id: 3}) | [:ACTED_IN {role: "A.J. MacInerney"}] | (:Movie {id: 2, title: "The American President", year: 1995}) |
| (:Person {name: "Morgan Freeman", id: 4}) | [:ACTED_IN {role: "Ellis Boyd 'Red' Redding"}] | (:Movie {id: 3, title: "The Shawshank Redemption", year: 1994}) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+