Tutorial
In this short tutorial, users will learn how to create, query, and delete a property graph database using Cypher®. The tutorial uses the Neo4j movie database.
Creating a data model
Before creating a property graph database, it is important to develop an appropriate data model. This will provide structure to the data, and allow users of the graph to efficiently retrieve the information they are looking for.
The Neo4j movie database uses the data model described below.
It includes two types of node labels:
- Person nodes, which have the following properties: name (string) and born (integer).
- Movie nodes, which have the following properties: title (string), released (integer), and tagline (string).
The data model also contains five different relationship types between the Person and Movie nodes: ACTED_IN, DIRECTED, PRODUCED, WROTE, and REVIEWED. Two of the relationship types have properties:
- The ACTED_IN relationship type, which has the roles property (string).
- The REVIEWED relationship type, which has a summary property (string) and a rating property (float).
To learn more about data modelling for graph databases, enroll in the free Graph Data Modelling Fundamentals course offered by GraphAcademy.
Creating a property graph database
The complete Cypher query to create the Neo4j movie database can be found here. To create the full graph, run the full query against an empty Neo4j database.
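As a rough illustration of what such a creation query contains, the sketch below builds a tiny graph that follows the data model above. All names and property values are hypothetical placeholders, not rows from the real movie dataset, so it is best run against a separate scratch database rather than alongside the full query.

```cypher
// Hypothetical placeholder data that follows the Person/Movie data model.
CREATE (actor:Person {name: 'Example Actor', born: 1970})
CREATE (critic:Person {name: 'Example Critic', born: 1980})
CREATE (movie:Movie {title: 'Example Movie', released: 2001, tagline: 'A placeholder tagline'})
// Relationships are directed and can carry their own properties.
CREATE (actor)-[:ACTED_IN {roles: 'Lead'}]->(movie)
CREATE (critic)-[:REVIEWED {summary: 'Placeholder summary', rating: 7.5}]->(movie)
```

Each CREATE clause reuses the variables bound by earlier clauses, which is how the relationships are attached to the nodes created in the same query.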
Finding nodes
The MATCH clause is used to find a specific pattern in the graph, such as a specific node.
The RETURN clause specifies which parts of the matched pattern to return.
For example, this query finds the nodes with the Person label and the name Keanu Reeves, and returns the name and born properties of the matched nodes:
MATCH (keanu:Person {name:'Keanu Reeves'})
RETURN keanu.name, keanu.born
| keanu.name | keanu.born |
|---|---|
Rows: 1
It is also possible to query a graph for several nodes.
This query matches all nodes with the Person label, and limits the results to only include five rows.
MATCH (people:Person)
RETURN people
LIMIT 5
| people |
|---|
Rows: 5
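Returning whole nodes is not always needed. A minimal variation, assuming the same movie database, returns only selected properties and sorts them so that the five rows are deterministic:

```cypher
MATCH (people:Person)
// Return individual properties instead of whole nodes.
RETURN people.name AS name, people.born AS born
// Without ORDER BY, LIMIT may pick any five rows.
ORDER BY name
LIMIT 5
```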
Note on clause composition
Similar to SQL, Cypher queries are built from clauses that are chained together, each passing its intermediate results to the next. Each clause takes as input the state of the graph and a table of intermediate results consisting of the current variables. The first clause takes as input the state of the graph before the query and an empty table of intermediate results. The output of a clause is a new state of the graph and a new table of intermediate results, which serve as input to the next clause. The output of the last clause is the result of the query.
Note that if one of the clauses returns an empty table of intermediate results, there is nothing to pass on to subsequent clauses, thus ending the query.
(This behaviour can be circumvented, for example by replacing a MATCH clause with OPTIONAL MATCH.)
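The difference can be sketched as follows, assuming the movie database. If the second clause were a plain MATCH and the person had no DIRECTED relationships, the query would return no rows at all. With OPTIONAL MATCH, the row from the first clause is kept and the unmatched variable is set to null.

```cypher
MATCH (p:Person {name: 'Keanu Reeves'})
// OPTIONAL MATCH keeps the row even when no DIRECTED relationship exists.
OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
// m.title is null for rows where the optional pattern did not match.
RETURN p.name AS name, m.title AS directed
```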
In the example below, the first MATCH clause finds all nodes with the Person label.
The second clause then filters those nodes to find all Person nodes who were born in the 1980s.
The final clause returns the result in descending chronological order.
MATCH (bornInEighties:Person)
WHERE bornInEighties.born >= 1980 AND bornInEighties.born < 1990
RETURN bornInEighties.name AS name, bornInEighties.born AS born
ORDER BY born DESC
| name | born |
|---|---|
Rows: 4
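Clause composition becomes more visible when an intermediate table is reshaped partway through a query. The sketch below uses the WITH clause, which is not covered in the text above, to aggregate first and then filter on the aggregate. The threshold of 3 is an arbitrary value chosen for illustration.

```cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
// WITH passes a new intermediate table (one row per person) to the next clause.
WITH p, count(m) AS movieCount
// This WHERE filters the aggregated rows, not the original matches.
WHERE movieCount >= 3
RETURN p.name AS name, movieCount
ORDER BY movieCount DESC
```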
Finding connected nodes
To discover how nodes are connected to one another, relationships must be added to queries. Queries can specify relationship types, properties, and direction, as well as the start and end nodes of the pattern.
For example, the following query searches the graph for the directors of the movie The Matrix and returns their name property.
MATCH (m:Movie {title: 'The Matrix'})<-[d:DIRECTED]-(p:Person)
RETURN p.name AS director
| director |
|---|
Rows: 2
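Relationship properties can be read in the same way as node properties. As a small sketch, assuming the movie database and the roles property described in the data model, the following query binds the relationship to a variable and returns one of its properties:

```cypher
MATCH (keanu:Person {name: 'Keanu Reeves'})-[r:ACTED_IN]->(m:Movie)
// r refers to the relationship itself, so r.roles reads its property.
RETURN m.title AS movie, r.roles AS roles
```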
It is also possible to look for the type of relationships that connect nodes to one another.
The query below searches the graph for outgoing relationships from the Tom Hanks node to any Movie nodes, and returns the relationship types and the titles of the movies connected to him.
MATCH (tom:Person {name:'Tom Hanks'})-[r]->(m:Movie)
RETURN type(r) AS type, m.title AS movie
The result shows that he has 13 outgoing relationships connected to 12 different Movie nodes (12 have the ACTED_IN type and one has the DIRECTED type).
| type | movie |
|---|---|
Rows: 13
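The split between relationship types can also be computed by the query itself instead of being counted by hand. A minimal sketch, assuming the same movie database, groups the matches by relationship type:

```cypher
MATCH (tom:Person {name: 'Tom Hanks'})-[r]->(m:Movie)
// Non-aggregated return items (type) act as the grouping key for count(*).
RETURN type(r) AS type, count(*) AS relationships
ORDER BY relationships DESC
```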
Finding paths
There are several ways in which Cypher can be used to search a graph for paths between nodes.
To search for patterns of a fixed length, specify the distance ("hops") between the nodes in the pattern.
For example, the following query matches all Person nodes exactly 2 "hops" away from Tom Hanks and returns the first five rows.
MATCH (tom:Person {name:'Tom Hanks'})-[*2]-(colleagues:Person)
RETURN colleagues.name AS colleagues LIMIT 5
| colleagues |
|---|
Rows: 5
It is also possible to search a graph for patterns of variable length.
The query below matches all Person nodes up to 3 "hops" away from Tom Hanks and returns the first five rows.
The DISTINCT operator ensures that the result contains no duplicate values.
MATCH (tom:Person {name:'Tom Hanks'})-[*1..3]-(colleagues:Person)
RETURN DISTINCT colleagues.name AS colleagues LIMIT 5
| colleagues |
|---|
Rows: 5
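The patterns above traverse relationships of any type. A hop count can be combined with a relationship type to narrow the search. The sketch below, assuming the movie database, only follows ACTED_IN relationships, so two hops lead from Tom Hanks through a movie to a co-actor:

```cypher
// Exactly two ACTED_IN hops: Person -> Movie <- Person (direction ignored).
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN*2]-(coActors:Person)
RETURN DISTINCT coActors.name AS coActor
LIMIT 5
```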
To find the shortest possible path between two nodes, use the shortestPath function.
For example, this query matches the shortest path in the graph between the two nodes Tom Hanks and Keanu Reeves:
MATCH p=shortestPath(
(keanu:Person {name:"Keanu Reeves"})-[*]-(tom:Person {name:"Tom Hanks"})
)
RETURN p
The returned graph shows that Keanu Reeves ACTED_IN the Movie The Replacements, which was REVIEWED by the movie critic Jessica Thompson, who also REVIEWED the Movie The Da Vinci Code, which Tom Hanks ACTED_IN.
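When a graph visualisation is not available, the same path can be summarised as tabular data. This is a sketch under the assumption that every node on the path has either a name or a title property, as in the data model above:

```cypher
MATCH p = shortestPath(
  (keanu:Person {name: 'Keanu Reeves'})-[*]-(tom:Person {name: 'Tom Hanks'})
)
RETURN length(p) AS hops,
       // Person nodes have a name, Movie nodes a title; coalesce picks whichever exists.
       [n IN nodes(p) | coalesce(n.name, n.title)] AS steps,
       [r IN relationships(p) | type(r)] AS types
```

The length function counts relationships, so a path through two movies and one reviewer has a length of 4.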
Finding recommendations
Cypher allows for more complex queries.
The following query tries to recommend new co-actors for Keanu Reeves: people he has yet to work with but who have worked with his co-actors.
The query then orders the results by how frequently a matched co-co-actor has collaborated with one of Keanu Reeves' co-actors.
MATCH (keanu:Person {name:'Keanu Reeves'})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(coActors:Person),
(coActors:Person)-[:ACTED_IN]->(m2:Movie)<-[:ACTED_IN]-(cocoActors:Person)
WHERE NOT (keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(cocoActors) AND keanu <> cocoActors
RETURN cocoActors.name AS Recommended, count(cocoActors) AS Strength
ORDER BY Strength DESC
LIMIT 7
| Recommended | Strength |
|---|---|
Rows: 7
There are several connections between the Keanu Reeves and Tom Hanks nodes in the movie database, but the two have never worked together in a film.
The following query matches co-actors who could introduce the two, by looking for co-actors who have worked with both of them in separate movies:
MATCH (keanu:Person {name: 'Keanu Reeves'})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(coActors:Person),
(coActors)-[:ACTED_IN]->(m2:Movie)<-[:ACTED_IN]-(tom:Person {name:'Tom Hanks'})
RETURN DISTINCT coActors.name
| coActors.name |
|---|
Rows: 2
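To see why each co-actor qualifies, the same pattern can return the connecting movies as well. The following is a sketch assuming the movie database. The column names are chosen here for illustration.

```cypher
MATCH (keanu:Person {name: 'Keanu Reeves'})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(coActors:Person),
      (coActors)-[:ACTED_IN]->(m2:Movie)<-[:ACTED_IN]-(tom:Person {name: 'Tom Hanks'})
// Grouping by co-actor name; collect(DISTINCT ...) gathers each movie title once.
RETURN coActors.name AS coActor,
       collect(DISTINCT m.title) AS moviesWithKeanu,
       collect(DISTINCT m2.title) AS moviesWithTom
```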