Graph database concepts
Introduction
This guide covers graph database fundamentals.
Neo4j uses a property graph database model. A graph data structure consists of nodes (discrete objects) that can be connected by relationships. The image below shows a graph with three nodes (the circles) and three relationships (the arrows).
The Neo4j property graph database model consists of:
- Nodes describe entities (discrete objects) of a domain.
- Nodes can have zero or more labels to define (classify) what kind of nodes they are.
- Relationships describe a connection between a source node and a target node.
- Relationships always have a direction (one direction).
- Relationships must have a type (one type) to define (classify) what type of relationship they are.
- Nodes and relationships can have properties (key-value pairs), which further describe them.
In mathematics, graph theory is the study of graphs. In graph theory, nodes are also referred to as vertices, and relationships are also referred to as edges.
Example graph
The example graph shown below introduces the basic concepts of the property graph:
To create the example graph, use the Cypher® clause CREATE.
CREATE (:Person:Actor {name: 'Tom Hanks', born: 1956})-[:ACTED_IN {roles: ['Forrest']}]->(:Movie {title: 'Forrest Gump', released: 1994})<-[:DIRECTED]-(:Person {name: 'Robert Zemeckis', born: 1951})
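Once the example graph exists, it can be read back with a MATCH clause that mirrors the CREATE pattern. The following is a minimal sketch; the variable names (actor, role, movie, director) and the column aliases are illustrative choices and not part of the original example.

```cypher
// Match the same shape that the CREATE statement above produced.
MATCH (actor:Actor)-[role:ACTED_IN]->(movie:Movie)<-[:DIRECTED]-(director:Person)
RETURN actor.name    AS actor,
       role.roles    AS roles,
       movie.title   AS movie,
       director.name AS director
```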
Node
Nodes are used to represent entities (discrete objects) of a domain.
The simplest possible graph is a single node with no relationships. Consider the following graph, consisting of a single node.
The node labels are:
- Person
- Actor
The properties are:
- name: Tom Hanks
- born: 1956
The node can be created with Cypher using the query:
CREATE (:Person:Actor {name: 'Tom Hanks', born: 1956})
Node labels
Labels shape the domain by grouping (classifying) nodes into sets where all nodes with a certain label belong to the same set.
For example, all nodes representing users could be labeled with the label User.
With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given name.
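A label-restricted lookup of this kind might look as follows. This is a sketch only: it assumes that User nodes carry a name property, and 'Alice' is a placeholder value.

```cypher
// Only nodes labeled User are considered; the property map filters by name.
// 'Alice' is a placeholder value for this sketch.
MATCH (u:User {name: 'Alice'})
RETURN u
```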
Since labels can be added and removed during runtime, they can also be used to mark temporary states for nodes.
A Suspended label could be used to denote bank accounts that are suspended, and a Seasonal label can denote vegetables that are currently in season.
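Adding and removing a temporary label can be sketched with the SET and REMOVE clauses. The Account label and the accountId property are hypothetical names chosen for this illustration; the statements are separated by semicolons as they would be in Cypher Shell or Neo4j Browser.

```cypher
// Mark an account as suspended by adding a label.
// Account and accountId are hypothetical names for this sketch.
MATCH (a:Account {accountId: 42})
SET a:Suspended;

// Later, lift the suspension by removing the label again.
MATCH (a:Account {accountId: 42})
REMOVE a:Suspended;
```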
A node can have zero to many labels.
In the example graph, the node labels, Person, Actor, and Movie, are used to describe (classify) the nodes.
More labels can be added to express different dimensions of the data.
The following graph shows the use of multiple labels.
Relationship
A relationship describes how a source node and a target node are related. It is possible for a node to have a relationship to itself.
A relationship:
- Connects a source node and a target node.
- Has a direction (one direction).
- Must have a type (one type) to define (classify) what type of relationship it is.
- Can have properties (key-value pairs), which further describe the relationship.
Relationships organize nodes into structures, allowing a graph to resemble a list, a tree, a map, or a compound entity — any of which may be combined into yet more complex, richly inter-connected structures.
The relationship type is ACTED_IN.
The properties are:
- roles: ['Forrest']
- performance: 5
The roles property has an array value with a single item ('Forrest') in it.
The relationship can be created with Cypher using the query:
CREATE ()-[:ACTED_IN {roles: ['Forrest'], performance: 5}]->()
You must create or reference a source node and a target node to be able to create a relationship.
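Referencing existing nodes, rather than creating anonymous ones, could look like the sketch below. It assumes the example graph from earlier has already been created; the variable names tom and movie are illustrative.

```cypher
// Look up the two existing nodes first.
MATCH (tom:Person {name: 'Tom Hanks'})
MATCH (movie:Movie {title: 'Forrest Gump'})
// Then create the relationship between them.
CREATE (tom)-[:ACTED_IN {roles: ['Forrest'], performance: 5}]->(movie)
```

Because CREATE always adds a new relationship, running this against the example graph results in a second ACTED_IN relationship between the same two nodes. The MERGE clause is one way to avoid such duplicates.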
Relationships always have a direction. However, the direction can be disregarded where it is not useful. This means that there is no need to add duplicate relationships in the opposite direction unless it is needed to describe the data model properly.
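In Cypher, disregarding the direction amounts to leaving out the arrowhead in the pattern. A minimal sketch against the example graph:

```cypher
// No arrowhead: ACTED_IN relationships are matched regardless of direction.
MATCH (p:Person)-[:ACTED_IN]-(m:Movie)
RETURN p.name, m.title
```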
A node can have relationships to itself.
The statement that Tom Hanks KNOWS himself is expressed as a KNOWS relationship with the Tom Hanks node as both the source node and the target node.
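One way to write such a self-relationship in Cypher is sketched here, assuming the Tom Hanks node from the example graph already exists. The variable name tom is illustrative.

```cypher
// The same variable is used as both source and target of the relationship.
MATCH (tom:Person {name: 'Tom Hanks'})
CREATE (tom)-[:KNOWS]->(tom)
```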
Relationship type
A relationship must have exactly one relationship type.
Below is an ACTED_IN relationship, with the Tom Hanks node as the source node and Forrest Gump as the target node.
Observe that the Tom Hanks node has an outgoing relationship, while the Forrest Gump node has an incoming relationship.
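The type and direction of a relationship can be inspected from a query. The sketch below follows outgoing relationships from the Tom Hanks node in the example graph; the aliases relationshipType and target are illustrative.

```cypher
// The arrow points away from the Tom Hanks node, so only its outgoing
// relationships are matched. type(r) returns the relationship type as a string.
MATCH (source:Person {name: 'Tom Hanks'})-[r]->(target)
RETURN type(r) AS relationshipType, target.title AS target
```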
Properties
Properties are key-value pairs that are used for storing data on nodes and relationships.
The value part of a property:
- Can hold different data types, such as number, string, or boolean.
- Can hold a homogeneous list (array) containing, for example, strings, numbers, or boolean values.
CREATE (:Example {a: 1, b: 3.14})
- The property a has the type integer with the value 1.
- The property b has the type float with the value 3.14.
CREATE (:Example {c: 'This is an example string', d: true, e: false})
- The property c has the type string with the value 'This is an example string'.
- The property d has the type boolean with the value true.
- The property e has the type boolean with the value false.
CREATE (:Example {f: [1, 2, 3], g: [2.71, 3.14], h: ['abc', 'example'], i: [true, true, false]})
- The property f contains an array with the value [1, 2, 3].
- The property g contains an array with the value [2.71, 3.14].
- The property h contains an array with the value ['abc', 'example'].
- The property i contains an array with the value [true, true, false].
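To check what was stored, the properties of the Example nodes created above can be read back. This is a sketch using the keys() and properties() functions; the aliases are illustrative.

```cypher
// keys() lists the property keys of a node; properties() returns them as a map.
MATCH (n:Example)
RETURN keys(n) AS propertyKeys, properties(n) AS propertyMap
```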
For a thorough description of the available data types, refer to the Cypher manual → Values and types.
Traversals and paths
A traversal is how you query a graph in order to find answers to questions, for example: "What music do my friends like that I don’t yet own?", or "What web services are affected if this power supply goes down?".
Traversing a graph means visiting nodes by following relationships according to some rules. In most cases only a subset of the graph is visited.
To find out which movies Tom Hanks acted in according to the tiny example database, the traversal would start from the Tom Hanks node, follow any ACTED_IN relationships connected to the node, and end up with the Movie node Forrest Gump as the result (see the black lines):
The traversal result could be returned as a path with the length 1:
The shortest possible path has length zero. It contains a single node and no relationships.
A path containing only a single node has the length of 0.
A path containing one relationship has the length of 1.
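The Tom Hanks traversal described above can be sketched in Cypher by binding the matched pattern to a path variable. The variable p and the alias pathLength are illustrative names.

```cypher
// Bind the matched pattern to a path variable and report its length.
MATCH p = (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(:Movie)
RETURN p, length(p) AS pathLength
```

Since the pattern contains exactly one relationship, each returned path has a length of 1, in line with the definition above.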
Schema
A schema in Neo4j refers to indexes and constraints.
Neo4j is often described as schema optional, meaning that it is not necessary to create indexes and constraints. You can create data — nodes, relationships and properties — without defining a schema up front. Indexes and constraints can be introduced when desired, in order to gain performance or modeling benefits.
Indexes
Indexes are used to increase performance. To see examples of how to work with indexes, see Using indexes. For detailed descriptions of how to work with indexes in Cypher, see the Cypher manual → Indexes.
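As a rough illustration, an index on the name property of Person nodes might be created as shown below. The index name person_name is hypothetical, and the exact syntax depends on the Neo4j version in use, so treat this as a sketch and check the Cypher manual for your version.

```cypher
// person_name is a hypothetical index name chosen for this sketch.
CREATE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.name)
```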
Constraints
Constraints are used to make sure that the data adheres to the rules of the domain. To see examples of how to work with constraints, see Using constraints. For detailed descriptions of how to work with constraints in Cypher, see the Cypher manual → Constraints.
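For instance, a rule such as "no two movies share a title" could be expressed as a uniqueness constraint. The constraint name movie_title_unique and the rule itself are assumptions made for this sketch, and the syntax shown is the one used by recent Neo4j versions.

```cypher
// movie_title_unique is a hypothetical constraint name chosen for this sketch.
CREATE CONSTRAINT movie_title_unique IF NOT EXISTS
FOR (m:Movie) REQUIRE m.title IS UNIQUE
```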
Naming conventions
Node labels, relationship types, and properties (the key part) are case sensitive, meaning, for example, that the property name is different from the property Name.
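The effect of case sensitivity can be seen in a small sketch that stores both keys on one node. The property values are placeholders.

```cypher
// name and Name are stored as two separate properties on the same node.
CREATE (n:Example {name: 'lower-case key', Name: 'upper-case key'})
RETURN n.name, n.Name
```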
The following naming conventions are recommended:
Graph entity | Recommended style | Example |
---|---|---|
Node label | Camel case, beginning with an upper-case character | :VehicleOwner rather than :vehicle_owner |
Relationship type | Upper case, using underscore to separate words | :OWNS_VEHICLE rather than :ownsVehicle |
Property | Lower camel case, beginning with a lower-case character | firstName rather than first_name |
For the precise naming rules, refer to the Cypher manual → Naming rules and recommendations.