Variable length patterns
Cypher® can be used to match patterns of a variable or an unknown length. Such patterns can be found using quantified path patterns and quantified relationships. This page also discusses how variables work when declared in quantified path patterns (group variables), and how to use predicates in quantified path patterns.
Quantified path patterns
This section considers how to match paths of varying length by using quantified path patterns, allowing you to search for paths whose lengths are unknown or within a specific range.
Quantified path patterns can be useful when, for example, searching for all nodes that can be reached from an anchor node, finding all paths connecting two nodes, or when traversing a hierarchy that may have differing depths.
This example uses a new graph:
To recreate the graph, run the following query against an empty Neo4j database:
CREATE (pmr:Station {name: 'Peckham Rye'}),
(dmk:Station {name: 'Denmark Hill'}),
(clp:Station {name: 'Clapham High Street'}),
(wwr:Station {name: 'Wandsworth Road'}),
(clj:Station {name: 'Clapham Junction'}),
(s1:Stop {arrives: time('17:19'), departs: time('17:20')}),
(s2:Stop {arrives: time('17:12'), departs: time('17:13')}),
(s3:Stop {arrives: time('17:10'), departs: time('17:11')}),
(s4:Stop {arrives: time('17:06'), departs: time('17:07')}),
(s5:Stop {arrives: time('16:58'), departs: time('17:01')}),
(s6:Stop {arrives: time('17:17'), departs: time('17:20')}),
(s7:Stop {arrives: time('17:08'), departs: time('17:10')}),
(clj)<-[:CALLS_AT]-(s1), (wwr)<-[:CALLS_AT]-(s2),
(clp)<-[:CALLS_AT]-(s3), (dmk)<-[:CALLS_AT]-(s4),
(pmr)<-[:CALLS_AT]-(s5), (clj)<-[:CALLS_AT]-(s6),
(dmk)<-[:CALLS_AT]-(s7),
(s5)-[:NEXT {distance: 1.2}]->(s4),(s4)-[:NEXT {distance: 0.34}]->(s3),
(s3)-[:NEXT {distance: 0.76}]->(s2), (s2)-[:NEXT {distance: 0.3}]->(s1),
(s7)-[:NEXT {distance: 1.4}]->(s6)
Each Stop
on a service CALLS_AT
one Station
.
Each Stop
has the properties arrives
and departs
that give the times the train is at the Station
.
Following the NEXT
relationship of a Stop
will give the next Stop
of the service.
For this example, a path pattern is constructed to match each of the services that allow passengers to travel from Denmark Hill
to Clapham Junction
.
The following shows the two paths that the path pattern should match:
The following motif represents a fixed-length path pattern that matches the service that departs from Denmark Hill
station at 17:07
:
To match the second train service, leaving Denmark Hill
at 17:10
, a shorter path pattern is needed:
Translating the motifs into Cypher, and adding predicates to match the origin and destination Stations
, yields the following two path patterns respectively:
(:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(:Stop)
-[:NEXT]->(:Stop)
-[:NEXT]->(:Stop)
-[:NEXT]->(:Stop)-[:CALLS_AT]->
(:Station { name: 'Clapham Junction' })
(:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(:Stop)
-[:NEXT]->(:Stop)-[:CALLS_AT]->
(:Station { name: 'Clapham Junction' })
To return both solutions in the same query using these fixed-length path patterns, a UNION of two MATCH
statements would be needed.
For example, the following query returns the departure
of the two services:
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
-[:NEXT]->(:Stop)
-[:NEXT]->(:Stop)
-[:NEXT]->(a:Stop)-[:CALLS_AT]->
(:Station { name: 'Clapham Junction' })
RETURN d.departs AS departureTime, a.arrives AS arrivalTime
UNION
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
-[:NEXT]->(a:Stop)-[:CALLS_AT]->
(:Station { name: 'Clapham Junction' })
RETURN d.departs AS departureTime, a.arrives AS arrivalTime
departureTime | arrivalTime |
---|---|
|
|
|
|
Rows: 2 |
The problem with this solution is that not only is it verbose, it can only be used where the lengths of the target paths are known in advance.
Quantified path patterns solve this problem by extracting repeating parts of a path pattern into parentheses and applying a quantifier.
That quantifier specifies a range of possible repetitions of the extracted pattern to match on.
For the current example, the first step is identifying the repeating pattern, which in this case is the sequence of alternating Stop
nodes and NEXT
relationships, representing one segment of a Service
:
(:Stop)-[:NEXT]->(:Stop)
The shortest path has one instance of this pattern, the longest three.
So the quantifier applied to the wrapper parentheses is the range one to three, expressed as {1,3}
:
((:Stop)-[:NEXT]->(:Stop)){1,3}
This also includes repetitions of two, but in this case this repetition will not return matches. To understand the semantics of this pattern, it helps to work through the expansion of the repetitions. Here are the three repetitions specified by the quantifier, combined into a union of path patterns:
(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)(:Stop)-[:NEXT]->(:Stop)(:Stop)-[:NEXT]->(:Stop)
The union operator (|
) and placing node patterns next to each other are used here for illustration only; using it this way is not part of Cypher syntax.
Where two node patterns are next to each other in the expansion above, they must necessarily match the same node: the next segment of a Service
starts where the previous segment ends.
As such they can be rewritten as a single node pattern with any filtering condition combined conjunctively.
In this example this is trivial, because the filtering applied to those nodes is just the label Stop
:
With this, the union of path patterns simplifies to:
(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)-[:NEXT]->(:Stop)-[:NEXT]->(:Stop)
The segments of the original path pattern that connect the Stations
to the Stops
can also be rewritten.
Here is what those segments look like when concatenated with the first repetition:
(:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(:Stop)
(:Stop)-[:NEXT]->(:Stop)
(:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
The original MATCH
clause now has the following three parts:
Translating the union of fixed-length path patterns into a quantified path pattern results in a pattern that will return the correct paths.
The following query adds a RETURN
clause that yields the departure and arrival times of the two services:
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
((:Stop)-[:NEXT]->(:Stop)){1,3}
(a:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
RETURN d.departs AS departureTime, a.arrives AS arrivalTime
departureTime | arrivalTime |
---|---|
|
|
|
|
Rows: 2 |
Quantified relationships
Quantified relationships allow some simple quantified path patterns to be re-written in a more succinct way.
Continuing with the example of Stations
and Stops
from the previous section, consider the following query:
MATCH (d:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(n:Stop)
((:Stop)-[:NEXT]->(:Stop)){1,10}
(m:Stop)-[:CALLS_AT]->(a:Station { name: 'Clapham Junction' })
WHERE m.arrives < time('17:18')
RETURN n.departs AS departureTime
If the relationship NEXT
only connects Stop
nodes, the :Stop
label expressions can be removed:
MATCH (d:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(n:Stop)
(()-[:NEXT]->()){1,10}
(m:Stop)-[:CALLS_AT]->(a:Station { name: 'Clapham Junction' })
WHERE m.arrives < time('17:18')
RETURN n.departs AS departureTime
When the quantified path pattern has one relationship pattern, it can be abbreviated to a quantified relationship. A quantified relationship is a relationship pattern with a postfix quantifier. Below is the previous query rewritten with a quantified relationship:
MATCH (d:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-
(n:Stop)-[:NEXT]->{1,10}(m:Stop)-[:CALLS_AT]->
(a:Station { name: 'Clapham Junction' })
WHERE m.arrives < time('17:18')
RETURN n.departs AS departureTime
The scope of the quantifier {1,10}
is the relationship pattern -[:NEXT]->
and not the node patterns abutting it.
More generally, where a path pattern contained in a quantified path pattern has the following form:
(() <relationship pattern> ()) <quantifier>
then it can be re-written as follows:
<relationship pattern> <quantifier>
Prior to the introduction of quantified path patterns and quantified relationships in Neo4j 5.9, the only method in Cypher to match paths of a variable length was through variable-length relationships. This syntax is still available but it is not GQL conformant. It is very similar to the syntax for quantified relationships, with the following differences:
For more information, see the reference section on variable-length relationships. |
Group variables
This section uses the example of Stations
and Stops
used in the previous section, but with an additional property distance
added to the NEXT
relationships:
As the name suggests, this property represents the distance between two Stops
.
To return the total distance for each service connecting a pair of Stations
, a variable referencing each of the relationships traversed is needed.
Similarly, to extract the departs
and arrives
properties of each Stop
, variables referencing each of the nodes traversed is required.
In this example of matching services between Denmark Hill
and Clapham Junction
, the variables l
and m
are declared to match the Stops
and r
is declared to match the relationships.
The variable origin only matches the first Stop
in the path:
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(origin)
((l)-[r:NEXT]->(m)){1,3}
()-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
Variables that are declared inside quantified path patterns are known as group variables.
They are so called because, when referred outside of the quantified path pattern, they are lists of the nodes or relationships they are bound to in the match.
To understand how to think about the way group variables are bound to nodes or relationships, it helps to expand the quantified path pattern, and observe how the different variables match to the elements of the overall matched path.
Here the three different expansions for each value in the range given by the quantifier {1,3}
:
(l1)-[r1:NEXT]->(m1) |
(l1)-[r1:NEXT]->(m1)(l2)-[r2:NEXT]->(m2) |
(l1)-[r1:NEXT]->(m1)(l2)-[r2:NEXT]->(m2)(l3)-[r3:NEXT]->(m3)
The subscript of each variable indicates which instance of the path pattern repetition they belong to.
The following diagram shows the variable bindings of the path pattern with three repetitions, which matches the service that departs Denmark Hill
at 17:07
.
It traces the node or relationship that each indexed variable is bound to.
Note that the index increases from right to left as the path starts at Denmark Hill
:
For this matched path, the group variables have the following bindings:
l => [n2, n3, n4]
r => [r2, r3, r4]
m => [n3, n4, n5]
The second solution is the following path:
The following table shows the bindings for both matches, including the variable origin.
In contrast to the group variables, origin
is a singleton variable due to being declared outside the quantification.
Singleton variables bind at most to one node or relationship.
origin | l | r | m |
---|---|---|---|
|
|
|
|
|
|
|
|
Returning to the original goal, which was to return the sequence of depart times for the Stops
and the total distance of each service, the final query exploits the compatibility of group variables with list comprehensions and list functions such as reduce():
MATCH (:Station {name: 'Denmark Hill'})<-[:CALLS_AT]-(origin)
((l)-[r:NEXT]->(m)){1,3}
()-[:CALLS_AT]->(:Station {name: 'Clapham Junction'})
RETURN origin.departs + [stop in m | stop.departs] AS departureTimes,
reduce(acc = 0.0, next in r | round(acc + next.distance, 2)) AS totalDistance
departureTimes | totalDistance |
---|---|
|
|
|
|
Rows: 2 |
Predicates in quantified path patterns
One of the pitfalls of quantified path patterns is that, depending on the graph, they can end up matching very large numbers of paths, resulting in a slow query performance. This is especially true when searching for paths with a large maximum length or when the pattern is too general. However, by using inline predicates that specify precisely which nodes and relationships should be included in the results, unwanted results will be pruned as the graph is traversed.
Here are some examples of the types of constraints you can impose on quantified path pattern traversals:
-
Nodes must have certain combinations of labels. For example, all nodes must be an
Employee
, but not aContractor
. -
Relationships must have certain types. For example, all relationships in the path must be of type
EMPLOYED_BY
. -
Nodes or relationships must have properties satisfying some condition. For example, all relationships must have the property
distance > 10
.
To demonstrate the utility of predicates in quantified path patterns, this section considers an example of finding the shortest path by physical distance and compares that to the results yielded by using the SHORTEST
keyword.
The graph in this example continues with Station
nodes, but adds both a geospatial location
property to the Stations
, as well as LINK
relationships with a distance
property representing the distance between pairs of Stations
:
To recreate the graph, run the following query against an empty Neo4j database:
CREATE (lbg:Station {name: "London Bridge"}),
(bfr:Station {name: "London Blackfriars"}),
(eph:Station {name: "Elephant & Castle"}),
(dmk:Station {name: "Denmark Hill"}),
(pmr:Station {name: "Peckham Rye"}),
(qrp:Station {name: "Queens Rd Peckham"}),
(sbm:Station {name: "South Bermondsey"}),
(lgj:Station {name: "Loughborough Jn"}),
(hnh:Station {name: "Herne Hill"}),
(tuh:Station {name: "Tulse Hill"}),
(ndl:Station {name: "North Dulwich"}),
(edw:Station {name: "East Dulwich"}),
(brx:Station {name: "Brixton"})
SET lbg.location = point({longitude: -0.08609, latitude: 51.50502}),
bfr.location = point({longitude: -0.10333, latitude: 51.51181}),
eph.location = point({longitude: -0.09873, latitude: 51.49403}),
dmk.location = point({longitude: -0.08936, latitude: 51.46820}),
pmr.location = point({longitude: -0.06941, latitude: 51.47003}),
qrp.location = point({longitude: -0.05731, latitude: 51.47357}),
sbm.location = point({longitude: -0.05468, latitude: 51.48814}),
lgj.location = point({longitude: -0.10218, latitude: 51.46630}),
hnh.location = point({longitude: -0.10229, latitude: 51.45331}),
tuh.location = point({longitude: -0.10508, latitude: 51.43986}),
ndl.location = point({longitude: -0.08792, latitude: 51.45451}),
edw.location = point({longitude: -0.08057, latitude: 51.46149}),
brx.location = point({longitude: -0.11418, latitude: 51.46330})
CREATE (lbg)<-[:LINK {distance: 1.13}]-(bfr),
(bfr)<-[:LINK {distance: 1.21}]-(eph),
(eph)-[:LINK {distance: 2.6}]->(dmk),
(dmk)-[:LINK {distance: 0.86}]->(pmr),
(pmr)-[:LINK {distance: 0.71}]->(qrp),
(qrp)<-[:LINK {distance: 0.95}]-(sbm),
(sbm)<-[:LINK {distance: 1.8}]-(lbg),
(lgj)-[:LINK {distance: 0.88}]->(hnh),
(hnh)-[:LINK {distance: 1.08}]->(tuh),
(tuh)<-[:LINK {distance: 1.29}]-(ndl),
(ndl)-[:LINK {distance: 0.53}]->(edw),
(edw)-[:LINK {distance: 0.84}]->(pmr),
(eph)-[:LINK {distance: 2.01}]->(lgj),
(dmk)-[:LINK {distance: 1.11}]->(brx),
(brx)-[:LINK {distance: 0.51}]->(hnh)
The following query finds the path length and total distance for ALL SHORTEST
paths between London Blackfriars
to North Dulwich
:
MATCH (bfr:Station {name: 'London Blackfriars'}),
(ndl:Station {name: 'North Dulwich'})
MATCH p = ALL SHORTEST (bfr)-[:LINK]-+(ndl)
RETURN [n in nodes(p) | n.name] AS stops,
length(p) as stopCount,
reduce(acc = 0, r in relationships(p) | round(acc + r.distance, 2)) AS distance
stops | stopCount | distance |
---|---|---|
|
|
|
|
|
|
Rows: 2 |
ALL SHORTEST
finds all shortest paths by number of hops, and as the result shows, there are two paths in the graph tied for the shortest path.
Whether any of these paths corresponds to the shortest path by distance can be checked by looking at each path between the two end Stations
and returning the first result after ordering by distance
:
MATCH (bfr:Station {name: 'London Blackfriars'}),
(ndl:Station {name: 'North Dulwich'})
MATCH p = (bfr)-[:LINK]-+(ndl)
RETURN reduce(acc = 0, r in relationships(p) | round(acc + r.distance, 2))
AS distance
ORDER BY distance LIMIT 1
distance |
---|
|
Rows: 1 |
This shows that there is a route with a shorter distance than either of the paths with fewer Stations
returned using ALL SHORTEST
.
But to get this result, the query had to first find all paths from London Blackfriars
to North Dulwich
before it could select the shortest one.
The following query shows the number of possible paths:
MATCH (bfr:Station {name: 'London Blackfriars'}),
(ndl:Station {name: 'North Dulwich'})
MATCH p = (bfr)-[:LINK]-+(ndl)
RETURN count(*) AS numPaths
numPaths |
---|
|
Rows: 1 |
For a small dataset like this, finding all the paths will be fast. But as the size of the graph grows, the execution time will increase exponentially. For a real dataset, such as the entire rail network of the UK, it might be unacceptably long.
One approach to avoid the exponential explosion in paths is to put a finite upper bound to the quantified path pattern (e.g. {,10}
) to limit the amount of path iterations returned.
This works fine where the solution is known to lie within some range of hops.
But in cases where this is not known, one alternative would be to make the pattern more specific by, for example, adding node labels, or by specifying a relationship direction.
Another alternative would be to add an inline predicate to the quantified path pattern.
In this example, an inline predicate can be added that takes advantage of the geospatial location
property of the Stations
: for each pair of Stations
on the path, the second Station
will be closer to the endpoint (not always true, but is assumed here to keep the example simple).
To compose the predicate, the point.distance() function is used to compare the distance between the left-hand Station
(a
) and the right-hand Station
(b
) for each node-pair along the path to the destination North Dulwich
:
MATCH (bfr:Station {name: "London Blackfriars"}),
(ndl:Station {name: "North Dulwich"})
MATCH p = (bfr)
((a)-[:LINK]-(b:Station)
WHERE point.distance(a.location, ndl.location) >
point.distance(b.location, ndl.location))+ (ndl)
RETURN reduce(acc = 0, r in relationships(p) | round(acc + r.distance, 2))
AS distance
distance |
---|
|
Rows: 1 |
This query avoids having to find all possible paths and then imposing a LIMIT 1
to find the shortest one by distance.
It also shows that there is only one path to solving the query (a number that remains constant even if the data from the rest of the UK railway network was included).
Using inline predicates or making quantified path patterns more specific where possible can thus greatly improve query performance.