Shortest paths
The Cypher® keyword SHORTEST
is used to find variations of the shortest paths between nodes.
This includes the ability to look for the shortest, second-shortest (and so on) paths, all available shortest paths, and groups of paths containing the same pattern length.
The ANY
keyword, which can be used to test the reachability of nodes from a given node(s), is also explained, as is how to apply filters in queries using SHORTEST
.
SHORTEST functionally replaces and extends the shortestPath() and allShortestPaths() functions.
Both functions can still be used, but they are not GQL conformant.
For more information, see Syntax and semantics → The shortestPath() and allShortestPaths() functions.
|
Note on Cypher and GDS shortest paths
Both Cypher and Neo4j´s Graph Data Science (GDS) library can be used to find variations of the shortest paths between nodes.
Use Cypher if:
-
You need to specify complex graph navigation via quantified path patterns.
-
Creating a graph projection takes too long.
-
GDS is not available in your instance, or the size of the GDS projection is too large for your instance.
Use GDS if:
To read more about the shortest path algorithms included in the GDS library, see GDS Graph algorithms → Path finding.
SHORTEST k
This section uses the following graph:
To recreate it, run the following query against an empty Neo4j database:
CREATE (asc:Station {name:"Ashchurch"}),
(bmv:Station {name:"Bromsgrove"}),
(cnm:Station {name:"Cheltenham Spa"}),
(dtw:Station {name:"Droitwich Spa"}),
(hby:Station {name:"Hartlebury"}),
(psh:Station {name:"Pershore"}),
(wop:Station {name:"Worcestershire Parkway"}),
(wof:Station {name:"Worcester Foregate Street"}),
(wos:Station {name:"Worcester Shrub Hill"})
CREATE (asc)-[:LINK {distance: 7.25}]->(cnm),
(asc)-[:LINK {distance: 11.29}]->(wop),
(asc)-[:LINK {distance: 14.75}]->(wos),
(bmv)-[:LINK {distance: 31.14}]->(cnm),
(bmv)-[:LINK {distance: 6.16}]->(dtw),
(bmv)-[:LINK {distance: 12.6}]->(wop),
(dtw)-[:LINK {distance: 5.64}]->(hby),
(dtw)-[:LINK {distance: 6.03}]->(wof),
(dtw)-[:LINK {distance: 5.76}]->(wos),
(psh)-[:LINK {distance: 4.16}]->(wop),
(wop)-[:LINK {distance: 3.71}]->(wos),
(wof)-[:LINK {distance: 0.65}]->(wos)
The paths matched by a path pattern can be restricted to only the shortest (by number of hops) by including the keyword SHORTEST k
, where k
is the number of paths to match.
For example, the following example uses SHORTEST 1
to return the length of the shortest path between Worcester Shrub Hill
and Bromsgrove
:
MATCH p = SHORTEST 1 (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN length(p) AS result
Note that this and the following examples in this section use a quantified relationship -[:LINK]-+ , which is composed of a relationship pattern -[:LINK]- and a postfix quantifier + . The relationship pattern is only concerned with following relationships with type LINK , and will otherwise traverse any node along the way. There is no arrowhead < or > on the relationship pattern, allowing the pattern to match relationships going in either direction. This represents the fact that trains can go in both directions along the LINK relationships between Stations. The + quantifier means that one or more relationships should be matched. For more information, see Syntax and semantics - quantified relationships.
|
result |
---|
|
Rows: 1 |
Although the query returned a single result, there are in fact two paths that are tied for shortest:
Because 1
was specified in SHORTEST
, only one of the paths is returned.
Which one is returned is non-deterministic.
If instead SHORTEST 2
is specified, all shortest paths in this example would be returned, and the result would be deterministic:
MATCH p = SHORTEST 2 (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops
stops |
---|
|
|
Rows: 2 |
Increasing the number of paths will return the next shortest paths. Three paths are tied for the second shortest:
The following query returns all three of the second shortest paths, along with the two shortest paths:
MATCH p = SHORTEST 5 (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops
stops |
---|
|
|
|
|
|
Rows: 5 |
If there had been only four possible paths between the two Stations, then only those four would have been returned.
ALL SHORTEST
To return all paths that are tied for shortest length, use the keywords ALL SHORTEST
:
MATCH p = ALL SHORTEST (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops
stops |
---|
|
|
Rows: 2 |
SHORTEST k GROUPS
To return all paths that are tied for first, second, and so on up to the kth shortest length, use SHORTEST k GROUPS
.
For example, the following returns the first and second shortest length paths between Worcester Shrub Hill
and Bromsgrove
:
MATCH p = SHORTEST 2 GROUPS (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops, length(p) AS pathLength
stops | pathLength |
---|---|
|
|
|
|
|
|
|
|
|
|
Rows: 5 |
The first group includes the two shortest paths with pathLength = 2
(as seen in the first two rows of the results), and the second group includes the three second shortest paths with pathLength = 3
(as seen in the last three rows of the results).
If more groups are specified than exist in the graph, only those paths that exist are returned.
For example, if the paths equal to one of the eight shortest paths are specified for Worcester Shrub Hill
to Bromsgrove
, only seven groups are returned:
MATCH p = SHORTEST 8 GROUPS (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN length(p) AS pathLength, count(*) AS numPaths
pathLength | numPaths |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rows: 7 |
ANY
The ANY
keyword can be used to test the reachability of nodes from a given node(s).
It returns the same as SHORTEST 1
, but by using the ANY
keyword the intent of the query is clearer.
For example, the following query shows that there exists a route from Pershore
to Bromsgrove
where the distance between each pair of stations is less than 10 miles:
MATCH path = ANY
(:Station {name: 'Pershore'})-[l:LINK WHERE l.distance < 10]-+(b:Station {name: 'Bromsgrove'})
RETURN [r IN relationships(path) | r.distance] AS distances
distances |
---|
|
Rows: 1 |
Partitions
When there are multiple start or end nodes matching a path pattern, the matches are partitioned into distinct pairs of start and end nodes prior to selecting the shortest paths; a partition is one distinct pair of start node and end node. The selection of shortest paths is then done from all paths that join the start and end node of a given partition. The results are then formed from the union of all the shortest paths found for each partition.
For example, if the start nodes of matches are bound to either Droitwich Spa
or Hartlebury
, and the end nodes are bound to either Ashchurch
or Cheltenham Spa
, there will be four distinct pairs of start and end nodes, and therefore four partitions:
Start node | End node |
---|---|
|
|
|
|
|
|
|
|
The following query illustrates how these partitions define the sets of results within which the shortest paths are selected.
It uses a pair of UNWIND
clauses to generate a Cartesian product of the names of the Stations
(all possible pairs of start node and end node), followed by the MATCH
clause to find the shortest two groups of paths for each pair of distinct start and end Stations
:
UNWIND ["Droitwich Spa", "Hartlebury"] AS a
UNWIND ["Ashchurch", "Cheltenham Spa"] AS b
MATCH SHORTEST 2 GROUPS (o:Station {name: a})-[l]-+(d:Station {name: b})
RETURN o.name AS start, d.name AS end,
size(l) AS pathLength, count(*) AS numPaths
ORDER BY start, end, pathLength
start | end | pathLength | numPaths |
---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rows: 8 |
Each partition appears twice: once for the group of shortest paths and once for the group of second shortest paths.
For example, for the partition of Droitwich Spa
as the start
and Ashchurch
as the end
, the shortest path group (paths with length 2
) has one path, and the second shortest path group (paths with length 3
) has four paths.
Pre-filters and post-filters
The position of a filter in a shortest path query will affect whether it is applied before or after selecting the shortest paths.
To see the difference, first consider a query that returns the shortest path from Hartlebury
to Cheltenham Spa
:
MATCH SHORTEST 1
(:Station {name: 'Hartlebury'})
(()--(n))+
(:Station {name: 'Cheltenham Spa'})
RETURN [stop in n[..-1] | stop.name] AS stops
stops |
---|
|
Rows: 1 |
Note that n[..-1]
is a slicing operation that returns all elements of n
except the last.
If instead, the query uses a WHERE
clause at the MATCH
level to filter out routes that go via Bromsgrove, the filtering is applied after the shortest paths are selected.
This results in the only solution being removed, and no results being returned:
MATCH SHORTEST 1
(:Station {name: 'Hartlebury'})
(()--(n:Station))+
(:Station {name: 'Cheltenham Spa'})
WHERE none(stop IN n[..-1] WHERE stop.name = 'Bromsgrove')
RETURN [stop in n[..-1] | stop.name] AS stops
stops |
---|
Rows: 0 |
There are two ways to turn a post-filter without solutions into a pre-filter that returns solutions. One is to inline the predicate into the path pattern:
MATCH SHORTEST 1
(:Station {name: 'Hartlebury'})
(()--(n:Station WHERE n.name <> 'Bromsgrove'))+
(:Station {name: 'Cheltenham Spa'})
RETURN [stop in n[..-1] | stop.name] AS stops
stops |
---|
|
Rows: 1 |
The shortest journey that avoids Bromsgrove
is now returned.
An alternative is to wrap the path pattern and filter in parentheses (leaving the SHORTEST
keyword on the outside):
MATCH SHORTEST 1
( (:Station {name: 'Hartlebury'})
(()--(n:Station))+
(:Station {name: 'Cheltenham Spa'})
WHERE none(stop IN n[..-1] WHERE stop.name = 'Bromsgrove') )
RETURN [stop in n[..-1] | stop.name] AS stops
stops |
---|
|
Rows: 1 |
Pre-filter with a path variable
The previous section showed how to apply a filter before the shortest path selection by the use of parentheses. Placing a path variable declaration before the shortest path keywords, however, places it outside the scope of the parentheses. To reference a path variable in a pre-filter, it has to be declared inside the parentheses.
To illustrate, consider this example that returns all shortest paths from Hartlebury
to each of the other Stations
:
MATCH p = SHORTEST 1 (:Station {name: 'Hartlebury'})--+(b:Station)
RETURN b.name AS destination, length(p) AS pathLength
ORDER BY pathLength, destination
destination | pathLength |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rows: 8 |
If the query is altered to only include routes that have an even number of stops, adding a WHERE
clause at the MATCH
level will not work, because it would be a post-filter.
It would return the results of the previous query with all routes with an odd number of stops removed:
MATCH p = SHORTEST 1 (:Station {name: 'Hartlebury'})--+(b:Station)
WHERE length(p) % 2 = 0
RETURN b.name AS destination, length(p) AS pathLength
ORDER BY pathLength, destination
destination | pathLength |
---|---|
|
|
|
|
|
|
|
|
Rows: 4 |
To move the predicate to a pre-filter, the path variable should be referenced from within the parentheses, and the shortest routes with an even number of stops will be returned for all the destinations:
MATCH SHORTEST 1
(p = (:Station {name: 'Hartlebury'})--+(b:Station)
WHERE length(p) % 2 = 0 )
RETURN b.name AS destination, length(p) AS pathLength
ORDER BY pathLength, destination
destination | pathLength |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rows: 8 |
Planning shortest path queries
This section describes the operators used when planning shortest path queries. For readers not familiar with Cypher execution plans and operators, it is recommended to first read the section Understanding execution plans.
There are two operators used to plan SHORTEST
queries:
-
StatefulShortestPath(All)
- uses a unidirectional breadth-first search algorithm to find shortest paths from a previously matched start node to an end node that has not yet been matched. -
StatefulShortestPath(Into)
- uses a bidirectional breadth-first search (BFS) algorithm, where two simultaneous BFS invocations are performed, one from the left boundary node and one from the right boundary node.
StatefulShortestPath(Into)
is used by the planner when both boundary nodes in the shortest path are estimated to match at most one node each.
Otherwise, StatefulShortestPath(All)
is used.
For example, the planner estimates that the left boundary node in the below query will match one node, and the right boundary node will match five nodes,
and chooses to expand from the left boundary node. Using StatefulShortestPath(Into)
would require five bidirectional breadth-first search (BFS) invocations,
whereas StatefulShortestPath(All)
would require only one unidirectional BFS invocation.
As a result, the query will use StatefulShortestPath(All)
.
StatefulShortestPath(All)
PROFILE
MATCH
p = SHORTEST 1 (a:Station {name: "Worcestershire Parkway"})(()-[]-()-[]-()){1,}(b:Station)
RETURN p
+----------------------------+----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+ | Operator | Id | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline | +----------------------------+----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+ | +ProduceResults | 0 | p | 5 | 9 | 122 | 0 | 0/0 | 10.967 | | | | +----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+ | | +Projection | 1 | (a) ((anon_12)-[anon_14]-(anon_13)-[anon_11]-())* (b) AS p | 5 | 9 | 0 | | 0/0 | 0.063 | | | | +----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+ | | +StatefulShortestPath(All) | 2 | SHORTEST 1 (a) ((`anon_5`)-[`anon_6`]-(`anon_7`)-[`anon_8`]-(`anon_9`)){1, } (b) | 5 | 9 | 80 | 18927 | 0/0 | 1.071 | In Pipeline 1 | | | | | expanding from: a | | | | | | | | | | | | inlined predicates: b:Station | | | | | | | | | | +----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+ | +Filter | 3 | a.name = $autostring_0 | 1 | 1 | 18 | | | | | | | +----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+ | | | | +NodeByLabelScan | 4 | a:Station | 10 | 9 | 10 | 376 | 3/0 | 0.811 | Fused in Pipeline 0 | +----------------------------+----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
However, the heuristic to favor StatefulShortestPath(All)
can lead to worse query performance.
To have the planner choose the StatefulShortestPath(Into)
instead, rewrite the query using a CALL
subquery, which will execute once for each incoming row.
For example, in the below query, using a CALL
subquery ensures that the planner binds a
and b
to exactly one Station
node respectively for each executed row, and this forces it to use StatefulShortestPath(Into)
for each invocation of the CALL
subquery, since a precondition of using this operator is that both boundary nodes match exactly one node each.
The below query uses a variable scope clause (introduced in Neo4j 5.23) to import variables into the CALL subquery.
If you are using an older version of Neo4j, use an importing WITH clause instead.
|
StatefulShortestPath(Into)
PROFILE
MATCH
(a:Station {name: "Worcestershire Parkway"}),
(b:Station)
CALL (a, b) {
MATCH
p = SHORTEST 1 (a)(()-[]-()-[]-()){1,}(b)
RETURN p
}
RETURN p
+-----------------------------+----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+ | Operator | Id | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline | +-----------------------------+----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+ | +ProduceResults | 0 | p | 5 | 9 | 122 | 0 | 0/0 | 0.561 | | | | +----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+ | | +Projection | 1 | (a) ((anon_12)-[anon_14]-(anon_13)-[anon_11]-())* (b) AS p | 5 | 9 | 0 | | 0/0 | 0.060 | | | | +----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+ | | +StatefulShortestPath(Into) | 2 | SHORTEST 1 (a) ((`anon_5`)-[`anon_6`]-(`anon_7`)-[`anon_8`]-(`anon_9`)){1, } (b) | 5 | 9 | 176 | 17873 | 0/0 | 2.273 | In Pipeline 3 | | | +----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+ | +CartesianProduct | 3 | | 5 | 9 | 0 | 2056 | 0/0 | 0.048 | In Pipeline 2 | | |\ +----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+ | | +NodeByLabelScan | 4 | b:Station | 10 | 9 | 10 | 392 | 1/0 | 0.023 | In Pipeline 1 | | | +----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+ | +Filter | 5 | a.name = $autostring_0 | 1 | 1 | 18 | | | | | | | +----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+ | | | | +NodeByLabelScan | 6 | a:Station | 10 | 9 | 10 | 376 | 3/0 | 0.089 | Fused in Pipeline 0 | +-----------------------------+----+----------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
Sometimes the planner cannot make reliable estimations about how many nodes a pattern node will match. Consider using a property uniqueness constraint where applicable to help the planner get more reliable estimates. |