Shortest paths

The Cypher® keyword SHORTEST is used to find variations of the shortest paths between nodes. This includes the ability to look for the shortest, second-shortest (and so on) paths, all available shortest paths, and groups of paths containing the same pattern length. The ANY keyword, which can be used to test the reachability of nodes from a given node(s), is also explained, as is how to apply filters in queries using SHORTEST.

SHORTEST functionally replaces and extends the shortestPath() and allShortestPaths() functions. Both functions can still be used, but they are not GQL conformant. For more information, see Syntax and semantics → The shortestPath() and allShortestPaths() functions.

Note on Cypher and GDS shortest paths

Both Cypher and Neo4j´s Graph Data Science (GDS) library can be used to find variations of the shortest paths between nodes.

Use Cypher if:

  • You need to specify complex graph navigation via quantified path patterns.

  • Creating a graph projection takes too long.

  • GDS is not available in your instance, or the size of the GDS projection is too large for your instance.

Use GDS if:

  • You need to compute a weighted shortest path.

  • You need a specific algorithm like A* or Yen’s.

  • You need to transform the graph with a projection before finding shortest path.

  • You need to use shortest paths in conjunction with other GDS algorithms in the pipeline.

To read more about the shortest path algorithms included in the GDS library, see GDS Graph algorithms → Path finding.

SHORTEST k

This section uses the following graph:

patterns shortest graph

To recreate it, run the following query against an empty Neo4j database:

CREATE (asc:Station {name:"Ashchurch"}),
  (bmv:Station {name:"Bromsgrove"}),
  (cnm:Station {name:"Cheltenham Spa"}),
  (dtw:Station {name:"Droitwich Spa"}),
  (hby:Station {name:"Hartlebury"}),
  (psh:Station {name:"Pershore"}),
  (wop:Station {name:"Worcestershire Parkway"}),
  (wof:Station {name:"Worcester Foregate Street"}),
  (wos:Station {name:"Worcester Shrub Hill"})
CREATE (asc)-[:LINK {distance: 7.25}]->(cnm),
  (asc)-[:LINK {distance: 11.29}]->(wop),
  (asc)-[:LINK {distance: 14.75}]->(wos),
  (bmv)-[:LINK {distance: 31.14}]->(cnm),
  (bmv)-[:LINK {distance: 6.16}]->(dtw),
  (bmv)-[:LINK {distance: 12.6}]->(wop),
  (dtw)-[:LINK {distance: 5.64}]->(hby),
  (dtw)-[:LINK {distance: 6.03}]->(wof),
  (dtw)-[:LINK {distance: 5.76}]->(wos),
  (psh)-[:LINK {distance: 4.16}]->(wop),
  (wop)-[:LINK {distance: 3.71}]->(wos),
  (wof)-[:LINK {distance: 0.65}]->(wos)

The paths matched by a path pattern can be restricted to only the shortest (by number of hops) by including the keyword SHORTEST k, where k is the number of paths to match. For example, the following example uses SHORTEST 1 to return the length of the shortest path between Worcester Shrub Hill and Bromsgrove:

Query
MATCH p = SHORTEST 1 (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN length(p) AS result
Note that this and the following examples in this section use a quantified relationship -[:LINK]-+, which is composed of a relationship pattern -[:LINK]- and a postfix quantifier +. The relationship pattern is only concerned with following relationships with type LINK, and will otherwise traverse any node along the way. There is no arrowhead < or > on the relationship pattern, allowing the pattern to match relationships going in either direction. This represents the fact that trains can go in both directions along the LINK relationships between Stations. The + quantifier means that one or more relationships should be matched. For more information, see Syntax and semantics - quantified relationships.
Table 1. Result
result

2

Rows: 1

Although the query returned a single result, there are in fact two paths that are tied for shortest:

patterns shortest tie

Because 1 was specified in SHORTEST, only one of the paths is returned. Which one is returned is non-deterministic.

If instead SHORTEST 2 is specified, all shortest paths in this example would be returned, and the result would be deterministic:

Query
MATCH p = SHORTEST 2 (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops
Table 2. Result
stops

["Worcester Shrub Hill", "Droitwich Spa", "Bromsgrove"]

["Worcester Shrub Hill", "Worcestershire Parkway", "Bromsgrove"]

Rows: 2

Increasing the number of paths will return the next shortest paths. Three paths are tied for the second shortest:

patterns second shortest paths

The following query returns all three of the second shortest paths, along with the two shortest paths:

Query
MATCH p = SHORTEST 5 (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops
Table 3. Result
stops

["Worcester Shrub Hill", "Droitwich Spa", "Bromsgrove"]

["Worcester Shrub Hill", "Worcestershire Parkway", "Bromsgrove"]

["Worcester Shrub Hill", "Worcester Foregate Street", "Droitwich Spa", "Bromsgrove"]

["Worcester Shrub Hill", "Ashchurch", "Worcestershire Parkway", "Bromsgrove"]

["Worcester Shrub Hill", "Ashchurch", "Cheltenham Spa", "Bromsgrove"]

Rows: 5

If there had been only four possible paths between the two Stations, then only those four would have been returned.

ALL SHORTEST

To return all paths that are tied for shortest length, use the keywords ALL SHORTEST:

Query
MATCH p = ALL SHORTEST (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops
Table 4. Result
stops

["Worcester Shrub Hill", "Droitwich Spa", "Bromsgrove"]

["Worcester Shrub Hill", "Worcestershire Parkway", "Bromsgrove"]

Rows: 2

SHORTEST k GROUPS

To return all paths that are tied for first, second, and so on up to the kth shortest length, use SHORTEST k GROUPS. For example, the following returns the first and second shortest length paths between Worcester Shrub Hill and Bromsgrove:

Query
MATCH p = SHORTEST 2 GROUPS (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops, length(p) AS pathLength
Table 5. Result
stops pathLength

["Worcester Shrub Hill", "Droitwich Spa", "Bromsgrove"]

2

["Worcester Shrub Hill", "Worcestershire Parkway", "Bromsgrove"]

2

["Worcester Shrub Hill", "Worcester Foregate Street", "Droitwich Spa", "Bromsgrove"]

3

["Worcester Shrub Hill", "Ashchurch", "Worcestershire Parkway", "Bromsgrove"]

3

["Worcester Shrub Hill", "Ashchurch", "Cheltenham Spa", "Bromsgrove"]

3

Rows: 5

The first group includes the two shortest paths with pathLength = 2 (as seen in the first two rows of the results), and the second group includes the three second shortest paths with pathLength = 3 (as seen in the last three rows of the results).

If more groups are specified than exist in the graph, only those paths that exist are returned. For example, if the paths equal to one of the eight shortest paths are specified for Worcester Shrub Hill to Bromsgrove, only seven groups are returned:

Query
MATCH p = SHORTEST 8 GROUPS (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN length(p) AS pathLength, count(*) AS numPaths
Table 6. Result
pathLength numPaths

2

2

3

3

4

1

5

4

6

8

7

10

8

6

Rows: 7

ANY

The ANY keyword can be used to test the reachability of nodes from a given node(s). It returns the same as SHORTEST 1, but by using the ANY keyword the intent of the query is clearer. For example, the following query shows that there exists a route from Pershore to Bromsgrove where the distance between each pair of stations is less than 10 miles:

Query
MATCH path = ANY
  (:Station {name: 'Pershore'})-[l:LINK WHERE l.distance < 10]-+(b:Station {name: 'Bromsgrove'})
RETURN [r IN relationships(path) | r.distance] AS distances
Table 7. Result
distances

[4.16, 3.71, 5.76, 6.16]

Rows: 1

Partitions

When there are multiple start or end nodes matching a path pattern, the matches are partitioned into distinct pairs of start and end nodes prior to selecting the shortest paths; a partition is one distinct pair of start node and end node. The selection of shortest paths is then done from all paths that join the start and end node of a given partition. The results are then formed from the union of all the shortest paths found for each partition.

For example, if the start nodes of matches are bound to either Droitwich Spa or Hartlebury, and the end nodes are bound to either Ashchurch or Cheltenham Spa, there will be four distinct pairs of start and end nodes, and therefore four partitions:

Start node End node

Droitwich Spa

Ashchurch

Droitwich Spa

Cheltenham Spa

Hartlebury

Ashchurch

Hartlebury

Cheltenham Spa

The following query illustrates how these partitions define the sets of results within which the shortest paths are selected. It uses a pair of UNWIND clauses to generate a Cartesian product of the names of the Stations (all possible pairs of start node and end node), followed by the MATCH clause to find the shortest two groups of paths for each pair of distinct start and end Stations:

Query
UNWIND ["Droitwich Spa", "Hartlebury"] AS a
UNWIND ["Ashchurch", "Cheltenham Spa"] AS b
MATCH SHORTEST 2 GROUPS (o:Station {name: a})-[l]-+(d:Station {name: b})
RETURN o.name AS start, d.name AS end,
    size(l) AS pathLength, count(*) AS numPaths
ORDER BY start, end, pathLength
Table 8. Result
start end pathLength numPaths

"Droitwich Spa"

"Ashchurch"

2

1

"Droitwich Spa"

"Ashchurch"

3

4

"Droitwich Spa"

"Cheltenham Spa"

2

1

"Droitwich Spa"

"Cheltenham Spa"

3

1

"Hartlebury"

"Ashchurch"

3

1

"Hartlebury"

"Ashchurch"

4

4

"Hartlebury"

"Cheltenham Spa"

3

1

"Hartlebury"

"Cheltenham Spa"

4

1

Rows: 8

Each partition appears twice: once for the group of shortest paths and once for the group of second shortest paths. For example, for the partition of Droitwich Spa as the start and Ashchurch as the end, the shortest path group (paths with length 2) has one path, and the second shortest path group (paths with length 3) has four paths.

Pre-filters and post-filters

The position of a filter in a shortest path query will affect whether it is applied before or after selecting the shortest paths. To see the difference, first consider a query that returns the shortest path from Hartlebury to Cheltenham Spa:

Query
MATCH SHORTEST 1
  (:Station {name: 'Hartlebury'})
  (()--(n))+
  (:Station {name: 'Cheltenham Spa'})
RETURN [stop in n[..-1] | stop.name] AS stops
Table 9. Result
stops

["Droitwich Spa", "Bromsgrove"]

Rows: 1

Note that n[..-1] is a slicing operation that returns all elements of n except the last. If instead, the query uses a WHERE clause at the MATCH level to filter out routes that go via Bromsgrove, the filtering is applied after the shortest paths are selected. This results in the only solution being removed, and no results being returned:

Query
MATCH SHORTEST 1
  (:Station {name: 'Hartlebury'})
  (()--(n:Station))+
  (:Station {name: 'Cheltenham Spa'})
WHERE none(stop IN n[..-1] WHERE stop.name = 'Bromsgrove')
RETURN [stop in n[..-1] | stop.name] AS stops
Table 10. Result
stops

Rows: 0

There are two ways to turn a post-filter without solutions into a pre-filter that returns solutions. One is to inline the predicate into the path pattern:

Query
MATCH SHORTEST 1
  (:Station {name: 'Hartlebury'})
  (()--(n:Station WHERE n.name <> 'Bromsgrove'))+
  (:Station {name: 'Cheltenham Spa'})
RETURN [stop in n[..-1] | stop.name] AS stops
Table 11. Result
stops

["Droitwich Spa", "Worcester Shrub Hill", "Ashchurch"]

Rows: 1

The shortest journey that avoids Bromsgrove is now returned.

An alternative is to wrap the path pattern and filter in parentheses (leaving the SHORTEST keyword on the outside):

Query
MATCH SHORTEST 1
  ( (:Station {name: 'Hartlebury'})
    (()--(n:Station))+
    (:Station {name: 'Cheltenham Spa'})
     WHERE none(stop IN n[..-1] WHERE stop.name = 'Bromsgrove') )
RETURN [stop in n[..-1] | stop.name] AS stops
Table 12. Result
stops

["Droitwich Spa", "Worcester Shrub Hill", "Ashchurch"]

Rows: 1

Pre-filter with a path variable

The previous section showed how to apply a filter before the shortest path selection by the use of parentheses. Placing a path variable declaration before the shortest path keywords, however, places it outside the scope of the parentheses. To reference a path variable in a pre-filter, it has to be declared inside the parentheses.

To illustrate, consider this example that returns all shortest paths from Hartlebury to each of the other Stations:

Query
MATCH p = SHORTEST 1 (:Station {name: 'Hartlebury'})--+(b:Station)
RETURN b.name AS destination, length(p) AS pathLength
ORDER BY pathLength, destination
Table 13. Result
destination pathLength

"Droitwich Spa"

1

"Bromsgrove"

2

"Worcester Foregate Street"

2

"Worcester Shrub Hill"

2

"Ashchurch"

3

"Cheltenham Spa"

3

"Worcestershire Parkway"

3

"Pershore"

4

Rows: 8

If the query is altered to only include routes that have an even number of stops, adding a WHERE clause at the MATCH level will not work, because it would be a post-filter. It would return the results of the previous query with all routes with an odd number of stops removed:

Query
MATCH p = SHORTEST 1 (:Station {name: 'Hartlebury'})--+(b:Station)
WHERE length(p) % 2 = 0
RETURN b.name AS destination, length(p) AS pathLength
ORDER BY pathLength, destination
Table 14. Result
destination pathLength

"Bromsgrove"

2

"Worcester Foregate Street"

2

"Worcester Shrub Hill"

2

"Pershore"

4

Rows: 4

To move the predicate to a pre-filter, the path variable should be referenced from within the parentheses, and the shortest routes with an even number of stops will be returned for all the destinations:

Query
MATCH SHORTEST 1
  (p = (:Station {name: 'Hartlebury'})--+(b:Station)
    WHERE length(p) % 2 = 0 )
RETURN b.name AS destination, length(p) AS pathLength
ORDER BY pathLength, destination
Table 15. Result
destination pathLength

"Bromsgrove"

2

"Worcester Foregate Street"

2

"Worcester Shrub Hill"

2

"Ashchurch"

4

"Cheltenham Spa"

4

"Droitwich Spa"

4

"Pershore"

4

"Worcestershire Parkway"

4

Rows: 8

Performance and planning

Shortest path queries often perform better when the Cypher planner can identify a single source-target node pair for a shortest path. This is because it allows the planner to use a bidirectional search from the source and target nodes and terminate when the shortest path between them is found, rather than traversing the whole graph for potential target nodes. However, while there are strategies to enforce this optimization, forcing Cypher to use them does not always improve performance.

If the planner estimates a single source-target node pair, Cypher uses either the ShortestPath or the StatefulShortestPath(Into) operators; otherwise it uses StatefulShortestPath(All). Each of these operators, and the criteria for their use, is outlined below.

For readers not familiar with Cypher execution plans and operators, it is recommended to first read Understanding execution plans.

Example graph

This section uses a graph structured as a tree with a branching factor of three and a depth of nine. This means the graph starts with a single root node, and each node creates three child nodes, continuing this pattern for nine levels. In total, the graph consists of almost 90 000 nodes.

Each node (N) has a trail property that acts as a "breadcrumb" list, tracing the path from the root of the tree to that node (the root node has an empty list for its trail property). The trail records the choice of "A", "B", or "C" taken to reach each node, building a sequence that increases in length with each level. This ensures that every trail value is unique in the graph.

To recreate the graph, run the following query against an empty Neo4j database:

WITH 9 AS depth, 3 AS branching
CREATE (:N {level: 0, trail: []})
WITH *
UNWIND range(0, depth) AS level
CALL (branching, level) {
   MATCH (n {level: level})
   UNWIND ["A", "B", "C"] AS branch
   CREATE (n)-[:R]->(:N {level: level + 1, trail: n.trail + [branch]})
};
CREATE RANGE INDEX level_index FOR (n:N) ON n.level

Enforcing a single source-target node pair with CALL subqueries

One way for Cypher to enforce a single-source node pair for a shortest path is to rewrite the query to include a CALL subquery. This is because, with CALL, each incoming row already has the source and target nodes bound, so the cardinality (i.e. count) for each is known to be 1. Without CALL, the planner does not know how many targets there are per source node, even though it still runs the shortest-path search once per source, and this may generate less efficient queries.

Example 1. CALL subqueries and shortest path query performance
Find a shortest path between two nodes
PROFILE
MATCH p = ANY SHORTEST
  (source:N {trail: ["C", "C", "A", "C", "A", "B", "B", "B", "A"]})--+
  (target:N {trail: ["A", "B", "C", "A", "B", "C", "A", "B", "C"]})
RETURN length(p) AS pathLength
Table 16. Result
pathLength

18

Rows: 1

The plan generated by this query shows that the planner cannot ascertain the existence of a single source-target node pair, even though the trail property is unique to each node in the graph (no index has been created on this property yet). It, therefore, must exhaust all possible target nodes before determining the shortest path from the source using the StatefulShortestPath(All) operator.

Query Plan
+-----------------------------------+----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| Operator                          | Id | Details                                                              | Estimated Rows | Rows  | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-----------------------------------+----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults                   |  0 | `length(p)`                                                          |       19612941 |     1 |       0 |              0 |                    0/0 |     0.027 |                     |
| |                                 +----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+                     |
| +Projection                       |  1 | length((source)-[anon_7*]-(target)) AS `length(p)`                   |       19612941 |     1 |      17 |                |                    9/0 |     0.036 |                     |
| |                                 +----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+                     |
| +StatefulShortestPath(All, Trail) |  2 | SHORTEST 1 (source) ((`anon_3`)-[`anon_4`]-(`anon_5`)){1, } (target) |       19612941 |     1 |  354292 |       64720328 |                85358/0 |   139.138 | In Pipeline 1       |
| |                                 |    |         expanding from: source                                       |                |       |         |                |                        |           |                     |
| |                                 |    |     inlined predicates: target.trail = $autolist_1                   |                |       |         |                |                        |           |                     |
| |                                 |    |                         target:N                                     |                |       |         |                |                        |           |                     |
| |                                 +----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| +Filter                           |  3 | source.trail = $autolist_0                                           |           4429 |     1 |  177146 |                |                        |           |                     |
| |                                 +----+----------------------------------------------------------------------+----------------+-------+---------+----------------+                        |           |                     |
| +NodeByLabelScan                  |  4 | source:N                                                             |          88573 | 88573 |   88574 |            376 |                 2128/0 |    42.628 | Fused in Pipeline 0 |
+-----------------------------------+----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 620029, total allocated memory: 64720664

1 row
ready to start consuming query after 59 ms, results consumed after another 183 ms

However, since each trail property is unique, rewriting the query to use a CALL subquery yields a more efficient plan. This is because it forces the planner to use the StatefulShortestPath(Into) operator, which expands from the source node until it finds its specific target node, and ensures that ANY SHORTEST is executed once per source-target pair.

Shortest path query rewritten with a CALL subquery
PROFILE
MATCH (start:N {trail: ["C", "C", "A", "C", "A", "B", "B", "B", "A"]}),
      (end:N {trail: ["A", "B", "C", "A", "B", "C", "A", "B", "C"]})
CALL (start, end) {
   MATCH p = ANY SHORTEST (start)--+(end)
   RETURN p
}
RETURN length(p) AS pathLength

The result is a significantly faster query (down from 59 to 9 milliseconds):

Query Plan
+------------------------------------+----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| Operator                           | Id | Details                                                        | Estimated Rows | Rows  | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+------------------------------------+----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults                    |  0 | `length(p)`                                                    |       19612941 |     1 |       0 |              0 |                    0/0 |     0.019 |                     |
| |                                  +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+                     |
| +Projection                        |  1 | length(p) AS `length(p)`                                       |       19612941 |     1 |       0 |                |                    0/0 |     0.007 |                     |
| |                                  +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+                     |
| +Projection                        |  2 | (start)-[anon_14*]-(end) AS p                                  |       19612941 |     1 |      17 |                |                    9/0 |     0.073 |                     |
| |                                  +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+                     |
| +StatefulShortestPath(Into, Trail) |  3 | ANY 1 (start) ((`anon_10`)-[`anon_11`]-(`anon_12`)){1, } (end) |       19612941 |     1 |    1936 |         990280 |                  205/0 |     1.135 | In Pipeline 3       |
| |                                  +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| +CartesianProduct                  |  4 |                                                                |       19612941 |     1 |       0 |           9040 |                        |     0.131 | In Pipeline 2       |
| |\                                 +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| | +Filter                          |  5 | end.trail = $autolist_1                                        |          22143 |     1 |  177146 |                |                        |           |                     |
| | |                                +----+----------------------------------------------------------------+----------------+-------+---------+----------------+                        |           |                     |
| | +NodeByLabelScan                 |  6 | end:N                                                          |         442865 | 88573 |   88574 |            392 |                 2128/0 |    29.822 | Fused in Pipeline 1 |
| |                                  +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| +Filter                            |  7 | start.trail = $autolist_0                                      |           4429 |     1 |  177146 |                |                        |           |                     |
| |                                  +----+----------------------------------------------------------------+----------------+-------+---------+----------------+                        |           |                     |
| +NodeByLabelScan                   |  8 | start:N                                                        |          88573 | 88573 |   88574 |            376 |                 2128/0 |    40.743 | Fused in Pipeline 0 |
+------------------------------------+----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 533393, total allocated memory: 999592

1 row
ready to start consuming query after 9 ms, results consumed after another 73 ms

Enforcing a single source-target node pair with indexes and constraints

Another way to inform the planner of the uniqueness of the target node in a shortest path is to create an index or property uniqueness/key constraint (both of which are index-backed) on a property belonging to the matched nodes in the shortest path. This will accurately inform the planner of node cardinality and thereby enable more efficient query planning (assuming the graph contains uniquely identifying node properties).

Example 2. Impact of indexes and constraints
Create a property uniqueness constraint on the trail property
CREATE CONSTRAINT unique_trail FOR (n:N) REQUIRE n.trail IS UNIQUE

This constraint will inform the planner of the uniqueness of trail values up front. As a result, the simpler shortest path query (without a CALL subquery) will now generate a faster plan (using the StatefulShortestPath(Into)) operator with a cardinality of 1 for both the source and target nodes of the shortest path.

Find a shortest path between two nodes
PROFILE
MATCH p = ANY SHORTEST
  (source:N {trail: ["C", "C", "A", "C", "A", "B", "B", "B", "A"]})--+
  (target:N {trail: ["A", "B", "C", "A", "B", "C", "A", "B", "C"]})
RETURN length(p) AS pathLength
Query Plan
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+
| Operator                           | Id | Details                                                                                            | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline      |
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+
| +ProduceResults                    |  0 | `length(p)`                                                                                        |              1 |    1 |       0 |              0 |                    0/0 |     0.034 |               |
| |                                  +----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+               |
| +Projection                        |  1 | length((source)-[anon_19*]-(target)) AS `length(p)`                                                |              1 |    1 |      17 |                |                    9/0 |     0.057 |               |
| |                                  +----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+               |
| +StatefulShortestPath(Into, Trail) |  2 | SHORTEST 1 (source) ((`anon_15`)-[`anon_16`]-(`anon_17`)){1, } (target)                            |              1 |    1 |    1936 |         990288 |                  205/0 |     1.658 | In Pipeline 1 |
| |                                  +----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+
| +MultiNodeIndexSeek                |  3 | UNIQUE source:N(trail) WHERE trail = $autolist_0, UNIQUE target:N(trail) WHERE trail = $autolist_1 |              1 |    1 |       4 |            376 |                    4/2 |     0.332 | In Pipeline 0 |
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+

Total database accesses: 1957, total allocated memory: 990608

1 row
ready to start consuming query after 48 ms, results consumed after another 3 ms

Limitations of enforcing a single source-target node pair

Enforcing a single source-target node pair is not always preferable. With one source and many targets, rewriting a shortest path query using a CALL subquery forces the planner to use StatefulShortestPath(Into), which runs once per target node. While this is efficient for a single pair, it can become slower as the number of targets increases because it forces the planner to traverse the graph for each individual pair of source-target nodes. In such cases, it may be more efficient to let the planner use StatefulShortestPath(All), which expands across the graph once and returns all matches.

Example 3. Efficient planning for multi-target shortest paths

Consider the following query, which does not specify a unique target node and generates a total of 19682 shortest paths from the source nodes:

Find a shortest path to many target nodes
PROFILE
MATCH p = ANY SHORTEST
  (start:N {trail: ["C", "C", "A", "C", "A", "B", "B", "B", "A"]})--+
  (end:N {level: 9})
RETURN count(*) AS pathCount
Table 17. Result
pathLength

19682

Rows: 1

Due to the existence of multiple target nodes without a specified, unique property value (there are 19683 nodes in the graph with a level property value of 9), the planner will default to using the StatefulShortestPath(All) operator, which expands once from the source node until all valid shortest paths have been found.

Query Plan
+-----------------------------------+----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------+
| Operator                          | Id | Details                                                          | Estimated Rows | Rows  | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline      |
+-----------------------------------+----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------+
| +ProduceResults                   |  0 | `count(*)`                                                       |              1 |     1 |       0 |              0 |                    0/0 |     0.015 |               |
| |                                 +----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+               |
| +EagerAggregation                 |  1 | count(*) AS `count(*)`                                           |              1 |     1 |       0 |             40 |                    0/0 |     0.097 | In Pipeline 2 |
| |                                 +----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------+
| +StatefulShortestPath(All, Trail) |  2 | SHORTEST 1 (start) ((`anon_3`)-[`anon_4`]-(`anon_5`)){1, } (end) |           8052 | 19682 |  373974 |       81274328 |                65235/0 |   330.475 | In Pipeline 1 |
| |                                 |    |         expanding from: start                                    |                |       |         |                |                        |           |               |
| |                                 |    |     inlined predicates: end.level = $autoint_1                   |                |       |         |                |                        |           |               |
| |                                 |    |                         end:N                                    |                |       |         |                |                        |           |               |
| |                                 +----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------+
| +NodeUniqueIndexSeek              |  3 | UNIQUE start:N(trail) WHERE trail = $autolist_0                  |              1 |     1 |       2 |            376 |                    3/0 |     0.106 | In Pipeline 0 |
+-----------------------------------+----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------+

Total database accesses: 373976, total allocated memory: 81274688

1 row
ready to start consuming query after 40 ms, results consumed after another 331 ms

If the query is rewritten with a CALL subquery the planner will use StatefulShortestPath(Into) which performs separate traversals for each individual source-target node pairs.

Multi-target shortest path query rewritten with a CALL subquery
PROFILE
MATCH (start:N {trail: ["C", "C", "A", "C", "A", "B", "B", "B", "A"]}),
      (end:N {level: 9})
CALL (start, end) {
   MATCH p = ANY SHORTEST (start)--+(end)
   RETURN p
}
RETURN count(*) AS pathCount
Query Plan
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+---------------+
| Operator                           | Id | Details                                                                                            | Estimated Rows | Rows  | DB Hits  | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline      |
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+---------------+
| +ProduceResults                    |  0 | `count(*)`                                                                                         |              1 |     1 |        0 |              0 |                    0/0 |     0.120 |               |
| |                                  +----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+               |
| +EagerAggregation                  |  1 | count(*) AS `count(*)`                                                                             |              1 |     1 |        0 |             40 |                    0/0 |     0.172 | In Pipeline 2 |
| |                                  +----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+---------------+
| +Projection                        |  2 | (start)-[anon_7*]-(end) AS p                                                                       |           8052 | 19682 |   314930 |                |               184197/0 |    35.430 |               |
| |                                  +----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+               |
| +StatefulShortestPath(Into, Trail) |  3 | SHORTEST 1 (start) ((`anon_3`)-[`anon_4`]-(`anon_5`)){1, } (end)                                   |           8052 | 19682 | 32672226 |      157866776 |              3588500/0 | 14200.424 | In Pipeline 1 |
| |                                  +----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+---------------+
| +MultiNodeIndexSeek                |  4 | UNIQUE start:N(trail) WHERE trail = $autolist_0, RANGE INDEX end:N(level) WHERE level = $autoint_1 |           8052 | 19683 |    19686 |            376 |                  108/0 |     4.014 | In Pipeline 0 |
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+---------------+

Total database accesses: 33006842, total allocated memory: 157867272

1 row
ready to start consuming query after 32 ms, results consumed after another 14244 ms

As the plan shows, in this scenario it is not more efficient to enforce a single source-target node pair. On the contrary, doing so ensures that StatefulShortestPath(Into) is executed 19682 times, once for each source-target node pair, thereby generating a more expensive query.

Shortest path operators

Cypher uses three different operators to plan shortest path queries. The criteria for when each is used is outlined in the table below.

Operator Description Criteria

ShortestPath

Performs bidirectional breadth-first searches (BFS) from target and source node. Terminates when a shortest path is found between them.

Used when:

Or, when the estimated cardinality of the source and target nodes in a shortest path are 1 or less and the following are all true:

  • Selector is one of: SHORTEST 1, ANY, ANY SHORTEST, ALL SHORTEST, SHORTEST GROUP, or SHORTEST 1 GROUP.

  • There is only one relationship pattern.

  • If the pattern is a quantified path pattern in the form of (()-[]-()), or is a quantified-relationship, and it uses a filter that can be applied directly to the relationship.

  • In the case of a quantified path pattern, there are no node variables declared inside the quantified path pattern that are referenced elsewhere.

StatefulShortestPath(Into)

Performs bidirectional BFS from target and source node. Terminates when a shortest path is found between them.

Used when the estimated cardinality of the source and target nodes in a shortest path are 1 or less and either of the following are true:

  • Selector is one of: ANY/SHORTEST k or SHORTEST k GROUPS and k is larger than 1.

  • There is more than one relationship pattern.

StatefulShortestPath(All)

Performs unidirectional BFS to find shortest paths from a source node to all nodes matching the target node conditions.

Used when the planner estimates more than one source-target node pair in a shortest path.

StatefulShortestPath(Into) and StatefulShortestPath(All) can match more complex shortest paths than ShortestPath. As a result, queries using these operators may be slower and more costly.

ShortestPath operator: fast vs. exhaustive search

Queries planned with the ShortestPath operator (see the table above for when this operator is used), use two different search algorithms depending on the predicates in the query.

If the predicate can be checked as the search progresses (for example, requiring every relationship in the path to have a specific property), the planner can exclude invalid paths early. In such cases, a fast bidirectional breadth-first search (BFS) algorithm is used.

Fast search-algorithm
MATCH (start:N {level: 1}), (end:N {level: 5})
MATCH p = shortestPath((start)-[r*]-(end))
WHERE all(rel IN r WHERE rel.flag IS NULL)
RETURN p

If the predicate requires inspecting the entire path after it has been matched (such as checking whether the path length exceeds a certain value), the planner cannot exclude paths early. In such cases, a slower, exhaustive search-algorithm is used. Exhaustive searches may be very time consuming in certain cases, such as when there is no shortest path between two nodes (to disallow exhaustive searches, set dbms.cypher.forbid_exhaustive_shortestpath to true).

Exhaustive search-algorithm
MATCH (start:N {level: 1}), (end:N {level: 5})
MATCH p = shortestPath((start)-[*]-(end))
WHERE length(p) > 3
RETURN p

For queries that would otherwise trigger an exhaustive search, a practical workaround is to first bind the matched path and then filter it using WITH and WHERE. This allows the planner to use a fast search-algorithm while finding the shortest path, and only afterwards apply the filter. Note that, because the filter is applied after the fast algorithm runs, it may eliminate all candidate paths and return no results.

Query rewritten to use fast search-algorithm
MATCH (start:N {level: 1}), (end:N {level: 5})
MATCH p = shortestPath((start)-[*]-(end))
WITH p
WHERE length(p) > 3
RETURN p