Shortest paths
The Cypher® keyword SHORTEST
is used to find variations of the shortest paths between nodes.
This includes the ability to look for the shortest, second-shortest (and so on) paths, all available shortest paths, and groups of paths containing the same pattern length.
The ANY
keyword, which can be used to test the reachability of nodes from a given node(s), is also explained, as is how to apply filters in queries using SHORTEST
.
SHORTEST functionally replaces and extends the shortestPath() and allShortestPaths() functions.
Both functions can still be used, but they are not GQL conformant.
For more information, see Syntax and semantics → The shortestPath() and allShortestPaths() functions.
|
Note on Cypher and GDS shortest paths
Both Cypher and Neo4j´s Graph Data Science (GDS) library can be used to find variations of the shortest paths between nodes.
Use Cypher if:
-
You need to specify complex graph navigation via quantified path patterns.
-
Creating a graph projection takes too long.
-
GDS is not available in your instance, or the size of the GDS projection is too large for your instance.
Use GDS if:
To read more about the shortest path algorithms included in the GDS library, see GDS Graph algorithms → Path finding.
SHORTEST k
This section uses the following graph:
To recreate it, run the following query against an empty Neo4j database:
CREATE (asc:Station {name:"Ashchurch"}),
(bmv:Station {name:"Bromsgrove"}),
(cnm:Station {name:"Cheltenham Spa"}),
(dtw:Station {name:"Droitwich Spa"}),
(hby:Station {name:"Hartlebury"}),
(psh:Station {name:"Pershore"}),
(wop:Station {name:"Worcestershire Parkway"}),
(wof:Station {name:"Worcester Foregate Street"}),
(wos:Station {name:"Worcester Shrub Hill"})
CREATE (asc)-[:LINK {distance: 7.25}]->(cnm),
(asc)-[:LINK {distance: 11.29}]->(wop),
(asc)-[:LINK {distance: 14.75}]->(wos),
(bmv)-[:LINK {distance: 31.14}]->(cnm),
(bmv)-[:LINK {distance: 6.16}]->(dtw),
(bmv)-[:LINK {distance: 12.6}]->(wop),
(dtw)-[:LINK {distance: 5.64}]->(hby),
(dtw)-[:LINK {distance: 6.03}]->(wof),
(dtw)-[:LINK {distance: 5.76}]->(wos),
(psh)-[:LINK {distance: 4.16}]->(wop),
(wop)-[:LINK {distance: 3.71}]->(wos),
(wof)-[:LINK {distance: 0.65}]->(wos)
The paths matched by a path pattern can be restricted to only the shortest (by number of hops) by including the keyword SHORTEST k
, where k
is the number of paths to match.
For example, the following example uses SHORTEST 1
to return the length of the shortest path between Worcester Shrub Hill
and Bromsgrove
:
MATCH p = SHORTEST 1 (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN length(p) AS result
Note that this and the following examples in this section use a quantified relationship -[:LINK]-+ , which is composed of a relationship pattern -[:LINK]- and a postfix quantifier + .
The relationship pattern is only concerned with following relationships with type LINK , and will otherwise traverse any node along the way.
There is no arrowhead < or > on the relationship pattern, allowing the pattern to match relationships going in either direction.
This represents the fact that trains can go in both directions along the LINK relationships between Stations.
The + quantifier means that one or more relationships should be matched. For more information, see Syntax and semantics - quantified relationships.
|
result |
---|
|
Rows: 1 |
Although the query returned a single result, there are in fact two paths that are tied for shortest:
Because 1
was specified in SHORTEST
, only one of the paths is returned.
Which one is returned is non-deterministic.
If instead SHORTEST 2
is specified, all shortest paths in this example would be returned, and the result would be deterministic:
MATCH p = SHORTEST 2 (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops
stops |
---|
|
|
Rows: 2 |
Increasing the number of paths will return the next shortest paths. Three paths are tied for the second shortest:
The following query returns all three of the second shortest paths, along with the two shortest paths:
MATCH p = SHORTEST 5 (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops
stops |
---|
|
|
|
|
|
Rows: 5 |
If there had been only four possible paths between the two Stations, then only those four would have been returned.
ALL SHORTEST
To return all paths that are tied for shortest length, use the keywords ALL SHORTEST
:
MATCH p = ALL SHORTEST (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops
stops |
---|
|
|
Rows: 2 |
SHORTEST k GROUPS
To return all paths that are tied for first, second, and so on up to the kth shortest length, use SHORTEST k GROUPS
.
For example, the following returns the first and second shortest length paths between Worcester Shrub Hill
and Bromsgrove
:
MATCH p = SHORTEST 2 GROUPS (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN [n in nodes(p) | n.name] AS stops, length(p) AS pathLength
stops | pathLength |
---|---|
|
|
|
|
|
|
|
|
|
|
Rows: 5 |
The first group includes the two shortest paths with pathLength = 2
(as seen in the first two rows of the results), and the second group includes the three second shortest paths with pathLength = 3
(as seen in the last three rows of the results).
If more groups are specified than exist in the graph, only those paths that exist are returned.
For example, if the paths equal to one of the eight shortest paths are specified for Worcester Shrub Hill
to Bromsgrove
, only seven groups are returned:
MATCH p = SHORTEST 8 GROUPS (wos:Station)-[:LINK]-+(bmv:Station)
WHERE wos.name = "Worcester Shrub Hill" AND bmv.name = "Bromsgrove"
RETURN length(p) AS pathLength, count(*) AS numPaths
pathLength | numPaths |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rows: 7 |
ANY
The ANY
keyword can be used to test the reachability of nodes from a given node(s).
It returns the same as SHORTEST 1
, but by using the ANY
keyword the intent of the query is clearer.
For example, the following query shows that there exists a route from Pershore
to Bromsgrove
where the distance between each pair of stations is less than 10 miles:
MATCH path = ANY
(:Station {name: 'Pershore'})-[l:LINK WHERE l.distance < 10]-+(b:Station {name: 'Bromsgrove'})
RETURN [r IN relationships(path) | r.distance] AS distances
distances |
---|
|
Rows: 1 |
Partitions
When there are multiple start or end nodes matching a path pattern, the matches are partitioned into distinct pairs of start and end nodes prior to selecting the shortest paths; a partition is one distinct pair of start node and end node. The selection of shortest paths is then done from all paths that join the start and end node of a given partition. The results are then formed from the union of all the shortest paths found for each partition.
For example, if the start nodes of matches are bound to either Droitwich Spa
or Hartlebury
, and the end nodes are bound to either Ashchurch
or Cheltenham Spa
, there will be four distinct pairs of start and end nodes, and therefore four partitions:
Start node | End node |
---|---|
|
|
|
|
|
|
|
|
The following query illustrates how these partitions define the sets of results within which the shortest paths are selected.
It uses a pair of UNWIND
clauses to generate a Cartesian product of the names of the Stations
(all possible pairs of start node and end node), followed by the MATCH
clause to find the shortest two groups of paths for each pair of distinct start and end Stations
:
UNWIND ["Droitwich Spa", "Hartlebury"] AS a
UNWIND ["Ashchurch", "Cheltenham Spa"] AS b
MATCH SHORTEST 2 GROUPS (o:Station {name: a})-[l]-+(d:Station {name: b})
RETURN o.name AS start, d.name AS end,
size(l) AS pathLength, count(*) AS numPaths
ORDER BY start, end, pathLength
start | end | pathLength | numPaths |
---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rows: 8 |
Each partition appears twice: once for the group of shortest paths and once for the group of second shortest paths.
For example, for the partition of Droitwich Spa
as the start
and Ashchurch
as the end
, the shortest path group (paths with length 2
) has one path, and the second shortest path group (paths with length 3
) has four paths.
Pre-filters and post-filters
The position of a filter in a shortest path query will affect whether it is applied before or after selecting the shortest paths.
To see the difference, first consider a query that returns the shortest path from Hartlebury
to Cheltenham Spa
:
MATCH SHORTEST 1
(:Station {name: 'Hartlebury'})
(()--(n))+
(:Station {name: 'Cheltenham Spa'})
RETURN [stop in n[..-1] | stop.name] AS stops
stops |
---|
|
Rows: 1 |
Note that n[..-1]
is a slicing operation that returns all elements of n
except the last.
If instead, the query uses a WHERE
clause at the MATCH
level to filter out routes that go via Bromsgrove, the filtering is applied after the shortest paths are selected.
This results in the only solution being removed, and no results being returned:
MATCH SHORTEST 1
(:Station {name: 'Hartlebury'})
(()--(n:Station))+
(:Station {name: 'Cheltenham Spa'})
WHERE none(stop IN n[..-1] WHERE stop.name = 'Bromsgrove')
RETURN [stop in n[..-1] | stop.name] AS stops
stops |
---|
Rows: 0 |
There are two ways to turn a post-filter without solutions into a pre-filter that returns solutions. One is to inline the predicate into the path pattern:
MATCH SHORTEST 1
(:Station {name: 'Hartlebury'})
(()--(n:Station WHERE n.name <> 'Bromsgrove'))+
(:Station {name: 'Cheltenham Spa'})
RETURN [stop in n[..-1] | stop.name] AS stops
stops |
---|
|
Rows: 1 |
The shortest journey that avoids Bromsgrove
is now returned.
An alternative is to wrap the path pattern and filter in parentheses (leaving the SHORTEST
keyword on the outside):
MATCH SHORTEST 1
( (:Station {name: 'Hartlebury'})
(()--(n:Station))+
(:Station {name: 'Cheltenham Spa'})
WHERE none(stop IN n[..-1] WHERE stop.name = 'Bromsgrove') )
RETURN [stop in n[..-1] | stop.name] AS stops
stops |
---|
|
Rows: 1 |
Pre-filter with a path variable
The previous section showed how to apply a filter before the shortest path selection by the use of parentheses. Placing a path variable declaration before the shortest path keywords, however, places it outside the scope of the parentheses. To reference a path variable in a pre-filter, it has to be declared inside the parentheses.
To illustrate, consider this example that returns all shortest paths from Hartlebury
to each of the other Stations
:
MATCH p = SHORTEST 1 (:Station {name: 'Hartlebury'})--+(b:Station)
RETURN b.name AS destination, length(p) AS pathLength
ORDER BY pathLength, destination
destination | pathLength |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rows: 8 |
If the query is altered to only include routes that have an even number of stops, adding a WHERE
clause at the MATCH
level will not work, because it would be a post-filter.
It would return the results of the previous query with all routes with an odd number of stops removed:
MATCH p = SHORTEST 1 (:Station {name: 'Hartlebury'})--+(b:Station)
WHERE length(p) % 2 = 0
RETURN b.name AS destination, length(p) AS pathLength
ORDER BY pathLength, destination
destination | pathLength |
---|---|
|
|
|
|
|
|
|
|
Rows: 4 |
To move the predicate to a pre-filter, the path variable should be referenced from within the parentheses, and the shortest routes with an even number of stops will be returned for all the destinations:
MATCH SHORTEST 1
(p = (:Station {name: 'Hartlebury'})--+(b:Station)
WHERE length(p) % 2 = 0 )
RETURN b.name AS destination, length(p) AS pathLength
ORDER BY pathLength, destination
destination | pathLength |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rows: 8 |
Performance and planning
Shortest path queries often perform better when the Cypher planner can identify a single source-target node pair for a shortest path. This is because it allows the planner to use a bidirectional search from the source and target nodes and terminate when the shortest path between them is found, rather than traversing the whole graph for potential target nodes. However, while there are strategies to enforce this optimization, forcing Cypher to use them does not always improve performance.
If the planner estimates a single source-target node pair, Cypher uses either the ShortestPath
or the StatefulShortestPath(Into)
operators; otherwise it uses StatefulShortestPath(All)
.
Each of these operators, and the criteria for their use, is outlined below.
For readers not familiar with Cypher execution plans and operators, it is recommended to first read Understanding execution plans.
Example graph
This section uses a graph structured as a tree with a branching factor of three and a depth of nine. This means the graph starts with a single root node, and each node creates three child nodes, continuing this pattern for nine levels. In total, the graph consists of almost 90 000 nodes.
Each node (N
) has a trail
property that acts as a "breadcrumb" list, tracing the path from the root of the tree to that node (the root node has an empty list for its trail
property).
The trail records the choice of "A"
, "B"
, or "C"
taken to reach each node, building a sequence that increases in length with each level.
This ensures that every trail
value is unique in the graph.
To recreate the graph, run the following query against an empty Neo4j database:
WITH 9 AS depth, 3 AS branching
CREATE (:N {level: 0, trail: []})
WITH *
UNWIND range(0, depth) AS level
CALL (branching, level) {
MATCH (n {level: level})
UNWIND ["A", "B", "C"] AS branch
CREATE (n)-[:R]->(:N {level: level + 1, trail: n.trail + [branch]})
};
CREATE RANGE INDEX level_index FOR (n:N) ON n.level
Enforcing a single source-target node pair with CALL
subqueries
One way for Cypher to enforce a single-source node pair for a shortest path is to rewrite the query to include a CALL
subquery.
This is because, with CALL
, each incoming row already has the source and target nodes bound, so the cardinality (i.e. count) for each is known to be 1.
Without CALL
, the planner does not know how many targets there are per source node, even though it still runs the shortest-path search once per source, and this may generate less efficient queries.
CALL
subqueries and shortest path query performancePROFILE
MATCH p = ANY SHORTEST
(source:N {trail: ["C", "C", "A", "C", "A", "B", "B", "B", "A"]})--+
(target:N {trail: ["A", "B", "C", "A", "B", "C", "A", "B", "C"]})
RETURN length(p) AS pathLength
pathLength |
---|
|
Rows: 1 |
The plan generated by this query shows that the planner cannot ascertain the existence of a single source-target node pair, even though the trail
property is unique to each node in the graph (no index has been created on this property yet).
It, therefore, must exhaust all possible target nodes before determining the shortest path from the source using the StatefulShortestPath(All)
operator.
+-----------------------------------+----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| Operator | Id | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline |
+-----------------------------------+----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults | 0 | `length(p)` | 19612941 | 1 | 0 | 0 | 0/0 | 0.027 | |
| | +----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+ |
| +Projection | 1 | length((source)-[anon_7*]-(target)) AS `length(p)` | 19612941 | 1 | 17 | | 9/0 | 0.036 | |
| | +----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+ |
| +StatefulShortestPath(All, Trail) | 2 | SHORTEST 1 (source) ((`anon_3`)-[`anon_4`]-(`anon_5`)){1, } (target) | 19612941 | 1 | 354292 | 64720328 | 85358/0 | 139.138 | In Pipeline 1 |
| | | | expanding from: source | | | | | | | |
| | | | inlined predicates: target.trail = $autolist_1 | | | | | | | |
| | | | target:N | | | | | | | |
| | +----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| +Filter | 3 | source.trail = $autolist_0 | 4429 | 1 | 177146 | | | | |
| | +----+----------------------------------------------------------------------+----------------+-------+---------+----------------+ | | |
| +NodeByLabelScan | 4 | source:N | 88573 | 88573 | 88574 | 376 | 2128/0 | 42.628 | Fused in Pipeline 0 |
+-----------------------------------+----+----------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
Total database accesses: 620029, total allocated memory: 64720664
1 row
ready to start consuming query after 59 ms, results consumed after another 183 ms
However, since each trail
property is unique, rewriting the query to use a CALL
subquery yields a more efficient plan.
This is because it forces the planner to use the StatefulShortestPath(Into)
operator, which expands from the source node until it finds its specific target node, and ensures that ANY SHORTEST
is executed once per source-target pair.
CALL
subqueryPROFILE
MATCH (start:N {trail: ["C", "C", "A", "C", "A", "B", "B", "B", "A"]}),
(end:N {trail: ["A", "B", "C", "A", "B", "C", "A", "B", "C"]})
CALL (start, end) {
MATCH p = ANY SHORTEST (start)--+(end)
RETURN p
}
RETURN length(p) AS pathLength
The result is a significantly faster query (down from 59 to 9 milliseconds):
+------------------------------------+----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| Operator | Id | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline |
+------------------------------------+----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults | 0 | `length(p)` | 19612941 | 1 | 0 | 0 | 0/0 | 0.019 | |
| | +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+ |
| +Projection | 1 | length(p) AS `length(p)` | 19612941 | 1 | 0 | | 0/0 | 0.007 | |
| | +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+ |
| +Projection | 2 | (start)-[anon_14*]-(end) AS p | 19612941 | 1 | 17 | | 9/0 | 0.073 | |
| | +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+ |
| +StatefulShortestPath(Into, Trail) | 3 | ANY 1 (start) ((`anon_10`)-[`anon_11`]-(`anon_12`)){1, } (end) | 19612941 | 1 | 1936 | 990280 | 205/0 | 1.135 | In Pipeline 3 |
| | +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| +CartesianProduct | 4 | | 19612941 | 1 | 0 | 9040 | | 0.131 | In Pipeline 2 |
| |\ +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| | +Filter | 5 | end.trail = $autolist_1 | 22143 | 1 | 177146 | | | | |
| | | +----+----------------------------------------------------------------+----------------+-------+---------+----------------+ | | |
| | +NodeByLabelScan | 6 | end:N | 442865 | 88573 | 88574 | 392 | 2128/0 | 29.822 | Fused in Pipeline 1 |
| | +----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
| +Filter | 7 | start.trail = $autolist_0 | 4429 | 1 | 177146 | | | | |
| | +----+----------------------------------------------------------------+----------------+-------+---------+----------------+ | | |
| +NodeByLabelScan | 8 | start:N | 88573 | 88573 | 88574 | 376 | 2128/0 | 40.743 | Fused in Pipeline 0 |
+------------------------------------+----+----------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------------+
Total database accesses: 533393, total allocated memory: 999592
1 row
ready to start consuming query after 9 ms, results consumed after another 73 ms
Enforcing a single source-target node pair with indexes and constraints
Another way to inform the planner of the uniqueness of the target node in a shortest path is to create an index or property uniqueness/key constraint (both of which are index-backed) on a property belonging to the matched nodes in the shortest path. This will accurately inform the planner of node cardinality and thereby enable more efficient query planning (assuming the graph contains uniquely identifying node properties).
trail
propertyCREATE CONSTRAINT unique_trail FOR (n:N) REQUIRE n.trail IS UNIQUE
This constraint will inform the planner of the uniqueness of trail
values up front.
As a result, the simpler shortest path query (without a CALL
subquery) will now generate a faster plan (using the StatefulShortestPath(Into)
) operator with a cardinality of 1 for both the source and target nodes of the shortest path.
PROFILE
MATCH p = ANY SHORTEST
(source:N {trail: ["C", "C", "A", "C", "A", "B", "B", "B", "A"]})--+
(target:N {trail: ["A", "B", "C", "A", "B", "C", "A", "B", "C"]})
RETURN length(p) AS pathLength
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+
| Operator | Id | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline |
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+
| +ProduceResults | 0 | `length(p)` | 1 | 1 | 0 | 0 | 0/0 | 0.034 | |
| | +----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+ |
| +Projection | 1 | length((source)-[anon_19*]-(target)) AS `length(p)` | 1 | 1 | 17 | | 9/0 | 0.057 | |
| | +----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+ |
| +StatefulShortestPath(Into, Trail) | 2 | SHORTEST 1 (source) ((`anon_15`)-[`anon_16`]-(`anon_17`)){1, } (target) | 1 | 1 | 1936 | 990288 | 205/0 | 1.658 | In Pipeline 1 |
| | +----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+
| +MultiNodeIndexSeek | 3 | UNIQUE source:N(trail) WHERE trail = $autolist_0, UNIQUE target:N(trail) WHERE trail = $autolist_1 | 1 | 1 | 4 | 376 | 4/2 | 0.332 | In Pipeline 0 |
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+
Total database accesses: 1957, total allocated memory: 990608
1 row
ready to start consuming query after 48 ms, results consumed after another 3 ms
Limitations of enforcing a single source-target node pair
Enforcing a single source-target node pair is not always preferable.
With one source and many targets, rewriting a shortest path query using a CALL
subquery forces the planner to use StatefulShortestPath(Into)
, which runs once per target node.
While this is efficient for a single pair, it can become slower as the number of targets increases because it forces the planner to traverse the graph for each individual pair of source-target nodes.
In such cases, it may be more efficient to let the planner use StatefulShortestPath(All)
, which expands across the graph once and returns all matches.
Consider the following query, which does not specify a unique target node and generates a total of 19682 shortest paths from the source nodes:
PROFILE
MATCH p = ANY SHORTEST
(start:N {trail: ["C", "C", "A", "C", "A", "B", "B", "B", "A"]})--+
(end:N {level: 9})
RETURN count(*) AS pathCount
pathLength |
---|
|
Rows: 1 |
Due to the existence of multiple target nodes without a specified, unique property value (there are 19683 nodes in the graph with a level
property value of 9
), the planner will default to using the StatefulShortestPath(All)
operator, which expands once from the source node until all valid shortest paths have been found.
+-----------------------------------+----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------+
| Operator | Id | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline |
+-----------------------------------+----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------+
| +ProduceResults | 0 | `count(*)` | 1 | 1 | 0 | 0 | 0/0 | 0.015 | |
| | +----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+ |
| +EagerAggregation | 1 | count(*) AS `count(*)` | 1 | 1 | 0 | 40 | 0/0 | 0.097 | In Pipeline 2 |
| | +----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------+
| +StatefulShortestPath(All, Trail) | 2 | SHORTEST 1 (start) ((`anon_3`)-[`anon_4`]-(`anon_5`)){1, } (end) | 8052 | 19682 | 373974 | 81274328 | 65235/0 | 330.475 | In Pipeline 1 |
| | | | expanding from: start | | | | | | | |
| | | | inlined predicates: end.level = $autoint_1 | | | | | | | |
| | | | end:N | | | | | | | |
| | +----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------+
| +NodeUniqueIndexSeek | 3 | UNIQUE start:N(trail) WHERE trail = $autolist_0 | 1 | 1 | 2 | 376 | 3/0 | 0.106 | In Pipeline 0 |
+-----------------------------------+----+------------------------------------------------------------------+----------------+-------+---------+----------------+------------------------+-----------+---------------+
Total database accesses: 373976, total allocated memory: 81274688
1 row
ready to start consuming query after 40 ms, results consumed after another 331 ms
If the query is rewritten with a CALL
subquery the planner will use StatefulShortestPath(Into)
which performs separate traversals for each individual source-target node pairs.
CALL
subqueryPROFILE
MATCH (start:N {trail: ["C", "C", "A", "C", "A", "B", "B", "B", "A"]}),
(end:N {level: 9})
CALL (start, end) {
MATCH p = ANY SHORTEST (start)--+(end)
RETURN p
}
RETURN count(*) AS pathCount
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+---------------+
| Operator | Id | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline |
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+---------------+
| +ProduceResults | 0 | `count(*)` | 1 | 1 | 0 | 0 | 0/0 | 0.120 | |
| | +----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+ |
| +EagerAggregation | 1 | count(*) AS `count(*)` | 1 | 1 | 0 | 40 | 0/0 | 0.172 | In Pipeline 2 |
| | +----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+---------------+
| +Projection | 2 | (start)-[anon_7*]-(end) AS p | 8052 | 19682 | 314930 | | 184197/0 | 35.430 | |
| | +----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+ |
| +StatefulShortestPath(Into, Trail) | 3 | SHORTEST 1 (start) ((`anon_3`)-[`anon_4`]-(`anon_5`)){1, } (end) | 8052 | 19682 | 32672226 | 157866776 | 3588500/0 | 14200.424 | In Pipeline 1 |
| | +----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+---------------+
| +MultiNodeIndexSeek | 4 | UNIQUE start:N(trail) WHERE trail = $autolist_0, RANGE INDEX end:N(level) WHERE level = $autoint_1 | 8052 | 19683 | 19686 | 376 | 108/0 | 4.014 | In Pipeline 0 |
+------------------------------------+----+----------------------------------------------------------------------------------------------------+----------------+-------+----------+----------------+------------------------+-----------+---------------+
Total database accesses: 33006842, total allocated memory: 157867272
1 row
ready to start consuming query after 32 ms, results consumed after another 14244 ms
As the plan shows, in this scenario it is not more efficient to enforce a single source-target node pair.
On the contrary, doing so ensures that StatefulShortestPath(Into)
is executed 19682
times, once for each source-target node pair, thereby generating a more expensive query.
Shortest path operators
Cypher uses three different operators to plan shortest path queries. The criteria for when each is used is outlined in the table below.
Operator | Description | Criteria |
---|---|---|
|
Performs bidirectional breadth-first searches (BFS) from target and source node. Terminates when a shortest path is found between them. |
Used when:
Or, when the estimated cardinality of the source and target nodes in a shortest path are 1 or less and the following are all true:
|
|
Performs bidirectional BFS from target and source node. Terminates when a shortest path is found between them. |
Used when the estimated cardinality of the source and target nodes in a shortest path are 1 or less and either of the following are true:
|
|
Performs unidirectional BFS to find shortest paths from a source node to all nodes matching the target node conditions. |
Used when the planner estimates more than one source-target node pair in a shortest path. |
StatefulShortestPath(Into) and StatefulShortestPath(All) can match more complex shortest paths than ShortestPath . As a result, queries using these operators may be slower and more costly.
|
ShortestPath
operator: fast vs. exhaustive search
Queries planned with the ShortestPath
operator (see the table above for when this operator is used), use two different search algorithms depending on the predicates in the query.
If the predicate can be checked as the search progresses (for example, requiring every relationship in the path to have a specific property), the planner can exclude invalid paths early. In such cases, a fast bidirectional breadth-first search (BFS) algorithm is used.
MATCH (start:N {level: 1}), (end:N {level: 5})
MATCH p = shortestPath((start)-[r*]-(end))
WHERE all(rel IN r WHERE rel.flag IS NULL)
RETURN p
If the predicate requires inspecting the entire path after it has been matched (such as checking whether the path length exceeds a certain value), the planner cannot exclude paths early.
In such cases, a slower, exhaustive search-algorithm is used.
Exhaustive searches may be very time consuming in certain cases, such as when there is no shortest path between two nodes (to disallow exhaustive searches, set dbms.cypher.forbid_exhaustive_shortestpath
to true
).
MATCH (start:N {level: 1}), (end:N {level: 5})
MATCH p = shortestPath((start)-[*]-(end))
WHERE length(p) > 3
RETURN p
For queries that would otherwise trigger an exhaustive search, a practical workaround is to first bind the matched path and then filter it using WITH
and WHERE
.
This allows the planner to use a fast search-algorithm while finding the shortest path, and only afterwards apply the filter.
Note that, because the filter is applied after the fast algorithm runs, it may eliminate all candidate paths and return no results.
MATCH (start:N {level: 1}), (end:N {level: 5})
MATCH p = shortestPath((start)-[*]-(end))
WITH p
WHERE length(p) > 3
RETURN p