COLLECT subqueriesIntroduced in 5.6
A COLLECT subquery expression can be used to create a list with the rows returned by a given subquery.
COLLECT subqueries differ from COUNT and EXISTS subqueries in that the final RETURN clause is mandatory.
The RETURN clause must return exactly one column.
Example graph
The following graph is used for the examples below:
To recreate the graph, run the following query against an empty Neo4j database:
CREATE
(andy:Swedish:Person {name: 'Andy', age: 36}),
(timothy:Person {name: 'Timothy', nickname: 'Tim', age: 25}),
(peter:Person {name: 'Peter', nickname: 'Pete', age: 35}),
(andy)-[:HAS_DOG {since: 2016}]->(:Dog {name:'Andy'}),
(timothy)-[:HAS_CAT {since: 2019}]->(:Cat {name:'Mittens'}),
(fido:Dog {name:'Fido'})<-[:HAS_DOG {since: 2010}]-(peter)-[:HAS_DOG {since: 2018}]->(:Dog {name:'Ozzy'}),
(fido)-[:HAS_TOY]->(:Toy{name:'Banana'})
Simple COLLECT subquery
Variables introduced by the outside scope can be used in the COLLECT subquery without importing them.
In this regard, COLLECT subqueries are different from CALL subqueries, which do require importing.
The following query exemplifies this and outputs the owners of the dog named Ozzy:
MATCH (person:Person)
WHERE 'Ozzy' IN COLLECT { MATCH (person)-[:HAS_DOG]->(dog:Dog) RETURN dog.name }
RETURN person.name AS name
| name |
|---|
|
Rows: 1 |
COLLECT subquery with WHERE clause
A WHERE clause can be used inside the COLLECT subquery.
Variables introduced by the MATCH clause and the outside scope can be used in the inner scope.
MATCH (person:Person)
RETURN person.name as name, COLLECT {
MATCH (person)-[r:HAS_DOG]->(dog:Dog)
WHERE r.since > 2017
RETURN dog.name
} as youngDogs
| name | youngDogs |
|---|---|
|
|
|
|
|
|
Rows: 3 |
|
COLLECT subquery with a UNION
COLLECT can be used with a UNION clause.
The below example shows the collection of pet names each person has by using a UNION clause:
MATCH (person:Person)
RETURN
person.name AS name,
COLLECT {
MATCH (person)-[:HAS_DOG]->(dog:Dog)
RETURN dog.name AS petName
UNION
MATCH (person)-[:HAS_CAT]->(cat:Cat)
RETURN cat.name AS petName
} AS petNames
| name | petNames |
|---|---|
|
|
|
|
|
|
Rows: 3 |
|
COLLECT subquery with WITH
Variables from the outside scope are visible for the entire subquery, even when using a WITH clause.
To avoid confusion, shadowing of these variables is not allowed.
An outside scope variable is shadowed when a newly introduced variable within the inner scope is defined with the same variable.
In the example below, the outer variable name is shadowed and will therefore throw an error.
WITH 'Peter' as name
MATCH (person:Person {name: name})
RETURN COLLECT {
WITH 'Ozzy' AS name
MATCH (person)-[r:HAS_DOG]->(d:Dog {name: name})
RETURN d.name
} as dogsOfTheYear
The variable `name` is shadowing a variable with the same name from the outer scope and needs to be renamed (line 4, column 20 (offset: 92))
New variables can be introduced into the subquery, as long as they use a different identifier.
In the example below, a WITH clause introduces a new variable.
Note that the outer scope variable person referenced in the main query is still available after the WITH clause.
MATCH (person:Person)
RETURN person.name AS name, COLLECT {
WITH 2018 AS yearOfTheDog
MATCH (person)-[r:HAS_DOG]->(d:Dog)
WHERE r.since = yearOfTheDog
RETURN d.name
} as dogsOfTheYear
| name | dogsOfTheYear |
|---|---|
|
|
|
|
|
|
Rows: 3 |
|
Using COLLECT subqueries inside other clauses
COLLECT can be used in any position in a query, with the exception of administration commands, where the COLLECT expression is restricted.
See a few examples below of how COLLECT can be used in different positions within a query:
Using COLLECT in RETURN
MATCH (person:Person)
RETURN person.name,
COLLECT {
MATCH (person)-[:HAS_DOG]->(d:Dog)
MATCH (d)-[:HAS_TOY]->(t:Toy)
RETURN t.name
} as toyNames
| person.name | toyNames |
|---|---|
|
|
|
|
|
|
Rows: 3 |
|
Using COLLECT in SET
MATCH (person:Person) WHERE person.name = "Peter"
SET person.dogNames = COLLECT { MATCH (person)-[:HAS_DOG]->(d:Dog) RETURN d.name }
RETURN person.dogNames as dogNames
| dogNames |
|---|
|
Rows: 1 |
Using COLLECT in CASE
MATCH (person:Person)
RETURN
CASE
WHEN COLLECT { MATCH (person)-[:HAS_DOG]->(d:Dog) RETURN d.name } = [] THEN "No Dogs " + person.name
ELSE person.name
END AS result
| result |
|---|
|
|
|
Rows: 3 |
Using COLLECT as a grouping key
The following query collects all persons by their dogs' names, and then calculates the average age for each group.
MATCH (person:Person)
RETURN COLLECT { MATCH (person)-[:HAS_DOG]->(d:Dog) RETURN d.name } AS dogNames,
avg(person.age) AS averageAge
ORDER BY dogNames
| dogNames | averageAge |
|---|---|
|
|
|
|
|
|
Rows: 3 |
|
Using COLLECT vs collect()
COLLECT does not handle null values in the same way that the aggregating function collect() does.
The collect() function automatically removes null values.
COLLECT will not remove null values automatically.
However, they can be removed by adding a filtering step in the subquery.
The following queries illustrate these differences:
MATCH (p:Person)
RETURN collect(p.nickname) AS names
| names |
|---|
|
Rows: 1 |
RETURN COLLECT {
MATCH (p:Person)
RETURN p.nickname ORDER BY p.nickname
} AS names
| names |
|---|
|
Rows: 1 |
RETURN COLLECT {
MATCH (p:Person)
WHERE p.nickname IS NOT NULL
RETURN p.nickname ORDER BY p.nickname
} AS names
| name |
|---|
|
Rows: 1 |
Rules
The following is true for COLLECT subqueries:
-
Any non-writing query is allowed.
-
The final
RETURNclause is mandatory when using aCOLLECTsubquery. TheRETURNclause must return exactly one column. -
A
COLLECTsubquery can appear anywhere in a query that an expression is valid. -
Any variable that is defined in the outside scope can be referenced inside the
COLLECTsubquery’s own scope. -
Variables introduced inside the
COLLECTsubquery are not part of the outside scope and therefore cannot be accessed on the outside.