Post-UNION processing
Cypher does not allow further processing of UNION
or UNION ALL
results, since RETURN
is required in all queries of the union.
Here are some workarounds.
Post-UNION processing in Neo4j 4.0
With Neo4j 4.0, post-UNION processing is now possible via subqueries.
An example of usage:
CALL {
MATCH (movieOrPerson:Movie) RETURN movieOrPerson
UNION
MATCH (movieOrPerson:Person) RETURN movieOrPerson
}
WITH movieOrPerson
...
This allows us to keep working with the results of the UNION subquery.
However, with the initial 4.0 release only uncorrelated subqueries are supported, meaning the subquery cannot use variables from outside the call. This means that using a subquery for post-UNION processing in the middle of a more complex query may not be possible, as you cannot pass in variables from the outer query for use within the subquery.
In 4.0.x, correlated subqueries, which are more useful and can use variables from outside the subquery, are currently only available when using Neo4j Fabric.
Post-UNION processing enhancements in 4.1+
With the release of 4.1, the CALL subquery functionality has been enhanced to allow correlated subqueries. This lets us use existing variables mid-query within the subquery.
This requires the use of WITH
as the first clause within the subquery CALL block, for the purpose of importing variables to the subquery.
When using UNION
or UNION ALL
, we can provide a similar importing WITH
clause for each of the unioned queries:
MATCH (m:Movie {title:'The Matrix'})
CALL {
WITH m
MATCH (m)<-[:ACTED_IN]-(p)
RETURN p
UNION
WITH m
MATCH (m)<-[:DIRECTED]-(p)
RETURN p
}
RETURN p.name as name
ORDER BY name ASC
This will correctly return the alphabetically-sorted names of all actors and directors for The Matrix.
This import usage has some special restrictions that do not normally apply to WITH
usage:
-
You may only include variables from the outer query and no others. + You cannot perform calculations, aggregations, or introduction of new variables in the initial
WITH
. -
You cannot alias any variables within this initial
WITH
. -
You cannot follow the initial
WITH
with aWHERE
clause for filtering.
If you try any of these, you will be met with some kind of error, such as:
Importing WITH should consist only of simple references to outside variables. Aliasing or expressions are not supported.
or more cryptically, if you try to use a WHERE
clause after the initial WITH
Variable `x` not defined
(where the variable is the first one present in the WITH
clause)
You can get around all of these restrictions by simply introducing an additional WITH
clause after the importing WITH
, like so:
MATCH (m:Movie)
WHERE m.title CONTAINS 'Matrix'
CALL {
WITH m
WITH m as movie
MATCH (m)<-[:DIRECTED]-(p)
RETURN p.name as name
UNION
WITH m
WITH m
WHERE m.title CONTAINS 'Reloaded'
MATCH (m)<-[:ACTED_IN]-(p)
RETURN p.name as name
}
RETURN DISTINCT name
ORDER BY name ASC
This demonstrates both the ability to alias the imported variable as well as to filter the imported variable provided we use a second WITH
clause,
which is not restricted in the same way as the initial WITH
used for the import into the subquery.
For earlier versions, native subqueries are not available, so other workarounds must be used.
Combine collections, then UNWIND
back to rows and apply DISTINCT
MATCH (m:Movie)
WITH collect(m) AS movies
MATCH (p:Person)
WITH movies + collect(p) AS moviesAndPeople
UNWIND moviesAndPeople AS movieOrPerson
WITH DISTINCT movieOrPerson
...
DISTINCT
isn’t really needed in the above query, but it will be needed if it’s possible for a result to be present in multiple collections being combined, provided you want distinct values.
Use apoc.cypher.run() to return UNION
results from a subquery
Using APOC Procedures, you can use apoc.cypher.run()
to execute a UNION
within a subquery, and return its results.
CALL apoc.cypher.run('
MATCH (movieOrPerson:Movie)
RETURN movieOrPerson
UNION
MATCH (movieOrPerson:Person)
RETURN movieOrPerson',
{}) yield value
WITH value.movieOrPerson as movieOrPerson
...
Remember that procedure calls are executed per-row, so using this approach when multiple rows already exist may lead to unintended and unexpected results.
Was this page helpful?