Using max() and min() while keeping items
The aggregation functions of max() and min() are very useful, but you can sometimes find yourself fighting against Cypher’s aggregation behavior for cases that should be simple.
This often comes up when you want to calculate the max() or min() or something, but keep the item or items associated with that maximum or minimum value.
Let’s use a very simple example, a graph of people who bought food at a store:
(:Person {name})-[:BOUGHT]->(:FoodItem {name})We want to find, per person, the food item or items that they bought the most.
Complex when it should be simple
We can easily use max() to find the count of the food item that they bought the most:
MATCH (person:Person)-[:BOUGHT]->(food:FoodItem)
WITH person, food, count(food) as timesBought
RETURN person, max(timesBought) as mostBoughtCountBut we lose the data associated with the food that generated that result! WHICH food item was bought that many times? And was it just a single food item, or were there ties among several?
If we keep food in scope like so: RETURN person, food, max(timesBought) as mostBoughtCount, we get a wrong result, since each food is listed on its own row, and the mostBoughtCount is for each food and not aggregated across all of them.
If we collect() the food like so: RETURN person, collect(food) as foods, max(timesBought) as mostBoughtCount, while mostBoughtCount is correct, we’ve collected all the foods, and have no idea which one is associated with that maximum value.
We’re forced to abandon this approach and instead perform an ordering, a collect(), then keep the top result:
MATCH (person:Person)-[:BOUGHT]->(food:FoodItem)
WITH person, food, count(food) as timesBought
ORDER BY timesBought DESC
RETURN person, collect(food)[0] as favoriteFood, max(timesBought) as mostBoughtCountBut again, what about ties? A person might have several favorite foods tied in their mostBoughtCount. We might spend a lot of time refactoring that query, doing collects() and UNWINDs and counting and comparison, and the query gets even more complex.
APOC Procedures helps keep things simple
First, we were granted custom procedures, then we were given custom functions, and last we received the ability to write custom aggregation functions. As of APOC 3.5.0.5, new functions were added that help out in these cases.
| 
 | returns a map {items:[], value:n} where  | 
There is an apoc.agg.minItems() as well that works similarly.
In short, this function lets us use the equivalent of min() or max(), but also to keep the item or items associated with that value.
If we add this to our query we get:
MATCH (person:Person)-[:BOUGHT]->(food:FoodItem)
WITH person, food, count(food) as timesBought
WITH person, apoc.agg.maxItems(food, timesBought) as maxData
RETURN person, maxData.items as favoriteFoods, maxData.value as mostBoughtCountThis lets us keep all the foods that tied as favorites, and if we did want to limit ties, we could add that as an additional parameter to the function call.
Was this page helpful?