Graph Grouping

Large graphs are often hard to understand or visualize.

Tabular results can be aggregated into overviews, e.g. charts showing sums or counts.

Grouping nodes by property values into virtual nodes helps to do the same with graph visualizations.

When doing that, relationships between those groups are aggregated too, so you only see the summary information.
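The idea can be illustrated with a minimal Python model. It is a conceptual sketch, not APOC's implementation. It assumes that each node has a single label and that a group is identified by (label, value of the grouping property). Each relationship between two groups is collapsed into one relationship per (start group, type, end group), and that relationship only keeps a count.

```python
from collections import Counter


def group_graph(nodes, rels, key):
    """Conceptual model of property-based graph grouping (hypothetical helper).

    nodes: dict of node id -> (label, properties dict)
    rels:  list of (start id, relationship type, end id)
    key:   property name to group by
    """
    # Each node is assigned to the group (label, value of `key`);
    # nodes without the property land in a group whose value is None.
    group_of = {nid: (label, props.get(key)) for nid, (label, props) in nodes.items()}
    # Number of original nodes per group.
    group_sizes = Counter(group_of.values())
    # Relationships are collapsed per (start group, type, end group) and counted.
    rel_counts = Counter((group_of[s], t, group_of[e]) for s, t, e in rels)
    return dict(group_sizes), dict(rel_counts)
```

For example, three people grouped by gender become two virtual nodes. The two KNOWS relationships from women to the man collapse into a single KNOWS relationship with a count of 2.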

This functionality is inspired by the work of Martin Junghanns in the Grouping Demo for the Gradoop Graph Processing system.

Basically, you can use any (entity)<-->(entity) graph for the grouping; support for graph projections is on the roadmap.

Here is an example using the dataset from :play movies.

Example on movie graph
MATCH (n)
SET n.century = toInteger(coalesce(n.born,n.released)/100) * 100;

CALL apoc.nodes.group(['Person','Movie'],['century']);
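The century bucketing in the SET clause above is plain integer arithmetic. The following Python sketch mirrors it. The function name century is hypothetical. The sketch assumes positive years, where Cypher's truncating integer division gives the same result as Python's floor division.

```python
def century(born=None, released=None):
    """Mirror of toInteger(coalesce(n.born, n.released)/100) * 100."""
    # coalesce: take born if present, otherwise released
    year = born if born is not None else released
    if year is None:
        # Cypher arithmetic on null yields null, so no century is set
        return None
    # Integer division drops the last two digits; multiplying restores the scale.
    return (year // 100) * 100
```

Usage: century(born=1964) gives 1900 and century(released=2003) gives 2000. People and movies from the same century therefore end up in the same group.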
apoc.nodes.group

Some UIs have trouble with the return values of the grouping (a list of nodes and a list of relationships). In that case, it can help to unwind them:

CALL apoc.nodes.group(['Person','Movie'],['century'])
YIELD nodes, relationships
UNWIND nodes as node
UNWIND relationships as rel
RETURN node, rel;
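Two consecutive UNWIND clauses produce every combination of the two lists. The following Python model of a single result row shows this. It also shows a consequence: a row whose relationships list is empty produces no output at all. The function name is hypothetical.

```python
from itertools import product


def unwind_rows(row):
    """Model of `UNWIND nodes AS node UNWIND relationships AS rel` for one row."""
    # Every node is paired with every relationship (a cross product);
    # if either list is empty, no (node, rel) pairs are produced.
    return list(product(row["nodes"], row["relationships"]))
```

A group without relationships, such as the Forum group in the example result table further down (where it appears with an empty relationships list), would therefore drop out of this query's output.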

Usage

CALL apoc.nodes.group(labels, properties, [grouping], [config])

The only required parameters are a list of labels (which can also be ['*'] for all labels) and a list of property names to group by; these property names are used for both nodes and relationships.

Optionally, you can also provide grouping operators per field and a map of configuration options.

Grouping Operators

For grouping operators, you provide a MAP of operations per field in this form: {fieldName: [operators]}

One map for nodes and one for relationships: [{nodeOperators},{relOperators}]

Possible operators:

  • count_*

  • count

  • sum

  • min/max

  • avg

  • collect

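The default is [{`*`:"count"},{`*`:"count"}], which just counts the nodes and relationships in each group. Each aggregate is stored on the virtual node or relationship as a property named operator_field, for example count_* or min_age.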
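The operator map can be modelled in a few lines of Python. This is a sketch, not APOC's code. The result names follow the operator_field pattern seen in the example output (count_*, min_age). The sketch assumes that `*` stands for the group members themselves and that a single operator may be given as a plain string instead of a list. Both the function and the OPERATORS table are hypothetical.

```python
from statistics import mean

# Hypothetical mapping from operator names to Python aggregations.
OPERATORS = {
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
    "avg": mean,
    "collect": list,
}


def aggregate(members, ops):
    """Apply a {field: operator(s)} map to the members (property dicts) of one group."""
    result = {}
    for field, names in ops.items():
        if isinstance(names, str):
            names = [names]  # allow a single operator instead of a list
        for name in names:
            if field == "*":
                values = members  # `*` aggregates over the members themselves
            else:
                # Skip members that do not have the property at all.
                values = [m[field] for m in members if m.get(field) is not None]
            # Name the result operator_field, e.g. min_age or count_*.
            result[f"{name}_{field}"] = OPERATORS[name](values)
    return result
```

With the two female members from the example graph (Alice, 32, and Eve, 28), {`*`:'count', age:'min'} yields count_* = 2 and min_age = 28. This sketch raises an error for min, max or avg over a group where no member has the field; how APOC handles that case is not covered here.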

Configuration

The config map supports the following options:

option       default  description
selfRels     true     show self-relationships in the resulting graph
orphans      true     show orphan nodes in the resulting graph
limitNodes   -1       maximum number of nodes (-1 means no limit)
limitRels    -1       maximum number of relationships (-1 means no limit)
relsPerNode  -1       maximum number of relationships per node (-1 means no limit)
filter       null     a min/max filter by property value, e.g. {`User.count_*.min`:2}; see below
includeRels  []       relationship types to include, as a list of types or a single type; by default, all relationship types are included
excludeRels  []       relationship types to exclude, as a list of types or a single type; by default, no relationship type is excluded
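The effect of selfRels, orphans, includeRels and excludeRels on a grouped graph can be sketched in Python as a post-processing step. This is a conceptual model only. The function name and the (start, type, end) tuple representation are hypothetical. The order used here (type and self-relationship filtering first, then removal of orphans) is an assumption, not documented APOC behaviour.

```python
def apply_options(groups, rels, selfRels=True, orphans=True,
                  includeRels=None, excludeRels=None):
    """groups: list of group ids; rels: list of (start group, type, end group)."""

    def as_set(types):
        # The options accept either a single type or a list of types.
        if types is None:
            return set()
        return {types} if isinstance(types, str) else set(types)

    include, exclude = as_set(includeRels), as_set(excludeRels)
    kept = [
        (s, t, e) for s, t, e in rels
        if (not include or t in include)  # empty include list means all types
        and t not in exclude
        and (selfRels or s != e)          # drop group-to-same-group relationships
    ]
    if not orphans:
        # Keep only groups that still touch at least one relationship.
        connected = {s for s, _, _ in kept} | {e for _, _, e in kept}
        groups = [g for g in groups if g in connected]
    return groups, kept
```

For example, with selfRels:false a KNOWS relationship from the male group to itself is dropped. With orphans:false any group left without relationships disappears.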

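The filter config option is a MAP of {Label/TYPE.operator_property.min/max: number}, where the Label/TYPE. prefix is optional. Because these keys contain dots (and often *), quote them with backticks in Cypher, e.g. {`User.count_*.min`:130}.

For example, to keep only Person groups whose min_age is at least 21 (this requires the age:'min' operator), use Person.min_age.min: 21; to keep only KNOWS relationships that aggregate at most 10 relationships, use KNOWS.count_*.max: 10.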
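The key format can be made concrete with a small Python check for one grouped node or relationship. This is a sketch under several assumptions, not APOC's code. The min and max bounds are treated as inclusive. A Label/TYPE prefix only applies to groups carrying that label or type. A group that lacks the aggregated property is not filtered out. The function name passes_filter is hypothetical.

```python
def passes_filter(labels, props, filters):
    """Check one grouped node or relationship against a filter map.

    labels:  labels of a node group, or [type] for a relationship group
    props:   aggregated properties, e.g. {"count_*": 2, "min_age": 28}
    filters: e.g. {"Person.min_age.min": 21, "KNOWS.count_*.max": 10}
    """
    for key, bound in filters.items():
        # The last segment says whether the number is a lower or upper bound.
        target, kind = key.rsplit(".", 1)
        if "." in target:
            # Optional Label/TYPE prefix before the aggregated property name.
            prefix, prop = target.split(".", 1)
            if prefix not in labels:
                continue  # this filter targets another label or type
        else:
            prop = target
        value = props.get(prop)
        if value is None:
            continue  # assumption: groups without the aggregate are not filtered
        if kind == "min" and value < bound:
            return False
        if kind == "max" and value > bound:
            return False
    return True
```

Under these assumptions, a female Person group with min_age 28 passes Person.min_age.min: 21. A KNOWS group with count_* 12 fails KNOWS.count_*.max: 10.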

Examples

Graph Setup
CREATE
 (alice:Person {name:'Alice', gender:'female', age:32, kids:1}),
 (bob:Person   {name:'Bob',   gender:'male',   age:42, kids:3}),
 (eve:Person   {name:'Eve',   gender:'female', age:28, kids:2}),
 (graphs:Forum {name:'Graphs',    members:23}),
 (dbs:Forum    {name:'Databases', members:42}),
 (alice)-[:KNOWS {since:2017}]->(bob),
 (eve)-[:KNOWS   {since:2018}]->(bob),
 (alice)-[:MEMBER_OF]->(graphs),
 (alice)-[:MEMBER_OF]->(dbs),
 (bob)-[:MEMBER_OF]->(dbs),
 (eve)-[:MEMBER_OF]->(graphs)
Query
CALL apoc.nodes.group(['*'],['gender'],
  [{`*`:'count', age:'min'}, {`*`:'count'} ])
Table 1. Result
nodes | relationships | node | relationship
[(:Person {gender: "female",min_age: 28,count_*: 2})] | [[:MEMBER_OF {count_*: 3}], [:KNOWS {count_*: 2}]] | (:Person {gender: "female",min_age: 28,count_*: 2}) | [:MEMBER_OF {count_*: 3}]
[(:Person {gender: "female",min_age: 28,count_*: 2})] | [[:KNOWS {count_*: 2}]] | (:Person {gender: "female",min_age: 28,count_*: 2}) | [:KNOWS {count_*: 2}]
[(:Person {gender: "male",min_age: 42,count_*: 1})] | [[:MEMBER_OF {count_*: 1}]] | (:Person {gender: "male",min_age: 42,count_*: 1}) | [:MEMBER_OF {count_*: 1}]
[(:Forum {gender: null,count_*: 2})] | [] | (:Forum {gender: null,count_*: 2}) | null

Note that this query works in Neo4j Browser only in "Table" mode (or in cypher-shell), not in "Graph" mode. Forum nodes have no gender property, so the Forum group node gets a "gender": null property, which the graph view does not support and reports as a TypeError. The following query groups only Person nodes and therefore also works in "Graph" mode:

CALL apoc.nodes.group(
        ['Person'],['gender'],
        [{`*`:'count', kids:'sum', age:['min', 'max', 'avg'], gender:'collect'},
         {`*`:'count', since:['min', 'max']}]);

Larger Example

Graph setup
WITH ["US","DE","UK","FR","CA","BR","SE"] AS tld
UNWIND range(1,1000) AS id
CREATE (u:User {id:id, age : id % 100, female: rand() < 0.5, name: "Name "+id, country:tld[toInteger(rand()*size(tld))]})
WITH collect(u) AS users
UNWIND users AS u
WITH u, users[toInteger(rand()*size(users))] AS u2
WHERE u <> u2
MERGE (u)-[:KNOWS]-(u2);
Query
CALL apoc.nodes.group(['*'], ['country'])
YIELD node, relationship RETURN *
Query
CALL apoc.nodes.group(['*'], ['country'], null,
    {selfRels:false, orphans:false,
     filter:{`User.count_*.min`:130,`KNOWS.count_*.max`:200}})
YIELD node, relationship RETURN *
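Why a threshold of 130 users per country is a meaningful filter here becomes clear from simple arithmetic. The setup assigns 1000 users uniformly at random to 7 countries, so each country gets about 1000 / 7 ≈ 143 users on average. `User.count_*.min:130` therefore only removes countries that happened to receive noticeably fewer users. The following Python sketch simulates the setup's country assignment with a fixed seed. It treats the min bound as inclusive, which is an assumption. The real result varies with every run of the setup query.

```python
import random
from collections import Counter


def simulate_country_groups(n_users=1000, seed=0):
    """Mimic the setup's random country assignment and count users per country."""
    rng = random.Random(seed)
    tld = ["US", "DE", "UK", "FR", "CA", "BR", "SE"]
    # Same rule as the Cypher: tld[toInteger(rand()*size(tld))]
    return Counter(tld[int(rng.random() * len(tld))] for _ in range(n_users))


counts = simulate_country_groups()
expected = 1000 / 7  # average group size, about 142.9 users per country
# Country groups that survive a User.count_*.min:130 filter (inclusive bound assumed).
kept = {country: n for country, n in counts.items() if n >= 130}
```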

To visualize this result in Neo4j Browser, it's useful to have a custom Graph Style Sheet (GRASS) that renders the grouped properties together with some of the aggregations.

node {
  diameter: 50px;
  color: #A5ABB6;
  border-color: #9AA1AC;
  border-width: 2px;
  text-color-internal: #FFFFFF;
  font-size: 10px;
}

relationship {
  color: #A5ABB6;
  shaft-width: 3px;
  font-size: 8px;
  padding: 3px;
  text-color-external: #000000;
  text-color-internal: #FFFFFF;
  caption: '{count_*}';
}

node.Country {
  color: #68BDF6;
  diameter: 80px;
  border-color: #5CA8DB;
  text-color-internal: #FFFFFF;
  caption: '{country} ({count_*})';
}