Graph Algorithms in Neo4j: Louvain Modularity


Graph algorithms provide the means to understand, model and predict complicated dynamics such as the flow of resources or information, the pathways through which contagions or network failures spread, and the influences on and resiliency of groups.

This blog series is designed to help you better utilize graph analytics and graph algorithms so you can effectively innovate and develop intelligent solutions faster using a graph database like Neo4j.

Last week we continued our focus on Community Detection algorithms, with a look at the Label Propagation algorithm.

Learn more about the Louvain Modularity algorithm.


This week we continue our exploration of Community Detection algorithms, with a look at the Louvain Modularity algorithm, which measures the quality (i.e., presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network.

About Louvain Modularity


The Louvain method of community detection is an algorithm for detecting communities in networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities by evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.

The Louvain algorithm is one of the fastest modularity-based algorithms and works well with large graphs. It also reveals a hierarchy of communities at different scales, which is useful for understanding the global functioning of a network.

In order to understand the Louvain modularity algorithm, we must first look at modularity in general.

Learn more about modularity in a graph database.


Modularity is a measure of how well groups have been partitioned into clusters. It compares the relationships in a cluster compared to what would be expected for a random (or other baseline) number of connections.

Compare data relationships in a cluster.


The Louvain algorithm was proposed in 2008. The method consists of repeated application of two steps. The first step is a “greedy” assignment of nodes to communities, favoring local optimizations of modularity. The second step is the definition of a new coarse-grained network based on the communities found in the first step. These two steps are repeated until no further modularity-increasing reassignments of communities are possible.

When Should I Use Louvain?


    • The Louvain method has been proposed to provide recommendations for Reddit users to find similar subreddits based on general user behavior. For more details, see Subreddit Recommendations within Reddit Communities.”
    • The Louvain method has been used to extract topics from online social platforms, such as Twitter and YouTube, based on the co-occurrence graph of terms in documents as a part of the topic modeling process. This process is described in Topic Modeling based on Louvain method in Online Social Networks.
    • The Louvain method has been used to investigate the human brain and find hierarchical community structures within the brain’s functional network. The study mentioned is Hierarchical Modularity in Human Brain Functional Networks.
TIP: Although the Louvain method, and modularity optimization algorithms more generally, have found wide applications across many domains, some problems with these algorithms have been identified:

1. The resolution limit
For larger networks, the Louvain method doesn’t stop with the “intuitive” communities. Instead, there’s a second pass through the community modification and coarse-graining stages, in which several of the intuitive communities are merged together. This is a general problem with modularity optimization algorithms; they have trouble detecting small communities in large networks. It’s a virtue of the Louvain method that something close to the intuitive community structure is available as an intermediate step in the process.

2. The degeneracy problem
There is typically an exponentially large (in network size) number of community assignments with modularities close to the maximum. This can be a severe problem because, in the presence of a large number of high modularity solutions, it’s hard to find the global maximum and difficult to determine if the global maximum is truly more scientifically important than local maxima that achieve similar modularity. Research shows that the different locally optimal community assignments have different structural properties. For more information, see The performance of modularity maximization in practical contexts.”

Louvain Example


Let’s see the Louvain algorithm in action. The following Cypher statement creates a graph of users and friends:

MERGE (nAlice:User {id:"Alice"})
MERGE (nBridget:User {id:"Bridget"})
MERGE (nCharles:User {id:"Charles"})
MERGE (nDoug:User {id:"Doug"})
MERGE (nMark:User {id:"Mark"})
MERGE (nMichael:User {id:"Michael"})
MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nBridget)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nMark)
MERGE (nAlice)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nDoug);

Check out the Louvain Modularity graph model.

Now we run Louvain to find communities in the social network. Execute the following query:

CALL algo.louvain.stream("User", "FRIEND", {})
YIELD nodeId, community
MATCH (user:User) WHERE id(user) = nodeId
RETURN user.id AS user, community
ORDER BY community;

See the results of running the Louvain Modularity graph algorithm.


Our algorithm found two communities with three members each.

Mark, Doug and Charles are all friends with each other, as are Bridget, Alice and Michael. Charles is the only one who has friends in both communities, but he has more in community four so he fits better in that one.

Conclusion


As we’ve seen, the Louvain Modularity algorithm is used to evaluate social structures in Twitter, LinkedIn and YouTube. It’s also used in fraud analytics to evaluate whether a group has just a few bad behaviors or is acting as a fraud ring, which would be indicated by a higher relationship density than average.

Next week we’ll continue our focus on Community Detection algorithms, with a look at the Triangle Count and Average Clustering Coefficient algorithm.


Find the patterns in your connected data
Learn about the power of graph algorithms in the O’Reilly book,
Graph Algorithms: Practical Examples in Apache Spark and Neo4j by the authors of this article. Click below to get your free ebook copy.


Get the O’Reilly Ebook