Graph Algorithms in Neo4j: Triangle Count & Clustering Coefficient

Graph algorithms provide the means to understand, model and predict complicated dynamics such as the flow of resources or information, the pathways through which contagions or network failures spread, and the influences on and resiliency of groups.

This blog series is designed to help you better utilize graph analytics and graph algorithms so you can effectively innovate and develop intelligent solutions faster using a graph database like Neo4j.

Last week we once again looked at Community Detection algorithms, with a focus on the Louvain Modularity algorithm.

Learn more about Triangle Count and Clustering Coefficient graph algorithms in Neo4j.

This week we wrap up our exploration of Community Detection algorithms, with a look at the Triangle Count and Average Clustering Coefficient algorithm, which measures how many nodes have triangles and the degree to which nodes tend to cluster together. The average clustering coefficient is 1 when there is a clique, and 0 when there are no connections.

About Triangle Count and Average Clustering Coefficient

Triangle Count is a community detection graph algorithm that is used to determine the number of triangles passing through each node in the graph. A triangle is a set of three nodes, where each node has a relationship to all other nodes.

Triangle counting gained popularity in social network analysis, where it is used to detect communities and measure the cohesiveness of those communities. It is also used to determine the stability of a graph and is often used as part of the computation of network indices, such as the clustering coefficient.

There are two types of clustering coefficients:

Local clustering coefficient

The local clustering coefficient of a node is the likelihood that its neighbors are also connected. The computation of this score involves triangle counting.

Global clustering coefficient

The global clustering coefficient is the normalized sum of those local clustering coefficients. The transitivity coefficient of a graph is sometimes used, which is three times the number of triangles divided by the number of triples in the graph. For more information, see Finding, Counting and Listing all Triangles in Large Graphs, An Experimental Study.”

When Should I Use Triangle Count and Clustering Coefficient?

Triangle Count and Clustering Coefficient have been shown to be useful as features for classifying a given website as spam or non-spam content. This is described in Efficient Semi-streaming Algorithms for Local Triangle Counting in Massive Graphs.”
Clustering Coefficient has been used to investigate the community structure of Facebook’s social graph, where they found dense neighborhoods of users in an otherwise sparse global graph. Find this study in The Anatomy of the Facebook Social Graph.”
Clustering Coefficient has been proposed to help explore the thematic structure of the Web and detect communities of pages with a common topic based on the reciprocal links between them. For more information, see Curvature of co-links uncovers hidden thematic layers in the World Wide Web.“

Triangles Example

Let’s see how the Triangle Count and Clustering Coefficient algorithm works on a small dataset. The following Cypher statement creates a graph with people and KNOWS relationships between them.

MERGE (alice:Person{id:"Alice"})
MERGE (michael:Person{id:"Michael"})
MERGE (karin:Person{id:"Karin"})
MERGE (chris:Person{id:"Chris"})
MERGE (will:Person{id:"Will"})
MERGE (mark:Person{id:"Mark"})
MERGE (michael)-[:KNOWS]->(karin)
MERGE (michael)-[:KNOWS]->(chris)
MERGE (will)-[:KNOWS]->(michael)
MERGE (mark)-[:KNOWS]->(michael)
MERGE (mark)-[:KNOWS]->(will)
MERGE (alice)-[:KNOWS]->(michael)
MERGE (will)-[:KNOWS]->(chris)
MERGE (chris)-[:KNOWS]->(karin);

The following query finds all the KNOWS triangles between people in our graph.

CALL algo.triangle.stream("Person","KNOWS")
YIELD nodeA,nodeB,nodeC
MATCH (a:Person) WHERE id(a) = nodeA
MATCH (b:Person) WHERE id(b) = nodeB
MATCH (c:Person) WHERE id(c) = nodeC
RETURN a.id AS nodeA, b.id AS nodeB, c.id AS node

We can see that there are KNOWS triangles containing “Will, Michael and Chris”, “Will, Mark and Michael”, and “Michael, Karin and Chris.” This means that everybody in the triangle knows each other. We work out the clustering coefficient of each person by running the following algorithm.

CALL algo.triangleCount.stream('Person', 'KNOWS')
YIELD nodeId, triangles, coefficient
MATCH (p:Person) WHERE id(p) = nodeId
RETURN p.id AS name, triangles, coefficient
ORDER BY coefficient DESC

We learn that Michael is part of the most triangles, but it’s Karin and Mark who are the best at introducing their friends – all of the people who know them, know each other!

Conclusion

As we’ve seen, the Average Clustering Coefficient is often used to estimate whether a network might exhibit “small-world” behaviors that are based on tightly knit clusters.

Next week we’ll look at graph algorithms in practice, where we’ll learn how to apply them in data-intensive applications using data from Yelp’s annual dataset challenge.

Find the patterns in your connected data
Learn about the power of graph algorithms in the O’Reilly book,
Graph Algorithms: Practical Examples in Apache Spark and Neo4j by the authors of this article. Click below to get your free ebook copy.

Get the O’Reilly Ebook