This blog series is designed to help you better utilize graph analytics and graph algorithms so you can effectively innovate and develop intelligent solutions faster using a graph database like Neo4j.
Last week we once again looked at Community Detection algorithms, with a focus on the Louvain Modularity algorithm.
This week we wrap up our exploration of Community Detection algorithms, with a look at the Triangle Count and Average Clustering Coefficient algorithm, which measures how many nodes have triangles and the degree to which nodes tend to cluster together. The average clustering coefficient is 1 when there is a clique, and 0 when there are no connections.
About Triangle Count and Average Clustering Coefficient
Triangle Count is a community detection graph algorithm that is used to determine the number of triangles passing through each node in the graph. A triangle is a set of three nodes, where each node has a relationship to all other nodes.
Triangle counting gained popularity in social network analysis, where it is used to detect communities and measure the cohesiveness of those communities. It is also used to determine the stability of a graph and is often used as part of the computation of network indices, such as the clustering coefficient.
There are two types of clustering coefficients:
Local clustering coefficient
The local clustering coefficient of a node is the likelihood that its neighbors are also connected. The computation of this score involves triangle counting.Global clustering coefficient
The global clustering coefficient is the normalized sum of those local clustering coefficients. The transitivity coefficient of a graph is sometimes used, which is three times the number of triangles divided by the number of triples in the graph. For more information, see Finding, Counting and Listing all Triangles in Large Graphs, An Experimental Study.”When Should I Use Triangle Count and Clustering Coefficient?
- Triangle Count and Clustering Coefficient have been shown to be useful as features for classifying a given website as spam or non-spam content. This is described in Efficient Semi-streaming Algorithms for Local Triangle Counting in Massive Graphs.”
- Clustering Coefficient has been used to investigate the community structure of Facebook’s social graph, where they found dense neighborhoods of users in an otherwise sparse global graph. Find this study in The Anatomy of the Facebook Social Graph.”
- Clustering Coefficient has been proposed to help explore the thematic structure of the Web and detect communities of pages with a common topic based on the reciprocal links between them. For more information, see Curvature of co-links uncovers hidden thematic layers in the World Wide Web.“
Triangles Example
Let’s see how the Triangle Count and Clustering Coefficient algorithm works on a small dataset. The following Cypher statement creates a graph with people and
KNOWS
relationships between them.MERGE (alice:Person{id:"Alice"}) MERGE (michael:Person{id:"Michael"}) MERGE (karin:Person{id:"Karin"}) MERGE (chris:Person{id:"Chris"}) MERGE (will:Person{id:"Will"}) MERGE (mark:Person{id:"Mark"}) MERGE (michael)-[:KNOWS]->(karin) MERGE (michael)-[:KNOWS]->(chris) MERGE (will)-[:KNOWS]->(michael) MERGE (mark)-[:KNOWS]->(michael) MERGE (mark)-[:KNOWS]->(will) MERGE (alice)-[:KNOWS]->(michael) MERGE (will)-[:KNOWS]->(chris) MERGE (chris)-[:KNOWS]->(karin);
The following query finds all the
KNOWS
triangles between people in our graph.CALL algo.triangle.stream("Person","KNOWS") YIELD nodeA,nodeB,nodeC MATCH (a:Person) WHERE id(a) = nodeA MATCH (b:Person) WHERE id(b) = nodeB MATCH (c:Person) WHERE id(c) = nodeC RETURN a.id AS nodeA, b.id AS nodeB, c.id AS node
We can see that there are
KNOWS
triangles containing “Will, Michael and Chris”, “Will, Mark and Michael”, and “Michael, Karin and Chris.” This means that everybody in the triangle knows each other. We work out the clustering coefficient of each person by running the following algorithm.CALL algo.triangleCount.stream('Person', 'KNOWS') YIELD nodeId, triangles, coefficient MATCH (p:Person) WHERE id(p) = nodeId RETURN p.id AS name, triangles, coefficient ORDER BY coefficient DESC
We learn that Michael is part of the most triangles, but it’s Karin and Mark who are the best at introducing their friends – all of the people who know them, know each other!
Conclusion
As we’ve seen, the Average Clustering Coefficient is often used to estimate whether a network might exhibit “small-world” behaviors that are based on tightly knit clusters.
Next week we’ll look at graph algorithms in practice, where we’ll learn how to apply them in data-intensive applications using data from Yelp’s annual dataset challenge.
Learn about the power of graph algorithms in the O’Reilly book,
Graph Algorithms: Practical Examples in Apache Spark and Neo4j by the authors of this article. Click below to get your free ebook copy.
Get the O’Reilly Ebook