This blog series is designed to help you better utilize graph analytics and graph algorithms so you can effectively innovate and develop intelligent solutions faster using a graph database.
Last week we began our look at Centrality algorithms, starting with the PageRank algorithm.
This week we’ll look at Degree Centrality, which measures the number of relationships a node has. It’s broken into indegree (flowing in) and outdegree (flowing out), where relationships are directed, using examples from Neo4j
About Degree Centrality
Degree Centrality is the simplest of all the centrality algorithms. It measures the number of incoming and outgoing relationships from a node. The algorithm helps us find popular nodes in a graph.
Degree Centrality was proposed by Linton C. Freeman in his 1979 paper, “Centrality in Social Networks Conceptual Clarification.” While the algorithm is usually used to find the popularity of individual nodes, it is often used as part of a global analysis where we calculate the minimum degree, maximum degree, mean degree and standard deviation across the whole graph.
When Should I Use Degree Centrality?
- Degree Centrality is an important component of any attempt to analyze influence by looking at the number of incoming and outgoing relationships, such as connections of people on a social network. For example, in BrandWatch’s most influential men and women on Twitter 2017, the top five people in each category have over 40 million followers each.
- Weighted Degree Centrality has been used to help separate fraudsters from legitimate users of an online auction. The weighted centrality for fraudsters is significantly higher because they tend to collude with each other to artificially increase the price of items. Read more in “Two Step Graph-Based Semi-Supervised Learning for Online Auction Fraud Detection.”
Degree Centrality Example
Let’s see how Degree Centrality works on a small dataset. The following Cypher statement creates a Twitter-esque graph of users and followers.
MERGE (nAlice:User {id:"Alice"}) MERGE (nBridget:User {id:"Bridget"}) MERGE (nCharles:User {id:"Charles"}) MERGE (nDoug:User {id:"Doug"}) MERGE (nMark:User {id:"Mark"}) MERGE (nMichael:User {id:"Michael"}) MERGE (nAlice)-[:FOLLOWS]->(nDoug) MERGE (nAlice)-[:FOLLOWS]->(nBridget) MERGE (nAlice)-[:FOLLOWS]->(nCharles) MERGE (nMark)-[:FOLLOWS]->(nDoug) MERGE (nMark)-[:FOLLOWS]->(nMichael) MERGE (nBridget)-[:FOLLOWS]->(nDoug) MERGE (nCharles)-[:FOLLOWS]->(nDoug) MERGE (nMichael)-[:FOLLOWS]->(nDoug)
The following query calculates the number of people that each user follows and is followed by.
MATCH (u:User) RETURN u.id AS name, size((u)-[:FOLLOWS]->()) AS follows, size((u)<-[:FOLLOWS]-()) AS followers
Doug is the most popular user in our imaginary Twitter graph with five followers; all other users follow him but he doesn’t follow anybody back. In the real Twitter network, celebrities have high follower counts but tend to follow few people. We could therefore consider Doug a celebrity!
Conclusion
As we showed in our look at the Degree Centrality algorithm, Degree Centrality looks at immediate connectedness for uses such as evaluating the near-term risk of a person catching a virus or the probability of a person hearing a given piece of information. Next week we'll continue our look at Centrality algorithms, with a focus on the Betweenness Centrality algorithm.
Learn about the power of graph algorithms in the O'Reilly book,
Graph Algorithms: Practical Examples in Apache Spark and Neo4j by the authors of this article. Click below to get your free ebook copy.
Get the O'Reilly Ebook