Centrality Algorithms
In this guide, we will learn about centrality graph algorithms.
Please have Neo4j (version 4.0 or later) and the Graph Data Science Library downloaded and installed to use centrality algorithms.
Beginner
What are centrality algorithms?
Centrality algorithms are one of the traditional categories of graph algorithms. They find the important nodes in a graph, where importance can mean that a node:
-
has a lot of direct connections
-
is transitively connected to other important nodes
-
can reach other nodes with few hops
-
sits on the shortest path of lots of pairs of nodes.
The Neo4j Graph Data Science Library supports many different centrality algorithms.
Degree Centrality
The Degree Centrality algorithm counts the number of incoming and outgoing relationships from a node. It is used to find popular nodes in a graph, and has the following use cases:
-
Degree centrality is an important component of any attempt to determine the most important people on a social network. For example, in BrandWatch’s most influential men and women on Twitter 2017 the top 5 people in each category have over 40m followers each.
-
Two Step graph-based semi-supervised Learning for Online Auction Fraud Detection describes how weighted degree centrality has been used to help separate fraudsters from legitimate users of an online auction. The centrality for fraudsters is significantly higher because they collude with each other to artificially increase prices.
Closeness Centrality
The Closeness Centrality algorithm is a way of detecting nodes that are able to spread information efficiently through a subgraph. It measures the average farness (inverse distance) from a node to all other nodes. Nodes with a high closeness score have, on average, the shortest distances to all other nodes. Closeness Centrality has the following use cases:
-
Closeness centrality is used to research organizational networks, where individuals with high closeness centrality are in a favourable position to control and acquire vital information and resources within the organization. One such study is "Mapping Networks of Terrorist Cells" by Valdis E. Krebs.
-
Closeness centrality can also used in networks where information spreads through all shortest paths simultaneously, such as infection spreading through a social network. Find more details in "Centrality and network flow" by Stephen P. Borgatti.
-
Closeness centrality has been used to estimate the importance of words in a document, based on a graph-based keyphrase extraction process. This process is described by Florian Boudin in "A Comparison of Centrality Measures for Graph-Based Keyphrase Extraction".
Harmonic Centrality
The Harmonic Centrality algorithm is a variant of Closeness Centrality that gives a more accurate measure of closeness when run on graphs where not all nodes are reachable from each other.
Betweenness Centrality
Betweenness Centrality is a way of detecting the amount of influence a node has over the flow of information or resources in a graph. It can be used to find nodes that serve as bridges from one part of a graph to another. Betweenness Centrality has the following use cases:
-
Betweenness centrality is used to measure the network flow in package delivery processes or telecommunications networks. These networks are characterized by traffic that has a known target and takes the shortest path possible. This, and other scenarios, are described in "Centrality and network flow".
-
Betweenness centrality is used to identify influencers in legitimate, or criminal, organizations. Influencers in organizations are not necessarily in management positions, but instead can be found in brokerage positions of the organizational network. The removal of these influencers could destabilize the organization. More detail can be found in "Brokerage qualifications in ringing operations".
-
Betweenness centrality can be used to help microbloggers spread their reach on Twitter, with a recommendation engine that targets influencers that they should interact with in future. This approach is described in "Making Recommendations in a Microblog to Improve the Impact of a Focal User".
Eigenvector Centrality
The Eigenvector Centrality algorithm measures the transitive (or directional) influence of nodes. Relationships to high-scoring nodes contribute more to the score of a node than connections to low-scoring nodes. A high score means that a node is connected to other nodes that have high scores.
PageRank
PageRank measures the transitive (or directional) influence of nodes and is a variant of the Eigenvector Centrality algorithm. Eigenvector Centrality can be used on undirected graphs, whereas the PageRank algorithm is more suited to directed graphs. It has the following use cases:
-
Personalized PageRank is used by Twitter to present users with recommendations of other accounts that they may wish to follow. The algorithm is run over a graph which contains shared interests and common connections. Their approach is described in more detail in "WTF: The Who to Follow Service at Twitter".
-
PageRank has been used to rank public spaces or streets, predicting traffic flow and human movement in these areas. The algorithm is run over a graph which contains intersections connected by roads, where the PageRank score reflects the tendency of people to park, or end their journey, on each street. This is described in more detail in "Self-organized Natural Roads for Predicting Traffic Flow: A Sensitivity Study".
-
PageRank can be used in an anomaly or fraud detection system in the healthcare and insurance industries. It helps find doctors or providers that are behaving in an unusual manner, and the score can be fed into a machine learning algorithm.
ArticleRank
ArticleRank measures the transitive (or directional) influence of nodes and is a variant of the PageRank algorithm. ArticleRank weakens Page Rank’s assumption that relationships from nodes that have a low out-degree are more important than relationships from nodes with a higher out-degree.
Was this page helpful?