This blog series is designed to help you better utilize graph analytics and graph algorithms so you can effectively innovate and develop intelligent solutions faster using a graph database.
Last week we continued our look at pathfinding and graph search algorithms, with the All Pairs Shortest Path algorithm, which calculates a shortest path forest (group) containing all shortest paths between all nodes in the graph.
This week we finish our look at pathfinding and graph search algorithms, with a focus on the Minimum Weight Spanning Tree algorithm, which calculates the paths along a connected tree structure with the smallest value (weight of the relationship such as cost, time or capacity) associated with visiting all nodes in the tree. We’ll show an example using Neo4j and provide links to other examples and use cases.
About Minimum Weight Spanning Tree
The Minimum Weight Spanning Tree starts from a given node, and finds all its reachable nodes and the set of relationships that connect the nodes together with the minimum possible weight. Prim’s algorithm is one of the simplest and best-known Minimum Weight Spanning Tree algorithms. The K-Means variant of this algorithm can be used to detect clusters in the graph.
The first known algorithm for finding a Minimum Weight Spanning Tree was developed by the Czech scientist Otakar Borůvka in 1926 while trying to design an efficient electricity network for Moravia. Prim’s algorithm was invented by Jarnik in 1930 and rediscovered by Prim in 1957. It is similar to Dijkstra’s Shortest Path algorithm, but rather than minimizing the total length of a path ending at each relationship, it minimizes the length of each relationship individually. Unlike Dijkstra’s, Prim’s tolerates negative-weight relationships.
NOTE: The algorithm operates as follows:
- Start with a tree containing only one node (and no relationships).
- Select the minimal-weight relationship coming from that node and add it to our tree.
- Repeatedly choose a minimal-weight relationship that joins any node in the tree to one that is not in the tree, adding the new relationship and node to our tree.
- When there are no more nodes to add, the tree we have built is a minimum spanning tree.
When Should I Use Minimum Weight Spanning Tree?
- Minimum Weight Spanning Tree was applied to analyze airline and sea connections of Papua New Guinea and minimize the travel cost of exploring the country. It could be used to help design low-cost tours that visit many destinations across a country. The research mentioned is found here: “An Application of Minimum Spanning Trees to Travel Planning.”
- Minimum Weight Spanning Tree has been used to analyze and visualize correlations in a network of currencies based on the correlation between currency returns. This is described in https://www.nbs.sk/_img/Documents/_PUBLIK_NBS_FSR/Biatec/Rok2013/07-2013/05_biatec13-7_resovsky_EN.pdf” target=”_blank”>”Minimum Spanning Tree Application in the Currency Market.”
- Exhaustive clinical research has shown Minimum Weight Spanning Tree to be useful in tracing the history of infection transmission in an outbreak. For more information, see “Use of the Minimum Spanning Tree Model for Molecular Epidemiological Investigation of a Nosocomial Outbreak of Hepatitis C Virus Infection.”
TIP: The Minimum Weight Spanning Tree algorithm only gives meaningful results when run on a graph where the relationships have different weights. If the graph has no weights, or all relationships have the same weight, then any spanning tree is a minimum spanning tree.
Minimum Weight Spanning Tree Example
Let’s see the Minimum Weight Spanning Tree algorithm in action. The following Cypher statement creates a graph containing places and links between them.
MERGE (a:Place {id:"A"}) MERGE (b:Place {id:"B"}) MERGE (c:Place {id:"C"}) MERGE (d:Place {id:"D"}) MERGE (e:Place {id:"E"}) MERGE (f:Place {id:"F"}) MERGE (g:Place {id:"G"}) MERGE (d)-[:LINK {cost:4}]->(b) MERGE (d)-[:LINK {cost:6}]->(e) MERGE (b)-[:LINK {cost:1}]->(a) MERGE (b)-[:LINK {cost:3}]->(c) MERGE (a)-[:LINK {cost:2}]->(c) MERGE (c)-[:LINK {cost:5}]->(e) MERGE (f)-[:LINK {cost:1}]->(g);
We run the algorithm to find the Minimum Weight Spanning Tree starting from D by executing the following query.
MATCH (n:Place {id:"D"}) CALL algo.spanningTree.minimum("Place", "LINK", "cost", id(n), {write:true, writeProperty:"MINST"}) YIELD loadMillis, computeMillis, writeMillis, effectiveNodeCount RETURN loadMillis, computeMillis, writeMillis, effectiveNodeCount; This procedure creates MINST relationships representing the minimum spanning tree. We then run the following query to find all pairs of nodes and the associated cost of the relationships between them. MATCH path = (n:Place {id:"D"})-[:MINST*]-() WITH relationships(path) AS rels UNWIND rels AS rel WITH DISTINCT rel AS rel RETURN startNode(rel).id AS source, endNode(rel).id AS destination, rel.cost AS cost
The Minimum Weight Spanning Tree excludes the relationship with cost
6
from D
to E
, and the one with cost 3
from B
to C
. Nodes F
and G
aren’t included because they’re unreachable from D
. There are also variations of the algorithm that find the Maximum Weight Spanning Tree or K-Spanning Tree.Conclusion
As we’ve shown, Minimum Weight Spanning Tree is widely used for network designs, and also has real-time applications with rolling optimizations such as processes in a chemical refinery or driving route corrections.
Next week we’ll begin our look at Centrality algorithms, with a focus on PageRank, which measures the transitive influence or connectivity of nodes.
Learn about the power of graph algorithms in the O’Reilly book,
Graph Algorithms: Practical Examples in Apache Spark and Neo4j by the authors of this article. Click below to get your free ebook copy.
Get the O’Reilly Ebook