Approximate Maximum k-cut

Glossary

Directed: Directed trait. The algorithm is well-defined on a directed graph.
Directed: Directed trait. The algorithm ignores the direction of the graph.
Directed: Directed trait. The algorithm does not run on a directed graph.
Undirected: Undirected trait. The algorithm is well-defined on an undirected graph.
Undirected: Undirected trait. The algorithm ignores the undirectedness of the graph.
Heterogeneous nodes: Heterogeneous nodes fully supported. The algorithm has the ability to distinguish between nodes of different types.
Heterogeneous nodes: Heterogeneous nodes allowed. The algorithm treats all selected nodes similarly regardless of their label.
Heterogeneous relationships: Heterogeneous relationships fully supported. The algorithm has the ability to distinguish between relationships of different types.
Heterogeneous relationships: Heterogeneous relationships allowed. The algorithm treats all selected relationships similarly regardless of their type.
Weighted relationships: Weighted trait. The algorithm supports a relationship property to be used as weight, specified via the relationshipWeightProperty configuration parameter.
Weighted relationships: Weighted trait. The algorithm treats each relationship as equally important, discarding the value of any relationship weight.

Introduction

A k-cut of a graph is an assignment of its nodes into k disjoint communities. So for example a 2-cut of a graph with nodes a,b,c,d could be the communities {a,b,c} and {d}.

A Maximum k-cut is a k-cut such that the total weight of relationships between nodes from different communities in the k-cut is maximized. That is, a k-cut that maximizes the sum of weights of relationships whose source and target nodes are assigned to different communities in the k-cut. Suppose in the simple a,b,c,d node set example above we only had one relationship b → c, and it was of weight 1.0. The 2-cut we outlined above would then not be a maximum 2-cut (with a cut cost of 0.0), whereas for example the 2-cut with communities {a,b} and {c,d} would be one (with a cut cost of 1.0).

Maximum k-cut is the same as Maximum Cut when k = 2.

This feature is in the alpha tier. For more information on feature tiers, see API Tiers.

Applications

Finding the maximum k-cut for a graph has several known applications, for example it is used to:

analyze protein interaction
design circuit (VLSI) layouts
solve wireless communication problems
analyze cryptocurrency transaction patterns
design computer networks

Approximation

In practice, finding the best cut is not feasible for larger graphs and only an approximation can be computed in reasonable time.

The approximate heuristic algorithm implemented in GDS is a parallelized GRASP style algorithm optionally enhanced (via config) with variable neighborhood search (VNS).

For detailed information about a serial version of the algorithm, with a slightly different construction phase, when k = 2 see GRASP+VNR in the paper:

Festa et al. Randomized Heuristics for the Max-Cut Problem, 2002.

To see how the algorithm above performs in terms of solution quality compared to other algorithms when k = 2 see FES02GV in the paper:

Dunning et al. What Works Best When? A Systematic Evaluation of Heuristics for Max-Cut and QUBO, 2018.

By the stochastic nature of the algorithm, the results it yields will not be deterministic unless running single-threaded (concurrency = 1) and using the same random seed (randomSeed = SOME_FIXED_VALUE).

Tuning the algorithm parameters

There are two important algorithm specific parameters which lets you trade solution quality for shorter runtime.

Iterations

GRASP style algorithms are iterative by nature. Every iteration they run the same well-defined steps to derive a solution, but each time with a different random seed yielding solutions that (highly likely) are different too. In the end the highest scoring solution is picked as the winner.

VNS max neighborhood order

Variable neighborhood search (VNS) works by slightly perturbing a locally optimal solution derived from the previous steps in an iteration of the algorithm, followed by locally optimizing this perturbed solution. Perturb in this case means to randomly move some nodes from their current (locally optimal) community to another community.

VNS will in turn move 1,2,…,vnsMaxNeighborhoodOrder random nodes and using each of the resulting solutions try to find a new locally optimal solution that’s better. This means that although potentially better solutions can be derived using VNS it will take more time, and additionally some more memory will be needed to temporarily store the perturbed solutions.

By default, VNS is not used (vnsMaxNeighborhoodOrder = 0). To use it, experimenting with a maximum order equal to 20 is a good place to start.

Syntax

This section covers the syntax used to execute the Approximate Maximum k-cut algorithm in each of its execution modes. We are describing the named graph variant of the syntax. To learn more about general syntax variants, see Syntax overview.

Example 1. Approximate Maximum k-cut syntax per mode

Run Approximate Maximum k-cut in stream mode on a named graph.

CALL gds.maxkcut.stream(
  graphName: String,
  configuration: Map
) YIELD
  nodeId: Integer,
  communityId: Integer

Table 1. Parameters
Name	Type	Default	Optional	Description
graphName	String	`n/a`	no	The name of a graph stored in the catalog.
configuration	Map	`{}`	yes	Configuration for algorithm-specifics and/or graph filtering.

Table 2. Configuration
Name	Type	Default	Optional	Description
nodeLabels	List of String	`['*']`	yes	Filter the named graph using the given node labels. Nodes with any of the given labels will be included.
relationshipTypes	List of String	`['*']`	yes	Filter the named graph using the given relationship types. Relationships with any of the given types will be included.
concurrency	Integer	`4`	yes	The number of concurrent threads used for running the algorithm.
jobId	String	`Generated internally`	yes	An ID that can be provided to more easily track the algorithm’s progress.
logProgress	Boolean	`true`	yes	If disabled the progress percentage will not be logged.
k	Integer	`2`	yes	The number of disjoint communities the nodes will be divided into.
iterations	Integer	`8`	yes	The number of iterations the algorithm will run before returning the best solution among all the iterations.
vnsMaxNeighborhoodOrder	Integer	`0 (VNS off)`	yes	The maximum number of nodes VNS will swap when perturbing solutions.
randomSeed	Integer	`n/a`	yes	A random seed which is used for all randomness in the computation. Requires `concurrency = 1`.
relationshipWeightProperty	String	`null`	yes	If set, the values stored at the given property are used as relationship weights during the computation. If not set, the graph is considered unweighted.
minCommunitySize	Integer	`0`	yes	Only nodes inside communities larger or equal the given value are returned.

Table 3. Results
Name	Type	Description
nodeId	Integer	Node ID.
communityId	Integer	Community ID.

Run Approximate Maximum k-cut in mutate mode on a named graph.

CALL gds.maxkcut.mutate(
  graphName: String,
  configuration: Map
) YIELD
  cutCost: Float,
  preProcessingMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  mutateMillis: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map

Table 4. Parameters
Name	Type	Default	Optional	Description
graphName	String	`n/a`	no	The name of a graph stored in the catalog.
configuration	Map	`{}`	yes	Configuration for algorithm-specifics and/or graph filtering.

Table 5. Configuration
Name	Type	Default	Optional	Description
mutateProperty	String	`n/a`	no	The node property in the GDS graph to which the approximate maximum k-cut is written.
nodeLabels	List of String	`['*']`	yes	Filter the named graph using the given node labels.
relationshipTypes	List of String	`['*']`	yes	Filter the named graph using the given relationship types.
concurrency	Integer	`4`	yes	The number of concurrent threads used for running the algorithm.
jobId	String	`Generated internally`	yes	An ID that can be provided to more easily track the algorithm’s progress.
k	Integer	`2`	yes	The number of disjoint communities the nodes will be divided into.
iterations	Integer	`8`	yes	The number of iterations the algorithm will run before returning the best solution among all the iterations.
vnsMaxNeighborhoodOrder	Integer	`0 (VNS off)`	yes	The maximum number of nodes VNS will swap when perturbing solutions.
randomSeed	Integer	`n/a`	yes	A random seed which is used for all randomness in the computation. Requires `concurrency = 1`.
relationshipWeightProperty	String	`null`	yes	If set, the values stored at the given property are used as relationship weights during the computation. If not set, the graph is considered unweighted.

Table 6. Results
Name	Type	Description
cutCost	Float	Sum of weights of all relationships connecting nodes from different communities.
preProcessingMillis	Integer	Milliseconds for preprocessing the data.
computeMillis	Integer	Milliseconds for running the algorithm.
postProcessingMillis	Integer	Milliseconds for computing the statistics.
mutateMillis	Integer	Milliseconds for adding properties to the projected graph.
nodePropertiesWritten	Integer	Number of properties added to the projected graph.
configuration	Map	Configuration used for running the algorithm.

Examples

All the examples below should be run in an empty database.

The examples use Cypher projections as the norm. Native projections will be deprecated in a future release.

In this section we will show examples of running the Approximate Maximum k-cut algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide in how to make use of the algorithm in a real setting. We will do this on a small Bitcoin transactions graph of a handful nodes connected in a particular pattern. The example graph looks like this:

The following Cypher statement will create the example graph in the Neo4j database:

CREATE
  (alice:Person {name: 'Alice'}),
  (bridget:Person {name: 'Bridget'}),
  (charles:Person {name: 'Charles'}),
  (doug:Person {name: 'Doug'}),
  (eric:Person {name: 'Eric'}),
  (fiona:Person {name: 'Fiona'}),
  (george:Person {name: 'George'}),
  (alice)-[:TRANSACTION {value: 81.0}]->(bridget),
  (alice)-[:TRANSACTION {value: 7.0}]->(doug),
  (bridget)-[:TRANSACTION {value: 1.0}]->(doug),
  (bridget)-[:TRANSACTION {value: 1.0}]->(eric),
  (bridget)-[:TRANSACTION {value: 1.0}]->(fiona),
  (bridget)-[:TRANSACTION {value: 1.0}]->(george),
  (charles)-[:TRANSACTION {value: 45.0}]->(bridget),
  (charles)-[:TRANSACTION {value: 3.0}]->(eric),
  (doug)-[:TRANSACTION {value: 3.0}]->(charles),
  (doug)-[:TRANSACTION {value: 1.0}]->(bridget),
  (eric)-[:TRANSACTION {value: 1.0}]->(bridget),
  (fiona)-[:TRANSACTION {value: 3.0}]->(alice),
  (fiona)-[:TRANSACTION {value: 1.0}]->(bridget),
  (george)-[:TRANSACTION {value: 1.0}]->(bridget),
  (george)-[:TRANSACTION {value: 4.0}]->(charles)

With the graph in Neo4j we can now project it into the graph catalog to prepare it for algorithm execution. We do this using a Cypher projection targeting the Person nodes and the TRANSACTION relationships.

The following statement will project a graph store it in the graph catalog under the name 'myGraph'.

MATCH (source:Person)-[r:TRANSACTION]->(target:Person)
RETURN gds.graph.project(
  'myGraph',
  source,
  target,
  { relationshipProperties: r { .value } }
)

Memory Estimation

First off, we will estimate the cost of running the algorithm using the estimate procedure. This can be done with any execution mode. We will use the mutate mode in this example. Estimating the algorithm is useful to understand the memory impact that running the algorithm on your graph will have. When you later actually run the algorithm in one of the execution modes the system will perform an estimation. If the estimation shows that there is a very high probability of the execution going over its memory limitations, the execution is prohibited. To read more about this, see Automatic estimation and execution blocking.

For more details on estimate in general, see Memory Estimation.

The following will estimate the memory requirements for running the algorithm:

CALL gds.maxkcut.mutate.estimate('myGraph', {mutateProperty: 'community'})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory

Table 7. Results
nodeCount	relationshipCount	bytesMin	bytesMax	requiredMemory
7	15	488	488	"488 Bytes"

Mutate

The mutate execution mode extends the stats mode with an important side effect: updating the named graph with a new node property containing the approximate maximum k-cut for that node. The name of the new property is specified using the mandatory configuration parameter mutateProperty. The result is a single summary row, similar to stats, but with some additional metrics. The mutate mode is especially useful when multiple algorithms are used in conjunction.

For more details on the mutate mode in general, see Mutate.

The following will run the algorithm in mutate mode:

CALL gds.maxkcut.mutate('myGraph', {mutateProperty: 'community'})
YIELD cutCost, nodePropertiesWritten

Table 8. Results
cutCost	nodePropertiesWritten
13.0	7

We can see that when relationship weight is not taken into account we derive a cut into two (since we didn’t override the default k = 2) communities of cost 13.0. The total cost is represented by the cutCost column here. This is the value we want to be as high as possible. Additionally, the graph 'myGraph' now has a node property community which stores the community to which each node belongs.

To inspect which community each node belongs to we can stream node properties.

Stream node properties:

CALL gds.graph.nodeProperty.stream('myGraph', 'community')
YIELD nodeId, propertyValue
RETURN gds.util.asNode(nodeId).name as name, propertyValue AS community

Table 9. Results
name	community
"Alice"	0
"Bridget"	0
"Charles"	0
"Doug"	1
"Eric"	1
"Fiona"	1
"George"	1

Looking at our graph topology we can see that there are no relationships between the nodes of community 1, and two relationships between the nodes of community 0, namely Alice → Bridget and Charles → Bridget. However, since there are a total of eight relationships between Bridget and nodes of community 1, and our graph is unweighted assigning Bridget to community 1 would not yield a cut of a higher total weight. Thus, since the number of relationships connecting nodes of different communities greatly outnumber the number of relationships connecting nodes of the same community it seems like a good solution. In fact, this is the maximum 2-cut for this graph.

Because of the inherent randomness in the Approximate Maximum k-Cut algorithm (unless having concurrency = 1 and fixed randomSeed), running it another time might yield a different solution. For our case here it would be equally plausible to get the inverse solution, i.e. when our community 0 nodes are mapped to community 1 instead, and vice versa. Note however, that for that solution the cut cost would remain the same.

Mutate with relationship weights

In this example we will have a look at how adding relationship weight can affect our solution.

The following will run the algorithm in mutate mode, diving our nodes into two communities once again:

CALL gds.maxkcut.mutate(
   'myGraph',
   {
        relationshipWeightProperty: 'value',
        mutateProperty: 'weightedCommunity'
    }
)
YIELD cutCost, nodePropertiesWritten

Table 10. Results
cutCost	nodePropertiesWritten
146.0	7

Since the value properties on our TRANSACTION relationships were all at least 1.0 and several of a larger value it’s not surprising that we obtain a cut with a larger cost in the weighted case.

Let us now stream node properties to once again inspect the node community distribution.

Stream node properties:

CALL gds.graph.nodeProperty.stream('myGraph', 'weightedCommunity')
YIELD nodeId, propertyValue
RETURN gds.util.asNode(nodeId).name as name, propertyValue AS weightedCommunity

Table 11. Results
name	weightedCommunity
"Alice"	0
"Bridget"	1
"Charles"	0
"Doug"	1
"Eric"	1
"Fiona"	1
"George"	1

Comparing this result with that of unweighted case we can see that Bridget has moved to another community but the output is otherwise the same. Indeed, this makes sense by looking at our graph. Bridget is connected to nodes of community 1 by eight relationships, but these relationships all have weight 1.0. And although Bridget is only connected to two community 0 nodes, these relationships are of weight 81.0 and 45.0. Moving Bridget back to community 0 would lower the total cut cost of 81.0 + 45.0 - 8 * 1.0 = 118.0. Hence, it does make sense that Bridget is now in community 1. In fact, this is the maximum 2-cut in the weighted case.

Stream

In the stream execution mode, the algorithm returns the approximate maximum k-cut for each node. This allows us to inspect the results directly or post-process them in Cypher without any side effects.

For more details on the stream mode in general, see Stream.

The following will run the algorithm in stream mode using default configuration parameters:

CALL gds.maxkcut.stream('myGraph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId

Table 12. Results
name	communityId
"Alice"	0
"Bridget"	0
"Charles"	0
"Doug"	1
"Eric"	1
"Fiona"	1
"George"	1

We can see that the result is what we expect, namely the same as in the mutate unweighted example.