Release Date: 2 June 2022
GDS 2.1.0-preview is compatible with Neo4j 4.3 and 4.4, but not with Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5-compatible release, please see GDS 1.1.7. For a 4.0-compatible release, please see GDS 1.6.5. For a 4.1- or 4.2-compatible release, please see GDS 1.8.7.

Breaking Changes

  • Removed the redundant information of parameter space and split config from the info of the models trained by gds.beta.pipeline.[nodeClassification|linkPrediction].train. The information is now accessible only via the Pipeline Catalog.
  • Removed the label parameter from gds.graph.removeNodeProperties.
  • Supported config parameters are timeoutInSeconds and concurrency.

New Features

  • (Enterprise Only) Apache Arrow and Flight RPC can now be used to improve certain import and export tasks.
  • New Algorithm: K-Means Clustering. Added the following procedures:
    • gds.alpha.kmeans.mutate
    • gds.alpha.kmeans.stats
    • gds.alpha.kmeans.stream
  • New Algorithm: Leiden. Added the following procedures:
    • gds.alpha.leiden.mutate
    • gds.alpha.leiden.stats
    • gds.alpha.leiden.stream
  • Added new similarity variant Filtered Node Similarity to alpha tier, accepting source and target node filters:
    • gds.alpha.nodeSimilarity.filtered.mutate
    • gds.alpha.nodeSimilarity.filtered.stream
    • gds.alpha.nodeSimilarity.filtered.write
  • Added new similarity variant Filtered KNN to alpha tier, accepting source and target node filters:
    • gds.alpha.knn.filtered.mutate
    • gds.alpha.knn.filtered.stream
  • Added new procedures for delta stepping:
    • gds.allShortestPaths.delta.stats
    • gds.allShortestPaths.delta.stats.estimate
  • Added new procedures for BFS:
    • gds.bfs.stats
    • gds.bfs.stats.estimate
  • Added Node Regression Pipelines with the following procedures:
    • gds.alpha.pipeline.nodeRegression.create
    • gds.alpha.pipeline.nodeRegression.configureAutoTuning
    • gds.alpha.pipeline.nodeRegression.configureSplit
    • gds.alpha.pipeline.nodeRegression.addLinearRegression
    • gds.alpha.pipeline.nodeRegression.addRandomForest
    • gds.alpha.pipeline.nodeRegression.addNodeProperty
    • gds.alpha.pipeline.nodeRegression.selectFeatures
    • gds.alpha.pipeline.nodeRegression.train
    • gds.alpha.pipeline.nodeRegression.predict.stream
    • gds.alpha.pipeline.nodeRegression.predict.mutate
  • Autotuning Support for Machine Learning Pipelines:
    • Added new procedures gds.alpha.pipeline.[nodeClassification|nodeRegression|linkPrediction].configureAutoTuning.
    • Added syntax to specify ranges for parameters in gds.alpha.pipeline.[linkPrediction|nodeClassification|nodeRegression].addRandomForest, gds.beta.pipeline.[linkPrediction|nodeClassification].addLogisticRegression, and gds.alpha.pipeline.nodeRegression.addLinearRegression.
  • Additional Machine Learning Pipeline Functionality:
    • Exposed learningRate for the LogisticRegression models, which can be added using gds.beta.pipeline.[nodeClassification|linkPrediction].addLogisticRegression.
    • Exposed minLeafSize for RandomForest models, which can be added using gds.alpha.pipeline.[nodeClassification|linkPrediction].addRandomForest.
    • Exposed criterion for RandomForestClassification models, which can be added using gds.alpha.pipeline.[nodeClassification|linkPrediction].addRandomForest. Also added support for the ENTROPY impurity criterion.
    • Updated the structure of the modelSelectionStats yield in gds.beta.pipeline.[linkPrediction|nodeClassification].train.
    • Added support for the OUT_OF_BAG_ERROR metric in gds.beta.pipeline.[linkPrediction|nodeClassification].train, which applies only to RandomForest models.
    • Exposed batchesPerIteration in gds.beta.graphSage.train to configure the number of batches considered per iteration.
  • Cypher Aggregation now accepts any INTEGER value for source and target nodes.
  • Added ShardedIdMap which adds support for external node ids ranging from 0 to Long.MAX_VALUE.
    • The id map is disabled by default and can be enabled via feature toggle USE_SHARDED_ID_MAP.
  • Added procedures for exporting graph properties to the alpha tier:
    • gds.alpha.graph.streamGraphProperty
    • gds.alpha.graph.removeGraphProperty
  • Exposed a new string config parameter jobId for graph projection and algorithm procedures, which allows for easier tracking of a job via e.g. gds.beta.listProgress.
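
As an illustration of the ENTROPY impurity criterion mentioned above for RandomForestClassification models, the following is a minimal Python sketch of how entropy compares with the more common Gini impurity on a set of class labels. It is not the GDS implementation: the function names are hypothetical, and the use of base-2 logarithms is an assumption made for readability.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy: sum(p_k * log2(1 / p_k)). Base 2 is an assumption."""
    n = len(labels)
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())


pure = ["a", "a", "a", "a"]    # a single class: no impurity
mixed = ["a", "a", "b", "b"]   # an even two-class split: maximal impurity

print("pure: ", gini(pure), entropy(pure))
print("mixed:", gini(mixed), entropy(mixed))
```

Both measures are zero for a pure leaf and largest for an even split, so a tree splitter can use either one to score candidate splits; they differ only in how strongly they penalize intermediate mixtures.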

Bug fixes

  • Fixed a bug in gds.beta.pipeline.[nodeClassification|linkPrediction].addNodeProperty where gds.beta.graphSage.mutate could not be added.
  • Fixed a bug where the procedures gds.beta.pipeline.linkPrediction.predict.[mutate|stream] threw an error when given the argument initialSampler.
  • Fixed a bug with running Triangle Count on filtered graphs that could cause an ArrayIndexOutOfBoundsException.
  • Fixed a bug where graphSage.train incorrectly reported didConverge as false.
  • Fixed a bug in CollapsePath where a provided nodeFilter would be ignored.
  • Fixed a bug in gds.louvain.stream when the consecutiveIds parameter was enabled.
  • Fixed a bug in RandomWalk where not consuming all stream results could lead to a state where GDS would become unable to run further procedures.

Improvements

  • When the memory guard fails a query, the details are now logged as well as included in the exception raised to the user.
  • Machine learning pipelines:
    • gds.beta.pipeline.[nodeClassification|linkPrediction].train.estimate now takes the memory usage of random forest training into account when applicable.
    • gds.beta.pipeline.[nodeClassification|linkPrediction].predict.[mutate|stream|write].estimate now take random forest prediction memory overhead into account.
    • Improved early validation of graph and prediction pipeline in gds.beta.pipeline.[nodeClassification|linkPrediction].predict.
    • Improved memory estimation for gds.beta.pipeline.[nodeClassification|linkPrediction].train.estimate.
    • Improved memory estimation in gds.beta.pipeline.linkPrediction.train.estimate.
    • Added training-method-specific debug-level logging during the model selection phase of gds.beta.pipeline.linkPrediction.train, gds.beta.pipeline.nodeClassification.train, and gds.alpha.pipeline.nodeRegression.train.
    • Improved logging in Link Prediction and Node Classification training.
    • Reduced computational complexity and constant overhead of random forest training, added via gds.alpha.pipeline.[linkPrediction|nodeClassification].addRandomForest. It now runs up to 80% faster.
    • Improved runtime of gds.beta.pipeline.[nodeClassification|linkPrediction].train if the model candidate is of type LogisticRegression. Training may be up to 40% faster.
  • GraphSAGE:
    • Improved progress logging for GraphSAGE.
    • Improved the modelInfo of models created by gds.beta.graphSage.train to include the loss per iteration and ranIterations per epoch.
    • Use the average loss per node in gds.beta.graphSage.train. This removes the implicit dependency between the tolerance and batchSize parameters.
    • Embedding generation using gds.beta.graphSage.[stream|write|mutate] now validates that either both the input graph and the model graph include relationshipWeightProperty, or neither does. Previously, if the model was trained on an unweighted graph, the relationship weight on the input graph was silently ignored (or vice versa).
    • Changed the gradient computation in gds.beta.graphSage.train. Instead of averaging the gradient over all batches, batchSamplingRatio now sets the number of batches to consider. With the default settings, this improves the runtime by up to 90%.
  • Exposed training details by returning and logging lossPerIteration in gds.beta.node2vec.
  • Graph Projections:
    • Added support for query parameters in gds.beta.graph.project.subgraph by passing a parameters Cypher map as part of the procedure configuration.
    • Improved the error message for gds.beta.graph.project.subgraph when comparing expressions with incompatible types where one of them is a literal expression.
    • Improved memory usage while projecting a graph that has multiple relationship properties for the same relationship type.
    • It is now possible to specify relationshipTypes: [] in order to project a graph with no relationships.
  • Graph Export:
    • Changed gds.graph.export to export internal node identifiers instead of original ids. This avoids fragmentation of the newly created store.
    • Added progress tracking for gds.graph.export.
  • Added concurrency configuration parameter to gds.alpha.backup and gds.alpha.restore.
  • Added query support for mutated properties for Cypher on GDS.
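
The GraphSAGE note above says that using the average loss per node removes the implicit dependency between tolerance and batchSize. A rough Python sketch of why that might be the case follows. The convergence check shown here (stop when the loss changes by less than the tolerance) is an assumption made for illustration, not a description of the GDS internals, and all names and numbers are hypothetical.

```python
def converged(prev_loss, curr_loss, tolerance):
    """Hypothetical check: converged when the loss change is below tolerance."""
    return abs(prev_loss - curr_loss) < tolerance


def batch_loss(per_node_losses, average):
    """Aggregate per-node losses as either a sum or an average."""
    total = sum(per_node_losses)
    return total / len(per_node_losses) if average else total


tolerance = 0.01
results = {}

# The per-node improvement (0.0005) is identical for both batch sizes.
for batch_size in (10, 100):
    prev = [0.5000] * batch_size
    curr = [0.4995] * batch_size
    summed = converged(batch_loss(prev, False), batch_loss(curr, False), tolerance)
    averaged = converged(batch_loss(prev, True), batch_loss(curr, True), tolerance)
    results[batch_size] = (summed, averaged)
    print(f"batchSize={batch_size}: summed converged={summed}, averaged converged={averaged}")
```

With a summed loss, the same per-node improvement looks ten times larger at ten times the batch size, so a fixed tolerance yields different convergence decisions. With an averaged loss, the decision is the same regardless of batch size.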