Graph Data Science 1.8.0

Release Date: 1 December 2021

GDS 1.8 is compatible with Neo4j 4.1, 4.2, 4.3, and 4.4, but not with Neo4j 3.5.x. For a 3.5-compatible release, please see GDS 1.1.7. For a 4.0-compatible release, please see GDS 1.6.5.

Breaking changes

GDS now throws an error for identifiers with trailing whitespace, to avoid input errors. This affects graphName, modelName, and several property parameters such as nodeWeightProperty or seedProperty.
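As an illustration of the kind of check described above, here is a minimal Python sketch. It is not the GDS implementation: the function name validate_identifier and the error wording are hypothetical, and only the trailing-whitespace rule stated in this note is modelled.

```python
def validate_identifier(name: str, parameter: str) -> str:
    """Return `name` unchanged, or raise if it ends with whitespace.

    Hypothetical helper for illustration only; GDS performs its own
    validation inside the procedures.
    """
    # str.rstrip() removes trailing whitespace, so any difference
    # means the identifier ended with at least one whitespace character.
    if name != name.rstrip():
        raise ValueError(
            f"`{parameter}` must not end with whitespace, but got `{name}`."
        )
    return name
```

Running a check like this on the client side before calling a procedure would surface a stray space in, say, graphName early, rather than as an error from the procedure call.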
We have removed the separate concurrency parameter from the model parameter space in gds.alpha.ml.nodeClassification.train, gds.alpha.ml.linkPrediction.train and gds.alpha.ml.pipeline.linkPrediction.configureParams. The concurrency value in the configuration of the train procedure will be used.
The procedure gds.alpha.randomWalk.stream has graduated to the beta tier, as gds.beta.randomWalk.stream.
- Random Walk has been improved and aligned with the Node2Vec implementation. Please consult the documentation to find out about the new configuration options.
- gds.alpha.randomWalk.stream has been removed.
- A memory estimation procedure, gds.beta.randomWalk.estimate, has been added.
The procedure gds.beta.fastRPExtended has been merged with gds.fastRP.

New features

Link Prediction
- Added a new link prediction stream procedure, gds.alpha.ml.pipeline.linkPrediction.predict.stream.
- Added probabilityDistribution and samplingStats to the result of gds.alpha.ml.pipeline.linkPrediction.predict.mutate.
- To improve prediction performance, we’ve added a kNN-based approximate search strategy option to the link prediction procedures gds.alpha.ml.pipeline.linkPrediction.predict.stream|mutate.
- Node property steps in Link Prediction pipelines can use a relationship property.
Node Classification pipelines: similar to link prediction pipelines, we’ve added a pipeline procedure for node classification, where users can define the features, splitting strategy, and model training options. We’ve added:
- gds.alpha.ml.pipeline.nodeClassification.create
- gds.alpha.ml.pipeline.nodeClassification.addNodeProperty
- gds.alpha.ml.pipeline.nodeClassification.selectFeatures
- gds.alpha.ml.pipeline.nodeClassification.configureParams
- gds.alpha.ml.pipeline.nodeClassification.configureSplit
- gds.alpha.ml.pipeline.nodeClassification.train
- gds.alpha.ml.pipeline.nodeClassification.predict.mutate|stream|write
New algorithm: Conductance, gds.alpha.conductance.stream, can be used to compute a metric to evaluate the quality of communities identified by community detection algorithms.
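To make the metric concrete, the following Python sketch computes conductance per community on a small directed edge list. It assumes the common definition: for a community C, the number of relationships that start in C and end outside C, divided by the total number of relationships that start in C. The function and variable names are hypothetical, relationship weights are ignored, and this is not how gds.alpha.conductance.stream is implemented.

```python
from collections import defaultdict


def conductance(edges, community_of):
    """Return {community: conductance} for a directed, unweighted edge list.

    edges: iterable of (source, target) pairs.
    community_of: dict mapping each node to its community id.
    """
    internal = defaultdict(int)  # relationships that stay inside the community
    external = defaultdict(int)  # relationships that leave the community
    for source, target in edges:
        community = community_of[source]
        if community_of[target] == community:
            internal[community] += 1
        else:
            external[community] += 1

    # Only communities with at least one outgoing relationship appear here,
    # so the denominator is never zero.
    return {
        community: external[community] / (internal[community] + external[community])
        for community in set(internal) | set(external)
    }


edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]
community_of = {"a": 0, "b": 0, "c": 1, "d": 1}
scores = conductance(edges, community_of)
```

Under this definition a lower value indicates a more tightly knit community: in the example, community 1 has no relationships leaving it, while one of the three relationships starting in community 0 crosses into community 1.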
Added support for preserving a relationship property in gds.alpha.ml.splitRelationships.mutate.
The procedure gds.fastRP has received additional configuration parameters:
- featureProperties: to configure using node properties as part of the embedding.
- propertyRatio: to control how much of the embedding is computed from properties.
- nodeSelfInfluence: allows using each node’s initial random vector as a contribution to the node’s embedding. Especially useful for graphs with disconnected nodes.
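One way to picture propertyRatio is as a split of the embedding dimensions between a property-based part and a topology-based part. The Python sketch below rests on that reading, which is an assumption, as is the rounding by truncation; the helper name split_embedding_dimension is hypothetical and is not part of GDS.

```python
def split_embedding_dimension(embedding_dimension: int, property_ratio: float):
    """Return (topology_dimension, property_dimension).

    Assumes property_ratio is the fraction of embedding dimensions
    derived from node properties, rounded down (illustrative only).
    """
    if not 0.0 <= property_ratio <= 1.0:
        raise ValueError("property_ratio must be between 0.0 and 1.0")
    property_dimension = int(embedding_dimension * property_ratio)
    return embedding_dimension - property_dimension, property_dimension
```

With this reading, a ratio of 0.0 gives an embedding based purely on graph structure, and larger ratios devote more of the vector to the properties named in featureProperties.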

Bug fixes

Added a check that concurrency meets the determinism constraints for K-Nearest Neighbors whenever randomSeed is overridden.
Fixed an ArrayIndexOutOfBounds error that could happen in triangle count on some graphs with multiple relationship types.
Fixed an issue where seeded algorithms (such as WCC) on graphs with multiple node labels could assign seeded communities to new nodes.
Fixed an issue where KNN did not add candidates to the topK result.
Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.
Fixed an issue where running gds.alpha.ml.pipeline.linkPrediction.train could result in an error on graphs filtered with the configuration parameter nodeLabels.
Fixed an issue with unmapped Neo4j node ids throwing ArrayIndexOutOfBoundsException.
Fixed a bug where the in-memory storage engine would not find the correct graph store if the database name was not lowercase.
Fixed a bug where the graph store would be released when storing the CypherGraphStore in the catalog.
Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBounds error on sufficiently large graphs.

Improvements

Added context information to log entries at the debug and warning levels.
Training loss is now logged as part of general progress logging.
Running transactions while projecting a graph now has less chance of breaking the projected graph.
Improved runtime performance for FastRP.
FastRP now uses the Neo4j node id instead of the internal GDS node id when seeding the generation of initial random vectors.
The in-memory Cypher database is now capable of querying relationship ids, types, and properties.
The procedure gds.alpha.randomWalk.stream has been improved and should now be faster and more stable.
