Release Date: 1 December 2021

GDS 1.8 is compatible with Neo4j 4.1, 4.2, 4.3, and 4.4. It is not compatible with Neo4j 3.5.x or 4.0. For a 3.5-compatible release, please see GDS 1.1.7; for a 4.0-compatible release, please see GDS 1.6.5.

Breaking changes

  • GDS now throws an error for identifiers with trailing whitespace, to catch input errors early. This affects graphName, modelName, and several property parameters, such as nodeWeightProperty and seedProperty.
  • We have removed the separate concurrency parameter from the model parameter space in gds.alpha.ml.nodeClassification.train, gds.alpha.ml.linkPrediction.train, and gds.alpha.ml.pipeline.linkPrediction.configureParams. The concurrency value from the train procedure's configuration is used instead.
  • The procedure gds.alpha.randomWalk.stream has graduated to the beta tier, as gds.beta.randomWalk.stream.
    • Random Walk has been improved and aligned with the Node2Vec implementation. Please consult the documentation to find out about the new configuration options.
    • gds.alpha.randomWalk.stream has been removed.
    • A memory estimation procedure, gds.beta.randomWalk.estimate, has been added.
  • The procedure gds.beta.fastRPExtended has been merged with gds.fastRP.
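Since Random Walk is now aligned with Node2Vec, its walks can be understood as second-order walks: the choice of the next node depends on both the current and the previous node. The following is a minimal pure-Python sketch, not the GDS implementation. It assumes the textbook Node2Vec weighting on an unweighted, undirected adjacency list: 1/returnFactor for stepping back to the previous node, 1 for a neighbour that is also adjacent to the previous node, and 1/inOutFactor for moving further away. The function names are hypothetical, and the parameter names are modelled on the Node2Vec options as an assumption; please consult the documentation for the exact configuration.

```python
import random


def node2vec_step(adj, prev, curr, return_factor, in_out_factor, rng):
    """Pick the next node of a second-order (Node2Vec-style) random walk.

    adj: dict mapping each node to a list of neighbours (undirected, unweighted).
    prev: the node visited before curr, or None on the first step.
    Returns None when curr has no neighbours.
    """
    neighbours = adj[curr]
    if not neighbours:
        return None  # dead end: the walk stops here
    if prev is None:
        return rng.choice(neighbours)  # the first step is uniform
    prev_neighbours = set(adj[prev])
    weights = []
    for candidate in neighbours:
        if candidate == prev:
            weights.append(1.0 / return_factor)  # step back to prev
        elif candidate in prev_neighbours:
            weights.append(1.0)  # stay at distance 1 from prev
        else:
            weights.append(1.0 / in_out_factor)  # move outward, away from prev
    return rng.choices(neighbours, weights=weights, k=1)[0]


def random_walk(adj, start, walk_length, return_factor=1.0, in_out_factor=1.0, seed=42):
    """Return one walk of at most walk_length nodes, reproducible for a fixed seed."""
    rng = random.Random(seed)
    walk = [start]
    prev = None
    while len(walk) < walk_length:
        nxt = node2vec_step(adj, prev, walk[-1], return_factor, in_out_factor, rng)
        if nxt is None:
            break
        prev = walk[-1]
        walk.append(nxt)
    return walk


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
    # A high return factor discourages backtracking; a low in-out factor favours exploring outward.
    print(random_walk(graph, "a", walk_length=5, return_factor=4.0, in_out_factor=0.5))
```

With both factors set to 1.0, every neighbour gets the same weight and the sketch reduces to a plain uniform random walk.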

New features

  • Link Prediction
    • Added a new link prediction stream procedure, gds.alpha.ml.pipeline.linkPrediction.predict.stream.
    • Added probabilityDistribution and samplingStats to the result of gds.alpha.ml.pipeline.linkPrediction.predict.mutate.
    • To improve prediction performance, we've added a kNN-based approximate search strategy as an option to the link prediction procedures gds.alpha.ml.pipeline.linkPrediction.predict.stream|mutate.
    • Node property steps in link prediction pipelines can now use a relationship property.
  • Node Classification pipelines: similar to link prediction pipelines, we've added pipeline procedures for node classification, with which users can define the features, splitting strategy, and model training options. The new procedures are:
    • gds.alpha.ml.pipeline.nodeClassification.create
    • gds.alpha.ml.pipeline.nodeClassification.addNodeProperty
    • gds.alpha.ml.pipeline.nodeClassification.selectFeatures
    • gds.alpha.ml.pipeline.nodeClassification.configureParams
    • gds.alpha.ml.pipeline.nodeClassification.configureSplit
    • gds.alpha.ml.pipeline.nodeClassification.train
    • gds.alpha.ml.pipeline.nodeClassification.predict.mutate|stream|write
  • New algorithm: Conductance (gds.alpha.conductance.stream) computes a metric for evaluating the quality of communities identified by community detection algorithms.
  • Added support for preserving a relationship property in gds.alpha.ml.splitRelationships.mutate.
  • The procedure gds.fastRP has received additional configuration parameters:
    • featureProperties: node properties to use as input to the embedding.
    • propertyRatio: to control how much of the embedding is computed from properties.
    • nodeSelfInfluence: allows each node's initial random vector to contribute to the node's final embedding. This is especially useful for graphs with disconnected nodes.
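To make the Conductance metric concrete, here is a small pure-Python sketch. It assumes conductance is computed per community as the weight of relationships that leave the community divided by the total weight of all relationships of the community's nodes, so lower values indicate a more tightly knit community. Treat this definition as an assumption; the textbook graph-theory definition divides by the smaller of the two volumes instead, and the documentation is authoritative for GDS. The function name is hypothetical.

```python
from collections import defaultdict


def conductance(relationships, community):
    """Per-community conductance: boundary weight / total weight of the community's relationships.

    relationships: iterable of (source, target, weight) tuples. Each tuple is
        counted on the source node's side, so list both directions for an
        undirected graph.
    community: dict mapping node -> community id.
    """
    internal = defaultdict(float)
    external = defaultdict(float)
    for source, target, weight in relationships:
        c = community[source]
        if community[target] == c:
            internal[c] += weight
        else:
            external[c] += weight
    result = {}
    for c in set(internal) | set(external):
        total = internal[c] + external[c]
        result[c] = external[c] / total if total > 0 else 0.0
    return result


if __name__ == "__main__":
    # Two triangles joined by a single bridge relationship (2-3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    rels = [(u, v, 1.0) for u, v in edges] + [(v, u, 1.0) for u, v in edges]
    communities = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    print(conductance(rels, communities))
```

In this example, each triangle has six internal relationship ends and one boundary relationship, so both communities score 1/7.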

Bug fixes

  • Added a check that concurrency meets the determinism constraints of K-Nearest Neighbors whenever randomSeed is set.
  • Fixed an ArrayIndexOutOfBounds error that could happen in triangle count on some graphs with multiple relationship types.
  • Fixed an issue where seeded algorithms (such as WCC) on graphs with multiple node labels could assign seeded communities to new nodes.
  • Fixed an issue where KNN did not add candidates to the topK result.
  • Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.
  • Fixed an issue where running gds.alpha.ml.pipeline.linkPrediction.train could result in an error on graphs filtered with the configuration parameter nodeLabels.
  • Fixed an issue where unmapped Neo4j node IDs caused an ArrayIndexOutOfBoundsException.
  • Fixed a bug where the in-memory storage engine would not find the correct graph store if the database name was not lowercase.
  • Fixed a bug where the graph store would be released when storing the CypherGraphStore in the catalog.
  • Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBounds error on sufficiently large graphs.
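The K-Nearest Neighbors check above guards against a configuration that looks reproducible but is not. With several threads, the order in which neighbour lists are updated varies between runs, so a fixed seed alone does not make the output deterministic. The sketch below assumes the constraint is "a randomSeed requires concurrency = 1". The KnnConfig class, the validate_determinism function, and the error text are hypothetical illustrations, not GDS code or GDS messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnnConfig:
    # Hypothetical subset of a KNN configuration; field names only mirror the GDS parameters.
    top_k: int = 10
    concurrency: int = 4
    random_seed: Optional[int] = None


def validate_determinism(config: KnnConfig) -> None:
    """Reject configurations that set a random seed but run on more than one thread."""
    if config.random_seed is not None and config.concurrency != 1:
        raise ValueError(
            "randomSeed requires concurrency = 1 for deterministic results "
            f"(got concurrency={config.concurrency})"
        )


if __name__ == "__main__":
    validate_determinism(KnnConfig(concurrency=1, random_seed=42))  # accepted
    try:
        validate_determinism(KnnConfig(concurrency=4, random_seed=42))
    except ValueError as err:
        print("rejected:", err)
```

Failing fast at configuration time is preferable to silently returning results that differ from run to run despite a fixed seed.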

Improvements

  • Added context information to debug and warning log entries.
  • Training loss is now logged as part of general progress logging.
  • Transactions that run while a graph is being projected are now less likely to break the projected graph.
  • Improved the runtime performance of FastRP.
  • FastRP now uses the Neo4j node ID instead of the internal GDS node ID when seeding the generation of initial random vectors.
  • The in-memory Cypher database can now query relationship IDs, types, and properties.
  • The Random Walk procedure, now gds.beta.randomWalk.stream, has been improved and should run faster and more stably.
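Seeding FastRP's initial vectors from the Neo4j node ID means a node receives the same starting vector regardless of the internal position it is assigned in a particular projection. The sketch below illustrates this idea in pure Python. It assumes a very sparse random projection, with entries +sqrt(s), 0, or -sqrt(s), and a simple seed-combining formula. The sparsity value, the seed formula, and the function names are all assumptions; they do not reproduce GDS's actual random number generator.

```python
import math
import random


def initial_vector(neo4j_node_id, random_seed, dimension, sparsity=3):
    """Sparse random vector whose entries depend only on (random_seed, neo4j_node_id).

    Entries are +sqrt(s), -sqrt(s) or 0 with probabilities 1/(2s), 1/(2s) and 1 - 1/s.
    """
    # Derive a per-node generator from the stable Neo4j ID, not from the
    # internal position the node happens to get in a projection.
    rng = random.Random(random_seed * 1_000_003 + neo4j_node_id)
    scale = math.sqrt(sparsity)
    vector = []
    for _ in range(dimension):
        u = rng.random()
        if u < 1 / (2 * sparsity):
            vector.append(scale)
        elif u < 1 / sparsity:
            vector.append(-scale)
        else:
            vector.append(0.0)
    return vector


def initial_vectors(internal_to_neo4j, random_seed, dimension):
    """One initial vector per internal ID (list position -> Neo4j node ID)."""
    return [initial_vector(nid, random_seed, dimension) for nid in internal_to_neo4j]


if __name__ == "__main__":
    # The same Neo4j nodes land at different internal positions in two projections.
    projection_a = initial_vectors([10, 20, 30], random_seed=42, dimension=8)
    projection_b = initial_vectors([30, 10], random_seed=42, dimension=8)
    print(projection_a[0] == projection_b[1])  # node 10 gets the same vector in both
```

Had the seed been derived from the internal position instead, node 10 would start from a different vector in each projection, and embeddings of the same graph would not be comparable across projections.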