Release Date: 1 December 2021
GDS 1.8 is compatible with Neo4j 4.1, 4.2, 4.3, and 4.4, but not with Neo4j 3.5.x. For a 3.5-compatible release, please see GDS 1.1.7. For a 4.0-compatible release, please see GDS 1.6.5.
Breaking changes
- GDS now throws an error on identifiers with trailing whitespace, to avoid input errors. This affects graphName, modelName, and several property parameters such as nodeWeightProperty or seedProperty.
- We have removed the separate concurrency parameter from the model parameter space in gds.alpha.ml.nodeClassification.train, gds.alpha.ml.linkPrediction.train, and gds.alpha.ml.pipeline.linkPrediction.configureParams. The concurrency value in the configuration of the train procedure is used instead.
- The procedure gds.alpha.randomWalk.stream has graduated to the beta tier, as gds.beta.randomWalk.stream.
  - Random Walk has been improved and aligned with the Node2Vec implementation. Please consult the documentation to find out about the new configuration options. gds.alpha.randomWalk.stream has been removed.
  - A memory estimation procedure, gds.beta.randomWalk.estimate, has been added.
- The procedure gds.beta.fastRPExtended has been merged with gds.fastRP.
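Aligning Random Walk with Node2Vec means the walk is biased by a return factor (p) and an in-out factor (q). The following is a minimal sketch of such a second-order walk in plain Python, assuming the standard Node2Vec weighting: 1/p for stepping back to the previous node, 1 for moving to a neighbour of the previous node, and 1/q for moving further away. The function name node2vec_walk, the adjacency-dict input, and the parameter names return_factor and in_out_factor are illustrative assumptions, not the gds.beta.randomWalk.stream implementation or its exact configuration keys.

```python
import random
from typing import Dict, List, Optional


def node2vec_walk(
    adj: Dict[int, List[int]],
    start: int,
    walk_length: int,
    return_factor: float = 1.0,
    in_out_factor: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Second-order biased random walk in the style of Node2Vec (sketch).

    walk_length counts nodes, including the start node (an assumption of
    this sketch). return_factor (p) and in_out_factor (q) bias each step.
    """
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbours = adj.get(current, [])
        if not neighbours:
            break  # dead end: end the walk early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        previous_neighbours = set(adj.get(previous, []))
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / return_factor)  # step back to previous
            elif candidate in previous_neighbours:
                weights.append(1.0)  # stay at distance 1 from previous
            else:
                weights.append(1.0 / in_out_factor)  # move further away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    # A fixed seed makes the walk reproducible.
    print(node2vec_walk(graph, 0, 6, return_factor=2.0, in_out_factor=0.5,
                        rng=random.Random(42)))
```

In this sketch a large return_factor discourages immediate backtracking, and an in_out_factor below 1 favours moving outward (DFS-like exploration), while values above 1 keep the walk local (BFS-like).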
New features
- Link Prediction
  - Added a new link prediction stream procedure, gds.alpha.ml.pipeline.linkPrediction.predict.stream.
  - Added probabilityDistribution and samplingStats to the result of gds.alpha.ml.pipeline.linkPrediction.predict.mutate.
  - To improve prediction performance, we’ve added a kNN-based approximate search strategy option to the link prediction procedures gds.alpha.ml.pipeline.linkPrediction.predict.stream|mutate.
  - Node property steps in Link Prediction pipelines can use a relationship property.
- Node Classification pipelines: similar to link prediction pipelines, we’ve added pipeline procedures for node classification, where users can define the features, splitting strategy, and model training options. We’ve added:
  - gds.alpha.ml.pipeline.nodeClassification.create
  - gds.alpha.ml.pipeline.nodeClassification.addNodeProperty
  - gds.alpha.ml.pipeline.nodeClassification.selectFeatures
  - gds.alpha.ml.pipeline.nodeClassification.configureParams
  - gds.alpha.ml.pipeline.nodeClassification.configureSplit
  - gds.alpha.ml.pipeline.nodeClassification.train
  - gds.alpha.ml.pipeline.nodeClassification.predict.mutate|stream|write
- New algorithm: Conductance, gds.alpha.conductance.stream, can be used to compute a metric that evaluates the quality of communities identified by community detection algorithms.
- Added support for preserving a relationship property in gds.alpha.ml.splitRelationships.mutate.
- The procedure gds.fastRP has received additional configuration parameters:
  - featureProperties: configures which node properties are used as part of the embedding.
  - propertyRatio: controls how much of the embedding is computed from properties.
  - nodeSelfInfluence: allows using each node’s initial random vector as a contribution to the node’s embedding. This is especially useful for graphs with disconnected nodes.
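To make the Conductance metric concrete, here is a minimal sketch in plain Python, assuming the following per-community definition: the total weight of relationships that start in community C and end outside it, divided by the total weight of all relationships that start in C. Lower values indicate a community that is better separated from the rest of the graph. The function name, the triple-based input format, and the exact definition are assumptions for illustration; this is not the gds.alpha.conductance.stream implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def conductance(
    relationships: Iterable[Tuple[int, int, float]],
    community: Dict[int, Hashable],
) -> Dict[Hashable, float]:
    """Per-community conductance under the ratio definition assumed above.

    relationships are directed (source, target, weight) triples; list both
    directions for an undirected graph. Every node must have a community.
    """
    internal: Dict[Hashable, float] = defaultdict(float)
    external: Dict[Hashable, float] = defaultdict(float)
    for source, target, weight in relationships:
        c = community[source]
        if community[target] == c:
            internal[c] += weight  # relationship stays inside C
        else:
            external[c] += weight  # relationship leaves C
    result = {}
    for c in set(internal) | set(external):
        total = internal[c] + external[c]
        result[c] = external[c] / total if total else 0.0
    return result


if __name__ == "__main__":
    # Two triangles joined by the single relationship 2-3.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    rels = [(a, b, 1.0) for a, b in edges] + [(b, a, 1.0) for a, b in edges]
    labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    print(sorted(conductance(rels, labels).items()))
    # [('A', 0.14285714285714285), ('B', 0.14285714285714285)]
```

Each triangle has six internal directed relationships and one leaving it, so both communities score 1/7.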
Bug fixes
- Added a check that concurrency meets the determinism constraints for K-Nearest Neighbors whenever randomSeed is specified.
- Fixed an ArrayIndexOutOfBounds error that could happen in triangle count on some graphs with multiple relationship types.
- Fixed an issue where seeded algorithms (such as WCC) on graphs with multiple node labels could assign seeded communities to new nodes.
- Fixed an issue where KNN did not add candidates to the topK result.
- Fixed an issue where running an algorithm could return incorrect results on graphs filtered with the configuration parameter nodeLabels.
- Fixed an issue where running gds.alpha.ml.pipeline.linkPrediction.train could result in an error on graphs filtered with the configuration parameter nodeLabels.
- Fixed an issue with unmapped Neo4j node ids throwing ArrayIndexOutOfBoundsException.
- Fixed a bug where the in-memory storage engine would not find the correct graph store if the database name was not lowercase.
- Fixed a bug where the graph store would be released when storing the CypherGraphStore in the catalog.
- Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBounds error on sufficiently large graphs.
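The K-Nearest Neighbors determinism check can be pictured as a simple configuration guard. The sketch below assumes, as the GDS KNN documentation describes, that a reproducible seeded run requires concurrency to be 1 (parallel threads can otherwise update neighbour lists in a nondeterministic order). The helper validate_knn_config, the error text, and the default concurrency of 4 are hypothetical and only illustrate the rule; they are not GDS code.

```python
from typing import Any, Dict

# Assumed default for this sketch; the real default may differ by edition.
DEFAULT_CONCURRENCY = 4


def validate_knn_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical pre-check: a seeded KNN run must be single-threaded."""
    seed = config.get("randomSeed")
    concurrency = config.get("concurrency", DEFAULT_CONCURRENCY)
    if seed is not None and concurrency != 1:
        raise ValueError(
            f"randomSeed={seed} requires concurrency=1 for deterministic results, "
            f"got concurrency={concurrency}"
        )
    return config


if __name__ == "__main__":
    validate_knn_config({"topK": 10, "randomSeed": 42, "concurrency": 1})  # accepted
    validate_knn_config({"topK": 10, "concurrency": 8})  # accepted: no seed given
    try:
        validate_knn_config({"topK": 10, "randomSeed": 42})
    except ValueError as err:
        print(err)
        # randomSeed=42 requires concurrency=1 for deterministic results, got concurrency=4
```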
Improvements
- Added context information to debug and warning log entries.
- Training loss is now logged as part of general progress logging.
- Running transactions while projecting a graph is now less likely to break the projected graph.
- Improved runtime performance of FastRP.
- FastRP now uses the Neo4j node id instead of the internal GDS node id when seeding the generation of initial random vectors.
- The in-memory Cypher database can now query relationship ids, types, and properties.
- The procedure gds.alpha.randomWalk.stream has been improved and should now be faster and more stable.
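The FastRP seeding change can be illustrated with a minimal sketch: if each node's initial random vector is derived from a seed that combines randomSeed with the Neo4j node id, the vector no longer depends on the internal id the node happens to receive in a given projection. The sparse {-1, 0, +1} distribution (scaling omitted), the string-based seed combination, and the function names are assumptions for illustration, not the GDS implementation.

```python
import random
from typing import Dict, List


def initial_vector(neo4j_id: int, dimension: int, random_seed: int) -> List[float]:
    """Sparse random vector whose seed depends only on the Neo4j node id."""
    # String seeds are deterministic across Python 3 runs
    # (they do not depend on hash randomization).
    rng = random.Random(f"{random_seed}:{neo4j_id}")
    vector = []
    for _ in range(dimension):
        r = rng.random()
        if r < 1 / 6:
            vector.append(1.0)
        elif r < 1 / 3:
            vector.append(-1.0)
        else:
            vector.append(0.0)  # about two thirds of the entries are zero
    return vector


def initial_vectors(
    internal_to_neo4j: List[int], dimension: int, random_seed: int
) -> Dict[int, List[float]]:
    """Initial vectors keyed by internal id, but generated from Neo4j ids."""
    return {
        internal_id: initial_vector(neo4j_id, dimension, random_seed)
        for internal_id, neo4j_id in enumerate(internal_to_neo4j)
    }


if __name__ == "__main__":
    # The same three nodes projected in two different internal orders.
    first = initial_vectors([10, 20, 30], dimension=8, random_seed=42)
    second = initial_vectors([30, 10, 20], dimension=8, random_seed=42)
    # Neo4j node 10 has internal id 0 in the first projection and 1 in the second.
    print(first[0] == second[1])  # True
```

Seeding by internal id instead would tie each node's starting vector to load order, so two projections of the same data could yield different embeddings for the same node.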