Graph Data Science 1.4.0 Preview

Release Date: 19 October 2020

GDS 1.4.0 is compatible with Neo4j 4.0 and 4.1, but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.

Breaking changes

Removed sparsity parameter from gds.alpha.randomProjection.*
Renamed gds.alpha.randomProjection to gds.fastRP due to productization.
Renamed embeddingSize parameter to embeddingDimension for fastRP, GraphSAGE and Node2Vec.
Default parameters for gds.fastRP have changed on the following configuration parameters:
- iterationWeights now has default [0.0, 1.0, 1.0]
- normalizeL2 has been removed and its effect is always applied
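The fastRP changes above can be applied mechanically to an existing configuration map. The following is a minimal sketch in Python, assuming the configuration is held as a plain dictionary before it is sent to the database; the helper name migrate_fastrp_config is hypothetical and not part of GDS.

```python
def migrate_fastrp_config(old_config):
    """Return a copy of an alpha randomProjection config adapted for gds.fastRP.

    Hypothetical helper, not part of GDS: it only applies the removals and
    the rename listed in these release notes.
    """
    removed = {"sparsity", "normalizeL2"}  # no longer accepted
    renamed = {"embeddingSize": "embeddingDimension"}

    new_config = {}
    for key, value in old_config.items():
        if key in removed:
            continue  # drop parameters that were removed
        new_config[renamed.get(key, key)] = value
    return new_config


old = {
    "embeddingSize": 128,
    "sparsity": 3,
    "normalizeL2": True,
    "iterationWeights": [0.0, 1.0, 1.0],
}
print(migrate_fastrp_config(old))
```

The procedure name itself must also change from gds.alpha.randomProjection to gds.fastRP; the sketch covers only the configuration map.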
Removed alpha procedures for GraphSage (replaced with beta tier, see New Features section)
- gds.alpha.graphSage.stream
- gds.alpha.graphSage.write
GraphSage no longer calculates embeddings directly. Instead, it has been split into a train mode (to generate a named model) and write, mutate, and stream modes, which apply the model's predictions to your data.
Due to the creation of a train mode for GraphSage, the following configuration parameters were moved:
- embeddingSize – moved as configuration parameter of gds.beta.graphSage.train
- aggregator – moved as configuration parameter of gds.beta.graphSage.train
- activationFunction – moved as configuration parameter of gds.beta.graphSage.train
- sampleSizes – moved as configuration parameter of gds.beta.graphSage.train
- nodePropertyNames – moved as configuration parameter of gds.beta.graphSage.train
- tolerance – moved as configuration parameter of gds.beta.graphSage.train
- learningRate – moved as configuration parameter of gds.beta.graphSage.train
- epochs – moved as configuration parameter of gds.beta.graphSage.train
- maxIterations – moved as configuration parameter of gds.beta.graphSage.train
- searchDepth – moved as configuration parameter of gds.beta.graphSage.train
- negativeSampleWeight – moved as configuration parameter of gds.beta.graphSage.train
- degreeAsProperty – moved as configuration parameter of gds.beta.graphSage.train
The gds.beta.graphSage.stream procedure now requires the modelName configuration parameter.
The gds.beta.graphSage.write procedure now requires the modelName configuration parameter.
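An existing alpha-style GraphSage configuration therefore has to be divided between the train call and the stream or write call. The following is a rough sketch in Python, assuming the configuration is a plain dictionary; split_graphsage_config is a hypothetical helper, and passing modelName to the train call is an assumption, since these notes only state that stream and write require it.

```python
# Keys these release notes list as having moved to gds.beta.graphSage.train.
TRAIN_ONLY_KEYS = {
    "embeddingSize", "aggregator", "activationFunction", "sampleSizes",
    "nodePropertyNames", "tolerance", "learningRate", "epochs",
    "maxIterations", "searchDepth", "negativeSampleWeight", "degreeAsProperty",
}


def split_graphsage_config(old_config, model_name):
    """Split an alpha-style config into (train_config, apply_config).

    Hypothetical helper, not part of GDS. Adding modelName to the train
    config is an assumption made for this sketch.
    """
    train_config = {"modelName": model_name}
    apply_config = {"modelName": model_name}
    for key, value in old_config.items():
        target = train_config if key in TRAIN_ONLY_KEYS else apply_config
        target[key] = value
    return train_config, apply_config


train_cfg, apply_cfg = split_graphsage_config(
    {"epochs": 5, "aggregator": "mean", "writeProperty": "embedding"},
    "exampleModel",
)
print(train_cfg)
print(apply_cfg)
```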
Removed startLoss and epochLosses from the result columns of gds.beta.graphSage.write.
Added the graph create config as a return field to the train procedure, affecting gds.beta.graphSage.train
Fixed result column name embeddings to embedding in GraphSAGE, to align with the other embeddings.
Removed configuration parameter maxCost from gds.alpha.bfs/dfs.
Unlocking the Enterprise Edition of the Graph Data Science library requires a license key. The previous config setting has been removed.
Removed degreeDistribution from gds.graph.drop return columns.
gds.pageRank now respects the concurrency setting. It will not run if there is insufficient memory for the given concurrency setting.
Alpha similarity algorithms no longer accept a graph name as a parameter. The algorithms never used the named graph, so the option to specify one has been removed.

New features

Promoted GraphSage to the beta tier and added support for inductive models with the train mode.
- This adds procedures
  - gds.beta.graphSage.mutate
  - gds.beta.graphSage.mutate.estimate
  - gds.beta.graphSage.stream
  - gds.beta.graphSage.stream.estimate
  - gds.beta.graphSage.train
  - gds.beta.graphSage.train.estimate
  - gds.beta.graphSage.write
  - gds.beta.graphSage.write.estimate
- And removes alpha procedures
  - gds.alpha.graphSage.stream
  - gds.alpha.graphSage.write
GraphSage supports relationship weights, driven by relationshipWeightProperty
GraphSage supports node labels via projectedFeatureSize
Introduced the model catalog to manage trained models, including:
- gds.beta.model.exists – a procedure to check if a model exists in the catalog
- gds.beta.model.list – lists all available models
- gds.beta.model.drop – removes a model from the catalog
The Random Projection algorithm has been promoted to the product tier and we have added:
- gds.fastRP.stats
- gds.fastRP.mutate
- gds.fastRP.estimate
- Added procedures for stats and mutate modes, as well as estimates for all modes.
FastRP has been extended to support relationship weights and directions
FastRP supports integer configuration for iteration weights.
We’ve added support for node property features for FastRP in the beta namespace with FastRPExtended:
- gds.beta.fastRPExtended.mutate
- gds.beta.fastRPExtended.stream
- gds.beta.fastRPExtended.stats
- gds.beta.fastRPExtended.write
- gds.beta.fastRPExtended.mutate.estimate
- gds.beta.fastRPExtended.stream.estimate
- gds.beta.fastRPExtended.stats.estimate
- gds.beta.fastRPExtended.write.estimate
We’ve added the K-Nearest Neighbors (KNN) algorithm to the beta tier:
- gds.beta.knn.mutate and gds.beta.knn.mutate.estimate
- gds.beta.knn.stats and gds.beta.knn.stats.estimate
- gds.beta.knn.stream and gds.beta.knn.stream.estimate
- gds.beta.knn.write and gds.beta.knn.write.estimate
The in-memory graph now supports list properties, enabling embedding results to be stored in memory, or embeddings to be loaded from nodes for KNN or similarity calculations.
Pregel framework
- Added Pregel annotation processor to generate GDS procedures for custom Pregel algorithms.
- Pregel now supports long and double array node values.
- Added support for composite node state to allow complex data types on nodes.
- Reduced memory consumption.
- Improved memory estimation.
- Simplified message iteration in compute methods.
- Split context into Init- and ComputeContext and simplified API.
- Removed K1ColoringExample standalone project.
- Added pregel-bootstrap standalone project.
- Added pregel-examples module.
Licensing: GDS Enterprise edition now requires license keys issued by Neo4j to unlock enterprise features
Added a density property to the graph output of gds.graph.list.
Added a failIfMissing flag to gds.graph.drop
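For orientation on the new density property, the sketch below computes density under the usual definition for a directed graph without self-loops: the number of relationships divided by the number of possible ordered node pairs. These notes do not state the exact formula GDS uses, so the formula and the function name directed_density are assumptions.

```python
def directed_density(node_count, relationship_count):
    """Density of a directed graph: relationships / (n * (n - 1)).

    Assumed definition for illustration only; not taken from GDS.
    Graphs with fewer than two nodes are given a density of 0.0.
    """
    if node_count < 0 or relationship_count < 0:
        raise ValueError("counts must be non-negative")
    possible = node_count * (node_count - 1)
    if possible == 0:
        return 0.0
    return relationship_count / possible


print(directed_density(4, 6))
```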

Bug fixes

Pregel:
- Fixed a bug in Pregel that could lead to incorrect results when running in parallel.
- Fixed a cast exception when returning array node properties in generated Pregel procedures.
Fixed a bug in a multi-source BFS traversal strategy that could affect the following procedures:
- gds.alpha.closeness
- gds.alpha.closeness.harmonic
- gds.alpha.allShortestPaths
Weakly connected components:
- Fixed a bug in WCC where componentCount would be negative when the graph is empty.
- Fixed a regression where WCC could run more slowly with increased concurrency.
Fixed bugs in Louvain:
- communityCount is no longer negative when the graph is empty.
- changes to maxIterations are no longer ignored.
Fixed a bug in LabelPropagation where communityCount would be negative when the graph is empty.
Fixed a bug in gds.graph.export where at most one relationship property per relationship type would be exported.
Graph loading:
- Fixed a bug where using node label projections including properties on large graphs and high concurrency could lead to loss of some properties.
- Fixed a bug in graph creation that could cause an ArrayIndexOutOfBoundsException (AIOOB) during node loading.
- The readConcurrency configuration parameter can no longer be overwritten by the concurrency parameter when it is explicitly set in an implicit graph creation config.
Fixed a bug in memory estimation of large anonymous fictitious graphs.
Fixed a bug in gds.alpha.dfs/bfs where the algorithm did not terminate for graphs containing loops.
Fixed result column name embeddings to embedding in GraphSAGE, to align with the other embeddings.
Fixed a bug in Node2Vec where many disconnected nodes would cause a StackOverflowError.
Fixed a bug in RandomProjection where each iteration weight was multiplied by all previous iteration weights.
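To see why the RandomProjection iteration-weight bug matters, the sketch below computes the effective weights under one reading of the description: each weight multiplied by every weight before it. This is an illustration only, not the GDS implementation, and cumulative_weights is a hypothetical name.

```python
def cumulative_weights(iteration_weights):
    """Effective weights if each weight is multiplied by all previous ones.

    Illustrates the buggy behaviour as described in the notes; the fixed
    behaviour uses the configured weights unchanged.
    """
    effective = []
    running = 1.0
    for weight in iteration_weights:
        running *= weight  # carries every earlier weight forward
        effective.append(running)
    return effective


configured = [0.5, 2.0, 1.0]
print("configured:", configured)
print("buggy effective:", cumulative_weights(configured))
```

Under this reading, a leading weight of 0.0 would zero out every later iteration as well, because the running product never recovers.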
Similarity algorithms:
- Fixed a bug where Alpha Similarity algorithms would load a graph even though it was not needed
- Fixed a bug where similarity algorithms would not remove the placeholder graph if config validation fails on invalid user input.
Fixed a bug where community statistic computation could overflow for large community ids.
Fixed a bug where DegreeCentrality returned incorrect values when concurrency > 1.
Fixed a bug where ClosenessCentrality used a slightly incorrect formula for the Wasserman-Faust algorithm.
Fixed a bug that affected gds.triangleCount() and gds.alpha.triangles() where not all triangles would be counted under certain conditions.
Parallel edges in a graph no longer lead to incorrect Local Clustering Coefficient and Triangle Count results.
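The parallel-edge fix can be pictured with a small triangle counter that stores neighbours in sets, so repeated edges between the same pair of nodes collapse to one. This is a generic sketch of the idea, not the GDS algorithm, and it assumes an undirected interpretation of the edge list with comparable node ids.

```python
from itertools import combinations


def triangle_count(edges):
    """Count triangles in an undirected graph given as (u, v) pairs.

    Neighbour sets deduplicate parallel edges; self-loops are ignored.
    Each triangle is counted once, from its smallest node id.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    triangles = 0
    for node, neighbours in adjacency.items():
        for a, b in combinations(sorted(neighbours), 2):
            # a < b after sorting; requiring node < a avoids double counting
            if node < a and b in adjacency[a]:
                triangles += 1
    return triangles


simple = [(1, 2), (2, 3), (1, 3)]
with_parallel = [(1, 2), (1, 2), (2, 3), (3, 2), (1, 3)]
print(triangle_count(simple), triangle_count(with_parallel))
```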

Improvements

gds.fastRP now accepts integer iterationWeights
If graphSage.train is run on a graph without relationships, GDS now fails gracefully with an appropriate error message
Added validation that properties used by GraphSage exist on graph
Added validation that embeddingSize >= 1.
Added a failIfExists flag to graph creation to enable a user to specify that if a graph already exists, it should be overwritten without failing.
Progress logging:
- We now log progress in equally spaced percentages. This is 0-100% either in steps of 1, or in larger steps if there are fewer than 100 batches. For example, if there are 50 batches, completing one batch means 2% progress, so it would log in steps of 2.
- Decreased the logging frequency when running with a high concurrency.
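The progress arithmetic above can be sketched as follows. The function logs a percentage whenever the whole-number progress increases after a batch completes; this reproduces the 50-batch example, but it is only an approximation of the described behaviour, and logged_percentages is a hypothetical name rather than GDS code.

```python
def logged_percentages(batch_count):
    """Return the whole-number percentages logged as batches complete.

    Sketch only: a value is logged each time the integer percentage grows.
    """
    if batch_count <= 0:
        raise ValueError("batch_count must be positive")
    logged = []
    last = 0
    for done in range(1, batch_count + 1):
        percent = done * 100 // batch_count
        if percent > last:
            logged.append(percent)
            last = percent
    return logged


print(logged_percentages(50)[:5])
print(len(logged_percentages(200)))
```

With 50 batches each completed batch adds 2%, so the steps are 2, 4, 6, and so on; with 200 batches only every second batch produces a new log line.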
Added postProcessingMillis to gds.localClusteringCoefficient and gds.triangleCount for modes:
- mutate, write, stats
- It is always zero for now, but this is a standard result column for these modes
Parallelized computation of result statistics for the following community detection procedures:
- gds.wcc.write, gds.wcc.mutate and gds.wcc.stats
- gds.louvain.write, gds.louvain.mutate and gds.louvain.stats
- gds.labelPropagation.write, gds.labelPropagation.mutate and gds.labelPropagation.stats
- gds.beta.modularityOptimization.write and gds.beta.modularityOptimization.mutate
- gds.alpha.scc.write
Added graph schema to the result columns of gds.model.list and gds.model.drop
Validate property existence (e.g. seedProperty) when running algorithms on Cypher projections.
Improved memory estimation for * node projections.
Introduced parallel graph construction to improve performance of Louvain and Node Similarity
In-memory graphs in multidatabase:
- When in-memory graphs are created, they are now associated with the database in use during creation time to prevent errors when running in a multi-database environment.
- gds.graph.info() returns the database name the graph has been created on.
- Named graphs can only be used on the database they have been created on.
