Release Date: 19 October 2020
GDS 1.4.0 is compatible with Neo4j 4.0 and 4.1, but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.
Breaking changes
- Removed sparsity parameter from gds.alpha.randomProjection.*
- Renamed gds.alpha.randomProjection to gds.fastRP due to productization.
- Renamed embeddingSize parameter to embeddingDimension for FastRP, GraphSAGE and Node2Vec.
- Default parameters for gds.fastRP have changed on the following configuration parameters:
  - iterationWeights now has default [0.0, 1.0, 1.0]
  - normalizeL2 has been removed and its effect is always applied
- Removed alpha procedures for GraphSage (replaced with beta tier, see New Features section):
  - gds.alpha.graphSage.stream
  - gds.alpha.graphSage.write
- GraphSage no longer calculates embeddings directly. It has been split into train (to generate a named model) and write, mutate, and stream (to apply the model predictions to your data).
- Due to the creation of a train mode for GraphSage, the following configuration parameters were moved to gds.beta.graphSage.train:
  - embeddingSize
  - aggregator
  - activationFunction
  - sampleSizes
  - nodePropertyNames
  - tolerance
  - learningRate
  - epochs
  - maxIterations
  - searchDepth
  - negativeSampleWeight
  - degreeAsProperty
- gds.beta.graphSage.stream procedure now requires modelName configuration parameter.
- gds.beta.graphSage.write procedure requires modelName configuration parameter.
- Removed startLoss and epochLosses from the result columns of gds.beta.graphSage.write.
- Added the graph create config as a return field to the train procedure, affecting gds.beta.graphSage.train
- Fixed result column name embeddings to embedding in GraphSAGE, to align with the other embeddings.
- Removed configuration parameter maxCost from gds.alpha.bfs/dfs.
- Unlocking the Enterprise Edition of the Graph Data Science library requires a license key. The previous config setting has been removed.
- Removed degreeDistribution from gds.graph.drop return columns.
- gds.pageRank now respects the concurrency setting. It will not run if there is insufficient memory for the given concurrency setting.
- Alpha similarity algorithms no longer accept graph name as a parameter. The algorithm never used the named graph, and now the possibility to specify one is removed.
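The FastRP parameter changes listed above can be sketched as a small client-side helper for updating stored configuration maps. This is only an illustration: the function name migrate_fastrp_config and the two lookup tables are hypothetical and not part of GDS. It covers only the removals and the rename described in these notes.

```python
from typing import Any, Dict

# Parameters removed in GDS 1.4.0 according to the breaking changes above.
# normalizeL2 is dropped because its effect is now always applied, so a
# configuration that disabled it cannot be reproduced after migration.
REMOVED_KEYS = {"sparsity", "normalizeL2"}

# Parameters renamed in GDS 1.4.0 according to the breaking changes above.
RENAMED_KEYS = {"embeddingSize": "embeddingDimension"}


def migrate_fastrp_config(old_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a gds.alpha.randomProjection config adapted for gds.fastRP."""
    new_config: Dict[str, Any] = {}
    for key, value in old_config.items():
        if key in REMOVED_KEYS:
            continue  # no longer accepted, so leave it out
        new_config[RENAMED_KEYS.get(key, key)] = value
    return new_config


if __name__ == "__main__":
    old = {"embeddingSize": 128, "sparsity": 3, "normalizeL2": True}
    print(migrate_fastrp_config(old))
```

The helper does not set iterationWeights, so a migrated configuration that omits it picks up the new default of [0.0, 1.0, 1.0].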
New features
- Promoted GraphSage to beta tier and added support for inductive models with the train mode
  - This adds procedures:
    - gds.beta.graphSage.mutate
    - gds.beta.graphSage.mutate.estimate
    - gds.beta.graphSage.stream
    - gds.beta.graphSage.stream.estimate
    - gds.beta.graphSage.train
    - gds.beta.graphSage.train.estimate
    - gds.beta.graphSage.write
    - gds.beta.graphSage.write.estimate
  - And removes alpha procedures:
    - gds.alpha.graphSage.stream
    - gds.alpha.graphSage.write
- GraphSage supports relationship weights, driven by relationshipWeightProperty
- GraphSage supports node labels via projectedFeatureSize
- Introduced the model catalog to manage trained models, including:
  - gds.beta.model.exists – checks whether a model exists in the catalog
  - gds.beta.model.list – lists all available models
  - gds.beta.model.drop – removes a model from the catalog
- The Random Projection algorithm has been promoted to the product tier and we have added:
  - gds.fastRP.stats
  - gds.fastRP.mutate
  - gds.fastRP.estimate
  - Added procedures for stats and mutate mode, as well as estimates for all modes.
- FastRP has been extended to support relationship weights and directions
- FastRP supports integer configuration for iteration weights.
- We’ve added support for node property features for FastRP in the beta namespace with FastRPExtended:
  - gds.beta.fastRPExtended.mutate
  - gds.beta.fastRPExtended.stream
  - gds.beta.fastRPExtended.stats
  - gds.beta.fastRPExtended.write
  - gds.beta.fastRPExtended.mutate.estimate
  - gds.beta.fastRPExtended.stream.estimate
  - gds.beta.fastRPExtended.stats.estimate
  - gds.beta.fastRPExtended.write.estimate
- We’ve added the K-Nearest Neighbors (KNN) algorithm to the beta tier:
  - gds.beta.knn.mutate and gds.beta.knn.mutate.estimate
  - gds.beta.knn.stats and gds.beta.knn.stats.estimate
  - gds.beta.knn.stream and gds.beta.knn.stream.estimate
  - gds.beta.knn.write and gds.beta.knn.write.estimate
- The in-memory graph now supports list properties. This allows embedding results to be stored in memory, and embeddings to be loaded from nodes for KNN or similarity calculations.
- Pregel framework
  - Added Pregel annotation processor to generate GDS procedures for custom Pregel algorithms.
  - Pregel now supports long and double array node values.
  - Added support for composite node state to allow complex data types on nodes.
  - Reduced memory consumption.
  - Improved memory estimation.
  - Simplified message iteration in compute methods.
  - Split context into InitContext and ComputeContext and simplified the API.
  - Removed K1ColoringExample standalone project.
  - Added pregel-bootstrap standalone project.
  - Added pregel-examples module.
- Licensing: GDS Enterprise edition now requires license keys issued by Neo4j to unlock enterprise features
- Added a density property to the graph output of gds.graph.list.
- Added a failIfMissing flag to gds.graph.drop
Bug fixes
- Pregel:
  - Fixed a bug in Pregel that could lead to incorrect results when running in parallel.
  - Fixed a cast exception when returning array node properties in generated Pregel procedures.
- Fixed a bug in a multi-source BFS traversal strategy that could affect the following procedures:
  - gds.alpha.closeness
  - gds.alpha.closeness.harmonic
  - gds.alpha.allShortestPaths
- Weakly connected components:
  - Fixed a bug in WCC where componentCount would be negative when the graph is empty.
  - Fixed a regression where WCC could run more slowly with increased concurrency.
- Fixed bugs in Louvain:
  - communityCount is no longer negative when the graph is empty.
  - Changes to maxIterations are no longer ignored.
- Fixed a bug in LabelPropagation where communityCount would be negative when the graph is empty.
- Fixed a bug in gds.graph.export where at most one relationship property per relationship type would be exported.
- Graph loading:
  - Fixed a bug where using node label projections including properties on large graphs and high concurrency could lead to loss of some properties.
  - Fixed a bug in graph creation which could cause an AIOOB exception during node loading.
  - The readConcurrency config parameter can no longer be overwritten by the concurrency parameter when it is explicitly set in an implicit graph creation config.
- Fixed a bug in memory estimation of large anonymous fictitious graphs.
- Fixed a bug in gds.alpha.dfs/bfs where the algorithm did not terminate for graphs containing loops.
- Fixed result column name embeddings to embedding in GraphSAGE, to align with the other embeddings.
- Fixed a bug in Node2Vec where many disconnected nodes would cause a StackOverflowError.
- Fixed a bug in RandomProjection where each iteration weight was multiplied by all previous iteration weights.
- Similarity algorithms:
  - Fixed a bug where Alpha Similarity algorithms would load a graph even though it was not needed.
  - Fixed a bug where similarity algorithms would not remove the placeholder graph if config validation failed on invalid user input.
- Fixed a bug where community statistic computation could overflow for large community ids.
- Fixed a bug where DegreeCentrality returned incorrect values when concurrency > 1.
- Fixed a bug where ClosenessCentrality was using a slightly incorrect formula for Wasserman-Faust algorithm.
- Fixed a bug that affected gds.triangleCount() and gds.alpha.triangles() where not all triangles would be counted under certain conditions.
- Parallel edges in a graph no longer lead to incorrect Local Clustering Coefficient and Triangle Count results.
Improvements
- gds.fastRP now accepts integer iterationWeights
- If graphSage.train is run on a graph without relationships, GDS now fails gracefully with an appropriate error message
- Added validation that properties used by GraphSage exist on graph
- Added validation for embeddingSize >= 1
- Added a failIfExists flag to graph creation to enable a user to specify that if a graph already exists, it should be overwritten without failing.
- Progress logging:
  - We now log progress in equally spaced percentages. This is 0-100% either in steps of 1, or in larger steps if there are fewer than 100 batches. For example, if there are 50 batches, completing one batch means 2% progress, so it would log in steps of 2.
  - Decreased the logging frequency when running with a high concurrency.
- Added postProcessingMillis to gds.localClusteringCoefficient and gds.triangleCount for modes: mutate, write, stats
  - It is always zero for now, but this is a standard result column for these modes
- Parallelized computation of result statistics for the following community detection procedures:
  - gds.wcc.write, gds.wcc.mutate and gds.wcc.stats
  - gds.louvain.write, gds.louvain.mutate and gds.louvain.stats
  - gds.labelPropagation.write, gds.labelPropagation.mutate and gds.labelPropagation.stats
  - gds.beta.modularityOptimization.write and gds.beta.modularityOptimization.mutate
  - gds.alpha.scc.write
- Added graph schema to the result columns of gds.model.list and gds.model.drop
- Validate property existence (e.g. seedProperty) when running algorithms on Cypher projections.
- Improved memory estimation for * node projections.
- Introduced parallel graph construction to improve performance of Louvain and Node Similarity
- In-memory graphs in multidatabase:
  - When in-memory graphs are created, they are now associated with the database in use during creation time to prevent errors when running in a multi-database environment.
  - gds.graph.info() returns the database name the graph has been created on.
  - Named graphs can only be used on the database they have been created on.
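The progress-logging rule described above (steps of 1% when there are at least 100 batches, larger steps otherwise) can be sketched as follows. This is an illustration of the arithmetic only, not the GDS implementation. The function name progress_percentages is hypothetical, and using integer division to decide when a new percentage has been reached is an assumption.

```python
from typing import List


def progress_percentages(batch_count: int) -> List[int]:
    """Return the percentages that would be logged while batch_count batches complete."""
    if batch_count <= 0:
        raise ValueError("batch_count must be positive")
    logged: List[int] = []
    last_logged = 0
    for completed in range(1, batch_count + 1):
        # Whole-number percentage of batches completed so far.
        percent = completed * 100 // batch_count
        # Log only when a new whole percentage has been reached.
        if percent > last_logged:
            logged.append(percent)
            last_logged = percent
    return logged


if __name__ == "__main__":
    print(progress_percentages(50)[:5])   # first few entries for 50 batches
    print(len(progress_percentages(200)))  # number of log entries for 200 batches
```

With 50 batches every completed batch produces a log entry, in steps of 2%. With 200 batches only every second batch does, so the number of entries never exceeds 100.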