Release Date: 10 October 2022
GDS 2.2.0 is compatible with Neo4j 4.3 versions ≥ 4.3.15 and 4.4 versions ≥ 4.4.9.
For GDS compatibility with previous releases of 4.3 and 4.4, please see GDS 2.1.6. The 2.1 series is also incompatible with Neo4j 3.5.x, 4.0, 4.1, and 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8.
Breaking changes
- Link Prediction filtering:
  - Change graph filtering in `gds.beta.pipeline.linkPrediction.train`:
    - Replace parameter `nodeLabels` with `sourceNodeLabel` and `targetNodeLabel`.
    - Replace parameter `relationshipTypes` with `targetRelationshipType`.
  - Change graph filtering in `gds.beta.pipeline.linkPrediction.predict`:
    - Replace parameter `nodeLabels` with optional `sourceNodeLabels` and `targetNodeLabels`. By default, they will be derived from the model's train configuration.
    - Change the default value for `relationshipTypes` to the `targetRelationshipType` from the model's train configuration.
- Node Classification & Regression filtering:
  - Change graph filtering in `gds.beta.pipeline.nodeClassification.train` and `gds.beta.pipeline.nodeRegression.train`:
    - Replace parameter `nodeLabels` with `targetNodeLabels`.
  - Change graph filtering in `gds.beta.pipeline.nodeClassification.predict` and `gds.beta.pipeline.nodeRegression.predict`:
    - Replace parameter `nodeLabels` with `targetNodeLabels`. By default, they will be derived from the model's train configuration.
- Promoting Collapse Path to beta tier
  - Changed the procedure name to `gds.beta.collapsePath.mutate`.
  - Use parameter `pathTemplates` to specify multiple path templates.
- Promoting CELF to `beta` tier
  - Moved `gds.alpha.influenceMaximization.celf.stream` to `gds.beta.influenceMaximization.celf.stream`.
- For graphs created with `gds.graph.project.cypher`, reduce the output of `gds.graph.list` to print only the names of `parameters`. This avoids printing the parameter values, which could lead to long procedure execution times.
- RandomWalk algorithm promoted to product tier
  - `gds.beta.randomWalk.stats` => `gds.randomWalk.stats`
  - `gds.beta.randomWalk.stats.estimate` => `gds.randomWalk.stats.estimate`
  - `gds.beta.randomWalk.stream` => `gds.randomWalk.stream`
  - `gds.beta.randomWalk.stream.estimate` => `gds.randomWalk.stream.estimate`
- Removed `debug_log` config field from Arrow Create Database action.
- Node2Vec uses new embedding initializer `NORMALIZED` as default.
- Dropped support for older patches:
  - for 4.3, only 4.3.15 and later are supported
  - for 4.4, only 4.4.9 and later are supported
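
The procedure moves listed above (RandomWalk to the product tier, CELF stream to the beta tier) can be applied to existing Cypher query strings mechanically. The following is a minimal sketch, not part of GDS: the helper name `migrate_procedure_names` and the lookup table are hypothetical, and only the old and new procedure names come from these notes. It does not touch the parameter changes in the pipeline procedures, which need manual review.

```python
import re

# Old name -> new name, copied from the renames in the Breaking changes list.
RENAMED_PROCEDURES = {
    "gds.beta.randomWalk.stats": "gds.randomWalk.stats",
    "gds.beta.randomWalk.stats.estimate": "gds.randomWalk.stats.estimate",
    "gds.beta.randomWalk.stream": "gds.randomWalk.stream",
    "gds.beta.randomWalk.stream.estimate": "gds.randomWalk.stream.estimate",
    "gds.alpha.influenceMaximization.celf.stream": "gds.beta.influenceMaximization.celf.stream",
}

# Match whole procedure names only: the lookarounds reject matches that are
# part of a longer dotted identifier (for example a ".estimate" variant).
_PATTERN = re.compile(
    r"(?<![\w.])(?:"
    + "|".join(re.escape(name) for name in sorted(RENAMED_PROCEDURES, key=len, reverse=True))
    + r")(?![\w.])"
)


def migrate_procedure_names(query: str) -> str:
    """Return the query with every renamed procedure replaced by its 2.2 name."""
    return _PATTERN.sub(lambda match: RENAMED_PROCEDURES[match.group(0)], query)
```

For example, `migrate_procedure_names("CALL gds.beta.randomWalk.stream('g', {})")` rewrites the call to use `gds.randomWalk.stream`, while queries that use other procedures are returned unchanged.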
New features
- Link Prediction filtering:
  - Supports heterogeneous LinkPrediction pipelines by letting you configure which node labels and relationship type to train on and predict for.
  - See Breaking changes above for more details.
- K-means:
  - Added centroids and average node-centroid distance to result for Mutate, Stats, and Write modes.
  - Added distance to centroid per node result in Stream mode.
  - Introduced a parameter `numberOfRestarts` that runs K-Means multiple times and picks the run with the minimum node-centroid distance.
  - Introduced a parameter `computeSilhouette` that, if enabled, computes silhouette-related metrics.
  - Introduced a parameter `initialSampler` which selects the sampling strategy for picking the first centroids.
    - Added the `K-means++` initialization algorithm, which can be enabled by setting `initialSampler=kmeans++`.
  - Introduced the parameter `seedCentroids`, which passes user-provided initial centroids to K-means (as an alternative to the sampling above).
- Introduced a new scaler `Center` for `ScaleProperties` that subtracts the mean from each value.
- Expose `penaltyL2` to configure the L2 regularization term of the loss function in `gds.beta.graphSage.train`.
- Add Multilayer Perceptron as a training method for node classification (`gds.alpha.pipeline.nodeClassification.addMLP`) and link prediction (`gds.alpha.pipeline.linkPrediction.addMLP`).
- Add `SAME_CATEGORY` feature type to `gds.beta.pipeline.linkPrediction.addFeature`.
- Added new procedure `gds.beta.graph.relationships.stream` that streams relationship topology.
- Added arrow export endpoint `gds.beta.graph.relationships.stream` that streams relationship topology.
- Added new procedure `gds.alpha.graph.sample.rwr` that creates a new graph projection by sampling using random walk with restarts.
- Added the ability to collapse multiple paths using `gds.beta.collapsePath.mutate`.
- Promoting CELF algorithm to `beta` tier.
  - Added `gds.beta.influenceMaximization.celf.stats`
  - Added `gds.beta.influenceMaximization.celf.mutate`
  - Added `gds.beta.influenceMaximization.celf.write`
  - Added progress tracking capabilities.
  - Added memory estimation.
- Progress tracking for KMeans algorithm.
- Memory estimation for KMeans.
  - Added `gds.alpha.kmeans.mutate.estimate`
  - Added `gds.alpha.kmeans.stats.estimate`
  - Added `gds.alpha.kmeans.stream.estimate`
  - Added `gds.alpha.kmeans.write.estimate`
- Added procedures to compute modularity for pre-computed communities.
  - `gds.alpha.modularity.stats`
  - `gds.alpha.modularity.stream`
- Added new config options to the GDS Flight server.
  - `gds.arrow.encryption.never` deactivates the server encryption even if it would otherwise be enabled.
  - `gds.arrow.advertised_listen_address` sets the server location that clients should connect to.
- Added support for importing `String` node identifiers for the Arrow `CREATE_DATABASE` action.
- Added capability to run BetweennessCentrality using relationship weights.
  - Added `relationshipWeightProperty` optional configuration parameter.
- Added `stats` mode procedures for RandomWalk.
  - `gds.beta.randomWalk.stats`
  - `gds.beta.randomWalk.stats.estimate`
- Introduced the ability to configure defaults and limits for configuration parameters.
  - `gds.alpha.config.defaults.list`
  - `gds.alpha.config.defaults.set`
  - `gds.alpha.config.limits.list`
  - `gds.alpha.config.limits.set`
- Introduce new configuration parameters `contextNodeLabels` and `contextRelationshipTypes` in nodePropertySteps.
  - `gds.beta.pipeline.linkPrediction.addNodeProperty`
  - `gds.beta.pipeline.nodeClassification.addNodeProperty`
  - `gds.alpha.pipeline.nodeRegression.addNodeProperty`
  - The context is used to enlarge the input graph to the node property steps when running `gds.beta.pipeline.linkPrediction.[train|predict]`, `gds.beta.pipeline.nodeClassification.[train|predict]` and `gds.alpha.pipeline.nodeRegression.[train|predict]`.
- Leiden
  - Add capability to mutate `intermediateCommunities` when `includeIntermediateCommunities` is set to `true`.
  - Add capability to write `intermediateCommunities` when `includeIntermediateCommunities` is set to `true`.
- Node2Vec adds new embedding initializer `NORMALIZED`, configured with the parameter `embeddingInitializer`.
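
The `numberOfRestarts` behaviour described in the K-means notes above (run the algorithm several times and keep the run with the smallest node-centroid distance) can be illustrated outside GDS. The sketch below is plain Python and is not the GDS implementation: it assumes Euclidean distance, uniform random initial centroids and a fixed iteration cap, and the function names `kmeans_once` and `kmeans_with_restarts` are hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_once(points, k, rng, max_iterations=20):
    """One K-means run; returns (centroids, average node-centroid distance)."""
    centroids = rng.sample(points, k)  # assumption: uniform initial sampling
    for _ in range(max_iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: distance(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    average = sum(min(distance(p, c) for c in centroids) for p in points) / len(points)
    return centroids, average


def kmeans_with_restarts(points, k, number_of_restarts, seed=42):
    """Run K-means several times and keep the run with the smallest average distance."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(number_of_restarts)]
    return min(runs, key=lambda run: run[1])
```

Calling `kmeans_with_restarts(points, k=2, number_of_restarts=5)` on a list of coordinate tuples returns the centroids and average distance of the best run; a single run corresponds to `number_of_restarts=1`.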
Bug fixes
- Fixed a bug where eager checking for business rules around GDS on a Neo4j cluster would cause the cluster to fail to start.
- Fixed a bug where Neo4j users with the `admin` role could not see all graphs in the catalog on GDS enterprise.
- Fixed a bug in random graph generation where the resulting graph could end up with an incorrect relationship schema.
- Fixed a bug where a schema filter would not create a deep copy of the property schema map.
- Fixed a bug where modularity could have been incorrectly updated in ModularityOptimization. This may affect the number of performed iterations for ModularityOptimization or number of levels for Louvain.
- Fixed a bug where restoring from CSV could not read values wrapped in quotes.
- Fixed a bug where KNN did not use the expected search space. This will improve the result but also increase the runtime.
- Fixed a bug in ML autotuning where `maxTrials` included model evaluations with concrete configs.
- Fixed a bug where `gds.triangleCount` and `gds.localClusteringCoefficient` were allowed to run on directed graphs.
- Fixed a bug in `gds.graph.export` and Arrow DB import where the `writeConcurrency` was not respected.
- Fixed a bug with Node Operations where `gds.graph.nodeProperties.write`, `gds.graph.nodeProperties.drop`, `gds.graph.nodeProperties.stream` and `gds.graph.nodeProperty.stream` would not accept `String` input for parameters `nodeLabels` and/or `nodeProperties`.
- Fixed a bug where Node2Vec would report negative losses.
- Fixed a bug with `gds.graph.nodeProperties.stream` and `gds.graph.nodeProperty.stream`, where the wrong nodes were returned when specifying a `nodeLabels` filter and using Arrow.
- Fixed a bug in the Louvain algorithm, where aggregating dense communities could potentially lead to an exception.
- Fixed a bug where model loading was attempted even for unlicensed users, which might cause database startup to fail.
Improvements
- Better error handling in K-means
- Improve memory estimation for `gds.beta.pipeline.linkPrediction.train` when the nodePropertySteps use a weighted graph.
- Improve runtime of feature generation in `gds.beta.linkPrediction.[train|predict]`.
- Improve performance of `gds.graph.project.cypher` by using the subscriber API.
- Improve convergence criteria for `LogisticRegression` and `LinearRegression` trainers by making them independent of the number of batches. This affects `gds.alpha.pipeline.nodeRegression.train` and `gds.beta.pipeline.[linkPrediction|nodeClassification].train`.
- Improve error handling on invalid user input.
- Cypher on GDS projections is now capable of setting labels on nodes.
- Promoting CELF algorithm to `beta` tier.
  - Improved performance.
- Added a new column `serverLocation` in `gds.debug.arrow()` that shows the actual location where the server is running, which might be different from the configured location.
- Improve runtime of KNN by reusing similarity computations. This also affects `gds.beta.pipeline.linkPrediction.predict` when using the approximate search strategy.
- Configuration keys for Node-/Relationship- and Property-Projections are now case-insensitive.
- The `gds.debug.sysInfo` procedure now shows the license expiration date when run with a valid GDS license.
- Role-based access control is now available to everyone, not only to licensed, commercial users.
- The arrow create database endpoint is now capable of creating properties with an improved range of property types: string, string[], datetime, local datetime.
- Improve error message thrown when calling `gds.beta.[nodeClassification|linkPrediction].[train|predict]` on graphs that are too small.
- Added a new, optional method `close` to `PregelComputation`, allowing implementers to close any opened resources, such as ThreadLocals.
- Added new feature toggle procedures for enabling / disabling Arrow database import (default: enabled)
  - `gds.features.enableArrowDatabaseImport`
  - `gds.features.enableArrowDatabaseImport.reset`
- Runtime improvements for `RandomWalk`, especially for the case of first order random walks.
Other changes
- Renamed and deprecated some graph management procedures:
  - Renamed `gds.alpha.graph.removeGraphProperty` to `gds.alpha.graph.graphProperty.drop`.
  - Renamed `gds.alpha.graph.streamGraphProperty` to `gds.alpha.graph.graphProperty.stream`.
  - Deprecated `gds.graph.removeNodeProperties` in favor of the new procedure `gds.graph.nodeProperties.drop`.
  - Deprecated `gds.graph.streamNodeProperties` in favor of the new procedure `gds.graph.nodeProperties.stream`.
  - Deprecated `gds.graph.streamNodeProperty` in favor of the new procedure `gds.graph.nodeProperty.stream`.
  - Deprecated `gds.graph.streamRelationshipProperties` in favor of the new procedure `gds.graph.relationshipProperties.stream`.
  - Deprecated `gds.graph.streamRelationshipProperty` in favor of the new procedure `gds.graph.relationshipProperty.stream`.
  - Deprecated `gds.graph.writeNodeProperties` in favor of the new procedure `gds.graph.nodeProperties.write`.
  - Deprecated `gds.graph.writeRelationship` in favor of the new procedure `gds.graph.relationship.write`.
  - Deprecated `gds.graph.deleteRelationships` in favor of the new procedure `gds.graph.relationships.drop`.
- CSV format changes for backup/restore
  - Export no longer writes the databaseId field.
  - Import can still read the databaseId field but does not use it.
- Deprecated `enableDebugLog` config option for `gds.graph.export`.