Release Date: 12 August 2019
Breaking changes
- LabelPropagation: Default parameter values for seeding and writing have been removed – users must specify property names for these parameters.
New features
- MemRec: implemented memory requirements procedure for Cypher projected graphs and named graphs, as well as LabelPropagation.
- PageRank: PageRank now includes a tolerance parameter, which allows the user to run PageRank until values stabilize within the specified tolerance window. Previous versions only allowed a user to specify the number of iterations; with tolerance, users can terminate PageRank early once values have stabilized, or run as many iterations as needed to reach a stable value. The best practice is to specify both a tolerance and a maximum number of iterations.
- ConnectedComponents (Union Find):
  - Accepts a seedProperty parameter to set initial partition values (when running Connected Components multiple times over a graph, after the addition of new data, this enables users to preserve existing partition IDs).
  - Accepts a consecutiveIds parameter to allow users to specify that partitions should be labeled with successive integers (no gaps).
- Label Propagation: updates to label propagation enable it to return identical results when run multiple times on the same graph, and to select the smaller community label when breaking ties. These changes make our LPA implementation compatible with the LDBC Graphalytics benchmark, and make it possible for users to run LPA in production settings where maintaining consistent results across multiple runs is an important requirement.
- Beta Namespace: as the product team implements new algorithms or variants of existing algorithms, we have added a beta namespace to give users early access to these new features, without breaking existing syntax. The first algorithm with a beta implementation is label propagation (algo.beta.labelPropagation), which previews new syntax for seeding.
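The interplay between a tolerance and a maximum iteration count, described for PageRank above, can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the function name `pagerank`, its parameters, and the stopping criterion (largest per-node change between two iterations) are assumptions made for the example.

```python
def pagerank(out_links, damping=0.85, tolerance=1e-6, max_iterations=100):
    """Power-iteration PageRank that stops early once scores stabilize.

    out_links maps every node to the list of nodes it links to; each target
    must also appear as a key. Returns (scores, iterations_run).
    All names here are hypothetical and chosen for illustration only.
    """
    nodes = list(out_links)
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}
    iterations_run = 0
    for iterations_run in range(1, max_iterations + 1):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(scores[v] for v in nodes if not out_links[v])
        new_scores = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            for t in targets:
                new_scores[t] += damping * scores[v] / len(targets)
        # Largest change of any single score in this iteration.
        max_change = max(abs(new_scores[v] - scores[v]) for v in nodes)
        scores = new_scores
        if max_change < tolerance:
            break  # values have stabilized within the tolerance window
    return scores, iterations_run


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    scores, iterations = pagerank(graph, tolerance=1e-6, max_iterations=200)
    print(iterations, scores)
```

Passing both values means the loop ends as soon as scores settle, while the iteration cap still bounds the runtime on graphs that converge slowly.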
Bug fixes
- Huge Graph: Fixed a batch offset bug which sometimes caused the loader to stop reading properties after the first 260M nodes when importing a subgraph.
Improvements
- Error handling: Algorithms throw an exception if a user-specified property does not exist for any node or relationship in the graph.
- Union Find: New implementation for parallel union find that makes better use of available threads and consumes less memory.
- LabelPropagation: if no writeProperty is specified, label propagation will not write partition values to the graph (to prevent inadvertently overwriting values), and the partitionProperty parameter has been renamed seedProperty to reflect its functionality.
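To make the seedProperty and consecutiveIds options for Connected Components (listed under New features) more concrete, here is a small union-find sketch in Python. It is an illustration under stated assumptions, not the library's parallel implementation: the function `connected_components`, its arguments, and the rule that unseeded components receive fresh IDs above the largest seed are all hypothetical. It also assumes the seeds come from an earlier run, so two separate components never share a seed.

```python
def connected_components(nodes, edges, seeds=None, consecutive_ids=False):
    """Return {node: partition_id} using a simple sequential union-find.

    seeds: optional {node: partition_id} from an earlier run (hypothetical
    stand-in for a seed property stored on the nodes).
    consecutive_ids: relabel partitions as 0, 1, 2, ... with no gaps.
    """
    seeds = seeds or {}
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Group nodes by their component root.
    members = {}
    for v in nodes:
        members.setdefault(find(v), []).append(v)

    # Reuse the smallest seed found in a component; otherwise assign a
    # fresh ID above every existing seed (an assumption of this sketch).
    next_id = max(seeds.values(), default=-1) + 1
    root_to_id = {}
    for root, group in members.items():
        seeded = [seeds[v] for v in group if v in seeds]
        if seeded:
            root_to_id[root] = min(seeded)
        else:
            root_to_id[root] = next_id
            next_id += 1

    if consecutive_ids:
        distinct = sorted(set(root_to_id.values()))
        remap = {old: new for new, old in enumerate(distinct)}
        root_to_id = {root: remap[cid] for root, cid in root_to_id.items()}

    return {v: root_to_id[find(v)] for v in nodes}
```

With seeds in place, a node added after the first run joins an existing partition under that partition's old ID instead of causing every partition to be renumbered.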
Deprecations
- Connected Components (Union Find): algo.unionFind.forkJoin, algo.unionFind.forkJoinMerge, and algo.unionFind.queue, along with their streaming variants, have all been deprecated in favor of algo.unionFind. The improved parallel implementation of unionFind obviates the need for these experimental implementations.
Syntax changes
- LabelPropagation: partitionProperty has been renamed to seedProperty.
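As a rough illustration of what seeding and deterministic tie-breaking mean for label propagation, the following Python sketch starts each node from a seed label where one exists and always picks the smallest label among equally frequent candidates. It is not the library's implementation: the function `label_propagation`, the `seeds` dictionary standing in for a seed property, the use of the node's own ID as the default label, and the fixed node visiting order are assumptions made for the example.

```python
from collections import Counter


def label_propagation(neighbors, seeds=None, max_iterations=10):
    """Deterministic label propagation on an undirected graph.

    neighbors maps each node ID (an integer) to a list of adjacent node IDs.
    seeds optionally maps node IDs to initial community labels; unseeded
    nodes start with their own ID as label (an assumption of this sketch).
    """
    seeds = seeds or {}
    labels = {v: seeds.get(v, v) for v in neighbors}
    for _ in range(max_iterations):
        changed = False
        # Visiting nodes in a fixed order keeps repeated runs identical.
        for v in sorted(neighbors):
            if not neighbors[v]:
                continue  # isolated nodes keep their label
            counts = Counter(labels[u] for u in neighbors[v])
            top = max(counts.values())
            # Tie-break: choose the smallest label among the most frequent.
            best = min(label for label, count in counts.items() if count == top)
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(label_propagation(graph))
    print(label_propagation(graph, seeds={0: 7, 1: 7, 2: 7}))
```

Because neither the visiting order nor the tie-breaking rule involves randomness, running the function twice on the same graph returns the same labels.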