Learn Why Graph Data Science 2.1 Is Better Than Ever

We don’t take breaks at Neo4j – we’re following up GraphConnect with yet another awesome release of the Graph Data Science (GDS) library. Our engineers are constantly raising the bar, and the highlights of this release include:

Node regression pipelines

Autotuning for ML pipelines

Apache Arrow integration for graph projection, database creation, and graph export

ML performance improvements

With this release, not only do you get more algorithms than ever before, but you also get access to the easiest-to-use and most scalable framework available.

Want to learn more? Keep reading!

Making Graph Data Science Simple

With every release, we deliver features that make graph data science easier to use. The 2.1 update delivers on that promise with autotuning for machine learning pipelines, source and target filtering for similarity algorithms, and visual progress logging in the Python client.

Autotuning: ML pipelines (nodeClassification, nodeRegression, linkPrediction) now support automated hyperparameter tuning. Users configure the search, and Neo4j Graph Data Science finds the parameter combinations that produce the best-performing models.
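To make the idea concrete, here is a minimal random-search sketch in Python. It assumes random search as the tuning strategy, and the objective `validation_error`, the parameter names `penalty` and `max_epochs`, and the `max_trials` budget are hypothetical placeholders for "train a candidate model and score it on a validation split"; none of this is the GDS API.

```python
import random


def validation_error(penalty: float, max_epochs: int) -> float:
    # Hypothetical stand-in for "train a model with these settings and score it
    # on a validation split". Lower is better; the minimum is at
    # penalty=0.5, max_epochs=100.
    return (penalty - 0.5) ** 2 + ((max_epochs - 100) / 100) ** 2


def random_search(search_space, max_trials, seed=42):
    """Sample max_trials configurations from the given ranges and keep the best one."""
    rng = random.Random(seed)  # seeded so results are reproducible
    best_config, best_score = None, float("inf")
    for _ in range(max_trials):
        config = {}
        for name, (low, high) in search_space.items():
            # Integer bounds -> sample an integer; otherwise sample a float.
            if isinstance(low, int) and isinstance(high, int):
                config[name] = rng.randint(low, high)
            else:
                config[name] = rng.uniform(low, high)
        score = validation_error(**config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space: a float range and an integer range.
search_space = {"penalty": (0.0, 1.0), "max_epochs": (10, 200)}
best, err = random_search(search_space, max_trials=50)
print(best, err)
```

In a real pipeline every trial trains a full model, so the trial budget is the main knob for trading tuning time against model quality.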
Source and target filtering for KNN and Node Similarity: Similarity algorithms are among the most popular in the library, but users often do not need to compare every possible pair of nodes in their graph. Source and target filtering lets users restrict similarity calculations to the nodes that are relevant to each use case.
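The effect of filtering can be sketched in a few lines of Python. This is a brute-force cosine-similarity KNN over hypothetical embeddings, not the GDS implementation (GDS's KNN uses an approximate, sampling-based approach rather than comparing every pair); it only shows how restricting sources and targets shrinks the work.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_knn(embeddings, sources, targets, k):
    """Brute-force top-k: compare each source node only against the target nodes."""
    results = {}
    for s in sources:
        scored = [(cosine(embeddings[s], embeddings[t]), t) for t in targets if t != s]
        # Highest similarity first; ties broken by node name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        results[s] = [(t, round(score, 3)) for score, t in scored[:k]]
    return results


# Hypothetical 2-D embeddings for five nodes.
embeddings = {
    "alice": [1.0, 0.0],
    "bob": [0.9, 0.1],
    "carol": [0.0, 1.0],
    "dave": [0.1, 0.9],
    "eve": [0.7, 0.7],
}

# Only alice and carol are sources; only bob and dave are candidate targets.
result = filtered_knn(embeddings, sources=["alice", "carol"], targets=["bob", "dave"], k=1)
print(result)
```

Here the sketch scores 4 source-target pairs instead of the 20 ordered pairs an unfiltered run over all five nodes would score, and `eve` is never considered at all.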

Visual Progress Logging in the Graph Data Science Python Client: When users run algorithms or project graphs, the client now displays a progress bar showing the status of the running task.

Essential Data Science Capabilities

New alpha tier algorithm – Leiden: a hierarchical community detection algorithm that guarantees well-connected communities. Leiden is similar to Louvain, and users have requested it because it produces more cohesive communities.
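A full Leiden implementation is too long for a snippet, but the property it guarantees is easy to check. The Python sketch below, on a hypothetical toy graph, tests whether each community induces a connected subgraph. Louvain can produce a community that fails this test, for example when the node that held it together is moved to another community; Leiden's refinement phase is designed to prevent that.

```python
from collections import deque


def is_connected(adjacency, members):
    """BFS restricted to `members`: True if they form one connected piece."""
    members = set(members)
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in members and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == members


def disconnected_communities(adjacency, community_of):
    """Return the ids of communities whose induced subgraph is not connected."""
    communities = {}
    for node, community in community_of.items():
        communities.setdefault(community, set()).add(node)
    return sorted(
        c for c, members in communities.items() if not is_connected(adjacency, members)
    )


# Hypothetical undirected graph: a-b and c-d are only linked through e.
edges = [("a", "b"), ("c", "d"), ("b", "e"), ("e", "c")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

# Community 0 lost its bridge node e, so it falls apart into {a, b} and {c, d}.
split = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1}
# Every community here is internally connected.
connected = {"a": 0, "b": 0, "e": 0, "c": 1, "d": 1}

print(disconnected_communities(adjacency, split))      # [0]
print(disconnected_communities(adjacency, connected))  # []
```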

New alpha tier algorithm – K-means clustering: a community detection algorithm that clusters nodes based on their properties (such as embeddings). Users specify the number of clusters, and Graph Data Science groups nodes with similar property values into that many clusters.
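A minimal sketch of the classic Lloyd's algorithm for k-means, in Python, shows how the clustering works: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until assignments stop changing. The random initialization, the iteration cap, and the toy 2-D "embeddings" are assumptions for this sketch and may differ from what GDS does internally.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical 2-D node embeddings forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Because the result depends on the initial centroids, k-means converges to a local optimum; running it with several seeds and keeping the tightest clustering is a common remedy.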
New alpha tier ML pipeline – Node Regression: users can predict numerical property values for nodes using node regression pipelines. Node regression lets users fill in missing property values based on other node properties and graph topology.
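The sketch below illustrates the idea with ordinary least-squares linear regression in plain Python. Each node's features are its own property value plus the mean value of its neighbors (a simple way to bring in graph topology); the model is fit on nodes with known targets and then fills in the missing target of node `e`. The graph, the property values, and the feature choice are hypothetical, and this is not how GDS computes features or trains models internally.

```python
def neighbor_mean(graph, values, node):
    """Average property value over a node's neighbors (0.0 if it has none)."""
    neighbors = graph[node]
    return sum(values[n] for n in neighbors) / len(neighbors) if neighbors else 0.0


def features(graph, props, node):
    """Feature vector: [bias term, own property, mean property of neighbors]."""
    return [1.0, props[node], neighbor_mean(graph, props, node)]


def fit_least_squares(X, y):
    """Ordinary least squares: solve (X^T X) w = X^T y with Gaussian elimination."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


# Hypothetical undirected graph (adjacency lists) and a numeric node property.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c", "e"], "e": ["d"]}
props = {"a": 1.0, "b": 4.0, "c": 2.0, "d": 0.0, "e": 3.0}

# Known targets; node "e" is missing its value. These were generated from
# target = 1 + 2 * own property + 3 * neighbor mean, so the fit is exact.
known = {"a": 12.0, "b": 12.0, "c": 6.5, "d": 8.5}

train_nodes = sorted(known)
weights = fit_least_squares(
    [features(graph, props, n) for n in train_nodes], [known[n] for n in train_nodes]
)
prediction = sum(w * x for w, x in zip(weights, features(graph, props, "e")))
print(weights, prediction)  # weights approximately [1, 2, 3], prediction approximately 7.0
```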

Enterprise Scalability

Apache Arrow Integration for Graph Projections: import massive graphs into, and export them out of, Graph Data Science at speeds of up to 30 million objects per second.

Note: Apache Arrow integration for graph projections is available to Graph Data Science Enterprise Edition customers only.

Performance Improvements for Machine Learning: Through optimization of internal machine learning code, GraphSAGE embedding training is up to 90 percent faster, Random Forest model training is up to 80 percent faster, and Logistic Regression training is up to 40 percent faster.

But Wait, There’s More!

Graph Data Science Python Client Improvements: The Graph Data Science Python Client can automatically use Apache Arrow for data movement on Enterprise-licensed instances. Users can now choose the shape of the returned data frames when streaming node properties or relationship results (pivoting rows into columns). The client supports all Graph Data Science 2.1 features.
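Pivoting here means turning a long result, with one row per (node, property) pair, into a wide result with one row per node and one column per property. The plain-Python sketch below shows that reshaping on hypothetical rows, assuming they are shaped like `(nodeId, nodeProperty, propertyValue)`; it illustrates the transformation only and is not the Python client's own code.

```python
def pivot_node_properties(rows):
    """Reshape long (nodeId, nodeProperty, propertyValue) rows into one dict per node."""
    wide = {}
    for node_id, prop, value in rows:
        # Create the node's row on first sight, then add this property as a column.
        wide.setdefault(node_id, {"nodeId": node_id})[prop] = value
    return [wide[node_id] for node_id in sorted(wide)]


# Hypothetical long-format result: one row per (node, property) pair.
long_rows = [
    (0, "pageRank", 0.15),
    (0, "communityId", 3),
    (1, "pageRank", 0.42),
    (1, "communityId", 7),
]

for row in pivot_node_properties(long_rows):
    print(row)
# {'nodeId': 0, 'pageRank': 0.15, 'communityId': 3}
# {'nodeId': 1, 'pageRank': 0.42, 'communityId': 7}
```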
The Neo4j Data Warehouse Connector offers a simple way to move data between a Neo4j database and data warehouses such as Snowflake, Google BigQuery, Amazon Redshift, or Microsoft Azure Synapse Analytics. It can be run as a Spark Submit job (configured with a JSON file) or through a Scala/Python API that simplifies writing the Spark job that moves data between the Neo4j database and the data warehouse.