Graph Embeddings

Goals
In this guide, we will learn about graph embeddings.

Prerequisites
To use graph embeddings, you will need Neo4j (version 4.0 or later) and the Graph Data Science Library (version 2.0 or later) downloaded and installed.

Level: Intermediate

What are graph embeddings?

A graph embedding determines a fixed-length vector representation for each entity (usually nodes) in our graph. These embeddings are a lower-dimensional representation of the graph that preserves its topology.

Node embedding techniques usually consist of the following functions:

  • Similarity function: measures the similarity between nodes.

  • Encoder function: generates the node embedding.

  • Decoder function: reconstructs pairwise similarity.

  • Loss function: checks the quality of the reconstruction.
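
To make this encoder-decoder framework concrete, it is often written as follows (this notation comes from the general node-embedding literature, not from the Graph Data Science Library itself):

    \mathrm{ENC} : V \to \mathbb{R}^d, \qquad z_u = \mathrm{ENC}(u)

    \mathrm{DEC}(z_u, z_v) \approx s(u, v)

    \mathcal{L} = \sum_{(u,v)} \ell\left(\mathrm{DEC}(z_u, z_v),\, s(u, v)\right)

Here s(u, v) is the chosen similarity function, ENC maps each node u to its d-dimensional embedding z_u, DEC reconstructs pairwise similarity from a pair of embeddings, and the loss sums how far the reconstruction deviates from the measured similarity over the training pairs.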

Use cases for graph embeddings

There are several use cases that are well suited for graph embeddings:

  • We can visually explore the data by reducing the embeddings to 2 or 3 dimensions with the help of algorithms like t-distributed stochastic neighbor embedding (t-SNE) and Principal Component Analysis (PCA).

  • We could build a kNN similarity graph from the embeddings. The similarity graph could then be used to make recommendations as part of a k-Nearest Neighbors query (see the sketch after this list).

  • We can use the embeddings as the features to feed into a machine learning model, rather than generating those features by hand. In this use case, embeddings can be considered an implementation of Representational Learning.
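
As a sketch of that kNN use case, the library's gds.knn procedure can compare nodes by an embedding property. The projected graph name 'people' and the property names 'embedding' and 'name' below are illustrative assumptions; in practice the embedding property would first be added to the projection, for example with an embedding algorithm's mutate mode.

    // Hypothetical example: for each node, find the 5 most similar
    // nodes by comparing a previously computed 'embedding' property.
    CALL gds.knn.stream('people', {
      topK: 5,
      nodeProperties: ['embedding']
    })
    YIELD node1, node2, similarity
    RETURN gds.util.asNode(node1).name AS person1,  // 'name' is a hypothetical node property
           gds.util.asNode(node2).name AS person2,
           similarity
    ORDER BY similarity DESC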

Neo4j’s graph embeddings

The Neo4j Graph Data Science Library supports several graph embedding algorithms.

Name                     Speed          Supports node properties?   Implementation details
Fast Random Projection   Fast           Yes                         Linear Algebra
node2Vec                 Intermediate   No                          Neural Network
GraphSAGE                Slow           Yes                         Neural Network

All the embedding algorithms work on a monopartite undirected input graph.
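
For example, a monopartite undirected projection could be created as follows. The graph name 'people', node label 'Person', and relationship type 'KNOWS' are placeholders for illustration; substitute your own schema:

    // Project a monopartite, undirected in-memory graph.
    // 'Person' and 'KNOWS' are hypothetical schema elements.
    CALL gds.graph.project(
      'people',
      'Person',
      { KNOWS: { orientation: 'UNDIRECTED' } }
    )
    YIELD graphName, nodeCount, relationshipCount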

Fast Random Projection

The Fast Random Projection embedding uses sparse random projections to generate embeddings. It is an implementation of the FastRP algorithm. It is the fastest of the embedding algorithms and can therefore be useful for obtaining baseline embeddings. The embeddings it generates often perform as well as those produced by more complex algorithms that take longer to run. Additionally, the FastRP implementation supports the use of node properties as features for generating embeddings.
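
A minimal FastRP call might look like the following. The projected graph name 'people' is the hypothetical one from the projection above, the embedding dimension is an illustrative choice, and stream is only one of the available execution modes (mutate and write also exist):

    // Stream 128-dimensional FastRP embeddings for every node in the
    // projected graph 'people' (hypothetical graph name).
    CALL gds.fastRP.stream('people', {
      embeddingDimension: 128
    })
    YIELD nodeId, embedding
    RETURN gds.util.asNode(nodeId).name AS name, embedding
    LIMIT 10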

node2Vec

node2Vec computes embeddings based on biased random walks of a node’s neighborhood. The algorithm trains a single-layer feedforward neural network, which is used to predict the likelihood that a node will occur in a walk based on the occurrence of another node. node2Vec has parameters that can be tuned to control whether the random walks behave more like breadth-first or depth-first searches. This tuning allows the embedding to capture either homophily (similar embeddings capture network communities) or structural equivalence (similar embeddings capture similar structural roles of nodes).
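
A node2Vec call follows the same pattern; the returnFactor and inOutFactor parameters are the knobs that bias the walks between breadth-first-like and depth-first-like behavior (see the library documentation for their exact semantics). The graph name and parameter values here are illustrative, and the procedure sits in the beta namespace in earlier 2.x releases of the library (later releases drop the beta prefix):

    // Stream node2Vec embeddings from biased random walks.
    // The parameter values are illustrative, not recommendations.
    CALL gds.beta.node2vec.stream('people', {
      embeddingDimension: 128,
      walkLength: 80,
      returnFactor: 1.0,
      inOutFactor: 0.5
    })
    YIELD nodeId, embedding
    RETURN gds.util.asNode(nodeId).name AS name, embedding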

GraphSAGE

Like FastRP, GraphSAGE supports node properties. Training embeddings that include node properties can be useful for incorporating information beyond the topology of the graph, like metadata, attributes, or the results of other graph algorithms. GraphSAGE differs from the other algorithms in that it learns a function to calculate an embedding, rather than training an individual embedding for each node.
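
Because GraphSAGE learns a model, it is applied in two steps: train a named model, then use it to produce embeddings. A minimal sketch, assuming the hypothetical 'people' projection whose nodes carry a numeric 'age' property ('age' would need to be included in the projection's node properties); in earlier 2.x releases of the library these procedures live under gds.beta.graphSage, and later releases drop the beta prefix:

    // Step 1: train a GraphSAGE model on a hypothetical numeric
    // node property.
    CALL gds.beta.graphSage.train('people', {
      modelName: 'exampleSageModel',
      featureProperties: ['age'],
      embeddingDimension: 64
    })
    YIELD modelInfo

    // Step 2: compute an embedding for every node using the model.
    CALL gds.beta.graphSage.stream('people', {
      modelName: 'exampleSageModel'
    })
    YIELD nodeId, embedding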