GraphSAGE
This feature is in the beta tier. For more information on feature tiers, see API Tiers.
Glossary
- Directed: The algorithm is well-defined on a directed graph.
- Directed: The algorithm ignores the direction of the graph.
- Directed: The algorithm does not run on a directed graph.
- Undirected: The algorithm is well-defined on an undirected graph.
- Undirected: The algorithm ignores the undirectedness of the graph.
- Heterogeneous nodes fully supported: The algorithm has the ability to distinguish between nodes of different types.
- Heterogeneous nodes allowed: The algorithm treats all selected nodes similarly regardless of their label.
- Heterogeneous relationships fully supported: The algorithm has the ability to distinguish between relationships of different types.
- Heterogeneous relationships allowed: The algorithm treats all selected relationships similarly regardless of their type.
- Weighted relationships: The algorithm supports a relationship property to be used as weight, specified via the relationshipWeightProperty configuration parameter.
- Weighted relationships: The algorithm treats each relationship as equally important, discarding the value of any relationship weight.
GraphSAGE is an inductive algorithm for computing node embeddings. GraphSAGE uses node feature information to generate node embeddings on unseen nodes or graphs. Instead of training individual embeddings for each node, the algorithm learns a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood.
The algorithm is defined for UNDIRECTED graphs.
For more information on this algorithm, see:
- W. L. Hamilton, R. Ying, and J. Leskovec. "Inductive Representation Learning on Large Graphs." 2017. https://arxiv.org/abs/1706.02216
Considerations
Isolated nodes
If you are embedding a graph that has an isolated node, the aggregation step in GraphSAGE can only draw information from the node itself.
When all the properties of that node are 0.0, and the activation function is ReLU, this leads to an all-zero vector for that node.
However, since GraphSAGE normalizes node embeddings using the L2-norm, and a zero vector cannot be normalized, we assign all-zero embeddings to such nodes under these special circumstances.
In scenarios where you generate all-zero embeddings for orphan nodes, this may have an impact on downstream tasks such as nearest neighbor or other similarity algorithms. It may be more appropriate to filter out these disconnected nodes prior to running GraphSAGE, as sketched below.
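A minimal sketch of such filtering, assuming the catalog procedure gds.graph.filter is available in your GDS version; the graph names persons and personsConnected are hypothetical:

CALL gds.degree.mutate('persons', { mutateProperty: 'degree' })
YIELD nodePropertiesWritten;

CALL gds.graph.filter(
  'personsConnected',  // hypothetical name for the filtered graph
  'persons',           // the original projection
  'n.degree > 0.0',    // keep only nodes with at least one relationship
  '*'                  // keep all relationships
)
YIELD nodeCount, relationshipCount;

GraphSAGE can then be trained on personsConnected instead of the full graph.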
Memory estimation
When doing memory estimation of the training, the feature dimension is computed as if each feature property is scalar.
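To see the estimated memory requirements before committing to a training run, the usual estimate mode can be applied; a sketch, assuming gds.beta.graphSage.train supports the standard .estimate suffix and using a hypothetical model name:

CALL gds.beta.graphSage.train.estimate(
  'persons',
  {
    modelName: 'estimateModel',  // hypothetical model name
    featureProperties: ['age', 'heightAndWeight']
  }
) YIELD requiredMemory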
Graph pre-sampling to reduce time and memory
Since training a GraphSAGE model may take a lot of time and memory on large graphs, it can be helpful to sample a smaller subgraph prior to training, and then training on that subgraph. The trained model can still be applied to predict embeddings on the full graph (or other graphs) since GraphSAGE is inductive. To sample a structurally representative subgraph, see Random walk with restarts sampling.
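A sketch of such pre-sampling, assuming the sampling procedure gds.graph.sample.rwr is available; the subgraph name personsSample and the samplingRatio value are hypothetical examples:

CALL gds.graph.sample.rwr(
  'personsSample',        // hypothetical name for the sampled graph
  'persons',              // the original projection
  { samplingRatio: 0.5 }  // keep roughly half of the nodes
) YIELD nodeCount, relationshipCount

Training on personsSample and then predicting on the original graph works because the learned aggregation function is not tied to specific nodes.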
Usage in machine learning pipelines
It may be useful to generate node embeddings with GraphSAGE as a node property step in a machine learning pipeline (like Link prediction pipelines and Node property prediction).
Training the GraphSAGE model inside the pipeline is not supported; the model must first be trained outside the pipeline.
Once the model is trained, it is possible to add GraphSAGE as a node property step to a pipeline, using gds.beta.graphSage or the shorthand beta.graphSage as the procedureName procedure parameter, and referencing the trained model in the procedure configuration map as one would with the predict mutate mode.
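A minimal sketch of this, using the link prediction pipeline procedures; the pipeline name pipe and the model name graphSage are hypothetical, and the model is assumed to have been trained beforehand with gds.beta.graphSage.train:

CALL gds.beta.pipeline.linkPrediction.create('pipe');

CALL gds.beta.pipeline.linkPrediction.addNodeProperty(
  'pipe',
  'beta.graphSage',          // the shorthand procedureName
  {
    mutateProperty: 'embedding',
    modelName: 'graphSage'   // the pre-trained model in the model catalog
  }
);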
Tuning parameters
In general, the best choice of tuning parameters depends strongly on the specific dataset.
Embedding dimension
The size of the node embedding as well as its hidden layer. A large embedding size captures more information, but increases the required memory and computation time. A small embedding size is faster, but can cause the input features and graph topology to be insufficiently encoded in the embedding.
Aggregator
An aggregator defines how to combine a node’s embedding and the sampled neighbor embeddings from the previous layer.
GDS supports the Mean and Pool aggregators.
Mean is simpler, requires less memory, and is faster to compute.
Pool is more complex and can encode a richer neighbourhood.
Activation function
The activation function is used to transform the inputs of the neurons in the neural network.
We support Sigmoid and leaky ReLU.
Sample sizes
Each sample size represents a hidden layer with an output of size equal to the embedding dimension.
The layer uses the given aggregator and activation function.
More layers result in more distant neighbors being considered for a node’s embedding.
Layer N uses the sampled neighbor embeddings of distance <= N, computed at layer N - 1.
The more layers, the higher the memory and computation time.
A sample size n means we try to sample at most n neighbors from a node.
Higher sample sizes also require more memory and computation time.
Batch size
This parameter defines how many training examples are grouped in a single batch.
For each training example, we will also sample a positive and a negative example.
The gradients are computed concurrently on the batches, using as many threads as specified by the concurrency parameter.
The batch size does not affect the model quality, but can be used to tune for training speed. A larger batch size increases the memory consumption of the computation.
Epochs
This parameter defines the maximum number of epochs for the training. Before each epoch, new neighbors are sampled for each layer as specified in Sample sizes. Independent of the model’s quality, the training will terminate after this many epochs. Note that the training can also stop earlier, if the loss converged (see Tolerance).
Setting this parameter can be useful to limit the training time for a model. Restricting the computational budget can serve the purpose of regularization and mitigate overfitting, which becomes a risk with a large number of epochs.
Because each epoch resamples neighbors, multiple epochs avoid overfitting on specific neighborhoods.
Max Iterations
This parameter defines the maximum number of iterations run for a single epoch. Each iteration uses the gradients of randomly sampled batches, which are summed and scaled before updating the weights. The number of sampled batches is defined via Batch sampling ratio. After each iteration, it is also checked whether the loss has converged (see Tolerance).
A high number of iterations can lead to overfitting for a specific sample of neighbors.
Batch sampling ratio
This parameter defines the number of batches to sample for a single iteration.
The more batches are sampled, the more accurate the gradient computation will be. However, more batches also increase the runtime of each single iteration.
In general, it is recommended to use at least as many batches as the defined concurrency.
Search depth
This parameter defines the maximum depth of the random walks which sample positive examples for each node in a batch.
How close similar nodes are depends on your dataset and use case.
Negative-sample weight
This parameter defines the weight of the negative samples compared to the positive samples in the loss computation. Higher values increase the impact of negative samples in the loss and decrease the impact of the positive samples.
Penalty L2
This parameter defines the influence of the regularization term on the loss function. The L2 penalty term is computed over all the weights from the layers defined based on the Aggregator and Sample sizes.
While the regularization can avoid overfitting, a high value can even lead to underfitting. The minimal value is zero, where the regularization term has no effect at all.
Learning rate
When updating the weights, we move in the direction dictated by the Adam optimizer based on the loss function’s gradients. The learning rate parameter dictates how much to update the weights after each iteration.
Tolerance
This parameter defines the convergence criteria of an epoch.
An epoch converges if the loss of the current iteration and the loss of the previous iteration differ by less than the tolerance.
A lower tolerance results in more sensitive training, with a higher probability of training longer. A higher tolerance means less sensitive training and hence earlier convergence.
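The sketch below gathers the tuning parameters discussed above into a single train call on the 'persons' graph projected in the Examples section; all values are hypothetical starting points, not recommendations:

CALL gds.beta.graphSage.train(
  'persons',
  {
    modelName: 'tunedModel',   // hypothetical model name
    featureProperties: ['age', 'heightAndWeight'],
    embeddingDimension: 64,
    aggregator: 'pool',
    activationFunction: 'relu',
    sampleSizes: [25, 10],     // two layers
    batchSize: 100,
    epochs: 10,
    maxIterations: 10,
    batchSamplingRatio: 1.0,
    searchDepth: 5,
    negativeSampleWeight: 20,
    penaltyL2: 0.001,
    learningRate: 0.05,
    tolerance: 0.0001
  }
) YIELD modelInfo
RETURN modelInfo.metrics.didConverge AS didConverge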
Syntax
CALL gds.beta.graphSage.train(
graphName: String,
configuration: Map
) YIELD
modelInfo: Map,
configuration: Map,
trainMillis: Integer
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName | String | | no | The name of a graph stored in the catalog. |
configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
modelName | String | | no | The name of the model to train; must not already exist in the Model Catalog. |
featureProperties | List of String | | no | The names of the node properties that should be used as input features. All property names must exist in the projected graph and be of type Float or List of Float. |
nodeLabels | List of String | | yes | Filter the named graph using the given node labels. Nodes with any of the given labels will be included. |
relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types. Relationships with any of the given types will be included. |
concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
jobId | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
logProgress | Boolean | | yes | If disabled the progress percentage will not be logged. |
embeddingDimension | Integer | | yes | The dimension of the generated node embeddings as well as their hidden layer representations. |
aggregator | String | | yes | The aggregator to be used by the layers. Supported values are "Mean" and "Pool". |
activationFunction | String | | yes | The activation function to be used in the model architecture. Supported values are "Sigmoid" and "ReLu". |
sampleSizes | List of Integer | | yes | A list of Integer values; the size of the list determines the number of layers, and the values determine how many nodes will be sampled by the layers. |
projectedFeatureDimension | Integer | | yes | The dimension of the projected feature properties; specifying it enables the multi-label mode. |
batchSize | Integer | | yes | The number of nodes per batch. |
tolerance | Float | | yes | Tolerance used for the early convergence of an epoch, which is checked after each iteration. |
learningRate | Float | | yes | The learning rate determines the step size at each iteration while moving toward a minimum of the loss function. |
epochs | Integer | | yes | Number of times to traverse the graph. |
maxIterations | Integer | | yes | Maximum number of iterations per epoch. Each iteration the weights are updated. |
batchSamplingRatio | Float | | yes | Sampling ratio of batches to consider per weight update. By default, each thread evaluates a single batch. |
searchDepth | Integer | | yes | Maximum depth of the random walks used to sample nearby nodes for the training. |
negativeSampleWeight | Integer | | yes | The weight of the negative samples. |
relationshipWeightProperty | String | | yes | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted. |
randomSeed | Integer | | yes | A random seed which is used to control the randomness in computing the embeddings. |
penaltyL2 | Float | | yes | The influence of the L2 penalty term on the loss function. |
storeModelToDisk | Boolean | | yes | Automatically store the model to disk after training. |
Name | Type | Description |
---|---|---|
modelInfo | Map | Details of the trained model. |
configuration | Map | The configuration used to run the procedure. |
trainMillis | Integer | Milliseconds to train the model. |
Name | Type | Description |
---|---|---|
modelName | String | The name of the trained model. |
modelType | String | The type of the trained model. Always "graphSage". |
metrics | Map | Metrics related to running the training; details in the table below. |
Name | Type | Description |
---|---|---|
ranEpochs | Integer | The number of epochs run during training. |
epochLosses | List | The average loss per node after each epoch. |
 | List of List of Float | The average loss per node after each iteration, for each epoch. |
didConverge | Boolean | Indicates if the training has converged. |
CALL gds.beta.graphSage.stream(
graphName: String,
configuration: Map
) YIELD
nodeId: Integer,
embedding: List
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName | String | | no | The name of a graph stored in the catalog. |
configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
modelName | String | | no | The name of a GraphSAGE model in the model catalog. |
nodeLabels | List of String | | yes | Filter the named graph using the given node labels. Nodes with any of the given labels will be included. |
relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types. Relationships with any of the given types will be included. |
concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
jobId | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
logProgress | Boolean | | yes | If disabled the progress percentage will not be logged. |
batchSize | Integer | | yes | The number of nodes per batch. |
Name | Type | Description |
---|---|---|
nodeId | Integer | The Neo4j node ID. |
embedding | List of Float | The computed node embedding. |
CALL gds.beta.graphSage.mutate(
graphName: String,
configuration: Map
)
YIELD
nodeCount: Integer,
nodePropertiesWritten: Integer,
preProcessingMillis: Integer,
computeMillis: Integer,
mutateMillis: Integer,
configuration: Map
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName | String | | no | The name of a graph stored in the catalog. |
configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
modelName | String | | no | The name of a GraphSAGE model in the model catalog. |
mutateProperty | String | | no | The node property in the GDS graph to which the embedding is written. |
nodeLabels | List of String | | yes | Filter the named graph using the given node labels. |
relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types. |
concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
jobId | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
batchSize | Integer | | yes | The number of nodes per batch. |
Name | Type | Description |
---|---|---|
nodeCount | Integer | The number of nodes processed. |
nodePropertiesWritten | Integer | The number of node properties written. |
preProcessingMillis | Integer | Milliseconds for preprocessing data. |
computeMillis | Integer | Milliseconds for running the algorithm. |
mutateMillis | Integer | Milliseconds for writing result data back to the projected graph. |
configuration | Map | The configuration used for running the algorithm. |
CALL gds.beta.graphSage.write(
graphName: String,
configuration: Map
)
YIELD
nodeCount: Integer,
nodePropertiesWritten: Integer,
preProcessingMillis: Integer,
computeMillis: Integer,
writeMillis: Integer,
configuration: Map
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName | String | | no | The name of a graph stored in the catalog. |
configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
modelName | String | | no | The name of a GraphSAGE model in the model catalog. |
nodeLabels | List of String | | yes | Filter the named graph using the given node labels. Nodes with any of the given labels will be included. |
relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types. Relationships with any of the given types will be included. |
concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
jobId | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
logProgress | Boolean | | yes | If disabled the progress percentage will not be logged. |
writeConcurrency | Integer | | yes | The number of concurrent threads used for writing the result to Neo4j. |
writeProperty | String | | no | The node property in the Neo4j database to which the embedding is written. |
batchSize | Integer | | yes | The number of nodes per batch. |
Name | Type | Description |
---|---|---|
nodeCount | Integer | The number of nodes processed. |
nodePropertiesWritten | Integer | The number of node properties written. |
preProcessingMillis | Integer | Milliseconds for preprocessing data. |
computeMillis | Integer | Milliseconds for running the algorithm. |
writeMillis | Integer | Milliseconds for writing result data back to Neo4j. |
configuration | Map | The configuration used for running the algorithm. |
Examples
All the examples below should be run in an empty database. The examples use Cypher projections as the norm. Native projections will be deprecated in a future release.
In this section we will show examples of running the GraphSAGE algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide on how to make use of the algorithm in a real setting. We will do this on a small friends network graph of a handful of nodes connected in a particular pattern. The example graph looks like this:
CREATE
// Persons
( dan:Person {name: 'Dan', age: 20, heightAndWeight: [185, 75]}),
(annie:Person {name: 'Annie', age: 12, heightAndWeight: [124, 42]}),
( matt:Person {name: 'Matt', age: 67, heightAndWeight: [170, 80]}),
( jeff:Person {name: 'Jeff', age: 45, heightAndWeight: [192, 85]}),
( brie:Person {name: 'Brie', age: 27, heightAndWeight: [176, 57]}),
( elsa:Person {name: 'Elsa', age: 32, heightAndWeight: [158, 55]}),
( john:Person {name: 'John', age: 35, heightAndWeight: [172, 76]}),
(dan)-[:KNOWS {relWeight: 1.0}]->(annie),
(dan)-[:KNOWS {relWeight: 1.6}]->(matt),
(annie)-[:KNOWS {relWeight: 0.1}]->(matt),
(annie)-[:KNOWS {relWeight: 3.0}]->(jeff),
(annie)-[:KNOWS {relWeight: 1.2}]->(brie),
(matt)-[:KNOWS {relWeight: 10.0}]->(brie),
(brie)-[:KNOWS {relWeight: 1.0}]->(elsa),
(brie)-[:KNOWS {relWeight: 2.2}]->(jeff),
(john)-[:KNOWS {relWeight: 5.0}]->(jeff)
MATCH (source:Person)
OPTIONAL MATCH (source:Person)-[r:KNOWS]->(target:Person)
RETURN gds.graph.project(
'persons',
source,
target,
{
sourceNodeLabels: labels(source),
targetNodeLabels: labels(target),
sourceNodeProperties: source { .age, .heightAndWeight },
targetNodeProperties: target { .age, .heightAndWeight },
relationshipType: type(r),
relationshipProperties: r { .relWeight }
},
{ undirectedRelationshipTypes: ['KNOWS'] }
)
The algorithm is defined for UNDIRECTED graphs.
Train
Before we are able to generate node embeddings we need to train a model and store it in the model catalog. Below is an example of how to do that.
The names specified in the featureProperties configuration parameter must exist in the projected graph.
CALL gds.beta.graphSage.train(
'persons',
{
modelName: 'exampleTrainModel',
featureProperties: ['age', 'heightAndWeight'],
aggregator: 'mean',
activationFunction: 'sigmoid',
randomSeed: 1337,
sampleSizes: [25, 10]
}
) YIELD modelInfo as info
RETURN
info.modelName as modelName,
info.metrics.didConverge as didConverge,
info.metrics.ranEpochs as ranEpochs,
info.metrics.epochLosses as epochLosses
modelName | didConverge | ranEpochs | epochLosses |
---|---|---|---|
"exampleTrainModel" | true | 1 | [26.5784954435] |
Due to the random initialisation of the weight variables the results may vary between different runs.
Looking at the results, we can see that the training converged after a single epoch, since the losses of consecutive iterations were almost identical.
Tuning the algorithm parameters, such as trying out different sampleSizes, searchDepth, embeddingDimension or batchSize, can improve the loss.
For different datasets, GraphSAGE may require different training parameters to produce good models.
The trained model is automatically registered in the model catalog.
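The registered model can be inspected via the model catalog; a minimal sketch, assuming the beta catalog procedure gds.beta.model.list:

CALL gds.beta.model.list('exampleTrainModel')
YIELD modelInfo
RETURN modelInfo.modelName AS modelName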
Train with multiple node labels
In this section we describe how to train on a graph with multiple labels. The different labels may have different sets of properties. To run on such a graph, GraphSAGE is run in multi-label mode, in which the feature properties are projected into a common feature space. Therefore, all nodes have feature vectors of the same dimension after the projection.
The projection for a label is linear and given by a matrix of weights. The weights for each label are learned jointly with the other weights of the GraphSAGE model.
In the multi-label mode, the following is applied prior to the usual aggregation layers:
- A property representing the label is added to the feature properties for that label
- The feature properties for each label are projected into a feature vector of a shared dimension
The projected feature dimension is configured with projectedFeatureDimension, and specifying it enables the multi-label mode.
The feature properties used for a label are those present in the featureProperties configuration parameter which exist in the graph for that label.
In the multi-label mode, it is no longer required that all labels have all the specified properties.
Assumptions
- A requirement for multi-label mode is that each node belongs to exactly one label.
- A GraphSAGE model trained in this mode must be applied on graphs with the same schema with regards to node labels and properties.
Examples
In order to demonstrate GraphSAGE with multiple labels, we add instruments and relationships of type LIKES between persons and instruments to the example graph.
MATCH
(dan:Person {name: "Dan"}),
(annie:Person {name: "Annie"}),
(matt:Person {name: "Matt"}),
(brie:Person {name: "Brie"}),
(john:Person {name: "John"})
CREATE
(guitar:Instrument {name: 'Guitar', cost: 1337.0}),
(synth:Instrument {name: 'Synthesizer', cost: 1337.0}),
(bongos:Instrument {name: 'Bongos', cost: 42.0}),
(trumpet:Instrument {name: 'Trumpet', cost: 1337.0}),
(dan)-[:LIKES]->(guitar),
(dan)-[:LIKES]->(synth),
(dan)-[:LIKES]->(bongos),
(annie)-[:LIKES]->(guitar),
(annie)-[:LIKES]->(synth),
(matt)-[:LIKES]->(bongos),
(brie)-[:LIKES]->(guitar),
(brie)-[:LIKES]->(synth),
(brie)-[:LIKES]->(bongos),
(john)-[:LIKES]->(trumpet)
MATCH (source:Person)-[r:LIKES]->(target:Instrument)
RETURN gds.graph.project(
'persons_with_instruments',
source,
target,
{
sourceNodeLabels: labels(source),
sourceNodeProperties: source { .age, .heightAndWeight },
targetNodeLabels: labels(target),
targetNodeProperties: target { .cost },
relationshipType: type(r)
},
{ undirectedRelationshipTypes: ['LIKES'] }
)
We can now run GraphSAGE in multi-label mode on that graph by specifying the projectedFeatureDimension parameter.
Multi-label GraphSAGE removes the requirement that each node in the in-memory graph must have all featureProperties.
However, the projections are independent per label, and even if two labels have the same featureProperty, they are considered as different features before projection.
The projectedFeatureDimension should equal the maximum length of the feature array.
In our example, persons have age (length 1) and heightAndWeight (length 2), summing up to a total length of 3.
Instruments only have cost, with a length of 1.
Thus, the projectedFeatureDimension should be set to 3.
For each node, the properties of its unique label are projected, using a label-specific projection, into a vector space of dimension projectedFeatureDimension.
Note that the cost feature is only defined for the instrument nodes, while age and heightAndWeight are only defined for persons.
CALL gds.beta.graphSage.train(
'persons_with_instruments',
{
modelName: 'multiLabelModel',
featureProperties: ['age', 'heightAndWeight', 'cost'],
projectedFeatureDimension: 3
}
)
Train with relationship weights
The GraphSAGE implementation supports training using relationship weights. Greater relationship weight between nodes signifies that the nodes should have more similar embedding values.
CALL gds.beta.graphSage.train(
'persons',
{
modelName: 'weightedTrainedModel',
featureProperties: ['age', 'heightAndWeight'],
relationshipWeightProperty: 'relWeight',
nodeLabels: ['Person'],
relationshipTypes: ['KNOWS']
}
)
Train when there are no node properties present in the graph
In the case where you have a graph that does not have node properties, we recommend using an existing algorithm in mutate mode to create node properties.
Good candidates are Centrality algorithms or Community algorithms.
The following example illustrates calling Degree Centrality in mutate mode and then using the mutated property as a feature for the GraphSAGE training.
For the purpose of this example we are going to reuse the person data, but we will not load any properties into the in-memory graph.
MATCH (source:Person)-[r:KNOWS]->(target:Person)
RETURN gds.graph.project(
'noPropertiesGraph',
source,
target,
{},
{ undirectedRelationshipTypes: ['*'] }
)
CALL gds.degree.mutate(
'noPropertiesGraph',
{
mutateProperty: 'degree'
}
) YIELD nodePropertiesWritten
CALL gds.beta.graphSage.train(
'noPropertiesGraph',
{
modelName: 'myModel',
featureProperties: ['degree']
}
)
YIELD trainMillis
RETURN trainMillis
gds.degree.mutate will create a new node property degree for each of the nodes in the in-memory graph, which can then be used as a featureProperty for gds.beta.graphSage.train.
Using separate algorithms to produce featureProperties can also be very useful to capture graph topology properties.
Stream
To generate embeddings and stream them back to the client we can use the stream mode.
We must first train a model, which we do using the gds.beta.graphSage.train procedure.
CALL gds.beta.graphSage.train(
'persons',
{
modelName: 'graphSage',
featureProperties: ['age', 'heightAndWeight'],
embeddingDimension: 3,
randomSeed: 19
}
)
Once we have trained a model (named 'graphSage') we can use it to generate and stream the embeddings.
CALL gds.beta.graphSage.stream(
'persons',
{
modelName: 'graphSage'
}
)
YIELD nodeId, embedding
RETURN gds.util.asNode(nodeId).name AS person, embedding
ORDER BY person, embedding
person | embedding |
---|---|
"Annie" | [0.5285002573, 0.4682181872, 0.7081378445] |
"Brie" | [0.5285002574, 0.4682181872, 0.7081378445] |
"Dan" | [0.5285002573, 0.4682181872, 0.7081378445] |
"Elsa" | [0.5285002574, 0.4682181872, 0.7081378444] |
"Jeff" | [0.5285002573, 0.4682181872, 0.7081378445] |
"John" | [0.5285002573, 0.4682181872, 0.7081378445] |
"Matt" | [0.5285002573, 0.4682181872, 0.7081378445] |
Due to the random initialisation of the weight variables the results may vary slightly between the runs.
Mutate
The model trained as part of the stream example can be reused to write the results to the in-memory graph, using the mutate mode of the procedure.
Below is an example of how to achieve this.
CALL gds.beta.graphSage.mutate(
'persons',
{
mutateProperty: 'inMemoryEmbedding',
modelName: 'graphSage'
}
) YIELD
nodeCount,
nodePropertiesWritten
nodeCount | nodePropertiesWritten |
---|---|
7 | 7 |
Write
The model trained as part of the stream example can be reused to write the results to Neo4j. Below is an example of how to achieve this.
CALL gds.beta.graphSage.write(
'persons',
{
writeProperty: 'embedding',
modelName: 'graphSage'
}
) YIELD
nodeCount,
nodePropertiesWritten
nodeCount | nodePropertiesWritten |
---|---|
7 | 7 |
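Once written, the embeddings are regular node properties in Neo4j and can be inspected with plain Cypher, for example:

MATCH (p:Person)
RETURN p.name AS person, p.embedding AS embedding
ORDER BY person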