Applying a trained model for prediction
This feature is in the alpha tier. For more information on feature tiers, see API Tiers.
In the previous sections we have seen how to build up a Node Regression training pipeline and train it to produce a regression model.
After training, the resulting runnable model is of type NodeRegression and resides in the model catalog.
The regression model can be applied on a graph in the graph catalog to predict a property value for previously unseen nodes.
Since the model has been trained on features which are created using the feature pipeline, the same feature pipeline is stored within the model and executed at prediction time. As during training, intermediate node properties created by the node property steps in the feature pipeline are transient and not visible after execution.
The predict graph must contain the properties that the pipeline requires, and any array properties used must have the same dimensions as in the train graph. If the predict and train graphs are distinct, it also helps if they have similar origins and semantics, so that the model is able to generalize well.
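One way to check that the predict graph carries the required properties is to inspect its schema in the graph catalog before running prediction. The query below is a minimal sketch: it assumes a projected graph named 'myGraph' (the name used in the examples on this page) and a GDS version in which gds.graph.list yields a schema column.

```cypher
// Sketch only: 'myGraph' is the graph name used in the examples on this page.
// The schema column lists node labels with their properties and value types,
// which can be compared against the properties the feature pipeline expects.
CALL gds.graph.list('myGraph')
YIELD graphName, schema
RETURN graphName, schema
```

Array dimensions are not part of the schema output, so they still need to be verified against how the properties were projected.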
Syntax
CALL gds.alpha.pipeline.nodeRegression.predict.stream(
  graphName: String,
  configuration: Map
) YIELD
  nodeId: Integer,
  predictedValue: Float
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName | String | | no | The name of a graph stored in the catalog. |
configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
modelName | String | | no | The name of a NodeRegression model in the model catalog. |
targetNodeLabels | List of String | | yes | Filter the named graph using the given targetNodeLabels. |
relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types. |
concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
jobId | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
Name | Type | Description |
---|---|---|
nodeId | Integer | Node ID. |
predictedValue | Float | Predicted property value for this node. |
CALL gds.alpha.pipeline.nodeRegression.predict.mutate(
  graphName: String,
  configuration: Map
) YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  mutateMillis: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map
Name | Type | Default | Optional | Description |
---|---|---|---|---|
graphName | String | | no | The name of a graph stored in the catalog. |
configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
modelName | String | | no | The name of a NodeRegression model in the model catalog. |
mutateProperty | String | | no | The node property in the GDS graph to which the predicted property is written. |
targetNodeLabels | List of String | | yes | Filter the named graph using the given targetNodeLabels. |
relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types. |
concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
jobId | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
Name | Type | Description |
---|---|---|
preProcessingMillis | Integer | Milliseconds for preprocessing the graph. |
computeMillis | Integer | Milliseconds for running the algorithm. |
postProcessingMillis | Integer | Milliseconds for computing the global metrics. |
mutateMillis | Integer | Milliseconds for adding properties to the in-memory graph. |
nodePropertiesWritten | Integer | Number of node properties written. |
configuration | Map | Configuration used for running the algorithm. |
Examples
The following examples show how to use a regression model to predict a property value for nodes in your in-memory graph.
To do this, we must first have a trained model registered in the Model Catalog.
We will use the model trained in the train example, which we named 'nr-pipeline-model'.
Stream
CALL gds.alpha.pipeline.nodeRegression.predict.stream('myGraph', {
  modelName: 'nr-pipeline-model',
  targetNodeLabels: ['UnknownHouse']
}) YIELD nodeId, predictedValue
WITH gds.util.asNode(nodeId) AS houseNode, predictedValue AS predictedPrice
RETURN
  houseNode.color AS houseColor, predictedPrice
  ORDER BY predictedPrice
houseColor | predictedPrice |
---|---|
| |
| |
| |
As we can see, the model predicts the "Tan" house to be cheaper than the "Yellow" house. This may not seem accurate given that the "Yellow" house has only one story. To get a prediction that better matches our expectations, we may need to tune the model candidate parameters.
Mutate
The mutate execution mode updates the named graph with a new node property containing the predicted value for each node.
The name of the new property is specified using the mandatory configuration parameter mutateProperty.
The result is a single summary row including information about timings and how many properties were written.
The mutate mode is especially useful when multiple algorithms are used in conjunction.
For more details on the mutate mode in general, see Mutate.
CALL gds.alpha.pipeline.nodeRegression.predict.mutate('myGraph', {
  targetNodeLabels: ['UnknownHouse'],
  modelName: 'nr-pipeline-model',
  mutateProperty: 'predictedPrice'
}) YIELD nodePropertiesWritten
nodePropertiesWritten |
---|
3 |
The output tells us that we added a property for each of the UnknownHouse nodes.
To use this property, we can run another algorithm using the predictedPrice property, or inspect it using gds.graph.nodeProperty.stream.
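As a sketch of that inspection step, the query below streams the mutated property back from the in-memory graph. It assumes the mutate call above has already run on 'myGraph', and that the installed GDS version accepts node labels as the third argument of gds.graph.nodeProperty.stream. The color lookup mirrors the stream example and relies on the nodes still existing in the database.

```cypher
// Sketch only: read back the 'predictedPrice' property written by the mutate call.
// The label filter restricts the output to the nodes that received a prediction.
CALL gds.graph.nodeProperty.stream('myGraph', 'predictedPrice', ['UnknownHouse'])
YIELD nodeId, propertyValue
RETURN gds.util.asNode(nodeId).color AS houseColor, propertyValue AS predictedPrice
ORDER BY predictedPrice
```

The property exists only in the in-memory graph. The database itself is unchanged until the property is written back explicitly.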