Linear regression
This feature is in the alpha tier. For more information on feature tiers, see API Tiers.
Linear regression is a fundamental supervised machine learning regression method. It trains a model by minimizing a loss function which depends on the model's weights and on the training data. The loss can be minimized, for example, using gradient descent. Neo4j Graph Data Science uses the Adam optimizer, which is a gradient-descent-type algorithm.
The weights are in the form of a feature-sized vector w and a bias b.
The loss function is then defined as:
MSE(wx + b)
where MSE is the mean squared error between the predictions wx + b and the target values of the training examples.
To avoid overfitting, one may also add a regularization term to the loss. Neo4j Graph Data Science supports l2 regularization, which can be configured using the penalty parameter.
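For illustration, the following is a minimal Python sketch of such a regularized loss. It is not GDS code; in particular, the exact scaling of the l2 term and the assumption that the bias is excluded from it are assumptions made here.

```python
import numpy as np

def linear_regression_loss(w, b, X, y, penalty=0.0):
    """Regularized training loss: MSE of the predictions wx + b plus an l2 term.

    Sketch only; the exact scaling of the l2 term (and whether the bias is
    excluded from it, as assumed here) may differ in GDS."""
    predictions = X @ w + b                    # wx + b for every training example
    mse = np.mean((predictions - y) ** 2)      # mean squared error against the targets y
    l2 = penalty * np.sum(w ** 2)              # l2 regularization, weighted by the penalty parameter
    return mse + l2

# Example: two training examples with two features each.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([5.0, 6.0])
print(linear_regression_loss(np.zeros(2), 0.0, X, y, penalty=0.1))  # 30.5
```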
Tuning the hyperparameters
To balance concerns such as the bias versus variance of the model, and the speed versus memory consumption of the training, GDS exposes several hyperparameters that can be tuned. Each of these is described below.
In gradient-descent-based training, we try to find the best weights for the model. In each epoch we process all training examples to compute the loss and the gradients of the weights. These gradients are then used to update the weights. For the update we use the Adam optimizer, as described in https://arxiv.org/pdf/1412.6980.pdf.
Statistics about the training are reported in the neo4j debug log.
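As a rough illustration of this loop, the Python sketch below trains the weights of a linear regression with a hand-written Adam update. It is a simplified stand-in rather than the GDS implementation: batching, regularization, and the convergence checks described below are omitted.

```python
import numpy as np

def train(X, y, max_epochs=100, learning_rate=0.001):
    """Gradient-descent training of w and b with an Adam-style update (illustrative sketch)."""
    params = np.zeros(X.shape[1] + 1)          # feature weights w followed by the bias b
    m = np.zeros_like(params)                  # first moment estimate
    v = np.zeros_like(params)                  # second moment estimate
    beta1, beta2, eps = 0.9, 0.999, 1e-8

    for epoch in range(1, max_epochs + 1):
        w, b = params[:-1], params[-1]
        errors = X @ w + b - y                 # prediction errors on all training examples
        grad_w = 2 * X.T @ errors / len(y)     # gradient of the MSE with respect to w
        grad_b = 2 * np.mean(errors)           # gradient of the MSE with respect to b
        grads = np.append(grad_w, grad_b)

        # Adam update (Kingma & Ba, https://arxiv.org/pdf/1412.6980.pdf)
        m = beta1 * m + (1 - beta1) * grads
        v = beta2 * v + (1 - beta2) * grads ** 2
        m_hat = m / (1 - beta1 ** epoch)
        v_hat = v / (1 - beta2 ** epoch)
        params = params - learning_rate * m_hat / (np.sqrt(v_hat) + eps)

    return params[:-1], params[-1]
```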
Max Epochs
This parameter defines the maximum number of epochs for the training. Independent of the model’s quality, the training will terminate after this many epochs. Note that the training can also stop earlier if the loss converged (see Patience and Tolerance).
Setting this parameter can be useful to limit the training time for a model. Restricting the computational budget can serve the purpose of regularization and mitigate overfitting, which becomes a risk with a large number of epochs.
Min Epochs
This parameter defines the minimum number of epochs for the training. Independent of the model’s quality, the training will run for at least this many epochs.
Setting this parameter can be useful to avoid stopping too early due to convergence, but it also increases the minimum training time of a model.
Patience
This parameter defines the maximum number of unproductive consecutive epochs. An epoch is unproductive if it does not improve the training loss by at least a tolerance fraction of the current loss.
Assuming the training ran for minEpochs, this parameter defines when the training converges.
Setting this parameter can lead to a more robust training and avoid early termination, similar to minEpochs. However, a high patience can result in running more epochs than necessary.
In our experience, reasonable values for patience are in the range 1 to 3.
Tolerance
This parameter defines when an epoch is considered unproductive and, together with patience, defines the convergence criteria for the training. An epoch is unproductive if it does not improve the training loss by at least a tolerance fraction of the current loss.
A lower tolerance results in more sensitive training, with a higher probability of training longer. A higher tolerance means a less sensitive training, and hence more epochs are counted as unproductive.
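Putting these parameters together, the following Python sketch shows one way the stopping logic described above could look. The function compute_epoch_loss is a hypothetical stand-in for a full epoch of gradient computation and weight updates, and the interpretation of the tolerance fraction (here, relative to the previous epoch's loss) is an assumption, not the GDS source.

```python
def run_training(compute_epoch_loss, max_epochs, min_epochs, patience, tolerance):
    """Illustrative epoch loop combining maxEpochs, minEpochs, patience, and tolerance."""
    previous_loss = float("inf")
    unproductive_epochs = 0
    for epoch in range(1, max_epochs + 1):        # never run more than maxEpochs epochs
        loss = compute_epoch_loss()               # hypothetical: one epoch of updates, returns the loss
        # An epoch is unproductive if it improves on the previous loss by
        # less than a `tolerance` fraction (interpretation assumed here).
        if loss > previous_loss * (1 - tolerance):
            unproductive_epochs += 1
        else:
            unproductive_epochs = 0
        previous_loss = loss
        # Converge only after minEpochs, and only once `patience`
        # consecutive unproductive epochs have occurred.
        if epoch >= min_epochs and unproductive_epochs >= patience:
            break
    return previous_loss
```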
Learning rate
When updating the weights, we move in the direction dictated by the Adam optimizer based on the loss function’s gradients.
How far we move per weight update can be configured via the learningRate parameter.
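As a rough single-weight illustration (not GDS code): with Adam, once the moment estimates have stabilised, the magnitude of one weight update is approximately the learning rate itself, fairly independently of the raw gradient magnitude.

```python
# Simplified single-weight sketch of how learningRate scales one Adam-style step.
gradient = 0.5
for learning_rate in (0.001, 0.01, 0.1):
    m, v, eps = gradient, gradient ** 2, 1e-8   # stabilised (bias-corrected) moment estimates
    step = learning_rate * m / (v ** 0.5 + eps)
    print(f"learningRate={learning_rate}: weight changes by about {step:.4f}")
```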
Batch size
This parameter defines how many training examples are grouped in a single batch.
The gradients are computed concurrently on the batches, using concurrency many threads.
At the end of an epoch the gradients are summed and scaled before updating the weights.
The batchSize does not affect the model quality, but can be used to tune for training speed.
A larger batchSize increases the memory consumption of the computation.
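A minimal Python sketch of this batching, with the per-batch gradients computed sequentially here (GDS would use concurrency many threads) and then summed and scaled once per epoch:

```python
import numpy as np

def epoch_gradient(X, y, w, b, batch_size):
    """Compute one epoch's MSE gradient from per-batch gradients (illustrative sketch)."""
    grad_w_sum = np.zeros_like(w)
    grad_b_sum = 0.0
    for start in range(0, len(y), batch_size):
        X_batch, y_batch = X[start:start + batch_size], y[start:start + batch_size]
        errors = X_batch @ w + b - y_batch
        grad_w_sum += 2 * X_batch.T @ errors      # sum of per-example gradients in this batch
        grad_b_sum += 2 * np.sum(errors)
    # Scale once at the end of the epoch before the weight update.
    return grad_w_sum / len(y), grad_b_sum / len(y)
```

Because the per-batch sums add up to the same full-epoch gradient regardless of how the examples are split, this also illustrates why batchSize affects speed and memory rather than model quality.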