Node Similarity

This section describes the Node Similarity algorithm in Neo4j Graph Analytics for Snowflake. The algorithm is based on the Jaccard, Overlap and Cosine similarity metrics.

Introduction

The Node Similarity algorithm compares a set of nodes based on the nodes they are connected to. Two nodes are considered similar if they share many of the same neighbors. Node Similarity computes pair-wise similarities based on one of three metrics: the Jaccard metric, also known as the Jaccard Similarity Score; the Overlap coefficient, also known as the Szymkiewicz–Simpson coefficient; and the Cosine Similarity score. The first two are most frequently associated with unweighted sets, whereas Cosine is most frequently associated with weighted input.

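Given two sets A and B, the Jaccard Similarity is computed using the following formula:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

The Overlap coefficient is computed using the following formula:

$$O(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}$$

Formulas for the weighted case can be found in the weighted examples below.

The cosine similarity score is computed using the following formula, where entries are implicitly given a weight of 1 when A and B are unweighted:

$$C(A, B) = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$$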
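The three unweighted metrics can be sketched in a few lines of Python. This is only an illustration of the standard set-based definitions, not the implementation used by the algorithm. The function names and the convention of returning 0.0 for empty sets are assumptions made for this sketch.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    # Assumption: two empty sets are given a score of 0.0.
    return len(a & b) / len(union) if union else 0.0


def overlap(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    # Assumption: if either set is empty the score is 0.0.
    return len(a & b) / smaller if smaller else 0.0


def cosine(a: set, b: set) -> float:
    """Unweighted cosine: every member of a set has an implicit weight of 1."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


a = {"x", "y", "z"}
b = {"x", "y"}
print(jaccard(a, b), overlap(a, b), cosine(a, b))
```

Note that the Overlap coefficient reaches 1 as soon as one set is contained in the other, whereas Jaccard and Cosine also penalize the difference in set size.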

The input of this algorithm is a bipartite, connected graph containing two disjoint node sets. Each relationship starts from a node in the first node set and ends at a node in the second node set.

The Node Similarity algorithm compares each node that has outgoing relationships with each other such node. For every node n, we collect the outgoing neighborhood N(n) of that node, that is, all nodes m such that there is a relationship from n to m. For each pair n, m, the algorithm computes a similarity for that pair that equals the outcome of the selected similarity metric for N(n) and N(m).
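The pairwise comparison described above can be sketched as follows. The relationship list and the function names are hypothetical, and the sketch uses the Jaccard metric only; it illustrates the idea of comparing outgoing neighborhoods rather than the actual implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (source, target) relationships of a small bipartite graph.
relationships = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "z"), ("c", "z")]


def neighborhoods(rels):
    """Collect N(n): the set of targets reachable from each source node n."""
    out = defaultdict(set)
    for source, target in rels:
        out[source].add(target)
    return dict(out)


def pairwise_jaccard(rels):
    """Jaccard score for every unordered pair of nodes with outgoing relationships."""
    nbrs = neighborhoods(rels)
    scores = {}
    for n, m in combinations(sorted(nbrs), 2):
        # Every node in nbrs has at least one neighbor, so the union is never empty.
        scores[(n, m)] = len(nbrs[n] & nbrs[m]) / len(nbrs[n] | nbrs[m])
    return scores


scores = pairwise_jaccard(relationships)
for pair, score in scores.items():
    print(pair, round(score, 4))
```

Only nodes that appear as a source are compared, which mirrors the restriction to nodes with outgoing relationships.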

Node Similarity has time complexity O(n³) and space complexity O(n²). We compute and store neighbor sets in time and space O(n²), then compute pairwise similarity scores in time O(n³).

To bound memory usage, you can specify an explicit limit on the number of results to output per node. This is the topK parameter. It can be set to any value greater than or equal to 1. Limiting the results loses precision in the overall computation, and running time is unaffected, because results still have to be computed before they are potentially discarded.

The output of the algorithm is a set of new relationships between pairs of nodes in the first node set. Similarity scores are expressed via relationship properties.

For more information on this algorithm, see:

Syntax

This section covers the syntax used to execute the Node Similarity algorithm.

Run Node Similarity.

CALL Neo4j_Graph_Analytics.graph.node_similarity(
  'CPU_X64_XS',                    (1)
  {
    ['defaultTablePrefix': '...',] (2)
    'project': {...},              (3)
    'compute': {...},              (4)
    'write':   {...}               (5)
  }
);

1	Compute pool selector.
2	Optional prefix for table references.
3	Project config.
4	Compute config.
5	Write config.

Table 1. Parameters
Name	Type	Default	Optional	Description
computePoolSelector	String	`n/a`	no	The selector for the compute pool on which to run the Node Similarity job.
configuration	Map	`{}`	no	Configuration for graph project, algorithm compute and result write back.

The configuration map consists of the following three entries.

For more details on the Project configuration below, refer to the Project documentation.

Table 2. Project configuration
Name	Description
nodeTables	List of node tables.
relationshipTables	Map of relationship types to relationship tables.

Table 3. Compute configuration
Name	Type	Default	Optional	Description
mutateProperty	String	`'similarity'`	yes	The relationship property that will be written back to the Snowflake database.
mutateRelationshipType	String	`'SIMILAR_TO'`	yes	The relationship type used for the relationships written back to the Snowflake database.
similarityCutoff	Float	`1e-42`	yes	Lower limit for the similarity score to be present in the result. Values must be between 0 and 1.
degreeCutoff	Integer	`1`	yes	Inclusive lower bound on the node degree for a node to be considered in the comparisons. This value cannot be lower than 1.
upperDegreeCutoff	Integer	`2147483647`	yes	Inclusive upper bound on the node degree for a node to be considered in the comparisons. This value cannot be lower than 1.
topK	Integer	`10`	yes	Limit on the number of scores per node. The K largest results are returned. This value cannot be lower than 1.
bottomK	Integer	`10`	yes	Limit on the number of scores per node. The K smallest results are returned. This value cannot be lower than 1.
topN	Integer	`0`	yes	Global limit on the number of scores computed. The N largest total results are returned. This value cannot be negative; a value of 0 means no global limit.
bottomN	Integer	`0`	yes	Global limit on the number of scores computed. The N smallest total results are returned. This value cannot be negative; a value of 0 means no global limit.
relationshipWeightProperty	String	`null`	yes	Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted.
similarityMetric	String	`JACCARD`	yes	The metric used to compute similarity. Can be either `JACCARD`, `OVERLAP` or `COSINE`.
useComponents	Boolean or String	`false`	yes	If enabled, Node Similarity will use components to improve the performance of the computation, skipping comparisons of nodes in different components. Set to `false` (Default): the algorithm does not use components, but computes similarity across the entire graph. Set to `true`: the algorithm uses components, and will compute these components before computing similarity. Set to String: use pre-computed components stored in graph, String is the key for a node property representing components.

For more details on the Write configuration below, refer to the Write documentation.

Table 4. Write configuration
Name	Type	Default	Optional	Description
sourceLabel	String	`n/a`	no	Node label in the in-memory graph for start nodes of relationships to be written back.
targetLabel	String	`n/a`	no	Node label in the in-memory graph for end nodes of relationships to be written back.
outputTable	String	`n/a`	no	Table in Snowflake database to which relationships are written.
relationshipType	String	`'SIMILAR_TO'`	yes	The relationship type that will be written back to the Snowflake database.
relationshipProperty	String	`'similarity'`	yes	The relationship property that will be written back to the Snowflake database.

Examples

In this section we will show examples of running the Node Similarity algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide on how to make use of the algorithm in a real setting. We will do this on a small knowledge graph of a handful of nodes, connected in a particular pattern. The example graph looks like this:

The following SQL statements will create the example graph tables in the Snowflake database:

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.PERSONS (NODEID VARCHAR);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.PERSONS VALUES
  ('Alice'),
  ('Bob'),
  ('Carol'),
  ('Dave'),
  ('Eve');

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS (NODEID VARCHAR);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS VALUES
  ('Guitar'),
  ('Synthesizer'),
  ('Bongos'),
  ('Trumpet');

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.LIKES (SOURCENODEID VARCHAR, TARGETNODEID VARCHAR, WEIGHT FLOAT);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.LIKES VALUES
  ('Alice', 'Guitar',      1.0),
  ('Alice', 'Synthesizer', 1.0),
  ('Alice', 'Bongos',      0.5),
  ('Bob',   'Guitar',      1.0),
  ('Bob',   'Synthesizer', 1.0),
  ('Carol', 'Bongos',      1.0),
  ('Dave',  'Guitar',      1.0),
  ('Dave',  'Trumpet',     1.5),
  ('Dave',  'Bongos',      1.0);

This bipartite graph has two node sets, Person nodes and Instrument nodes. The two node sets are connected via LIKES relationships. Each relationship starts at a Person node and ends at an Instrument node.

In the example, we want to use the Node Similarity algorithm to compare people based on the instruments they like.

The Node Similarity algorithm will only compute similarity for nodes that have a degree of at least 1. In the example graph, the Eve node has no LIKES relationships and will therefore not be compared to other Person nodes.

In the following examples, we will demonstrate using the Node Similarity algorithm on this graph.

Run job

Running a Node Similarity job involves three steps: Project, Compute and Write.

To run the query, there is a required setup of grants for the application, your consumer role and your environment. Please see the Getting started page for more on this.

We also assume that the application name is the default Neo4j_Graph_Analytics. If you chose a different app name during installation, please replace it with that.

The following will run a Node Similarity job:

CALL Neo4j_Graph_Analytics.graph.node_similarity('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': ['PERSONS', 'INSTRUMENTS'],
        'relationshipTables': {
          'LIKES': {
            'sourceTable': 'PERSONS',
            'targetTable': 'INSTRUMENTS'
          }
        }
    },
    'compute': {
        'mutateProperty': 'score',
        'mutateRelationshipType': 'SIMILAR'
    },
    'write': [{
        'outputTable': 'PERSONS_SIMILARITY',
        'sourceLabel': 'PERSONS',
        'targetLabel': 'PERSONS',
        'relationshipType': 'SIMILAR',
        'relationshipProperty': 'score'
    }]
});

Table 5. Results
JOB_ID	JOB_START	JOB_END
job_36fa8f572b8b412fabb7d9343ed038f8	2025-06-25 13:30:03.460	2025-06-25 13:30:10.288

JOB_RESULT

 {
    "node_similarity_1": {
      "computeMillis": 60,
      "configuration": {
        "bottomK": 10,
        "bottomN": 0,
        "concurrency": 2,
        "degreeCutoff": 1,
        "jobId": "1fe90d6f-eae4-4a11-9b80-9e7869fd9977",
        "logProgress": true,
        "mutateProperty": "score",
        "mutateRelationshipType": "SIMILAR",
        "nodeLabels": ["*"],
        "relationshipTypes": ["*"],
        "similarityCutoff": 1.000000000000000e-42,
        "similarityMetric": "JACCARD",
        "sudo": false,
        "topK": 10,
        "topN": 0,
        "upperDegreeCutoff": 2147483647,
        "useComponents": false
      },
      "mutateMillis": 404,
      "nodesCompared": 4,
      "postProcessingMillis": 0,
      "preProcessingMillis": 8,
      "relationshipsWritten": 10,
      "similarityDistribution": {
        "max": 0.6666679382324218,
        "mean": 0.41666641235351565,
        "min": 0.25,
        "p1": 0.25,
        "p10": 0.25,
        "p100": 0.6666660308837891,
        "p25": 0.3333320617675781,
        "p5": 0.25,
        "p50": 0.3333320617675781,
        "p75": 0.5000019073486328,
        "p90": 0.6666660308837891,
        "p95": 0.6666660308837891,
        "p99": 0.6666660308837891,
        "stdDev": 0.14907148283512542
      }
    },
    "project_1": {
      "graphName": "snowgraph",
      "nodeCount": 9,
      "nodeMillis": 872,
      "relationshipCount": 9,
      "relationshipMillis": 469,
      "totalMillis": 1341
    },
    "write_relationship_type_1": {
      "exportMillis": 1917,
      "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY",
      "relationshipProperty": "score",
      "relationshipType": "SIMILAR",
      "relationshipsExported": 10
    }
}

The returned result contains information about the job execution and result distribution. Additionally, each similarity score computed for the compared node pairs has been written back to the Snowflake database. We can query it like so:

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY ORDER BY SCORE DESC;

The query returns the computation results as stored in the database:

Table 6. Results
SOURCENODEID	TARGETNODEID	SCORE
Alice	Bob	0.6666666667
Bob	Alice	0.6666666667
Alice	Dave	0.5
Dave	Alice	0.5
Alice	Carol	0.3333333333
Carol	Alice	0.3333333333
Carol	Dave	0.3333333333
Dave	Carol	0.3333333333
Bob	Dave	0.25
Dave	Bob	0.25

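We use default values for the procedure configuration parameters: topK is set to 10 and topN is set to 0. Because of that, the result set contains up to the top 10 similarity scores for each node. Since no node in the example has more than three similar nodes, all computed scores are returned.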
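As a cross-check, the scores of the default run can be reproduced outside Snowflake with a short Python sketch. The dictionary below mirrors the LIKES table of the example graph; the variable and function names are assumptions made for this illustration, and the rounding to 10 decimals only imitates how the scores are displayed.

```python
from itertools import permutations

# Outgoing LIKES neighborhoods of the example graph (weights ignored).
likes = {
    "Alice": {"Guitar", "Synthesizer", "Bongos"},
    "Bob": {"Guitar", "Synthesizer"},
    "Carol": {"Bongos"},
    "Dave": {"Guitar", "Trumpet", "Bongos"},
    "Eve": set(),
}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b)


rows = []
for source, target in permutations(likes, 2):
    if not likes[source] or not likes[target]:
        continue  # nodes with degree 0, such as Eve, are not compared
    score = jaccard(likes[source], likes[target])
    if score > 0:  # the default similarityCutoff excludes scores of 0
        rows.append((source, target, round(score, 10)))

# Highest scores first, then by node names for a stable listing.
rows.sort(key=lambda r: (-r[2], r[0], r[1]))
for row in rows:
    print(*row, sep="\t")
```

Each unordered pair appears twice, once per direction, which is why five similar pairs produce ten relationships. Bob and Carol share no instrument, so their score of 0 is filtered out.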

If we would like to instead compare the Instruments to each other, we would then project the LIKES relationship type using REVERSE orientation. This would return similarities for pairs of Instruments and not compute any similarities between Persons.

Limit results

There are four limits that can be applied to the similarity results. Top limits the result to the highest similarity scores. Bottom limits the result to the lowest similarity scores. Both top and bottom limits can apply to the result as a whole ("N"), or to the result per node ("K").

There must always be a "K" limit, either bottomK or topK, which is a positive number. The default value for topK and bottomK is 10.

Table 7. Result limits
	total results	results per node
highest score	topN	topK
lowest score	bottomN	bottomK

topK and bottomK

topK and bottomK are limits on the number of scores computed per node. For topK, the K largest similarity scores per node are returned. For bottomK, the K smallest similarity scores per node are returned. topK and bottomK cannot be 0 and cannot be used in conjunction; the default value for each is 10. If neither is specified, topK is used.
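The per-node limits can be illustrated with a small Python sketch. The scores are those of the default run on the example graph. The function names are hypothetical, and breaking ties by target name is an assumption of this sketch; how the algorithm itself resolves ties is not specified here.

```python
from collections import defaultdict

# Jaccard scores per unordered pair from the default run on the example graph.
pair_scores = {
    ("Alice", "Bob"): 2 / 3,
    ("Alice", "Dave"): 1 / 2,
    ("Alice", "Carol"): 1 / 3,
    ("Carol", "Dave"): 1 / 3,
    ("Bob", "Dave"): 1 / 4,
}


def per_node(scores):
    """List the (target, score) candidates for every source node."""
    by_source = defaultdict(list)
    for (n, m), score in scores.items():
        by_source[n].append((m, score))
        by_source[m].append((n, score))
    return by_source


def limit_per_node(scores, k, largest=True):
    """Keep the k largest (or smallest) scores of every source node."""
    if k < 1:
        raise ValueError("k must be at least 1")
    sign = -1 if largest else 1
    result = []
    for source, candidates in sorted(per_node(scores).items()):
        # Assumption: ties are broken alphabetically by target name.
        ranked = sorted(candidates, key=lambda c: (sign * c[1], c[0]))
        result.extend((source, target, score) for target, score in ranked[:k])
    return result


top1 = limit_per_node(pair_scores, k=1)
bottom1 = limit_per_node(pair_scores, k=1, largest=False)
print(top1)
print(bottom1)
```

With K set to 1, every compared node keeps exactly one relationship, so the result is no longer symmetric: Dave points to Alice, but Alice points to Bob.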

The following will run a Node Similarity job demonstrating topK:

CALL Neo4j_Graph_Analytics.graph.node_similarity('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': ['PERSONS', 'INSTRUMENTS'],
        'relationshipTables': {
          'LIKES': {
            'sourceTable': 'PERSONS',
            'targetTable': 'INSTRUMENTS'
          }
        }
    },
    'compute': {
        'mutateProperty': 'score',
        'mutateRelationshipType': 'SIMILAR',
        'topK': 1
    },
    'write': [{
        'outputTable': 'PERSONS_SIMILARITY',
        'sourceLabel': 'PERSONS',
        'targetLabel': 'PERSONS',
        'relationshipType': 'SIMILAR',
        'relationshipProperty': 'score'
    }]
});

Table 8. Results
JOB_ID	JOB_START	JOB_END
job_c87ccfc6c46742548940fff74eeeeea6	2025-06-25 13:47:42.029	2025-06-25 13:47:47.708

JOB_RESULT

 {
    "node_similarity_1": {
      "computeMillis": 33,
      "configuration": {
        "bottomK": 10,
        "bottomN": 0,
        "concurrency": 2,
        "degreeCutoff": 1,
        "jobId": "f7c4d22f-48f5-4e50-80f8-81c6aecfb33b",
        "logProgress": true,
        "mutateProperty": "score",
        "mutateRelationshipType": "SIMILAR",
        "nodeLabels": ["*"],
        "relationshipTypes": ["*"],
        "similarityCutoff": 1.000000000000000e-42,
        "similarityMetric": "JACCARD",
        "sudo": false,
        "topK": 1,
        "topN": 0,
        "upperDegreeCutoff": 2147483647,
        "useComponents": false
      },
      "mutateMillis": 410,
      "nodesCompared": 4,
      "postProcessingMillis": 0,
      "preProcessingMillis": 8,
      "relationshipsWritten": 4,
      "similarityDistribution": {
        "max": 0.6666679382324218,
        "mean": 0.5416665077209473,
        "min": 0.3333320617675781,
        "p1": 0.3333320617675781,
        "p10": 0.3333320617675781,
        "p100": 0.6666660308837891,
        "p25": 0.3333320617675781,
        "p5": 0.3333320617675781,
        "p50": 0.5000019073486328,
        "p75": 0.6666660308837891,
        "p90": 0.6666660308837891,
        "p95": 0.6666660308837891,
        "p99": 0.6666660308837891,
        "stdDev": 0.13819274752746397
      }
    },
    "project_1": {
      "graphName": "snowgraph",
      "nodeCount": 9,
      "nodeMillis": 218,
      "relationshipCount": 9,
      "relationshipMillis": 322,
      "totalMillis": 540
    },
    "write_relationship_type_1": {
      "exportMillis": 1958,
      "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY",
      "relationshipProperty": "score",
      "relationshipType": "SIMILAR",
      "relationshipsExported": 4
    }
}

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY ORDER BY SCORE DESC;

Table 9. Results
SOURCENODEID	TARGETNODEID	SCORE
Alice	Bob	0.6666666667
Bob	Alice	0.6666666667
Dave	Alice	0.5
Carol	Alice	0.3333333333

The following will run a Node Similarity job demonstrating bottomK:

CALL Neo4j_Graph_Analytics.graph.node_similarity('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': ['PERSONS', 'INSTRUMENTS'],
        'relationshipTables': {
          'LIKES': {
            'sourceTable': 'PERSONS',
            'targetTable': 'INSTRUMENTS'
          }
        }
    },
    'compute': {
        'mutateProperty': 'score',
        'mutateRelationshipType': 'SIMILAR',
        'bottomK': 1
    },
    'write': [{
        'outputTable': 'PERSONS_SIMILARITY',
        'sourceLabel': 'PERSONS',
        'targetLabel': 'PERSONS',
        'relationshipType': 'SIMILAR',
        'relationshipProperty': 'score'
    }]
});

Table 10. Results
JOB_ID	JOB_START	JOB_END
job_2fdc12517bc54905906e2f2eb396d00b	2025-06-25 14:14:47.657	2025-06-25 14:14:53.313

JOB_RESULT

 {
    "node_similarity_1": {
      "computeMillis": 25,
      "configuration": {
        "bottomK": 1,
        "bottomN": 0,
        "concurrency": 2,
        "degreeCutoff": 1,
        "jobId": "0c5796bf-8872-4f8f-8817-09c63c37bb04",
        "logProgress": true,
        "mutateProperty": "score",
        "mutateRelationshipType": "SIMILAR",
        "nodeLabels": ["*"],
        "relationshipTypes": ["*"],
        "similarityCutoff": 1.000000000000000e-42,
        "similarityMetric": "JACCARD",
        "sudo": false,
        "topK": 10,
        "topN": 0,
        "upperDegreeCutoff": 2147483647,
        "useComponents": false
      },
      "mutateMillis": 254,
      "nodesCompared": 4,
      "postProcessingMillis": 0,
      "preProcessingMillis": 8,
      "relationshipsWritten": 4,
      "similarityDistribution": {
        "max": 0.3333339691162109,
        "mean": 0.29166603088378906,
        "min": 0.25,
        "p1": 0.25,
        "p10": 0.25,
        "p100": 0.3333320617675781,
        "p25": 0.25,
        "p5": 0.25,
        "p50": 0.25,
        "p75": 0.3333320617675781,
        "p90": 0.3333320617675781,
        "p95": 0.3333320617675781,
        "p99": 0.3333320617675781,
        "stdDev": 0.04166603088378906
      }
    },
    "project_1": {
      "graphName": "snowgraph",
      "nodeCount": 9,
      "nodeMillis": 188,
      "relationshipCount": 9,
      "relationshipMillis": 369,
      "totalMillis": 557
    },
    "write_relationship_type_1": {
      "exportMillis": 2105,
      "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY",
      "relationshipProperty": "score",
      "relationshipType": "SIMILAR",
      "relationshipsExported": 4
    }
}

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY ORDER BY SCORE DESC;

Table 11. Results
SOURCENODEID	TARGETNODEID	SCORE
Alice	Carol	0.3333333333
Carol	Alice	0.3333333333
Bob	Dave	0.25
Dave	Bob	0.25

topN and bottomN

TopN and bottomN limit the number of similarity scores across all nodes. This is a limit on the total result set, in addition to the topK or bottomK limit on the results per node. For topN, the N largest similarity scores are returned. For bottomN, the N smallest similarity scores are returned. A value of 0 means no global limit is imposed and all results from topK or bottomK are returned.
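The interaction between the per-node limit and the global limit can be sketched in Python as a second filtering step. The input rows are the top-1 results per node on the example graph; the function name is hypothetical and the sketch only illustrates the order in which the limits apply.

```python
# Top-1 result per node on the example graph: (source, target, score).
top1_per_node = [
    ("Alice", "Bob", 2 / 3),
    ("Bob", "Alice", 2 / 3),
    ("Dave", "Alice", 1 / 2),
    ("Carol", "Alice", 1 / 3),
]


def limit_total(rows, n, largest=True):
    """Apply a global limit to rows that already passed the per-node limit."""
    if n < 0:
        raise ValueError("n cannot be negative")
    if n == 0:
        return list(rows)  # 0 means no global limit
    # sorted() is stable, so rows with equal scores keep their input order.
    return sorted(rows, key=lambda r: r[2], reverse=largest)[:n]


top3 = limit_total(top1_per_node, n=3)
print(top3)
```

Because the global limit is applied to the per-node results, the three rows returned here are not necessarily the three highest scores of the whole graph; they are the three highest among the rows that survived the K limit.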

The following will run the algorithm, and write back the 3 highest out of the top 1 results per node:

CALL Neo4j_Graph_Analytics.graph.node_similarity('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': ['PERSONS', 'INSTRUMENTS'],
        'relationshipTables': {
          'LIKES': {
            'sourceTable': 'PERSONS',
            'targetTable': 'INSTRUMENTS'
          }
        }
    },
    'compute': {
        'mutateProperty': 'score',
        'mutateRelationshipType': 'SIMILAR',
        'topK': 1,
        'topN': 3
    },
    'write': [{
        'outputTable': 'PERSONS_SIMILARITY',
        'sourceLabel': 'PERSONS',
        'targetLabel': 'PERSONS',
        'relationshipType': 'SIMILAR',
        'relationshipProperty': 'score'
    }]
});

Table 12. Results
JOB_ID	JOB_START	JOB_END
job_ab183c8a38e64d23936b61781cebe2e7	2025-06-26 07:15:15.951	2025-06-26 07:15:22.810

JOB_RESULT

 {
    "node_similarity_1": {
      "computeMillis": 45,
      "configuration": {
        "bottomK": 10,
        "bottomN": 0,
        "concurrency": 2,
        "degreeCutoff": 1,
        "jobId": "649f07ec-7a06-43d9-aed2-a114bbd914af",
        "logProgress": true,
        "mutateProperty": "score",
        "mutateRelationshipType": "SIMILAR",
        "nodeLabels": ["*"],
        "relationshipTypes": ["*"],
        "similarityCutoff": 1.000000000000000e-42,
        "similarityMetric": "JACCARD",
        "sudo": false,
        "topK": 1,
        "topN": 3,
        "upperDegreeCutoff": 2147483647,
        "useComponents": false
      },
      "mutateMillis": 83,
      "nodesCompared": 4,
      "postProcessingMillis": 0,
      "preProcessingMillis": 8,
      "relationshipsWritten": 3,
      "similarityDistribution": {
        "max": 0.6666679382324218,
        "mean": 0.6111094156901041,
        "min": 0.5,
        "p1": 0.5,
        "p10": 0.5,
        "p100": 0.6666641235351562,
        "p25": 0.5,
        "p5": 0.5,
        "p50": 0.6666641235351562,
        "p75": 0.6666641235351562,
        "p90": 0.6666641235351562,
        "p95": 0.6666641235351562,
        "p99": 0.6666641235351562,
        "stdDev": 0.07856622128814764
      }
    },
    "project_1": {
      "graphName": "snowgraph",
      "nodeCount": 9,
      "nodeMillis": 914,
      "relationshipCount": 9,
      "relationshipMillis": 662,
      "totalMillis": 1576
    },
    "write_relationship_type_1": {
      "exportMillis": 2113,
      "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY",
      "relationshipProperty": "score",
      "relationshipType": "SIMILAR",
      "relationshipsExported": 3
    }
}

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY ORDER BY SCORE DESC;

Table 13. Results
SOURCENODEID	TARGETNODEID	SCORE
Alice	Bob	0.6666666667
Bob	Alice	0.6666666667
Dave	Alice	0.5

Degree cutoffs and similarity cutoff

Node Similarity can be tuned to ignore certain nodes based on degree constraints via two integer parameters named degreeCutoff and upperDegreeCutoff. If set, degreeCutoff imposes a lower limit on the degree in order for a node to be considered in the comparisons, and skips any nodes with degree below degreeCutoff. If set, upperDegreeCutoff imposes an upper limit on the node degree, and skips any nodes with degree higher than upperDegreeCutoff. The two parameters can also be combined so that only nodes whose degree falls within a certain range are considered.

The minimum value for both parameters is 1.
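The effect of the two degree parameters can be sketched in Python as a filter applied before any pair is compared. The dictionary mirrors the LIKES relationships of the example graph; the function name and its keyword arguments are hypothetical and only imitate the configuration parameters.

```python
# Outgoing LIKES neighborhoods of the example graph.
likes = {
    "Alice": {"Guitar", "Synthesizer", "Bongos"},
    "Bob": {"Guitar", "Synthesizer"},
    "Carol": {"Bongos"},
    "Dave": {"Guitar", "Trumpet", "Bongos"},
    "Eve": set(),
}


def eligible(nbrs, degree_cutoff=1, upper_degree_cutoff=2_147_483_647):
    """Keep nodes whose degree lies within both inclusive bounds."""
    if degree_cutoff < 1 or upper_degree_cutoff < 1:
        raise ValueError("both cutoffs must be at least 1")
    return {
        node: targets
        for node, targets in nbrs.items()
        if degree_cutoff <= len(targets) <= upper_degree_cutoff
    }


print(sorted(eligible(likes)))                         # defaults
print(sorted(eligible(likes, degree_cutoff=3)))        # lower bound only
print(sorted(eligible(likes, upper_degree_cutoff=2)))  # upper bound only
```

With a lower bound of 3, only Alice and Dave remain, so a single pair is compared.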

The following will run the algorithm, and ignore nodes with a degree lower than 3:

CALL Neo4j_Graph_Analytics.graph.node_similarity('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': ['PERSONS', 'INSTRUMENTS'],
        'relationshipTables': {
          'LIKES': {
            'sourceTable': 'PERSONS',
            'targetTable': 'INSTRUMENTS'
          }
        }
    },
    'compute': {
        'mutateProperty': 'score',
        'mutateRelationshipType': 'SIMILAR',
        'degreeCutoff': 3
    },
    'write': [{
        'outputTable': 'PERSONS_SIMILARITY',
        'sourceLabel': 'PERSONS',
        'targetLabel': 'PERSONS',
        'relationshipType': 'SIMILAR',
        'relationshipProperty': 'score'
    }]
});

Table 14. Results
JOB_ID	JOB_START	JOB_END
job_3687173082d14993a228eab0663ff557	2025-06-26 08:14:43.945	2025-06-26 08:14:49.441

JOB_RESULT

 {
    "node_similarity_1": {
      "computeMillis": 23,
      "configuration": {
        "bottomK": 10,
        "bottomN": 0,
        "concurrency": 2,
        "degreeCutoff": 3,
        "jobId": "47703fe2-ee8a-4fb6-81f6-4a7a1cc761e6",
        "logProgress": true,
        "mutateProperty": "score",
        "mutateRelationshipType": "SIMILAR",
        "nodeLabels": ["*"],
        "relationshipTypes": ["*"],
        "similarityCutoff": 1.000000000000000e-42,
        "similarityMetric": "JACCARD",
        "sudo": false,
        "topK": 10,
        "topN": 0,
        "upperDegreeCutoff": 2147483647,
        "useComponents": false
      },
      "mutateMillis": 207,
      "nodesCompared": 2,
      "postProcessingMillis": 0,
      "preProcessingMillis": 8,
      "relationshipsWritten": 2,
      "similarityDistribution": {
        "max": 0.5000038146972655,
        "mean": 0.5,
        "min": 0.5,
        "p1": 0.5,
        "p10": 0.5,
        "p100": 0.5,
        "p25": 0.5,
        "p5": 0.5,
        "p50": 0.5,
        "p75": 0.5,
        "p90": 0.5,
        "p95": 0.5,
        "p99": 0.5,
        "stdDev": 0
      }
    },
    "project_1": {
      "graphName": "snowgraph",
      "nodeCount": 9,
      "nodeMillis": 227,
      "relationshipCount": 9,
      "relationshipMillis": 426,
      "totalMillis": 653
    },
    "write_relationship_type_1": {
      "exportMillis": 1931,
      "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY",
      "relationshipProperty": "score",
      "relationshipType": "SIMILAR",
      "relationshipsExported": 2
    }
}

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY ORDER BY SCORE DESC;

Table 15. Results
SOURCENODEID	TARGETNODEID	SCORE
Alice	Dave	0.5
Dave	Alice	0.5

The similarity cutoff (similarityCutoff) is a lower limit on the similarity score: only node pairs that score at or above it appear in the result. The default value is very small (1E-42), which excludes results with a similarity score of 0.

Setting the similarity cutoff to 0 may yield a very large result set and increase both runtime and memory consumption.
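The effect of the cutoff can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: `jaccard` and `similar_pairs` are hypothetical helper names, Alice's and Dave's liked instruments are taken from the example in this section, and the node "Zed" is invented to create pairs with a score of 0.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Unweighted Jaccard similarity |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(neighbors: dict, cutoff: float = 1e-42) -> dict:
    """Score every unordered pair and keep those with score >= cutoff."""
    scores = {}
    for n, m in combinations(sorted(neighbors), 2):
        score = jaccard(neighbors[n], neighbors[m])
        if score >= cutoff:
            scores[(n, m)] = score
    return scores

# Alice and Dave follow the example in this section; "Zed" is hypothetical.
likes = {
    "Alice": {"Guitar", "Synthesizer", "Bongos"},
    "Dave": {"Guitar", "Bongos", "Trumpet"},
    "Zed": {"Flute"},
}

default_result = similar_pairs(likes)          # tiny default cutoff drops 0-score pairs
everything = similar_pairs(likes, cutoff=0.0)  # cutoff 0 keeps every pair
print(default_result)
print(len(everything))
```

With the default cutoff only the Alice and Dave pair survives, while a cutoff of 0 keeps all three pairs, including the two that share no instrument. Unlike the result tables in this section, the sketch lists each pair once rather than once per direction.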

The following will ignore node pairs with a similarity score less than 0.5:

CALL Neo4j_Graph_Analytics.graph.node_similarity('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': ['PERSONS', 'INSTRUMENTS'],
        'relationshipTables': {
          'LIKES': {
            'sourceTable': 'PERSONS',
            'targetTable': 'INSTRUMENTS'
          }
        }
    },
    'compute': {
        'mutateProperty': 'score',
        'mutateRelationshipType': 'SIMILAR',
        'similarityCutoff': 0.5
    },
    'write': [{
        'outputTable': 'PERSONS_SIMILARITY',
        'sourceLabel': 'PERSONS',
        'targetLabel': 'PERSONS',
        'relationshipType': 'SIMILAR',
        'relationshipProperty': 'score'
    }]
});

Table 16. Results
JOB_ID	JOB_START	JOB_END	JOB_RESULT
job_5f6827d7e57949209a48a876875ab65b	2025-06-26 09:45:59.862	2025-06-26 09:46:05.996	{ "node_similarity_1": { "computeMillis": 28, "configuration": { "bottomK": 10, "bottomN": 0, "concurrency": 2, "degreeCutoff": 1, "jobId": "a4f1f58f-e894-481b-bc5a-7809b4c57b09", "logProgress": true, "mutateProperty": "score", "mutateRelationshipType": "SIMILAR", "nodeLabels": ["*"], "relationshipTypes": ["*"], "similarityCutoff": 0.5, "similarityMetric": "JACCARD", "sudo": false, "topK": 10, "topN": 0, "upperDegreeCutoff": 2147483647, "useComponents": false }, "mutateMillis": 233, "nodesCompared": 4, "postProcessingMillis": 0, "preProcessingMillis": 7, "relationshipsWritten": 4, "similarityDistribution": { "max": 0.6666679382324218, "mean": 0.5833320617675781, "min": 0.5, "p1": 0.5, "p10": 0.5, "p100": 0.6666641235351562, "p25": 0.5, "p5": 0.5, "p50": 0.5, "p75": 0.6666641235351562, "p90": 0.6666641235351562, "p95": 0.6666641235351562, "p99": 0.6666641235351562, "stdDev": 0.08333206176757812 } }, "project_1": { "graphName": "snowgraph", "nodeCount": 9, "nodeMillis": 340, "relationshipCount": 9, "relationshipMillis": 522, "totalMillis": 862 }, "write_relationship_type_1": { "exportMillis": 2245, "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY", "relationshipProperty": "score", "relationshipType": "SIMILAR", "relationshipsExported": 4 } }

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY ORDER BY SCORE DESC;

Table 17. Results
SOURCENODEID	TARGETNODEID	SCORE
Alice	Bob	0.6666666667
Bob	Alice	0.6666666667
Alice	Dave	0.5
Dave	Alice	0.5

Weighted Similarity

Relationship properties can be used to weight the similarity computation: the property value measures how important each relationship is. By default, weighted node similarity uses weighted Jaccard similarity, according to the formula:

\[ J(A, B) = \frac{\sum_i \min(A_i, B_i)}{\sum_i \max(A_i, B_i)} \]

Formally, given two nodes and their weighted neighbour lists A' and B', we extend the lists to A and B, indexed over the union of their neighbours A' ∪ B', by setting the weight to 0 for any non-neighbour, and then apply the weighted Jaccard similarity.
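The extension step can be sketched in Python as follows. This is a minimal illustration, not the product's code. It takes weighted Jaccard to be the sum of element-wise minima divided by the sum of element-wise maxima, an assumption that reproduces the 0.3333333333 score reported for Alice and Dave in this section. The function name `weighted_jaccard` is hypothetical, and the weights are the strengths listed for Alice and Dave in the worked example.

```python
def weighted_jaccard(a: dict, b: dict) -> float:
    """Weighted Jaccard over the union of neighbours; a missing neighbour has weight 0."""
    keys = a.keys() | b.keys()
    numerator = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    denominator = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return numerator / denominator if denominator else 0.0

# Strengths as given for Alice and Dave in this section's worked example.
alice = {"Guitar": 1.0, "Synthesizer": 1.0, "Bongos": 0.5}
dave = {"Guitar": 1.0, "Bongos": 1.0, "Trumpet": 1.5}

score = weighted_jaccard(alice, dave)
print(round(score, 10))
```

When every weight is 1, the same function reduces to the unweighted Jaccard score, which gives 0.5 for the same two neighbourhoods.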

It also supports weighted Overlap similarity, according to the formula:

In addition, Cosine similarity can be used in the weighted case as mentioned in the introduction.

Weighted similarity metrics are only defined for weights greater than or equal to 0.
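The other two weighted metrics can be sketched in the same way. The block below is a hedged illustration, not the product's implementation. It assumes that weighted Overlap is the sum of element-wise minima divided by the smaller of the two weight totals, and it uses the standard cosine formula. Rejecting negative weights with an exception is a choice made for this sketch; how the product reacts to them is not stated here. All function names are hypothetical.

```python
import math

def _aligned(a: dict, b: dict):
    """Align two weighted neighbour lists over the union of their keys (missing = 0)."""
    for weights in (a, b):
        for key, w in weights.items():
            if w < 0:
                # Sketch-only behaviour: the metrics are undefined for negative weights.
                raise ValueError(f"negative weight for {key!r}: {w}")
    keys = sorted(a.keys() | b.keys())
    return [a.get(k, 0.0) for k in keys], [b.get(k, 0.0) for k in keys]

def weighted_overlap(a: dict, b: dict) -> float:
    """Assumed form: sum of minima divided by the smaller total weight."""
    xs, ys = _aligned(a, b)
    denominator = min(sum(xs), sum(ys))
    return sum(map(min, xs, ys)) / denominator if denominator else 0.0

def cosine(a: dict, b: dict) -> float:
    """Standard cosine similarity of the two aligned weight vectors."""
    xs, ys = _aligned(a, b)
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return sum(x * y for x, y in zip(xs, ys)) / norm if norm else 0.0

# Strengths as given for Alice and Dave in this section's worked example.
alice = {"Guitar": 1.0, "Synthesizer": 1.0, "Bongos": 0.5}
dave = {"Guitar": 1.0, "Bongos": 1.0, "Trumpet": 1.5}

overlap_score = weighted_overlap(alice, dave)
cosine_score = cosine(alice, dave)
print(overlap_score, cosine_score)
```

These scores are what the sketch's formulas produce for the example weights. They have not been checked against the output of the algorithm itself.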

The following query will respect relationship properties in the similarity computation:

CALL Neo4j_Graph_Analytics.graph.node_similarity('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': ['PERSONS', 'INSTRUMENTS'],
        'relationshipTables': {
          'LIKES': {
            'sourceTable': 'PERSONS',
            'targetTable': 'INSTRUMENTS'
          }
        }
    },
    'compute': {
        'mutateProperty': 'score',
        'mutateRelationshipType': 'SIMILAR',
        'relationshipWeightProperty': 'WEIGHT',
        'similarityCutoff': 0.3
    },
    'write': [{
        'outputTable': 'PERSONS_SIMILARITY',
        'sourceLabel': 'PERSONS',
        'targetLabel': 'PERSONS',
        'relationshipType': 'SIMILAR',
        'relationshipProperty': 'score'
    }]
});

Table 18. Results
JOB_ID	JOB_START	JOB_END	JOB_RESULT
job_532fbaf2ca3345168509c103dab5a419	2025-07-02 08:18:07.452	2025-07-02 08:18:13.844	{ "node_similarity_1": { "computeMillis": 25, "configuration": { "bottomK": 10, "bottomN": 0, "concurrency": 2, "degreeCutoff": 1, "jobId": "f9121717-4452-442f-993f-ade2e2259fb5", "logProgress": true, "mutateProperty": "score", "mutateRelationshipType": "SIMILAR", "nodeLabels": ["*"], "relationshipTypes": ["*"], "relationshipWeightProperty": "WEIGHT", "similarityCutoff": 0.3, "similarityMetric": "JACCARD", "sudo": false, "topK": 10, "topN": 0, "upperDegreeCutoff": 2147483647, "useComponents": false }, "mutateMillis": 182, "nodesCompared": 4, "postProcessingMillis": 0, "preProcessingMillis": 10, "relationshipsWritten": 4, "similarityDistribution": { "max": 0.8000030517578124, "mean": 0.5666666030883789, "min": 0.3333320617675781, "p1": 0.3333320617675781, "p10": 0.3333320617675781, "p100": 0.8000011444091797, "p25": 0.3333320617675781, "p5": 0.3333320617675781, "p50": 0.3333320617675781, "p75": 0.8000011444091797, "p90": 0.8000011444091797, "p95": 0.8000011444091797, "p99": 0.8000011444091797, "stdDev": 0.23333454132080078 } }, "project_1": { "graphName": "snowgraph", "nodeCount": 9, "nodeMillis": 640, "relationshipCount": 9, "relationshipMillis": 719, "totalMillis": 1359 }, "write_relationship_type_1": { "exportMillis": 1973, "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY", "relationshipProperty": "score", "relationshipType": "SIMILAR", "relationshipsExported": 4 } }

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_SIMILARITY ORDER BY SCORE DESC;

Table 19. Results
SOURCENODEID	TARGETNODEID	SCORE
Alice	Bob	0.8
Bob	Alice	0.8
Alice	Dave	0.3333333333
Dave	Alice	0.3333333333

The similarity between Alice and Dave decreased (from 0.5 to 0.33) compared to the unweighted version of this algorithm.

Alice likes Guitar, Synthesizer and Bongos with strengths (1, 1, 0.5). Dave likes Guitar, Bongos and Trumpet with strengths (1, 1, 1.5). Therefore, taking the union of Alice and Dave's neighbours, we have the list of strengths A = (1, 1, 0.5, 0) for Alice and B = (1, 0, 1, 1.5) for Dave, indexed as Guitar, Synthesizer, Bongos, Trumpet.

The weighted (Jaccard) node similarity of Alice and Dave is hence:
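Spelling out the arithmetic for the lists A and B above, and assuming the weighted Jaccard similarity is the sum of element-wise minima over the sum of element-wise maxima (which agrees with the reported score of 0.3333333333):

```latex
\[
J(A, B) = \frac{\sum_i \min(A_i, B_i)}{\sum_i \max(A_i, B_i)}
        = \frac{1 + 0 + 0.5 + 0}{1 + 1 + 1 + 1.5}
        = \frac{1.5}{4.5} \approx 0.333
\]
```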

Analogously, the similarity between Alice and Bob increased (from 0.67 to 0.8) because the instrument liked by only one of them has a low weight and therefore less impact on the similarity score.