Cypher projection
A Cypher projection creates an in-memory graph from the context of a Cypher query. With a Cypher projection, you can read data from one or more Neo4j databases, load local or remote files, or create data on the fly.
A Cypher projection has two main parts:
- One or more clauses to construct a set of nodes or source-target node pairs.
- A call to the gds.graph.project function.
See the Examples section for some common projection patterns.
Cypher projections are more flexible and expressive than native projections.
Considerations
Lifecycle
Projected graphs reside in memory (in the graph catalog) until any of the following happens:
- The graph is dropped with the gds.graph.drop procedure.
- The Neo4j database from which the graph was projected is stopped or dropped.
- The Neo4j DBMS is stopped.
Node property support
Cypher projections can only project a limited set of node property types from a Cypher query. The Node Properties page details which node property types are supported. Other types of node properties have to be transformed or encoded into one of the supported types in order to be projected using a Cypher projection.
Selection of node properties and labels
If a node occurs multiple times, the node properties and labels of the first occurrence will be used for the projection.
This is important when a node can be a source node as well as a target node and their configuration differs.
Relevant configuration options are sourceNodeProperties, targetNodeProperties, sourceNodeLabels and targetNodeLabels.
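The first-occurrence rule can be illustrated with a small Python model. This is a sketch of the behavior described above, not GDS code. The row shape and the helper `first_occurrence_properties` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical row shape: (source_id, source_props, target_id, target_props).
# target_id is None when the row projects an unconnected source node.
Row = Tuple[int, Dict[str, Any], Optional[int], Dict[str, Any]]


def first_occurrence_properties(rows: List[Row]) -> Dict[int, Dict[str, Any]]:
    """Keep, for every node id, the properties from its first occurrence only."""
    kept: Dict[int, Dict[str, Any]] = {}
    for source_id, source_props, target_id, target_props in rows:
        # setdefault stores a value only if the id has not been seen before
        kept.setdefault(source_id, source_props)
        if target_id is not None:
            kept.setdefault(target_id, target_props)
    return kept


rows: List[Row] = [
    (1, {"age": 16}, 2, {"age": 18}),  # node 2 first occurs as a target node
    (2, {"age": 99}, 3, {"age": 20}),  # node 2 as a source node: ignored
]
projected = first_occurrence_properties(rows)
print(projected[2])  # {'age': 18}
```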
Parallel Cypher Runtime
Cypher projection is compatible with the parallel runtime, which can speed up the execution of the projection. The achieved speedup depends on how well the query can be parallelized. Note that the parallel runtime is only available in Neo4j Enterprise Edition, version 5.13 and later.
Syntax
A Cypher projection is an aggregation function over the relationships that are being projected; as such, it returns an object containing information on the projected graph.
The projection function takes two mandatory arguments, graphName and sourceNode.
The third parameter, targetNode, is usually provided.
It is optional and can be null to project an unconnected node.
The fourth, optional dataConfig parameter can be used to project node properties and labels as well as relationship properties and type.
The fifth, optional configuration parameter can be used for general configuration of the projection, such as readConcurrency.
RETURN gds.graph.project(
graphName: String,
sourceNode: Node or Integer,
targetNode: Node or Integer,
dataConfig: Map,
configuration: Map
) YIELD
graphName: String,
nodeCount: Integer,
relationshipCount: Integer,
projectMillis: Integer,
query: String,
configuration: Map
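The argument roles above can be made concrete with a simplified Python model of the aggregation. It uses rows from the example graph in the Examples section. The model only mirrors how nodeCount and relationshipCount relate to (sourceNode, targetNode) rows. The helper `project_rows` is hypothetical, and this is not how GDS implements the projection.

```python
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple


def project_rows(rows: Iterable[Tuple[Hashable, Optional[Hashable]]]) -> Dict[str, int]:
    """Aggregate (source, target) rows into node and relationship counts.

    A row whose target is None contributes only its source node (an
    unconnected node); every other row contributes one relationship.
    """
    nodes: Set[Hashable] = set()
    relationship_count = 0
    for source, target in rows:
        if source is None:
            raise ValueError("sourceNode must not be null")
        nodes.add(source)
        if target is not None:
            nodes.add(target)
            relationship_count += 1
    return {"nodeCount": len(nodes), "relationshipCount": relationship_count}


# Rows as produced by MATCH (source:Person)-[:KNOWS]->(target:Person)
knows_rows = [("Florentin", "Adam"), ("Florentin", "Veselin")]
print(project_rows(knows_rows))  # {'nodeCount': 3, 'relationshipCount': 2}

# Rows as produced by MATCH (source) OPTIONAL MATCH (source)-[:KNOWS]->(target)
optional_rows = knows_rows + [
    ("Adam", None), ("Veselin", None), ("The Hobbit", None), ("Frankenstein", None),
]
print(project_rows(optional_rows))  # {'nodeCount': 5, 'relationshipCount': 2}
```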
| Name | Optional | Description |
|---|---|---|
| graphName | no | The name under which the graph is stored in the catalog. |
| sourceNode | no | The source node of the relationship. Must not be null. |
| targetNode | yes | The target node of the relationship. The targetNode can be null (for example due to an OPTIONAL MATCH), in which case the source node is projected as an unconnected node. |
| dataConfig | yes | Properties and labels configuration for the source and target nodes as well as properties and type configuration for the relationship. |
| configuration | yes | Additional parameters to configure the projection. |
| Name | Type | Default | Description |
|---|---|---|---|
| sourceNodeProperties | Map | {} | The properties of the source node. |
| targetNodeProperties | Map | {} | The properties of the target node. |
| sourceNodeLabels | List of String or String | [] | The label(s) of the source node. |
| targetNodeLabels | List of String or String | [] | The label(s) of the target node. |
| relationshipProperties | Map | {} | The properties of the relationship. |
| relationshipType | String | '*' | The type of the relationship. |
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| readConcurrency | Integer | 4 [1] | yes | The number of concurrent threads used for creating the graph. |
| undirectedRelationshipTypes | List of String | [] | yes | Declare a number of relationship types as undirected. Relationships with the specified types will be imported as undirected. |
| inverseIndexedRelationshipTypes | List of String | [] | yes | Declare a number of relationship types which will also be indexed in inverse direction. |
| memory (Aura Graph Analytics) | String | - | no [2] | Declare the memory used for the GDS Session created for the projected graph. |
| ttl (Aura Graph Analytics) | Duration | PT1H | yes | Declare the time to live before the GDS Session created for the projected graph expires due to inactivity. |
| sessionId (Aura Graph Analytics) | String | - | yes [2] | The ID of the GDS Session the graph should be projected on. |
| batchSize (Aura Graph Analytics) | Integer | 10000 | yes | Size of batches transmitted from the DBMS to the session. Lower the value to reduce the memory usage on the DBMS side. Increase the value to send fewer batches to the GDS Session. |

1. In a GDS Session, the default is the number of available processors.
2. Only required for Aura Graph Analytics.
| Name | Type | Description |
|---|---|---|
| graphName | String | The name under which the graph is stored in the catalog. |
| nodeCount | Integer | The number of nodes stored in the projected graph. |
| relationshipCount | Integer | The number of relationships stored in the projected graph. |
| projectMillis | Integer | Milliseconds for projecting the graph. |
| query | String | The query used for this projection. |
| configuration | Map | The configuration used for this projection. |
To get information about a stored graph, such as its schema, one can use gds.graph.list.
Examples
All the examples below should be run in an empty database.
To demonstrate the GDS Cypher Aggregation, we create a small social network graph in Neo4j with the following Cypher statement:
CREATE
(florentin:Person { name: 'Florentin', age: 16 }),
(adam:Person { name: 'Adam', age: 18 }),
(veselin:Person { name: 'Veselin', age: 20, ratings: [5.0] }),
(hobbit:Book { name: 'The Hobbit', isbn: 1234, numberOfPages: 310, ratings: [1.0, 2.0, 3.0, 4.5] }),
(frankenstein:Book { name: 'Frankenstein', isbn: 4242, price: 19.99 }),
(florentin)-[:KNOWS { since: 2010 }]->(adam),
(florentin)-[:KNOWS { since: 2018 }]->(veselin),
(florentin)-[:READ { numberOfPages: 4 }]->(hobbit),
(florentin)-[:READ { numberOfPages: 42 }]->(hobbit),
(adam)-[:READ { numberOfPages: 30 }]->(hobbit),
(veselin)-[:READ]->(frankenstein)
Simple graph
A simple graph is a graph with only one node label and relationship type, i.e., a monopartite graph.
We start by demonstrating how to load a simple graph by projecting only the Person node label and the KNOWS relationship type.
Project Person nodes and KNOWS relationships:
MATCH (source:Person)-[r:KNOWS]->(target:Person)
WITH gds.graph.project('persons', source, target) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "persons" | 3 | 2 |
Graph with unconnected nodes
In order to project nodes that are not connected, we can use an OPTIONAL MATCH.
To demonstrate, we project all nodes, some of which are connected by the KNOWS relationship type.
Project all nodes and KNOWS relationships:
MATCH (source) OPTIONAL MATCH (source)-[r:KNOWS]->(target)
WITH gds.graph.project('persons', source, target) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "persons" | 5 | 2 |
Using the parallel runtime
Cypher projection is compatible with the parallel runtime.
CYPHER runtime=parallel
MATCH (source:Person)-[r:KNOWS]->(target:Person)
WITH gds.graph.project('persons', source, target) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "persons" | 3 | 2 |
Arbitrary source and target ID values
So far, the examples showed how to project a graph based on existing nodes. It is also possible to pass INTEGER values directly.
UNWIND [ [42, 84], [13, 37], [19, 84] ] AS sourceAndTarget
WITH sourceAndTarget[0] AS source, sourceAndTarget[1] AS target
WITH gds.graph.project('arbitrary', source, target) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "arbitrary" | 5 | 3 |
The projected graph can no longer connect projected nodes to existing nodes in the underlying database.
As such,
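The node count of 5 follows from treating every distinct integer as one node. The value 84 appears in two pairs but is projected once. The following Python check confirms that arithmetic. It models only the counting, not GDS internals.

```python
# The same pairs as in the UNWIND example above
pairs = [[42, 84], [13, 37], [19, 84]]

# Every distinct integer becomes one node; 84 appears twice but is one node
node_ids = {node_id for pair in pairs for node_id in pair}
# Every pair becomes one relationship
relationship_count = len(pairs)

print(sorted(node_ids))                   # [13, 19, 37, 42, 84]
print(len(node_ids), relationship_count)  # 5 3
```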
Multi-graph
A multi-graph is a graph with multiple node labels and relationship types.
To retain the label when we load multiple node labels, we can add a sourceNodeLabels key and a targetNodeLabels key to the fourth dataConfig parameter.
To retain the type information when we load multiple relationship types, we can add a relationshipType key to the fourth dataConfig parameter.
Project Person and Book nodes and KNOWS and READ relationships:
MATCH (source)
WHERE source:Person OR source:Book
OPTIONAL MATCH (source)-[r:KNOWS|READ]->(target)
WHERE target:Person OR target:Book
WITH gds.graph.project(
'personsAndBooks',
source,
target,
{
sourceNodeLabels: labels(source),
targetNodeLabels: labels(target),
relationshipType: type(r)
}
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "personsAndBooks" | 5 | 6 |
The value for sourceNodeLabels or targetNodeLabels can be one of the following:
| type | example | description |
|---|---|---|
| List of String | ['Person', 'Book'] | Associate all labels in that list with the source or target node. |
| String | 'Person' | Associate that label with the source or target node. |
| Boolean | true | Associate all labels of the source or target node; same as labels(source) or labels(target). |
| Boolean | false | Don’t load any label information for the source or target node; same as if sourceNodeLabels or targetNodeLabels is not specified. |
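The table above can be read as a small resolution rule. The following Python sketch encodes it. `resolve_labels` is a hypothetical helper that models the documented behavior and is not part of the GDS API.

```python
from typing import List, Union

LabelSpec = Union[List[str], str, bool]


def resolve_labels(spec: LabelSpec, node_labels: List[str]) -> List[str]:
    """Return the labels a node gets for a sourceNodeLabels/targetNodeLabels value."""
    # Boolean: True keeps all of the node's own labels, False keeps none
    if isinstance(spec, bool):
        return list(node_labels) if spec else []
    # String: exactly that one label
    if isinstance(spec, str):
        return [spec]
    # List of String: all labels in the list
    if isinstance(spec, list):
        return list(spec)
    raise TypeError(f"unsupported label value: {spec!r}")


print(resolve_labels(["Person", "Book"], ["Person"]))  # ['Person', 'Book']
print(resolve_labels("Person", ["Person"]))            # ['Person']
print(resolve_labels(True, ["Book"]))                  # ['Book']
print(resolve_labels(False, ["Book"]))                 # []
```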
The value for relationshipType must be a String:
| type | example | description |
|---|---|---|
| String | 'KNOWS' | Associate that type with the relationship. |
Relationship orientation
The native projection supports specifying an orientation per relationship type.
By default, the Cypher Aggregation treats every relationship returned by the relationship query as if it were in NATURAL orientation.
Reverse relationships
The orientation of a relationship can be reversed by switching the source and target nodes.
Project Person and Book nodes and KNOWS and READ relationships:
MATCH (source)-[r:KNOWS|READ]->(target)
WHERE source:Book OR source:Person
WITH gds.graph.project(
'graphWithReverseRelationships',
target,
source
) as g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "graphWithReverseRelationships" | 5 | 6 |
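Swapping the arguments flips each relationship. The node and relationship counts stay the same. The following Python sketch shows that swap over the KNOWS and READ relationships of the example graph. It is an illustration only, not GDS code.

```python
# (source, type, target) for the KNOWS and READ relationships of the example graph
relationships = [
    ("Florentin", "KNOWS", "Adam"),
    ("Florentin", "KNOWS", "Veselin"),
    ("Florentin", "READ", "The Hobbit"),
    ("Florentin", "READ", "The Hobbit"),
    ("Adam", "READ", "The Hobbit"),
    ("Veselin", "READ", "Frankenstein"),
]

# Passing target as sourceNode and source as targetNode flips each relationship
reversed_relationships = [
    (target, rel_type, source) for source, rel_type, target in relationships
]

nodes = {node for source, _, target in reversed_relationships for node in (source, target)}
print(len(nodes), len(reversed_relationships))  # 5 6
```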
Undirected relationships
Relationships can be projected as undirected by specifying the undirectedRelationshipTypes parameter.
Project Person and Book nodes and KNOWS and READ relationships:
MATCH (source)-[r:KNOWS|READ]->(target)
WHERE source:Book OR source:Person
WITH gds.graph.project(
'graphWithUndirectedRelationships',
source,
target,
{},
{undirectedRelationshipTypes: ['*']}
) as g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "graphWithUndirectedRelationships" | 5 | 12 |
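The relationship count doubles from 6 to 12. One way to read this, assumed here for illustration, is that each undirected relationship is stored once per direction. The following Python sketch models that counting. `undirected_adjacency` is hypothetical. The example graph has no self-loops, so the sketch does not special-case them.

```python
from typing import List, Tuple


def undirected_adjacency(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Store each directed relationship once in each direction."""
    adjacency: List[Tuple[str, str]] = []
    for source, target in pairs:
        adjacency.append((source, target))
        adjacency.append((target, source))
    return adjacency


# The six KNOWS and READ relationships of the example graph
directed = [
    ("Florentin", "Adam"),
    ("Florentin", "Veselin"),
    ("Florentin", "The Hobbit"),
    ("Florentin", "The Hobbit"),
    ("Adam", "The Hobbit"),
    ("Veselin", "Frankenstein"),
]
print(len(undirected_adjacency(directed)))  # 12
```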
Add both natural and reverse relationships
To add both a relationship with its natural and with its reverse orientation, you can use a UNION clause within a subquery.
Project Person nodes and KNOWS and KNOWN_BY relationships:
MATCH (source:Person)-[:KNOWS]->(target:Person)
CALL {
WITH source, target
RETURN id(source) AS sourceId, id(target) AS targetId, 'KNOWS' AS rType
UNION
WITH source, target
RETURN id(target) AS sourceId, id(source) AS targetId, 'KNOWN_BY' AS rType
}
WITH gds.graph.project(
'graphWithNaturalAndReverseRelationships',
sourceId,
targetId,
{
sourceNodeLabels: 'Person',
targetNodeLabels: 'Person',
relationshipType: rType
}
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "graphWithNaturalAndReverseRelationships" | 3 | 4 |
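The subquery emits two rows per KNOWS relationship, one in each direction and each with its own type. The following Python sketch shows the rows it produces. It illustrates the query logic and is not GDS code. Within each subquery call the two branches always differ in rType, so the duplicate removal performed by UNION has no effect here, and the sketch omits it.

```python
# KNOWS relationships of the example graph as (source, target) pairs
knows = [("Florentin", "Adam"), ("Florentin", "Veselin")]

rows = []
for source, target in knows:
    rows.append((source, target, "KNOWS"))     # first UNION branch: natural direction
    rows.append((target, source, "KNOWN_BY"))  # second UNION branch: reversed

nodes = {node for source, target, _ in rows for node in (source, target)}
print(len(nodes), len(rows))  # 3 4
```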
Node properties
To load node properties, we add a map of all properties for the source and target nodes. We use the Cypher function coalesce() to specify a default value in case the node does not have the property.
The properties for the source node are specified as sourceNodeProperties key in the fourth dataConfig parameter.
The properties for the target node are specified as targetNodeProperties key in the fourth dataConfig parameter.
Project Person and Book nodes and KNOWS and READ relationships:
MATCH (source)-[r:KNOWS|READ]->(target)
WHERE source:Book OR source:Person
WITH gds.graph.project(
'graphWithProperties',
source,
target,
{
sourceNodeProperties: source { age: coalesce(source.age, 18), price: coalesce(source.price, 5.0), .ratings },
targetNodeProperties: target { age: coalesce(target.age, 18), price: coalesce(target.price, 5.0), .ratings }
}
) as g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "graphWithProperties" | 5 | 6 |
The projected graphWithProperties graph contains five nodes and six relationships.
In a Cypher Aggregation, every node gets the same set of properties, which means you cannot have node-specific properties.
For instance, in the example above the Person nodes also get the ratings and price properties, while the Book nodes get the age property.
Further, the price property has a default value of 5.0.
Not every book has a price specified in the example graph.
In the following we check if the price was correctly projected:
MATCH (n:Book)
RETURN n.name AS name, gds.util.nodeProperty('graphWithProperties', n, 'price') AS price
ORDER BY price
| name | price |
|---|---|
| "The Hobbit" | 5.0 |
| "Frankenstein" | 19.99 |
We can see that the price was projected, with The Hobbit having the default price of 5.0.
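The default-value logic can be mirrored in Python. This shows why The Hobbit ends up with price 5.0 and why Person nodes also receive a price. The sketch is minimal and assumes a node is represented as a plain dict of its stored properties. `coalesce` and `projected_node_properties` are hypothetical helpers that imitate the map used in sourceNodeProperties and targetNodeProperties.

```python
from typing import Any, Dict, Optional


def coalesce(value: Optional[Any], default: Any) -> Any:
    """Two-argument imitation of Cypher coalesce(): first non-null value."""
    return default if value is None else value


def projected_node_properties(node: Dict[str, Any]) -> Dict[str, Any]:
    # The same keys for every node, regardless of its label
    return {
        "age": coalesce(node.get("age"), 18),
        "price": coalesce(node.get("price"), 5.0),
        "ratings": node.get("ratings"),  # like .ratings: null (None) if missing
    }


florentin = {"name": "Florentin", "age": 16}
hobbit = {"name": "The Hobbit", "isbn": 1234, "numberOfPages": 310,
          "ratings": [1.0, 2.0, 3.0, 4.5]}
frankenstein = {"name": "Frankenstein", "isbn": 4242, "price": 19.99}

for node in (florentin, hobbit, frankenstein):
    print(node["name"], projected_node_properties(node))
```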
Relationship properties
Analogous to node properties, we can project relationship properties using the fourth parameter.
Project Person and Book nodes and READ relationships with the numberOfPages property:
MATCH (source)-[r:READ]->(target)
WITH gds.graph.project(
'readWithProperties',
source,
target,
{ relationshipProperties: r { .numberOfPages } }
) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "readWithProperties" | 5 | 4 |
Next, we will verify that the relationship property numberOfPages was correctly loaded.
Stream numberOfPages from the projected graph:
CALL gds.graph.relationshipProperty.stream('readWithProperties', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY person ASC, numberOfPages DESC
| person | book | numberOfPages |
|---|---|---|
| "Adam" | "The Hobbit" | 30.0 |
| "Florentin" | "The Hobbit" | 42.0 |
| "Florentin" | "The Hobbit" | 4.0 |
| "Veselin" | "Frankenstein" | NaN |
We can see that numberOfPages was loaded. The default property value is Double.NaN. It can be changed by using the Cypher function coalesce(), as in the previous example Node properties.
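The following Python sketch models the default behavior for a missing relationship property. It assumes NaN as the fallback, as stated above, and returns floats to match the streamed values (30.0, 42.0). `relationship_property` is a hypothetical helper. Its `default` argument plays the role of a coalesce() in the projection query.

```python
import math
from typing import Any, Dict, Optional


def relationship_property(props: Dict[str, Any], key: str,
                          default: float = math.nan) -> float:
    """Return the property as a float, or the default (NaN unless overridden)."""
    value: Optional[Any] = props.get(key)
    return default if value is None else float(value)


reads = [
    ("Adam", "The Hobbit", {"numberOfPages": 30}),
    ("Florentin", "The Hobbit", {"numberOfPages": 42}),
    ("Florentin", "The Hobbit", {"numberOfPages": 4}),
    ("Veselin", "Frankenstein", {}),  # READ without numberOfPages
]
for person, book, props in reads:
    print(person, book, relationship_property(props, "numberOfPages"))
```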
Parallel relationships
The Property Graph Model in Neo4j supports parallel relationships, i.e., multiple relationships between two nodes. By default, GDS preserves the parallel relationships. For some algorithms, we want the projected graph to contain at most one relationship between two nodes.
The simplest way to achieve relationship deduplication is to use the DISTINCT operator in the relationship query.
Alternatively, we can aggregate the parallel relationships by using the count() function and store the count as a relationship property.
Project Person and Book nodes and COUNT aggregated READ relationships:
MATCH (source)-[r:READ]->(target)
WITH source, target, count(r) AS numberOfReads
WITH gds.graph.project('readCount', source, target, { relationshipProperties: { numberOfReads: numberOfReads } }) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "readCount" | 5 | 3 |
Next, we will verify that the READ relationships were correctly aggregated.
Stream numberOfReads from the projected graph:
CALL gds.graph.relationshipProperty.stream('readCount', 'numberOfReads')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfReads
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfReads
ORDER BY numberOfReads DESC, person
| person | book | numberOfReads |
|---|---|---|
| "Florentin" | "The Hobbit" | 2.0 |
| "Adam" | "The Hobbit" | 1.0 |
| "Veselin" | "Frankenstein" | 1.0 |
We can see that the two READ relationships between Florentin and The Hobbit result in 2 numberOfReads.
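The count aggregation groups READ relationships by their (source, target) pair. The following Python sketch performs the same grouping with `collections.Counter`, next to the DISTINCT alternative mentioned above. It illustrates the query semantics and is not GDS code.

```python
from collections import Counter

# (person, book) pairs of the READ relationships in the example graph
reads = [
    ("Florentin", "The Hobbit"),
    ("Florentin", "The Hobbit"),
    ("Adam", "The Hobbit"),
    ("Veselin", "Frankenstein"),
]

# Like WITH source, target, count(r): one entry per distinct node pair
number_of_reads = Counter(reads)
# Like DISTINCT: parallel relationships collapse without keeping a count
distinct_pairs = set(reads)

print(number_of_reads[("Florentin", "The Hobbit")])  # 2
print(len(number_of_reads), len(distinct_pairs))     # 3 3
```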
Parallel relationships with properties
For graphs with relationship properties we can also use other aggregations documented in the Cypher Manual.
Project Person and Book nodes and aggregate READ relationships by summing the numberOfPages:
MATCH (source)-[r:READ]->(target)
WITH source, target, sum(r.numberOfPages) AS numberOfPages
WITH gds.graph.project('readSums', source, target, { relationshipProperties: { numberOfPages: numberOfPages } }) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "readSums" | 5 | 3 |
Next, we will verify that the relationship property numberOfPages was correctly aggregated.
Stream numberOfPages from the projected graph:
CALL gds.graph.relationshipProperty.stream('readSums', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY numberOfPages DESC, person
| person | book | numberOfPages |
|---|---|---|
| "Florentin" | "The Hobbit" | 46.0 |
| "Adam" | "The Hobbit" | 30.0 |
| "Veselin" | "Frankenstein" | 0.0 |
We can see that the two READ relationships between Florentin and The Hobbit sum up to 46 numberOfPages.
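The following Python sketch models the sum aggregation. It assumes Cypher's rules that sum() ignores null values and returns 0 when a group has no non-null values. That rule explains the 0.0 for Veselin and Frankenstein. `sum_per_pair` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

# (person, book, numberOfPages) for the READ relationships of the example graph
reads: List[Tuple[str, str, Optional[int]]] = [
    ("Florentin", "The Hobbit", 4),
    ("Florentin", "The Hobbit", 42),
    ("Adam", "The Hobbit", 30),
    ("Veselin", "Frankenstein", None),  # READ without numberOfPages
]


def sum_per_pair(rows: List[Tuple[str, str, Optional[int]]]) -> Dict[Tuple[str, str], int]:
    totals: Dict[Tuple[str, str], int] = {}
    for person, book, pages in rows:
        key = (person, book)
        totals.setdefault(key, 0)  # a pair with only nulls still sums to 0
        if pages is not None:      # null values are skipped, as in Cypher sum()
            totals[key] += pages
    return totals


totals = sum_per_pair(reads)
for (person, book), pages in totals.items():
    print(person, book, pages)
```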
Projecting filtered Neo4j graphs
Cypher projections allow us to specify the graph to project in a more fine-grained way.
The following example demonstrates how to filter out READ relationships that do not have a numberOfPages property.
Project Person and Book nodes and READ relationships where numberOfPages is present:
MATCH (source) OPTIONAL MATCH (source)-[r:READ]->(target)
WHERE r.numberOfPages IS NOT NULL
WITH gds.graph.project('existingNumberOfPages', source, target, { relationshipProperties: r { .numberOfPages } }) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "existingNumberOfPages" | 5 | 3 |
Next, we will verify that the relationship property numberOfPages was correctly loaded.
Stream numberOfPages from the projected graph:
CALL gds.graph.relationshipProperty.stream('existingNumberOfPages', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY person ASC, numberOfPages DESC
| person | book | numberOfPages |
|---|---|---|
| "Adam" | "The Hobbit" | 30.0 |
| "Florentin" | "The Hobbit" | 42.0 |
| "Florentin" | "The Hobbit" | 4.0 |
If we compare the results to the ones from Relationship properties, we can see that the IS NOT NULL filter removes the relationship from Veselin to the book Frankenstein.
This functionality is only expressible with native projections by projecting a subgraph.
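The filter sits inside the OPTIONAL MATCH. Nodes without a matching relationship are therefore kept with a null target instead of being dropped. The following Python sketch models that row generation. `optional_match_rows` is hypothetical and models only the query semantics.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

nodes = ["Florentin", "Adam", "Veselin", "The Hobbit", "Frankenstein"]
# (source, target, properties) for the READ relationships of the example graph
reads: List[Tuple[str, str, Dict[str, Any]]] = [
    ("Florentin", "The Hobbit", {"numberOfPages": 4}),
    ("Florentin", "The Hobbit", {"numberOfPages": 42}),
    ("Adam", "The Hobbit", {"numberOfPages": 30}),
    ("Veselin", "Frankenstein", {}),
]


def optional_match_rows(
    nodes: List[str],
    rels: List[Tuple[str, str, Dict[str, Any]]],
    predicate: Callable[[Dict[str, Any]], bool],
) -> List[Tuple[str, Optional[str]]]:
    rows: List[Tuple[str, Optional[str]]] = []
    for node in nodes:
        matches = [(s, t) for s, t, props in rels if s == node and predicate(props)]
        # OPTIONAL MATCH ... WHERE keeps the node with a null target if nothing matches
        rows.extend(matches if matches else [(node, None)])
    return rows


rows = optional_match_rows(nodes, reads, lambda props: props.get("numberOfPages") is not None)
node_count = len({n for s, t in rows for n in (s, t) if n is not None})
relationship_count = sum(1 for _, t in rows if t is not None)
print(node_count, relationship_count)  # 5 3
```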