Cypher projection (deprecated)
This page describes the Legacy Cypher projection, which is deprecated. The replacement is to use the new Cypher projection, which is described in Projecting graphs using Cypher. A migration guide is available at Appendix C, Migration from Legacy to new Cypher projection.
Legacy Cypher projections are a more flexible and expressive approach compared to native projections. A Legacy Cypher projection uses Cypher to create (project) an in-memory graph from the Neo4j database.
Considerations
Lifecycle
The projected graphs will reside in the catalog until either:
Node property support
Legacy Cypher projections can only project a limited set of node property types from a Cypher query. The Node Properties page details which node property types are supported. Other types of node properties have to be transformed or encoded into one of the supported types in order to be projected using a Legacy Cypher projection.
Syntax
A Legacy Cypher projection takes three mandatory arguments: graphName, nodeQuery and relationshipQuery.
In addition, the optional configuration parameter allows us to further configure graph creation.
CALL gds.graph.project.cypher(
graphName: String,
nodeQuery: String,
relationshipQuery: String,
configuration: Map
) YIELD
graphName: String,
nodeQuery: String,
nodeCount: Integer,
relationshipQuery: String,
relationshipCount: Integer,
projectMillis: Integer
| Name | Optional | Description |
|---|---|---|
| graphName | no | The name under which the graph is stored in the catalog. |
| nodeQuery | no | Cypher query to project nodes. The query result must contain an id column. |
| relationshipQuery | no | Cypher query to project relationships. The query result must contain source and target columns. |
| configuration | yes | Additional parameters to configure the Legacy Cypher projection. |
| Name | Type | Default | Description |
|---|---|---|---|
| readConcurrency | Integer | 4 | The number of concurrent threads used for creating the graph. |
| validateRelationships | Boolean | true | Whether to throw an error if the |
| parameters | Map | {} | A map of user-defined query parameters that are passed into the node and relationship queries. |
| jobId | String | Generated internally | An ID that can be provided to more easily track the projection's progress. |
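As a sketch of how the configuration map might be supplied, the call below passes readConcurrency and jobId as the fourth argument. The graph name personsConfigured, the job ID value, and the Person/KNOWS queries are illustrative assumptions that anticipate the example data used later on this page.

```cypher
// Hypothetical graph name and job ID; assumes Person nodes and KNOWS relationships exist.
CALL gds.graph.project.cypher(
  'personsConfigured',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n:Person)-[:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target',
  {
    readConcurrency: 2,              // two threads instead of the default four
    jobId: 'persons-projection-job'  // user-chosen ID for tracking the projection
  }
)
YIELD graphName, nodeCount, relationshipCount, projectMillis
```

Omitting the fourth argument altogether leaves every option at the default listed in the table above.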
| Name | Type | Description |
|---|---|---|
| graphName | String | The name under which the graph is stored in the catalog. |
| nodeQuery | String | The Cypher query used to project the nodes in the graph. |
| nodeCount | Integer | The number of nodes stored in the projected graph. |
| relationshipQuery | String | The Cypher query used to project the relationships in the graph. |
| relationshipCount | Integer | The number of relationships stored in the projected graph. |
| projectMillis | Integer | Milliseconds for projecting the graph. |
To get information about a stored graph, such as its schema, one can use gds.graph.list.
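A minimal sketch of such a lookup, assuming a graph named persons has already been projected (as in the Simple graph example later on this page). The yielded field names are taken from GDS 2.x and may differ in other versions.

```cypher
// Inspect a single graph in the catalog; 'persons' is assumed to exist.
CALL gds.graph.list('persons')
YIELD graphName, nodeCount, relationshipCount, schema
RETURN graphName, nodeCount, relationshipCount, schema
```

Calling gds.graph.list() without a graph name lists every graph in the catalog.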
Examples
All the examples below should be run in an empty database.
In order to demonstrate the GDS Graph Project capabilities we are going to create a small social network graph in Neo4j. The following Cypher statement creates the example graph:
CREATE
(florentin:Person { name: 'Florentin', age: 16 }),
(adam:Person { name: 'Adam', age: 18 }),
(veselin:Person { name: 'Veselin', age: 20, ratings: [5.0] }),
(hobbit:Book { name: 'The Hobbit', isbn: 1234, numberOfPages: 310, ratings: [1.0, 2.0, 3.0, 4.5] }),
(frankenstein:Book { name: 'Frankenstein', isbn: 4242, price: 19.99 }),
(florentin)-[:KNOWS { since: 2010 }]->(adam),
(florentin)-[:KNOWS { since: 2018 }]->(veselin),
(florentin)-[:READ { numberOfPages: 4 }]->(hobbit),
(florentin)-[:READ { numberOfPages: 42 }]->(hobbit),
(adam)-[:READ { numberOfPages: 30 }]->(hobbit),
(veselin)-[:READ]->(frankenstein)
Simple graph
A simple graph is a graph with only one node label and relationship type, i.e., a monopartite graph.
We are going to start with demonstrating how to load a simple graph by projecting only the Person node label and KNOWS relationship type.
Person nodes and KNOWS relationships:
CALL gds.graph.project.cypher(
'persons',
'MATCH (n:Person) RETURN id(n) AS id',
'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target')
YIELD
graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
| graph | nodeQuery | nodes | relationshipQuery | rels |
|---|---|---|---|---|
| "persons" | "MATCH (n:Person) RETURN id(n) AS id" | 3 | "MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target" | 2 |
Multi-graph
A multi-graph is a graph with multiple node labels and relationship types.
To retain the label and type information when we load multiple node labels and relationship types, we can add a labels column to the node query and a type column to the relationship query.
Person and Book nodes and KNOWS and READ relationships:
CALL gds.graph.project.cypher(
'personsAndBooks',
'MATCH (n) WHERE n:Person OR n:Book RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type')
YIELD
graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipCount AS rels
| graph | nodeQuery | nodes | rels |
|---|---|---|---|
| "personsAndBooks" | "MATCH (n) WHERE n:Person OR n:Book RETURN id(n) AS id, labels(n) AS labels" | 5 | 6 |
Relationship orientation
The native projection supports specifying an orientation per relationship type.
The Legacy Cypher projection treats every relationship returned by the relationship query as if it were in NATURAL orientation and creates a directed relationship from the first provided id (source) to the second (target).
Projecting in REVERSE orientation can be achieved by switching the order of ids in the RETURN clause, such as MATCH (n)-[r:KNOWS]->(m) RETURN id(m) AS source, id(n) AS target, type(r) AS type.
It is not possible to project graphs in UNDIRECTED orientation when Legacy Cypher projections are used.
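Some algorithms require that the graph was loaded with UNDIRECTED orientation; such algorithms cannot be used with a graph created by a Legacy Cypher projection.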
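The REVERSE pattern described above can be sketched as a complete call. The graph name knowsReversed is a hypothetical choice; the queries assume the example graph from this page.

```cypher
// Swap the roles of n and m in the RETURN clause so that each projected
// relationship points from the original target to the original source.
CALL gds.graph.project.cypher(
  'knowsReversed',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n:Person)-[r:KNOWS]->(m:Person)
   RETURN id(m) AS source, id(n) AS target, type(r) AS type'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```

The node and relationship counts are unaffected by the swap; only the direction stored in the projected graph changes.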
Node properties
To load node properties, we add a column to the result of the node query for each property. We use the Cypher function coalesce() to specify a default value for nodes that do not have the property.
Person and Book nodes and KNOWS and READ relationships:
CALL gds.graph.project.cypher(
'graphWithProperties',
'MATCH (n)
WHERE n:Book OR n:Person
RETURN
id(n) AS id,
labels(n) AS labels,
coalesce(n.age, 18) AS age,
coalesce(n.price, 5.0) AS price,
n.ratings AS ratings',
'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD
graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels
| graphName | nodes | rels |
|---|---|---|
| "graphWithProperties" | 5 | 6 |
The projected graphWithProperties graph contains five nodes and six relationships.
In a Legacy Cypher projection every node from the nodeQuery gets the same node properties, which means you can’t have label-specific properties.
For instance, in the example above the Person nodes also get the ratings and price properties, while the Book nodes get the age property.
Further, the price property has a default value of 5.0.
Not every book has a price specified in the example graph.
In the following we check if the price was correctly projected:
MATCH (n:Book)
RETURN n.name AS name, gds.util.nodeProperty('graphWithProperties', id(n), 'price') AS price
ORDER BY price
| name | price |
|---|---|
| "The Hobbit" | 5.0 |
| "Frankenstein" | 19.99 |
We can see that the price was projected, with The Hobbit having the default price of 5.0.
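The same kind of check can be sketched for the age property, to illustrate that Book nodes receive it even though no book has an age in the database. This assumes the graphWithProperties projection above has been run; given coalesce(n.age, 18) in the node query, both books would be expected to report the default value.

```cypher
// Read the projected 'age' property for Book nodes, which have no age in Neo4j.
MATCH (n:Book)
RETURN n.name AS name, gds.util.nodeProperty('graphWithProperties', id(n), 'age') AS age
ORDER BY name
```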
Relationship properties
Analogous to node properties, we can project relationship properties using the relationshipQuery.
Person and Book nodes and READ relationships with numberOfPages property:
CALL gds.graph.project.cypher(
'readWithProperties',
'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:READ]->(m)
RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "readWithProperties" | 5 | 4 |
Next, we will verify that the relationship property numberOfPages was correctly loaded.
numberOfPages from the projected graph:
CALL gds.graph.relationshipProperty.stream('readWithProperties', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY person ASC, numberOfPages DESC
| person | book | numberOfPages |
|---|---|---|
| "Adam" | "The Hobbit" | 30.0 |
| "Florentin" | "The Hobbit" | 42.0 |
| "Florentin" | "The Hobbit" | 4.0 |
| "Veselin" | "Frankenstein" | NaN |
We can see that the numberOfPages values are loaded. The default property value is Double.NaN; it can be changed with the Cypher function coalesce(), as in the Node properties example above.
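A minimal sketch of such a default for a relationship property, assuming the example graph from this page. The graph name readWithDefault and the default value 0 are illustrative choices, not part of the original examples.

```cypher
// coalesce() substitutes 0 for READ relationships that lack numberOfPages,
// so the projected property no longer falls back to NaN.
CALL gds.graph.project.cypher(
  'readWithDefault',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
   RETURN id(n) AS source, id(m) AS target, type(r) AS type,
          coalesce(r.numberOfPages, 0) AS numberOfPages'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```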
Parallel relationships
The Property Graph Model in Neo4j supports parallel relationships, i.e., multiple relationships between two nodes. By default, GDS preserves the parallel relationships. For some algorithms, we want the projected graph to contain at most one relationship between two nodes.
The simplest way to achieve relationship deduplication is to use the DISTINCT operator in the relationship query.
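A sketch of the DISTINCT approach, assuming the example graph from this page; the graph name readDistinct is hypothetical. Because the two READ relationships from Florentin to The Hobbit produce identical rows, the projection should end up with three relationships rather than four.

```cypher
// DISTINCT collapses rows with the same source, target and type,
// so parallel READ relationships are projected only once.
CALL gds.graph.project.cypher(
  'readDistinct',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
   RETURN DISTINCT id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```

Note that this only deduplicates when no relationship property is returned; a differing property value would make the rows distinct again.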
Alternatively, we can aggregate the parallel relationship by using the count() function and store the count as a relationship property.
Person and Book nodes and COUNT aggregated READ relationships:
CALL gds.graph.project.cypher(
'readCount',
'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:READ]->(m)
RETURN id(n) AS source, id(m) AS target, type(r) AS type, count(r) AS numberOfReads'
)
YIELD
graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "readCount" | 5 | 3 |
Next, we will verify that the READ relationships were correctly aggregated.
numberOfReads of the projected graph:
CALL gds.graph.relationshipProperty.stream('readCount', 'numberOfReads')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfReads
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfReads
ORDER BY numberOfReads DESC, person
| person | book | numberOfReads |
|---|---|---|
| "Florentin" | "The Hobbit" | 2.0 |
| "Adam" | "The Hobbit" | 1.0 |
| "Veselin" | "Frankenstein" | 1.0 |
We can see that the two READ relationships between Florentin and The Hobbit are aggregated into a single relationship with a numberOfReads value of 2.
Parallel relationships with properties
For graphs with relationship properties we can also use other aggregations documented in the Cypher Manual.
Person and Book nodes and aggregated READ relationships by summing the numberOfPages:
CALL gds.graph.project.cypher(
'readSums',
'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:READ]->(m)
RETURN id(n) AS source, id(m) AS target, type(r) AS type, sum(r.numberOfPages) AS numberOfPages'
)
YIELD
graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "readSums" | 5 | 3 |
Next, we will verify that the relationship property numberOfPages was correctly aggregated.
numberOfPages of the projected graph:
CALL gds.graph.relationshipProperty.stream('readSums', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY numberOfPages DESC, person
| person | book | numberOfPages |
|---|---|---|
| "Florentin" | "The Hobbit" | 46.0 |
| "Adam" | "The Hobbit" | 30.0 |
| "Veselin" | "Frankenstein" | 0.0 |
We can see that the two READ relationships between Florentin and The Hobbit sum up to a numberOfPages value of 46.
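Other aggregation functions can be swapped in the same way. The sketch below uses max() instead of sum(), keeping only the largest numberOfPages per pair of nodes; the graph name readMax is hypothetical and the queries assume the example graph from this page.

```cypher
// max() keeps the largest numberOfPages among parallel READ relationships.
CALL gds.graph.project.cypher(
  'readMax',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
   RETURN id(n) AS source, id(m) AS target, type(r) AS type,
          max(r.numberOfPages) AS numberOfPages'
)
YIELD graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```

Unlike sum(), max() returns null when no relationship in the group has the property, so a coalesce() around the aggregate may be needed to avoid a NaN value in the projection.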
Projecting filtered Neo4j graphs
Cypher projections allow us to specify the graph to project in a more fine-grained way.
The following example demonstrates how to filter out READ relationships that do not have a numberOfPages property.
Person and Book nodes and READ relationships where numberOfPages is present:
CALL gds.graph.project.cypher(
'existingNumberOfPages',
'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:READ]->(m)
WHERE r.numberOfPages IS NOT NULL
RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "existingNumberOfPages" | 5 | 3 |
Next, we will verify that the relationship property numberOfPages was correctly loaded.
numberOfPages from the projected graph:
CALL gds.graph.relationshipProperty.stream('existingNumberOfPages', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY person ASC, numberOfPages DESC
| person | book | numberOfPages |
|---|---|---|
| "Adam" | "The Hobbit" | 30.0 |
| "Florentin" | "The Hobbit" | 42.0 |
| "Florentin" | "The Hobbit" | 4.0 |
If we compare the results to the ones from Relationship properties, we can see that using IS NOT NULL is filtering out the relationship from Veselin to the book Frankenstein.
This functionality is only expressible with native projections by projecting a subgraph.
Using query parameters
Similar to Cypher, it is also possible to set query parameters. In the following example we supply a minimum number of pages to limit the READ relationships we want to project.
Person and Book nodes and READ relationships where numberOfPages is greater than 9:
CALL gds.graph.project.cypher(
'existingNumberOfPages',
'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:READ]->(m)
WHERE r.numberOfPages > $minNumberOfPages
RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages',
{ parameters: { minNumberOfPages: 9} }
)
YIELD
graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
| graph | nodes | rels |
|---|---|---|
| "existingNumberOfPages" | 5 | 2 |
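The projection above reuses the graph name existingNumberOfPages from the previous section. Graph names in the catalog are unique, so if both examples are run in the same session the earlier graph presumably has to be removed first. A minimal sketch using gds.graph.drop:

```cypher
// Passing false for failIfMissing avoids an error when no such graph exists.
CALL gds.graph.drop('existingNumberOfPages', false)
YIELD graphName
RETURN graphName
```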
Further usage of parameters
The parameters can also be used to directly pass in a list of nodes or a list of relationships. For example, pre-computing the list of nodes can be useful if the node filter is expensive.
Person nodes younger than 20 and their name not beginning with V, and KNOWS relationships:
CALL gds.graph.project.cypher(
'personSubset',
'MATCH (n)
WHERE n.age < 20 AND NOT n.name STARTS WITH "V"
RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:KNOWS]->(m)
WHERE (n.age < 20 AND NOT n.name STARTS WITH "V") AND
(m.age < 20 AND NOT m.name STARTS WITH "V")
RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
graphName, nodeCount AS nodes, relationshipCount AS rels
| graphName | nodes | rels |
|---|---|---|
| "personSubset" | 2 | 1 |
By passing the relevant Persons as a parameter, the above query can be transformed into the following:
Person nodes younger than 20 and their name not beginning with V, and KNOWS relationships by using parameters:
MATCH (n)
WHERE n.age < 20 AND NOT n.name STARTS WITH "V"
WITH collect(n) AS olderPersons
CALL gds.graph.project.cypher(
'personSubsetViaParameters',
'UNWIND $nodes AS n RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n)-[r:KNOWS]->(m)
WHERE (n IN $nodes) AND (m IN $nodes)
RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages',
{ parameters: { nodes: olderPersons} }
)
YIELD
graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels
| graphName | nodes | rels |
|---|---|---|
| "personSubsetViaParameters" | 2 | 1 |
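The same idea can be sketched for a list of relationships, which this section mentions but does not demonstrate. The graph name knowsViaParameters and the parameter name rels are hypothetical; the sketch assumes that relationship values can be passed through parameters in the same way as the node values above.

```cypher
// Collect the KNOWS relationships up front and hand them to the projection.
MATCH (:Person)-[r:KNOWS]->(:Person)
WITH collect(r) AS knowsRels
CALL gds.graph.project.cypher(
  'knowsViaParameters',
  'MATCH (n:Person) RETURN id(n) AS id, labels(n) AS labels',
  'UNWIND $rels AS r
   RETURN id(startNode(r)) AS source, id(endNode(r)) AS target, type(r) AS type',
  { parameters: { rels: knowsRels } }
)
YIELD graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels
```

Here startNode() and endNode() recover the endpoints of each relationship, so the relationship query does not need to match the pattern again.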