Filtering

This feature is in the beta tier. For more information on feature tiers, see API Tiers.

Subgraph projection is featured in the end-to-end example Jupyter notebooks:

In GDS, algorithms can be executed on a named graph that has been filtered based on its node labels and relationship types. However, that filtered graph only exists during the execution of the algorithm, and it is not possible to filter on property values. If a filtered graph needs to be used multiple times, one can use the subgraph catalog procedure to project a new graph in the graph catalog.

The filter predicates in the subgraph procedure can take labels, relationship types as well as node and relationship properties into account. The new graph can be used in the same way as any other in-memory graph in the catalog. Projecting subgraphs of subgraphs is also possible.

Syntax

A new graph can be projected by using the gds.graph.filter() procedure:
CALL gds.graph.filter(
  graphName: String,
  fromGraphName: String,
  nodeFilter: String,
  relationshipFilter: String,
  configuration: Map
) YIELD
  graphName: String,
  fromGraphName: String,
  nodeFilter: String,
  relationshipFilter: String,
  nodeCount: Integer,
  relationshipCount: Integer,
  projectMillis: Integer
Table 1. Parameters
Name Type Description

graphName

String

The name of the new graph that is stored in the graph catalog.

fromGraphName

String

The name of the original graph in the graph catalog.

nodeFilter

String

A Cypher predicate for filtering nodes in the input graph. * can be used to allow all nodes.

relationshipFilter

String

A Cypher predicate for filtering relationships in the input graph. * can be used to allow all relationships.

configuration

Map

Additional parameters to configure subgraph creation.

Table 2. Subgraph specific configuration
Name Type Default Optional Description

concurrency

Integer

4

yes

The number of concurrent threads used for filtering the graph.

jobId

String

Generated internally

yes

An ID that can be provided to more easily track the projection’s progress.

parameters

Map

{}

yes

A map of user-defined query parameters that are passed into the node and relationship filters.

Table 3. Results
Name Type Description

graphName

String

The name of the new graph that is stored in the graph catalog.

fromGraphName

String

The name of the original graph in the graph catalog.

nodeFilter

String

Filter predicate for nodes.

relationshipFilter

String

Filter predicate for relationships.

nodeCount

Integer

Number of nodes in the subgraph.

relationshipCount

Integer

Number of relationships in the subgraph.

projectMillis

Integer

Milliseconds for projecting the subgraph.

The nodeFilter and relationshipFilter configuration keys can be used to express filter predicates. Filter predicates are Cypher predicates bound to a single entity. An entity is either a node or a relationship. The filter predicate always needs to evaluate to true or false. A node is contained in the subgraph if the node filter evaluates to true. A relationship is contained in the subgraph if the relationship filter evaluates to true and its source and target nodes are contained in the subgraph.

A predicate is a combination of expressions. The simplest form of expression is a literal. GDS currently supports the following literals:

  • float literals, e.g., 13.37

  • integer literals, e.g., 42

  • boolean literals, i.e., TRUE and FALSE

Property, label and relationship type expressions are bound to an entity. The node entity is always identified by the variable n, the relationship entity is identified by r. Using the variable, we can refer to:

  • node label expression, e.g., n:Person

  • relationship type expression, e.g., r:KNOWS

  • node property expression, e.g., n.age

  • relationship property expression, e.g., r.since

Boolean predicates combine two expressions and return either true or false. GDS supports the following boolean predicates:

  • greater/lower than, such as n.age > 42 or r.since < 1984

  • greater/lower than or equal, such as n.age >= 42 or r.since ⇐ 1984

  • equality, such as n.age = 23 or r.since = 2020

  • logical operators, such as

  • n.age > 23 AND n.age < 42

  • n.age = 23 OR n.age = 42

  • n.age = 23 XOR n.age = 42

  • n.age IS NOT 23

Variable names that can be used within predicates are not arbitrary. A node predicate must refer to variable n. A relationship predicate must refer to variable r.

An exception is the degree function, which simply returns the node’s degree. The function takes relationship types as arguments. Multiple types are considered as disjunctive. For example, degree() > 42 filters nodes with a degree greater than 42 across all relationship types. The expression degree('Foo', 'Bar') > 42 filters nodes where the sum of Foo and Bar relationships is greater than 42.

Examples

All the examples below should be run in an empty database.

The examples use Cypher projections as the norm. Native projections will be deprecated in a future release.

In order to demonstrate the GDS project subgraph capabilities we are going to create a small social graph in Neo4j.

The following Cypher statement will create the example graph in the Neo4j database:
CREATE
  (p0:Person { age: 16 }),
  (p1:Person { age: 18 }),
  (p2:Person { age: 20 }),
  (b0:Book   { isbn: 1234 }),
  (b1:Book   { isbn: 4242 }),
  (p0)-[:KNOWS { since: 2010 }]->(p1),
  (p0)-[:KNOWS { since: 2018 }]->(p2),
  (p0)-[:READS]->(b0),
  (p1)-[:READS]->(b0),
  (p2)-[:READS]->(b1)
Project the social network graph:
MATCH (n:Person)-[r:KNOWS|READS]->(m:Person|Book)
RETURN gds.graph.project('social-graph', n, m,
  {
    sourceNodeLabels: labels(n),
    targetNodeLabels: labels(m),
    sourceNodeProperties: n { .age },
    targetNodeProperties: CASE WHEN m:Person THEN m { .age } ELSE {} END,
    relationshipType: type(r),
    relationshipProperties: CASE WHEN r:KNOWS THEN r { .since } ELSE {} END
  }
)

Node filtering

Create a new graph containing only users of a certain age group:
CALL gds.graph.filter(
  'teenagers',
  'social-graph',
  'n.age > 13 AND n.age <= 18',
  '*'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
Table 4. Results
graphName fromGraphName nodeCount relationshipCount

"teenagers"

"social-graph"

2

1

Node degree Filtering

Create a new graph containing only nodes with more than two relationships:
CALL gds.graph.filter(
  'degree-graph',
  'social-graph',
  'degree() > 2',
  '*'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
Table 5. Results
graphName fromGraphName nodeCount relationshipCount

"degree-graph"

"social-graph"

1

0

Node and relationship filtering

Create a new graph containing only users of a certain age group that know each other since a given point a time:
CALL gds.graph.filter(
  'teenagers',
  'social-graph',
  'n.age > 13 AND n.age <= 18',
  'r.since >= 2012.0'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
Table 6. Results
graphName fromGraphName nodeCount relationshipCount

"teenagers"

"social-graph"

2

0

Bipartite subgraph

Create a new bipartite graph between books and users connected by the READS relationship type:
CALL gds.graph.filter(
  'teenagers-books',
  'social-graph',
  'n:Book OR n:Person',
  'r:READS'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
Table 7. Results
graphName fromGraphName nodeCount relationshipCount

"teenagers-books"

"social-graph"

5

3

Bipartite graph node filtering

The previous example can be extended with an additional filter applied only to persons:
CALL gds.graph.filter(
  'teenagers-books',
  'social-graph',
  'n:Book OR (n:Person AND n.age > 18)',
  'r:READS'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
Table 8. Results
graphName fromGraphName nodeCount relationshipCount

"teenagers-books"

"social-graph"

3

1

Using query parameters

Similar to Cypher, it is also possible to set query parameters. As an example we can rewrite the node filter example from above using parameters instead of integer literals:

Create a new graph containing only users of a certain age group:
CALL gds.graph.filter(
  'teenagers-parameterized',
  'social-graph',
  'n.age > $lower AND n.age <= $upper',
  '*',
  { parameters: { lower: 13, upper: 18 } }
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
Table 9. Results
graphName fromGraphName nodeCount relationshipCount

"teenagers-parameterized"

"social-graph"

2

1