Projecting graphs and using the graph catalog

Follow along with a notebook in Google Colab.

This example shows how to:

  • load Neo4j on-disk data into in-memory projected graphs;

  • use the graph catalog to manage projected graphs.

Setup

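To use the Graph Data Science Python client, install the graphdatascience package and configure the client as follows. For more information on how to get started using Python, refer to the Connecting with Python tutorial.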

pip install graphdatascience
# Import the client
from graphdatascience import GraphDataScience

# Replace with the actual URI, username, and password
AURA_CONNECTION_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = ""

# Configure the client with AuraDS-recommended settings
gds = GraphDataScience(
    AURA_CONNECTION_URI,
    auth=(AURA_USERNAME, AURA_PASSWORD),
    aura_ds=True
)

In the following code examples, we use the print function to print pandas DataFrame and Series objects. You can try other ways to print a pandas object, for instance the to_string and to_json methods; if you use a JSON representation, you may need to include a default handler for Neo4j DateTime objects. Check the Python connection section for some examples.

To use the Cypher Shell instead, refer to the Neo4j Cypher Shell tutorial for more information on how to get started.

Run the following commands from the directory where the Cypher shell is installed.
export AURA_CONNECTION_URI="neo4j+s://xxxxxxxx.databases.neo4j.io"
export AURA_USERNAME="neo4j"
export AURA_PASSWORD=""

./cypher-shell -a "$AURA_CONNECTION_URI" -u "$AURA_USERNAME" -p "$AURA_PASSWORD"

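To use the Neo4j Python driver instead, install the neo4j package and instantiate the driver as follows. For more information on how to get started using Python, refer to the Connecting with Python tutorial.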

pip install neo4j
# Import the driver
from neo4j import GraphDatabase

# Replace with the actual URI, username, and password
AURA_CONNECTION_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = ""

# Instantiate the driver
driver = GraphDatabase.driver(
    AURA_CONNECTION_URI,
    auth=(AURA_USERNAME, AURA_PASSWORD)
)
# Import to prettify results
import json

# Import for the JSON helper function
from neo4j.time import DateTime

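# Helper function for serializing Neo4j DateTime objects in JSON dumps
def default(o):
    if isinstance(o, DateTime):
        return o.isoformat()
    # Fail loudly on other unsupported types instead of silently writing null
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")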
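json.dumps calls the default hook only for objects it cannot serialize natively, and a hook that returns None makes json.dumps write null, so raising TypeError for unhandled types is the safer pattern. The minimal sketch below demonstrates this without a database, using the standard-library datetime.datetime as a stand-in for neo4j.time.DateTime (both provide isoformat()); the names default_handler and TIME_TYPES are hypothetical. To apply it to driver results, neo4j.time.DateTime could be added to TIME_TYPES.

```python
import json
from datetime import datetime, timezone

# datetime.datetime stands in for neo4j.time.DateTime here; both provide isoformat().
TIME_TYPES = (datetime,)


def default_handler(o):
    """Hypothetical json.dumps hook: serialize date-time values, reject other types."""
    if isinstance(o, TIME_TYPES):
        return o.isoformat()
    # Raising TypeError makes json.dumps fail loudly on unsupported types;
    # returning None instead would silently write null into the output.
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


# Illustrative input only: a record containing a date-time value.
record = {
    "graphName": "shorthand-example-graph",
    "creationTime": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
}
print(json.dumps(record, indent=2, sort_keys=True, default=default_handler))
```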

Load data from Neo4j with native projections

Native projections load a graph stored on disk into memory. The gds.graph.project procedure projects a graph by selecting the node labels, relationship types, and properties to include.

The gds.graph.project procedure accepts either a "shorthand syntax", where the node and relationship projections are passed as single values or lists, or an "extended syntax", where each node or relationship projection has its own configuration. The extended syntax is especially useful when the data or the graph structure needs additional transformation. Both syntaxes are shown in this section, using the following graph as an example.

# Cypher query to create an example graph on disk
gds.run_cypher("""
    MERGE (a:EngineeringManagement {name: 'Alistair'})
    MERGE (j:EngineeringManagement {name: 'Jennifer'})
    MERGE (d:Developer {name: 'Leila'})
    MERGE (a)-[:MANAGES {start_date: 987654321}]->(d)
    MERGE (j)-[:MANAGES {start_date: 123456789, end_date: 987654321}]->(d)
""")
MERGE (a:EngineeringManagement {name: 'Alistair'})
MERGE (j:EngineeringManagement {name: 'Jennifer'})
MERGE (d:Developer {name: 'Leila'})
MERGE (a)-[:MANAGES {start_date: 987654321}]->(d)
MERGE (j)-[:MANAGES {start_date: 123456789, end_date: 987654321}]->(d);
# Cypher query to create an example graph on disk
write_example_graph_query = """
    MERGE (a:EngineeringManagement {name: 'Alistair'})
    MERGE (j:EngineeringManagement {name: 'Jennifer'})
    MERGE (d:Developer {name: 'Leila'})
    MERGE (a)-[:MANAGES {start_date: 987654321}]->(d)
    MERGE (j)-[:MANAGES {start_date: 123456789, end_date: 987654321}]->(d)
"""

# Create the driver session
with driver.session() as session:
    session.run(write_example_graph_query)

Project using the shorthand syntax

In this example we use the shorthand syntax to simply project all node labels and relationship types.

# Project a graph using the shorthand syntax
shorthand_graph, result = gds.graph.project(
    "shorthand-example-graph",
    ["EngineeringManagement", "Developer"],
    ["MANAGES"]
)

print(result)
CALL gds.graph.project(
  'shorthand-example-graph',
  ['EngineeringManagement', 'Developer'],
  ['MANAGES']
)
YIELD graphName, nodeCount, relationshipCount
RETURN *;
shorthand_graph_create_call = """
    CALL gds.graph.project(
      'shorthand-example-graph',
      ['EngineeringManagement', 'Developer'],
      ['MANAGES']
    )
    YIELD graphName, nodeCount, relationshipCount
    RETURN *
"""

# Create the driver session
with driver.session() as session:
    # Call to project a graph using the shorthand syntax
    result = session.run(shorthand_graph_create_call).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True))
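As a rough illustration of how the two syntaxes relate, the sketch below expands shorthand label and type lists into extended-syntax dictionaries. It assumes that, by default, each label or type keeps its own name and that relationships use the NATURAL orientation (the direction stored on disk); the helpers expand_node_projection and expand_relationship_projection are hypothetical and only build configuration dictionaries, they do not call GDS.

```python
def expand_node_projection(labels):
    """Hypothetical helper: expand shorthand node labels into extended-syntax form."""
    if isinstance(labels, str):
        labels = [labels]
    # Each label keeps its own name in the projected graph.
    return {label: {"label": label} for label in labels}


def expand_relationship_projection(types):
    """Hypothetical helper: expand shorthand relationship types into extended-syntax form."""
    if isinstance(types, str):
        types = [types]
    # NATURAL keeps the direction stored on disk (assumed default orientation).
    return {rel_type: {"type": rel_type, "orientation": "NATURAL"} for rel_type in types}


node_projection = expand_node_projection(["EngineeringManagement", "Developer"])
relationship_projection = expand_relationship_projection(["MANAGES"])
print(node_projection)
print(relationship_projection)
```

Under these assumptions, passing node_projection and relationship_projection to gds.graph.project in place of the two lists should project the same graph as the shorthand call.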

Project using the extended syntax

In this example we use the extended syntax for node and relationship projections to:

  • rename the EngineeringManagement and Developer labels to PersonEM and PersonD respectively;

  • turn the directed MANAGES relationship into an undirected KNOWS relationship;

  • keep the start_date and end_date relationship properties, using a default value of 999999999 for end_date when it is missing.

The projected graph becomes the following, where the variable names only identify the nodes (no node properties are projected):

(alistair:PersonEM)-
  [:KNOWS {start_date: 987654321, end_date: 999999999}]-
  (leila:PersonD)-
  [:KNOWS {start_date: 123456789, end_date: 987654321}]-
  (jennifer:PersonEM)
# Project a graph using the extended syntax
extended_form_graph, result = gds.graph.project(
    "extended-form-example-graph",
    {
        "PersonEM": {
            "label": "EngineeringManagement"
        },
        "PersonD": {
            "label": "Developer"
        }
    },
    {
        "KNOWS": {
            "type": "MANAGES",
            "orientation": "UNDIRECTED",
            "properties": {
                "start_date": {
                    "property": "start_date"
                },
                "end_date": {
                    "property": "end_date",
                    "defaultValue": 999999999
                }
            }
        }
    }
)

print(result)
CALL gds.graph.project(
  'extended-form-example-graph',
  {
    PersonEM: {
      label: 'EngineeringManagement'
    },
    PersonD: {
      label: 'Developer'
    }
  },
  {
    KNOWS: {
      type: 'MANAGES',
      orientation: 'UNDIRECTED',
      properties: {
        start_date: {
          property: 'start_date'
        },
        end_date: {
          property: 'end_date',
          defaultValue: 999999999
        }
      }
    }
  }
)
YIELD graphName, nodeCount, relationshipCount
RETURN *;
extended_form_graph_create_call = """
    CALL gds.graph.project(
      'extended-form-example-graph',
      {
        PersonEM: {
          label: 'EngineeringManagement'
        },
        PersonD: {
          label: 'Developer'
        }
      },
      {
        KNOWS: {
          type: 'MANAGES',
          orientation: 'UNDIRECTED',
          properties: {
            start_date: {
              property: 'start_date'
            },
            end_date: {
              property: 'end_date',
              defaultValue: 999999999
            }
          }
        }
      }
    )
    YIELD graphName, nodeCount, relationshipCount
    RETURN *
"""

# Create the driver session
with driver.session() as session:
    # Call to project a graph using the extended syntax
    result = session.run(extended_form_graph_create_call).data()

    # Prettify the results
    print(json.dumps(result, indent=2, sort_keys=True))
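The three transformations performed by the extended projection can be mimicked in plain Python on the example data, without GDS: labels are renamed, each MANAGES relationship becomes a KNOWS relationship that can be traversed from either endpoint, and a missing end_date falls back to 999999999. This is a simulation for illustration only; representing an undirected relationship as one entry per direction is an assumption of the sketch, and LABEL_MAPPING, DEFAULT_END_DATE, and project_extended are hypothetical names.

```python
# On-disk example data, as created by the MERGE statements above.
# Names are used only as node identifiers; they are not projected.
nodes = {
    "Alistair": "EngineeringManagement",
    "Jennifer": "EngineeringManagement",
    "Leila": "Developer",
}
relationships = [
    ("Alistair", "Leila", {"start_date": 987654321}),
    ("Jennifer", "Leila", {"start_date": 123456789, "end_date": 987654321}),
]

# Hypothetical names mirroring the extended projection configuration.
LABEL_MAPPING = {"EngineeringManagement": "PersonEM", "Developer": "PersonD"}
DEFAULT_END_DATE = 999999999


def project_extended(nodes, relationships):
    """Rename labels, make relationships undirected, and fill in end_date defaults."""
    projected_nodes = {name: LABEL_MAPPING[label] for name, label in nodes.items()}
    projected_rels = []
    for source, target, props in relationships:
        knows_props = {
            "start_date": props["start_date"],
            # The default applies only when the property is missing on disk.
            "end_date": props.get("end_date", DEFAULT_END_DATE),
        }
        # Undirected: store one entry per direction so both endpoints see it.
        projected_rels.append((source, "KNOWS", target, knows_props))
        projected_rels.append((target, "KNOWS", source, knows_props))
    return projected_nodes, projected_rels


projected_nodes, projected_rels = project_extended(nodes, relationships)
for rel in projected_rels:
    print(rel)
```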

Use the graph catalog

The graph catalog can be used to retrieve information about projected graphs and to manage them.

List all the graphs

The gds.graph.list procedure can be used to list all the graphs currently stored in memory.

# List all in-memory graphs
all_graphs = gds.graph.list()

print(all_graphs)
CALL gds.graph.list();
show_in_memory_graphs_call = """
    CALL gds.graph.list()
"""

# Create the driver session
with driver.session() as session:
    # Run the Cypher procedure
    results = session.run(show_in_memory_graphs_call).data()

    # Prettify the results
    print(json.dumps(results, indent=2, sort_keys=True, default=default))
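Because gds.graph.list returns many columns, including date-time values that need the default handler when dumped to JSON, one option with the Python driver is to YIELD only the columns you need. The sketch below assumes a neo4j driver session (or any object with a compatible run method); the helper name list_graph_summaries is hypothetical.

```python
# Return only scalar columns so the result can be dumped to JSON without a
# DateTime handler (gds.graph.list also yields date-time columns).
LIST_GRAPHS_QUERY = """
    CALL gds.graph.list()
    YIELD graphName, nodeCount, relationshipCount
    RETURN graphName, nodeCount, relationshipCount
    ORDER BY graphName
"""


def list_graph_summaries(session):
    """Hypothetical helper: return one dict per in-memory graph with name and sizes."""
    # Result.data() materializes all records as a list of dictionaries.
    return session.run(LIST_GRAPHS_QUERY).data()
```

With the driver from the setup, the helper could be called inside a `with driver.session() as session:` block and its return value passed to json.dumps directly.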

Check that a graph exists

The gds.graph.exists procedure can be called to check for the existence of a graph by its name.

# Check whether the "shorthand-example-graph" graph exists in memory
graph_exists = gds.graph.exists("shorthand-example-graph")

print(graph_exists)
CALL gds.graph.exists('shorthand-example-graph');
check_graph_exists_call = """
    CALL gds.graph.exists('shorthand-example-graph')
"""

# Create the driver session
with driver.session() as session:
    # Run the Cypher procedure and print the result
    print(session.run(check_graph_exists_call).data())
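Instead of hard-coding the graph name in the query string, the name can be sent as a query parameter; the neo4j driver passes keyword arguments of session.run as parameters. The sketch below assumes a driver session; graph_exists is a hypothetical helper, and the parameter name $graph_name is our own choice.

```python
CHECK_GRAPH_EXISTS_QUERY = """
    CALL gds.graph.exists($graph_name)
    YIELD exists
    RETURN exists
"""


def graph_exists(session, graph_name):
    """Hypothetical helper: True if the named graph is in the graph catalog."""
    # Keyword arguments to session.run are sent as query parameters ($graph_name).
    record = session.run(CHECK_GRAPH_EXISTS_QUERY, graph_name=graph_name).single()
    return bool(record["exists"])
```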

Drop a graph

When a graph is no longer needed, it can be dropped to free up memory using the gds.graph.drop procedure.

# Drop a graph object and keep the result of the call
result = gds.graph.drop(shorthand_graph)

# Print the result
print(result)

# Drop a graph object without keeping the result of the call
gds.graph.drop(extended_form_graph)
CALL gds.graph.drop('shorthand-example-graph');

CALL gds.graph.drop('extended-form-example-graph');
delete_shorthand_graph_call = """
    CALL gds.graph.drop('shorthand-example-graph')
"""

delete_extended_form_graph_call = """
    CALL gds.graph.drop('extended-form-example-graph')
"""

# Create the driver session
with driver.session() as session:
    # Drop a graph and keep the result of the call
    result = session.run(delete_shorthand_graph_call).data()

    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True, default=default))

    # Drop a graph discarding the result of the call
    session.run(delete_extended_form_graph_call).data()
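With the Python client, gds.graph.exists returns a pandas Series with an "exists" entry, which can guard a drop so that a notebook cell can safely be run more than once. The sketch below assumes the gds object from the setup; drop_if_exists is a hypothetical helper, and looking up the Graph object with gds.graph.get before dropping it is one possible approach rather than the only one.

```python
def drop_if_exists(gds, graph_name):
    """Hypothetical helper: drop the named in-memory graph if present; return True if dropped."""
    # gds.graph.exists returns a Series with a boolean "exists" entry.
    if not gds.graph.exists(graph_name)["exists"]:
        return False
    # Look up the Graph object by name, then drop it to free its memory.
    graph = gds.graph.get(graph_name)
    gds.graph.drop(graph)
    return True
```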

Cleanup

Dropping a projected graph does not delete the underlying data on disk. If that data is no longer needed, it must be deleted manually with a Cypher query.

# Delete on-disk data
gds.run_cypher("""
    MATCH (example)
    WHERE example:EngineeringManagement OR example:Developer
    DETACH DELETE example
""")
MATCH (example)
WHERE example:EngineeringManagement OR example:Developer
DETACH DELETE example;
delete_example_graph_query = """
    MATCH (example)
    WHERE example:EngineeringManagement OR example:Developer
    DETACH DELETE example
"""

# Create the driver session
with driver.session() as session:
    # Run Cypher call
    print(session.run(delete_example_graph_query).data())
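To confirm that the cleanup removed the example data, one option is to count any remaining nodes with either label. The sketch below assumes a neo4j driver session; count_example_nodes is a hypothetical helper.

```python
COUNT_EXAMPLE_NODES_QUERY = """
    MATCH (example)
    WHERE example:EngineeringManagement OR example:Developer
    RETURN count(example) AS remaining
"""


def count_example_nodes(session):
    """Hypothetical helper: count example nodes still stored on disk."""
    # An ungrouped count() always returns exactly one row, so single() is safe.
    return session.run(COUNT_EXAMPLE_NODES_QUERY).single()["remaining"]
```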

Closing the connection

The connection should always be closed when no longer needed.

Although the GDS client automatically closes the connection when the object is deleted, it is good practice to close it explicitly.

# Close the client connection
gds.close()
# Close the driver connection
driver.close()
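One way to make sure close is called even if an error occurs is contextlib.closing from the Python standard library, which calls the wrapped object's close method when the with block exits. The sketch below uses a hypothetical StubClient in place of GraphDataScience so that it runs without a database; the same pattern should apply to any object with a close method, such as the gds client or the driver.

```python
from contextlib import closing


class StubClient:
    """Hypothetical stand-in for GraphDataScience or a neo4j Driver."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client = StubClient()
try:
    with closing(client) as gds:
        # Work with the client here; close() runs even if this block raises.
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(client.closed)  # True
```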
