Visualization

The Neo4j Graph Analytics app has built-in capabilities that allow you to easily visualize your Snowflake tables as graphs inside Snowflake notebooks. The visualization is interactive, and supports features such as zooming, panning, moving nodes, and hovering over nodes and relationships to see their properties. This functionality is available via the experimental.visualize procedure, which is powered by the neo4j-viz Python library.

Figure 1. A visualization of the Cora dataset

Syntax

This section covers the syntax used to generate an interactive graph visualization from Snowflake tables.

The experimental.visualize procedure takes two mandatory parameters: the Project configuration and the Visualization configuration.

Generate a graph visualization
CALL Neo4j_Graph_Analytics.experimental.visualize(
  {...},  (1)
  {...}   (2)
);
1 Project configuration.
2 Visualization configuration.

The procedure returns a string containing HTML/JavaScript for the desired graph visualization. This string can then be rendered in various ways, for example using streamlit inside a Snowflake notebook (see example below).

Project configuration

The Project configuration is used to specify which data should be included in the visualization. It is the same configuration as the project configuration used for algorithm jobs.

Table 1. Project configuration
Name | Type
nodeTables | List of node tables.
relationshipTables | Map of relationship types to relationship tables.

For more details on the Project configuration, refer to the Project documentation.

Special columns

You can modify the visualization by including columns with specific names in the node and relationship tables.

All such special columns can be found here for nodes and here for relationships. Though listed in snake_case here, SCREAMING_SNAKE_CASE and camelCase are also supported. Some of the most commonly used special columns are:

  • Node sizes: The sizes of nodes can be controlled by including a column named "SIZE" in node tables. The values in these columns should be of a numeric type. This can be useful for visualizing the relative importance or size of nodes in the graph, for example using a computed centrality score.

  • Captions: The caption text of nodes and relationships can be controlled by including a column named "CAPTION" in the tables. The values in these columns should be of a string type. This can be useful for displaying additional information about the nodes, such as their names or labels. If no "CAPTION" column is provided, the default captions in the visualization will be the names of the corresponding node and relationship tables.

If the columns you want to use have different names, we recommend using views to rename them to the desired names.
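As a minimal sketch of the view-based renaming, the Python helper below only builds a CREATE OR REPLACE VIEW statement as a string; it does not run anything. The helper name rename_view_sql and the table, view, and column names (EXAMPLE_DB.DATA_SCHEMA.PATIENT, PAGERANK_SCORE, FULL_NAME) are hypothetical and chosen purely for illustration.

```python
def rename_view_sql(view, table, renames):
    """Build a CREATE VIEW statement that adds renamed copies of columns.

    `renames` maps existing column names to the special column names
    (for example "SIZE" or "CAPTION") that the visualization looks for.
    Identifiers are inserted as-is, so only pass trusted names.
    """
    if not renames:
        raise ValueError("renames must contain at least one column")
    aliases = ", ".join(f"{old} AS {new}" for old, new in renames.items())
    # SELECT * keeps every original column; the aliases add the special ones.
    return f"CREATE OR REPLACE VIEW {view} AS SELECT *, {aliases} FROM {table};"


# Hypothetical names: a centrality score becomes SIZE, a name becomes CAPTION.
print(rename_view_sql(
    "EXAMPLE_DB.DATA_SCHEMA.PATIENT_VIZ",
    "EXAMPLE_DB.DATA_SCHEMA.PATIENT",
    {"PAGERANK_SCORE": "SIZE", "FULL_NAME": "CAPTION"},
))
```

The resulting view, rather than the base table, would then be listed under nodeTables in the Project configuration.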

Visualization configuration

The Visualization configuration is used to specify how the provided graph should be rendered, and is made up of two main parts.

Table 2. Visualization configuration
Name | Type | Default | Optional | Description
nodeColoring | Map | {} | yes | Configuration for node coloring.
renderOptions | Map | {} | yes | Configuration for rendering options.

Node coloring configuration

The node coloring configuration allows you to specify how the nodes in the graph should be colored based on the values in a specific column.

Table 3. Node coloring configuration
Name | Type | Default | Optional | Description
byColumn | String | n/a | no | The column whose values make up the basis for coloring the nodes.
colorSpace | String | "discrete" | yes | The color space to use for coloring the nodes. Either "discrete" or "continuous".

If the "discrete" color space is used, each unique value in the specified byColumn column is assigned a different color, as long as enough distinct colors are available. This is useful for visualizing categorical data, such as community detection output, where each category is represented by a distinct color.

If the "continuous" color space is used, a gradient of colors will be applied based on the values in the specified byColumn column. This is useful for visualizing numerical data, such as centrality measures, where the color intensity represents the magnitude of the value.

By default, nodes are colored using the "discrete" color space based on their captions, which in turn default to the table names.
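The two color spaces can be sketched as Python dictionaries that mirror the nodeColoring map. Only the keys byColumn and colorSpace and the values "discrete" and "continuous" come from the table above; the helper node_coloring and the column names COMMUNITY and PAGERANK are hypothetical.

```python
ALLOWED_COLOR_SPACES = ("discrete", "continuous")


def node_coloring(by_column, color_space="discrete"):
    """Return a nodeColoring map, rejecting unknown color spaces early."""
    if color_space not in ALLOWED_COLOR_SPACES:
        raise ValueError(
            f"colorSpace must be one of {ALLOWED_COLOR_SPACES}, got {color_space!r}"
        )
    return {"byColumn": by_column, "colorSpace": color_space}


# Categorical data, such as community ids: one color per unique value.
by_community = {"nodeColoring": node_coloring("COMMUNITY")}

# Numerical data, such as a centrality score: a color gradient.
by_centrality = {"nodeColoring": node_coloring("PAGERANK", "continuous")}
```

Validating the color space on the client side is a design choice of this sketch; it simply surfaces a typo before the procedure is called.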

Render options configuration

The render options configuration allows you to specify how the graph should be rendered, including the height and width of the rendered graph, the maximum number of nodes allowed in the rendered graph, and the renderer to use.

Table 4. Render options configuration
Name | Type | Default | Optional | Description
height | String | "600px" | yes | The height of the rendered graph.
width | String | "100%" | yes | The width of the rendered graph.
max_allowed_nodes | Integer | 10000 | yes | The maximum number of nodes allowed in the rendered graph. The rendering will fail if the number of nodes exceeds this limit, to prevent performance issues.
renderer | String | "canvas" | yes | The renderer to use for rendering the graph. Either "webgl" or "canvas".

The WebGL renderer is optimized for performance and handles large graphs better, but it does not render text, icons, or arrowheads on relationships. The canvas renderer can render all of these, but it is less performant, which makes it less suited to large graphs.

Example

In this example we will visualize a small graph representing a patient journey network inside a Snowflake notebook.

The dataset contains two node tables:

  • PATIENT — representing patients in the network.

  • ENCOUNTER — representing medical encounters of patients.

And two relationship tables:

  • FIRST — from patients to encounter, representing the first encounter of a patient.

  • NEXT — from encounter to encounter, representing a chain of subsequent encounters in a patient’s journey.

Calling the visualize procedure

We provide our tables in the Project configuration, and leave the Visualization configuration empty to use the default settings. We call the experimental.visualize procedure within a SQL notebook cell.
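The statement for the SQL cell can be sketched by assembling it in Python from two dictionaries. This is an illustration under stated assumptions, not the documented call: the schema EXAMPLE_DB.DATA_SCHEMA is hypothetical, and the shape of each relationshipTables entry (the sourceTable and targetTable keys) is an assumption that should be checked against the Project documentation. The helper to_snowflake_literal is also hypothetical; it renders Python values as Snowflake OBJECT and ARRAY constants.

```python
def to_snowflake_literal(value):
    """Render a Python value as a Snowflake SQL constant."""
    if isinstance(value, dict):
        items = ", ".join(
            f"{to_snowflake_literal(str(k))}: {to_snowflake_literal(v)}"
            for k, v in value.items()
        )
        return "{" + items + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_snowflake_literal(v) for v in value) + "]"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Double backslashes and single quotes inside a quoted string.
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical schema; replace with the location of your own tables.
SCHEMA = "EXAMPLE_DB.DATA_SCHEMA"

project = {
    "nodeTables": [f"{SCHEMA}.PATIENT", f"{SCHEMA}.ENCOUNTER"],
    # ASSUMPTION: each relationship table names its source and target tables.
    "relationshipTables": {
        f"{SCHEMA}.FIRST": {
            "sourceTable": f"{SCHEMA}.PATIENT",
            "targetTable": f"{SCHEMA}.ENCOUNTER",
        },
        f"{SCHEMA}.NEXT": {
            "sourceTable": f"{SCHEMA}.ENCOUNTER",
            "targetTable": f"{SCHEMA}.ENCOUNTER",
        },
    },
}
visualization = {}  # empty map: use the default settings

call_sql = (
    "CALL Neo4j_Graph_Analytics.experimental.visualize(\n"
    f"  {to_snowflake_literal(project)},\n"
    f"  {to_snowflake_literal(visualization)}\n"
    ");"
)
print(call_sql)
```

The printed statement can be pasted into a SQL notebook cell, named cell1 in this example.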

Calling `experimental.visualize`

We can see that the procedure returns a string containing HTML/JavaScript for the desired graph visualization. The string is returned in a table of one row with a single column named "VISUALIZE".

Rendering the visualization

We can access the output of the previous cell by referencing its cell name, in this case cell1. In our next Python notebook cell, we extract the HTML/JavaScript string we want by interpreting the cell1 output as a Pandas DataFrame, then accessing the first row of the "VISUALIZE" column.

The HTML/JavaScript string we have derived can now be rendered in various ways. In this example we will use the streamlit library. We set the height to be 600 pixels, which is the same as the default height in the renderOptions configuration for experimental.visualize.
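The two steps just described, extracting the string and rendering it, can be sketched as follows. The helper names extract_html and render are hypothetical. The sketch assumes the SQL cell is named cell1 and that its result converts to a Pandas DataFrame via to_pandas(), as in a Snowflake notebook.

```python
def extract_html(result, column="VISUALIZE"):
    """Return the HTML/JavaScript string from a one-row result.

    `result` is expected to behave like a pandas DataFrame.
    """
    if len(result) != 1:
        raise ValueError(f"expected exactly one row, got {len(result)}")
    return result.iloc[0][column]


def render(html, height=600):
    """Render the HTML string with Streamlit's HTML component."""
    # Imported lazily so extract_html can be used without Streamlit installed.
    import streamlit.components.v1 as components

    components.html(html, height=height)


# In a Snowflake notebook Python cell that follows the SQL cell `cell1`:
#   html = extract_html(cell1.to_pandas())
#   render(html)
```

The height of 600 matches the default "600px" in renderOptions, so the component frame and the rendered graph have the same height.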

Rendering the HTML

The graph renders nicely, and we see that our two node types, PATIENT and ENCOUNTER, are colored and captioned differently. The FIRST and NEXT relationships are also rendered with different captions.

We can zoom in and out, pan around, move nodes, and hover over nodes and relationships to see their properties. The buttons on the top right also allow us to zoom, in addition to taking PNG snapshots of the graph.

Performance considerations

The performance of the experimental.visualize procedure depends on the size and complexity of the graph being visualized, as well as the machine it runs on.

The experimental.visualize procedure runs inside a Snowflake warehouse, and as such this warehouse should be sized appropriately for the graph being visualized.

The max_allowed_nodes parameter in the renderOptions part of the Visualization configuration limits the number of nodes in the rendered graph, so that by default the rendering does not take too long or consume too much memory. If you need to visualize large graphs, you can increase this limit, but be aware that this may lead to performance issues. To limit such issues, make sure that you are using an appropriately sized warehouse, and consider using the "webgl" renderer in renderOptions.
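One way to apply this advice is to derive the render options from the known node count. The sketch below is illustrative only: the helper render_options_for is hypothetical, and switching to "webgl" as soon as the default limit is exceeded is a simplification, since that renderer does not draw text, icons, or arrowheads.

```python
DEFAULT_MAX_ALLOWED_NODES = 10000  # default from the render options table


def render_options_for(node_count):
    """Suggest a renderOptions map for a graph with `node_count` nodes."""
    if node_count <= DEFAULT_MAX_ALLOWED_NODES:
        return {}  # the defaults are sufficient
    # Raise the limit just enough and prefer the faster renderer.
    return {"max_allowed_nodes": node_count, "renderer": "webgl"}


small = render_options_for(500)
large = render_options_for(25000)
```

The returned map would be passed as the renderOptions entry of the Visualization configuration.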