Set up and use a Composite database
Composite databases allow queries that access multiple graphs at once. This is a function that enables:
-
Data Federation: the ability to access data available in distributed sources in the form of disjoint graphs.
-
Data Sharding: the ability to access data available in distributed sources in the form of a common graph partitioned on multiple databases.
In this tutorial, you will learn how to:
Model your data for Composite database use
The example data in this tutorial is based on the Northwind dataset, created by Microsoft. It contains the sales data of a fictitious small company called “Northwind Traders”. The data includes customers, products, customer orders, warehouse stock, shipping, suppliers, employees, and sales territories.
For more information on how Northwind (a relational dataset) is modeled into a graph, run |
The Northwind graph model consists of the following data:
-
Node labels
-
:Product
-
:Category
-
:Supplier
-
:Order
-
:Customer
-
-
Relationship types
-
:SUPPLIES
-
:PART_OF
-
:ORDERS
-
:PURCHASED
-
In this scenario, assume that data privacy constraints require customers’ data to be stored in their original region.
For simplicity, there are two regions: the Americas (AME) and Europe (EU).
The first step is to remodel the Northwind dataset, so that customer data can be separated from the Product catalog, which has no privacy constraints.
You create two graphs: one for the Product catalog, which includes :Product
, :Category
, :Supplier
, :PART_OF
, :SUPPLIES
, and one partitioned graph in two databases for the Customer orders in EU and AME, with :Product
, :Order
, :Customer
, :PURCHASED
, and :ORDERS
.
Data Federation
This way, the Product and Customer data are in two disjoint graphs, with different labels and relationship types. This is called Data Federation.
To query across them, you have to federate the graphs, because relationships cannot span across them.
This is done by using a proxy node modeling pattern: nodes with the :Product
label must be present in both federated domains.
In the Product catalog graph, nodes with the :Product
label contain all the data related to a product, while in the Customer graphs, the same label is associated to a proxy node which only contains productID
.
The productID
property allows you to link data across the graphs in this federation.
Data Sharding
Since the Customer data is for two regions (EU and AME), you have to partition it into two databases. The resulting two graphs have the same model (same labels, same relationship types), but different data. This is called Data Sharding.
In general, there are a couple of main use cases that require sharding. The most common is scalability, i.e. different shards can be deployed on different servers, splitting the load on different resources. Another reason could be data regulations: different shards can be deployed on servers, residing in different locations, and managed independently.
Create databases for the composite
For this tutorial, you will create the following databases:
-
db0
for the Product catalog. -
db1
for the EU customer data. -
db2
for the AME customers.
-
Start the Neo4j DBMS.
bin/neo4j start
-
Check all available databases.
ls -al /data/databases/
total 0 drwxr-xr-x@ 5 username staff 160 9 Jun 12:53 . drwxr-xr-x@ 5 username staff 160 9 Jun 12:53 .. drwxr-xr-x 37 username staff 1184 9 Jun 12:53 neo4j -rw-r--r-- 1 username staff 0 9 Jun 12:53 store_lock drwxr-xr-x 38 username staff 1216 9 Jun 12:53 system
-
Connect to the Neo4j DBMS using
cypher-shell
with the default credentials and change the password when prompted:bin/cypher-shell -u neo4j -p neo4j
Password change required new password: ******** Connected to Neo4j 5 at neo4j://localhost:7687 as user neo4j. Type :help for a list of available commands or :exit to exit the shell. Note that Cypher queries must end with a semicolon.
For more information about the Cypher Shell command-line interface (CLI) and how to use it, see Cypher Shell.
-
Run the command
SHOW DATABASES
to list all available databases:SHOW DATABASES;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | name | type | aliases | access | address | role | writer | requestedStatus | currentStatus | statusMessage | default | home | constituents | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "neo4j" | "standard" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | TRUE | TRUE | [] | | "system" | "system" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | FALSE | FALSE | [] | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 2 rows available after 102 ms, consumed after another 11 ms
-
Run the command
CREATE DATABASE <database-name>
to create the databases:CREATE DATABASE db0;
0 rows available after 137 ms, consumed after another 0 ms
CREATE DATABASE db1;
0 rows available after 141 ms, consumed after another 0 ms
CREATE DATABASE db2;
0 rows available after 135 ms, consumed after another 0 ms
-
Run the command
SHOW DATABASES
again to verify that the new databases have been created and areonline
:SHOW DATABASES;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | name | type | aliases | access | address | role | writer | requestedStatus | currentStatus | statusMessage | default | home | constituents | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "db0" | "standard" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | FALSE | FALSE | [] | | "db1" | "standard" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | FALSE | FALSE | [] | | "db2" | "standard" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | FALSE | FALSE | [] | | "neo4j" | "standard" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | TRUE | TRUE | [] | | "system" | "system" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | FALSE | FALSE | [] | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 5 rows available after 8 ms, consumed after another 7 ms
Import data to your databases
You can use the command LOAD CSV WITH HEADERS FROM
to import data to the databases.
Load the Product catalog in db0
-
Run the following Cypher query to change the active database to
db0
, and add the Product data::use db0; LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row CREATE (n:Product) SET n = row, n.unitPrice = toFloat(row.unitPrice), n.unitsInStock = toInteger(row.unitsInStock), n.unitsOnOrder = toInteger(row.unitsOnOrder), n.reorderLevel = toInteger(row.reorderLevel), n.discontinued = (row.discontinued <> "0"); LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/categories.csv" AS row CREATE (n:Category) SET n = row; LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/suppliers.csv" AS row CREATE (n:Supplier) SET n = row; CREATE INDEX FOR (p:Product) ON (p.productID); CREATE INDEX FOR (c:Category) ON (c.categoryID); CREATE INDEX FOR (s:Supplier) ON (s.supplierID); MATCH (p:Product),(c:Category) WHERE p.categoryID = c.categoryID CREATE (p)-[:PART_OF]->(c); MATCH (p:Product),(s:Supplier) WHERE p.supplierID = s.supplierID CREATE (s)-[:SUPPLIES]->(p);
-
Press Enter.
-
Verify that the product data is loaded in
db0
:MATCH (s:Supplier)-[:SUPPLIES]->(p:Product)-[:PART_OF]->(c:Category) RETURN s.companyName AS Supplier, p.productName AS Product, c.categoryName AS Category LIMIT 5;
+--------------------------------------------------------------------------+ | Supplier | Product | Category | +--------------------------------------------------------------------------+ | "Bigfoot Breweries" | "Sasquatch Ale" | "Beverages" | | "Pavlova" | "Outback Lager" | "Beverages" | | "Bigfoot Breweries" | "Laughing Lumberjack Lager" | "Beverages" | | "Bigfoot Breweries" | "Steeleye Stout" | "Beverages" | | "Aux joyeux ecclésiastiques" | "Côte de Blaye" | "Beverages" | +--------------------------------------------------------------------------+ 5 rows available after 202 ms, consumed after another 5 ms
Load EU customers and related orders in db1
-
Run the following Cypher query to change the active database to
db1
, and add the EU customers and orders::use db1; :param europe => ['Germany', 'UK', 'Sweden', 'France', 'Spain', 'Switzerland', 'Austria', 'Italy', 'Portugal', 'Ireland', 'Belgium', 'Norway', 'Denmark', 'Finland']; LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/customers.csv" AS row WITH row WHERE row.country IN $europe CREATE (n:Customer) SET n = row; CREATE INDEX FOR (c:Customer) ON (c.customerID); LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/orders.csv" AS row WITH row MATCH (c:Customer) WHERE row.customerID = c.customerID CREATE (o:Order) SET o = row; CREATE INDEX FOR (o:Order) ON (o.orderID); MATCH (c:Customer),(o:Order) WHERE c.customerID = o.customerID CREATE (c)-[:PURCHASED]->(o); LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row CREATE (n:Product) SET n.productID = row.productID; CREATE INDEX FOR (p:Product) ON (p.productID); LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/order-details.csv" AS row MATCH (p:Product), (o:Order) WHERE p.productID = row.productID AND o.orderID = row.orderID CREATE (o)-[details:ORDERS]->(p) SET details = row, details.quantity = toInteger(row.quantity);
-
Press Enter.
-
Verify that the EU Customer orders data is loaded in
db1
:MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product) RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product LIMIT 5;
+-------------------------------------------------------------+ | Customer | CustomerCountry | Order | Product | +-------------------------------------------------------------+ | "Alfreds Futterkiste" | "Germany" | "10692" | "63" | | "Alfreds Futterkiste" | "Germany" | "10835" | "77" | | "Alfreds Futterkiste" | "Germany" | "10835" | "59" | | "Alfreds Futterkiste" | "Germany" | "10702" | "76" | | "Alfreds Futterkiste" | "Germany" | "10702" | "3" | +-------------------------------------------------------------+ 5 rows available after 47 ms, consumed after another 2 ms
Load AME customers and related orders in db2
-
Run the following Cypher query to change the active database to
db2
and add the AME customers and orders::use db2; :param americas => ['Mexico', 'Canada', 'Argentina', 'Brazil', 'USA', 'Venezuela']; LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/customers.csv" AS row WITH row WHERE row.country IN $americas CREATE (n:Customer) SET n = row; CREATE INDEX FOR (c:Customer) ON (c.customerID); LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/orders.csv" AS row WITH row MATCH (c:Customer) WHERE row.customerID = c.customerID CREATE (o:Order) SET o = row; CREATE INDEX FOR (o:Order) ON (o.orderID); MATCH (c:Customer),(o:Order) WHERE c.customerID = o.customerID CREATE (c)-[:PURCHASED]->(o); LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/products.csv" AS row CREATE (n:Product) SET n.productID = row.productID; CREATE INDEX FOR (p:Product) ON (p.productID); LOAD CSV WITH HEADERS FROM "https://data.neo4j.com/northwind/order-details.csv" AS row MATCH (p:Product), (o:Order) WHERE p.productID = row.productID AND o.orderID = row.orderID CREATE (o)-[details:ORDERS]->(p) SET details = row, details.quantity = toInteger(row.quantity);
-
Press Enter.
-
Verify that the AME Customer orders data is loaded in
db2
:MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product) RETURN c.companyName AS Customer, c.country AS CustomerCountry, o.orderID AS Order, p.productID AS Product LIMIT 5;
+----------------------------------------------------------------------------+ | Customer | CustomerCountry | Order | Product | +----------------------------------------------------------------------------+ | "Ana Trujillo Emparedados y helados" | "Mexico" | "10759" | "32" | | "Ana Trujillo Emparedados y helados" | "Mexico" | "10926" | "72" | | "Ana Trujillo Emparedados y helados" | "Mexico" | "10926" | "13" | | "Ana Trujillo Emparedados y helados" | "Mexico" | "10926" | "19" | | "Ana Trujillo Emparedados y helados" | "Mexico" | "10926" | "11" | +----------------------------------------------------------------------------+ 5 rows available after 42 ms, consumed after another 1 ms
Configure a Composite database
Set up a Composite database with the CREATE COMPOSITE DATABASE
Cypher command and add local database aliases as constituents to the Composite database.
In this example, the Composite database is called compositenw
.
-
Run the command
CREATE COMPOSITE DATABASE <composite-database-name>
to create the Composite database:CREATE COMPOSITE DATABASE compositenw;
0 rows available after 137 ms, consumed after another 0 ms
-
Run the command
CREATE ALIAS <composite-database-name>.<alias-name> FOR DATABASE <database-name>
to create the constituent database aliases:CREATE ALIAS compositenw.product FOR DATABASE db0;
0 rows available after 101 ms, consumed after another 0 ms
CREATE ALIAS compositenw.customerEU FOR DATABASE db1;
0 rows available after 107 ms, consumed after another 0 ms
CREATE ALIAS compositenw.customerAME FOR DATABASE db2;
0 rows available after 98 ms, consumed after another 0 ms
The constituent database aliases in this tutorial are local database aliases (targeting databases in the same Neo4j DBMS), but they can just as well be remote database aliases (targeting databases in another Neo4j DBMS).
-
Run the command
SHOW DATABASES
to verify that the Composite database has been configured and isonline
:SHOW DATABASES;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | name | type | aliases | access | address | role | writer | requestedStatus | currentStatus | statusMessage | default | home | constituents | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "db0" | "standard" | ["compositenw.product"] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | FALSE | FALSE | [] | | "db1" | "standard" | ["compositenw.customerEU"] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | FALSE | FALSE | [] | | "db2" | "standard" | ["compositenw.customerAME"] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | FALSE | FALSE | [] | | "compositenw" | "composite" | [] | "read-only" | "localhost:7687" | "primary" | FALSE | "online" | "online" | "" | FALSE | FALSE | ["compositenw.customerAME", "compositenw.customerEU", "compositenw.product"] | | "neo4j" | "standard" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | TRUE | TRUE | [] | | "system" | "system" | [] | "read-write" | "localhost:7687" | "primary" | TRUE | "online" | "online" | "" | FALSE | FALSE | [] | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 6 rows available after 242 ms, consumed after another 18 ms
-
Run the command
SHOW ALIASES FOR DATABASES
to verify that the database aliases have been configured:SHOW ALIASES FOR DATABASES;
+--------------------------------------------------------------------------------+ | name | composite | database | location | url | user | +--------------------------------------------------------------------------------+ | "compositenw.product" | "compositenw" | "db0" | "local" | null | null | | "compositenw.customerEU" | "compositenw" | "db1" | "local" | null | null | | "compositenw.customerAME" | "compositenw" | "db2" | "local" | null | null | +--------------------------------------------------------------------------------+ 3 rows available after 203 ms, consumed after another 16 ms
Retrieve data with a single Cypher query
Query a single database
When connected to a Composite database you can retrieve data from a single database by using the Cypher clause USE
and the name of an alias:
:use compositenw
USE compositenw.product
MATCH (p:Product)
RETURN p.productName AS product
LIMIT 5;
+--------------------------------+ | product | +--------------------------------+ | "Chai" | | "Chang" | | "Aniseed Syrup" | | "Chef Anton's Cajun Seasoning" | | "Chef Anton's Gumbo Mix" | +--------------------------------+ 5 rows available after 6 ms, consumed after another 21 ms
Query across multiple shards
Use the Composite database to query both shards and get customers whose name starts with A:
:use compositenw
USE compositenw.customerAME
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
UNION
USE compositenw.customerEU
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
LIMIT 5;
+---------------------+ | name | country | +---------------------+ | "ANATR" | "Mexico" | | "ANTON" | "Mexico" | | "ALFKI" | "Germany" | | "AROUT" | "UK" | +---------------------+ 4 rows available after 25 ms, consumed after another 56 ms
Or, using a more common Composite database idiom:
:use compositenw
UNWIND ['compositenw.customerAME', 'compositenw.customerEU'] AS g
CALL {
USE graph.byName(g)
MATCH (c:Customer)
WHERE c.customerID STARTS WITH 'A'
RETURN c.customerID AS name, c.country AS country
}
RETURN name, country
LIMIT 5;
+---------------------+ | name | country | +---------------------+ | "ANATR" | "Mexico" | | "ANTON" | "Mexico" | | "ALFKI" | "Germany" | | "AROUT" | "UK" | +---------------------+ 4 rows available after 61 ms, consumed after another 8 ms
Query across federation and shards
Here is a more complex query that uses all 3 databases to find all customers who have bought discontinued products in the Meat/Poultry category:
:use compositenw
CALL {
USE compositenw.product
MATCH (p:Product)-[:PART_OF]->(c:Category)
WHERE p.discontinued = true
AND c.categoryName = 'Meat/Poultry'
RETURN COLLECT(p.productID) AS pids
}
WITH *
UNWIND [g IN graph.names() WHERE g STARTS WITH 'compositenw.customer'] AS g
CALL {
USE graph.byName(g)
WITH pids
UNWIND pids as pid
MATCH (p:Product{productID:pid})<-[:ORDERS]-(:Order)<-[:PURCHASED]-(c:Customer)
RETURN DISTINCT c.customerID AS customer, c.country AS country
}
RETURN customer, country
LIMIT 20;
+--------------------------+ | customer | country | +--------------------------+ | "RICSU" | "Switzerland" | | "PERIC" | "Mexico" | | "WARTH" | "Finland" | | "WELLI" | "Brazil" | | "DRACD" | "Germany" | | "RATTC" | "USA" | | "HUNGO" | "Ireland" | | "QUEDE" | "Brazil" | | "SEVES" | "UK" | | "ANTON" | "Mexico" | | "BERGS" | "Sweden" | | "SAVEA" | "USA" | | "AROUT" | "UK" | | "FAMIA" | "Brazil" | | "WANDK" | "Germany" | | "WHITC" | "USA" | | "ISLAT" | "UK" | | "LONEP" | "USA" | | "QUICK" | "Germany" | | "HILAA" | "Venezuela" | +--------------------------+ 20 rows available after 51 ms, consumed after another 2 ms
The way this query works is by compositenw
calling database db0
to retrieve all discontinued products in the Meat/Poultry category.
Then, using the returned product IDs, it queries both db1
and db2
in parallel and gets the customers who have purchased these products and their country.
You have just learned how to store and retrieve data from multiple databases using a single Cypher query.
For more details on Composite databases, see Concepts.