Sharding data with the copy command
The copy command can be used to filter out data when creating database copies.
In the following example, a sample database is separated into 3 shards.
Example: Use the copy command to filter out data
The sample database contains the following data:
(p1 :Person :S2 {id:123, name: "Ava"})
(p2 :Person :S2 {id:124, name: "Bob"})
(p3 :Person :S3 {id:125, name: "Cat", age: 54})
(p4 :Person :S3 {id:126, name: "Dan"})
(t1 :Team :S1 :SAll {id:1, name: "Foo", mascot: "Pink Panther"})
(t2 :Team :S1 :SAll {id:2, name: "Bar", mascot: "Cookie Monster"})
(d1 :Division :SAll {name: "Marketing"})
(p1)-[:MEMBER]->(t1)
(p2)-[:MEMBER]->(t2)
(p3)-[:MEMBER]->(t1)
(p4)-[:MEMBER]->(t2)
The data has been prepared using queries to add the labels :S1, :S2, :S3, and :SAll, which denote the target shard.
Shard 1 contains the team data.
Shard 2 and Shard 3 contain person data.
- Create Shard 1 with:
bin/neo4j-admin database copy neo4j shard1 \
  --copy-only-nodes-with-labels=S1,SAll \ (1)
  --skip-labels=S1,S2,S3,SAll (2)
(1) The --copy-only-nodes-with-labels option is used to filter out everything that does not have the label :S1 or :SAll.
(2) The --skip-labels option is used to exclude the temporary labels you created for the sharding process.
The resulting shard contains the following:
(t1 :Team {id:1, name: "Foo", mascot: "Pink Panther"})
(t2 :Team {id:2, name: "Bar", mascot: "Cookie Monster"})
(d1 :Division {name: "Marketing"})
- Create Shard 2:
bin/neo4j-admin database copy neo4j shard2 \
  --copy-only-nodes-with-labels=S2,SAll \
  --skip-labels=S1,S2,S3,SAll \
  --copy-only-node-properties=Team.id
In Shard 2, you want to keep the :Team nodes as proxy nodes, to be able to link together information from the separate shards. The nodes will be included since they have the label :SAll, but you specify --copy-only-node-properties so as to not duplicate the team information from Shard 1.
The resulting shard contains the following:
(p1 :Person {id:123, name: "Ava"})
(p2 :Person {id:124, name: "Bob"})
(t1 :Team {id:1})
(t2 :Team {id:2})
(d1 :Division {name: "Marketing"})
(p1)-[:MEMBER]->(t1)
(p2)-[:MEMBER]->(t2)
Observe that --copy-only-node-properties did not filter out Person.name since the :Person label was not mentioned in the filter.
- Create Shard 3, but with the filter --skip-node-properties instead of --copy-only-node-properties.
bin/neo4j-admin database copy neo4j shard3 \
  --copy-only-nodes-with-labels=S3,SAll \
  --skip-labels=S1,S2,S3,SAll \
  --skip-node-properties=Team.name,Team.mascot
The result is:
(p3 :Person {id:125, name: "Cat", age: 54})
(p4 :Person {id:126, name: "Dan"})
(t1 :Team {id:1})
(t2 :Team {id:2})
(d1 :Division {name: "Marketing"})
(p3)-[:MEMBER]->(t1)
(p4)-[:MEMBER]->(t2)
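The node and relationship selection in the three shards can be traced with a small Python model. This is only a sketch of the behaviour shown in the results above, not how neo4j-admin is implemented: the function name copy_shard, the node keys, and the dictionary layout are made up for illustration, and the rule that a relationship is copied only when both of its end nodes are copied is an assumption inferred from the listed results.

```python
# Toy model of the label-based filters; NOT the neo4j-admin implementation.
# Node keys (p1, t1, ...) and the dictionary layout are illustrative only.
SAMPLE_NODES = {
    "p1": {"Person", "S2"},
    "p2": {"Person", "S2"},
    "p3": {"Person", "S3"},
    "p4": {"Person", "S3"},
    "t1": {"Team", "S1", "SAll"},
    "t2": {"Team", "S1", "SAll"},
    "d1": {"Division", "SAll"},
}
SAMPLE_RELS = [("p1", "t1"), ("p2", "t2"), ("p3", "t1"), ("p4", "t2")]  # all :MEMBER
SHARD_LABELS = {"S1", "S2", "S3", "SAll"}


def copy_shard(copy_only_labels, skip_labels=SHARD_LABELS):
    """Return (nodes, relationships) for one modelled shard."""
    # --copy-only-nodes-with-labels: keep a node if it has ANY listed label.
    # --skip-labels: strip the listed labels from the nodes that are kept.
    nodes = {
        key: labels - skip_labels
        for key, labels in SAMPLE_NODES.items()
        if labels & copy_only_labels
    }
    # Assumption drawn from the results above: a relationship is kept
    # only when both of its end nodes were copied.
    rels = [(a, b) for a, b in SAMPLE_RELS if a in nodes and b in nodes]
    return nodes, rels


shards = {
    "shard1": copy_shard({"S1", "SAll"}),
    "shard2": copy_shard({"S2", "SAll"}),
    "shard3": copy_shard({"S3", "SAll"}),
}
for name, (nodes, rels) in shards.items():
    print(name, sorted(nodes), rels)
```

In this model, Shard 1 ends up with no relationships because none of the :Person nodes are copied into it, which matches the Shard 1 listing.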
As demonstrated, you can achieve the same result with either --skip-node-properties or --copy-only-node-properties. In this example, it is easier to use --copy-only-node-properties because only one property should be kept. The relationship property filters work in the same way.