Writing to Neo4j
The connector provides three data source options to write data to a Neo4j database.
| Option | Description | Value | Default |
|---|---|---|---|
| `labels` | Use this if you only need to create or update nodes with their properties, or as a first step before adding relationships. | Colon-separated list of node labels to create or update. | (empty) |
| `relationship` | Use this if you need to create or update relationships along with their source and target nodes. | Relationship type to create or update. | (empty) |
| `query` | Use this if you need more flexibility and know how to write a Cypher® query. | Cypher query; each DataFrame row is available in the query as the `event` variable. | (empty) |
Examples
All the examples on this page assume that the `SparkSession` has been initialized with the appropriate connection options.

You can run the read examples for each option to check the data after writing.
labels option
Write the :Person nodes.
case class Person(name: String, surname: String, age: Int)
val peopleDF = List(
Person("John", "Doe", 42),
Person("Jane", "Doe", 40)
).toDF()
peopleDF.write
.format("org.neo4j.spark.DataSource")
.mode(SaveMode.Append)
.option("labels", ":Person")
.save()
# Create example DataFrame
peopleDF = spark.createDataFrame(
[
{"name": "John", "surname": "Doe", "age": 42},
{"name": "Jane", "surname": "Doe", "age": 40},
]
)
(
peopleDF.write.format("org.neo4j.spark.DataSource")
.mode("Append")
.option("labels", ":Person")
.save()
)
See Write nodes for more information and examples.
relationship option
Write the :BOUGHT relationship with its source and target nodes and its properties.
val relDF = Seq(
("John", "Doe", 1, "Product 1", 200, "ABC100"),
("Jane", "Doe", 2, "Product 2", 100, "ABC200")
).toDF("name", "surname", "customerID", "product", "quantity", "order")
relDF.write
// Create new relationships
.mode("Append")
.format("org.neo4j.spark.DataSource")
// Assign a type to the relationships
.option("relationship", "BOUGHT")
// Use `keys` strategy
.option("relationship.save.strategy", "keys")
// Create source nodes and assign them a label
.option("relationship.source.save.mode", "Append")
.option("relationship.source.labels", ":Customer")
// Map the DataFrame columns to node properties
.option("relationship.source.node.properties", "name,surname,customerID:id")
// Create target nodes and assign them a label
.option("relationship.target.save.mode", "Append")
.option("relationship.target.labels", ":Product")
// Map the DataFrame columns to node properties
.option("relationship.target.node.properties", "product:name")
// Map the DataFrame columns to relationship properties
.option("relationship.properties", "quantity,order")
.save()
# Create example DataFrame
relDF = spark.createDataFrame(
[
{
"name": "John",
"surname": "Doe",
"customerID": 1,
"product": "Product 1",
"quantity": 200,
"order": "ABC100",
},
{
"name": "Jane",
"surname": "Doe",
"customerID": 2,
"product": "Product 2",
"quantity": 100,
"order": "ABC200",
},
]
)
(
relDF.write
# Create new relationships
.mode("Append")
.format("org.neo4j.spark.DataSource")
# Assign a type to the relationships
.option("relationship", "BOUGHT")
# Use `keys` strategy
.option("relationship.save.strategy", "keys")
# Create source nodes and assign them a label
.option("relationship.source.save.mode", "Append")
.option("relationship.source.labels", ":Customer")
# Map the DataFrame columns to node properties
.option("relationship.source.node.properties", "name,surname,customerID:id")
# Create target nodes and assign them a label
.option("relationship.target.save.mode", "Append")
.option("relationship.target.labels", ":Product")
# Map the DataFrame columns to node properties
.option("relationship.target.node.properties", "product:name")
# Map the DataFrame columns to relationship properties
.option("relationship.properties", "quantity,order")
.save()
)
See Write relationships for more information and examples.
query option
Use a Cypher query to write data.
case class Person(name: String, surname: String, age: Int)
// Create an example DataFrame
val queryDF = List(
Person("John", "Doe", 42),
Person("Jane", "Doe", 40)
).toDF()
// Define the Cypher query to use in the write
val writeQuery =
"CREATE (n:Person {fullName: event.name + ' ' + event.surname})"
queryDF.write
.format("org.neo4j.spark.DataSource")
.option("query", writeQuery)
.mode(SaveMode.Overwrite)
.save()
# Create example DataFrame
queryDF = spark.createDataFrame(
[
{"name": "John", "surname": "Doe", "age": 42},
{"name": "Jane", "surname": "Doe", "age": 40},
]
)
# Define the Cypher query to use in the write
write_query = "CREATE (n:Person {fullName: event.name + ' ' + event.surname})"
(
queryDF.write.format("org.neo4j.spark.DataSource")
.option("query", write_query)
.mode("Overwrite")
.save()
)
See Write with a Cypher query for more information and examples.
Save mode
Regardless of the write option, the connector supports two save modes for the data source `mode()` method:

- The `Append` mode creates new nodes or relationships by building a `CREATE` Cypher query.
- The `Overwrite` mode creates or updates nodes or relationships by building a `MERGE` Cypher query.
  - It requires the `node.keys` option when used with the `labels` option.
  - It requires the `relationship.source.node.keys` and `relationship.target.node.keys` options when used with the `relationship` option.
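For instance, an `Overwrite` write with the `labels` option must declare which properties identify a node. A minimal PySpark sketch, reusing the `peopleDF` DataFrame from the `labels` example above:

```python
(
    peopleDF.write.format("org.neo4j.spark.DataSource")
    # Overwrite builds a MERGE query instead of CREATE
    .mode("Overwrite")
    .option("labels", ":Person")
    # Properties used to match existing nodes; required by Overwrite with `labels`
    .option("node.keys", "name,surname")
    .save()
)
```

Running the same write twice updates the matched nodes instead of creating duplicates.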
Type mapping
See Data type mapping for the full type mapping between Spark DataFrames and Neo4j.
Performance considerations
Since writing is typically an expensive operation, make sure you write only the DataFrame columns you need.
For example, if the columns from the data source are name, surname, age, and livesIn, but you only need name and surname, you can do the following:
df.select(df("name"), df("surname"))
.write
.format("org.neo4j.spark.DataSource")
.mode(SaveMode.Append)
.option("labels", ":Person")
.save()
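The same column pruning in PySpark, following the pattern of the other examples on this page:

```python
(
    # Keep only the columns that need to be written
    df.select("name", "surname")
    .write.format("org.neo4j.spark.DataSource")
    .mode("Append")
    .option("labels", ":Person")
    .save()
)
```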