Reading from Neo4j
The connector provides three data source options to read data from a Neo4j database.
| Option | Description | Value | Default |
|---|---|---|---|
| `labels` | Use this if you only need to read nodes with their properties. | Colon-separated list of node labels to read. | (empty) |
| `relationship` | Use this if you need to read relationships along with their source and target nodes. | Relationship type to read. | (empty) |
| `query` | Use this if you need more flexibility and know how to write a Cypher® query. | Cypher query with a `RETURN` clause. | (empty) |
Examples
All the examples on this page assume that the SparkSession is initialized with the appropriate connection options. You can run the write examples for each option to have some example data to read.
labels option
Read the :Person nodes.
Scala:

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("labels", ":Person")
  .load()

df.show()

Python:

df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("labels", ":Person")
    .load()
)

df.show()
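Because the `labels` option takes a colon-separated list, you can also restrict the read to nodes that carry several labels at once. A sketch, assuming some nodes in the example data have both the `:Person` and `:Customer` labels (the extra label is hypothetical here):

```python
# Read only nodes that have BOTH the :Person and :Customer labels.
# Assumes a SparkSession `spark` already configured with Neo4j
# connection and authentication options.
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("labels", ":Person:Customer")  # colon-separated label list
    .load()
)

df.show()
```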
See Read nodes for more information and examples.
relationship option
Read the :BOUGHT relationship with its source and target nodes and its properties.
Scala:

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()

Python:

df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
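If you would rather get each source and target node as a single map column instead of one flattened column per node property, the connector also accepts a `relationship.nodes.map` option. A sketch under that assumption (check the connector reference for your version to confirm the option name and default):

```python
# Return the source and target nodes as map columns rather than
# flattened `source.*` / `target.*` columns. Assumes a SparkSession
# `spark` already configured with Neo4j connection options.
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")  # nodes as maps
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
```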
See Read relationships for more information and examples.
query option
Use a Cypher query to read data.
Scala:

val readQuery = """
  MATCH (n:Person)
  RETURN id(n) AS id, n.fullName AS name
"""

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("query", readQuery)
  .load()

df.show()

Python:

read_query = """
    MATCH (n:Person)
    RETURN id(n) AS id, n.fullName AS name
"""

df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("query", read_query)
    .load()
)

df.show()
See Read with a Cypher query for more information and examples.
Type mapping
See Data type mapping for the full type mapping between Spark DataFrames and Neo4j.
Performance considerations
If the schema is not specified, the Spark Connector infers it by sampling the data. Since sampling is a potentially expensive operation, consider defining a schema explicitly.
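Because the connector implements the standard Spark read API, one way to avoid sampling is to supply an explicit schema through `DataFrameReader.schema`. A minimal sketch for the `query` example above, with field types assumed from that query's `RETURN` clause:

```python
from pyspark.sql.types import StructType, StructField, LongType, StringType

# Hypothetical explicit schema for the earlier `query` example:
# `id(n)` maps to a long, `n.fullName` to a string. With a
# user-supplied schema, the connector can skip sampling the data.
schema = StructType([
    StructField("id", LongType(), nullable=False),
    StructField("name", StringType(), nullable=True),
])

# Assumes a SparkSession `spark` already configured with Neo4j
# connection and authentication options.
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .schema(schema)  # explicit schema: no inference by sampling
    .option("query", "MATCH (n:Person) RETURN id(n) AS id, n.fullName AS name")
    .load()
)

df.show()
```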