Export to GraphML
The export GraphML procedures export data into a format that’s used by other tools like Gephi and CytoScape to read graph data.
All Point
or Temporal
data types are exported formatted as a String
e.g:
|
{"crs":"wgs-84-3d","latitude":56.7,"longitude":12.78,"height":100.0} |
|
{"crs":"wgs-84-3d","latitude":56.7,"longitude":12.78,"height":null} |
|
2018-10-10 |
|
2018-10-10T00:00 |
Note that, to perform a correct Point serialization, it is not recommended to export a point with coordinates x,y and crs: 'wgs-84',
for example point({x: 56.7, y: 12.78, crs: 'wgs-84'})
. Otherwise, the point will be exported with longitude and latitude (and heigth) instead of x and y (and z)
Available Procedures
The table below describes the available procedures:
The labels exported are ordered alphabetically.
The output of labels()
function is not sorted, use it in combination with apoc.coll.sort()
.
Configuration parameters
The procedures support the following config parameters:
param | default | description |
---|---|---|
format |
gephi |
In export to Graphml script define the export format. Possible value is: "gephi" |
caption |
It’s an array of string (i.e. ['name','title']) that define an ordered set of properties eligible as value for the |
|
useTypes |
false |
Write the attribute type information to the graphml output |
batchSize |
20000 |
define the batch size |
delim |
"," |
define the delimiter character (export csv) |
arrayDelim |
";" |
define the delimiter character for arrays (used in the bulk import) |
useTypes |
false |
add type on file header (export csv and graphml export) |
storeNodeIds |
false |
set nodes' ids (import/export graphml) |
readLabels |
false |
read nodes' labels (import/export graphml) |
defaultRelationshipType |
"RELATED" |
set relationship type (import/export graphml) |
separateFiles |
false |
export results in separated file by type (nodes, relationships..) |
stream |
false |
stream the xml directly to the client into the |
Exporting to a file
By default exporting to the file system is disabled.
We can enable it by setting the following property in apoc.conf
:
apoc.export.file.enabled=true
If we try to use any of the export procedures without having first set this property, we’ll get the following error message:
Failed to invoke procedure: Caused by: java.lang.RuntimeException: Export to files not enabled, please set apoc.export.file.enabled=true in your neo4j.conf |
Export files are written to the import
directory, which is defined by the dbms.directories.import
property.
This means that any file path that we provide is relative to this directory.
If we try to write to an absolute path, such as /tmp/filename
, we’ll get an error message similar to the following one:
Failed to invoke procedure: Caused by: java.io.FileNotFoundException: /path/to/neo4j/import/tmp/fileName (No such file or directory) |
We can enable writing to anywhere on the file system by setting the following property in apoc.conf
:
apoc.import.file.use_neo4j_config=false
Neo4j will now be able to write anywhere on the file system, so be sure that this is your intention before setting this property. |
Exporting to S3
By default exporting to S3 is disabled.
We can enable it by setting the following property in apoc.conf
:
apoc.export.file.enabled=true
If we try to use any of the export procedures without having first set this property, we’ll get the following error message:
Failed to invoke procedure: Caused by: java.lang.RuntimeException: Export to files not enabled, please set apoc.export.file.enabled=true in your neo4j.conf |
Using S3 protocol
When using the S3 protocol we need to download and copy the following jars into the plugins directory:
-
aws-java-sdk-core-1.11.250.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-core/1.11.250)
-
aws-java-sdk-s3-1.11.250.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3/1.11.250)
-
httpclient-4.4.8.jar (https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient/4.5.4)
-
httpcore-4.5.4.jar (https://mvnrepository.com/artifact/org.apache.httpcomponents/httpcore/4.4.8)
-
joda-time-2.9.9.jar (https://mvnrepository.com/artifact/joda-time/joda-time/2.9.9)
Once those files have been copied we’ll need to restart the database.
Exporting to S3 can be done by simply replacing the file output with an S3 endpoint. The S3 URL must be in the following format:
-
s3://accessKey:secretKey[:sessionToken]@endpoint:port/bucket/key
(where the sessionToken is optional) or -
s3://endpoint:port/bucket/key?accessKey=accessKey&secretKey=secretKey[&sessionToken=sessionToken]
(where the sessionToken is optional) or -
s3://endpoint:port/bucket/key
if the accessKey, secretKey, and the optional sessionToken are provided in the environment variables
Memory Requirements
To support large uploads, the S3 uploading utility may use up to 2.25 GB of memory at a time. The actual usage will depend on the size of the upload, but will use a maximum of 2.25 GB.
Exporting a stream
If we don’t want to export to a file, we can stream results back in the data
column instead by passing a file name of null
and providing the stream:true
config.
Examples
This section includes examples showing how to use the export to Cypher procedures. These examples are based on a movies dataset, which can be imported by running the following Cypher query:
CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person {name:'Hugo Weaving', born:1960})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})
CREATE (LanaW:Person {name:'Lana Wachowski', born:1965})
CREATE (JoelS:Person {name:'Joel Silver', born:1952})
CREATE
(Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix),
(Carrie)-[:ACTED_IN {roles:['Trinity']}]->(TheMatrix),
(Laurence)-[:ACTED_IN {roles:['Morpheus']}]->(TheMatrix),
(Hugo)-[:ACTED_IN {roles:['Agent Smith']}]->(TheMatrix),
(LillyW)-[:DIRECTED]->(TheMatrix),
(LanaW)-[:DIRECTED]->(TheMatrix),
(JoelS)-[:PRODUCED]->(TheMatrix);
The Neo4j Browser visualization below shows the imported graph:
Export whole database to GraphML
The apoc.export.graphml.all
procedure exports the whole database to a GraphML file or as a stream.
movies.graphml
CALL apoc.export.graphml.all("movies.graphml", {})
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data |
---|---|---|---|---|---|---|---|---|---|---|---|
"movies.graphml" |
"database: nodes(8), rels(7)" |
"graphml" |
8 |
7 |
21 |
4 |
15 |
-1 |
0 |
TRUE |
NULL |
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="born" for="node" attr.name="born"/>
<key id="name" for="node" attr.name="name"/>
<key id="tagline" for="node" attr.name="tagline"/>
<key id="label" for="node" attr.name="label"/>
<key id="title" for="node" attr.name="title"/>
<key id="released" for="node" attr.name="released"/>
<key id="roles" for="edge" attr.name="roles"/>
<key id="label" for="edge" attr.name="label"/>
<graph id="G" edgedefault="directed">
<node id="n188" labels=":Movie"><data key="labels">:Movie</data><data key="title">The Matrix</data><data key="tagline">Welcome to the Real World</data><data key="released">1999</data></node>
<node id="n189" labels=":Person"><data key="labels">:Person</data><data key="born">1964</data><data key="name">Keanu Reeves</data></node>
<node id="n190" labels=":Person"><data key="labels">:Person</data><data key="born">1967</data><data key="name">Carrie-Anne Moss</data></node>
<node id="n191" labels=":Person"><data key="labels">:Person</data><data key="born">1961</data><data key="name">Laurence Fishburne</data></node>
<node id="n192" labels=":Person"><data key="labels">:Person</data><data key="born">1960</data><data key="name">Hugo Weaving</data></node>
<node id="n193" labels=":Person"><data key="labels">:Person</data><data key="born">1967</data><data key="name">Lilly Wachowski</data></node>
<node id="n194" labels=":Person"><data key="labels">:Person</data><data key="born">1965</data><data key="name">Lana Wachowski</data></node>
<node id="n195" labels=":Person"><data key="labels">:Person</data><data key="born">1952</data><data key="name">Joel Silver</data></node>
<edge id="e267" source="n189" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Neo"]</data></edge>
<edge id="e268" source="n190" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Trinity"]</data></edge>
<edge id="e269" source="n191" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Morpheus"]</data></edge>
<edge id="e270" source="n192" target="n188" label="ACTED_IN"><data key="label">ACTED_IN</data><data key="roles">["Agent Smith"]</data></edge>
<edge id="e271" source="n193" target="n188" label="DIRECTED"><data key="label">DIRECTED</data></edge>
<edge id="e272" source="n194" target="n188" label="DIRECTED"><data key="label">DIRECTED</data></edge>
<edge id="e273" source="n195" target="n188" label="PRODUCED"><data key="label">PRODUCED</data></edge>
</graph>
</graphml>
data
columnCALL apoc.export.graphml.all(null, {stream:true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data;
file | nodes | relationships | properties | data |
---|---|---|---|---|
|
|
|
|
|
Export specified nodes and relationships to GraphML
The apoc.export.graphml.data
procedure exports the specified nodes and relationships to a CSV file or as a stream.
:Person
label with a name
property that starts with L
to the file movies-l.csv
MATCH (person:Person)
WHERE person.name STARTS WITH "L"
WITH collect(person) AS people
CALL apoc.export.graphml.data(people, [], "movies-l.graphml", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data |
---|---|---|---|---|---|---|---|---|---|---|---|
"movies-l.csv" |
"data: nodes(3), rels(0)" |
"csv" |
3 |
0 |
6 |
2 |
3 |
20000 |
1 |
TRUE |
NULL |
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="born" for="node" attr.name="born"/>
<key id="name" for="node" attr.name="name"/>
<key id="label" for="node" attr.name="label"/>
<graph id="G" edgedefault="directed">
<node id="n191" labels=":Person"><data key="labels">:Person</data><data key="born">1961</data><data key="name">Laurence Fishburne</data></node>
<node id="n193" labels=":Person"><data key="labels">:Person</data><data key="born">1967</data><data key="name">Lilly Wachowski</data></node>
<node id="n194" labels=":Person"><data key="labels">:Person</data><data key="born">1965</data><data key="name">Lana Wachowski</data></node>
</graph>
</graphml>
ACTED_IN
relationships and the nodes with Person
and Movie
labels on either side of that relationship in the data
columnMATCH (person:Person)-[actedIn:ACTED_IN]->(movie:Movie)
WITH collect(DISTINCT person) AS people, collect(DISTINCT movie) AS movies, collect(actedIn) AS actedInRels
CALL apoc.export.graphml.data(people + movies, actedInRels, null, {stream: true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data;
file | nodes | relationships | properties | data |
---|---|---|---|---|
|
|
|
|
|
Export virtual graph to GraphML
The apoc.export.graphml.graph
procedure exports a virtual graph to a CSV file or as a stream.
The examples in this section are based on a virtual graph that contains all PRODUCED
relationships and the nodes either side of that relationship.
The query below creates a virtual graph and stores it in memory with the name producers.cached
using Static Value Storage.
MATCH path = (:Person)-[produced:PRODUCED]->(:Movie)
WITH collect(path) AS paths
CALL apoc.graph.fromPaths(paths, "producers", {})
YIELD graph AS g
CALL apoc.static.set("producers.cached", g)
YIELD value
RETURN value, g
movies-producers.csv
CALL apoc.export.graphml.graph(apoc.static.get("producers.cached"), "movies-producers.graphml", {});
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data |
---|---|---|---|---|---|---|---|---|---|---|---|
"movies-producers.graphml" |
"graph: nodes(2), rels(1)" |
"csv" |
2 |
1 |
5 |
2 |
3 |
20000 |
1 |
TRUE |
NULL |
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="born" for="node" attr.name="born"/>
<key id="name" for="node" attr.name="name"/>
<key id="tagline" for="node" attr.name="tagline"/>
<key id="label" for="node" attr.name="label"/>
<key id="title" for="node" attr.name="title"/>
<key id="released" for="node" attr.name="released"/>
<key id="label" for="edge" attr.name="label"/>
<graph id="G" edgedefault="directed">
<node id="n195" labels=":Person"><data key="labels">:Person</data><data key="born">1952</data><data key="name">Joel Silver</data></node>
<node id="n188" labels=":Movie"><data key="labels">:Movie</data><data key="title">The Matrix</data><data key="tagline">Welcome to the Real World</data><data key="released">1999</data></node>
<edge id="e273" source="n195" target="n188" label="PRODUCED"><data key="label">PRODUCED</data></edge>
</graph>
</graphml>
data
columnCALL apoc.export.graphml.graph(apoc.static.get("producers.cached"), null, {stream: true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data;
file | nodes | relationships | properties | data |
---|---|---|---|---|
|
|
|
|
|
Export results of Cypher query to GraphML
The apoc.export.graphml.query
procedure exports the results of a Cypher query to a CSV file or as a stream.
DIRECTED
relationships and the nodes with Person
and Movie
labels on either side of that relationship to the file movies-directed.graphml
WITH "MATCH path = (person:Person)-[directed:DIRECTED]->(movie)
RETURN person, directed, movie" AS query
CALL apoc.export.graphml.query(query, "movies-directed.graphml", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data;
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data |
---|---|---|---|---|---|---|---|---|---|---|---|
"movies-directed.graphml" |
"statement: nodes(3), rels(2)" |
"graphml" |
3 |
2 |
7 |
2 |
5 |
-1 |
0 |
TRUE |
NULL |
The contents of movies-directed.csv
are shown below:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="born" for="node" attr.name="born"/>
<key id="name" for="node" attr.name="name"/>
<key id="tagline" for="node" attr.name="tagline"/>
<key id="label" for="node" attr.name="label"/>
<key id="title" for="node" attr.name="title"/>
<key id="released" for="node" attr.name="released"/>
<key id="label" for="edge" attr.name="label"/>
<graph id="G" edgedefault="directed">
<node id="n188" labels=":Movie"><data key="labels">:Movie</data><data key="title">The Matrix</data><data key="tagline">Welcome to the Real World</data><data key="released">1999</data></node>
<node id="n193" labels=":Person"><data key="labels">:Person</data><data key="born">1967</data><data key="name">Lilly Wachowski</data></node>
<node id="n194" labels=":Person"><data key="labels">:Person</data><data key="born">1965</data><data key="name">Lana Wachowski</data></node>
<edge id="e271" source="n193" target="n188" label="DIRECTED"><data key="label">DIRECTED</data></edge>
<edge id="e272" source="n194" target="n188" label="DIRECTED"><data key="label">DIRECTED</data></edge>
</graph>
</graphml>
DIRECTED
relationships and the nodes with Person
and Movie
labels on either side of that relationshipWITH "MATCH path = (person:Person)-[directed:DIRECTED]->(movie)
RETURN person, directed, movie" AS query
CALL apoc.export.graphml.query(query, null, {stream: true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data;
file | nodes | relationships | properties | data |
---|---|---|---|---|
|
|
|
|
|