Load Data from Web-APIs
Supported protocols are file, http, https, s3, gs, and hdfs, with redirects allowed.
If no protocol is provided, this procedure will try to check whether the URL is actually a file.
If apoc.import.file.use_neo4j_config is enabled, the procedures will check whether file system access is allowed, and whether it is constrained to a specific directory, by reading the two configuration parameters dbms.security.allow_csv_import_from_file_urls and server.directories.import respectively.
To remove these constraints, set apoc.import.file.use_neo4j_config=false.
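As a minimal sketch of loading from the local file system, the call below spells out the file protocol explicitly. The file name person.json is hypothetical and is assumed to sit in the configured import directory; the sketch also assumes that file access has been allowed for APOC on this instance.

```cypher
// Hypothetical file name. With apoc.import.file.use_neo4j_config enabled,
// access is assumed to be constrained to server.directories.import.
CALL apoc.load.json('file:///person.json')
YIELD value
RETURN value
```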
Procedure | Description |
---|---|
apoc.load.json | Load JSON from URL |
apoc.load.xml | Load XML from URL |
Adding failOnError:false (by default true) to the config map when using any of the procedures in the above table prevents them from failing when an error occurs.
The procedure will instead return zero rows. For example:
CALL apoc.load.json('http://example.com/test.json', null, {failOnError:false})
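Because a failed load returns zero rows instead of raising an error, one way to notice the failure is to count the rows that come back. The following is a sketch only; the alias rowsLoaded is an illustrative name, not part of the APOC API.

```cypher
// With failOnError:false, an unreachable or invalid URL yields no rows,
// so the aggregate below returns 0 rather than the query failing.
CALL apoc.load.json('http://example.com/test.json', null, {failOnError: false})
YIELD value
RETURN count(value) AS rowsLoaded
```

Note that a count of 0 does not distinguish a failed load from a source that genuinely produced no rows.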
Load from Compressed File (zip/tar/tar.gz/tgz)
When loading a file that has been compressed, the compression algorithm has to be provided in the configuration options.
For example, if xmlCompressed is a file with a .gzip extension, the configuration option {compression: 'GZIP'} must be supplied to the procedure call in order to load the root of the document (/) into a Cypher map in memory:
CALL apoc.load.xml(xmlCompressed, '/', {compression: 'GZIP'})
For other valid compression configuration values, refer to the documentation about apoc.load.xml.
By default, the size of a decompressed file is limited to 200 times its compressed size.
That ratio can be changed by adjusting the configuration option apoc.max.decompression.ratio in apoc.conf (it cannot be 0, as that would make decompression impossible).
If a negative number is given, there is no limit on the decompressed size.
This exposes the database to potential zip bomb attacks.
Trying to load a compressed file whose decompressed size exceeds the configured ratio relative to its compressed size will generate a message like the following:
The file dimension exceeded maximum size in bytes, 250000, which is 250 times the width of the original file. The InputStream has been blocked because the file could be a compression bomb attack.
Load Single File From Compressed File (zip/tar/tar.gz/tgz)
When loading a single file from a compressed archive, put the ! character before the file name or path within the archive.
For example:
CALL apoc.load.json("https://github.com/neo4j/apoc/blob/5.26/core/src/test/resources/testload.tgz?raw=true!person.json");
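The same ! separator can be sketched for an archive on the local file system with a nested entry. Both the archive name archive.zip and the inner path data/person.json are hypothetical, and the archive is assumed to be readable from the import directory.

```cypher
// Everything before "!" locates the archive; everything after it is the
// path of the entry inside the archive (both names are hypothetical).
CALL apoc.load.json('file:///archive.zip!data/person.json')
YIELD value
RETURN value
```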
Using S3, GCS or HDFS protocols
To use any of these protocols, the corresponding dependency jar needs to be downloaded and copied into the plugins directory, <NEO4J_HOME>/plugins:
AWS dependency jar | APOC version |
---|---|
| 5.16 |
| 5.15 |
| 5.14 |
| 5.13 |
| 5.12 |
| 5.11 |
| 5.10 |
GCS dependency jar | APOC version |
---|---|
| 5.16 |
| 5.15 |
| 5.14 |
| 5.13 |
| 5.12 |
| 5.11 |
| 5.10 |
HDFS dependency jar | APOC version |
---|---|
| 5.16 |
| 5.15 |
| 5.14 |
| 5.13 |
| 5.12 |
| 5.11 |
| 5.10 |
These dependency jars are maintained by the APOC Extended library. This library is not supported by Neo4j.
After copying the jars into the plugins directory, the database will need to be restarted.
Using S3 protocol
The S3 URL must be in one of the following formats:
- s3://accessKey:secretKey[:sessionToken]@endpoint:port/bucket/key (where the sessionToken is optional), or
- s3://endpoint:port/bucket/key?accessKey=accessKey&secretKey=secretKey[&sessionToken=sessionToken] (where the sessionToken is optional), or
- s3://endpoint:port/bucket/key if the accessKey, secretKey, and the optional sessionToken are provided in the environment variables
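As an illustration of the environment-variable form, a call might look like the sketch below. The endpoint, port, bucket, and key are placeholders chosen for this example; it assumes the AWS dependency jar is installed and that the credentials are already available in the environment.

```cypher
// Hypothetical endpoint, bucket, and key; credentials are assumed to be
// supplied through environment variables rather than in the URL.
CALL apoc.load.json('s3://s3.eu-west-1.amazonaws.com:443/my-bucket/people.json')
YIELD value
RETURN value
```

Keeping credentials out of the URL avoids having them appear in query text.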
Using Google Cloud Storage
Google Cloud Storage URLs have the following shape:
gs://<bucket_name>/<file_path>
The authorization type can be specified by an additional authenticationType query parameter:
- NONE: for public buckets (this is the default behavior if the parameter is not specified)
- GCP_ENVIRONMENT: for passive authentication as a service account when Neo4j is running in the Google Cloud
- PRIVATE_KEY: for using private keys generated for service accounts (requires setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to a private key JSON file, as described by the official Google documentation)
Example:
gs://bucket/test-file.csv?authenticationType=GCP_ENVIRONMENT
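A URL of this shape is passed to a load procedure like any other. The sketch below assumes a hypothetical JSON file named test-file.json in the bucket, that the GCS dependency jar is installed, and that Neo4j is running in Google Cloud with a service account that can read the bucket.

```cypher
// Hypothetical bucket and file name; authenticationType selects
// passive service-account authentication as described above.
CALL apoc.load.json('gs://bucket/test-file.json?authenticationType=GCP_ENVIRONMENT')
YIELD value
RETURN value
```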