Loading Data from Web-APIs
Supported protocols are file, http, https, s3, gs, hdfs, with redirects allowed.
If no protocol is provided, this procedure will try to check whether the URL is actually a file.
As apoc.import.file.use_neo4j_config is enabled, the procedures check whether file system access is allowed and possibly constrained to a specific directory by reading the two configuration parameters dbms.security.allow_csv_import_from_file_urls and dbms.directories.import respectively. If you want to remove these constraints, set apoc.import.file.use_neo4j_config=false.
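For reference, a minimal sketch of the relevant neo4j.conf settings (the values shown are illustrative, not recommendations):

# allow procedures to read local file:// URLs
dbms.security.allow_csv_import_from_file_urls=true
# constrain file reads to this directory
dbms.directories.import=import
# or skip the Neo4j checks above entirely
apoc.import.file.use_neo4j_config=false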
The following procedures are provided:

- apoc.load.json: load from JSON URL (e.g. web-api) to import JSON as a stream of values if the JSON was an array, or a single value if it was a map
- apoc.load.xml: load from XML URL (e.g. web-api) to import XML as a single nested map with attributes and _type, _text and _children fields
- apoc.load.xmlSimple: load from XML URL (e.g. web-api) to import XML as a single nested map with attributes and _type, _text fields and collections per child element type
- apoc.load.csv: load CSV from URL as stream of values
- apoc.load.xls: load XLS from URL as stream of values
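For example, a minimal call sketch (the URL is a placeholder, not a real endpoint):

CALL apoc.load.json("https://example.com/map.json") YIELD value
RETURN value;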
Load Single File From Compressed File (zip/tar/tar.gz/tgz)
When loading data from compressed files, we need to put the ! character before the file name.
For example:
apoc.load.csv("pathToFile!csv/fileName.csv.tgz")
apoc.load.json("https://github.com/neo4j-contrib/neo4j-apoc-procedures/tree/3.4/src/test/resources/testload.tgz?raw=true!person.json");
Using S3 protocol
When using the S3 protocol we need to download and copy the following jars into the plugins directory:
- aws-java-sdk-core-1.11.250.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-core/1.11.250)
- aws-java-sdk-s3-1.11.250.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3/1.11.250)
- httpclient-4.5.4.jar (https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient/4.5.4)
- httpcore-4.4.8.jar (https://mvnrepository.com/artifact/org.apache.httpcomponents/httpcore/4.4.8)
- joda-time-2.9.9.jar (https://mvnrepository.com/artifact/joda-time/joda-time/2.9.9)
Once those files have been copied, we'll need to restart the database.
The S3 URL must be in one of the following formats:

- s3://accessKey:secretKey[:sessionToken]@endpoint:port/bucket/key (sessionToken is optional), or
- s3://endpoint:port/bucket/key?accessKey=accessKey&secretKey=secretKey[&sessionToken=sessionToken] (sessionToken is optional), or
- s3://endpoint:port/bucket/key if the accessKey, secretKey, and the optional sessionToken are provided in the environment variables
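For instance, using the query-parameter form (endpoint, bucket, key, and credentials are all placeholders):

CALL apoc.load.csv("s3://s3.amazonaws.com:443/my-bucket/people.csv?accessKey=myAccessKey&secretKey=mySecretKey")
YIELD lineNo, map
RETURN lineNo, map;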
Using Google Cloud Storage
In order to use Google Cloud Storage, you need the following jars in the plugins directory:
- api-common-1.8.1.jar
- failureaccess-1.0.1.jar
- gax-1.48.1.jar
- gax-httpjson-0.65.1.jar
- google-api-client-1.30.2.jar
- google-api-services-storage-v1-rev20190624-1.30.1.jar
- google-auth-library-credentials-0.17.1.jar
- google-auth-library-oauth2-http-0.17.1.jar
- google-cloud-core-1.90.0.jar
- google-cloud-core-http-1.90.0.jar
- google-cloud-storage-1.90.0.jar
- google-http-client-1.31.0.jar
- google-http-client-appengine-1.31.0.jar
- google-http-client-jackson2-1.31.0.jar
- google-oauth-client-1.30.1.jar
- grpc-context-1.19.0.jar
- guava-28.0-android.jar
- opencensus-api-0.21.0.jar
- opencensus-contrib-http-util-0.21.0.jar
- proto-google-common-protos-1.16.0.jar
- proto-google-iam-v1-0.12.0.jar
- protobuf-java-3.9.1.jar
- protobuf-java-util-3.9.1.jar
- threetenbp-1.3.3.jar
But we have prepared an uber-package to simplify the process: you can download it from here and place it into the plugins directory.
You can use Google Cloud Storage via the following URL format:
gs://<bucket_name>/<file_path>
Moreover, you can also define the authorization type via the authenticationType query parameter:

- NONE: for public buckets (this is the default behaviour, so you don't need to specify it)
- SERVICE: for Service authentication, setting the environment variable GOOGLE_APPLICATION_CREDENTIALS as described here: https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-java
Example:
gs://andrea-bucket-1/test-privato.csv?authenticationType=SERVICE
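As a complete statement, a minimal sketch (bucket and file names are hypothetical):

CALL apoc.load.json("gs://my-bucket/person.json?authenticationType=SERVICE")
YIELD value
RETURN value;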