Loading Data from Web-APIs
Supported protocols are file, http, https, s3, gs, hdfs, with redirects allowed.
If no protocol is provided, this procedure will try to check whether the URL is actually a file.
As apoc.import.file.use_neo4j_config is enabled, the procedures check whether file system access is allowed and possibly constrained to a specific directory by reading the two configuration parameters dbms.security.allow_csv_import_from_file_urls and dbms.directories.import respectively. If you want to remove these constraints, set apoc.import.file.use_neo4j_config=false.
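For reference, a minimal sketch of the relevant neo4j.conf settings (the values shown are illustrative, not recommendations):

# allow procedures to read local file:// URLs
dbms.security.allow_csv_import_from_file_urls=true
# constrain file reads to this directory
dbms.directories.import=import
# or skip the Neo4j checks above entirely
apoc.import.file.use_neo4j_config=false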
The following procedures are provided:

- apoc.load.json: load from JSON URL (e.g. web-api) to import JSON as a stream of values if the JSON was an array, or a single value if it was a map
- apoc.load.xml: load from XML URL (e.g. web-api) to import XML as a single nested map with attributes and _type, _text and _children fields
- apoc.load.xmlSimple: load from XML URL (e.g. web-api) to import XML as a single nested map with attributes and _type, _text fields and collections per child element type
- apoc.load.csv: load CSV from URL as stream of values
- apoc.load.xls: load XLS from URL as stream of values
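For example, a minimal call sketch (the URL is a placeholder, not a real endpoint):

CALL apoc.load.json("https://example.com/map.json") YIELD value
RETURN value;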
Load Single File From Compressed File (zip/tar/tar.gz/tgz)
When loading data from compressed files, we need to put the ! character before the file name.
For example:
apoc.load.csv("pathToFile!csv/fileName.csv.tgz")
apoc.load.json("https://github.com/neo4j-contrib/neo4j-apoc-procedures/tree/3.4/src/test/resources/testload.tgz?raw=true!person.json");
Using S3 protocol
When using the S3 protocol we need to download and copy the following jars into the plugins directory:
- aws-java-sdk-core-1.11.250.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-core/1.11.250)
- aws-java-sdk-s3-1.11.250.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3/1.11.250)
- httpclient-4.5.4.jar (https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient/4.5.4)
- httpcore-4.4.8.jar (https://mvnrepository.com/artifact/org.apache.httpcomponents/httpcore/4.4.8)
- joda-time-2.9.9.jar (https://mvnrepository.com/artifact/joda-time/joda-time/2.9.9)
Once those files have been copied, we'll need to restart the database.
The S3 URL must be in one of the following formats:

- s3://accessKey:secretKey[:sessionToken]@endpoint:port/bucket/key (sessionToken is optional), or
- s3://endpoint:port/bucket/key?accessKey=accessKey&secretKey=secretKey[&sessionToken=sessionToken] (sessionToken is optional), or
- s3://endpoint:port/bucket/key if the accessKey, secretKey, and the optional sessionToken are provided in the environment variables
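For instance, using the query-parameter form (endpoint, bucket, key, and credentials are all placeholders):

CALL apoc.load.csv("s3://s3.amazonaws.com:443/my-bucket/people.csv?accessKey=myAccessKey&secretKey=mySecretKey")
YIELD lineNo, map
RETURN lineNo, map;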
Using Google Cloud Storage
In order to use Google Cloud Storage, you need the following jars in the plugins directory:
- api-common-1.8.1.jar
- failureaccess-1.0.1.jar
- gax-1.48.1.jar
- gax-httpjson-0.65.1.jar
- google-api-client-1.30.2.jar
- google-api-services-storage-v1-rev20190624-1.30.1.jar
- google-auth-library-credentials-0.17.1.jar
- google-auth-library-oauth2-http-0.17.1.jar
- google-cloud-core-1.90.0.jar
- google-cloud-core-http-1.90.0.jar
- google-cloud-storage-1.90.0.jar
- google-http-client-1.31.0.jar
- google-http-client-appengine-1.31.0.jar
- google-http-client-jackson2-1.31.0.jar
- google-oauth-client-1.30.1.jar
- grpc-context-1.19.0.jar
- guava-28.0-android.jar
- opencensus-api-0.21.0.jar
- opencensus-contrib-http-util-0.21.0.jar
- proto-google-common-protos-1.16.0.jar
- proto-google-iam-v1-0.12.0.jar
- protobuf-java-3.9.1.jar
- protobuf-java-util-3.9.1.jar
- threetenbp-1.3.3.jar
But we have prepared an uber-package to simplify the process: you can download it from here and place it into the plugins directory.
You can use Google Cloud Storage via the following URL format:
gs://<bucket_name>/<file_path>
Moreover, you can also define the authorization type via the authenticationType query parameter:

- NONE: for public buckets (this is the default behaviour, so you don't need to specify it)
- SERVICE: for Service authentication, setting the environment variable GOOGLE_APPLICATION_CREDENTIALS as described here: https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-java
Example:
gs://andrea-bucket-1/test-privato.csv?authenticationType=SERVICE
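As a complete statement, a minimal sketch (bucket and file names are hypothetical):

CALL apoc.load.json("gs://my-bucket/person.json?authenticationType=SERVICE")
YIELD value
RETURN value;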