I’m super thrilled to announce last week’s 1.1.0 release of the Awesome Procedures on Cypher (APOC). A lot of new and cool stuff has been added and some issues have been fixed.
Thanks to everyone who contributed to the procedure collection, especially Stefan Armbruster, Kees Vegter, Florent Biville, Sascha Peukert, Craig Taverner, Chris Willemsen and many more.
And of course my thanks go to everyone who tried APOC and gave feedback, so that we could improve the library.
If you are new to Neo4j’s procedures and APOC, please start by reading the first article of my introductory blog series.
The APOC library was first released as version 1.0 in conjunction with the Neo4j 3.0 release at the end of April with around 90 procedures and was mentioned in Emil’s Neo4j 3.0 release keynote.
In early May we had a 1.0.1 release with a number of new procedures especially around free text search, graph algorithms and geocoding, which was also used by the journalists of the ICIJ for their downloadable Neo4j database of the Panama Papers.
And now, two months later, we’ve reached 200 procedures that are provided by APOC. These cover a wide range of capabilities, some of which I want to discuss today. In each section of this post I’ll only list a small subset of the new procedures that were added.
If you want to get more detailed information, please check out the documentation with examples.
Notable Changes
As the 100 new procedures represent quite a change, I want to highlight the aspects of APOC that got extended or documented with more practical examples.
Metadata
Besides the apoc.meta.graph
functionality that was there from the start, additional procedures to return and sample graph metadata have been added. Some, like apoc.meta.stats
, access the transactional database statistics to quickly return information about label and relationship-type counts.
There are now also procedures to return and check the types of values and properties.
| apoc.meta.graphSample(sampleSize) | examines a sample sub graph to create the meta-graph, default sampleSize is 100 |
| apoc.meta.stats | returns the information stored in the transactional database statistics |
| apoc.meta.type(value) | type name of a value (INTEGER, FLOAT, STRING, BOOLEAN, RELATIONSHIP, NODE, PATH, NULL, UNKNOWN, MAP, LIST) |
| apoc.meta.isType(value, type) | returns a row if the type name matches, none if not |
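For example, to get a quick overview of a database without scanning all the data, you can call the statistics procedure (a minimal sketch; the exact YIELD columns may vary slightly between APOC versions):

CALL apoc.meta.stats()
YIELD labelCount, relTypeCount, propertyKeyCount, nodeCount, relCount
RETURN labelCount, relTypeCount, propertyKeyCount, nodeCount, relCount;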
Data Import / Export
The first export procedures output the provided graph data as Cypher statements in the format that neo4j-shell understands and that can also be read with apoc.cypher.runFile
.
Indexes and constraints as well as batched sets of CREATE
statements for nodes and relationships will be written to the provided file-path.
| apoc.export.cypherAll(file, config) | exports whole database incl. indexes as cypher statements to the provided file |
| apoc.export.cypherData(nodes, rels, file, config) | exports given nodes and relationships incl. indexes as cypher statements to the provided file |
| apoc.export.cypherQuery(query, file, config) | exports nodes and relationships from the cypher statement incl. indexes as cypher statements to the provided file |
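As a rough sketch, a full database export could then look like this (the file path and the batchSize config option are just example values; note that later APOC releases renamed the procedure to apoc.export.cypher.all):

CALL apoc.export.cypherAll('/tmp/backup.cypher', {batchSize: 1000});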
Data Integration with Cassandra, MongoDB and RDBMS
Making integration with other databases easier is a big aspiration of APOC.
Being able to directly read and write data from these sources using Cypher statements is very powerful, as Cypher is an expressive data-processing language that makes it easy to filter, cleanse, convert and prepare the original data.
APOC integrates with relational (RDBMS) and other tabular databases like Cassandra using JDBC. Each row returned from a table or statement is provided as a map value to Cypher to be processed.
And for ElasticSearch the same is achieved by using the underlying JSON-HTTP functionality. For MongoDB, we support connecting via their official Java driver.
To avoid listing full database connection strings with usernames and passwords in your procedures, you can configure those in $NEO4J_HOME/conf/neo4j.conf
using the apoc.{jdbc,mongodb,es}.<name>.url
config parameters, and just pass name
as the first parameter in the procedure call.
Here is a part of the Cassandra example from the data integration section of the docs using the Cassandra JDBC Wrapper.
Entry in neo4j.conf:

apoc.jdbc.cassandra_songs.url=jdbc:cassandra://localhost:9042/playlist
CALL apoc.load.jdbc('cassandra_songs', 'track_by_artist') YIELD row
MERGE (a:Artist {name: row.artist})
MERGE (g:Genre {name: row.genre})
CREATE (t:Track {id: toString(row.track_id), title: row.track, length: row.track_length_in_seconds})
CREATE (a)-[:PERFORMED]->(t)
CREATE (t)-[:GENRE]->(g);
// Added 63213 labels, created 63213 nodes, set 182413 properties, created 119200 relationships.
For each data source that you want to connect to, just provide the relevant driver in the $NEO4J_HOME/plugins
directory as well. It will then be picked up automatically by APOC.
Even if you only visualize the kinds of graphs hidden in that data, being able to do so without leaving the comfort of Cypher and the Neo4j Browser is already a big benefit.
To render virtual nodes, relationships and graphs, you can use the appropriate procedures from the apoc.create.*
package.
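For instance, building on the Cassandra example above, a purely virtual relationship between artists and genres could be rendered like this (a sketch using apoc.create.vRelationship; the WORKS_IN type and the tracks property are made up for illustration):

MATCH (a:Artist)-[:PERFORMED]->(:Track)-[:GENRE]->(g:Genre)
WITH a, g, count(*) AS tracks
CALL apoc.create.vRelationship(a, 'WORKS_IN', {tracks: tracks}, g) YIELD rel
RETURN a, rel, g LIMIT 50;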
Controlled Cypher Execution
While individual Cypher statements can be run easily, more complex executions – like large data updates, background executions or parallel executions – are not yet possible out of the box.
These kinds of abilities are added by the apoc.periodic.
and the apoc.cypher.
packages. Especially apoc.periodic.iterate
and apoc.periodic.commit
are useful for batched updates.
Procedures like apoc.cypher.runMany
allow execution of semicolon-separated statements and apoc.cypher.mapParallel
allows parallel execution of partial or whole Cypher statements driven by a collection of values.
| apoc.cypher.runFile(file) | runs each statement in the file, all semicolon separated – currently no schema operations |
| apoc.cypher.runMany(statements, params) | runs each semicolon separated statement and returns a summary – currently no schema operations |
| apoc.cypher.mapParallel(fragment, params, list-to-parallelize) | executes the fragment in parallel batches with the list segments being assigned to _ |
| apoc.periodic.commit(statement, params) | repeats a batch update statement until it returns 0, this procedure is blocking |
| apoc.periodic.countdown(name, statement, rate) | submits a repeatedly-called background statement until it returns 0 |
| apoc.periodic.iterate(statement, updateStatement, {batchSize, parallel}) | runs the second statement for each item returned by the first statement, returns the number of batches and total processed rows |
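For example, a batched update over all the Track nodes imported above might look like this (a sketch; the batchSize/parallel options and the returned columns follow the current docs and may differ between versions):

CALL apoc.periodic.iterate(
  'MATCH (t:Track) RETURN t',
  'SET t.lengthInMinutes = t.length / 60.0',
  {batchSize: 10000, parallel: false}
) YIELD batches, total
RETURN batches, total;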
Schema / Indexing
Besides the manual index update and query support that was already there in the APOC release 1.0, more manual index management operations have been added.
| apoc.index.list() | lists all manual indexes |
| apoc.index.remove(name) | removes manual indexes |
| apoc.index.forNodes(name, config) | gets or creates a manual node index |
| apoc.index.forRelationships(name, config) | gets or creates a manual relationship index |
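A minimal sketch of working with these (the index name is made up; the columns returned by apoc.index.list may vary by version):

// get or create a manual node index named 'people'
CALL apoc.index.forNodes('people');

// list all manual indexes that currently exist
CALL apoc.index.list();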
There is pretty neat support for free text search that is also detailed with examples in the documentation. It allows you, with apoc.index.addAllNodes
, to add a number of properties of nodes with certain labels to a free text search index which is then easily searchable with apoc.index.search
.
| apoc.index.addAllNodes(name, {Label: [properties]}) | adds all nodes to this full text index with the given properties, additionally populates a ‘search’ index |
| apoc.index.search(name, query) | searches for the first 100 nodes in the given full text index matching the given lucene query, returned by relevance |
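Using the data from the Cassandra example, a hypothetical search index over artists and tracks could be built and queried like this (index name, properties and the Lucene query are made up for illustration):

CALL apoc.index.addAllNodes('music_search', {Artist: ['name'], Track: ['title']});

CALL apoc.index.search('music_search', 'name:Jimi*') YIELD node, weight
RETURN node, weight
LIMIT 10;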
Collection & Map Functions
While Cypher already has great support for handling maps and collections, there are always some operations that are not yet possible. That’s where APOC’s map and collection functions come in. You can now dynamically create, clean and update maps.
| apoc.map.fromPairs([[key, value], [key2, value2], …]) | creates map from list with key-value pairs |
| apoc.map.fromLists([keys], [values]) | creates map from a keys and a values list |
| apoc.map.fromValues([key, value, key2, value2]) | creates map from alternating keys and values in a list |
| apoc.map.setKey(map, key, value) | returns the map with the value for this key added or replaced |
| apoc.map.clean(map, [keys], [values]) | removes the keys and values (e.g. null-placeholders) contained in those lists, good for data cleaning from CSV/JSON |
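In the 1.x releases these are procedures, so they are called with CALL … YIELD value rather than as functions. A small sketch with made-up data:

CALL apoc.map.fromLists(['artist', 'genre'], ['Jimi Hendrix', 'Rock']) YIELD value
RETURN value;
// {artist: 'Jimi Hendrix', genre: 'Rock'}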
There are also means to convert and split collections into other shapes, and much more.
| apoc.coll.partition(list, batchSize) | partitions a list into sublists of batchSize |
| apoc.coll.sum(list) | sums all values in a list |
| apoc.coll.pairs(list) | returns [first,second], [second,third], … |
| apoc.coll.toSet(list) | returns a unique list backed by a set |
| apoc.coll.split(list, value) | splits a collection on the given value into rows of lists, the value itself will not be part of the resulting lists |
| apoc.coll.indexOf(list, value) | position of the value in the list |
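For example, partitioning a list into batches of three returns one row per sublist (a small sketch with made-up data):

CALL apoc.coll.partition([1,2,3,4,5,6,7], 3) YIELD value
RETURN value;
// [1,2,3], [4,5,6], [7]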
You can also compute the union, subtraction and intersection of collections, and much more.
| apoc.coll.union(first, second) | creates the distinct union of the two lists |
| apoc.coll.intersection(first, second) | returns the unique intersection of the two lists |
| apoc.coll.disjunction(first, second) | returns the disjunct set of the two lists |
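A quick illustration with two overlapping lists (made-up data; the ordering of the returned elements may vary):

CALL apoc.coll.union([1,2,3], [3,4,5]) YIELD value RETURN value;        // [1,2,3,4,5]
CALL apoc.coll.intersection([1,2,3], [3,4,5]) YIELD value RETURN value; // [3]
CALL apoc.coll.disjunction([1,2,3], [3,4,5]) YIELD value RETURN value;  // [1,2,4,5]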
Graph Representation
There are a number of operations on a graph that return a subgraph of nodes and relationships. With the apoc.graph.*
operations you can create such a named graph representation from a number of sources.
| apoc.graph.from(data, name, properties) | creates a virtual graph object for later processing; it tries its best to extract the graph information from the data you pass in |
| apoc.graph.fromData([nodes], [relationships], name, properties) | creates a virtual graph object for later processing |
| apoc.graph.fromPaths([paths], name, properties) | creates a virtual graph object for later processing |
| apoc.graph.fromCypher(statement, params, name, properties) | creates a virtual graph object for later processing |
The idea is that other operations (like export or updates), but also graph algorithms, can then be executed on top of this graph representation. The general structure of this representation is:
{
name:"Graph name",
nodes:[node1,node2],
relationships: [rel1,rel2],
properties:{key:"value1",key2:42}
}
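For example, the result of a Cypher statement could be captured as such a named graph (a sketch; the fromCypher signature of statement, params, name, properties is an assumption based on the readme):

CALL apoc.graph.fromCypher(
  'MATCH (a:Artist)-[r:PERFORMED]->(t:Track) RETURN a, r, t',
  {}, 'performances', {source: 'cassandra import'}
) YIELD graph
RETURN graph.name, size(graph.nodes) AS nodes, size(graph.relationships) AS rels;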
Plans for the Future
Of course, it doesn’t stop here. As outlined in the readme, there are many ideas for future development of APOC.
One area to be expanded is graph algorithms, along with the quality and performance of their implementations. We also want to extend the import and export capabilities, for instance for GraphML and binary formats.
Something that should be more widely supported by APOC procedures in the future is working with a subgraph representation: a named set of nodes, relationships and properties.
Conclusion
There is a lot more to explore, just take a moment and have a look at the wide variety of procedures listed in the readme.
Going forward I want to achieve a more regular release cycle of APOC. Every two weeks there should be a new release so that everyone benefits from bug fixes and new features.
Now, please:
- Try out the new functionality,
- Check out the growing APOC documentation,
- Provide feedback, report issues or suggest additions on the GitHub issue tracker,
- or join the #apoc channel on the public Neo4j-Users Slack.
Michael
Catch up with the rest of the Introduction to APOC blog series: