Text Functions
Cypher has some basic functions to work with text, like:
- split(string, delim)
- toLower and toUpper
- concatenation with +
- predicates like CONTAINS, STARTS WITH, ENDS WITH, and regular expression matches via =~
But a lot of useful functions for string manipulation, comparison, and filtering are missing. APOC adds these functions.
Overview Text Functions
|
find the first occurrence of the lookup string in the text, from inclusive, to exclusive; -1 if not found, null if text is null. |
|
find all occurrences of the lookup string in the text and return them as a list, from inclusive, to exclusive; empty list if not found, null if text is null. |
|
replace each substring of the given string that matches the given regular expression with the given replacement. |
|
returns an array containing a nested array for each match. The inner array contains all match groups. |
|
join the given strings with the given delimiter. |
|
multiply the given string with the given count |
|
sprintf format the string with the params given, and optional param language (default value is 'en'). |
|
left pad the string to the given width |
|
right pad the string to the given width |
|
returns a random string to the specified length |
|
capitalise the first letter of the word |
|
capitalise the first letter of every word in the text |
|
decapitalize the first letter of the word |
|
decapitalize the first letter of all words |
|
Swap the case of a string |
|
Convert a string to camelCase |
|
Convert a string to UpperCamelCase |
|
Convert a string to snake-case |
|
Convert a string to UPPER_CASE |
|
Returns the decimal value of the character at the given index |
|
Returns the unicode character of the given codepoint |
|
Returns the hex value string of the character at the given index |
|
Returns the hex value string of the given value |
|
return size of text in bytes |
|
return bytes of the text |
|
tries its best to convert the value to a cypher-property-string |
|
Encode a string with Base64 |
|
Decode Base64 encoded string |
|
Encode a url with Base64 |
|
Decode Base64 encoded url |
The replace, split and regexGroups functions work with regular expressions.
Data Extraction
|
turn URL into map structure |
|
extract the personal name, user and domain as a map (needs javax.mail jar) |
|
deprecated: returns the domain part of the value |
Text Similarity Functions
|
compare the given strings with the Levenshtein distance algorithm |
|
compare the given strings with the Levenshtein distance algorithm |
|
calculate the similarity (a value within 0 and 1) between two texts based on Levenshtein distance. |
|
compare the given strings with the Hamming distance algorithm |
|
compare the given strings with the Jaro-Winkler distance algorithm |
|
compare the given strings with the Sørensen–Dice coefficient formula, assuming an English locale |
|
compare the given strings with the Sørensen–Dice coefficient formula, with the provided IETF language tag |
|
check if 2 words can be matched in a fuzzy way (Levenshtein). Depending on the length of the string, it allows more characters to be edited to match the second string (distance: length < 3 then 0, length < 5 then 1, else 2). |
Compare the strings with the Levenshtein distance
Compare the given strings with the StringUtils.distance(text1, text2) method (Levenshtein).
RETURN apoc.text.distance("Levenshtein", "Levenstein") // 1
Compare the given strings with the Sørensen–Dice coefficient formula.
RETURN apoc.text.sorensenDiceSimilarity("belly", "jolly") // 0.5
RETURN apoc.text.sorensenDiceSimilarityWithLanguage("halım", "halim", "tr-TR") // 0.5
Check if 2 words can be matched in a fuzzy way with fuzzyMatch
Depending on the length of the string, more characters may be edited to match the second string (Levenshtein distance: length < 3 then 0, length < 5 then 1, else 2).
RETURN apoc.text.fuzzyMatch("The", "the") // true
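The length-based threshold described above can be sketched in plain Cypher on top of apoc.text.distance. This is only an illustration of the rule as stated here, assuming the length is taken from the first string. The sample words are made up, and apoc.text.fuzzyMatch itself may differ in details.

```cypher
// Hypothetical sample words; the allowed edit distance depends on the length of text1
WITH 'graph' AS text1, 'grahp' AS text2
WITH text1, text2,
     CASE
       WHEN size(text1) < 3 THEN 0
       WHEN size(text1) < 5 THEN 1
       ELSE 2
     END AS allowedEdits
// Accept the pair when the Levenshtein distance stays within the allowed number of edits
RETURN apoc.text.distance(text1, text2) AS distance,
       allowedEdits,
       apoc.text.distance(text1, text2) <= allowedEdits AS matches
```

Spelling the rule out this way can help when a different threshold per length is needed, since the CASE branches can be adjusted freely.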
Phonetic Comparison Functions
The phonetic text (soundex) functions allow you to compute the soundex encoding of a given string. There is also a procedure to compare how similar two strings sound under the soundex algorithm. All soundex procedures by default assume the used language is US English.
|
Compute the US_ENGLISH phonetic soundex encoding of all words of the text value which can be a single string or a list of strings |
|
Compute the Double Metaphone phonetic encoding of all words of the text value which can be a single string or a list of strings |
|
strip the given string of everything except alphanumeric characters and convert it to lower case. |
|
compare the given strings after stripping everything except alphanumeric characters and converting them to lower case. |
|
Compute the US_ENGLISH soundex character difference between two given strings |
// will return 'H436'
RETURN apoc.text.phonetic('Hello, dear User!')
// will return '4' (very similar)
RETURN apoc.text.phoneticDelta('Hello Mr Rabbit', 'Hello Mr Ribbit')
Formatting Text
Format the string with the params given, and optional param language.
RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd',42,3.14,true]) AS value // abcd 42 3.1 true
RETURN apoc.text.format('ab%s %d %.1f %s%n',['cd',42,3.14,true],'it') AS value // abcd 42 3,1 true
String Search
The indexOf function provides the first occurrence of the given lookup string within the text, or -1 if not found.
It can optionally take from (inclusive) and to (exclusive) parameters.
RETURN apoc.text.indexOf('Hello World!', 'World') // 6
The indexesOf function provides all occurrences of the given lookup string within the text, or an empty list if not found.
It can optionally take from (inclusive) and to (exclusive) parameters.
RETURN apoc.text.indexesOf('Hello World!', 'o',2,9) // [4,7]
If you want to get a substring starting from your index match, you can use this:
// will return 'World!'
WITH 'Hello World!' AS text
WITH text, size(text) AS len, apoc.text.indexOf(text, 'World', 3) AS index
// fall back to the last character when the lookup string is not found
RETURN substring(text, CASE index WHEN -1 THEN len - 1 ELSE index END, len);
Regular Expressions
RETURN apoc.text.replace('Hello World!', '[^a-zA-Z]', '')
RETURN apoc.text.regexGroups('abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>','<link (\\w+)>(\\w+)</link>') AS result
// [["<link xxx1>yyy1</link>", "xxx1", "yyy1"], ["<link xxx2>yyy2</link>", "xxx2", "yyy2"]]
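Because each match is returned as a list whose first element is the full match, the individual groups can be picked out by index. A small sketch, reusing the input above (the column names attribute and content are arbitrary):

```cypher
WITH apoc.text.regexGroups(
       'abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>',
       '<link (\\w+)>(\\w+)</link>') AS matches
UNWIND matches AS match
// match[0] is the whole match, match[1] and match[2] are the capture groups
RETURN match[1] AS attribute, match[2] AS content
```

This yields one row per match instead of a nested list, which is usually easier to feed into later clauses.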
Split and Join
RETURN apoc.text.split('Hello World', ' +')
RETURN apoc.text.join(['Hello', 'World'], ' ')
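Since the delimiter passed to apoc.text.split is treated as a regular expression, the two functions can be combined to normalize separators. A minimal sketch, with a made-up input string:

```cypher
// Split on runs of commas, semicolons, or spaces, then rejoin with a single comma
WITH 'graph;  database,cypher' AS tags
RETURN apoc.text.join(apoc.text.split(tags, '[,; ]+'), ',') AS normalized // graph,database,cypher
```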
Data Cleaning
RETURN apoc.text.clean('Hello World!') // 'helloworld'
RETURN apoc.text.compareCleaned('Hello World!', '_hello-world_') // true
UNWIND ['Hello World!', 'hello worlds'] as text
RETURN apoc.text.filterCleanMatches(text, 'hello_world') as text
The clean functionality can be useful for cleaning up slightly dirty text data with inconsistent formatting for non-exact comparisons.
Cleaning will strip the string of all non-alphanumeric characters (including spaces) and convert it to lower case.
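As an illustration of such non-exact comparisons, the following sketch runs a few differently formatted spellings of one name through both functions. The name variants and column names are hypothetical.

```cypher
// Hypothetical name variants that differ only in case, spacing, and punctuation
UNWIND ['Neo4j, Inc.', 'NEO4J INC', 'neo4j-inc'] AS name
RETURN name,
       apoc.text.clean(name) AS cleaned,
       apoc.text.compareCleaned(name, 'Neo4j Inc') AS sameCompany
```

Storing the cleaned value in a separate property is one way to make such comparisons cheap at query time, at the cost of keeping it in sync with the original.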
Case Change Functions
capitalize
RETURN apoc.text.capitalize("neo4j") // "Neo4j"
capitalizeAll
RETURN apoc.text.capitalizeAll("graph database") // "Graph Database"
decapitalize
RETURN apoc.text.decapitalize("Graph Database") // "graph Database"
decapitalizeAll
RETURN apoc.text.decapitalizeAll("Graph Databases") // "graph databases"
swapCase
RETURN apoc.text.swapCase("Neo4j") // nEO4J
camelCase
RETURN apoc.text.camelCase("FOO_BAR"); // "fooBar"
RETURN apoc.text.camelCase("Foo bar"); // "fooBar"
RETURN apoc.text.camelCase("Foo22 bar"); // "foo22Bar"
RETURN apoc.text.camelCase("foo-bar"); // "fooBar"
RETURN apoc.text.camelCase("Foobar"); // "foobar"
RETURN apoc.text.camelCase("Foo$$Bar"); // "fooBar"
upperCamelCase
RETURN apoc.text.upperCamelCase("FOO_BAR"); // "FooBar"
RETURN apoc.text.upperCamelCase("Foo bar"); // "FooBar"
RETURN apoc.text.upperCamelCase("Foo22 bar"); // "Foo22Bar"
RETURN apoc.text.upperCamelCase("foo-bar"); // "FooBar"
RETURN apoc.text.upperCamelCase("Foobar"); // "Foobar"
RETURN apoc.text.upperCamelCase("Foo$$Bar"); // "FooBar"
snakeCase
RETURN apoc.text.snakeCase("test Snake Case"); // "test-snake-case"
RETURN apoc.text.snakeCase("FOO_BAR"); // "foo-bar"
RETURN apoc.text.snakeCase("Foo bar"); // "foo-bar"
RETURN apoc.text.snakeCase("fooBar"); // "foo-bar"
RETURN apoc.text.snakeCase("foo-bar"); // "foo-bar"
RETURN apoc.text.snakeCase("Foo bar"); // "foo-bar"
RETURN apoc.text.snakeCase("Foo bar"); // "foo-bar"
toUpperCase
RETURN apoc.text.toUpperCase("test upper case"); // "TEST_UPPER_CASE"
RETURN apoc.text.toUpperCase("FooBar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("fooBar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo-bar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo--bar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo$$bar"); // "FOO_BAR"
RETURN apoc.text.toUpperCase("foo 22 bar"); // "FOO_22_BAR"
Base64 De- and Encoding
Encode or decode a string in base64 or base64Url
RETURN apoc.text.base64Encode("neo4j") // bmVvNGo=
RETURN apoc.text.base64Decode("bmVvNGo=") // neo4j
RETURN apoc.text.base64UrlEncode("http://neo4j.com/?test=test") // aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0
RETURN apoc.text.base64UrlDecode("aHR0cDovL25lbzRqLmNvbS8_dGVzdD10ZXN0") // http://neo4j.com/?test=test
Random String
You can generate a random string of a specified length by calling apoc.text.random with a length parameter and an optional string of valid characters.
The valid parameter accepts the following regex patterns; alternatively, you can provide a string of letters and/or characters.
Pattern | Description
A-Z | A-Z in uppercase
a-z | A-Z in lowercase
0-9 | Numbers 0-9 inclusive
The following call returns a random string made up of uppercase letters, numbers, and the . and $ characters.
RETURN apoc.text.random(10, "A-Z0-9.$")
Extract Domain
The user function apoc.data.domain will take a URL or email address and try to determine the domain name.
This can be useful to make easier correlations and equality tests between differently formatted email addresses, and between urls to the same domains but specifying different locations.
WITH 'foo@bar.com' AS email
RETURN apoc.data.domain(email) // will return 'bar.com'
WITH 'http://www.example.com/all-the-things' AS url
RETURN apoc.data.domain(url) // will return 'www.example.com'