String operators
String operators are used to perform operations on STRING values.
Cypher® contains the following string operators:
- Prefix: STARTS WITH (case sensitive)
- Suffix: ENDS WITH (case sensitive)
- Substring: CONTAINS (case sensitive)
- Regular expression: =~
- IS NORMALIZED (introduced in 5.17)
- IS NOT NORMALIZED (introduced in 5.17)
The STARTS WITH, ENDS WITH, and CONTAINS operators perform case-sensitive matching.
Attempting to use them on values which are not STRING values will return null.
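As a minimal sketch of working around both rules, the following query assumes the Person nodes from the example graph on this page. It converts the INTEGER property age to a STRING with toString() before matching, and lowercases name with toLower() to get a case-insensitive prefix match. The choice of properties and prefixes is illustrative only.

```cypher
// age is stored as an INTEGER, so `n.age STARTS WITH '6'` would evaluate to null.
// Converting it with toString() first gives the operator a STRING to work on.
MATCH (n:Person)
WHERE toString(n.age) STARTS WITH '6'
  // toLower() makes the prefix check on name case-insensitive.
  AND toLower(n.name) STARTS WITH 'c'
RETURN n.name AS name, n.age AS age
```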
Example graph
The examples below use a graph of six Person nodes, each with name, age, and role properties and, for some, an email property.
To recreate the graph, run the following query in an empty Neo4j database:
CREATE (alice:Person {name: 'Alice', age: 65, role: 'Project manager', email: 'alice@company.com'}),
(cecil:Person {name: 'Cecil', age: 25, role: 'Software developer', email: 'cecil@private.se'}),
(cecilia:Person {name: 'Cecilia', age: 31, role: 'Software developer'}),
(charlie:Person {name: 'Charlie', age: 61, role: 'Security engineer'}),
(daniel:Person {name: 'Daniel', age: 39, role: 'Director', email: 'daniel@company.com'}),
(eskil:Person {name: 'Eskil', age: 39, role: 'CEO', email: 'eskil@company.com'})
Examples
STARTS WITH operator
MATCH (n:Person)
WHERE n.name STARTS WITH 'C'
RETURN n.name AS name
| name |
|---|
| "Cecil" |
| "Cecilia" |
| "Charlie" |

Rows: 3
ENDS WITH operator
MATCH (n:Person)
WHERE n.role ENDS WITH 'developer'
RETURN n.name AS name, n.role AS role
| name | role |
|---|---|
| "Cecil" | "Software developer" |
| "Cecilia" | "Software developer" |

Rows: 2
CONTAINS operator
MATCH (n:Person)
WHERE n.role CONTAINS 'eng'
RETURN n.name AS name, n.role AS role
| name | role |
|---|---|
| "Charlie" | "Security engineer" |

Rows: 1
Regular expressions
Cypher supports filtering using regular expressions.
The regular expression syntax is inherited from the Java regular expressions.
This includes support for flags that change how STRING values are matched, including the case-insensitive (?i), multiline (?m), and dotall (?s) flags.
Flags are given at the beginning of the regular expression.
Regular expression operator (=~)
MATCH (n:Person)
WHERE n.email =~ '.*@company.com'
RETURN n.name AS name, n.email AS email
| name | email |
|---|---|
| "Alice" | "alice@company.com" |
| "Daniel" | "daniel@company.com" |
| "Eskil" | "eskil@company.com" |

Rows: 3
Prepending the flag (?i) to a regular expression makes the whole expression case-insensitive:
MATCH (n:Person)
WHERE n.name =~ '(?i)CEC.*'
RETURN n.name AS name
The names of both Cecil and Cecilia are returned because both start with 'CEC', regardless of casing:
| name |
|---|
| "Cecil" |
| "Cecilia" |

Rows: 2
Escaping in regular expressions
Characters such as . or * have special meaning in a regular expression.
To use these as ordinary characters without special meaning, escape them.
MATCH (n:Person)
WHERE n.email =~ '.*\\.se'
RETURN n.name AS name, n.email AS email
Cecil is returned because his email ends with '.se':
| name | email |
|---|---|
| "Cecil" | "cecil@private.se" |

Rows: 1
Note that Java regular expression constructs are applied only after the escaped character sequences in the given string literal have been resolved. It is therefore sometimes necessary to add extra backslashes to express a regular expression construct. The following table shows how the two sets of rules combine, listing each string literal sequence, the sequence the regular expression engine receives, and what that sequence matches:
| String literal sequence | Resulting Regex sequence | Regex match |
|---|---|---|
| \t | Tab | Tab |
| \\t | \t | Tab |
| \b | Backspace | Backspace |
| \\b | \b | Word boundary |
| \n | Newline | Newline |
| \\n | \n | Newline |
| \r | Carriage return | Carriage return |
| \\r | \r | Carriage return |
| \f | Form feed | Form feed |
| \\f | \f | Form feed |
| \' | Single quote | Single quote |
| \" | Double quote | Double quote |
| \\ | Backslash | Backslash |
| \\\\ | \\ | Backslash |
| \uxxxx | Unicode UTF-16 code point (4 hex digits must follow the \u) | Unicode UTF-16 code point (4 hex digits must follow the \u) |
| \\uxxxx | \uxxxx | Unicode UTF-16 code point (4 hex digits must follow the \u) |
Using regular expressions with unsanitized user input makes you vulnerable to Cypher injection. Consider using parameters instead.
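A minimal sketch of the parameterized form, assuming a parameter named emailPattern (a hypothetical name) that the client supplies separately instead of concatenating it into the query text:

```cypher
// $emailPattern is passed separately from the query text,
// so its value is never parsed as Cypher.
MATCH (n:Person)
WHERE n.email =~ $emailPattern
RETURN n.name AS name, n.email AS email
```

In Neo4j Browser or Cypher Shell, the parameter could be set beforehand with, for example, :param emailPattern => '.*@company\\.com'. The value is still interpreted as a regular expression, so characters with special meaning in a pattern remain significant.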
String normalization operators (introduced in 5.17)
The IS NORMALIZED operator is used to check whether the given STRING is in the NFC Unicode normalization form:
Unicode normalization is a process that transforms different representations of the same string into a standardized form. For more information, see the documentation for Unicode normalization forms.
IS NORMALIZED operator
RETURN 'the \u212B char' IS NORMALIZED AS normalized
| normalized |
|---|
| false |

Rows: 1
Because the given STRING contains a non-normalized Unicode character (\u212B), false is returned.
To normalize a STRING, use the normalize() function.
Note that the IS NORMALIZED operator returns null when used on a non-STRING value.
For example, RETURN 1 IS NORMALIZED returns null.
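As a sketch of how normalize() and IS NORMALIZED fit together, the following query normalizes the same STRING and checks it again. The variable and column names are illustrative only.

```cypher
// \u212B (ANGSTROM SIGN) is not in NFC form; normalize() converts it
// to its NFC equivalent, \u00C5.
WITH 'the \u212B char' AS original
RETURN original IS NORMALIZED AS normalizedBefore,
       normalize(original) IS NORMALIZED AS normalizedAfter,
       normalize(original) = 'the \u00C5 char' AS equalsComposedForm
```

With the default NFC form, normalizedBefore is expected to be false, while normalizedAfter and equalsComposedForm are expected to be true.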
The IS NOT NORMALIZED operator is used to check whether the given STRING is not in the NFC Unicode normalization form:
IS NOT NORMALIZED
RETURN 'the \u212B char' IS NOT NORMALIZED AS notNormalized
| notNormalized |
|---|
| true |

Rows: 1
Because the given STRING contains a non-normalized Unicode character (\u212B), true is returned.
Note that the IS NOT NORMALIZED operator returns null when used on a non-STRING value.
For example, RETURN 1 IS NOT NORMALIZED returns null.
Using IS NORMALIZED with a specified normalization type
It is possible to define which Unicode normalization type is used (the default is NFC).
The available normalization types are:
- NFC
- NFD
- NFKC
- NFKD
WITH 'the \u00E4 char' AS myString
RETURN myString IS NFC NORMALIZED AS nfcNormalized,
myString IS NFD NORMALIZED AS nfdNormalized
The given STRING contains the Unicode character \u00E4, which is considered normalized in NFC form but not in NFD form.
| nfcNormalized | nfdNormalized |
|---|---|
| true | false |

Rows: 1
It is also possible to specify the normalization form when using the negated normalization operator.
For example, RETURN "string" IS NOT NFD NORMALIZED.
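A short sketch that checks the same STRING against all four normalization types and adds one negated check, using only the operator syntax shown above. The column names are illustrative only.

```cypher
// \u00E4 is the precomposed character ä (a single code point).
WITH 'the \u00E4 char' AS myString
RETURN myString IS NFC NORMALIZED AS isNfc,
       myString IS NFD NORMALIZED AS isNfd,
       myString IS NFKC NORMALIZED AS isNfkc,
       myString IS NFKD NORMALIZED AS isNfkd,
       myString IS NOT NFD NORMALIZED AS isNotNfd
```

Because the composed forms (NFC, NFKC) keep ä as one code point and the decomposed forms (NFD, NFKD) split it into a base letter plus a combining mark, isNfc and isNfkc are expected to be true, isNfd and isNfkd false, and isNotNfd true.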