String operators
String operators are used to perform operations on STRING values.
Cypher® contains the following string operators:
- Prefix: STARTS WITH (case sensitive)
- Suffix: ENDS WITH (case sensitive)
- Substring: CONTAINS (case sensitive)
- Regular expression: =~
- IS NORMALIZED
- IS NOT NORMALIZED
These operators perform case-sensitive matching.
Attempting to use them on values which are not STRING values will return NULL.
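As a small illustration of the NULL behavior described above, the following sketch applies STARTS WITH to a list of mixed values. The list contents and column names are made up for this example. Only the first element is a STRING, so the other two comparisons are expected to evaluate to NULL rather than raise an error.

```cypher
// Hypothetical values: one STRING, one INTEGER, one BOOLEAN
UNWIND ['Alice', 42, true] AS value
RETURN value, value STARTS WITH 'A' AS startsWithA
```

To compare a non-STRING value as text, one option is to convert it first, for example with toString(value).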
Example graph
The following graph is used for the examples below:
To recreate the graph, run the following query in an empty Neo4j database:
CREATE (alice:Person {name:'Alice', age: 65, role: 'Project manager', email: 'alice@company.com'}),
(cecil:Person {name: 'Cecil', age: 25, role: 'Software developer', email: 'cecil@private.se'}),
(cecilia:Person {name: 'Cecilia', age: 31, role: 'Software developer'}),
(charlie:Person {name: 'Charlie', age: 61, role: 'Security engineer'}),
(daniel:Person {name: 'Daniel', age: 39, role: 'Director', email: 'daniel@company.com'}),
(eskil:Person {name: 'Eskil', age: 39, role: 'CEO', email: 'eskil@company.com'})
Examples
STARTS WITH operator
MATCH (n:Person)
WHERE n.name STARTS WITH 'C'
RETURN n.name AS name
name |
---|
"Cecil" |
"Cecilia" |
"Charlie" |
Rows: 3 |
ENDS WITH operator
MATCH (n:Person)
WHERE n.role ENDS WITH 'developer'
RETURN n.name AS name, n.role AS role
name | role |
---|---|
"Cecil" | "Software developer" |
"Cecilia" | "Software developer" |
Rows: 2 |
CONTAINS operator
MATCH (n:Person)
WHERE n.role CONTAINS 'eng'
RETURN n.name AS name, n.role AS role
name | role |
---|---|
"Charlie" | "Security engineer" |
Rows: 1 |
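Because STARTS WITH, ENDS WITH, and CONTAINS are case sensitive, a common workaround is to normalize the casing of the value before comparing. The following is a minimal sketch against the example graph, assuming the built-in toLower() function is acceptable for the data at hand:

```cypher
// Lowercase the property before comparing, so that 'C' and 'c' are treated alike
MATCH (n:Person)
WHERE toLower(n.name) STARTS WITH 'c'
RETURN n.name AS name
```

On the example graph this should return the same three names as the STARTS WITH example above.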
Regular expressions
Cypher supports filtering using regular expressions.
The regular expression syntax is inherited from the Java regular expressions.
This includes support for flags that change how STRING values are matched, including the case-insensitive (?i), multiline (?m), and dotall (?s) flags.
Flags are given at the beginning of the regular expression.
Regular expression operator (=~)
MATCH (n:Person)
WHERE n.email =~ '.*@company.com'
RETURN n.name AS name, n.email AS email
name | email |
---|---|
"Alice" | "alice@company.com" |
"Daniel" | "daniel@company.com" |
"Eskil" | "eskil@company.com" |
Rows: 3 |
By prepending the flag (?i) to a regular expression, the whole expression becomes case-insensitive:
MATCH (n:Person)
WHERE n.name =~ '(?i)CEC.*'
RETURN n.name AS name
The names of both Cecil and Cecilia are returned because their names start with 'CEC' regardless of casing:
name |
---|
"Cecil" |
"Cecilia" |
Rows: 2 |
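The patterns in these examples begin or end with .* because =~ tests the pattern against the entire STRING, following Java's full-match semantics, rather than searching for a match somewhere inside it. The sketch below illustrates this on a literal value; the column names are chosen only for this example.

```cypher
// The first pattern does not cover the whole value, the second one does
RETURN 'Cecilia' =~ 'Cec'   AS withoutWildcard,
       'Cecilia' =~ 'Cec.*' AS withWildcard
```

The first comparison is expected to be FALSE and the second TRUE.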
Escaping in regular expressions
Characters such as . or * have special meaning in a regular expression.
To use these as ordinary characters without special meaning, escape them.
MATCH (n:Person)
WHERE n.email =~ '.*\\.se'
RETURN n.name AS name, n.email AS email
Cecil is returned because his email ends with '.se':
name | email |
---|---|
"Cecil" | "cecil@private.se" |
Rows: 1 |
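To see what the escape changes, the following sketch compares an unescaped and an escaped dot against a made-up value that has an underscore where the dot would be. The value 'cecil@private_se' is hypothetical and not part of the example graph.

```cypher
// An unescaped . matches any character (here '_'); an escaped \\. matches only a literal dot
RETURN 'cecil@private_se' =~ '.*.se'   AS unescapedDot,
       'cecil@private_se' =~ '.*\\.se' AS escapedDot
```

The unescaped pattern is expected to match (TRUE), while the escaped one should not (FALSE).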
Note that the regular expression constructs in Java regular expressions are applied only after resolving the escaped character sequences in the given string literal. It is sometimes necessary to add additional backslashes to express regular expression constructs. The following table clarifies the combination of these two definitions, listing the original escape sequence, the resulting sequence passed to the regular expression, and what that sequence matches:
String literal sequence | Resulting Regex sequence | Regex match |
---|---|---|
\t | Tab | Tab |
\\t | \t | Tab |
\b | Backspace | Backspace |
\\b | \b | Word boundary |
\n | Newline | Newline |
\\n | \n | Newline |
\r | Carriage return | Carriage return |
\\r | \r | Carriage return |
\f | Form feed | Form feed |
\\f | \f | Form feed |
\' | Single quote | Single quote |
\" | Double quote | Double quote |
\\ | Backslash | Backslash |
\\\\ | \\ | Backslash |
\uxxxx | Unicode UTF-16 code point (4 hex digits must follow the \u) | Unicode UTF-16 code point (4 hex digits must follow the \u) |
\\uxxxx | \uxxxx | Unicode UTF-16 code point (4 hex digits must follow the \u) |
Using regular expressions with unsanitized user input makes you vulnerable to Cypher injection. Consider using parameters instead.
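A minimal sketch of the parameterized form is shown here. The parameter name $emailPattern is hypothetical, and its value is assumed to be supplied by the client application rather than concatenated into the query text.

```cypher
// $emailPattern is a hypothetical parameter holding the regular expression
MATCH (n:Person)
WHERE n.email =~ $emailPattern
RETURN n.name AS name, n.email AS email
```

How backslashes need to be written in the parameter value depends on the client used to send it, since the value is not parsed as a Cypher string literal.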
String normalization operators
The IS NORMALIZED operator is used to check whether the given STRING is in the NFC Unicode normalization form:
Unicode normalization is a process that transforms different representations of the same string into a standardized form. For more information, see the documentation for Unicode normalization forms.
IS NORMALIZED operator
RETURN 'the \u212B char' IS NORMALIZED AS normalized
normalized |
---|
FALSE |
Rows: 1 |
Because the given STRING contains a non-normalized Unicode character (\u212B), FALSE is returned.
To normalize a STRING, use the normalize() function.
Note that the IS NORMALIZED operator returns NULL when used on a non-STRING value.
For example, RETURN 1 IS NORMALIZED returns NULL.
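The effect of normalize() can be checked with the operator itself. The sketch below assumes that normalize() defaults to the NFC form, the same form IS NORMALIZED checks by default; the variable and column names are chosen only for this example.

```cypher
// Check the same STRING before and after normalization
WITH 'the \u212B char' AS original
RETURN original IS NORMALIZED AS normalizedBefore,
       normalize(original) IS NORMALIZED AS normalizedAfter
```

Under that assumption, normalizedBefore should be FALSE and normalizedAfter TRUE.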
The IS NOT NORMALIZED operator is used to check whether the given STRING is not in the NFC Unicode normalization form:
IS NOT NORMALIZED operator
RETURN 'the \u212B char' IS NOT NORMALIZED AS notNormalized
notNormalized |
---|
TRUE |
Rows: 1 |
Because the given STRING contains a non-normalized Unicode character (\u212B), it is not normalized, and TRUE is returned.
Note that the IS NOT NORMALIZED operator returns NULL when used on a non-STRING value.
For example, RETURN 1 IS NOT NORMALIZED returns NULL.
Using IS NORMALIZED with a specified normalization type
It is possible to define which Unicode normalization type is used (the default is NFC).
The available normalization types are:
- NFC
- NFD
- NFKC
- NFKD
WITH 'the \u00E4 char' AS myString
RETURN myString IS NFC NORMALIZED AS nfcNormalized,
       myString IS NFD NORMALIZED AS nfdNormalized
The given STRING contains the Unicode character \u00E4, which is considered normalized in NFC form, but not in NFD form.
nfcNormalized | nfdNormalized |
---|---|
TRUE | FALSE |
Rows: 1 |
It is also possible to specify the normalization form when using the negated normalization operator.
For example, RETURN "string" IS NOT NFD NORMALIZED.
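The negated form can be sketched on the same \u00E4 value used earlier. This example assumes that normalize() accepts the normalization form as an optional second argument; the column names are illustrative only.

```cypher
// \u00E4 is a precomposed character, so it is not in NFD form until decomposed
WITH 'the \u00E4 char' AS myString
RETURN myString IS NOT NFD NORMALIZED AS notNfdBefore,
       normalize(myString, NFD) IS NOT NFD NORMALIZED AS notNfdAfter
```

Under that assumption, notNfdBefore should be TRUE and notNfdAfter FALSE.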