String operators

String operators are used to perform operations on STRING values. Cypher® contains the following string operators:

  • Prefix: STARTS WITH (case sensitive)

  • Suffix: ENDS WITH (case sensitive)

  • Substring: CONTAINS (case sensitive)

  • Regular expression: =~

  • IS NORMALIZED

  • IS NOT NORMALIZED

The STARTS WITH, ENDS WITH, and CONTAINS operators perform case-sensitive matching. Applying any string operator to a value that is not a STRING returns NULL.

Example graph

The examples below use the following graph. To recreate it, run the following query in an empty Neo4j database:

CREATE (alice:Person {name: 'Alice', age: 65, role: 'Project manager', email: 'alice@company.com'}),
       (cecil:Person {name: 'Cecil', age: 25, role: 'Software developer', email: 'cecil@private.se'}),
       (cecilia:Person {name: 'Cecilia', age: 31, role: 'Software developer'}),
       (charlie:Person {name: 'Charlie', age: 61, role: 'Security engineer'}),
       (daniel:Person {name: 'Daniel', age: 39, role: 'Director', email: 'daniel@company.com'}),
       (eskil:Person {name: 'Eskil', age: 39, role: 'CEO', email: 'eskil@company.com'})

Examples

Example 1. Prefix, suffix, and substring operators
STARTS WITH operator
MATCH (n:Person)
WHERE n.name STARTS WITH 'C'
RETURN n.name AS name
Result
name

"Cecil"

"Cecilia"

"Charlie"

Rows: 3
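STARTS WITH is case sensitive, so a lowercase 'c' matches none of the names in the example graph. One way to get a case-insensitive prefix match is to lowercase the property first with the toLower() function. The following sketch contrasts the two approaches:

```cypher
// Case-sensitive: no name in the example graph starts with a lowercase 'c',
// so this query returns no rows
MATCH (n:Person)
WHERE n.name STARTS WITH 'c'
RETURN n.name AS name;

// Case-insensitive: lowercase the property before comparing,
// which matches Cecil, Cecilia, and Charlie
MATCH (n:Person)
WHERE toLower(n.name) STARTS WITH 'c'
RETURN n.name AS name;
```

Wrapping the property in a function such as toLower() can prevent the planner from using an index on n.name for the prefix search.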

ENDS WITH operator
MATCH (n:Person)
WHERE n.role ENDS WITH 'developer'
RETURN n.name AS name, n.role AS role
Result
name      | role
"Cecil"   | "Software developer"
"Cecilia" | "Software developer"

Rows: 2

CONTAINS operator
MATCH (n:Person)
WHERE n.role CONTAINS 'eng'
RETURN n.name AS name, n.role AS role
Result
name      | role
"Charlie" | "Security engineer"

Rows: 1
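Missing properties and non-STRING values follow the NULL rule stated at the start of this section. WHERE discards any row for which the predicate is NULL, and that still holds under NOT. A sketch against the example graph:

```cypher
// Cecilia and Charlie have no email property, so n.email CONTAINS 'company'
// is NULL for them; NOT NULL is still NULL, so WHERE drops those rows.
// Only Cecil ("cecil@private.se") is returned.
MATCH (n:Person)
WHERE NOT n.email CONTAINS 'company'
RETURN n.name AS name, n.email AS email;

// age is stored as an INTEGER, so STARTS WITH returns NULL for every row
MATCH (n:Person)
RETURN n.name AS name, n.age STARTS WITH '3' AS ageStartsWith3;
```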

Regular expressions

Cypher supports filtering using regular expressions. The regular expression syntax is inherited from Java regular expressions. This includes support for flags that change how STRING values are matched, such as the case-insensitive (?i), multiline (?m), and dotall (?s) flags. Flags are given at the beginning of the regular expression. The =~ operator returns TRUE only if the regular expression matches the entire STRING, which is why the patterns below use .* to match any leading or trailing characters.
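The following literal-only queries need no graph. They illustrate two points: the pattern must cover the whole STRING, and the dotall flag changes what . matches. The input strings are made up for illustration.

```cypher
// The pattern must cover the whole STRING: 'company' alone does not match,
// but surrounding it with .* does
RETURN 'alice@company.com' =~ 'company' AS partialPattern,
       'alice@company.com' =~ '.*company.*' AS wholeString;
// partialPattern: FALSE, wholeString: TRUE

// Without (?s), . does not match a newline; with the dotall flag it does
RETURN 'line1\nline2' =~ 'line1.line2' AS withoutFlag,
       'line1\nline2' =~ '(?s)line1.line2' AS withDotall;
// withoutFlag: FALSE, withDotall: TRUE
```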

Example 2. Regular expressions
Regular expression (=~)
MATCH (n:Person)
WHERE n.email =~ '.*@company.com'
RETURN n.name AS name, n.email AS email
Result
name     | email
"Alice"  | "alice@company.com"
"Daniel" | "daniel@company.com"
"Eskil"  | "eskil@company.com"

Rows: 3

Prefixing a regular expression with the flag (?i) makes the whole expression case-insensitive:

Case-insensitive regular expression (?i)
MATCH (n:Person)
WHERE n.name =~ '(?i)CEC.*'
RETURN n.name AS name

Both Cecil and Cecilia are returned because their names start with 'CEC', regardless of casing:

Result
name

"Cecil"

"Cecilia"

Rows: 2

Escaping in regular expressions

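Characters such as . or * have special meaning in a regular expression. To use them as ordinary characters, escape them with a backslash. The backslash is itself an escape character in Cypher string literals, so it must be doubled. For example, the string literal '.*\\.se' produces the regular expression .*\.se, in which \. matches a literal period.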

Escaped characters in a regular expression
MATCH (n:Person)
WHERE n.email =~ '.*\\.se'
RETURN n.name AS name, n.email AS email

Cecil is returned because his email ends with '.se':

Result
name    | email
"Cecil" | "cecil@private.se"

Rows: 1
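To see why the escape matters, compare an escaped and an unescaped dot against a made-up address with no period before "se":

```cypher
// The escaped pattern requires a literal period before "se";
// the unescaped . matches any character, so it also accepts "Xse"
RETURN 'cecil@privateXse' =~ '.*\\.se' AS escapedDot,
       'cecil@privateXse' =~ '.*.se' AS unescapedDot;
// escapedDot: FALSE, unescapedDot: TRUE
```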

Note that Java regular expression constructs are applied only after the escape sequences in the given string literal have been resolved. It is therefore sometimes necessary to add extra backslashes to express a regular expression construct. The following table shows each escape sequence as written in a string literal, the regular expression sequence it resolves to, and what that sequence matches:

String literal sequence | Resulting regex sequence  | Regex match
\t                      | Tab                       | Tab
\\t                     | \t                        | Tab
\b                      | Backspace                 | Backspace
\\b                     | \b                        | Word boundary
\n                      | Newline                   | Newline
\\n                     | \n                        | Newline
\r                      | Carriage return           | Carriage return
\\r                     | \r                        | Carriage return
\f                      | Form feed                 | Form feed
\\f                     | \f                        | Form feed
\'                      | Single quote              | Single quote
\"                      | Double quote              | Double quote
\\                      | Backslash                 | Backslash
\\\\                    | \\                        | Backslash
\uxxxx                  | Unicode UTF-16 code point | Unicode UTF-16 code point
\\uxxxx                 | \uxxxx                    | Unicode UTF-16 code point

In the \uxxxx rows, 4 hex digits must follow the \u.
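Two rows of the table can be checked directly with literal-only queries. The first shows the word boundary \\b; the second matches a literal backslash using \\\\. The input strings are made up for illustration.

```cypher
// '.*\\bdev.*' becomes the regex .*\bdev.*, where \b is a word boundary
RETURN 'software developer' =~ '.*\\bdev.*' AS afterSpace,
       'softwaredeveloper' =~ '.*\\bdev.*' AS insideWord;
// afterSpace: TRUE, insideWord: FALSE

// The literal 'C:\\temp' holds C:\temp (one backslash); the pattern
// literal 'C:\\\\temp' becomes the regex C:\\temp, which matches one backslash
RETURN 'C:\\temp' =~ 'C:\\\\temp' AS literalBackslash;
// literalBackslash: TRUE
```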

Using regular expressions with unsanitized user input makes you vulnerable to Cypher injection. Consider using parameters instead.
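Here is a minimal sketch of the parameterized form. The pattern is supplied by the client (for example, a driver or Cypher Shell) as a parameter, here given the hypothetical name $emailPattern, so user input cannot change the structure of the query. The input still controls the regular expression itself, so validating it may still be worthwhile.

```cypher
// $emailPattern is a hypothetical parameter, e.g. '.*@company\\.com',
// passed alongside the query instead of concatenated into its text
MATCH (n:Person)
WHERE n.email =~ $emailPattern
RETURN n.name AS name, n.email AS email
```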

String normalization operators

The IS NORMALIZED operator is used to check whether the given STRING is in the NFC Unicode normalization form.

Unicode normalization is a process that transforms different representations of the same string into a standardized form. For more information, see the documentation for Unicode normalization forms.

IS NORMALIZED operator
RETURN 'the \u212B char' IS NORMALIZED AS normalized
Result
normalized

FALSE

Rows: 1

Because the given STRING contains a non-normalized Unicode character (\u212B, the ANGSTROM SIGN, whose NFC form is \u00C5), FALSE is returned.

To normalize a STRING, use the normalize() function.
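For example, normalize() converts to NFC by default, and it also accepts a normalization form as an optional second argument. Running the previous STRING through it yields a normalized result:

```cypher
// NFC maps \u212B (ANGSTROM SIGN) to \u00C5, so the result passes IS NORMALIZED
RETURN normalize('the \u212B char') IS NORMALIZED AS normalized,
       normalize('the \u212B char') = 'the \u00C5 char' AS sameAsPrecomposed,
       normalize('the \u00E4 char', NFD) IS NFD NORMALIZED AS nfdNormalized
// normalized: TRUE, sameAsPrecomposed: TRUE, nfdNormalized: TRUE
```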

Note that the IS NORMALIZED operator returns NULL when used on a non-STRING value. For example, RETURN 1 IS NORMALIZED returns NULL.

The IS NOT NORMALIZED operator is used to check whether the given STRING is not in the NFC Unicode normalization form:

IS NOT NORMALIZED
RETURN 'the \u212B char' IS NOT NORMALIZED AS notNormalized
Result
notNormalized

TRUE

Rows: 1

Because the given STRING contains a non-normalized Unicode character (\u212B), it is not in NFC form, so TRUE is returned.

Note that the IS NOT NORMALIZED operator returns NULL when used on a non-STRING value. For example, RETURN 1 IS NOT NORMALIZED returns NULL.

Using IS NORMALIZED with a specified normalization type

It is possible to define which Unicode normalization type is used (the default is NFC).

The available normalization types are:

  • NFC

  • NFD

  • NFKC

  • NFKD

Query
WITH 'the \u00E4 char' AS myString
RETURN myString IS NFC NORMALIZED AS nfcNormalized,
       myString IS NFD NORMALIZED AS nfdNormalized

The given STRING contains the Unicode character \u00E4, which is normalized in NFC form but not in NFD form (NFD decomposes it into a followed by a combining diaeresis, \u0308).

Result
nfcNormalized | nfdNormalized
TRUE          | FALSE

Rows: 1
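NFC and NFD apply only canonical decompositions, whereas NFKC and NFKD also apply compatibility decompositions. As a sketch, the ligature character \uFB01 (ﬁ) has only a compatibility decomposition to 'fi'. It is therefore NFC-normalized but not NFKC-normalized:

```cypher
// \uFB01 is left unchanged by NFC but rewritten to 'fi' by NFKC
WITH 'the \uFB01 char' AS myString
RETURN myString IS NFC NORMALIZED AS nfcNormalized,
       myString IS NFKC NORMALIZED AS nfkcNormalized
// nfcNormalized: TRUE, nfkcNormalized: FALSE
```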

It is also possible to specify the normalization form when using the negated normalization operator. For example, RETURN "string" IS NOT NFD NORMALIZED.
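Applied to the STRING from the previous query, the negated operator with an explicit form inverts the earlier results:

```cypher
// \u00E4 is precomposed, so the STRING is in NFC form but not in NFD form
WITH 'the \u00E4 char' AS myString
RETURN myString IS NOT NFD NORMALIZED AS notNfdNormalized,
       myString IS NOT NFC NORMALIZED AS notNfcNormalized
// notNfdNormalized: TRUE, notNfcNormalized: FALSE
```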