String operators

String operators are used to perform operations on STRING values. Cypher® contains the following string operators:

Prefix: STARTS WITH (case sensitive)
Suffix: ENDS WITH (case sensitive)
Substring: CONTAINS (case sensitive)
Regular expression: =~
Normalization: IS NORMALIZED (introduced in 5.17)
Negated normalization: IS NOT NORMALIZED (introduced in 5.17)

STARTS WITH, ENDS WITH, and CONTAINS perform case-sensitive matching. Using a string operator on a value that is not a STRING returns null.

Example graph

The examples below use a graph of six Person nodes. To recreate the graph, run the following query in an empty Neo4j database:

CREATE (alice:Person {name: 'Alice', age: 65, role: 'Project manager', email: 'alice@company.com'}),
       (cecil:Person {name: 'Cecil', age: 25, role: 'Software developer', email: 'cecil@private.se'}),
       (cecilia:Person {name: 'Cecilia', age: 31, role: 'Software developer'}),
       (charlie:Person {name: 'Charlie', age: 61, role: 'Security engineer'}),
       (daniel:Person {name: 'Daniel', age: 39, role: 'Director', email: 'daniel@company.com'}),
       (eskil:Person {name: 'Eskil', age: 39, role: 'CEO', email: 'eskil@company.com'})

Examples

Example 1. Prefix, suffix, and substring operators

STARTS WITH operator

MATCH (n:Person)
WHERE n.name STARTS WITH 'C'
RETURN n.name AS name

Result
name
`"Cecil"`
`"Cecilia"`
`"Charlie"`
Rows: 3
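
Because STARTS WITH is case sensitive, a lowercase prefix such as 'ce' matches none of the names in the example graph. One common workaround, sketched here as an illustration rather than a recommended pattern, is to lowercase the property with the built-in toLower() function before comparing:

```cypher
// Compare a lowercased copy of the name against a lowercase prefix.
MATCH (n:Person)
WHERE toLower(n.name) STARTS WITH 'ce'
RETURN n.name AS name
```

With the example graph, this should return Cecil and Cecilia.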

ENDS WITH operator

MATCH (n:Person)
WHERE n.role ENDS WITH 'developer'
RETURN n.name AS name, n.role AS role

Result
name	role
`"Cecil"`	`"Software developer"`
`"Cecilia"`	`"Software developer"`
Rows: 2

CONTAINS operator

MATCH (n:Person)
WHERE n.role CONTAINS 'eng'
RETURN n.name AS name, n.role AS role

Result
name	role
`"Charlie"`	`"Security engineer"`
Rows: 1
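
As stated at the top of this page, a string operator returns null when applied to a non-STRING value, so a predicate such as n.age STARTS WITH '3' filters out every row. A minimal sketch of one way around this, assuming a textual comparison of the number is actually what is wanted, is to convert the value with toString() first:

```cypher
// n.age is an INTEGER, so it is converted to a STRING before matching.
MATCH (n:Person)
WHERE toString(n.age) STARTS WITH '3'
RETURN n.name AS name, n.age AS age
```

With the example graph, this should return Cecilia (31), Daniel (39), and Eskil (39).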

Regular expressions

Cypher supports filtering using regular expressions. The regular expression syntax is inherited from Java regular expressions. This includes support for flags that change how STRING values are matched, such as the case-insensitive (?i), multiline (?m), and dotall (?s) flags. Flags are given at the beginning of the regular expression.

Example 2. Regular expressions

Regular expression (=~)

MATCH (n:Person)
WHERE n.email =~ '.*@company.com'
RETURN n.name AS name, n.email AS email

Result
name	email
`"Alice"`	`"alice@company.com"`
`"Daniel"`	`"daniel@company.com"`
`"Eskil"`	`"eskil@company.com"`
Rows: 3
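
The pattern in the query above starts with .* because the =~ operator tests the regular expression against the whole STRING, not against a part of it. The following sketch, which uses literal values so that it does not depend on the example graph, illustrates the difference:

```cypher
// The first pattern covers only part of the STRING, so it does not match.
RETURN 'alice@company.com' =~ '@company.com' AS partialPattern,
       'alice@company.com' =~ '.*@company.com' AS fullPattern
```

Here partialPattern should be false and fullPattern should be true.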

Prepending the flag (?i) to a regular expression makes the whole expression case-insensitive:

Case-insensitive regular expression (?i)

MATCH (n:Person)
WHERE n.name =~ '(?i)CEC.*'
RETURN n.name AS name

Both Cecil and Cecilia are returned because their names start with 'CEC' when case is ignored:

Result
name
`"Cecil"`
`"Cecilia"`
Rows: 2

Escaping in regular expressions

Characters such as . or * have special meaning in a regular expression. To use these as ordinary characters without special meaning, escape them.

Escaped characters in a regular expression

MATCH (n:Person)
WHERE n.email =~ '.*\\.se'
RETURN n.name AS name, n.email AS email

Cecil is returned because his email ends with '.se':

Result
name	email
`"Cecil"`	`"cecil@private.se"`
Rows: 1

Note that Cypher first resolves the escape sequences in the given STRING literal, and only then interprets the result as a Java regular expression. It is therefore sometimes necessary to add extra backslashes to express a regular expression construct. The following table shows how the two steps combine: the sequence written in the STRING literal, the sequence that the regular expression engine receives, and what that sequence matches:

String literal sequence	Resulting Regex sequence	Regex match
`\t`	Tab	Tab
`\\t`	`\t`	Tab
`\b`	Backspace	Backspace
`\\b`	`\b`	Word boundary
`\n`	Newline	Newline
`\\n`	`\n`	Newline
`\r`	Carriage return	Carriage return
`\\r`	`\r`	Carriage return
`\f`	Form feed	Form feed
`\\f`	`\f`	Form feed
`\'`	Single quote	Single quote
`\"`	Double quote	Double quote
`\\`	Backslash	Backslash
`\\\\`	`\\`	Backslash
`\uxxxx`	Unicode UTF-16 code point (4 hex digits must follow the `\u`)	Unicode UTF-16 code point (4 hex digits must follow the `\u`)
`\\uxxxx`	`\uxxxx`	Unicode UTF-16 code point (4 hex digits must follow the `\u`)
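
The table can be checked with a few literal values. The sketch below is illustrative only: the column names are made up for this example, and each pattern is written with a doubled backslash so that a single backslash reaches the regular expression engine:

```cypher
// '\\b' reaches the regex engine as \b (word boundary),
// '\\.' reaches it as \. (a literal dot).
RETURN 'Security engineer' =~ '.*\\beng.*' AS wordBoundary,
       'a.b' =~ 'a\\.b' AS literalDot,
       'axb' =~ 'a\\.b' AS anyCharacter
```

The first two columns should be true. The last should be false, because the escaped dot no longer matches an arbitrary character.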

Using regular expressions with unsanitized user input makes you vulnerable to Cypher injection. Consider using parameters instead.
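
A minimal sketch of the parameterized form follows. The parameter name $emailPattern is hypothetical; its value is supplied by the client (for example, through a driver) rather than concatenated into the query text:

```cypher
// $emailPattern is an assumed parameter name, not part of the example graph.
// A client could pass the value .*@company\.com for it.
MATCH (n:Person)
WHERE n.email =~ $emailPattern
RETURN n.name AS name, n.email AS email
```

Because a parameter value is not parsed as a Cypher STRING literal, the literal escape sequences described above should not apply to it. The regular expression is written with single backslashes, subject to the quoting rules of the client language.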

String normalization operators

The IS NORMALIZED operator is used to check whether the given STRING is in the NFC Unicode normalization form.

Unicode normalization is a process that transforms different representations of the same string into a standardized form. For more information, see the documentation for Unicode normalization forms.

IS NORMALIZED operator

RETURN 'the \u212B char' IS NORMALIZED AS normalized

Result
normalized
`false`
Rows: 1

Because the given STRING contains a non-normalized Unicode character (\u212B), false is returned.

To normalize a STRING, use the normalize() function.
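
As a short illustration, the following sketch checks the same STRING before and after calling normalize(). The column names are chosen for this example only:

```cypher
// normalize() converts the STRING to NFC by default.
WITH 'the \u212B char' AS original
RETURN original IS NORMALIZED AS before,
       normalize(original) IS NORMALIZED AS after
```

Here before should be false and after should be true.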

Note that the IS NORMALIZED operator returns null when used on a non-STRING value. For example, RETURN 1 IS NORMALIZED returns null.

The IS NOT NORMALIZED operator is used to check whether the given STRING is not in the NFC Unicode normalization form:

IS NOT NORMALIZED

RETURN 'the \u212B char' IS NOT NORMALIZED AS notNormalized

Result
notNormalized
`true`
Rows: 1

Because the given STRING contains a non-normalized Unicode character (\u212B), it is not in NFC form and true is returned.

Note that the IS NOT NORMALIZED operator returns null when used on a non-STRING value. For example, RETURN 1 IS NOT NORMALIZED returns null.

Using IS NORMALIZED with a specified normalization type

It is possible to define which Unicode normalization type is used (the default is NFC).

The available normalization types are:

NFC
NFD
NFKC
NFKD

Query

WITH 'the \u00E4 char' AS myString
RETURN myString IS NFC NORMALIZED AS nfcNormalized,
    myString IS NFD NORMALIZED AS nfdNormalized

The given STRING contains the Unicode character \u00E4, which is normalized in NFC form but not in NFD form.

Result
nfcNormalized	nfdNormalized
`true`	`false`
Rows: 1

It is also possible to specify the normalization form when using the negated normalization operator. For example, RETURN "string" IS NOT NFD NORMALIZED.
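
The negated form can be sketched with the same STRING as in the query above. The second column additionally assumes that normalize() accepts the normalization type as its second argument; the column names are illustrative:

```cypher
// \u00E4 is in NFC form but not in NFD form.
WITH 'the \u00E4 char' AS myString
RETURN myString IS NOT NFD NORMALIZED AS notNfd,
       normalize(myString, NFD) IS NFD NORMALIZED AS nfdAfterNormalize
```

Both columns should be true: the original STRING is not in NFD form, and the STRING returned by normalize() is.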