PackStream
PackStream is a binary presentation format for the exchange of richly-typed data. It provides a syntax layer for the Bolt messaging protocol.
Version 1
PackStream is a general purpose data serialisation format, originally inspired by (but incompatible with) MessagePack.
The format provides a type system fully compatible with the types supported by Cypher; see the Cypher Manual → Values and types for more information.
PackStream offers a number of core data types, many supported by multiple binary representations, as well as a flexible extension mechanism.
The core data types are described in the table below.
Data type | Description |
---|---|
Null | missing or empty value |
Boolean | true or false |
Integer | signed 64-bit integer |
Float | 64-bit floating point number |
Bytes | byte array |
String | unicode text, UTF-8 |
List | ordered collection of values |
Dictionary | collection of key-value entries (no order guaranteed) |
Structure | composite value with a type signature |

Neither unsigned integers nor 32-bit floating point numbers are included. This is a deliberate design decision to allow broader compatibility across client languages.
The PackStream specified structures are listed in the table below.
Structure name | Description |
---|---|
Node | snapshot of a node within a graph database |
Relationship | snapshot of a relationship within a graph database |
UnboundRelationship | relationship detail without start or end node information |
Path | alternating sequence of nodes and relationships |
Date | a date without a time-zone in the ISO-8601 calendar system, e.g. “2007-12-03” |
Time | a time with an offset from UTC/Greenwich in the ISO-8601 calendar system, e.g. “10:15:30+01:00” |
LocalTime | a time without a time-zone in the ISO-8601 calendar system, e.g. “10:15:30” |
DateTime | a date-time with a time-zone in the ISO-8601 calendar system, e.g. “2007-12-03T10:15:30+01:00 Europe/Paris”; the time-zone is specified as an offset in seconds from UTC |
DateTimeZoneId | a date-time with a time-zone in the ISO-8601 calendar system, e.g. “2007-12-03T10:15:30+01:00 Europe/Paris”; the time-zone is specified with a zone ID |
LocalDateTime | a date-time without a time-zone in the ISO-8601 calendar system, e.g. “2007-12-03T10:15:30” |
Duration | a temporal amount |
Point2D | represents a single location in two-dimensional space |
Point3D | represents a single location in three-dimensional space |
General representation
Every serialised PackStream value begins with a marker byte.
The marker contains both data type information and direct or indirect size information for types that require it. How that size information is encoded varies by marker type.
Some values, such as Boolean true, can be encoded within a single marker byte. Many small integers (specifically between -16 and +127 inclusive) are also encoded within a single byte.
A number of marker bytes are reserved for future expansion of the format itself. These bytes should not be used, and encountering them in an incoming stream should be treated as an error.
Sized values
Some representations are of a variable length and have their size explicitly encoded in the representation. Such values generally begin with a single marker byte, followed by a size, followed by the data content itself. In this context, the marker denotes both type and scale and therefore determines the number of bytes used to represent the size of the data. The size itself is encoded as either an 8-bit, 16-bit or 32-bit unsigned integer. Sizes longer than this are not supported.
The diagram below illustrates the general layout for a sized value, here with a 16-bit size:
![packstream sized value](../_images/packstream_sized_value.png)
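The layout described above can be sketched in a few lines of Python. This is only an illustration: the helper name `pack_sized_header` and its `markers` parameter are hypothetical, and the upper limit follows the maximum of 2 147 483 647 that the per-type tables in this document list for 32-bit sizes.

```python
import struct


def pack_sized_header(markers, size):
    """Return the marker byte followed by the big-endian size.

    `markers` is a hypothetical (m8, m16, m32) tuple holding the marker
    bytes of one type for its 8-bit, 16-bit and 32-bit size variants.
    """
    m8, m16, m32 = markers
    if size < 0:
        raise ValueError("size must be non-negative")
    if size <= 0xFF:
        return struct.pack(">BB", m8, size)   # marker + 8-bit size
    if size <= 0xFFFF:
        return struct.pack(">BH", m16, size)  # marker + 16-bit size
    if size <= 0x7FFFFFFF:
        return struct.pack(">BI", m32, size)  # marker + 32-bit size
    raise ValueError("size is too large for a PackStream sized value")


# The byte array markers CC, CD and CE serve as example input.
BYTES_MARKERS = (0xCC, 0xCD, 0xCE)
print(pack_sized_header(BYTES_MARKERS, 300).hex())
```

The data content would follow directly after the header returned here.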
Data types
Boolean
Marker, false: C2
Marker, true: C3
Boolean values are encoded within a single marker byte, using `C3` to denote true and `C2` to denote false.
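As a minimal sketch, the two Boolean markers map to Python values as follows. The function names are illustrative and not part of any driver API.

```python
def pack_boolean(value):
    """Encode a Python bool as a single PackStream marker byte."""
    return b"\xC3" if value else b"\xC2"


def unpack_boolean(marker):
    """Decode a Boolean marker byte given as an int in the range 0..255."""
    if marker == 0xC3:
        return True
    if marker == 0xC2:
        return False
    raise ValueError(f"not a Boolean marker: {marker:02X}")


print(pack_boolean(True).hex(), pack_boolean(False).hex())
```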
Integer
Markers, TINY_INT:
Marker | Decimal number |
---|---|
F0 | -16 |
F1 | -15 |
F2 | -14 |
F3 | -13 |
F4 | -12 |
F5 | -11 |
F6 | -10 |
F7 | -9 |
F8 | -8 |
F9 | -7 |
FA | -6 |
FB | -5 |
FC | -4 |
FD | -3 |
FE | -2 |
FF | -1 |
00 | 0 |
01 | 1 |
02 | 2 |
… | … |
7E | 126 |
7F | 127 |
Marker, INT_8: C8
Marker, INT_16: C9
Marker, INT_32: CA
Marker, INT_64: CB
Integer values occupy either 1, 2, 3, 5, or 9 bytes depending on magnitude. The available representations are:
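Representation | Size (bytes) | Description |
---|---|---|
TINY_INT | 1 | marker byte only |
INT_8 | 2 | marker byte C8 followed by a signed 8-bit integer |
INT_16 | 3 | marker byte C9 followed by a signed 16-bit big-endian integer |
INT_32 | 5 | marker byte CA followed by a signed 32-bit big-endian integer |
INT_64 | 9 | marker byte CB followed by a signed 64-bit big-endian integer |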
The available encodings are illustrated below and each shows a valid representation for the decimal value 42:
Representation | Size (bytes) | Encoding of 42 |
---|---|---|
TINY_INT | 1 | 2A |
INT_8 | 2 | C8 2A |
INT_16 | 3 | C9 00 2A |
INT_32 | 5 | CA 00 00 00 2A |
INT_64 | 9 | CB 00 00 00 00 00 00 00 2A |
Some marker bytes can be used to carry the value of a small integer as well as its type. These markers can be identified by a zero high-order bit (for non-negative values) or by a high-order nibble containing only ones (for negative values). Specifically, values between 00 and 7F inclusive can be directly translated to and from non-negative integers with the same value. Similarly, values between F0 and FF inclusive can do the same for negative numbers between -16 and -1.
While it is possible to encode small numbers in wider formats, it is generally recommended to use the most compact representation possible.
The following table shows the optimal representation for every possible integer in the signed 64-bit range:
Range minimum | Range maximum | Optimal representation |
---|---|---|
-9 223 372 036 854 775 808 | -2 147 483 649 | INT_64 |
-2 147 483 648 | -32 769 | INT_32 |
-32 768 | -129 | INT_16 |
-128 | -17 | INT_8 |
-16 | +127 | TINY_INT |
+128 | +32 767 | INT_16 |
+32 768 | +2 147 483 647 | INT_32 |
+2 147 483 648 | +9 223 372 036 854 775 807 | INT_64 |
The value -9223372036854775808 (the minimum) can be represented as:
CB 80 00 00 00 00 00 00 00
The value 9223372036854775807 (the maximum) can be represented as:
CB 7F FF FF FF FF FF FF FF
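The range rules above can be sketched as a small Python function that picks the most compact representation for a given value. The name `pack_integer` is illustrative. The sketch assumes big-endian two's-complement payloads, which is consistent with the minimum and maximum examples shown above.

```python
import struct


def pack_integer(value):
    """Encode a signed 64-bit integer in its most compact PackStream form."""
    if -0x10 <= value <= 0x7F:
        return struct.pack(">b", value)            # TINY_INT: value is the marker
    if -0x80 <= value <= 0x7F:
        return b"\xC8" + struct.pack(">b", value)  # INT_8
    if -0x8000 <= value <= 0x7FFF:
        return b"\xC9" + struct.pack(">h", value)  # INT_16
    if -0x80000000 <= value <= 0x7FFFFFFF:
        return b"\xCA" + struct.pack(">i", value)  # INT_32
    if -0x8000000000000000 <= value <= 0x7FFFFFFFFFFFFFFF:
        return b"\xCB" + struct.pack(">q", value)  # INT_64
    raise OverflowError("value does not fit in a signed 64-bit integer")


for number in (42, -16, -17, 128, 32768, -9223372036854775808):
    print(number, pack_integer(number).hex())
```

A decoder should still accept the wider forms for small numbers, since they are valid even though they are not optimal.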
Float
Marker: C1
Floats are double-precision floating-point values, generally used for representing fractions and decimals. They are encoded as a single C1 marker byte followed by 8 bytes which are formatted according to the IEEE 754 floating-point “double format” bit layout in big-endian order.
- Bit 63 (the bit that is selected by the mask 0x8000000000000000) represents the sign of the number.
- Bits 62-52 (the bits that are selected by the mask 0x7ff0000000000000) represent the exponent.
- Bits 51-0 (the bits that are selected by the mask 0x000fffffffffffff) represent the significand (sometimes called the mantissa) of the number.
The value 1.23 in decimal can be represented as:
C1 3F F3 AE 14 7A E1 47 AE
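A minimal Python sketch of this encoding follows, using the standard `struct` module for the big-endian IEEE 754 layout. The helper names `pack_float` and `float_fields` are illustrative; `float_fields` applies the three masks listed above to show how the bit fields are separated.

```python
import struct


def pack_float(value):
    """Encode a float as the C1 marker plus 8 big-endian IEEE 754 bytes."""
    return b"\xC1" + struct.pack(">d", value)


def float_fields(value):
    """Split a double into its (sign, exponent, significand) bit fields."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = (bits & 0x8000000000000000) >> 63
    exponent = (bits & 0x7FF0000000000000) >> 52
    significand = bits & 0x000FFFFFFFFFFFFF
    return sign, exponent, significand


print(pack_float(1.23).hex())
print(float_fields(1.23))
```

The exponent returned here is the raw biased field, so 1.23 reports 1023 rather than 0.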
Bytes
Bytes are arrays of byte values. These are used to transmit raw binary data and the size represents the number of bytes contained. Unlike other values, there is no separate encoding for byte arrays containing fewer than 16 bytes.
Marker | Size | Maximum size |
---|---|---|
CC | 8-bit big-endian unsigned integer | 255 bytes |
CD | 16-bit big-endian unsigned integer | 65 535 bytes |
CE | 32-bit big-endian unsigned integer | 2 147 483 647 bytes |
One of the markers CC, CD, or CE should be used, depending on scale. This marker is followed by the size and the bytes themselves.
Empty byte array b[]
CC 00
Byte array containing three values 1, 2 and 3; b[1, 2, 3]
CC 03 01 02 03
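The two examples above can be reproduced with a short Python sketch. The function name `pack_bytes` is illustrative, and the upper bound follows the maximum size given in the table for this type.

```python
import struct


def pack_bytes(data):
    """Encode a byte array as marker, big-endian size, then the raw bytes."""
    size = len(data)
    if size <= 0xFF:
        header = struct.pack(">BB", 0xCC, size)  # 8-bit size
    elif size <= 0xFFFF:
        header = struct.pack(">BH", 0xCD, size)  # 16-bit size
    elif size <= 0x7FFFFFFF:
        header = struct.pack(">BI", 0xCE, size)  # 32-bit size
    else:
        raise ValueError("byte array is too large for PackStream")
    return header + bytes(data)


print(pack_bytes(b"").hex())
print(pack_bytes(bytes([1, 2, 3])).hex())
```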
String
Markers
For shorter strings:
Marker | Size (bytes) |
---|---|
80 | 0 |
81 | 1 |
82 | 2 |
83 | 3 |
84 | 4 |
85 | 5 |
86 | 6 |
87 | 7 |
88 | 8 |
89 | 9 |
8A | 10 |
8B | 11 |
8C | 12 |
8D | 13 |
8E | 14 |
8F | 15 |

For longer strings:
Marker | Size | Maximum number of bytes |
---|---|---|
D0 | 8-bit big-endian unsigned integer | 255 bytes |
D1 | 16-bit big-endian unsigned integer | 65 535 bytes |
D2 | 32-bit big-endian unsigned integer | 2 147 483 647 bytes |
Text data is represented as UTF-8 encoded bytes.
The sizes used in string representations are the byte counts of the UTF-8 encoded data, not the character count of the original text.
For encoded text containing fewer than 16 bytes, including empty strings, the marker byte should contain the high-order nibble `8` (binary 1000) followed by a low-order nibble containing the size. The encoded data then immediately follows the marker.
For encoded text containing 16 bytes or more, the marker D0, D1 or D2 should be used, depending on scale. This marker is followed by the size and the UTF-8 encoded data.
Value | Encoding |
---|---|
|
|
|
|
|
|
|
|
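The string rules described in this section can be sketched in Python as shown below. The function name `pack_string` is illustrative. Note that the size is taken from the UTF-8 encoded bytes, so a single non-ASCII character may count as more than one.

```python
import struct


def pack_string(text):
    """Encode text as a PackStream String (UTF-8, size counted in bytes)."""
    data = text.encode("utf-8")
    size = len(data)
    if size < 16:
        header = bytes([0x80 | size])            # size lives in the low nibble
    elif size <= 0xFF:
        header = struct.pack(">BB", 0xD0, size)  # 8-bit size
    elif size <= 0xFFFF:
        header = struct.pack(">BH", 0xD1, size)  # 16-bit size
    elif size <= 0x7FFFFFFF:
        header = struct.pack(">BI", 0xD2, size)  # 32-bit size
    else:
        raise ValueError("string is too large for PackStream")
    return header + data


print(pack_string("three").hex())
print(pack_string("é").hex())  # one character, two UTF-8 bytes
```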
List
Lists are heterogeneous sequences of values and therefore permit a mixture of types within the same list. The size of a list denotes the number of items within that list, rather than the total packed byte size.
Markers:
Marker | Size (items) | Maximum size |
---|---|---|
90 | the low-order nibble of marker | 0 items |
91 | the low-order nibble of marker | 1 item |
92 | the low-order nibble of marker | 2 items |
93 | the low-order nibble of marker | 3 items |
94 | the low-order nibble of marker | 4 items |
95 | the low-order nibble of marker | 5 items |
96 | the low-order nibble of marker | 6 items |
97 | the low-order nibble of marker | 7 items |
98 | the low-order nibble of marker | 8 items |
99 | the low-order nibble of marker | 9 items |
9A | the low-order nibble of marker | 10 items |
9B | the low-order nibble of marker | 11 items |
9C | the low-order nibble of marker | 12 items |
9D | the low-order nibble of marker | 13 items |
9E | the low-order nibble of marker | 14 items |
9F | the low-order nibble of marker | 15 items |
D4 | 8-bit big-endian unsigned integer | 255 items |
D5 | 16-bit big-endian unsigned integer | 65 535 items |
D6 | 32-bit big-endian unsigned integer | 2 147 483 647 items |
For lists containing fewer than 16 items, including empty lists, the marker byte should contain the high-order nibble `9` (binary 1001) followed by a low-order nibble containing the size. The items within the list are then serialised in order immediately after the marker.
For lists containing 16 items or more, the marker D4, D5 or D6 should be used, depending on scale. This marker is followed by the size and the list items, serialised in order.
[]
90
[Integer(1), Integer(2), Integer(3)]
93 01 02 03
[ Integer(1), Float(2.0), String("three") ]
93 01 C1 40 00 00 00 00 00 00 00 85 74 68 72 65 65
[ Integer(1), Integer(2), ... Integer(40) ]
D4 28 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27 28
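The list examples above can be reproduced with the following Python sketch. It covers only integers, floats, strings and nested lists, and the helper names `pack_header` and `pack` are illustrative. For brevity, integers outside the TINY_INT range are always written in the widest form, which is valid but not optimal.

```python
import struct


def pack_header(size, tiny_nibble, m8, m16, m32):
    """Build a marker (and size, if needed) for a sized or tiny-sized value."""
    if size < 16:
        return bytes([tiny_nibble | size])
    if size <= 0xFF:
        return struct.pack(">BB", m8, size)
    if size <= 0xFFFF:
        return struct.pack(">BH", m16, size)
    if size <= 0x7FFFFFFF:
        return struct.pack(">BI", m32, size)
    raise ValueError("size is too large for PackStream")


def pack(value):
    """Encode a small subset of values: int, float, str and list."""
    if isinstance(value, int) and not isinstance(value, bool):
        if -16 <= value <= 127:
            return struct.pack(">b", value)        # TINY_INT
        return b"\xCB" + struct.pack(">q", value)  # INT_64 (simplification)
    if isinstance(value, float):
        return b"\xC1" + struct.pack(">d", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return pack_header(len(data), 0x80, 0xD0, 0xD1, 0xD2) + data
    if isinstance(value, list):
        # The size is the item count, not the packed byte length.
        header = pack_header(len(value), 0x90, 0xD4, 0xD5, 0xD6)
        return header + b"".join(pack(item) for item in value)
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


print(pack([1, 2.0, "three"]).hex())
```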
Dictionary
A Dictionary is a list containing key-value entries:
- keys must be String values
- the same key may appear more than once
- values may be of mixed types

The size of a Dictionary denotes the number of key-value entries within that Dictionary, not the total packed byte size.
Markers:
Marker | Size (key-value entries) | Maximum size |
---|---|---|
A0 | contained within low-order nibble of marker | 0 |
A1 | contained within low-order nibble of marker | 1 |
A2 | contained within low-order nibble of marker | 2 |
A3 | contained within low-order nibble of marker | 3 |
A4 | contained within low-order nibble of marker | 4 |
A5 | contained within low-order nibble of marker | 5 |
A6 | contained within low-order nibble of marker | 6 |
A7 | contained within low-order nibble of marker | 7 |
A8 | contained within low-order nibble of marker | 8 |
A9 | contained within low-order nibble of marker | 9 |
AA | contained within low-order nibble of marker | 10 |
AB | contained within low-order nibble of marker | 11 |
AC | contained within low-order nibble of marker | 12 |
AD | contained within low-order nibble of marker | 13 |
AE | contained within low-order nibble of marker | 14 |
AF | contained within low-order nibble of marker | 15 |
D8 | 8-bit big-endian unsigned integer | 255 entries |
D9 | 16-bit big-endian unsigned integer | 65 535 entries |
DA | 32-bit big-endian unsigned integer | 2 147 483 647 entries |
For a dictionary containing fewer than 16 key-value entries, including an empty dictionary, the marker byte should contain the high-order nibble `A` (binary 1010) followed by a low-order nibble containing the size. The entries within the dictionary are then serialized in [key, value, key, value] order immediately after the marker.
Keys are always String values.
For a dictionary containing 16 key-value entries or more, the marker D8, D9 or DA should be used, depending on scale. This marker is followed by the size and the key-value entries.
{}
A0
{"one": "eins"}
A1 83 6F 6E 65 84 65 69 6E 73
{"A": 1, "B": 2 ... "Z": 26}
D8 1A 81 41 01 81 42 02 81 43 03 81 44 04 81 45 05 81 46 06 81 47 07 81 48 08 81 49 09 81 4A 0A 81 4B 0B 81 4C 0C 81 4D 0D 81 4E 0E 81 4F 0F 81 50 10 81 51 11 81 52 12 81 53 13 81 54 14 81 55 15 81 56 16 81 57 17 81 58 18 81 59 19 81 5A 1A
If there are multiple instances of the same key when unpacked, the last seen value for that key should be used.
[("key_1", 1), ("key_2", 2), ("key_1", 3)] -> {"key_1": 3, "key_2": 2}
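The following Python sketch illustrates both directions: packing key-value entries, and collapsing unpacked entries so that the last value seen for a key wins. All function names are illustrative. Values are limited to strings and TINY_INT integers to keep the example short.

```python
import struct


def pack_header(size, tiny_nibble, m8, m16, m32):
    """Build a marker (and size, if needed) for a sized or tiny-sized value."""
    if size < 16:
        return bytes([tiny_nibble | size])
    if size <= 0xFF:
        return struct.pack(">BB", m8, size)
    if size <= 0xFFFF:
        return struct.pack(">BH", m16, size)
    if size <= 0x7FFFFFFF:
        return struct.pack(">BI", m32, size)
    raise ValueError("size is too large for PackStream")


def pack_string(text):
    data = text.encode("utf-8")
    return pack_header(len(data), 0x80, 0xD0, 0xD1, 0xD2) + data


def pack_value(value):
    """Encode a value; this sketch covers only str and TINY_INT integers."""
    if isinstance(value, str):
        return pack_string(value)
    if isinstance(value, int) and not isinstance(value, bool) and -16 <= value <= 127:
        return struct.pack(">b", value)
    raise TypeError("value type not covered by this sketch")


def pack_dictionary(entries):
    """Encode an iterable of (key, value) pairs in key, value, key, value order."""
    entries = list(entries)
    parts = [pack_header(len(entries), 0xA0, 0xD8, 0xD9, 0xDA)]
    for key, value in entries:
        if not isinstance(key, str):
            raise TypeError("dictionary keys must be strings")
        parts.append(pack_string(key))
        parts.append(pack_value(value))
    return b"".join(parts)


def entries_to_dict(entries):
    """Collapse unpacked entries; a later duplicate key overwrites an earlier one."""
    result = {}
    for key, value in entries:
        result[key] = value
    return result


print(pack_dictionary({"one": "eins"}.items()).hex())
print(entries_to_dict([("key_1", 1), ("key_2", 2), ("key_1", 3)]))
```

`pack_dictionary` takes pairs rather than a `dict` so that duplicate keys, which the format permits, can still be expressed.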
Structure
A structure is a composite value, comprised of fields and a unique type code. Structure encodings consist, beyond the marker, of a single byte, the tag byte, followed by a sequence of up to 15 fields, each an individual value. The size of a structure is measured as the number of fields and not the total byte size. This count does not include the tag.
Markers:
Marker | Size (fields) | Maximum size |
---|---|---|
B0 | contained within low-order nibble of marker | 0 fields |
B1 | contained within low-order nibble of marker | 1 field |
B2 | contained within low-order nibble of marker | 2 fields |
B3 | contained within low-order nibble of marker | 3 fields |
B4 | contained within low-order nibble of marker | 4 fields |
B5 | contained within low-order nibble of marker | 5 fields |
B6 | contained within low-order nibble of marker | 6 fields |
B7 | contained within low-order nibble of marker | 7 fields |
B8 | contained within low-order nibble of marker | 8 fields |
B9 | contained within low-order nibble of marker | 9 fields |
BA | contained within low-order nibble of marker | 10 fields |
BB | contained within low-order nibble of marker | 11 fields |
BC | contained within low-order nibble of marker | 12 fields |
BD | contained within low-order nibble of marker | 13 fields |
BE | contained within low-order nibble of marker | 14 fields |
BF | contained within low-order nibble of marker | 15 fields |
For structures containing fewer than 16 fields, the marker byte should contain the high-order nibble `B` (binary 1011) followed by a low-order nibble containing the size. The marker is immediately followed by the tag byte and the field values, in that order. The tag byte is used to identify the type or class of the structure and may hold any value between 0 and +127.
The table below lists the PackStream specified structures and their code and tag byte.
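Structure name | Code | tag byte |
---|---|---|
Node | N | 4E |
Relationship | R | 52 |
UnboundRelationship | r | 72 |
Path | P | 50 |
Date | D | 44 |
Time | T | 54 |
LocalTime | t | 74 |
DateTime | F | 46 |
DateTimeZoneId | f | 66 |
LocalDateTime | d | 64 |
Duration | E | 45 |
Point2D | X | 58 |
Point3D | Y | 59 |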
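The marker and tag byte rules for structures can be sketched in Python as follows. The helper `pack_structure` is illustrative and takes fields that have already been packed, so that the example stays self-contained. Each tag byte is the ASCII value of the structure's code character.

```python
def pack_structure(tag, packed_fields):
    """Encode a structure from a tag byte and a list of already-packed fields."""
    if len(packed_fields) > 15:
        raise ValueError("a structure holds at most 15 fields")
    if not 0 <= tag <= 0x7F:
        raise ValueError("the tag byte must be between 0 and 127")
    # The field count goes in the low nibble; the tag itself is not counted.
    marker = 0xB0 | len(packed_fields)
    return bytes([marker, tag]) + b"".join(packed_fields)


# LocalTime (code 't') with a single field: 0 nanoseconds since midnight.
print(pack_structure(ord("t"), [b"\x00"]).hex())

# Node (code 'N') with id 3, an empty label list (90) and an empty
# property dictionary (A0), chosen here only to keep the bytes short.
print(pack_structure(ord("N"), [b"\x03", b"\x90", b"\xA0"]).hex())
```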
Node
A snapshot of a node within a graph database.
tag byte: 4E
Number of fields: 3
Node::Structure( id::Integer, labels::List<String>, properties::Dictionary, )
Node( id = 3, labels = ["Example", "Node"], properties = {"name": "example"}, )
B3 4E ...
Relationship
A snapshot of a relationship within a graph database.
tag byte: 52
Number of fields: 5
Relationship::Structure( id::Integer, startNodeId::Integer, endNodeId::Integer, type::String, properties::Dictionary, )
Relationship( id = 11, startNodeId = 2, endNodeId = 3, type = "KNOWS", properties = {"name": "example"}, )
B5 52 ...
UnboundRelationship
A relationship without start or end node ID. It is used internally for Path serialization.
tag byte: 72
Number of fields: 3
UnboundRelationship::Structure( id::Integer, type::String, properties::Dictionary, )
UnboundRelationship( id = 17, type = "KNOWS", properties = {"name": "example"}, )
B3 72 ...
Path
An alternating sequence of nodes and relationships.
tag byte: 50
Number of fields: 3
Path::Structure( nodes::List<Node>, rels::List<UnboundRelationship>, ids::List<Integer>, )
Where the rels field is a list of unbound relationships and the ids field is a list of relationship and node IDs that represents the path.
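Date
A date without a time-zone in the ISO-8601 calendar system, e.g. “2007-12-03”.
tag byte: 44
Number of fields: 1
Date::Structure( days::Integer, )
Where the days are days since the Unix epoch.
Time
A time with an offset from UTC/Greenwich in the ISO-8601 calendar system, e.g. “10:15:30+01:00”.
tag byte: 54
Number of fields: 2
Time::Structure( nanoseconds::Integer, tz_offset_seconds::Integer, )
Where the nanoseconds are nanoseconds since midnight (this time is not UTC) and the tz_offset_seconds are an offset in seconds from UTC.
To convert to UTC, use:
utc_nanoseconds = nanoseconds - (tz_offset_seconds * 1000000000)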
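As a worked example of this conversion, the sketch below applies the formula to 10:15:30 at an offset of +01:00. The function names are illustrative. Wrapping the result back into a single day is an assumption added here for display purposes and is not something the format prescribes.

```python
NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_DAY = 86_400 * NANOS_PER_SECOND


def time_to_utc_nanoseconds(nanoseconds, tz_offset_seconds):
    """Apply the conversion formula given above."""
    return nanoseconds - (tz_offset_seconds * NANOS_PER_SECOND)


def wrap_to_day(utc_nanoseconds):
    """Assumption of this sketch: fold the result into the range of one day."""
    return utc_nanoseconds % NANOS_PER_DAY


# 10:15:30 local time at +01:00 corresponds to 09:15:30 UTC.
local = (10 * 3600 + 15 * 60 + 30) * NANOS_PER_SECOND
utc = time_to_utc_nanoseconds(local, 3600)
print(utc // NANOS_PER_SECOND)
```

Without the wrap, a local time shortly after midnight at a positive offset yields a negative value, which corresponds to a time on the previous UTC day.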
LocalTime
An instant capturing the time of day, but neither the date nor the time zone.
tag byte: 74
Number of fields: 1
LocalTime::Structure( nanoseconds::Integer, )
Where the nanoseconds are nanoseconds since midnight.
DateTime
An instant capturing the date, the time, and the time zone. The time zone information is specified with a zone offset.
tag byte: 46
Number of fields: 3
DateTime::Structure( seconds::Integer, nanoseconds::Integer, tz_offset_seconds::Integer, )
Where the seconds are seconds since the adjusted Unix epoch (not UTC) and the tz_offset_seconds specifies the offset in seconds from UTC.
To convert to UTC, use:
utc_nanoseconds = (seconds * 1000000000) + nanoseconds - (tz_offset_seconds * 1000000000)
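The conversion can be checked against Python's `datetime` module, as in the sketch below. The function names are illustrative. The example assumes that "seconds since the adjusted Unix epoch" means seconds counted on the local wall clock from 1970-01-01T00:00:00, which is how the formula above treats the field.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_SECOND = 1_000_000_000
LOCAL_EPOCH = datetime(1970, 1, 1)  # naive: counts local wall-clock seconds


def datetime_to_utc_nanoseconds(seconds, nanoseconds, tz_offset_seconds):
    """Apply the conversion formula given above."""
    return (seconds * NANOS_PER_SECOND) + nanoseconds - (tz_offset_seconds * NANOS_PER_SECOND)


def utc_nanoseconds_to_datetime(utc_nanoseconds):
    """Convert to an aware datetime; Python keeps microsecond precision only."""
    utc_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return utc_epoch + timedelta(microseconds=utc_nanoseconds // 1000)


# 2007-12-03T10:15:30+01:00 expressed as DateTime fields.
local = datetime(2007, 12, 3, 10, 15, 30)
seconds = (local - LOCAL_EPOCH) // timedelta(seconds=1)
utc_ns = datetime_to_utc_nanoseconds(seconds, 0, 3600)
print(utc_nanoseconds_to_datetime(utc_ns).isoformat())
```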
DateTimeZoneId
An instant capturing the date, the time, and the time zone. The time zone information is specified with a zone identifier.
tag byte: 66
Number of fields: 3
DateTimeZoneId::Structure( seconds::Integer, nanoseconds::Integer, tz_id::String, )
Where the seconds are seconds since the adjusted Unix epoch (not UTC) and the tz_id is an identifier for a specific time zone, e.g. "Europe/Paris".
To convert to UTC, use:
utc_nanoseconds = (seconds * 1000000000) + nanoseconds - get_offset_in_nanoseconds(tz_id)
Duration
A temporal amount.
This captures the difference in time between two instants.
It only captures the amount of time between two instants; it does not capture a start time and end time.
A unit capturing the start time and end time would be a Time Interval and is out of scope for this proposal.
A duration can be negative.
tag byte: 45
Number of fields: 4
Duration::Structure( months::Integer, days::Integer, seconds::Integer, nanoseconds::Integer, )