JSON BinPack


Encodings: String



UTF8_STRING_NO_LENGTH

The encoding consists of the UTF-8 encoding of the input string, without a length prefix.

Options

Option  Type  Description
size    uint  The string UTF-8 byte-length

Conditions

Condition           Description
len(value) == size  The input string must have the declared UTF-8 byte-length

Examples

Given the input value “foo bar” with a corresponding size of 7, the encoding results in:

+------+------+------+------+------+------+------+
| 0x66 | 0x6f | 0x6f | 0x20 | 0x62 | 0x61 | 0x72 |
+------+------+------+------+------+------+------+
  f      o      o             b      a      r
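Because nothing but the UTF-8 bytes is written, a decoder has to learn size from somewhere else, such as the schema. A minimal Python sketch (the function names are hypothetical, not taken from any JSON BinPack implementation):

```python
def encode_utf8_string_no_length(value: str, size: int) -> bytes:
    """Return the raw UTF-8 bytes of value; no length prefix is written."""
    data = value.encode("utf-8")
    # The condition is on the UTF-8 byte-length, not the character count.
    if len(data) != size:
        raise ValueError(f"expected {size} UTF-8 bytes, got {len(data)}")
    return data


def decode_utf8_string_no_length(buffer: bytes, offset: int, size: int) -> str:
    """Read exactly size bytes at offset; size must come from the schema."""
    return buffer[offset:offset + size].decode("utf-8")


print(encode_utf8_string_no_length("foo bar", 7).hex(" "))  # 66 6f 6f 20 62 61 72
```

Passing a non-ASCII string such as "é" with a size of 1 raises, since that character takes 2 bytes in UTF-8.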

FLOOR_VARINT_PREFIX_UTF8_STRING_SHARED

The encoding consists of the UTF-8 byte-length of the string minus minimum plus 1 (len(value) - minimum + 1) as a Base-128 64-bit Little Endian variable-length unsigned integer, followed by the UTF-8 encoding of the input value.

Optionally, if the input string has already been encoded to the buffer using UTF-8, the encoding may instead consist of the byte constant 0x00, followed by len(value) - minimum + 1 as a Base-128 64-bit Little Endian variable-length unsigned integer, followed by the current offset minus the offset to the start of the earlier UTF-8 string value in the buffer, also encoded as a Base-128 64-bit Little Endian variable-length unsigned integer. The current offset is the position at which this last integer is written.

Options

Option   Type  Description
minimum  uint  The inclusive minimum string UTF-8 byte-length

Conditions

Condition              Description
len(value) >= minimum  The input string byte-length is equal to or greater than the minimum

Examples

Given the input string foo with a minimum of 3 where the string has not been previously encoded, the encoding results in:

+------+------+------+------+
| 0x01 | 0x66 | 0x6f | 0x6f |
+------+------+------+------+
         f      o      o
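A minimal sketch of the non-shared form, assuming the Base-128 Little Endian varint uses the common unsigned LEB128 layout (7 data bits per byte, least significant group first, high bit set on every byte except the last). The function names are illustrative, not part of any JSON BinPack API:

```python
def encode_varint(value: int) -> bytes:
    """Unsigned Base-128 Little Endian varint, assumed to follow LEB128."""
    if not 0 <= value < 2 ** 64:
        raise ValueError("value must fit in an unsigned 64-bit integer")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits first (little endian)
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_floor_string(value: str, minimum: int) -> bytes:
    """Non-shared FLOOR_VARINT_PREFIX_UTF8_STRING_SHARED encoding (sketch)."""
    data = value.encode("utf-8")
    if len(data) < minimum:
        raise ValueError("UTF-8 byte-length is below the minimum")
    # The +1 keeps the prefix at 1 or more, leaving 0x00 for the shared form.
    return encode_varint(len(data) - minimum + 1) + data


print(encode_floor_string("foo", 3).hex(" "))  # 01 66 6f 6f
```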

Given the encoding of foo with a minimum of 0 followed by the encoding of foo with a minimum of 3, the encoding may result in:

0      1      2      3      4      5      6
^      ^      ^      ^      ^      ^      ^
+------+------+------+------+------+------+------+
| 0x04 | 0x66 | 0x6f | 0x6f | 0x00 | 0x01 | 0x05 |
+------+------+------+------+------+------+------+
         f      o      o                    6 - 1
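The shared form only pays off when the encoder remembers where earlier UTF-8 copies live. The sketch below uses a hypothetical FloorStringEncoder class that always shares when it can (the format only says the encoding may be shared) and assumes an unsigned LEB128 varint. It reproduces the seven bytes of the second example:

```python
def encode_varint(value: int) -> bytes:
    """Unsigned Base-128 Little Endian varint (assumed LEB128 layout)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


class FloorStringEncoder:
    """Hypothetical encoder that reuses earlier UTF-8 copies of a string."""

    def __init__(self) -> None:
        self.buffer = bytearray()
        # Maps each string to the offset of its first UTF-8 bytes in the buffer.
        self.utf8_offsets: dict[str, int] = {}

    def encode(self, value: str, minimum: int) -> None:
        data = value.encode("utf-8")
        if len(data) < minimum:
            raise ValueError("UTF-8 byte-length is below the minimum")
        prefix = encode_varint(len(data) - minimum + 1)
        previous = self.utf8_offsets.get(value)
        if previous is None:
            self.buffer += prefix
            self.utf8_offsets[value] = len(self.buffer)  # UTF-8 bytes start here
            self.buffer += data
        else:
            self.buffer.append(0x00)  # marks a shared string
            self.buffer += prefix
            # The "current offset" is where the distance varint itself is written.
            self.buffer += encode_varint(len(self.buffer) - previous)


encoder = FloorStringEncoder()
encoder.encode("foo", 0)
encoder.encode("foo", 3)
print(encoder.buffer.hex(" "))  # 04 66 6f 6f 00 01 05
```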

ROOF_VARINT_PREFIX_UTF8_STRING_SHARED

The encoding consists of maximum minus the UTF-8 byte-length of the string plus 1 (maximum - len(value) + 1) as a Base-128 64-bit Little Endian variable-length unsigned integer, followed by the UTF-8 encoding of the input value.

Optionally, if the input string has already been encoded to the buffer using UTF-8, the encoding may instead consist of the byte constant 0x00, followed by maximum - len(value) + 1 as a Base-128 64-bit Little Endian variable-length unsigned integer, followed by the current offset minus the offset to the start of the earlier UTF-8 string value in the buffer, also encoded as a Base-128 64-bit Little Endian variable-length unsigned integer. The current offset is the position at which this last integer is written.

Options

Option   Type  Description
maximum  uint  The inclusive maximum string UTF-8 byte-length

Conditions

Condition              Description
len(value) <= maximum  The input string byte-length is equal to or less than the maximum

Examples

Given the input string foo with a maximum of 4 where the string has not been previously encoded, the encoding results in:

+------+------+------+------+
| 0x02 | 0x66 | 0x6f | 0x6f |
+------+------+------+------+
         f      o      o

Given the encoding of foo with a maximum of 3 followed by the encoding of foo with a maximum of 5, the encoding may result in:

0      1      2      3      4      5      6
^      ^      ^      ^      ^      ^      ^
+------+------+------+------+------+------+------+
| 0x01 | 0x66 | 0x6f | 0x6f | 0x00 | 0x03 | 0x05 |
+------+------+------+------+------+------+------+
         f      o      o                    6 - 1
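ROOF differs from FLOOR only in how the prefix is computed, maximum - len(value) + 1, so strings whose byte-length is close to the maximum get the smallest prefixes. A hedged sketch with hypothetical names, passing the output buffer and a cache of earlier UTF-8 offsets explicitly and assuming an unsigned LEB128 varint; it reproduces the shared example above:

```python
def encode_varint(value: int) -> bytes:
    """Unsigned Base-128 Little Endian varint (assumed LEB128 layout)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_roof_string(buffer: bytearray, seen: dict, value: str, maximum: int) -> None:
    """Append value using ROOF_VARINT_PREFIX_UTF8_STRING_SHARED (sketch).

    seen is a hypothetical cache: string -> offset of its UTF-8 bytes.
    """
    data = value.encode("utf-8")
    if len(data) > maximum:
        raise ValueError("UTF-8 byte-length is above the maximum")
    prefix = encode_varint(maximum - len(data) + 1)
    if value in seen:
        buffer.append(0x00)  # marks a shared string
        buffer += prefix
        # Distance from where this varint is written back to the UTF-8 bytes.
        buffer += encode_varint(len(buffer) - seen[value])
    else:
        buffer += prefix
        seen[value] = len(buffer)
        buffer += data


buffer, seen = bytearray(), {}
encode_roof_string(buffer, seen, "foo", 3)
encode_roof_string(buffer, seen, "foo", 5)
print(buffer.hex(" "))  # 01 66 6f 6f 00 03 05
```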

BOUNDED_8BIT_PREFIX_UTF8_STRING_SHARED

The encoding consists of the UTF-8 byte-length of the string minus minimum plus 1 (len(value) - minimum + 1) as an 8-bit fixed-length unsigned integer, followed by the UTF-8 encoding of the input value.

Optionally, if the input string has already been encoded to the buffer using UTF-8, the encoding may instead consist of the byte constant 0x00, followed by len(value) - minimum + 1 as an 8-bit fixed-length unsigned integer, followed by the current offset minus the offset to the start of the earlier UTF-8 string value in the buffer, encoded as a Base-128 64-bit Little Endian variable-length unsigned integer. The current offset is the position at which this last integer is written.

The prefix is written even when maximum equals minimum (in which case it is always 0x01), so that a non-shared fixed-length string can be told apart from a shared one, whose encoding starts with 0x00.

Options

Option   Type  Description
minimum  uint  The inclusive minimum string UTF-8 byte-length
maximum  uint  The inclusive maximum string UTF-8 byte-length

Conditions

Condition                       Description
len(value) >= minimum           The input string byte-length is equal to or greater than the minimum
len(value) <= maximum           The input string byte-length is equal to or less than the maximum
maximum - minimum < 2 ** 8 - 1  The largest prefix, maximum - minimum + 1, must fit in an 8-bit unsigned integer

Examples

Given the input string foo with a minimum of 3 and a maximum of 5 where the string has not been previously encoded, the encoding results in:

+------+------+------+------+
| 0x01 | 0x66 | 0x6f | 0x6f |
+------+------+------+------+
         f      o      o

Given the encoding of foo with a minimum of 0 and a maximum of 6 followed by the encoding of foo with a minimum of 3 and a maximum of 100, the encoding may result in:

0      1      2      3      4      5      6
^      ^      ^      ^      ^      ^      ^
+------+------+------+------+------+------+------+
| 0x04 | 0x66 | 0x6f | 0x6f | 0x00 | 0x01 | 0x05 |
+------+------+------+------+------+------+------+
         f      o      o                    6 - 1
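For the bounded variant the length prefix is a single byte, while the backward distance in the shared form is still a varint. A minimal sketch under the same kind of assumptions (hypothetical names, an unsigned LEB128 varint, and an encoder that always shares when possible); it reproduces the shared example above:

```python
def encode_varint(value: int) -> bytes:
    """Unsigned Base-128 Little Endian varint (assumed LEB128 layout)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_bounded_string(buffer: bytearray, seen: dict, value: str,
                          minimum: int, maximum: int) -> None:
    """Append value using BOUNDED_8BIT_PREFIX_UTF8_STRING_SHARED (sketch).

    seen is a hypothetical cache: string -> offset of its UTF-8 bytes.
    """
    if maximum - minimum >= 2 ** 8 - 1:
        raise ValueError("maximum - minimum + 1 does not fit in 8 bits")
    data = value.encode("utf-8")
    if not minimum <= len(data) <= maximum:
        raise ValueError("UTF-8 byte-length out of bounds")
    # Always 1..255; written even when minimum == maximum so 0x00 stays unambiguous.
    prefix = len(data) - minimum + 1
    if value in seen:
        buffer.append(0x00)  # marks a shared string
        buffer.append(prefix)
        buffer += encode_varint(len(buffer) - seen[value])
    else:
        buffer.append(prefix)
        seen[value] = len(buffer)
        buffer += data


buffer, seen = bytearray(), {}
encode_bounded_string(buffer, seen, "foo", 0, 6)
encode_bounded_string(buffer, seen, "foo", 3, 100)
print(buffer.hex(" "))  # 04 66 6f 6f 00 01 05
```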

RFC3339_DATE_INTEGER_TRIPLET

The encoding represents an RFC3339 date expression (YYYY-MM-DD) as a sequence of 3 integers: the year as a 16-bit fixed-length Little Endian unsigned integer, the month as an 8-bit fixed-length unsigned integer, and the day as an 8-bit fixed-length unsigned integer.

Options

None

Conditions

Condition           Description
len(value) == 10    The input string consists of 10 characters
value[0:4] >= 0     The year is greater than or equal to 0
value[0:4] <= 9999  The year is less than or equal to 9999, as stated by RFC3339
value[4] == '-'     The year and the month are separated by a hyphen
value[5:7] >= 1     The month is greater than or equal to 1
value[5:7] <= 12    The month is less than or equal to 12
value[7] == '-'     The month and the day are separated by a hyphen
value[8:10] >= 1    The day is greater than or equal to 1
value[8:10] <= 31   The day is less than or equal to 31

Examples

Given the input string 2014-10-01, the encoding results in:

+------+------+------+------+
| 0xde | 0x07 | 0x0a | 0x01 |
+------+------+------+------+
  year   ...    month  day
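Python's standard struct module can express this layout directly: the format "<HBB" packs a little-endian 16-bit unsigned year followed by two unsigned bytes. The helper names are hypothetical; the checks mirror the conditions table, which bounds the day at 31 without checking it against the month:

```python
import struct


def encode_rfc3339_date(value: str) -> bytes:
    """Pack YYYY-MM-DD as <uint16 LE year><uint8 month><uint8 day> (sketch)."""
    if len(value) != 10 or value[4] != "-" or value[7] != "-":
        raise ValueError("expected a YYYY-MM-DD string")
    digits = value[0:4] + value[5:7] + value[8:10]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("date fields must be ASCII digits")
    year, month, day = int(value[0:4]), int(value[5:7]), int(value[8:10])
    # Same bounds as the conditions table; month-specific day counts are not checked.
    if not (0 <= year <= 9999 and 1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("date component out of range")
    # "<" means little endian with no padding; H is uint16, B is uint8.
    return struct.pack("<HBB", year, month, day)


def decode_rfc3339_date(data: bytes) -> str:
    """Unpack the 4-byte triplet and zero-pad it back to YYYY-MM-DD."""
    year, month, day = struct.unpack("<HBB", data[:4])
    return f"{year:04d}-{month:02d}-{day:02d}"


print(encode_rfc3339_date("2014-10-01").hex(" "))  # de 07 0a 01
```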

PREFIX_VARINT_LENGTH_STRING_SHARED

The encoding consists of the UTF-8 byte-length of the string plus 1 (len(value) + 1) as a Base-128 64-bit Little Endian variable-length unsigned integer, followed by the UTF-8 encoding of the input value.

Optionally, if the input string has already been encoded to the buffer using this encoding, the encoding may instead consist of the byte constant 0x00, followed by the current offset minus the offset at which the earlier encoding of the string starts, as a Base-128 64-bit Little Endian variable-length unsigned integer. The current offset is the position at which this integer is written. The earlier encoding may itself be a pointer to another instance of the string.

Options

None

Conditions

None

Examples

Given the input string foo where the string has not been previously encoded, the encoding results in:

+------+------+------+------+
| 0x04 | 0x66 | 0x6f | 0x6f |
+------+------+------+------+
         f      o      o

Given the encoding of foo repeated 3 times, the encoding may result in:

0      1      2      3      4      5      6      7
^      ^      ^      ^      ^      ^      ^      ^
+------+------+------+------+------+------+------+------+
| 0x04 | 0x66 | 0x6f | 0x6f | 0x00 | 0x05 | 0x00 | 0x03 |
+------+------+------+------+------+------+------+------+
         f      o      o             5 - 0         7 - 4
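Unlike the other shared string encodings on this page, whose pointers target the UTF-8 bytes, this pointer targets the start of an earlier encoding (its length prefix, or its 0x00 marker if that copy is itself shared), which is why the example measures 5 - 0 and 7 - 4. The sketch below uses hypothetical names and assumes an unsigned LEB128 varint. Its encoder always points at the most recent copy, which reproduces the example, and its decoder follows pointer chains until it reaches real UTF-8 bytes:

```python
def encode_varint(value: int) -> bytes:
    """Unsigned Base-128 Little Endian varint (assumed LEB128 layout)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buffer: bytes, offset: int) -> tuple[int, int]:
    """Return (value, offset just past the varint)."""
    result, shift = 0, 0
    while True:
        byte = buffer[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


class PrefixVarintStringEncoder:
    """Hypothetical encoder that always points at the most recent copy."""

    def __init__(self) -> None:
        self.buffer = bytearray()
        self.last_start: dict[str, int] = {}  # string -> start of its latest encoding

    def encode(self, value: str) -> None:
        start = len(self.buffer)
        if value in self.last_start:
            self.buffer.append(0x00)
            # Distance from where this varint is written back to the earlier copy.
            self.buffer += encode_varint(len(self.buffer) - self.last_start[value])
        else:
            data = value.encode("utf-8")
            self.buffer += encode_varint(len(data) + 1)
            self.buffer += data
        self.last_start[value] = start


def decode_string(buffer: bytes, offset: int) -> tuple[str, int]:
    """Return (string, offset just past this encoding), following pointers."""
    if buffer[offset] == 0x00:
        cursor = offset + 1
        distance, end = decode_varint(buffer, cursor)
        target = cursor - distance
        # Valid pointers always go strictly backwards, so the chain terminates.
        if not 0 <= target < offset:
            raise ValueError("invalid string pointer")
        value, _ = decode_string(buffer, target)
        return value, end
    length_plus_one, cursor = decode_varint(buffer, offset)
    end = cursor + length_plus_one - 1
    return bytes(buffer[cursor:end]).decode("utf-8"), end


encoder = PrefixVarintStringEncoder()
for _ in range(3):
    encoder.encode("foo")
print(encoder.buffer.hex(" "))  # 04 66 6f 6f 00 05 00 03
print(decode_string(encoder.buffer, 6))  # ('foo', 8)
```

Pointing at the most recent copy keeps distances small (and therefore varints short) in long buffers, at the cost of possibly longer pointer chains when decoding; pointing at the first copy would trade the other way.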