RDB Spec

Format layout

An rdb file begins with “REDIS0003” encoded as a bytestring. [9 bytes]

For each database in the dump

The first byte is the SELECT_DB opcode (0xfe) [1 byte]
The second byte is the length-encode database number (See below) [1-4 byte(s)]
For each key-value pair in the database:
1. If the object has an expire time:
2. The first byte is the EXPIRETIME opcode (0xfd) or, in Redis 2.6+, EXPIRETIME_MS opcode (0xfc)
3. What follows is the expire time in seconds or, in Redis 2.6+, in ms (See below)
The first byte is the object type (See below for list of object type opcodes)
Then, the string-encoded key for the object (See below)
Then, the object itself, specially encoded for its type (See below)

An rdb file ends with the EOF opcode (0xff) [1 byte]

Object types

0×00 → String object
0×01 → List object
0×02 → Set object
0×03 => Sorted set object
0×04 → Hash object
0×09 → Zipmap
0×0a → Ziplist
0×0b → Intset
0×0c → Ziplist (as value-score pairs for a sorted set)

Length-encoding

The first two most-significant bits hold the type of the value and the length

The value is not specially encoded and has a length less than 64 (00)
The value is not specially encoded and has a length less than 16,384 (01)
The value is not specially encoded and has a length that fits in a 32-bit integer (10)
The value is specially encoded (11)

The remaining bits hold the length (or, in the last case, encoding type) to be encoded (marked below as X’s)

(00) | XXXXXX [1 byte total]
(01) | XXXXXX XXXXXXXX [2 bytes total]
(10) | 000000 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX [5 bytes total]
(11) | XXXXXX [1 byte total] (The remaining 6 bits determine how the following value is encoded. This is unique to each type and will be described in the specific sections below on value types.)

String objects

The first 1-5 bytes of a string object contains the length of the string object and whether or not it is an encoded string, as described in the length-encoding section above .

If the string is not an encoded type, it is simply a series of n bytes where n is the length determined above.

If the string is an encoded type and the bottom two bits of the length are either (00,01,or 10), the string is loaded as an integer and converted to a string. Otherwise the string is lzf-decompressed and loaded as a string.

If the bottom two bits of the length are set to 00, the string is a 8-bit signed integer
If the bottom two bits of the length are set to 01, the string is a 16-bit signed integer
If the bottom two bits of the length are set to 10, the string is a 32-bit signed integer
If the bottom two bits of the length are set to 10, the string is lzf-compressed

Example (the string “foo”):
⁰³ → (00000011) The string is normally encoded as a series of 3 characters
[66 6f 6f] => “foo”

Example (the string “-1”):
[c0] → (11000000) The string is an 8-bit signed integer
[ff] => “-1”

Example (the string “256”):
[c1] → (11000001) The string is a 16-bit signed integer
[00 01] → “256”

List objects

The first 1-5 bytes of a list object contains the number of list members, as described in the length-encoding section above .

There is no special encoding type for lists. Their compact representation is the ziplist but this will be specified as the object type, not in the length encoding.

List objects are simply n strings, decoded as string objects where n is the length from above.

Set objects

The first 1-5 bytes of a set object contains the number of set members, as described in the length-encoding section above .

There is no special encoding type for lists. Their compact representation is the intset but this will be specified as the object type, not in the length encoding. The intset is reserved for sets in which all members are integers.

Set objects are simply n strings, decoded as string objects where n is the length from above.

Sorted set object

The first 1-5 bytes of a sorted set object contains the number of value-score pairs, as described in the length-encoding section above .

There is no special encoding type for sorted sets. Their compact representation is a ziplist of value-score pairs but this will be specified as the object type, not in the length encoding.

Sorted Set objects are n value-score pairs.

Each value is decoded as string objects where n is the length from above.

Every value is followed by a double value specifying its score. This double should be decoded as specified in the double section below .

Hash object

The first 1-5 bytes of a hash object contains the number of key-value pairs, as described in the length-encoding section above .

There is no special encoding type for hashes. Their compact representation is the zipmap but this will be specified as the object type, not in the length encoding.

The rest of the hash is decoded as 2n string objects where n is the length from above.

Every other string is the key followed by the string representing its value.

Ziplist

(Note: I was unable to find whether the endiannesses listed below as little-endian encodings were always litle-endian or whether they are stored as host byte order.)

Ziplists are space-efficient special encodings for lists and sorted sets. The max number of members and max size for the ziplist encoding is set in the conf file that the Redis server reads when starting.

The first 4 bytes store the number of bytes in the ziplist

The next 4 bytes store the offset (in bytes) to the end of the last entry in the list

The next 2 bytes store the number of members of the ziplist

The remaining n-1 bytes of the ziplist store a sequence of members. If the ziplist is encoding a sorted set, the members should be parsed as value, score pairs.

The structure of every ziplist member is as follows:

First, is the number of bytes of the previous member in the ziplist. If this byte is less than 0xfe, the length is stored as a single byte. If the first byte is set to 0xfe, the next four bytes will hold the length of the previous member.
Next, is the encoding and length of the current member of the ziplist. This is similar, though not exactly the same, to the way the lengths and encodings of RDB objects are stored.
1. If the first two most-significant bits are set to zero (00), the ziplist member is encoded as an n-bytes long string where n is the value of the remaining 6 bits. (00|XXXXXX)
2. If the first two most-significant bits are set to one (01), the ziplist member is encoded as an n-bytes long string where n is the conjunction of remaining 6 bits shifted left 8 bytes and the next byte (01|XXXXXX XXXXXXXX)
3. If the first two most-significant bits are set to two (10), the ziplist member is encoded as an n-bytes long string where n is the conjunction of remaining 6 bits shifted left 16 bytes and the next two bytes (01|XXXXXX XXXXXXXX XXXXXXXX).
4. If the first two most-significant bits are set to three and the next two bits are set to zero (1100), the ziplist member is encoded as an int16t (2 bytes).
5. If the first two most-significant bits are set to three and the next two bits are set to one (1101), the ziplist member is encoded as an int32t (4 bytes).
6. If the first two most-significant bits are set to three and the next two bits are set to two (1110), the ziplist member is encoded as an int64t (8 bytes).
Finally, comes the value specified by the encoding and length

Every ziplist terminates with a 0xff byte

Example (encoding the very silly list [1,1]):

[13 00 00 00] → Little-endian 32-bit length (in bytes) of ziplist

[0e 00 00 00] → little-endian 32-bit offset (in bytes) to the end of the last entry in the list

[02 00] → little-endian 16-bit number of list entries

[00] → number of bytes of previous entry

[c0] → value encoding (11000000, which means a 16-bit signed integer follows)

[01 00] → 16-bit value

[04] → number of bytes of previous entry

[c0] → value encoding (11000000, which means a 16-bit signed integer follows)

[01 00] → 16-bit value

[ff] → end of ziplist

Intset

(Note: I was unable to find whether the endiannesses listed below as little-endian encodings were always litle-endian or whether they are stored as host byte order.)

The intset is used to encode sets consisting only of integers.

Intsets begin with a 32-bit length, specifying the length of each member of the intset.

Second, there is a 32-bit number specifying the number of elements in the intset

Finally, every member is a signed integer, the size of which is specified above.

Intsets end with a 0xff

Example (encoding the set [1,-1])

[02 00 00 00] → Little-endian 32-bit length encoding, in bytes, of members of the intset (2, or 16-bits)

[02 00 00 00] → Little-endian 32-bit number of elements in the intset (2)

[ff ff] → One little-endian 16-bit member (-1 [signed int16])

[01 00] → One little-endian 16-bit member

[ff] → End of intset

Zipmap

Zipmaps are space-efficient special encodings for hashes. The max number of members and max size for the zipmap encoding is set in the conf file that the Redis server reads when starting.

The first byte specifies the length of the zipmap. If the length of the zipmap is greater than or equal to 0xfe, disregard this length and traverse the entire zipmap until 0xff is encountered after a member.

Next is a series of key-value pairs encoded as follows:

First, the length of key. If this value is less than 0xfd, this is the length of the key. A value of 0xfe indicates the length is encoded as the next 4 bytes in host-byte order.
The next n bytes contain a bytestring of the key, where n is the length above.
Next is the length of the value encoded in the same manner as the length of the key.
Next is an 8-byte integer that specifies the number of free unused bytes after the string. When values are re-set from a longer value to a shorter value, some bytes may be set aside to make it faster to re-increase the size of the value. If you are parsing the structure, however, make sure you consume the free bytes are you consume the value. You can then disregard these free bytes.
The next n bytes contain a bytestring of the value, where n is the length above.
The next o bytes contain “free” bytes that should be thrown away, where o is the free length above.

The zipmap ends with a 0xff

Example (encoding the hash {bar: 1})

[01] → The length of the zip-map (1 key-value pair)

[03] → The length of the first key

[62 61 72] → “bar”

[01] → The length of the first value

[02] → Number of free bytes

[31] → “1”

[6e 65] → 2 free bytes (“ne” from a previous setting of bar to “one”. These can be thrown out)

[ff] → End of zipmap

LZF Compression

The LZF compression is a fast, cousin of the more-popular LZW compression algorithm.

The first 1-5 byte(s) are a length-encoded length of the compressed string.

The second 1-5 byte(s) are a length-encoded length of the decompressed string.

The next n bytes are in compressed lzf format, where n is the first length above.

To decompress LZF, use the following loop:

Take one byte and divide it into the first three bits and the remaining five. (LLL | HHHHH)
The first three bytes (LLL) will be a number between 0 and 7 and will be referred to as the “length”.
The remaining five bits specify what is called the “high distance”
1. If the first three bits == 0×00, copy length+1 bytes from the compressed stream and append them to the current decompresses stream. Go to the beginning of this decompression loop and continue.
2. If the first three bits > 0 && < 7, this is “length”. Continue.
3. If the first three bits == 7, fetch a new byte and add 7 to its value. This is the “length”. Continue.
Fetch another byte and combine it with the “high distance.” (Shift the high distance left 8 bits and OR the two). This new value is the “distance”
Now go to the [distance] to last char in the decoded string and copy [length]+2 characters. If length is longer than the remaining characters in the decoded string, loop back to the beginning of distance and continue copying.
1. For instance. If you have decoded “foobar” so far, your distance is 1 and your length+2 is 10, you would repeat “ar” five times, totalling 10 characters.
Repeat this algorithm until you have reached the end of the compressed string.

Example (encoded string “if i never if i never if i never”)

[12] → The compressed length is 18 bytes
[20] → The original length is 32 bytes
[0b] → (000 | 01011) Copy the next 11+1 bytes literally
[69 66 20 69 20 6e 65 76 65 72 20 69] → “if i never i”
[e0 0a] → (111 | 00000) (10) Set length to 10+7.
[0a] → (10) Set distance to (00000 00001010) or 10. We move to the 10th to last character in “if i never i”. This is ‘f’. (“i” is the 0th character). Now copy 10+7+2 characters. There are only 11 characters to copy so we will copy those 11 characters, move back to ‘f’ and copy 8 more. Our decoded string now reads “if i never if i never if i neve”
[00] → (000 | 00000) Copy the next 0+1 byte literally
[72] → (r) “if i never if i never if i never”