      1 ---
      2 layout: page
      3 title: Encoding Spec
      4 ---
      5 
      6 # Encoding Spec
      7 
      8 ## Organization
      9 
     10 ### 64-bit Words
     11 
     12 For the purpose of Cap'n Proto, a "word" is defined as 8 bytes, or 64 bits.  Since alignment of
     13 data is important, all objects (structs, lists, and blobs) are aligned to word boundaries, and
     14 sizes are usually expressed in terms of words.  (Primitive values are aligned to a multiple of
     15 their size within a struct or list.)
     16 
     17 ### Messages
     18 
     19 The unit of communication in Cap'n Proto is a "message".  A message is a tree of objects, with
     20 the root always being a struct.
     21 
     22 Physically, messages may be split into several "segments", each of which is a flat blob of bytes.
     23 Typically, a segment must be loaded into a contiguous block of memory before it can be accessed,
     24 so that the relative pointers within the segment can be followed quickly.  However, when a message
     25 has multiple segments, it does not matter where those segments are located in memory relative to
     26 each other; inter-segment pointers are encoded differently, as we'll see later.
     27 
     28 Ideally, every message would have only one segment.  However, there are a few reasons why splitting
     29 a message into multiple segments may be convenient:
     30 
     31 * It can be difficult to predict how large a message might be until you start writing it, and you
     32   can't start writing it until you have a segment to write to.  If it turns out the segment you
     33   allocated isn't big enough, you can allocate additional segments without the need to relocate the
     34   data you've already written.
     35 * Allocating excessively large blocks of memory can make life difficult for memory allocators,
     36   especially on 32-bit systems with limited address space.
     37 
     38 The first word of the first segment of the message is always a pointer pointing to the message's
     39 root struct.
     40 
     41 ### Objects
     42 
     43 Each segment in a message contains a series of objects.  For the purpose of Cap'n Proto, an "object"
     44 is any value which may have a pointer pointing to it.  Pointers can only point to the beginning of
     45 objects, not into the middle, and no more than one pointer can point at each object.  Thus, objects
     46 and the pointers connecting them form a tree, not a graph.  An object is itself composed of
     47 primitive data values and pointers, in a layout that depends on the kind of object.
     48 
     49 At the moment, there are three kinds of objects:  structs, lists, and far-pointer landing pads.
     50 Blobs might also be considered to be a kind of object, but are encoded identically to lists of
     51 bytes.
     52 
     53 ## Value Encoding
     54 
     55 ### Primitive Values
     56 
     57 The built-in primitive types are encoded as follows:
     58 
     59 * `Void`:  Not encoded at all.  It has only one possible value and thus carries no information.
     60 * `Bool`:  One bit.  1 = true, 0 = false.
     61 * Integers:  Encoded in little-endian format.  Signed integers use two's complement.
     62 * Floating-points:  Encoded in little-endian IEEE-754 format.
     63 
     64 Primitive types must always be aligned to a multiple of their size.  Note that since the size of
     65 a `Bool` is one bit, this means eight `Bool` values can be encoded in a single byte -- this differs
     66 from C++, where the `bool` type takes a whole byte.
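
As a rough illustration of these rules, the following Python sketch encodes a few primitives with the standard `struct` module. The helper names (`encode_int32`, `encode_float64`, `pack_bools`) are made up for this example and are not part of any Cap'n Proto library. The bit order used for `Bool`s is the one this document gives for lists (first value in the least-significant bit of the first byte).

```python
import struct


def encode_int32(value: int) -> bytes:
    """Signed 32-bit integer: little-endian, two's complement."""
    return struct.pack("<i", value)


def encode_float64(value: float) -> bytes:
    """64-bit float: little-endian IEEE-754."""
    return struct.pack("<d", value)


def pack_bools(flags) -> bytes:
    """One bit per Bool; the first flag lands in the least-significant
    bit of the first byte."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)
```

For instance, `pack_bools([True, False, True])` produces the single byte `0x05`, whereas three C++ `bool`s would occupy three bytes.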
     67 
     68 ### Enums
     69 
     70 Enums are encoded the same as `UInt16`.
     71 
     72 ## Object Encoding
     73 
     74 ### Blobs
     75 
     76 The built-in blob types are encoded as follows:
     77 
     78 * `Data`:  Encoded as a pointer, identical to `List(UInt8)`.
     79 * `Text`:  Like `Data`, but the content must be valid UTF-8, and the last byte of the content must
     80   be zero.  The encoding allows bytes other than the last to be zero, but some applications
     81   (especially ones written in languages that use NUL-terminated strings) may truncate at the first
     82   zero.  If a particular text field is explicitly intended to support zero bytes, it should
     83   document this; otherwise, to be safe, senders should assume that zero bytes are not allowed.
     84   Note that the NUL terminator is included in the size sent on the wire, but the runtime library
     85   should not count it in any size reported to the application.
     86 
     87 ### Structs
     88 
     89 A struct value is encoded as a pointer to its content.  The content is split into two sections:
     90 data and pointers, with the pointer section appearing immediately after the data section.  This
     91 split allows structs to be traversed (e.g., copied) without knowing their type.
     92 
     93 A struct pointer looks like this:
     94 
     95     lsb                      struct pointer                       msb
     96     +-+-----------------------------+---------------+---------------+
     97     |A|             B               |       C       |       D       |
     98     +-+-----------------------------+---------------+---------------+
     99 
    100     A (2 bits) = 0, to indicate that this is a struct pointer.
    101     B (30 bits) = Offset, in words, from the end of the pointer to the
    102         start of the struct's data section.  Signed.
    103     C (16 bits) = Size of the struct's data section, in words.
    104     D (16 bits) = Size of the struct's pointer section, in words.
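
The field layout above can be read back with a few shifts and masks. This is a minimal sketch, not the reference implementation; `StructPointer` and `decode_struct_pointer` are hypothetical names, and the sketch performs no bounds checking.

```python
import struct
from typing import NamedTuple


class StructPointer(NamedTuple):
    offset: int         # B: signed offset in words from the end of the pointer
    data_words: int     # C: size of the data section, in words
    pointer_words: int  # D: size of the pointer section, in words


def decode_struct_pointer(word: bytes) -> StructPointer:
    """Decode one 8-byte little-endian word as a struct pointer."""
    (value,) = struct.unpack("<Q", word)
    if value & 0x3 != 0:  # A must be 0 for a struct pointer
        raise ValueError("not a struct pointer")
    offset = (value >> 2) & 0x3FFFFFFF
    if offset & 0x20000000:  # sign-extend the 30-bit field
        offset -= 0x40000000
    data_words = (value >> 32) & 0xFFFF
    pointer_words = (value >> 48) & 0xFFFF
    return StructPointer(offset, data_words, pointer_words)
```

Decoding the bytes `08 00 00 00 03 00 02 00` yields an offset of 2, a data section of 3 words, and a pointer section of 2 words.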
    105 
    106 Fields are positioned within the struct according to an algorithm with the following principles:
    107 
    108 * The position of each field depends only on its definition and the definitions of lower-numbered
    109   fields, never on the definitions of higher-numbered fields.  This ensures backwards-compatibility
    110   when new fields are added.
    111 * Due to alignment requirements, fields in the data section may be separated by padding.  However,
    112   later-numbered fields may be positioned into the padding left between earlier-numbered fields.
    113   Because of this, a struct will never contain more than 63 bits of padding.  Since objects are
    114   rounded up to a whole number of words anyway, padding never ends up wasting space.
    115 * Unions and groups need not occupy contiguous memory.  Indeed, they may have to be split into
    116   multiple slots if new fields are added later on.
    117 
    118 Field offsets are computed by the Cap'n Proto compiler.  The precise algorithm is too complicated
    119 to describe here, but you need not implement it yourself, as the compiler can produce a compiled
    120 schema format which includes offset information.
    121 
    122 #### Default Values
    123 
    124 A default struct is always all-zeros.  To achieve this, fields in the data section are stored xor'd
    125 with their defined default values.  An all-zero pointer is considered "null"; accessor methods
    126 for pointer fields check for null and return a pointer to their default value in this case.
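
The XOR trick for data fields can be sketched as follows, using a `UInt16` field as the example. The function names are illustrative only; real accessors are generated by the Cap'n Proto compiler.

```python
import struct


def store_uint16(value: int, default: int) -> bytes:
    """Return the wire bytes for a UInt16 field with the given default."""
    return struct.pack("<H", value ^ default)


def load_uint16(raw: bytes, default: int) -> int:
    """Recover the logical value from the wire bytes."""
    (stored,) = struct.unpack("<H", raw)
    return stored ^ default
```

A field whose default is 1234 and whose value is 1234 is stored as two zero bytes, and reading two zero bytes gives back 1234. XOR is its own inverse, so the same operation serves for both directions.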
    127 
    128 There are several reasons why this is desirable:
    129 
    130 * Cap'n Proto messages are often "packed" with a simple compression algorithm that deflates
    131   zero-value bytes.
    132 * Newly-allocated structs only need to be zero-initialized, which is fast and requires no knowledge
    133   of the struct type except its size.
    134 * If a newly-added field is placed in space that was previously padding, messages written by old
    135   binaries that do not know about this field will still have its default value set correctly --
    136   because it is always zero.
    137 
    138 #### Zero-sized structs
    139 
    140 As stated above, a pointer whose bits are all zero is considered a null pointer, *not* a struct of
    141 zero size. To encode a struct of zero size, set A, C, and D to zero, and set B (the offset) to -1.
    142 
    143 **Historical explanation:** A null pointer is intended to be treated as equivalent to the field's
    144 default value. Early on, it was thought that a zero-sized struct was a suitable synonym for
    145 null, since interpreting an empty struct as any struct type results in a struct whose fields are
    146 all default-valued. So, the pointer encoding was designed such that a zero-sized struct's pointer
    147 would be all-zero, so that it could conveniently be overloaded to mean "null".
    148 
    149 However, it turns out there are two important differences between a zero-sized struct and a null
    150 pointer. First, applications often check for null explicitly when implementing optional fields.
    151 Second, an empty struct is technically equivalent to the default value for the struct *type*,
    152 whereas a null pointer is equivalent to the default value for the particular *field*. These are
    153 not necessarily the same.
    154 
    155 It therefore became necessary to find a different encoding for zero-sized structs. Since the
    156 struct has zero size, the pointer's offset can validly point to any location so long as it is
    157 in-bounds. Since an offset of -1 points to the beginning of the pointer itself, it is known to
    158 be in-bounds. So, we use an offset of -1 when the struct has zero size.
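
To make the difference concrete, here is a small sketch that builds struct pointers and compares the zero-sized encoding with null. `encode_struct_pointer` is a hypothetical helper written for this example.

```python
import struct


def encode_struct_pointer(offset: int, data_words: int, pointer_words: int) -> bytes:
    """Build an 8-byte struct pointer (A = 0) from its fields."""
    if not -(1 << 29) <= offset < (1 << 29):
        raise ValueError("offset does not fit in 30 signed bits")
    if not (0 <= data_words < 1 << 16 and 0 <= pointer_words < 1 << 16):
        raise ValueError("section size does not fit in 16 bits")
    low = (offset & 0x3FFFFFFF) << 2  # two's complement offset, A bits left as 0
    return struct.pack("<Q", low | (data_words << 32) | (pointer_words << 48))


ZERO_SIZED_STRUCT = encode_struct_pointer(-1, 0, 0)  # fc ff ff ff 00 00 00 00
NULL_POINTER = bytes(8)                              # all zero bits
```

The two constants differ in their first four bytes, so a reader can tell "struct present but empty" apart from "field not set".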
    159 
    160 ### Lists
    161 
    162 A list value is encoded as a pointer to a flat array of values.
    163 
    164     lsb                       list pointer                        msb
    165     +-+-----------------------------+--+----------------------------+
    166     |A|             B               |C |             D              |
    167     +-+-----------------------------+--+----------------------------+
    168 
    169     A (2 bits) = 1, to indicate that this is a list pointer.
    170     B (30 bits) = Offset, in words, from the end of the pointer to the
    171         start of the first element of the list.  Signed.
    172     C (3 bits) = Size of each element:
    173         0 = 0 (e.g. List(Void))
    174         1 = 1 bit
    175         2 = 1 byte
    176         3 = 2 bytes
    177         4 = 4 bytes
    178         5 = 8 bytes (non-pointer)
    179         6 = 8 bytes (pointer)
    180         7 = composite (see below)
    181     D (29 bits) = Size of the list:
    182         when C <> 7: Number of elements in the list.
    183         when C = 7: Number of words in the list, not counting the tag word
    184         (see below).
    185 
    186 The pointed-to values are tightly-packed.  In particular, `Bool`s are packed bit-by-bit in
    187 little-endian order (the first bit is the least-significant bit of the first byte).
    188 
    189 When C = 7, the elements of the list are fixed-width composite values -- usually, structs.  In
    190 this case, the list content is prefixed by a "tag" word that describes each individual element.
    191 The tag has the same layout as a struct pointer, except that the pointer offset (B) instead
    192 indicates the number of elements in the list.  Meanwhile, section (D) of the list pointer -- which
    193 normally would store this element count -- instead stores the total number of _words_ in the list
    194 (not counting the tag word).  The reason we store a word count in the pointer rather than an element
    195 count is to ensure that the extents of the list's location can always be determined by inspecting
    196 the pointer alone, without having to look at the tag; this may allow more-efficient prefetching in
    197 some use cases.  The reason we don't store struct lists as a list of pointers is because doing so
    198 would take significantly more space (an extra pointer per element) and may be less cache-friendly.
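
The following sketch decodes a list pointer, including the composite case with its tag word. It is an illustration under simplifying assumptions: the whole segment is available as a `bytes` object, the pointer is addressed by its word index, and no bounds validation is done (a real reader must validate, as discussed under Security Considerations). The function name and the returned dictionary keys are invented for this example.

```python
import struct

# Element size in bits for each value of C other than 7.
ELEMENT_BITS = {0: 0, 1: 1, 2: 8, 3: 16, 4: 32, 5: 64, 6: 64}


def decode_list_pointer(segment: bytes, pointer_index: int) -> dict:
    """Decode the list pointer stored at word `pointer_index` of `segment`."""
    (value,) = struct.unpack_from("<Q", segment, pointer_index * 8)
    if value & 0x3 != 1:  # A must be 1 for a list pointer
        raise ValueError("not a list pointer")
    offset = (value >> 2) & 0x3FFFFFFF
    if offset & 0x20000000:  # sign-extend the 30-bit offset
        offset -= 0x40000000
    size_code = (value >> 32) & 0x7  # C
    size_field = value >> 35         # D
    start = pointer_index + 1 + offset  # offsets count from the end of the pointer
    if size_code != 7:
        return {"start_word": start,
                "element_bits": ELEMENT_BITS[size_code],
                "count": size_field}
    # Composite list: D is a word count, and the element count lives in the
    # offset field of the tag word that precedes the elements.
    (tag,) = struct.unpack_from("<Q", segment, start * 8)
    return {"start_word": start + 1,
            "count": (tag >> 2) & 0x3FFFFFFF,
            "data_words": (tag >> 32) & 0xFFFF,
            "pointer_words": (tag >> 48) & 0xFFFF,
            "total_words": size_field}
```

Note that for a composite list the extent of the content (`total_words` plus the tag) is known from the pointer alone, before the tag is read.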
    199 
    200 In the future, we could consider implementing matrices using the "composite" element type, with the
    201 elements being fixed-size lists rather than structs.  In this case, the tag would look like a list
    202 pointer rather than a struct pointer.  As of this writing, no such feature has been implemented.
    203 
    204 A struct list must always be written using C = 7. However, a list of any element size (except
    205 C = 1, i.e. 1-bit) may be *decoded* as a struct list, with each element being interpreted as being
    206 a prefix of the struct data. For instance, a list of 2-byte values (C = 3) can be decoded as a
    207 struct list where each struct has 2 bytes in its "data" section (and an empty pointer section). A
    208 list of pointer values (C = 6) can be decoded as a struct list where each struct has a pointer
    209 section with one pointer (and an empty data section). The purpose of this rule is to make it
    210 possible to upgrade a list of primitives to a list of structs, as described under the
    211 [protocol evolution rules](language.html#evolving-your-protocol).
    212 (We make a special exception that boolean lists cannot be upgraded in this way due to the
    213 unreasonable implementation burden.) Note that even though struct lists can be decoded from any
    214 element size (except C = 1), it is NOT permitted to encode a struct list using any type other than
    215 C = 7 because doing so would interfere with the [canonicalization algorithm](#canonicalization).
    216 
    217 ### Inter-Segment Pointers
    218 
    219 When a pointer needs to point to a different segment, offsets no longer work.  We instead encode
    220 the pointer as a "far pointer", which looks like this:
    221 
    222     lsb                        far pointer                        msb
    223     +-+-+---------------------------+-------------------------------+
    224     |A|B|            C              |               D               |
    225     +-+-+---------------------------+-------------------------------+
    226 
    227     A (2 bits) = 2, to indicate that this is a far pointer.
    228     B (1 bit) = 0 if the landing pad is one word, 1 if it is two words.
    229         See explanation below.
    230     C (29 bits) = Offset, in words, from the start of the target segment
    231         to the location of the far-pointer landing-pad within that
    232         segment.  Unsigned.
    233     D (32 bits) = ID of the target segment.  (Segments are numbered
    234         sequentially starting from zero.)
    235 
    236 If B == 0, then the "landing pad" of a far pointer is normally just another pointer, which in turn
    237 points to the actual object.
    238 
    239 If B == 1, then the "landing pad" is itself another far pointer that is interpreted differently:
    240 This far pointer (which always has B = 0) points to the start of the object's _content_, located in
    241 some other segment.  The landing pad is itself immediately followed by a tag word.  The tag word
    242 looks exactly like an intra-segment pointer to the target object would look, except that the offset
    243 is always zero.
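
A far pointer's fields can be extracted as in the sketch below. `FarPointer` and `decode_far_pointer` are hypothetical names; following the landing pad (and, for the double-far case, reading the tag word after it) is left out.

```python
import struct
from typing import NamedTuple


class FarPointer(NamedTuple):
    double_far: bool  # B: True if the landing pad is two words
    pad_offset: int   # C: unsigned word offset of the landing pad in the target segment
    segment_id: int   # D: ID of the target segment


def decode_far_pointer(word: bytes) -> FarPointer:
    """Decode one 8-byte little-endian word as a far pointer."""
    (value,) = struct.unpack("<Q", word)
    if value & 0x3 != 2:  # A must be 2 for a far pointer
        raise ValueError("not a far pointer")
    return FarPointer(
        double_far=bool((value >> 2) & 0x1),
        pad_offset=(value >> 3) & 0x1FFFFFFF,
        segment_id=value >> 32,
    )
```

Unlike struct and list pointers, the offset here is unsigned and measured from the start of the target segment, so no sign extension is needed.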
    244 
    245 The reason for the convoluted double-far convention is to make it possible to form a new pointer
    246 to an object in a segment that is full.  If you can't allocate even one word in the segment where
    247 the target resides, then you will need to allocate a landing pad in some other segment, and use
    248 this double-far approach.  This should be exceedingly rare in practice since pointers are normally
    249 set to point to new objects, not existing ones.
    250 
    251 ### Capabilities (Interfaces)
    252 
    253 When using Cap'n Proto for [RPC](rpc.html), every message has an associated "capability table"
    254 which is a flat list of all capabilities present in the message body.  The details of what this
    255 table contains and where it is stored are the responsibility of the RPC system; in some cases, the
    256 table may not even be part of the message content.
    257 
    258 A capability pointer, then, simply contains an index into the separate capability table.
    259 
    260     lsb                    capability pointer                     msb
    261     +-+-----------------------------+-------------------------------+
    262     |A|              B              |               C               |
    263     +-+-----------------------------+-------------------------------+
    264 
    265     A (2 bits) = 3, to indicate that this is an "other" pointer.
    266     B (30 bits) = 0, to indicate that this is a capability pointer.
    267         (All other values are reserved for future use.)
    268     C (32 bits) = Index of the capability in the message's capability
    269         table.
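
A minimal sketch of reading a capability pointer follows. The function name is made up for this example; looking the index up in the capability table is the RPC system's job and is not shown.

```python
import struct


def decode_capability_pointer(word: bytes) -> int:
    """Return the capability-table index stored in an 8-byte pointer word."""
    (value,) = struct.unpack("<Q", word)
    if value & 0x3 != 3:  # A must be 3 for an "other" pointer
        raise ValueError("not an 'other' pointer")
    if (value >> 2) & 0x3FFFFFFF != 0:  # B must be 0; other values are reserved
        raise ValueError("reserved 'other' pointer kind")
    return value >> 32  # C: index into the capability table
```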
    270 
    271 In [rpc.capnp](https://github.com/sandstorm-io/capnproto/blob/master/c++/src/capnp/rpc.capnp), the
    272 capability table is encoded as a list of `CapDescriptor`s, appearing alongside the message content
    273 in the `Payload` struct.  However, some use cases may call for different approaches.  A message
    274 that is built and consumed within the same process need not encode the capability table at all
    275 (it can just keep the table as a separate array).  A message that is going to be stored to disk
    276 would need to store a table of `SturdyRef`s instead of `CapDescriptor`s.
    277 
    278 ## Serialization Over a Stream
    279 
    280 When transmitting a message, the segments must be framed in some way, i.e. to communicate the
    281 number of segments and their sizes before communicating the actual data.  The best framing approach
    282 may differ depending on the medium -- for example, messages read via `mmap` or shared memory may
    283 call for a different approach than messages sent over a socket or a pipe.  Cap'n Proto does not
    284 attempt to specify a framing format for every situation.  However, since byte streams are by far
    285 the most common transmission medium, Cap'n Proto does define and implement a recommended framing
    286 format for them.
    287 
    288 When transmitting over a stream, the following should be sent.  All integers are unsigned and
    289 little-endian.
    290 
    291 * (4 bytes) The number of segments, minus one (since there is always at least one segment).
    292 * (N * 4 bytes) The size of each segment, in words.
    293 * (0 or 4 bytes) Padding up to the next word boundary.
    294 * The content of each segment, in order.
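
The framing above can be sketched as a pair of functions operating on in-memory byte strings. The names `frame_message` and `read_frame` are illustrative. A production reader would also cap the segment count and total size before trusting the header, which this sketch does not do.

```python
import struct


def frame_message(segments) -> bytes:
    """Prefix a list of word-aligned segments with the stream segment table."""
    if not segments:
        raise ValueError("a message has at least one segment")
    header = struct.pack("<I", len(segments) - 1)  # segment count, minus one
    for segment in segments:
        if len(segment) % 8:
            raise ValueError("segment is not a whole number of words")
        header += struct.pack("<I", len(segment) // 8)  # size in words
    if len(header) % 8:
        header += bytes(4)  # pad the table to a word boundary
    return header + b"".join(segments)


def read_frame(data: bytes) -> list:
    """Split a framed message back into its segments."""
    (count_minus_one,) = struct.unpack_from("<I", data, 0)
    count = count_minus_one + 1
    sizes = struct.unpack_from("<%dI" % count, data, 4)
    pos = 4 + 4 * count
    pos += -pos % 8  # skip padding up to the next word boundary
    segments = []
    for words in sizes:
        end = pos + words * 8
        if end > len(data):
            raise ValueError("truncated message")
        segments.append(data[pos:end])
        pos = end
    return segments
```

With one segment the table is exactly one word (4 + 4 bytes) and needs no padding; with two segments it is 12 bytes and gets 4 bytes of padding.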
    295 
    296 ### Packing
    297 
    298 For cases where bandwidth usage matters, Cap'n Proto defines a simple compression scheme called
    299 "packing".  This scheme is based on the observation that Cap'n Proto messages contain lots of
    300 zero bytes: padding bytes, unset fields, and high-order bytes of small-valued integers.
    301 
    302 In packed format, each word of the message is reduced to a tag byte followed by zero to eight
    303 content bytes.  The bits of the tag byte correspond to the bytes of the unpacked word, with the
    304 least-significant bit corresponding to the first byte.  Each zero bit indicates that the
    305 corresponding byte is zero.  The non-zero bytes are packed following the tag.
    306 
    307 For example, here is some typical Cap'n Proto data (a struct pointer (offset = 2, data size = 3,
    308 pointer count = 2) followed by a text pointer (offset = 6, length = 53)) and its packed form:
    309 
    310     unpacked (hex):  08 00 00 00 03 00 02 00   19 00 00 00 aa 01 00 00
    311     packed (hex):  51 08 03 02   31 19 aa 01
    312 
    313 In addition to the above, there are two tag values which are treated specially:  0x00 and 0xff.
    314 
    315 * 0x00:  The tag is followed by a single byte which indicates a count of consecutive zero-valued
    316   words, minus 1.  E.g. if the tag 0x00 is followed by 0x05, the sequence unpacks to 6 words of
    317   zero.
    318 
    319   Or, put another way: the tag is first decoded as if it were not special.  Since none of the bits
    320   are set, it is followed by no bytes and expands to a word full of zeros.  After that, the next
    321   byte is interpreted as a count of _additional_ words that are also all-zero.
    322 
    323 * 0xff:  The tag is followed by the bytes of the word (as if it weren't special), but after those
    324   bytes is another byte with value N.  Following that byte is N unpacked words that should be copied
    325   directly.  These unpacked words may or may not contain zeros -- it is up to the compressor to
    326   decide when to end the unpacked span and return to packing each word.  The purpose of this rule
    327   is to minimize the impact of packing on data that doesn't contain any zeros -- in particular,
    328   long text blobs.  Because of this rule, the worst-case space overhead of packing is 2 bytes per
    329   2 KiB of input (256 words = 2 KiB).
    330 
    331 Examples:
    332 
    333     unpacked (hex):  00 (x 32 bytes)
    334     packed (hex):  00 03
    335 
    336     unpacked (hex):  8a (x 32 bytes)
    337     packed (hex):  ff 8a (x 8 bytes) 03 8a (x 24 bytes)
    338 
    339 Notice that both of the special cases begin by treating the tag as if it weren't special.  This
    340 is intentionally designed to make encoding faster:  you can compute the tag value and encode the
    341 bytes in a single pass through the input word.  Only after you've finished with that word do you
    342 need to check whether the tag ended up being 0x00 or 0xff.
    343 
    344 It is possible to write both an encoder and a decoder which only branch at the end of each word,
    345 and only to handle the two special tags.  It is not necessary to branch on every byte.  See the
    346 C++ reference implementation for an example.
    347 
    348 Packing is normally applied on top of the standard stream framing described in the previous
    349 section.
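
The packing rules can be sketched in Python as below. This is a straightforward, unoptimized illustration, not the branch-light design of the C++ reference implementation. One choice here is an assumption of this sketch: after a 0xff tag, the encoder keeps the unpacked span going only while the following words contain no zero bytes (the format leaves that decision to the compressor). `unpack` assumes well-formed input apart from a basic truncation check.

```python
def pack(data: bytes) -> bytes:
    """Pack a word-aligned byte string."""
    if len(data) % 8:
        raise ValueError("input must be a whole number of words")
    words = [data[i:i + 8] for i in range(0, len(data), 8)]
    out = bytearray()
    i = 0
    while i < len(words):
        word = words[i]
        i += 1
        tag = 0
        body = bytearray()
        for bit, byte in enumerate(word):  # bit 0 corresponds to the first byte
            if byte:
                tag |= 1 << bit
                body.append(byte)
        out.append(tag)
        out += body
        if tag == 0x00:
            run = 0  # count of additional all-zero words
            while i < len(words) and run < 255 and not any(words[i]):
                run += 1
                i += 1
            out.append(run)
        elif tag == 0xFF:
            start = i  # words copied verbatim after the count byte
            while i < len(words) and i - start < 255 and 0 not in words[i]:
                i += 1
            out.append(i - start)
            for raw in words[start:i]:
                out += raw
    return bytes(out)


def unpack(packed: bytes) -> bytes:
    """Reverse pack()."""
    out = bytearray()
    pos = 0
    while pos < len(packed):
        tag = packed[pos]
        pos += 1
        for bit in range(8):
            if tag & (1 << bit):
                out.append(packed[pos])
                pos += 1
            else:
                out.append(0)
        if tag == 0x00:
            out += bytes(8 * packed[pos])  # additional zero words
            pos += 1
        elif tag == 0xFF:
            count = packed[pos]
            pos += 1
            if pos + 8 * count > len(packed):
                raise ValueError("truncated unpacked span")
            out += packed[pos:pos + 8 * count]
            pos += 8 * count
    return bytes(out)
```

Running `pack` on the three examples in this section reproduces the packed bytes shown, and `unpack(pack(x)) == x` for each of them.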
    350 
    351 ### Compression
    352 
    353 When Cap'n Proto messages may contain repetitive data (especially, large text blobs), it makes sense
    354 to apply a standard compression algorithm in addition to packing. When CPU time is scarce, we
    355 recommend [LZ4 compression](https://code.google.com/p/lz4/). Otherwise, [zlib](http://www.zlib.net)
    356 is slower but will compress more.
    357 
    358 ## Canonicalization
    359 
    360 Cap'n Proto messages have a well-defined canonical form. Cap'n Proto encoders are NOT required to
    361 output messages in canonical form, and in fact they will almost never do so by default. However,
    362 it is possible to write code which canonicalizes a Cap'n Proto message without knowing its schema.
    363 
    364 A canonical Cap'n Proto message must adhere to the following rules:
    365 
    366 * The object tree must be encoded in preorder (with respect to the order of the pointers within
    367   each object).
    368 * The message must be encoded as a single segment. (When signing or hashing a canonical Cap'n Proto
    369   message, the segment table shall not be included, because it would be redundant.)
    370 * Trailing zero-valued words in a struct's data or pointer sections must be truncated. Since zero
    371   represents a default value, this does not change the struct's meaning. This rule is important
    372   to ensure that adding a new field to a struct does not affect the canonical encoding of messages
    373   that do not set that field.
    374 * Similarly, for a struct list, if a trailing word in a section of all structs in the list is zero,
    375   then it must be truncated from all structs in the list. (All structs in a struct list must have
    376   equal sizes, hence a trailing zero can only be removed if it is zero in all elements.)
    377 * Any struct pointer pointing to a zero-sized struct should have an
    378   offset of -1.
    379   * Note that this applies _only_ to structs; other zero-sized values should have offsets
    380     allocated in preorder, as normal.
    381 * Canonical messages are not packed. However, packing can still be applied for transmission
    382   purposes; the message must simply be unpacked before checking signatures.
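
The truncation rules for structs and struct lists can be sketched as follows. Both helpers are hypothetical and operate on a section that has already been extracted as raw bytes; they cover only the trimming step, not preorder layout or the other rules.

```python
def trim_trailing_zero_words(section: bytes) -> bytes:
    """Drop all-zero words from the end of a data or pointer section."""
    end = len(section)
    while end >= 8 and not any(section[end - 8:end]):
        end -= 8
    return section[:end]


def canonical_section_words(sections) -> int:
    """For a struct list: the shared section size, in words, after trimming.

    All elements must keep the same size, so a trailing word can be removed
    only if it is zero in every element. The result is therefore the largest
    trimmed size found among the elements.
    """
    return max((len(trim_trailing_zero_words(s)) // 8 for s in sections), default=0)
```

For a single struct, a two-word data section whose second word is zero shrinks to one word. In a struct list, the same section keeps both words if any one element has a non-zero second word.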
    383 
    384 Note that Cap'n Proto 0.5 introduced the rule that struct lists must always be encoded using
    385 C = 7 in the [list pointer](#lists). Prior versions of Cap'n Proto allowed struct lists to be
    386 encoded using any element size, so that small structs could be compacted to take less than a word
    387 per element, and many encoders in fact implemented this. Unfortunately, this "optimization" made
    388 canonicalization impossible without knowing the schema, which is a significant obstacle. Therefore,
    389 the rules have been changed in 0.5, but data written by previous versions may not be possible to
    390 canonicalize.
    391 
    392 ## Security Considerations
    393 
    394 A naive implementation of a Cap'n Proto reader may be vulnerable to attacks based on various kinds
    395 of malicious input. Implementations MUST guard against these.
    396 
    397 ### Pointer Validation
    398 
    399 Cap'n Proto readers must validate pointers, e.g. to check that the target object is within the
    400 bounds of its segment. To avoid an upfront scan of the message (which would defeat Cap'n Proto's
    401 O(1) parsing performance), validation should occur lazily when the getter method for a pointer is
    402 called, throwing an exception or returning a default value if the pointer is invalid.
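
A bounds check of the kind described here might look like the sketch below. It assumes positions are tracked as word indices within a single segment; `target_in_bounds` is an invented name, and a complete reader would apply the same check to list content and far-pointer landing pads.

```python
def target_in_bounds(segment_words: int, pointer_index: int,
                     offset: int, size_words: int) -> bool:
    """Check that a pointer's target lies entirely inside its segment.

    segment_words: length of the segment, in words
    pointer_index: word index of the pointer itself
    offset:        signed offset from the end of the pointer, in words
    size_words:    total size of the target object, in words
    """
    start = pointer_index + 1 + offset
    return start >= 0 and size_words >= 0 and start + size_words <= segment_words
```

The check is done only when a getter follows the pointer, so a message whose pointers are never read costs nothing to validate.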
    403 
    404 ### Amplification attack
    405 
    406 A message containing cyclic (or even just overlapping) pointers can cause the reader to go into
    407 an infinite loop while traversing the content.
    408 
    409 To defend against this, as the application traverses the message, each time a pointer is
    410 dereferenced, a counter should be incremented by the size of the data to which it points.  If this
    411 counter goes over some limit, an error should be raised, and/or default values should be returned. We call this limit the "traversal limit" (or, sometimes, the "read limit").
    412 
    413 The C++ implementation currently defaults to a limit of 64 MiB, but allows the caller to set a
    414 different limit if desired. Another reasonable strategy is to set the limit to some multiple of
    415 the original message size; however, most applications should place limits on overall message sizes
    416 anyway, so it makes sense to have one check cover both.
    417 
    418 **List amplification:** A list of `Void` values or zero-size structs can have a very large element count while taking constant space on the wire. If the receiving application expects a list of structs, it will see these zero-sized elements as valid structs set to their default values. If it iterates through the list processing each element, it could spend a large amount of CPU time or other resources despite the message being small. To defend against this, the "traversal limit" should count a list of zero-sized elements as if each element were one word instead. This rule was introduced in the C++ implementation in [commit 1048706](https://github.com/sandstorm-io/capnproto/commit/104870608fde3c698483fdef6b97f093fc15685d).
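
The traversal limit, including the rule for zero-sized list elements, can be sketched as a small counter object. `ReadLimiter` and its methods are hypothetical names for this example, not the C++ API; the default of 8 * 1024 * 1024 words corresponds to the 64 MiB figure mentioned above.

```python
class TraversalLimitExceeded(Exception):
    """Raised when a reader has dereferenced more data than allowed."""


class ReadLimiter:
    def __init__(self, limit_words: int = 8 * 1024 * 1024):  # 64 MiB of words
        self.remaining = limit_words

    def charge(self, words: int) -> None:
        """Account for `words` words of data reached through a pointer."""
        if words > self.remaining:
            self.remaining = 0
            raise TraversalLimitExceeded("traversal limit exceeded")
        self.remaining -= words

    def charge_list(self, element_count: int, total_words: int) -> None:
        """Account for a list; zero-sized elements count as one word each."""
        self.charge(element_count if total_words == 0 else total_words)
```

With this rule, a `List(Void)` claiming a billion elements exhausts the budget immediately instead of letting the application loop over a billion default-valued entries.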
    419 
    420 ### Stack overflow DoS attack
    421 
    422 A message with deeply-nested objects can cause a stack overflow in typical code which processes
    423 messages recursively.
    424 
    425 To defend against this, as the application traverses the message, the pointer depth should be
    426 tracked. If it goes over some limit, an error should be raised.  The C++ implementation currently
    427 defaults to a limit of 64 pointers, but allows the caller to set a different limit.
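
Depth tracking can be sketched with a recursive walk that carries a remaining-depth budget. In this illustration a decoded object is modelled as a plain Python list of its child objects, which is an assumption made purely to keep the example short; the names are invented.

```python
class NestingLimitExceeded(Exception):
    """Raised when a message is nested more deeply than allowed."""


def count_objects(node, nesting_limit: int = 64) -> int:
    """Count the objects in a tree, refusing to descend past the limit.

    `node` is a list of child nodes standing in for an object and the
    objects its pointers refer to.
    """
    if nesting_limit <= 0:
        raise NestingLimitExceeded("message is too deeply nested")
    return 1 + sum(count_objects(child, nesting_limit - 1) for child in node)
```

Each pointer followed spends one unit of the budget, so a chain of 100 nested objects is rejected at depth 64 long before the interpreter's own stack is at risk.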