---
layout: page
title: C++ Serialization
---

# C++ Serialization

The Cap'n Proto C++ runtime implementation provides an easy-to-use interface for manipulating
messages backed by fast pointer arithmetic.  This page discusses the serialization layer of
the runtime; see [C++ RPC](cxxrpc.html) for information about the RPC layer.

## Example Usage

For the Cap'n Proto definition:

{% highlight capnp %}
struct Person {
  id @0 :UInt32;
  name @1 :Text;
  email @2 :Text;
  phones @3 :List(PhoneNumber);

  struct PhoneNumber {
    number @0 :Text;
    type @1 :Type;

    enum Type {
      mobile @0;
      home @1;
      work @2;
    }
  }

  employment :union {
    unemployed @4 :Void;
    employer @5 :Text;
    school @6 :Text;
    selfEmployed @7 :Void;
    # We assume that a person is only one of these.
  }
}

struct AddressBook {
  people @0 :List(Person);
}
{% endhighlight %}

You might write code like:

{% highlight c++ %}
#include "addressbook.capnp.h"
#include <capnp/message.h>
#include <capnp/serialize-packed.h>
#include <iostream>

void writeAddressBook(int fd) {
  ::capnp::MallocMessageBuilder message;

  AddressBook::Builder addressBook = message.initRoot<AddressBook>();
  ::capnp::List<Person>::Builder people = addressBook.initPeople(2);

  Person::Builder alice = people[0];
  alice.setId(123);
  alice.setName("Alice");
  alice.setEmail("alice@example.com");
  // Type shown for explanation purposes; normally you'd use auto.
  ::capnp::List<Person::PhoneNumber>::Builder alicePhones =
      alice.initPhones(1);
  alicePhones[0].setNumber("555-1212");
  alicePhones[0].setType(Person::PhoneNumber::Type::MOBILE);
  alice.getEmployment().setSchool("MIT");

  Person::Builder bob = people[1];
  bob.setId(456);
  bob.setName("Bob");
  bob.setEmail("bob@example.com");
  auto bobPhones = bob.initPhones(2);
  bobPhones[0].setNumber("555-4567");
  bobPhones[0].setType(Person::PhoneNumber::Type::HOME);
  bobPhones[1].setNumber("555-7654");
  bobPhones[1].setType(Person::PhoneNumber::Type::WORK);
  bob.getEmployment().setUnemployed();

  writePackedMessageToFd(fd, message);
}

void printAddressBook(int fd) {
  ::capnp::PackedFdMessageReader message(fd);

  AddressBook::Reader addressBook = message.getRoot<AddressBook>();

  for (Person::Reader person : addressBook.getPeople()) {
    std::cout << person.getName().cStr() << ": "
              << person.getEmail().cStr() << std::endl;
    for (Person::PhoneNumber::Reader phone: person.getPhones()) {
      const char* typeName = "UNKNOWN";
      switch (phone.getType()) {
        case Person::PhoneNumber::Type::MOBILE: typeName = "mobile"; break;
        case Person::PhoneNumber::Type::HOME: typeName = "home"; break;
        case Person::PhoneNumber::Type::WORK: typeName = "work"; break;
      }
      std::cout << "  " << typeName << " phone: "
                << phone.getNumber().cStr() << std::endl;
    }
    Person::Employment::Reader employment = person.getEmployment();
    switch (employment.which()) {
      case Person::Employment::UNEMPLOYED:
        std::cout << "  unemployed" << std::endl;
        break;
      case Person::Employment::EMPLOYER:
        std::cout << "  employer: "
                  << employment.getEmployer().cStr() << std::endl;
        break;
      case Person::Employment::SCHOOL:
        std::cout << "  student at: "
                  << employment.getSchool().cStr() << std::endl;
        break;
      case Person::Employment::SELF_EMPLOYED:
        std::cout << "  self-employed" << std::endl;
        break;
    }
  }
}
{% endhighlight %}

## C++ Feature Usage:  C++11, Exceptions

This implementation makes use of C++11 features.  If you are using GCC, you will need at least
version 4.7 to compile Cap'n Proto.  If you are using Clang, you will need at least version 3.2.
These compilers require the flag `-std=c++11` to enable C++11 features -- your code which
`#include`s Cap'n Proto headers will need to be compiled with this flag.  Other compilers have not
been tested at this time.

This implementation prefers to handle errors using exceptions.  Exceptions are only used in
circumstances that should never occur in normal operation.  For example, exceptions are thrown
on assertion failures (indicating bugs in the code), network failures, and invalid input.
Exceptions thrown by Cap'n Proto are never part of the interface and never need to be caught in
correct usage.  The purpose of throwing exceptions is to allow higher-level code a chance to
recover from unexpected circumstances without disrupting other work happening in the same process.
For example, a server that handles requests from multiple clients should, on exception, return an
error to the client that caused the exception and close that connection, but should continue
handling other connections normally.

When Cap'n Proto code might throw an exception from a destructor, it first checks
`std::uncaught_exception()` to ensure that this is safe.  If another exception is already active,
the new exception is assumed to be a side-effect of the main exception, and is either silently
swallowed or reported on a side channel.

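A minimal sketch of that destructor check, using only the standard library.  `FlushOnExit` and
`isUnwinding` are hypothetical names, and this is not Cap'n Proto's implementation.  Because
`std::uncaught_exception()` was deprecated in C++17, the helper switches to
`std::uncaught_exceptions()` when the compiler supports it.  A failing flush throws when no
other exception is in flight, and is appended to a side-channel string when one is.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// True while some exception is propagating through the stack.
inline bool isUnwinding() {
#if __cplusplus >= 201703L
  return std::uncaught_exceptions() > 0;  // C++17 and later
#else
  return std::uncaught_exception();       // C++11/14, as named in the text
#endif
}

// Hypothetical wrapper whose cleanup step can fail.
class FlushOnExit {
public:
  FlushOnExit(bool flushWillFail, std::string& sideChannel)
      : flushWillFail_(flushWillFail), sideChannel_(sideChannel) {}

  // Destructors are noexcept by default since C++11, so opt out explicitly.
  ~FlushOnExit() noexcept(false) {
    if (!flushWillFail_) return;
    if (isUnwinding()) {
      // Another exception is already active; throwing now would call
      // std::terminate().  Report on a side channel instead.
      sideChannel_ += "flush failed during unwinding\n";
    } else {
      throw std::runtime_error("flush failed");
    }
  }

private:
  bool flushWillFail_;
  std::string& sideChannel_;
};
```
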
In recognition of the fact that some teams prefer not to use exceptions, and that even enabling
exceptions in the compiler introduces overhead, Cap'n Proto allows you to disable them entirely
by registering your own exception callback.  The callback will be called in place of throwing an
exception.  The callback may abort the process, and is required to do so in certain circumstances
(when a fatal bug is detected).  If the callback returns normally, Cap'n Proto will attempt
to continue by inventing "safe" values.  This will lead to garbage output, but at least the program
will not crash.  Your exception callback should set some sort of flag indicating that an error
occurred, and somewhere up the stack you should check for that flag and cancel the operation.
See the header `kj/exception.h` for details on how to register an exception callback.

## KJ Library

Cap'n Proto is built on top of a basic utility library called KJ.  The two were actually developed
together -- KJ is simply the stuff which is not specific to Cap'n Proto serialization, and may be
useful to others independently of Cap'n Proto.  For now, the two are distributed together.  The
name "KJ" has no particular meaning; it was chosen to be short and easy to type.

As of v0.3, KJ is distributed with Cap'n Proto but built as a separate library.  You may need
to explicitly link against both libraries:  `-lcapnp -lkj`

## Generating Code

To generate C++ code from your `.capnp` [interface definition](language.html), run:

    capnp compile -oc++ myproto.capnp

This will create `myproto.capnp.h` and `myproto.capnp.c++` in the same directory as `myproto.capnp`.

To use this code in your app, you must link against both `libcapnp` and `libkj`.  If you use
`pkg-config`, Cap'n Proto provides the `capnp` module to simplify discovery of compiler and linker
flags.

If you use [RPC](cxxrpc.html) (i.e., your schema defines [interfaces](language.html#interfaces)),
then you will additionally need to link against `libcapnp-rpc` and `libkj-async`, or use the
`capnp-rpc` `pkg-config` module.

### Setting a Namespace

You probably want your generated types to live in a C++ namespace.  You will need to import
`/capnp/c++.capnp` and use the `namespace` annotation it defines:

{% highlight capnp %}
using Cxx = import "/capnp/c++.capnp";
$Cxx.namespace("foo::bar::baz");
{% endhighlight %}

Note that `capnp/c++.capnp` is installed in `$PREFIX/include` (`/usr/local/include` by default)
when you install the C++ runtime.  The `capnp` tool automatically searches `/usr/include` and
`/usr/local/include` for imports that start with a `/`, so it should "just work".  If you installed
somewhere else, you may need to add it to the search path with the `-I` flag to `capnp compile`,
which works much like the compiler flag of the same name.

## Types

### Primitive Types

Primitive types map to the obvious C++ types:

* `Bool` -> `bool`
* `IntNN` -> `intNN_t`
* `UIntNN` -> `uintNN_t`
* `Float32` -> `float`
* `Float64` -> `double`
* `Void` -> `::capnp::Void` (An empty struct; its only value is `::capnp::VOID`)

### Structs

For each struct `Foo` in your interface, a C++ type named `Foo` is generated.  This type itself is
really just a namespace; it contains two important inner classes:  `Reader` and `Builder`.

`Reader` represents a read-only instance of `Foo` while `Builder` represents a writable instance
(usually, one that you are building).  Both classes behave like pointers, in that you can pass them
by value and they do not own the underlying data that they operate on.  In other words,
`Foo::Builder` is like a pointer to a `Foo` while `Foo::Reader` is like a const pointer to a `Foo`.

For every field `bar` defined in `Foo`, `Foo::Reader` has a method `getBar()`.  For primitive types,
`get` just returns the type, but for structs, lists, and blobs, it returns a `Reader` for the
type.

{% highlight c++ %}
// Example Reader methods:

// myPrimitiveField @0 :Int32;
int32_t getMyPrimitiveField();

// myTextField @1 :Text;
::capnp::Text::Reader getMyTextField();
// (Note that Text::Reader may be implicitly cast to const char* and
// std::string.)

// myStructField @2 :MyStruct;
MyStruct::Reader getMyStructField();

// myListField @3 :List(Float64);
::capnp::List<double>::Reader getMyListField();
{% endhighlight %}

`Foo::Builder`, meanwhile, has several methods for each field `bar`:

* `getBar()`:  For primitives, returns the value.  For composites, returns a Builder for the
  composite.  If a composite field has not been initialized (i.e. this is the first time it has
  been accessed), it will be initialized to a copy of the field's default value before returning.
* `setBar(x)`:  For primitives, sets the value to x.  For composites, sets the value to a deep copy
  of x, which must be a Reader for the type.
* `initBar(n)`:  Only for lists and blobs.  Sets the field to a newly-allocated list or blob
  of size n and returns a Builder for it.  The elements of the list are initialized to their empty
  state (zero for numbers, default values for structs).
* `initBar()`:  Only for structs.  Sets the field to a newly-allocated struct and returns a
  Builder for it.  Note that the newly-allocated struct is initialized to the default value for
  the struct's _type_ (i.e., all-zero) rather than the default value for the field `bar` (if it
  has one).
* `hasBar()`:  Only for pointer fields (e.g. structs, lists, blobs).  Returns true if the pointer
  has been initialized (non-null).  (This method is also available on readers.)
* `adoptBar(x)`:  Only for pointer fields.  Adopts the orphaned object x, linking it into the field
  `bar` without copying.  See the section on orphans.
* `disownBar()`:  Disowns the value pointed to by `bar`, setting the pointer to null and returning
  its previous value as an orphan.  See the section on orphans.

{% highlight c++ %}
// Example Builder methods:

// myPrimitiveField @0 :Int32;
int32_t getMyPrimitiveField();
void setMyPrimitiveField(int32_t value);

// myTextField @1 :Text;
::capnp::Text::Builder getMyTextField();
void setMyTextField(::capnp::Text::Reader value);
::capnp::Text::Builder initMyTextField(size_t size);
// (Note that Text::Reader is implicitly constructible from const char*
// and std::string, and Text::Builder can be implicitly cast to
// these types.)

// myStructField @2 :MyStruct;
MyStruct::Builder getMyStructField();
void setMyStructField(MyStruct::Reader value);
MyStruct::Builder initMyStructField();

// myListField @3 :List(Float64);
::capnp::List<double>::Builder getMyListField();
void setMyListField(::capnp::List<double>::Reader value);
::capnp::List<double>::Builder initMyListField(size_t size);
{% endhighlight %}

### Groups

Groups look a lot like a combination of a nested type and a field of that type, except that you
cannot set, adopt, or disown a group -- you can only get and init it.

### Unions

A named union (as opposed to an unnamed one) works just like a group, except with some additions:

* For each field `foo`, the union reader and builder have a method `isFoo()` which returns true
  if `foo` is the currently-set field in the union.
* The union reader and builder also have a method `which()` that returns an enum value indicating
  which field is currently set.
* Calling the set, init, or adopt accessors for a field makes it the currently-set field.
* Calling the get or disown accessors on a field that isn't currently set will throw an
  exception in debug mode or return garbage when `NDEBUG` is defined.

Unnamed unions differ from named unions only in that the accessor methods from the union's members
are added directly to the containing type's reader and builder, rather than generating a nested
type.

See the [example](#example-usage) at the top of the page for an example of unions.

### Lists

Lists are represented by the type `capnp::List<T>`, where `T` is any of the primitive types,
any Cap'n Proto user-defined type, `capnp::Text`, `capnp::Data`, or `capnp::List<U>`
(to form a list of lists).

The type `List<T>` itself is not instantiatable, but has two inner classes: `Reader` and `Builder`.
As with structs, these types behave like pointers to read-only and read-write data, respectively.

Both `Reader` and `Builder` implement `size()`, `operator[]`, `begin()`, and `end()`, as good C++
containers should.  Note, though, that `operator[]` is read-only -- you cannot use it to assign
the element, because that would require returning a reference, which is impossible because the
underlying data may not be in your CPU's native format (e.g., wrong byte order).  Instead, to
assign an element of a list, you must use `builder.set(index, value)`.

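To see why a by-value `operator[]` plus an explicit `set()` follows from the byte-order point,
here is a standalone sketch.  `U32ListView`, `loadLe32`, and `storeLe32` are hypothetical names
and this is not how `capnp::List` is implemented; the sketch only assumes that elements are
stored little-endian on the wire, as Cap'n Proto's encoding does.  Since each value must be
decoded on read and encoded on write, there is no native `uint32_t` object to hand out a
reference to.

```cpp
#include <cstddef>
#include <cstdint>

// Decode a 32-bit little-endian value starting at bytes[0].
inline std::uint32_t loadLe32(const unsigned char* bytes) {
  return static_cast<std::uint32_t>(bytes[0]) |
         (static_cast<std::uint32_t>(bytes[1]) << 8) |
         (static_cast<std::uint32_t>(bytes[2]) << 16) |
         (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// Encode a 32-bit value as little-endian bytes, whatever the host order is.
inline void storeLe32(unsigned char* bytes, std::uint32_t value) {
  for (int i = 0; i < 4; ++i) {
    bytes[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xff);
  }
}

// Hypothetical non-owning view over wire-format UInt32 elements.
class U32ListView {
public:
  U32ListView(unsigned char* data, std::size_t count)
      : data_(data), count_(count) {}

  std::size_t size() const { return count_; }

  // Returns a decoded copy; a reference into the buffer would expose
  // bytes that may not be in the host's native order.
  std::uint32_t operator[](std::size_t index) const {
    return loadLe32(data_ + 4 * index);
  }

  // Writes go through an explicit setter that encodes the value.
  void set(std::size_t index, std::uint32_t value) {
    storeLe32(data_ + 4 * index, value);
  }

private:
  unsigned char* data_;
  std::size_t count_;
};
```
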
For `List<Foo>` where `Foo` is a non-primitive type, the type returned by `operator[]` and
`iterator::operator*()` is `Foo::Reader` (for `List<Foo>::Reader`) or `Foo::Builder`
(for `List<Foo>::Builder`).  The builder's `set` method takes a `Foo::Reader` as its second
parameter.

For lists of lists or lists of blobs, the builder also has a method `init(index, size)` which sets
the element at the given index to a newly-allocated value with the given size and returns a builder
for it.  Struct lists do not have an `init` method because all elements are initialized to empty
values when the list is created.

### Enums

Cap'n Proto enums become C++11 "enum classes".  That means they behave like any other enum, but
the enum's values are scoped within the type.  E.g. for an enum `Foo` with value `bar`, you must
refer to the value as `Foo::BAR`.

To match prevailing C++ style, an enum's value names are converted to UPPERCASE_WITH_UNDERSCORES
(whereas in the schema language you'd write them in camelCase).

Keep in mind when writing `switch` blocks that an enum read off the wire may have a numeric
value that is not listed in its definition.  This may be the case if the sender is using a newer
version of the protocol, or if the message is corrupt or malicious.  In C++11, enums are allowed
to have any value that is within the range of their base type, which for Cap'n Proto enums is
`uint16_t`.

### Blobs (Text and Data)

Blobs are manipulated using the classes `capnp::Text` and `capnp::Data`.  These classes are,
again, just containers for inner classes `Reader` and `Builder`.  These classes are iterable and
implement `size()` and `operator[]` methods.  `Builder::operator[]` even returns a reference
(unlike with `List<T>`).  `Text::Reader` additionally has a method `cStr()` which returns a
NUL-terminated `const char*`.

As a special convenience, if you are using GCC 4.8+ or Clang, `Text::Reader` (and its underlying
type, `kj::StringPtr`) can be implicitly converted to and from `std::string` format.  This is
accomplished without actually `#include`ing `<string>`, since some clients do not want to rely
on this rather-bulky header.  In fact, any class which defines a `.c_str()` method will be
implicitly convertible in this way.  Unfortunately, this trick doesn't work on GCC 4.7.

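The following sketch shows one way a conversion keyed on `.c_str()` can be written without
including `<string>`.  It is an assumption about the general technique, not a copy of KJ's
code: `StrRef` is a hypothetical stand-in for a non-owning string pointer, and `TinyString`
is a hypothetical user type that plays the role of `std::string`.  Both template members are
enabled only for types that have a `c_str()` member.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical non-owning string reference; NOT kj::StringPtr.
class StrRef {
public:
  StrRef(const char* s) : ptr_(s), size_(std::strlen(s)) {}

  // Implicit conversion FROM any type that has a c_str() member.
  template <typename T,
            typename = decltype(std::declval<const T&>().c_str())>
  StrRef(const T& t) : StrRef(t.c_str()) {}

  // Implicit conversion TO any type that has a c_str() member and can
  // be constructed from a const char*.
  template <typename T,
            typename = decltype(std::declval<const T&>().c_str())>
  operator T() const { return T(ptr_); }

  const char* cStr() const { return ptr_; }
  std::size_t size() const { return size_; }

private:
  const char* ptr_;
  std::size_t size_;
};

// Minimal user-defined string type standing in for std::string.
struct TinyString {
  char buf[32];
  TinyString(const char* s) {
    std::strncpy(buf, s, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';  // strncpy does not always terminate
  }
  const char* c_str() const { return buf; }
};
```
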
### Interfaces

[Interfaces (RPC) have their own page.](cxxrpc.html)

### Generics

[Generic types](language.html#generic-types) become templates in C++. The outer type (the one whose
name matches the schema declaration's name) is templatized; the inner `Reader` and `Builder` types
are not, because they inherit the parameters from the outer type. Similarly, template parameters
should refer to outer types, not `Reader` or `Builder` types.

For example, given:

{% highlight capnp %}
struct Map(Key, Value) {
  entries @0 :List(Entry);
  struct Entry {
    key @0 :Key;
    value @1 :Value;
  }
}

struct People {
  byName @0 :Map(Text, Person);
  # Maps names to Person instances.
}
{% endhighlight %}

You might write code like:

{% highlight c++ %}
void processPeople(People::Reader people) {
  Map<Text, Person>::Reader reader = people.getByName();
  capnp::List<Map<Text, Person>::Entry>::Reader entries =
      reader.getEntries();
  for (auto entry: entries) {
    processPerson(entry);
  }
}
{% endhighlight %}

Note that all template parameters will be specified with a default value of `AnyPointer`.
Therefore, the type `Map<>` is equivalent to `Map<capnp::AnyPointer, capnp::AnyPointer>`.

### Constants

Constants are exposed with their names converted to UPPERCASE_WITH_UNDERSCORES naming style
(whereas in the schema language you'd write them in camelCase).  Primitive constants are just
`constexpr` values.  Pointer-type constants (e.g. structs, lists, and blobs) are represented
using a proxy object that can be converted to the relevant `Reader` type, either implicitly or
using the unary `*` or `->` operators.

## Messages and I/O

To create a new message, you must start by creating a `capnp::MessageBuilder`
(`capnp/message.h`).  This is an abstract type which you can implement yourself, but most users
will want to use `capnp::MallocMessageBuilder`.  Once your message is constructed, write it to
a file descriptor with `capnp::writeMessageToFd(fd, builder)` (`capnp/serialize.h`) or
`capnp::writePackedMessageToFd(fd, builder)` (`capnp/serialize-packed.h`).

To read a message, you must create a `capnp::MessageReader`, which is another abstract type.
Implementations are specific to the data source.  You can use `capnp::StreamFdMessageReader`
(`capnp/serialize.h`) or `capnp::PackedFdMessageReader` (`capnp/serialize-packed.h`)
to read from file descriptors; both take the file descriptor as a constructor argument.

Note that if your stream contains additional data after the message, `PackedFdMessageReader` may
accidentally read some of that data, since it does buffered I/O.  To make this work correctly, you
will need to set up a multi-use buffered stream.  Buffered I/O may also be a good idea with
`StreamFdMessageReader` and also when writing, for performance reasons.  See `capnp/io.h` for
details.

There is an [example](#example-usage) of all this at the beginning of this page.

### Using mmap

Cap'n Proto can be used together with `mmap()` (or Win32's `MapViewOfFile()`) for extremely fast
reads, especially when you only need to use a subset of the data in the file.  Currently,
Cap'n Proto is not well-suited for _writing_ via `mmap()`, only reading, but this is only because
we have not yet invented a mutable segment framing format -- the underlying design should
eventually work for both.

To take advantage of `mmap()` at read time, write your file in regular serialized (but NOT packed)
format -- that is, use `writeMessageToFd()`, _not_ `writePackedMessageToFd()`.  Then `mmap()`
the entire file and pass the mapped memory to the constructor of
`capnp::FlatArrayMessageReader` (defined in `capnp/serialize.h`).  That's it.  You can use the
reader just like a normal `StreamFdMessageReader`.  The operating system will automatically page
in data from disk as you read it.

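The mapping step can be sketched with POSIX calls alone.  `MappedWords`, `mapFile`, and
`unmapFile` are hypothetical helpers, not part of Cap'n Proto, and the sketch assumes a POSIX
system.  It stops at the point where the mapped memory, measured in 8-byte words, would be
handed to `capnp::FlatArrayMessageReader`, so that the block compiles without the library;
the comment marking that hand-off describes an assumed usage rather than tested code.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// A read-only mapping of a whole file, measured in 8-byte words.
struct MappedWords {
  const void* data = nullptr;
  std::size_t byteSize = 0;
  std::size_t wordCount = 0;
};

// Maps the file at `path` read-only.  Returns false on any failure.
inline bool mapFile(const char* path, MappedWords& out) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return false;

  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0) {
    close(fd);
    return false;
  }

  std::size_t size = static_cast<std::size_t>(st.st_size);
  void* mapping = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping remains valid after the descriptor is closed
  if (mapping == MAP_FAILED) return false;

  out.data = mapping;
  out.byteSize = size;
  out.wordCount = size / 8;
  // Assumed next step with Cap'n Proto available: wrap out.data and
  // out.wordCount as an array of capnp::word and pass it to the
  // capnp::FlatArrayMessageReader constructor.  The mapping must stay
  // alive for as long as that reader is in use.
  return true;
}

// Releases a mapping created by mapFile().
inline void unmapFile(MappedWords& mapped) {
  if (mapped.data != nullptr) {
    munmap(const_cast<void*>(mapped.data), mapped.byteSize);
  }
  mapped = MappedWords();
}
```
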
`mmap()` works best when reading from flash media, or when the file is already hot in cache.
It works less well with slow rotating disks.  Here, disk seeks make random access relatively
expensive.  Also, if I/O throughput is your bottleneck, then the fact that mmaped data cannot
be packed or compressed may hurt you.  However, it all depends on what fraction of the file you're
actually reading -- if you only pull one field out of one deeply-nested struct in a huge tree, it
may still be a win.  The only way to know for sure is to do benchmarks!  (But be careful to make
sure your benchmark is actually interacting with disk and not cache.)

## Dynamic Reflection

Sometimes you want to write generic code that operates on arbitrary types, iterating over the
fields or looking them up by name.  For example, you might want to write code that encodes
arbitrary Cap'n Proto types in JSON format.  This requires something like "reflection", but C++
does not offer reflection.  Also, you might even want to operate on types that aren't compiled
into the binary at all, but only discovered at runtime.

The C++ API supports inspecting schemas at runtime via the interface defined in
`capnp/schema.h`, and dynamically reading and writing instances of arbitrary types via
`capnp/dynamic.h`.  Here's the example from the beginning of this file rewritten in terms
of the dynamic API:

{% highlight c++ %}
#include "addressbook.capnp.h"
#include <capnp/message.h>
#include <capnp/serialize-packed.h>
#include <iostream>
#include <capnp/schema.h>
#include <capnp/dynamic.h>

using ::capnp::DynamicValue;
using ::capnp::DynamicStruct;
using ::capnp::DynamicEnum;
using ::capnp::DynamicList;
using ::capnp::List;
using ::capnp::Schema;
using ::capnp::StructSchema;
using ::capnp::EnumSchema;

using ::capnp::Void;
using ::capnp::Text;
using ::capnp::MallocMessageBuilder;
using ::capnp::PackedFdMessageReader;

void dynamicWriteAddressBook(int fd, StructSchema schema) {
  // Write a message using the dynamic API to set each
  // field by text name.  This isn't something you'd
  // normally want to do; it's just for illustration.

  MallocMessageBuilder message;

  // Types shown for explanation purposes; normally you'd
  // use auto.
  DynamicStruct::Builder addressBook =
      message.initRoot<DynamicStruct>(schema);

  DynamicList::Builder people =
      addressBook.init("people", 2).as<DynamicList>();

  DynamicStruct::Builder alice =
      people[0].as<DynamicStruct>();
  alice.set("id", 123);
  alice.set("name", "Alice");
  alice.set("email", "alice@example.com");
  auto alicePhones = alice.init("phones", 1).as<DynamicList>();
  auto phone0 = alicePhones[0].as<DynamicStruct>();
  phone0.set("number", "555-1212");
  phone0.set("type", "mobile");
  alice.get("employment").as<DynamicStruct>()
       .set("school", "MIT");

  auto bob = people[1].as<DynamicStruct>();
  bob.set("id", 456);
  bob.set("name", "Bob");
  bob.set("email", "bob@example.com");

  // Some magic:  We can convert a dynamic sub-value back to
  // the native type with as<T>()!
  List<Person::PhoneNumber>::Builder bobPhones =
      bob.init("phones", 2).as<List<Person::PhoneNumber>>();
  bobPhones[0].setNumber("555-4567");
  bobPhones[0].setType(Person::PhoneNumber::Type::HOME);
  bobPhones[1].setNumber("555-7654");
  bobPhones[1].setType(Person::PhoneNumber::Type::WORK);
  bob.get("employment").as<DynamicStruct>()
     .set("unemployed", ::capnp::VOID);

  writePackedMessageToFd(fd, message);
}

void dynamicPrintValue(DynamicValue::Reader value) {
  // Print an arbitrary message via the dynamic API by
  // iterating over the schema.  Look at the handling
  // of STRUCT in particular.

  switch (value.getType()) {
    case DynamicValue::VOID:
      std::cout << "";
      break;
    case DynamicValue::BOOL:
      std::cout << (value.as<bool>() ? "true" : "false");
      break;
    case DynamicValue::INT:
      std::cout << value.as<int64_t>();
      break;
    case DynamicValue::UINT:
      std::cout << value.as<uint64_t>();
      break;
    case DynamicValue::FLOAT:
      std::cout << value.as<double>();
      break;
    case DynamicValue::TEXT:
      std::cout << '\"' << value.as<Text>().cStr() << '\"';
      break;
    case DynamicValue::LIST: {
      std::cout << "[";
      bool first = true;
      for (auto element: value.as<DynamicList>()) {
        if (first) {
          first = false;
        } else {
          std::cout << ", ";
        }
        dynamicPrintValue(element);
      }
      std::cout << "]";
      break;
    }
    case DynamicValue::ENUM: {
      auto enumValue = value.as<DynamicEnum>();
      KJ_IF_MAYBE(enumerant, enumValue.getEnumerant()) {
        std::cout <<
            enumerant->getProto().getName().cStr();
      } else {
        // Unknown enum value; output raw number.
        std::cout << enumValue.getRaw();
      }
      break;
    }
    case DynamicValue::STRUCT: {
      std::cout << "(";
      auto structValue = value.as<DynamicStruct>();
      bool first = true;
      for (auto field: structValue.getSchema().getFields()) {
        if (!structValue.has(field)) continue;
        if (first) {
          first = false;
        } else {
          std::cout << ", ";
        }
        std::cout << field.getProto().getName().cStr()
                  << " = ";
        dynamicPrintValue(structValue.get(field));
      }
      std::cout << ")";
      break;
    }
    default:
      // There are other types, we aren't handling them.
      std::cout << "?";
      break;
  }
}

void dynamicPrintMessage(int fd, StructSchema schema) {
  PackedFdMessageReader message(fd);
  dynamicPrintValue(message.getRoot<DynamicStruct>(schema));
  std::cout << std::endl;
}
{% endhighlight %}

Notes about the dynamic API:

* You can implicitly cast any compiled Cap'n Proto struct reader/builder type directly to
  `DynamicStruct::Reader`/`DynamicStruct::Builder`.  Similarly with `List<T>` and `DynamicList`,
  and even enum types and `DynamicEnum`.  Finally, all valid Cap'n Proto field types may be
  implicitly converted to `DynamicValue`.

* You can load schemas dynamically at runtime using `SchemaLoader` (`capnp/schema-loader.h`) and
  use the Dynamic API to manipulate objects of these types.  `MessageBuilder` and `MessageReader`
  have methods for accessing the message root using a dynamic schema.

* While `SchemaLoader` loads binary schemas, you can also parse directly from text using
  `SchemaParser` (`capnp/schema-parser.h`).  However, this requires linking against `libcapnpc`
  (in addition to `libcapnp` and `libkj`) -- this code is bulky and not terribly efficient.  If
  you can arrange to use only binary schemas at runtime, you'll be better off.

* Unlike with Protobufs, there is no "global registry" of compiled-in types.  To get the schema
  for a compiled-in type, use `capnp::Schema::from<MyType>()`.

* Unlike with Protobufs, the overhead of supporting reflection is small.  Generated `.capnp.c++`
  files contain only some embedded const data structures describing the schema, no code at all,
  and the runtime library support code is relatively small.  Moreover, if you do not use the
  dynamic API or the schema API, you do not even need to link their implementations into your
  executable.

* The dynamic API performs type checks at runtime.  In case of error, it will throw an exception.
  If you compile with `-fno-exceptions`, it will crash instead.  Correct usage of the API should
  never throw, but bugs happen.  Enabling and catching exceptions will make your code more robust.

* Loading user-provided schemas has security implications: it greatly increases the attack
  surface of the Cap'n Proto library.  In particular, it is easy for an attacker to trigger
  exceptions.  To protect yourself, you are strongly advised to enable exceptions and catch them.

    661 ## Orphans
    662 
    663 An "orphan" is a Cap'n Proto object that is disconnected from the message structure.  That is,
    664 it is not the root of a message, and there is no other Cap'n Proto object holding a pointer to it.
    665 Thus, it has no parents.  Orphans are an advanced feature that can help avoid copies and make it
    666 easier to use Cap'n Proto objects as part of your application's internal state.  Typical
    667 applications probably won't use orphans.
    668 
    669 The class `capnp::Orphan<T>` (defined in `<capnp/orphan.h>`) represents a pointer to an orphaned
    670 object of type `T`.  `T` can be any struct type, `List<T>`, `Text`, or `Data`.  E.g.
    671 `capnp::Orphan<Person>` would be an orphaned `Person` structure.  `Orphan<T>` is a move-only class,
    672 similar to `std::unique_ptr<T>`.  This prevents two different objects from adopting the same
    673 orphan, which would result in an invalid message.
    674 
    675 An orphan can be "adopted" by another object to link it into the message structure.  Conversely,
    676 an object can "disown" one of its pointers, causing the pointed-to object to become an orphan.
    677 Every pointer-typed field `foo` provides builder methods `adoptFoo()` and `disownFoo()` for these
    678 purposes.  Again, these methods use C++11 move semantics.  To use them, you will need to be
    679 familiar with `std::move()` (or the equivalent but shorter-named `kj::mv()`).
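
As a rough illustration of these move semantics, here is a toy model that uses `std::unique_ptr` as a stand-in for `Orphan<T>`.  The names `ToyOrphan`, `ToyPerson`, and `transferName()` are hypothetical and not part of Cap'n Proto; the sketch only shows how an `adoptFoo()`/`disownFoo()` pair hands ownership back and forth, not how message memory is managed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Toy stand-in for capnp::Orphan<Text>: a move-only handle to a detached value.
using ToyOrphan = std::unique_ptr<std::string>;
static_assert(!std::is_copy_constructible<ToyOrphan>::value,
              "an orphan cannot be copied, so it cannot be adopted twice");

// Toy stand-in for a struct builder with a single pointer field, `name`.
class ToyPerson {
public:
  bool hasName() const { return name_ != nullptr; }

  const std::string& getName() const {
    assert(hasName());
    return *name_;
  }

  // Links the orphan into this object; the caller's handle becomes null.
  void adoptName(ToyOrphan&& orphan) { name_ = std::move(orphan); }

  // Unlinks the field's value and returns it as an orphan; the field becomes null.
  ToyOrphan disownName() { return std::move(name_); }

private:
  ToyOrphan name_;
};

// Moves `name` from one object to another without copying the string.
inline void transferName(ToyPerson& from, ToyPerson& to) {
  to.adoptName(from.disownName());
}
```

After `transferName(alice, bob)`, `alice` no longer has a name and `bob` holds the original string.  Passing the same orphan to two `adoptName()` calls does not compile unless you `std::move()` it explicitly, and after the first move the handle is null.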
    680 
    681 Even though an orphan is unlinked from the message tree, it still resides inside memory allocated
    682 for a particular message (i.e. a particular `MessageBuilder`).  An orphan can only be adopted by
    683 objects that live in the same message.  To move objects between messages, you must perform a copy.
    684 If the message is serialized while an `Orphan<T>` living within it still exists, the orphan's
    685 content will be part of the serialized message, but the only way the receiver could find it is by
    686 investigating the raw message; the Cap'n Proto API provides no way to detect or read it.
    687 
    688 To construct an orphan from scratch (without having some other object disown it), you need an
    689 `Orphanage`, which is essentially an orphan factory associated with some message.  You can get one
    690 by calling the `MessageBuilder`'s `getOrphanage()` method, or by calling the static method
    691 `Orphanage::getForMessageContaining(builder)` and passing it any struct or list builder.
    692 
    693 Note that when an `Orphan<T>` goes out-of-scope without being adopted, the underlying memory that
    694 it occupied is overwritten with zeros.  If you use packed serialization, these zeros will take very
    695 little bandwidth on the wire, but will still waste memory on the sending and receiving ends.
    696 Generally, you should avoid allocating message objects that won't be used, or if you cannot avoid
    697 it, arrange to copy the entire message over to a new `MessageBuilder` before serializing, since
    698 only the reachable objects will be copied.
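
To see why an abandoned orphan still costs space, consider this simplified model of a message arena.  `ToyMessage` and its methods are invented for illustration and do not reflect the real `MessageBuilder` internals; the model only tracks object sizes and whether each object is linked into the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy message arena: objects are appended sequentially and never freed individually.
class ToyMessage {
private:
  struct Object {
    std::size_t size;
    bool reachable;
  };
  std::vector<Object> objects_;

public:
  // Allocates an object of `size` bytes and returns its index.  The object starts
  // out unlinked, like a freshly created orphan.
  std::size_t allocate(std::size_t size) {
    objects_.push_back(Object{size, false});
    return objects_.size() - 1;
  }

  // Marks an object as linked into the message tree (for example, adopted by a parent).
  void link(std::size_t index) {
    assert(index < objects_.size());
    objects_[index].reachable = true;
  }

  // Bytes that would be serialized: every allocation counts, reachable or not.
  std::size_t serializedSize() const {
    std::size_t total = 0;
    for (const Object& object : objects_) total += object.size;
    return total;
  }

  // Builds a fresh message that holds only the reachable objects.
  ToyMessage copyReachable() const {
    ToyMessage fresh;
    for (const Object& object : objects_) {
      if (object.reachable) fresh.link(fresh.allocate(object.size));
    }
    return fresh;
  }
};
```

In this model, allocating a 48-byte object that is linked and a 32-byte object that is never linked gives a serialized size of 80 bytes, while the result of `copyReachable()` is only 48 bytes.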
    699 
    700 ## Reference
    701 
    702 The runtime library contains lots of useful features not described on this page.  For now, the
    703 best reference is the header files.  See:
    704 
    705     capnp/list.h
    706     capnp/blob.h
    707     capnp/message.h
    708     capnp/serialize.h
    709     capnp/serialize-packed.h
    710     capnp/schema.h
    711     capnp/schema-loader.h
    712     capnp/dynamic.h
    713 
    714 ## Tips and Best Practices
    715 
    716 Here are some tips for using the C++ Cap'n Proto runtime most effectively:
    717 
    718 * Accessor methods for primitive (non-pointer) fields are fast and inline.  They should be just
    719   as fast as accessing a struct field through a pointer.
    720 
    721 * Accessor methods for pointer fields, on the other hand, are not inline, as they need to validate
    722   the pointer.  If you intend to access the same pointer multiple times, it is a good idea to
    723   save the value to a local variable to avoid repeating this work.  This is generally not a
    724   problem given C++11's `auto`.
    725 
    726   Example:
    727 
    728       // BAD
    729       frob(foo.getBar().getBaz(),
    730            foo.getBar().getQux(),
    731            foo.getBar().getCorge());
    732 
    733       // GOOD
    734       auto bar = foo.getBar();
    735       frob(bar.getBaz(), bar.getQux(), bar.getCorge());
    736 
    737   It is especially important to use this style when reading messages, for another reason:  as
    738   described under the "security tips" section, below, every time you `get` a pointer, Cap'n Proto
    739   increments a counter by the size of the target object.  If that counter hits a pre-defined limit,
    740   an exception is thrown (or a default value is returned, if exceptions are disabled), to prevent
    741   a malicious client from sending your server into an infinite loop with a specially-crafted
    742   message.  If you repeatedly `get` the same object, you are repeatedly counting the same bytes,
    743   and so you may hit the limit prematurely.  (Since Cap'n Proto readers are backed directly by
    744   the underlying message buffer and do not have anywhere else to store per-object information, it
    745   is impossible to remember whether you've seen a particular object already.)
    746 
    747 * Internally, all pointer fields start out "null", even if they have default values.  When you have
    748   a pointer field `foo` and you call `getFoo()` on the containing struct's `Reader`, if the field
    749   is "null", you will receive a reader for that field's default value.  This reader is backed by
    750   read-only memory; nothing is allocated.  However, when you call `get` on a _builder_, and the
    751   field is null, then the implementation must make a _copy_ of the default value to return to you.
    752   Thus, you've caused the field to become non-null, just by "reading" it.  On the other hand, if
    753   you call `init` on that field, you are explicitly replacing whatever value is already there
    754   (null or not) with a newly-allocated instance, and that newly-allocated instance is _not_ a
    755   copy of the field's default value, but just a completely-uninitialized instance of the
    756   appropriate type.
    757 
    758 * It is possible to receive a struct value constructed from a newer version of the protocol than
    759   the one your binary was built with, and that struct might have extra fields that you don't know
    760   about.  The Cap'n Proto implementation tries to avoid discarding this extra data.  If you copy
    761   the struct from one message to another (e.g. by calling a set() method on a parent object), the
    762   extra fields will be preserved.  This makes it possible to build proxies that receive messages
    763   and forward them on without having to rebuild the proxy every time a new field is added.  You
    764   must be careful, however:  in some cases, it's not possible to retain the extra fields, because
    765   they need to be copied into a space that is allocated before the expected content is known.
    766   In particular, lists of structs are represented as a flat array, not as an array of pointers.
    767   Therefore, all memory for all structs in the list must be allocated upfront.  Hence, copying
    768   a struct value from another message into an element of a list will truncate the value.  Because
    769   of this, the setter method for struct lists is called `setWithCaveats()` rather than just `set()`.
    770 
    771 * Messages are built in "arena" or "region" style:  each object is allocated sequentially in
    772   memory, until there is no more room in the segment, in which case a new segment is allocated,
    773   and objects continue to be allocated sequentially in that segment.  This design is what makes
    774   Cap'n Proto possible at all, and it is very fast compared to other allocation strategies.
    775   However, it has the disadvantage that if you allocate an object and then discard it, that memory
    776   is lost.  In fact, the empty space will still become part of the serialized message, even though
    777   it is unreachable.  The implementation will try to zero it out, so at least it should pack well,
    778   but it's still better to avoid this situation.  Some ways that this can happen include:
    779   * If you `init` a field that is already initialized, the previous value is discarded.
    780   * If you create an orphan that is never adopted into the message tree, its space is lost.
    781   * If you use `adoptWithCaveats` to adopt an orphaned struct into a struct list, then a shallow
    782     copy is necessary, since the struct list requires that its elements are sequential in memory.
    783     The previous copy of the struct is discarded (although child objects are transferred properly).
    784   * If you copy a struct value from another message using a `set` method, the copy will have the
    785     same size as the original.  However, the original could have been built with an older version
    786     of the protocol which lacked some fields compared to the version your program was built with.
    787     If you subsequently `get` that struct, the implementation will be forced to allocate a new
    788     (shallow) copy which is large enough to hold all known fields, and the old copy will be
    789     discarded.  Child objects will be transferred over without being copied -- though they might
    790     suffer from the same problem if you `get` them later on.
    791   Sometimes, avoiding these problems is too inconvenient.  Fortunately, it's also possible to
    792   clean up the mess after-the-fact:  if you copy the whole message tree into a fresh
    793   `MessageBuilder`, only the reachable objects will be copied, leaving out all of the unreachable
    794   dead space.
    795 
    796   In the future, Cap'n Proto may be improved such that it can re-use dead space in a message.
    797   However, this will only improve things, not fix them entirely: fragmentation could still leave
    798   dead space.
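
The traversal counter described in the tip about pointer accessors can be pictured with the following toy model.  `ToyReadLimiter` and `countSuccessfulGets()` are hypothetical names, and the real accounting inside the library differs in detail; the point is only that fetching the same object repeatedly spends the budget each time.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Toy read budget: every pointer dereference is charged against a fixed limit.
class ToyReadLimiter {
public:
  explicit ToyReadLimiter(std::uint64_t limitBytes) : remaining_(limitBytes) {}

  // Charges `objectBytes` for following one pointer; throws once the budget cannot
  // cover the object.
  void charge(std::uint64_t objectBytes) {
    if (objectBytes > remaining_) {
      throw std::runtime_error("traversal limit exceeded");
    }
    remaining_ -= objectBytes;
  }

  std::uint64_t remaining() const { return remaining_; }

private:
  std::uint64_t remaining_;
};

// Follows the same pointer up to `times` times and returns how many dereferences
// succeeded before the limit was hit.
inline int countSuccessfulGets(ToyReadLimiter& limiter, std::uint64_t objectBytes, int times) {
  assert(times >= 0);
  int succeeded = 0;
  try {
    for (int i = 0; i < times; ++i) {
      limiter.charge(objectBytes);  // the same bytes are counted on every get
      ++succeeded;
    }
  } catch (const std::runtime_error&) {
    // Budget exhausted; stop reading.
  }
  return succeeded;
}
```

With a 1024-byte budget and a 256-byte object, only four of ten repeated gets succeed in this model.  Saving the result of the first get in a local variable would have cost 256 bytes once.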
    799 
    800 ### Build Tips
    801 
    802 * If you are worried about the binary footprint of the Cap'n Proto library, consider statically
    803   linking with the `--gc-sections` linker flag.  This will allow the linker to drop pieces of the
    804   library that you do not actually use.  For example, many users do not use the dynamic schema and
    805   reflection APIs, which contribute a large fraction of the Cap'n Proto library's overall
    806   footprint.  Keep in mind that if you ever stringify a Cap'n Proto type, the stringification code
    807   depends on the dynamic API; consider only using stringification in debug builds.
    808 
    809   If you are dynamically linking against the system's shared copy of `libcapnp`, don't worry about
    810   its binary size.  Remember that only the code which you actually use will be paged into RAM, and
    811   those pages are shared with other applications on the system.
    812 
    813   Also remember to strip your binary.  In particular, `libcapnpc` (the schema parser) has
    814   excessively large symbol names caused by its use of template-based parser combinators.  Stripping
    815   the binary greatly reduces its size.
    816 
    817 * The Cap'n Proto library has lots of debug-only asserts that are removed if you `#define NDEBUG`,
    818   including in headers.  If you care at all about performance, you should compile your production
    819   binaries with the `-DNDEBUG` compiler flag.  In fact, if Cap'n Proto detects that you have
    820   optimization enabled but have not defined `NDEBUG`, it will define it for you (with a warning),
    821   unless you define `DEBUG` or `KJ_DEBUG` to explicitly request debugging.
    822 
    823 ### Security Tips
    824 
    825 Cap'n Proto has not yet undergone security review.  It most likely has some vulnerabilities.  You
    826 should not attempt to decode Cap'n Proto messages from sources you don't trust at this time.
    827 
    828 However, assuming the Cap'n Proto implementation hardens up eventually, then the following security
    829 tips will apply.
    830 
    831 * It is highly recommended that you enable exceptions.  When compiled with `-fno-exceptions`,
    832   Cap'n Proto categorizes exceptions into "fatal" and "recoverable" varieties.  Fatal exceptions
    833   cause the server to crash, while recoverable exceptions are handled by logging an error and
    834   returning a "safe" garbage value.  Fatal is preferred in cases where it's unclear what kind of
    835   garbage value would constitute "safe".  The more of the library you use, the higher the chance
    836   that you will leave yourself open to the possibility that an attacker could trigger a fatal
    837   exception somewhere.  If you enable exceptions, then you can catch the exception instead of
    838   crashing, and return an error just to the attacker rather than to everyone using your server.
    839 
    840   Basic parsing of Cap'n Proto messages shouldn't ever trigger fatal exceptions (assuming the
    841   implementation is not buggy).  However, the dynamic API -- especially if you are loading schemas
    842   controlled by the attacker -- is much more exception-happy.  If you cannot use exceptions, then
    843   you are advised to avoid the dynamic API when dealing with untrusted data.
    844 
    845 * If you need to process schemas from untrusted sources, take them in binary format, not text.
    846   The text parser is a much larger attack surface and not designed to be secure.  For instance,
    847   as of this writing, it is trivial to deadlock the parser by simply writing a constant whose value
    848   depends on itself.
    849 
    850 * Cap'n Proto automatically applies two artificial limits on messages for security reasons:
    851   a limit on nesting depth, and a limit on total bytes traversed.
    852 
    853   * The nesting depth limit is designed to prevent stack overflow when handling a deeply-nested
    854     recursive type, and defaults to 64.  If your types aren't recursive, it is highly unlikely
    855     that you would ever hit this limit, and even if they are recursive, it's still unlikely.
    856 
    857   * The traversal limit is designed to defend against maliciously-crafted messages which use
    858     pointer cycles or overlapping objects to make a message appear much larger than its size on
    859     the wire.  While cycles and overlapping objects are illegal, they are hard to detect reliably.
    860     Instead, Cap'n Proto places a limit on how many bytes worth of objects you can _dereference_
    861     before it throws an exception.  This limit is assessed every time you follow a pointer.  By
    862     default, the limit is 64MiB (this may change in the future).  `StreamFdMessageReader` will
    863     actually reject upfront any message which is larger than the traversal limit, even before you
    864     start reading it.
    865 
    866     If you need to write your code in such a way that you might frequently re-read the same
    867     pointers, instead of increasing the traversal limit to the point where it is no longer useful,
    868     consider simply copying the message into a new `MallocMessageBuilder` before starting.  Then,
    869     the traversal limit will be enforced only during the copy.  There is no traversal limit on
    870     objects once they live in a `MessageBuilder`, even if you use `.asReader()` to convert a
    871     particular object's builder to the corresponding reader type.
    872 
    873   Both limits may be increased using `capnp::ReaderOptions`, defined in `capnp/message.h`.
    874 
    875 * Remember that enums on the wire may have a numeric value that does not match any value defined
    876   in the schema.  Your `switch()` statements must always have a safe default case.
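
The last point can be sketched as follows.  `PhoneType` here is a hand-written copy of the `PhoneNumber.Type` enum from the example schema, included so that the snippet stands alone; in real code you would switch on the generated enum type instead.  `describePhoneType()` is a hypothetical helper.

```cpp
#include <cstdint>
#include <string>

// Stand-alone copy of the example schema's PhoneNumber.Type enum (illustration only).
// A 16-bit underlying type is assumed here.
enum class PhoneType : std::uint16_t { MOBILE = 0, HOME = 1, WORK = 2 };

// Returns a label for the phone type, tolerating values this binary does not know.
inline std::string describePhoneType(PhoneType type) {
  switch (type) {
    case PhoneType::MOBILE: return "mobile";
    case PhoneType::HOME:   return "home";
    case PhoneType::WORK:   return "work";
    default:
      // The value may come from a newer schema version or from a malicious peer.
      return "unknown(" + std::to_string(static_cast<unsigned>(type)) + ")";
  }
}
```

The `default` branch keeps the function well-defined for any 16-bit value, so an unexpected enumerant degrades to a harmless label instead of falling off the end of the `switch`.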
    877 
    878 ## Lessons Learned from Protocol Buffers
    879 
    880 The author of Cap'n Proto's C++ implementation previously wrote version 2 of Google's
    881 Protocol Buffers.  As a result, Cap'n Proto's implementation benefits from a number of lessons
    882 learned the hard way:
    883 
    884 * Protobuf generated code is enormous due to the parsing and serializing code generated for every
    885   class.  This actually poses a significant problem in practice -- there exist server binaries
    886   containing literally hundreds of megabytes of compiled protobuf code.  Cap'n Proto generated code,
    887   on the other hand, is almost entirely inlined accessors.  The only things that go into `.capnp.o`
    888   files are default values for pointer fields (if needed, which is rare) and the encoded schema
    889   (just the raw bytes of a Cap'n-Proto-encoded schema structure).  The latter could even be removed
    890   if you don't use dynamic reflection.
    891 
    892 * The C++ Protobuf implementation used lots of dynamic initialization code (that runs before
    893   `main()`) to do things like register types in global tables.  This proved problematic for
    894   programs which linked in lots of protocols but needed to start up quickly.  Cap'n Proto does not
    895   use any dynamic initializers anywhere, period.
    896 
    897 * The C++ Protobuf implementation makes heavy use of STL in its interface and implementation.
    898   The proliferation of template instantiations gives the Protobuf runtime library a large footprint,
    899   and using STL in the interface can lead to weird ABI problems and slow compiles.  Cap'n Proto
    900   does not use any STL containers in its interface and makes sparing use in its implementation.
    901   As a result, the Cap'n Proto runtime library is smaller, and code that uses it compiles quickly.
    902 
    903 * The in-memory representation of messages in Protobuf-C++ involves many heap objects.  Each
    904   message (struct) is an object, each non-primitive repeated field allocates an array of pointers
    905   to more objects, and each string may actually add two heap objects.  Cap'n Proto by its nature
    906   uses arena allocation, so the entire message is allocated in a few contiguous segments.  This
    907   means Cap'n Proto spends very little time allocating memory, stores messages more compactly, and
    908   avoids memory fragmentation.
    909 
    910 * Related to the last point, Protobuf-C++ relies heavily on object reuse for performance.
    911   Building or parsing into a newly-allocated Protobuf object is significantly slower than using
    912   an existing one.  However, the memory usage of a Protobuf object will tend to grow the more times
    913   it is reused, particularly if it is used to parse messages of many different "shapes", so the
    914   objects need to be deleted and re-allocated from time to time.  All this makes tuning Protobufs
    915   fairly tedious.  In contrast, enabling memory reuse with Cap'n Proto is as simple as providing
    916   a byte buffer to use as scratch space when you build or read in a message.  Provide enough scratch
    917   space to hold the entire message and Cap'n Proto won't allocate any memory.  Or don't -- since
    918   Cap'n Proto doesn't do much allocation in the first place, the benefits of scratch space are
    919   small.