.. _vhost_user_proto:

===================
Vhost-user Protocol
===================

..
  Copyright 2014 Virtual Open Systems Sarl.
  Copyright 2019 Intel Corporation
  Licence: This work is licensed under the terms of the GNU GPL,
           version 2 or later. See the COPYING file in the top-level
           directory.

.. contents:: Table of Contents

Introduction
============

This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the
message.

The protocol defines two sides of the communication, *front-end* and
*back-end*. The *front-end* is the application that shares its virtqueues, in
our case QEMU. The *back-end* is the consumer of the virtqueues.

In the current implementation QEMU is the *front-end*, and the *back-end*
is the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device back-end processing reads and writes to a virtual
disk. In order to facilitate interoperability between various back-end
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

The *front-end* and *back-end* can each be either a client (i.e. connecting) or
a server (listening) in the socket communication.

Support for platforms other than Linux
--------------------------------------

While vhost-user was initially developed targeting Linux, nowadays it
is supported on any platform that provides the following features:

- A way of requesting shared memory represented by a file descriptor
  so it can be passed over a UNIX domain socket and then mapped by the
  other process.

- AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
  exchange messages through them, including ancillary data when needed.

- Either eventfd or pipe/pipe2. On platforms where eventfd is not
  available, QEMU will automatically fall back to pipe2 or, as a last
  resort, pipe. Each file descriptor will be used for receiving or
  sending events by reading or writing (respectively) an 8-byte value
  to the corresponding file descriptor. The 8-byte value itself has no
  meaning and should not be interpreted.

Message Specification
=====================

.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of 3 header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the back-end
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload
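
As an illustration of the ``flags`` layout described above, the bits can be
decoded with a few masks. This is a minimal sketch rather than QEMU's
implementation: the ``VuHeader`` type and every ``VU_*`` / ``vu_*`` name are
hypothetical and exist only for this example. Only the bit positions and the
version value are taken from the text.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical names; only the bit positions come from the text above. */
  #define VU_FLAG_VERSION_MASK 0x3u       /* bits 0-1: protocol version */
  #define VU_FLAG_REPLY        (1u << 2)  /* bit 2: message is a reply */
  #define VU_FLAG_NEED_REPLY   (1u << 3)  /* bit 3: sender asks for a reply */
  #define VU_VERSION           0x1u       /* current version */

  /* The three 32-bit header fields, in wire order. */
  typedef struct VuHeader {
      uint32_t request;
      uint32_t flags;
      uint32_t size;
  } VuHeader;

  static inline uint32_t vu_version(const VuHeader *h)
  {
      return h->flags & VU_FLAG_VERSION_MASK;
  }

  static inline bool vu_is_reply(const VuHeader *h)
  {
      return (h->flags & VU_FLAG_REPLY) != 0;
  }

  static inline bool vu_needs_reply(const VuHeader *h)
  {
      return (h->flags & VU_FLAG_NEED_REPLY) != 0;
  }

  /* Flags a back-end would set on a reply: version plus the reply bit. */
  static inline uint32_t vu_reply_flags(void)
  {
      return VU_VERSION | VU_FLAG_REPLY;
  }

A receiver would typically check ``vu_version()`` against ``VU_VERSION``
before interpreting the rest of the message.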

Payload
-------

Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

+-----+
| u64 |
+-----+

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-----+
| index | num |
+-------+-----+

:index: a 32-bit index

:num: a 32-bit number

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------+------------+------+-----------+-----+
| index | flags | size | descriptor | used | available | log |
+-------+-------+------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: a 32-bit vring flags field

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

:padding: 32-bit

A region is:

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: a 64-bit offset where the region starts in the mapped memory

Single memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------+---------------+------+--------------+-------------+
| padding | guest address | size | user address | mmap offset |
+---------+---------------+------+--------------+-------------+

:padding: 64-bit

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: a 64-bit offset where the region starts in the mapped memory

Log description
^^^^^^^^^^^^^^^

+----------+------------+
| log size | log offset |
+----------+------------+

:log size: size of area used for logging

:log offset: offset from start of supplied file descriptor where
             logging starts (i.e. where guest address 0 would be
             logged)

An IOTLB message
^^^^^^^^^^^^^^^^

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:size: a 64-bit size

:user address: a 64-bit user address

:permissions flags: an 8-bit value:
  - 0: No access
  - 1: Read access
  - 2: Write access
  - 3: Read/Write access

:type: an 8-bit IOTLB message type:
  - 1: IOTLB miss
  - 2: IOTLB update
  - 3: IOTLB invalidate
  - 4: IOTLB access fail

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset into the virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:
  - 0: Vhost front-end messages used for writable fields
  - 1: Vhost front-end messages used for live migration

:payload: an array of ``size`` bytes holding the contents of the virtio
          device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer containing the vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
         supplied file descriptor

Inflight description
^^^^^^^^^^^^^^^^^^^^

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of area to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
              of the supplied file descriptor

:num queues: a 16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues

C structure
-----------

In QEMU the vhost-user message is implemented with the following struct:

.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;

Communication
=============

The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl to
the kernel implementation.

The communication consists of the *front-end* sending message requests and
the *back-end* sending message replies. Most of the requests don't require
replies. Here is a list of the ones that do:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

.. seealso::

   :ref:`REPLY_ACK <reply_ack>`
       The section on ``REPLY_ACK`` protocol extension.

There are several messages that the front-end sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_ADD_MEM_REG``
* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_SLAVE_REQ_FD``
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
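
Passing a file descriptor in the ancillary data of a Unix domain socket
message is done with ``SCM_RIGHTS`` control messages. The sketch below shows
one way to send and receive a buffer together with a single descriptor using
the POSIX ``sendmsg``/``recvmsg`` calls. The helper names
``vu_send_with_fd`` and ``vu_recv_with_fd`` are hypothetical, and short
reads and writes are treated as errors to keep the example small.

.. code:: c

  #include <stdint.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  /* Control buffer large enough for one fd, aligned for struct cmsghdr. */
  typedef union VuCtrl {
      struct cmsghdr align;
      char buf[CMSG_SPACE(sizeof(int))];
  } VuCtrl;

  /* Hypothetical helper: send len bytes plus one fd. Returns 0 or -1. */
  static int vu_send_with_fd(int sock, const void *buf, size_t len, int fd)
  {
      struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
      struct msghdr msg;
      struct cmsghdr *c;
      VuCtrl ctrl;

      memset(&msg, 0, sizeof(msg));
      memset(&ctrl, 0, sizeof(ctrl));
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = ctrl.buf;
      msg.msg_controllen = sizeof(ctrl.buf);

      c = CMSG_FIRSTHDR(&msg);
      c->cmsg_level = SOL_SOCKET;
      c->cmsg_type = SCM_RIGHTS;
      c->cmsg_len = CMSG_LEN(sizeof(int));
      memcpy(CMSG_DATA(c), &fd, sizeof(int));

      return sendmsg(sock, &msg, 0) == (ssize_t)len ? 0 : -1;
  }

  /* Hypothetical helper: receive len bytes; *fd is -1 if no fd arrived. */
  static int vu_recv_with_fd(int sock, void *buf, size_t len, int *fd)
  {
      struct iovec iov = { .iov_base = buf, .iov_len = len };
      struct msghdr msg;
      struct cmsghdr *c;
      VuCtrl ctrl;

      memset(&msg, 0, sizeof(msg));
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = ctrl.buf;
      msg.msg_controllen = sizeof(ctrl.buf);

      *fd = -1;
      if (recvmsg(sock, &msg, 0) != (ssize_t)len) {
          return -1;
      }
      for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
          if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
              memcpy(fd, CMSG_DATA(c), sizeof(int));
          }
      }
      return 0;
  }

A real implementation would also loop on partial transfers and accept more
than one descriptor per message where the request allows it.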

If the *front-end* is unable to send the full message or receives a wrong
reply, it will close the connection. An optional reconnection mechanism
can be implemented.

If the *back-end* detects an error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both front-end and back-end.  As
older back-ends don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

Note that VHOST_USER_F_PROTOCOL_FEATURES is the UNUSED (30) feature
bit defined in `VIRTIO 1.1 6.3 Legacy Interface: Reserved Feature Bits
<https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4130003>`_.
VIRTIO devices do not advertise this feature bit and therefore VIRTIO
drivers cannot negotiate it.

This reserved feature bit was reused by the vhost-user protocol to add
vhost-user protocol feature negotiation in a backwards compatible
fashion. Old vhost-user front-end and back-end implementations continue to
work even though they are not aware of vhost-user protocol feature
negotiation.

Ring states
-----------

Rings can be in one of three states:

* stopped: the back-end must not process the ring at all.

* started but disabled: the back-end must process the ring without
  causing any side effects.  For example, for a networking device,
  in the disabled state the back-end must not supply any new RX packets,
  but must process and discard any TX packets.

* started and enabled.

Each ring is initialized in a stopped state.  The back-end must start
the ring upon receiving a kick (that is, detecting that the file descriptor
is readable) on the descriptor specified by ``VHOST_USER_SET_VRING_KICK``
or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated,
and stop the ring upon receiving ``VHOST_USER_GET_VRING_BASE``.

Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring starts directly in the enabled state.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state and is enabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.

While processing the rings (whether they are enabled or not), the back-end
must support changing some configuration aspects on the fly.
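
The state transitions above can be modelled with two booleans per ring. The
following is only a sketch of how a back-end might track them. The
``VuRing`` type and the ``vu_ring_*`` functions are hypothetical, and
treating a ``VHOST_USER_SET_VRING_ENABLE`` parameter of 0 as "disable" is an
assumption made for this example.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical per-ring state kept by a back-end. */
  typedef struct VuRing {
      bool started;   /* false: stopped, the ring must not be processed */
      bool enabled;   /* false: process without causing side effects */
  } VuRing;

  /* Rings begin stopped; they begin disabled only if
   * VHOST_USER_F_PROTOCOL_FEATURES was negotiated. */
  static void vu_ring_init(VuRing *r, bool protocol_features_negotiated)
  {
      r->started = false;
      r->enabled = !protocol_features_negotiated;
  }

  /* Kick fd became readable, or VHOST_USER_VRING_KICK was received. */
  static void vu_ring_on_kick(VuRing *r)
  {
      r->started = true;
  }

  /* VHOST_USER_GET_VRING_BASE stops the ring. */
  static void vu_ring_on_get_vring_base(VuRing *r)
  {
      r->started = false;
  }

  /* VHOST_USER_SET_VRING_ENABLE: 1 enables; 0 is assumed to disable. */
  static void vu_ring_on_set_enable(VuRing *r, uint32_t num)
  {
      r->enabled = (num != 0);
  }

  static bool vu_ring_may_process(const VuRing *r)
  {
      return r->started;
  }

  static bool vu_ring_side_effects_allowed(const VuRing *r)
  {
      return r->started && r->enabled;
  }

For a networking back-end, ``vu_ring_side_effects_allowed()`` returning
false while ``vu_ring_may_process()`` returns true corresponds to discarding
TX packets and supplying no RX packets.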

Multiple queue support
----------------------

Many devices have a fixed number of virtqueues.  In this case the front-end
already knows the number of available virtqueues without communicating with the
back-end.

Some devices do not have a fixed number of virtqueues.  Instead the maximum
number of virtqueues is chosen by the back-end.  The number can depend on host
resource availability or back-end implementation details.  Such devices are called
multiple queue devices.

Multiple queue support allows the back-end to advertise the maximum number of
queues.  This is treated as a protocol extension, hence the back-end has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The maximum number of queues the back-end supports can be queried with the
message ``VHOST_USER_GET_QUEUE_NUM``. The front-end should stop when the number
of requested queues is bigger than that.

As all queues share one connection, the front-end uses a unique index for each
queue in the sent message to identify a specified queue.

The front-end enables queues by sending the message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues.  Only true multiqueue devices
require this protocol feature.

Migration
---------

During live migration, the front-end may need to track the modifications
the back-end makes to the memory mapped regions. The back-end should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, the front-end may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in the ring's
flags set to 1/0, respectively.

All the modifications to memory pointed to by the vring "descriptor" should
be marked. Modifications to the "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of the ring's flags.

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of the
``VHOST_USER_SET_LOG_BASE`` message when the back-end has the
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg`` and
should be large enough to cover all known guest addresses. The log starts
at the supplied offset in the supplied file descriptor.  The log
covers from address 0 to the maximum of guest regions. In pseudo-code,
to mark the page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << (page % 8)

Where ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.
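
The pseudo-code above can be written with C11 atomics as in the sketch
below. It is an illustration only: ``vu_log_page`` and ``vu_log_write`` are
hypothetical names, the log is assumed to be already mapped at the supplied
offset, and accessing the mapping through an ``_Atomic uint8_t`` pointer is
an assumption about the platform rather than something the protocol
prescribes.

.. code:: c

  #include <stdatomic.h>
  #include <stdint.h>

  #define VHOST_LOG_PAGE 0x1000

  /* Atomically set the dirty bit of one page; out-of-range pages are
   * ignored instead of writing past the end of the log. */
  static void vu_log_page(_Atomic uint8_t *log, uint64_t log_size,
                          uint64_t page)
  {
      if (page / 8 < log_size) {
          atomic_fetch_or(&log[page / 8], (uint8_t)(1u << (page % 8)));
      }
  }

  /* Mark every page touched by a write of len bytes at guest physical
   * address addr. */
  static void vu_log_write(_Atomic uint8_t *log, uint64_t log_size,
                           uint64_t addr, uint64_t len)
  {
      uint64_t page, last;

      if (len == 0) {
          return;
      }
      last = (addr + len - 1) / VHOST_LOG_PAGE;
      for (page = addr / VHOST_LOG_PAGE; page <= last; page++) {
          vu_log_page(log, log_size, page);
      }
  }

A write of 0x2001 bytes at address 0x1000 touches pages 1, 2 and 3, so it
sets bits 1 to 3 of the first log byte.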

Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to the first byte of the
used ring is logged at this offset from the log start. Also note that this
value might be outside the legal guest physical address range
(i.e. does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the front-end that the log has
been modified.

Once the source has finished migration, rings will be stopped by the
source. No further updates may be made before the rings are restarted.

In postcopy migration the back-end is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received.  The back-end opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the front-end.  The front-end services requests on the
userfaultfd for pages that are accessed and when the page is available
it performs WAKE ioctls on the userfaultfd to wake the stalled
back-end.  The front-end indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.

Memory access
-------------

The front-end sends a list of vhost memory regions to the back-end using the
``VHOST_USER_SET_MEM_TABLE`` message.  Each region has two base
addresses: a guest address and a user address.

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory.  The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
  address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs).  They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.
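
A back-end usually turns these addresses into pointers inside its own
mappings of the region file descriptors. The sketch below illustrates the
lookup for the case without ``VIRTIO_F_IOMMU_PLATFORM``. The ``VuRegion``
type and the ``vu_*`` helpers are hypothetical, and the code assumes that
each region's descriptor was mapped from file offset 0, so that the region
data begins ``mmap_offset`` bytes into the mapping.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical back-end view of one memory region. */
  typedef struct VuRegion {
      uint64_t guest_addr;   /* guest address of the region */
      uint64_t size;         /* size of the region in bytes */
      uint64_t user_addr;    /* front-end user address of the region */
      uint64_t mmap_offset;  /* where the region starts in the mapping */
      uint8_t *mmap_base;    /* local address returned by mmap() (assumed) */
  } VuRegion;

  /* Translate addr, expressed relative to base, if [addr, addr + len)
   * lies entirely inside the region; otherwise return NULL. */
  static void *vu_region_ptr(const VuRegion *r, uint64_t base,
                             uint64_t addr, uint64_t len)
  {
      if (addr < base || addr - base >= r->size ||
          len > r->size - (addr - base)) {
          return NULL;
      }
      return r->mmap_base + r->mmap_offset + (addr - base);
  }

  /* Guest address to local pointer. */
  static void *vu_gpa_to_ptr(const VuRegion *regs, uint32_t n,
                             uint64_t gpa, uint64_t len)
  {
      for (uint32_t i = 0; i < n; i++) {
          void *p = vu_region_ptr(&regs[i], regs[i].guest_addr, gpa, len);
          if (p != NULL) {
              return p;
          }
      }
      return NULL;
  }

  /* Front-end user address to local pointer. */
  static void *vu_uva_to_ptr(const VuRegion *regs, uint32_t n,
                             uint64_t uva, uint64_t len)
  {
      for (uint32_t i = 0; i < n; i++) {
          void *p = vu_region_ptr(&regs[i], regs[i].user_addr, uva, len);
          if (p != NULL) {
              return p;
          }
      }
      return NULL;
  }

Accesses that straddle two regions are rejected here for simplicity; a
complete implementation would have to split them.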

IOMMU support
-------------

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
front-end sends IOTLB entry updates and invalidations by sending
``VHOST_USER_IOTLB_MSG`` requests to the back-end with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the back-end is
expected to reply with a zero payload, non-zero otherwise.

The back-end relies on the back-end communication channel (see :ref:`Back-end
communication <backend_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_SLAVE_IOTLB_MSG``
requests to the front-end with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the iotlb payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags. For access failure events, the iotlb payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags.  For synchronization purposes, the back-end may
rely on the reply-ack feature, so the front-end may send a reply when
the operation is completed if the reply-ack feature is negotiated and
the back-end requests a reply. For miss events, a completed operation means
that either the front-end sent an update message containing the IOTLB entry
that covers the requested address and permission, or the front-end sent
nothing because the IOTLB miss message is invalid (invalid IOVA or permission).

The front-end isn't expected to take the initiative to send IOTLB update
messages, as the back-end sends IOTLB miss messages for the guest virtual
memory areas it needs to access.
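
To make the update/miss flow more concrete, the sketch below shows how a
back-end might look up an IOVA in a cached IOTLB entry and decide whether a
miss has to be reported. The ``VuIotlbEntry`` type, the ``VU_ACCESS_*``
constants and ``vu_iotlb_translate`` are hypothetical names for this
example. Only the permission values (1 read, 2 write, 3 read/write) are
taken from the IOTLB message description.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Permission values as listed for the IOTLB message payload. */
  #define VU_ACCESS_RO 1u
  #define VU_ACCESS_WO 2u
  #define VU_ACCESS_RW 3u

  /* Hypothetical cached copy of an IOTLB update (message type 2). */
  typedef struct VuIotlbEntry {
      uint64_t iova;   /* I/O virtual address programmed by the guest */
      uint64_t size;   /* length of the mapping in bytes */
      uint64_t uaddr;  /* user address the IOVA maps to */
      uint8_t perm;    /* permissions granted by the update */
  } VuIotlbEntry;

  /* Translate [iova, iova + len) needing perm. On success store the user
   * address and return true. A false result means the back-end would send
   * an IOTLB miss (type 1) for this IOVA and retry after the update. */
  static bool vu_iotlb_translate(const VuIotlbEntry *e, uint64_t iova,
                                 uint64_t len, uint8_t perm,
                                 uint64_t *uaddr)
  {
      if (iova < e->iova || iova - e->iova >= e->size ||
          len > e->size - (iova - e->iova)) {
          return false;             /* range not covered by the entry */
      }
      if ((e->perm & perm) != perm) {
          return false;             /* insufficient permissions */
      }
      *uaddr = e->uaddr + (iova - e->iova);
      return true;
  }

The resulting user address is then resolved through the vhost memory regions
in the same way as any other user address.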

.. _backend_communication:

Back-end communication
----------------------

An optional communication channel is provided if the back-end declares the
``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` protocol feature, to allow the
back-end to make requests to the front-end.

The fd is provided via ``VHOST_USER_SET_SLAVE_REQ_FD`` ancillary data.

A back-end may then send ``VHOST_USER_SLAVE_*`` messages to the front-end
using this fd communication channel.

If the ``VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD`` protocol feature is
negotiated, the back-end can send file descriptors (at most 8 descriptors in
each message) to the front-end via ancillary data using this fd communication
channel.

Inflight I/O tracking
---------------------

To support reconnecting after restart or crash, the back-end may need to
resubmit inflight I/Os. If the virtqueue is processed in order, this is
easily achieved by getting the inflight descriptors from the
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue). However, this does not work when descriptors are processed
out-of-order, because some entries which store the information of
inflight descriptors in the available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve
this problem, the back-end needs to allocate an extra buffer to store this
information about inflight descriptors and share it with the front-end for
persistence. ``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between front-end and back-end. The format of this buffer is described
below:

+---------------+---------------+-----+---------------+
| queue0 region | queue1 region | ... | queueN region |
+---------------+---------------+-----+---------------+

N is the number of available virtqueues. The back-end can get it from the num
queues field of ``VhostUserInflight``.

For split virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding[5];

      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      uint16_t next;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStateSplit array. It's equal to the virtqueue size.
       * The back-end could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of list that track the last batch of used descriptors. */
      uint16_t last_batch_head;

      /* Store the idx value of used ring */
      uint16_t used_idx;

      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[];
  } QueueRegionSplit;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available head-descriptor index from available ring, ``i``

#. Set ``desc[i].counter`` to the value of global counter

#. Increase global counter by 1

#. Set ``desc[i].inflight`` to 1

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor index, ``i``

2. Set ``desc[i].next`` to ``last_batch_head``

3. Set ``last_batch_head`` to ``i``

#. Steps 1, 2 and 3 may be performed repeatedly if batching is possible

#. Increase the ``idx`` value of used ring by the size of the batch

#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0

#. Set ``used_idx`` to the ``idx`` value of used ring

When reconnecting:

#. If the value of ``used_idx`` does not match the ``idx`` value of
   used ring (which means the ``inflight`` field of ``DescStateSplit``
   entries in the last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of
      used ring to get last batch size of ``DescStateSplit`` entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
      list which starts from ``last_batch_head``

   #. Set ``used_idx`` to the ``idx`` value of used ring

#. Resubmit inflight ``DescStateSplit`` entries in order of their
   counter value
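
The reconnection steps for a split virtqueue can be sketched as below. This
is an illustrative outline, not the code of any existing back-end: the
struct definitions repeat the layout given above so that the example is
self-contained, while ``vu_inflight_recover_split`` and its helpers are
hypothetical names. The function repairs the last batch if needed and
returns the head-descriptor indices to resubmit, ordered by ``counter``.

.. code:: c

  #include <stdint.h>
  #include <stdlib.h>

  typedef struct DescStateSplit {
      uint8_t inflight;
      uint8_t padding[5];
      uint16_t next;
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      uint64_t features;
      uint16_t version;
      uint16_t desc_num;
      uint16_t last_batch_head;
      uint16_t used_idx;
      DescStateSplit desc[];
  } QueueRegionSplit;

  /* qsort() has no context argument, so the region is passed via a static. */
  static const QueueRegionSplit *vu_sort_region;

  static int vu_cmp_counter(const void *a, const void *b)
  {
      uint64_t ca = vu_sort_region->desc[*(const uint16_t *)a].counter;
      uint64_t cb = vu_sort_region->desc[*(const uint16_t *)b].counter;
      return (ca > cb) - (ca < cb);
  }

  /* ring_used_idx is the idx value read from the used ring. heads must
   * have room for desc_num entries. Returns the number of heads stored. */
  static uint16_t vu_inflight_recover_split(QueueRegionSplit *q,
                                            uint16_t ring_used_idx,
                                            uint16_t *heads)
  {
      uint16_t i, n = 0;

      if (q->used_idx != ring_used_idx) {
          /* The last batch reached the used ring, but its inflight
           * flags may not have been cleared before the crash. */
          uint16_t batch = (uint16_t)(ring_used_idx - q->used_idx);

          for (i = q->last_batch_head; batch > 0 && i < q->desc_num; batch--) {
              q->desc[i].inflight = 0;
              i = q->desc[i].next;
          }
          q->used_idx = ring_used_idx;
      }

      for (i = 0; i < q->desc_num; i++) {
          if (q->desc[i].inflight) {
              heads[n++] = i;
          }
      }
      vu_sort_region = q;
      qsort(heads, n, sizeof(heads[0]), vu_cmp_counter);
      return n;
  }

The caller would then resubmit the descriptors in ``heads[0..n-1]`` in that
order. Wrap-around of the 64-bit ``counter`` is not handled in this sketch.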

For packed virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStatePacked {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding;

      /* Link to the next free entry */
      uint16_t next;

      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      uint16_t last;

      /* The length of descriptor list.
       * Only available for head-descriptor. */
      uint16_t num;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;

      /* The buffer id */
      uint16_t id;

      /* The descriptor flags */
      uint16_t flags;

      /* The buffer length */
      uint32_t len;

      /* The buffer address */
      uint64_t addr;
  } DescStatePacked;

  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStatePacked array. It's equal to the virtqueue size.
       * The back-end could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of free DescStatePacked entry list */
      uint16_t free_head;

      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;

      /* The used index of descriptor ring */
      uint16_t used_idx;

      /* The old used index of descriptor ring */
      uint16_t old_used_idx;

      /* Device ring wrap counter */
      uint8_t used_wrap_counter;

      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;

      /* Padding */
      uint8_t padding[7];

      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[];
  } QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available descriptor entry from descriptor ring, ``d``

#. If ``d`` is head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of global counter

   #. Increase global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
   ``free_head``

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor entry from descriptor ring,
   ``d``

2. Get corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1, 2, 3 and 4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
   in the batch to 0

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

When reconnecting:

#. If ``used_idx`` does not match ``old_used_idx`` (which means the
   ``inflight`` field of ``DescStatePacked`` entries in the last batch may
   be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags
    800 
    801 #. If ``used_idx`` does not match ``old_used_idx`` (meaning the
    802    ``inflight`` field of ``DescStatePacked`` entries in the last batch
    803    may be incorrect),
    804 
    805    a. Get the next descriptor ring entry through ``old_used_idx``, ``d``
    806 
    807    #. Use ``old_used_wrap_counter`` to calculate the available flags
    808 
    809    #. If ``d.flags`` is not equal to the calculated flags value (meaning
    810       the back-end submitted the buffer to the guest driver before the
    811       crash, so it has to commit the in-progress update), set
    812       ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter`` to
    813       ``free_head``, ``used_idx``, ``used_wrap_counter``
    814 
    815 #. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
    816    ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
    817    (roll back any in-progress update)
    818 
    819 #. Set the ``inflight`` field of each ``DescStatePacked`` entry in
    820    free list to 0
    821 
    822 #. Resubmit inflight ``DescStatePacked`` entries in order of their
    823    counter value
    824 
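The first two reconnection steps for the packed layout can be sketched
roughly as follows. This is a minimal illustration under stated
assumptions, not the reference implementation: ``struct packed_track``
mirrors only the bookkeeping fields of ``QueueRegionPacked``, and
``packed_reconnect()`` and ``packed_avail_flags()`` are hypothetical
helpers. The flag computation assumes the packed-ring convention of the
Virtio specification (AVAIL is bit 7, USED is bit 15, and an available
descriptor has AVAIL equal to the wrap counter and USED equal to its
inverse). Clearing the ``inflight`` field of free entries and
resubmitting inflight entries are left out.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Packed-ring descriptor flag bits, per the Virtio specification. */
  #define PACKED_DESC_F_AVAIL (1u << 7)
  #define PACKED_DESC_F_USED  (1u << 15)

  /* Hypothetical mirror of the bookkeeping fields of QueueRegionPacked. */
  struct packed_track {
      uint16_t free_head, old_free_head;
      uint16_t used_idx, old_used_idx;
      uint8_t used_wrap_counter, old_used_wrap_counter;
  };

  /* Flags of a descriptor that is available but not yet used. */
  static uint16_t packed_avail_flags(uint8_t wrap_counter)
  {
      return wrap_counter ? PACKED_DESC_F_AVAIL : PACKED_DESC_F_USED;
  }

  /*
   * ring_flags: flags of the descriptor ring entry at old_used_idx.
   * Returns true if the interrupted update was committed, false if it
   * was rolled back or there was nothing to recover.
   */
  static bool packed_reconnect(struct packed_track *t, uint16_t ring_flags)
  {
      const uint16_t mask = PACKED_DESC_F_AVAIL | PACKED_DESC_F_USED;
      bool committed = false;

      if (t->used_idx != t->old_used_idx &&
          (ring_flags & mask) != packed_avail_flags(t->old_used_wrap_counter)) {
          /* The guest driver already saw the used buffer: commit. */
          t->old_free_head = t->free_head;
          t->old_used_idx = t->used_idx;
          t->old_used_wrap_counter = t->used_wrap_counter;
          committed = true;
      }

      /* Roll back any in-progress update (a no-op after a commit). */
      t->free_head = t->old_free_head;
      t->used_idx = t->old_used_idx;
      t->used_wrap_counter = t->old_used_wrap_counter;
      return committed;
  }
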
    825 In-band notifications
    826 ---------------------
    827 
    828 In some limited situations (e.g. for simulation) it is desirable to
    829 have the kick, call and error (if used) signals done via in-band
    830 messages instead of asynchronous eventfd notifications. This can be
    831 done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
    832 protocol feature.
    833 
    834 Because too many messages on the sockets can cause the sending
    835 application(s) to block, using this feature is not advised unless
    836 absolutely necessary. It is also considered an error to negotiate this
    837 feature without also negotiating ``VHOST_USER_PROTOCOL_F_SLAVE_REQ``
    838 and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``. The former is necessary for
    839 getting a message channel from the back-end to the front-end. The
    840 latter needs to be used with the in-band notification messages to
    841 block until they are processed, both to avoid blocking later and for
    842 proper processing (at least in the simulation use case). As it has no
    843 other way of signalling this error, the back-end should close the
    844 connection in response to a ``VHOST_USER_SET_PROTOCOL_FEATURES``
    845 message that sets the in-band notifications feature flag without the
    846 other two.
    847 
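The dependency between these three features amounts to a mask test on
the ``VHOST_USER_SET_PROTOCOL_FEATURES`` payload. The following is only
a sketch: ``inband_features_valid()`` is a hypothetical helper, and the
bit numbers are those given in the protocol features list of this
document.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_SLAVE_REQ             5
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14

  /*
   * Returns false when the payload enables in-band notifications
   * without the two features it depends on. A back-end would close the
   * connection in that case.
   */
  static bool inband_features_valid(uint64_t protocol_features)
  {
      const uint64_t inband =
          1ULL << VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS;
      const uint64_t required =
          (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) |
          (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK);

      if (!(protocol_features & inband)) {
          return true;
      }
      return (protocol_features & required) == required;
  }
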
    848 Protocol features
    849 -----------------
    850 
    851 .. code:: c
    852 
    853   #define VHOST_USER_PROTOCOL_F_MQ                    0
    854   #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
    855   #define VHOST_USER_PROTOCOL_F_RARP                  2
    856   #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
    857   #define VHOST_USER_PROTOCOL_F_NET_MTU               4
    858   #define VHOST_USER_PROTOCOL_F_SLAVE_REQ             5
    859   #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
    860   #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
    861   #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
    862   #define VHOST_USER_PROTOCOL_F_CONFIG                9
    863   #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD        10
    864   #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
    865   #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
    866   #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
    867   #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
    868   #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
    869   #define VHOST_USER_PROTOCOL_F_STATUS               16
    870 
    871 Front-end message types
    872 -----------------------
    873 
    874 ``VHOST_USER_GET_FEATURES``
    875   :id: 1
    876   :equivalent ioctl: ``VHOST_GET_FEATURES``
    877   :request payload: N/A
    878   :reply payload: ``u64``
    879 
    880   Get the features bitmask from the underlying vhost implementation.
    881   Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals back-end support
    882   for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
    883   ``VHOST_USER_SET_PROTOCOL_FEATURES``.
    884 
    885 ``VHOST_USER_SET_FEATURES``
    886   :id: 2
    887   :equivalent ioctl: ``VHOST_SET_FEATURES``
    888   :request payload: ``u64``
    889   :reply payload: N/A
    890 
    891   Enable features in the underlying vhost implementation using a
    892   bitmask.  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
    893   back-end support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
    894   ``VHOST_USER_SET_PROTOCOL_FEATURES``.
    895 
    896 ``VHOST_USER_GET_PROTOCOL_FEATURES``
    897   :id: 15
    898   :equivalent ioctl: ``VHOST_GET_FEATURES``
    899   :request payload: N/A
    900   :reply payload: ``u64``
    901 
    902   Get the protocol feature bitmask from the underlying vhost
    903   implementation.  Only legal if feature bit
    904   ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
    905   ``VHOST_USER_GET_FEATURES``.  It does not need to be acknowledged by
    906   ``VHOST_USER_SET_FEATURES``.
    907 
    908 .. Note::
    909    Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must
    910    support this message even before ``VHOST_USER_SET_FEATURES`` was
    911    called.
    912 
    913 ``VHOST_USER_SET_PROTOCOL_FEATURES``
    914   :id: 16
    915   :equivalent ioctl: ``VHOST_SET_FEATURES``
    916   :request payload: ``u64``
    917   :reply payload: N/A
    918 
    919   Enable protocol features in the underlying vhost implementation.
    920 
    921   Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
    922   ``VHOST_USER_GET_FEATURES``.  It does not need to be acknowledged by
    923   ``VHOST_USER_SET_FEATURES``.
    924 
    925 .. Note::
    926    Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
    927    this message even before ``VHOST_USER_SET_FEATURES`` was called.
    928 
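As a rough illustration of the negotiation order described above, a
front-end might gate the protocol feature messages as sketched below.
The helper names are hypothetical, and the value 30 for
``VHOST_USER_F_PROTOCOL_FEATURES`` is the bit number used by QEMU; it is
not stated in this section, so treat it as an assumption.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Assumed bit number of the feature in the GET_FEATURES reply. */
  #define VHOST_USER_F_PROTOCOL_FEATURES 30

  /*
   * VHOST_USER_GET_PROTOCOL_FEATURES and
   * VHOST_USER_SET_PROTOCOL_FEATURES are only legal when the back-end
   * reported VHOST_USER_F_PROTOCOL_FEATURES in VHOST_USER_GET_FEATURES.
   */
  static bool may_use_protocol_features(uint64_t features)
  {
      return (features >> VHOST_USER_F_PROTOCOL_FEATURES) & 1;
  }

  /*
   * Pick the protocol features to enable: only those offered by the
   * back-end and understood by the front-end.
   */
  static uint64_t select_protocol_features(uint64_t backend_offered,
                                           uint64_t frontend_supported)
  {
      return backend_offered & frontend_supported;
  }
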
    929 ``VHOST_USER_SET_OWNER``
    930   :id: 3
    931   :equivalent ioctl: ``VHOST_SET_OWNER``
    932   :request payload: N/A
    933   :reply payload: N/A
    934 
    935   Issued when a new connection is established. It marks the sender
    936   as the front-end that owns the session. This can be used on the *back-end*
    937   as a "session start" flag.
    938 
    939 ``VHOST_USER_RESET_OWNER``
    940   :id: 4
    941   :request payload: N/A
    942   :reply payload: N/A
    943 
    944 .. admonition:: Deprecated
    945 
    946    This is no longer used. Used to be sent to request disabling all
    947    rings, but some back-ends interpreted it to also discard connection
    948    state (this interpretation would lead to bugs).  It is recommended
    949    that back-ends either ignore this message, or use it to disable all
    950    rings.
    951 
    952 ``VHOST_USER_SET_MEM_TABLE``
    953   :id: 5
    954   :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
    955   :request payload: memory regions description
    956   :reply payload: (postcopy only) memory regions description
    957 
    958   Sets the memory map regions on the back-end so it can translate the
    959   vring addresses. In the ancillary data there is an array of file
    960   descriptors for each memory mapped region. The size and ordering of
    961   the fds matches the number and ordering of memory regions.
    962 
    963   When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
    964   ``SET_MEM_TABLE`` replies with the bases of the memory mapped
    965   regions to the front-end.  The back-end must have mmap'd the regions but
    966   not yet accessed them and should not yet generate a userfault
    967   event.
    968 
    969 .. Note::
    970    ``NEED_REPLY_MASK`` is not set in this case.  QEMU will then
    971    reply back to the list of mappings with an empty
    972    ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
    973    reception of this message may the guest start accessing the memory
    974    and generating faults.
    975 
    976 ``VHOST_USER_SET_LOG_BASE``
    977   :id: 6
    978   :equivalent ioctl: ``VHOST_SET_LOG_BASE``
    979   :request payload: ``u64``
    980   :reply payload: N/A
    981 
    982   Sets logging shared memory space.
    983 
    984   When the back-end has the ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol
    985   feature, the log memory fd is provided in the ancillary data of the
    986   ``VHOST_USER_SET_LOG_BASE`` message, and the size and offset of the
    987   shared memory area are provided in the message.
    988 
    989 ``VHOST_USER_SET_LOG_FD``
    990   :id: 7
    991   :equivalent ioctl: ``VHOST_SET_LOG_FD``
    992   :request payload: N/A
    993   :reply payload: N/A
    994 
    995   Sets the logging file descriptor, which is passed as ancillary data.
    996 
    997 ``VHOST_USER_SET_VRING_NUM``
    998   :id: 8
    999   :equivalent ioctl: ``VHOST_SET_VRING_NUM``
   1000   :request payload: vring state description
   1001   :reply payload: N/A
   1002 
   1003   Set the size of the queue.
   1004 
   1005 ``VHOST_USER_SET_VRING_ADDR``
   1006   :id: 9
   1007   :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
   1008   :request payload: vring address description
   1009   :reply payload: N/A
   1010 
   1011   Sets the addresses of the different aspects of the vring.
   1012 
   1013 ``VHOST_USER_SET_VRING_BASE``
   1014   :id: 10
   1015   :equivalent ioctl: ``VHOST_SET_VRING_BASE``
   1016   :request payload: vring state description
   1017   :reply payload: N/A
   1018 
   1019   Sets the base offset in the available vring.
   1020 
   1021 ``VHOST_USER_GET_VRING_BASE``
   1022   :id: 11
   1023   :equivalent ioctl: ``VHOST_GET_VRING_BASE``
   1024   :request payload: vring state description
   1025   :reply payload: vring state description
   1026 
   1027   Get the available vring base offset.
   1028 
   1029 ``VHOST_USER_SET_VRING_KICK``
   1030   :id: 12
   1031   :equivalent ioctl: ``VHOST_SET_VRING_KICK``
   1032   :request payload: ``u64``
   1033   :reply payload: N/A
   1034 
   1035   Set the event file descriptor for adding buffers to the vring. It is
   1036   passed in the ancillary data.
   1037 
   1038   Bits (0-7) of the payload contain the vring index. Bit 8 is the
   1039   invalid FD flag. This flag is set when there is no file descriptor
   1040   in the ancillary data. This signals that polling should be used
   1041   instead of waiting for the kick. Note that if the protocol feature
   1042   ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated,
   1043   this message isn't necessary, as the ring is also started on the
   1044   ``VHOST_USER_VRING_KICK`` message. It may, however, still be used to
   1045   set an event file descriptor (which will be preferred over the
   1046   message) or to enable polling.
   1047 
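The ``u64`` payload layout described above (shared by
``VHOST_USER_SET_VRING_KICK``, ``VHOST_USER_SET_VRING_CALL`` and
``VHOST_USER_SET_VRING_ERR``) can be decoded as in this sketch. The
macro, struct and function names are chosen for illustration and are not
part of the protocol.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Illustrative names for the payload layout. */
  #define VRING_IDX_MASK  0xffULL       /* bits 0-7: vring index */
  #define VRING_NOFD_FLAG (1ULL << 8)   /* bit 8: invalid FD flag */

  struct vring_fd_request {
      uint8_t index;   /* vring the message applies to */
      bool has_fd;     /* false: no fd in the ancillary data */
  };

  static struct vring_fd_request decode_vring_fd_payload(uint64_t payload)
  {
      struct vring_fd_request req = {
          .index = (uint8_t)(payload & VRING_IDX_MASK),
          .has_fd = !(payload & VRING_NOFD_FLAG),
      };
      return req;
  }

When ``has_fd`` is false, the receiver should not expect a file
descriptor in the ancillary data.
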
   1048 ``VHOST_USER_SET_VRING_CALL``
   1049   :id: 13
   1050   :equivalent ioctl: ``VHOST_SET_VRING_CALL``
   1051   :request payload: ``u64``
   1052   :reply payload: N/A
   1053 
   1054   Set the event file descriptor to signal when buffers are used. It is
   1055   passed in the ancillary data.
   1056 
   1057   Bits (0-7) of the payload contain the vring index. Bit 8 is the
   1058   invalid FD flag. This flag is set when there is no file descriptor
   1059   in the ancillary data. This signals that polling will be used
   1060   instead of waiting for the call. Note that if the protocol features
   1061   ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
   1062   ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this message
   1063   isn't necessary, as the ``VHOST_USER_SLAVE_VRING_CALL`` message can be
   1064   used. It may, however, still be used to set an event file descriptor
   1065   or to enable polling.
   1066 
   1067 ``VHOST_USER_SET_VRING_ERR``
   1068   :id: 14
   1069   :equivalent ioctl: ``VHOST_SET_VRING_ERR``
   1070   :request payload: ``u64``
   1071   :reply payload: N/A
   1072 
   1073   Set the event file descriptor to signal when an error occurs. It is
   1074   passed in the ancillary data.
   1075 
   1076   Bits (0-7) of the payload contain the vring index. Bit 8 is the
   1077   invalid FD flag. This flag is set when there is no file descriptor
   1078   in the ancillary data. Note that if the protocol features
   1079   ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
   1080   ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this message
   1081   isn't necessary, as the ``VHOST_USER_SLAVE_VRING_ERR`` message can be
   1082   used. It may, however, still be used to set an event file descriptor
   1083   (which will be preferred over the message).
   1084 
   1085 ``VHOST_USER_GET_QUEUE_NUM``
   1086   :id: 17
   1087   :equivalent ioctl: N/A
   1088   :request payload: N/A
   1089   :reply payload: ``u64``
   1090 
   1091   Query how many queues the back-end supports.
   1092 
   1093   This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
   1094   is set in queried protocol features by
   1095   ``VHOST_USER_GET_PROTOCOL_FEATURES``.
   1096 
   1097 ``VHOST_USER_SET_VRING_ENABLE``
   1098   :id: 18
   1099   :equivalent ioctl: N/A
   1100   :request payload: vring state description
   1101   :reply payload: N/A
   1102 
   1103   Signal the back-end to enable or disable the corresponding vring.
   1104 
   1105   This request should be sent only when
   1106   ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.
   1107 
   1108 ``VHOST_USER_SEND_RARP``
   1109   :id: 19
   1110   :equivalent ioctl: N/A
   1111   :request payload: ``u64``
   1112   :reply payload: N/A
   1113 
   1114   Ask the vhost-user back-end to broadcast a fake RARP to announce that
   1115   migration has finished, for guests that do not support GUEST_ANNOUNCE.
   1116 
   1117   Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
   1118   present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
   1119   ``VHOST_USER_PROTOCOL_F_RARP`` is present in
   1120   ``VHOST_USER_GET_PROTOCOL_FEATURES``.  The first 6 bytes of the
   1121   payload contain the MAC address of the guest to allow the vhost user
   1122   back-end to construct and broadcast the fake RARP.
   1123 
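A back-end could recover the MAC address from the payload as sketched
below. This assumes that "the first 6 bytes" refers to the first six
bytes of the ``u64`` in memory order, as it would be filled by copying
the address into the payload; both helper names are hypothetical.

.. code:: c

  #include <stdint.h>
  #include <string.h>

  #define MAC_LEN 6

  /* Front-end side: place the MAC in the first six payload bytes. */
  static uint64_t mac_to_rarp_payload(const uint8_t mac[MAC_LEN])
  {
      uint64_t payload = 0;

      memcpy(&payload, mac, MAC_LEN);
      return payload;
  }

  /* Back-end side: read the MAC back, again in memory order. */
  static void rarp_payload_to_mac(uint64_t payload, uint8_t mac[MAC_LEN])
  {
      memcpy(mac, &payload, MAC_LEN);
  }

Copying bytes in memory order, instead of shifting and masking the
integer, keeps the round trip independent of host endianness.
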
   1124 ``VHOST_USER_NET_SET_MTU``
   1125   :id: 20
   1126   :equivalent ioctl: N/A
   1127   :request payload: ``u64``
   1128   :reply payload: N/A
   1129 
   1130   Set host MTU value exposed to the guest.
   1131 
   1132   This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
   1133   has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
   1134   is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
   1135   ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
   1136   ``VHOST_USER_GET_PROTOCOL_FEATURES``.
   1137 
   1138   If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
   1139   respond with zero in case the specified MTU is valid, or non-zero
   1140   otherwise.
   1141 
   1142 ``VHOST_USER_SET_SLAVE_REQ_FD``
   1143   :id: 21
   1144   :equivalent ioctl: N/A
   1145   :request payload: N/A
   1146   :reply payload: N/A
   1147 
   1148   Set the socket file descriptor for back-end initiated requests. It is passed
   1149   in the ancillary data.
   1150 
   1151   This request should be sent only when
   1152   ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
   1153   feature bit ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` is present in
   1154   ``VHOST_USER_GET_PROTOCOL_FEATURES``.  If
   1155   ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
   1156   respond with zero for success, non-zero otherwise.
   1157 
   1158 ``VHOST_USER_IOTLB_MSG``
   1159   :id: 22
   1160   :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
   1161   :request payload: ``struct vhost_iotlb_msg``
   1162   :reply payload: ``u64``
   1163 
   1164   Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
   1165 
   1166   The front-end sends such requests to update and invalidate entries in the
   1167   device IOTLB. The back-end has to acknowledge the request by sending
   1168   zero as ``u64`` payload for success, non-zero otherwise.
   1169 
   1170   This request should be sent only when ``VIRTIO_F_IOMMU_PLATFORM``
   1171   feature has been successfully negotiated.
   1172 
   1173 ``VHOST_USER_SET_VRING_ENDIAN``
   1174   :id: 23
   1175   :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
   1176   :request payload: vring state description
   1177   :reply payload: N/A
   1178 
   1179   Set the endianness of a VQ for legacy devices. Little-endian is
   1180   indicated with state.num set to 0 and big-endian is indicated with
   1181   state.num set to 1. Other values are invalid.
   1182 
   1183   This request should be sent only when
   1184   ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
   1185   Back-ends that negotiated this feature should handle both
   1186   endiannesses and expect this message once (per VQ) during device
   1187   configuration (i.e. before the front-end starts the VQ).
   1188 
   1189 ``VHOST_USER_GET_CONFIG``
   1190   :id: 24
   1191   :equivalent ioctl: N/A
   1192   :request payload: virtio device config space
   1193   :reply payload: virtio device config space
   1194 
   1195   When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
   1196   submitted by the vhost-user front-end to fetch the contents of the
   1197   virtio device configuration space. The payload size of the vhost-user
   1198   back-end's reply MUST match the front-end's request. The back-end uses
   1199   a zero-length payload to indicate an error to the vhost-user
   1200   front-end. The vhost-user front-end may cache the contents to avoid
   1201   repeated ``VHOST_USER_GET_CONFIG`` calls.
   1202 
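The size rules for the ``VHOST_USER_GET_CONFIG`` reply can be expressed
as a small classification function on the front-end side. This is a
sketch only; the enum and function names are hypothetical, and both
sizes are the payload sizes of the respective messages.

.. code:: c

  #include <stdint.h>

  enum config_reply {
      CONFIG_REPLY_OK,       /* sizes match: payload can be used */
      CONFIG_REPLY_ERROR,    /* zero-length payload: back-end error */
      CONFIG_REPLY_INVALID,  /* any other size: protocol violation */
  };

  /*
   * request_size: payload size of the VHOST_USER_GET_CONFIG request.
   * reply_size:   payload size of the back-end's reply.
   */
  static enum config_reply classify_config_reply(uint32_t request_size,
                                                 uint32_t reply_size)
  {
      if (reply_size == 0) {
          return CONFIG_REPLY_ERROR;
      }
      if (reply_size != request_size) {
          return CONFIG_REPLY_INVALID;
      }
      return CONFIG_REPLY_OK;
  }
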
   1203 ``VHOST_USER_SET_CONFIG``
   1204   :id: 25
   1205   :equivalent ioctl: N/A
   1206   :request payload: virtio device config space
   1207   :reply payload: N/A
   1208 
   1209   When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
   1210   submitted by the vhost-user front-end when the guest changes the virtio
   1211   device configuration space. It can also be used for live migration
   1212   on the destination host. The vhost-user back-end must check the flags
   1213   field, and back-ends MUST NOT accept SET_CONFIG for read-only
   1214   configuration space fields unless the live migration bit is set.
   1215 
   1216 ``VHOST_USER_CREATE_CRYPTO_SESSION``
   1217   :id: 26
   1218   :equivalent ioctl: N/A
   1219   :request payload: crypto session description
   1220   :reply payload: crypto session description
   1221 
   1222   Create a session for crypto operation. The back-end must return
   1223   the session id, 0 or positive for success, negative for failure.
   1224   This request should be sent only when
   1225   ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
   1226   successfully negotiated.  It's a required feature for crypto
   1227   devices.
   1228 
   1229 ``VHOST_USER_CLOSE_CRYPTO_SESSION``
   1230   :id: 27
   1231   :equivalent ioctl: N/A
   1232   :request payload: ``u64``
   1233   :reply payload: N/A
   1234 
   1235   Close a session for crypto operation which was previously
   1236   created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.
   1237 
   1238   This request should be sent only when
   1239   ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
   1240   successfully negotiated.  It's a required feature for crypto
   1241   devices.
   1242 
   1243 ``VHOST_USER_POSTCOPY_ADVISE``
   1244   :id: 28
   1245   :request payload: N/A
   1246   :reply payload: userfault fd
   1247 
   1248   When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end
   1249   advises the back-end that a migration with postcopy enabled is underway;
   1250   the back-end must open a userfaultfd for later use.  Note that at this
   1251   stage the migration is still in precopy mode.
   1252 
   1253 ``VHOST_USER_POSTCOPY_LISTEN``
   1254   :id: 29
   1255   :request payload: N/A
   1256   :reply payload: N/A
   1257 
   1258   The front-end advises the back-end that a transition to postcopy mode has
   1259   happened.  The back-end must ensure that shared memory is registered
   1260   with userfaultfd to cause faulting of non-present pages.
   1261 
   1262   This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
   1263   and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.
   1264 
   1265 ``VHOST_USER_POSTCOPY_END``
   1266   :id: 30
   1267   :request payload: N/A
   1268   :reply payload: ``u64``
   1269 
   1270   The front-end advises that postcopy migration has now completed.  The back-end
   1271   must disable the userfaultfd. The reply is an acknowledgement
   1272   only.
   1273 
   1274   When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
   1275   is sent at the end of the migration, after
   1276   ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.
   1277 
   1278   The value returned is an error indication; 0 is success.
   1279 
   1280 ``VHOST_USER_GET_INFLIGHT_FD``
   1281   :id: 31
   1282   :equivalent ioctl: N/A
   1283   :request payload: inflight description
   1284   :reply payload: N/A
   1285 
   1286   When the ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
   1287   been successfully negotiated, this message is submitted by the front-end to
   1288   get a shared buffer from the back-end. The shared buffer will be used by
   1289   the back-end to track inflight I/O. QEMU should retrieve a new one when
   1290   the VM is reset.
   1291 
   1292 ``VHOST_USER_SET_INFLIGHT_FD``
   1293   :id: 32
   1294   :equivalent ioctl: N/A
   1295   :request payload: inflight description
   1296   :reply payload: N/A
   1297 
   1298   When the ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
   1299   been successfully negotiated, this message is submitted by the front-end to
   1300   send the shared inflight buffer back to the back-end so that the back-end
   1301   can recover inflight I/O after a crash or restart.
   1302 
   1303 ``VHOST_USER_GPU_SET_SOCKET``
   1304   :id: 33
   1305   :equivalent ioctl: N/A
   1306   :request payload: N/A
   1307   :reply payload: N/A
   1308 
   1309   Sets the GPU protocol socket file descriptor, which is passed as
   1310   ancillary data. The GPU protocol is used to inform the front-end of
   1311   rendering state and updates. See vhost-user-gpu.rst for details.
   1312 
   1313 ``VHOST_USER_RESET_DEVICE``
   1314   :id: 34
   1315   :equivalent ioctl: N/A
   1316   :request payload: N/A
   1317   :reply payload: N/A
   1318 
   1319   Ask the vhost user back-end to disable all rings and reset all
   1320   internal device state to the initial state, ready to be
   1321   reinitialized. The back-end retains ownership of the device
   1322   throughout the reset operation.
   1323 
   1324   Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
   1325   feature is set by the back-end.
   1326 
   1327 ``VHOST_USER_VRING_KICK``
   1328   :id: 35
   1329   :equivalent ioctl: N/A
   1330   :request payload: vring state description
   1331   :reply payload: N/A
   1332 
   1333   When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
   1334   feature has been successfully negotiated, this message may be
   1335   submitted by the front-end to indicate that a buffer was added to
   1336   the vring instead of signalling it using the vring's kick file
   1337   descriptor or having the back-end rely on polling.
   1338 
   1339   The state.num field is currently reserved and must be set to 0.
   1340 
   1341 ``VHOST_USER_GET_MAX_MEM_SLOTS``
   1342   :id: 36
   1343   :equivalent ioctl: N/A
   1344   :request payload: N/A
   1345   :reply payload: ``u64``
   1346 
   1347   When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
   1348   feature has been successfully negotiated, this message is submitted
   1349   by the front-end to the back-end. The back-end should return the message with a
   1350   u64 payload containing the maximum number of memory slots for
   1351   QEMU to expose to the guest. The value returned by the back-end
   1352   will be capped at the maximum number of ram slots which can be
   1353   supported by the target platform.
   1354 
   1355 ``VHOST_USER_ADD_MEM_REG``
   1356   :id: 37
   1357   :equivalent ioctl: N/A
   1358   :request payload: single memory region description
   1359   :reply payload: (postcopy only) single memory region description
   1360 
   1361   When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
   1362   feature has been successfully negotiated, this message is submitted
   1363   by the front-end to the back-end. The message payload contains a memory
   1364   region descriptor struct, describing a region of guest memory which
   1365   the back-end device must map in. When the
   1366   ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
   1367   been successfully negotiated, along with the
   1368   ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
   1369   update the memory tables of the back-end device.
   1370 
   1371   Exactly one file descriptor from which the memory is mapped is
   1372   passed in the ancillary data.
   1373 
   1374   In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end
   1375   replies with the bases of the memory mapped region to the front-end.
   1376   For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``.
   1377   They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly.
   1378 
   1379 ``VHOST_USER_REM_MEM_REG``
   1380   :id: 38
   1381   :equivalent ioctl: N/A
   1382   :request payload: single memory region description
   1383   :reply payload: N/A
   1384 
   1385   When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
   1386   feature has been successfully negotiated, this message is submitted
   1387   by the front-end to the back-end. The message payload contains a memory
   1388   region descriptor struct, describing a region of guest memory which
   1389   the back-end device must unmap. When the
   1390   ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
   1391   been successfully negotiated, along with the
   1392   ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
   1393   update the memory tables of the back-end device.
   1394 
   1395   The memory region to be removed is identified by its guest address,
   1396   user address and size. The mmap offset is ignored.
   1397 
   1398   No file descriptors SHOULD be passed in the ancillary data. For
   1399   compatibility with existing incorrect implementations, the back-end MAY
   1400   accept messages with one file descriptor. If a file descriptor is
   1401   passed, the back-end MUST close it without using it otherwise.
   1402 
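The matching rule for ``VHOST_USER_REM_MEM_REG`` (guest address, user
address and size identify the region; the mmap offset is ignored) might
look like the sketch below. The struct is a hypothetical in-memory form
of the single memory region description, and the helpers only maintain a
region table; unmapping the memory and closing any stray file descriptor
are left out.

.. code:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical in-memory form of a memory region description. */
  struct mem_region {
      uint64_t guest_addr;
      uint64_t size;
      uint64_t user_addr;
      uint64_t mmap_offset;
  };

  /* Compare the identifying fields; mmap_offset is deliberately ignored. */
  static bool same_region(const struct mem_region *a,
                          const struct mem_region *b)
  {
      return a->guest_addr == b->guest_addr &&
             a->user_addr == b->user_addr &&
             a->size == b->size;
  }

  /*
   * Remove the first region matching victim from an unordered table.
   * Returns true if a region was removed.
   */
  static bool remove_region(struct mem_region *table, size_t *count,
                            const struct mem_region *victim)
  {
      for (size_t i = 0; i < *count; i++) {
          if (same_region(&table[i], victim)) {
              /* Fill the hole with the last entry; order is not kept. */
              table[i] = table[*count - 1];
              (*count)--;
              return true;
          }
      }
      return false;
  }
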
   1403 ``VHOST_USER_SET_STATUS``
   1404   :id: 39
   1405   :equivalent ioctl: ``VHOST_VDPA_SET_STATUS``
   1406   :request payload: ``u64``
   1407   :reply payload: N/A
   1408 
   1409   When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
   1410   successfully negotiated, this message is submitted by the front-end to
   1411   notify the back-end with updated device status as defined in the Virtio
   1412   specification.
   1413 
   1414 ``VHOST_USER_GET_STATUS``
   1415   :id: 40
   1416   :equivalent ioctl: ``VHOST_VDPA_GET_STATUS``
   1417   :request payload: N/A
   1418   :reply payload: ``u64``
   1419 
   1420   When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
   1421   successfully negotiated, this message is submitted by the front-end to
   1422   query the back-end for its device status as defined in the Virtio
   1423   specification.
   1424 
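As a small illustration of how a back-end might interpret the status
payload, the sketch below tests the device status bits from the Virtio
specification. The macro and function names are chosen for this example,
and treating "driver ready" as ``DRIVER_OK`` set with neither ``FAILED``
nor ``DEVICE_NEEDS_RESET`` set is an assumption about what a back-end
would want to check.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Device status bits as defined in the Virtio specification. */
  #define STATUS_ACKNOWLEDGE   1
  #define STATUS_DRIVER        2
  #define STATUS_DRIVER_OK     4
  #define STATUS_FEATURES_OK   8
  #define STATUS_NEEDS_RESET  64
  #define STATUS_FAILED      128

  /* True if the driver finished setup and no error bit is set. */
  static bool status_driver_ready(uint64_t status)
  {
      const uint64_t bad = STATUS_FAILED | STATUS_NEEDS_RESET;

      return (status & STATUS_DRIVER_OK) && !(status & bad);
  }
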
   1425 
   1426 Back-end message types
   1427 ----------------------
   1428 
   1429 For this type of message, the request is sent by the back-end and the reply
   1430 is sent by the front-end.
   1431 
   1432 ``VHOST_USER_SLAVE_IOTLB_MSG``
   1433   :id: 1
   1434   :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
   1435   :request payload: ``struct vhost_iotlb_msg``
   1436   :reply payload: N/A
   1437 
   1438   Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
   1439   The back-end sends such requests to notify of an IOTLB miss, or an IOTLB
   1440   access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
   1441   negotiated, and the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the
   1442   front-end must respond with zero when the operation is successfully
   1443   completed, or non-zero otherwise.  This request should be sent only when
   1444   ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
   1445   negotiated.
   1446 
   1447 ``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``
   1448   :id: 2
   1449   :equivalent ioctl: N/A
   1450   :request payload: N/A
   1451   :reply payload: N/A
   1452 
   1453   When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
   1454   back-end sends such messages to notify that the virtio device's
   1455   configuration space has changed. For host devices that support this
   1456   feature, the host driver can send a ``VHOST_USER_GET_CONFIG``
   1457   message to the back-end to get the latest content. If
   1458   ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets the
   1459   ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero when
   1460   operation is successfully completed, or non-zero otherwise.
   1461 
   1462 ``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``
   1463   :id: 3
   1464   :equivalent ioctl: N/A
   1465   :request payload: vring area description
   1466   :reply payload: N/A
   1467 
   1468   Sets host notifier for a specified queue. The queue index is
   1469   contained in the ``u64`` field of the vring area description. The
   1470   host notifier is described by the file descriptor (typically it's a
   1471   VFIO device fd) which is passed as ancillary data and the size
   1472   (which is mmap size and should be the same as host page size) and
   1473   offset (which is mmap offset) carried in the vring area
   1474   description. QEMU can mmap the file descriptor based on the size and
   1475   offset to get a memory range. Registering a host notifier means
   1476   mapping this memory range to the VM as the specified queue's notify
   1477   MMIO region. The back-end sends this request to tell QEMU to de-register
   1478   the existing notifier if any and register the new notifier if the
   1479   request is sent with a file descriptor.
   1480 
   1481   This request should be sent only when
   1482   ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
   1483   successfully negotiated.
   1484 
   1485 ``VHOST_USER_SLAVE_VRING_CALL``
   1486   :id: 4
   1487   :equivalent ioctl: N/A
   1488   :request payload: vring state description
   1489   :reply payload: N/A
   1490 
   1491   When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
   1492   feature has been successfully negotiated, this message may be
   1493   submitted by the back-end to indicate that a buffer was used from
   1494   the vring instead of signalling this using the vring's call file
   1495   descriptor or having the front-end rely on polling.
   1496 
   1497   The state.num field is currently reserved and must be set to 0.
   1498 
   1499 ``VHOST_USER_SLAVE_VRING_ERR``
   1500   :id: 5
   1501   :equivalent ioctl: N/A
   1502   :request payload: vring state description
   1503   :reply payload: N/A
   1504 
   1505   When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
   1506   feature has been successfully negotiated, this message may be
   1507   submitted by the back-end to indicate that an error occurred on the
   1508   specific vring, instead of signalling the error file descriptor
   1509   set by the front-end via ``VHOST_USER_SET_VRING_ERR``.
   1510 
   1511   The state.num field is currently reserved and must be set to 0.
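
As an illustration of the two in-band notification messages above, the
following is a minimal sketch of how a back-end might serialize them. It
assumes the message header layout defined earlier in this document
(``request``, ``flags`` and ``size`` as 32-bit fields, with the version in
the low two bits of ``flags``), the vring state description as a pair of
32-bit values, and native byte order. ``pack_vring_event`` is a hypothetical
helper, not part of QEMU or libvhost-user.

```python
import struct

# Assumed header layout: request (u32), flags (u32), payload size (u32).
HEADER = struct.Struct("=III")
# Vring state description: queue index (u32), num (u32).
VRING_STATE = struct.Struct("=II")

VHOST_USER_VERSION = 0x1  # carried in the low two bits of flags
VHOST_USER_SLAVE_VRING_CALL = 4
VHOST_USER_SLAVE_VRING_ERR = 5


def pack_vring_event(request, queue_index):
    """Serialize an in-band vring notification for one queue (hypothetical helper)."""
    # state.num is reserved for these requests and must be 0.
    payload = VRING_STATE.pack(queue_index, 0)
    header = HEADER.pack(request, VHOST_USER_VERSION, len(payload))
    return header + payload


# Notify the front-end that a buffer was used on queue 2.
msg = pack_vring_event(VHOST_USER_SLAVE_VRING_CALL, 2)
request, flags, size = HEADER.unpack_from(msg)
index, num = VRING_STATE.unpack_from(msg, HEADER.size)
```

The same helper covers ``VHOST_USER_SLAVE_VRING_ERR`` by passing request
id 5; only the request field differs between the two messages.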
   1512 
   1513 .. _reply_ack:
   1514 
   1515 VHOST_USER_PROTOCOL_F_REPLY_ACK
   1516 -------------------------------
   1517 
   1518 The original vhost-user specification only demands replies for certain
   1519 commands. This differs from the vhost protocol implementation where
   1520 commands are sent over an ``ioctl()`` call and block until the back-end
   1521 has completed.
   1522 
   1523 With this protocol extension negotiated, the sender (QEMU) can set the
   1524 ``need_reply`` [Bit 3] flag on any command. This indicates that the
   1525 back-end MUST respond with a Payload ``VhostUserMsg`` indicating success
   1526 or failure. The payload should be set to zero on success or non-zero
   1527 on failure, unless the message already has an explicit reply body.
   1528 
   1529 The reply payload gives QEMU a deterministic indication of the result
   1530 of the command. Today, QEMU is expected to terminate the main vhost-user
   1531 loop upon receiving such errors. In the future, QEMU could be taught to be more
   1532 resilient for selective requests.
   1533 
   1534 For the message types that already solicit a reply from the back-end,
   1535 the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or the ``need_reply`` bit
   1536 being set brings no behavioural change. (See the Communication_
   1537 section for details.)
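
The acknowledgement rule can be sketched as follows. This is an
illustration under stated assumptions rather than QEMU's implementation: it
assumes the header layout defined earlier in this document (``request``,
``flags``, ``size`` as 32-bit fields; version in bits 0-1, reply flag in
bit 2, ``need_reply`` in bit 3) and native byte order. ``build_ack`` is a
hypothetical helper name.

```python
import struct

HEADER = struct.Struct("=III")  # request, flags, payload size
U64 = struct.Struct("=Q")       # the zero / non-zero status payload

VERSION_MASK = 0x3
VHOST_USER_VERSION = 0x1
REPLY_FLAG = 1 << 2       # bit 2: this message is a reply
NEED_REPLY_FLAG = 1 << 3  # bit 3: the sender asks for an explicit ack


def build_ack(request_header, status):
    """Return the ack for a received header, or None if no ack was requested."""
    request, flags, _size = HEADER.unpack(request_header)
    if not flags & NEED_REPLY_FLAG:
        return None
    payload = U64.pack(status)  # 0 on success, non-zero on failure
    reply_flags = (flags & VERSION_MASK) | REPLY_FLAG
    return HEADER.pack(request, reply_flags, len(payload)) + payload


# A request with id 2 and no payload, sent with need_reply set.
incoming = HEADER.pack(2, VHOST_USER_VERSION | NEED_REPLY_FLAG, 0)
ack = build_ack(incoming, 0)
```

A request that already has an explicit reply body is answered with that
body instead, so a receiver would call a helper like this only for
requests that otherwise produce no reply.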
   1538 
   1539 .. _backend_conventions:
   1540 
   1541 Backend program conventions
   1542 ===========================
   1543 
   1544 vhost-user back-ends can provide various devices & services and may
   1545 need to be configured manually depending on the use case. However, it
   1546 is a good idea to follow the conventions listed here when
   1547 possible. Users, QEMU or libvirt, can then rely on some common
   1548 behaviour to avoid heterogeneous configuration and management of the
   1549 back-end programs and facilitate interoperability.
   1550 
   1551 Each back-end installed on a host system should come with at least one
   1552 JSON file that conforms to the vhost-user.json schema. Each file
   1553 informs the management applications about the back-end type, and binary
   1554 location. In addition, it defines rules for management apps for
   1555 picking the highest priority back-end when multiple match the search
   1556 criteria (see ``@VhostUserBackend`` documentation in the schema file).
   1557 
   1558 If the back-end is not capable of enabling a requested feature on the
   1559 host (such as 3D acceleration with virgl), or the initialization
   1560 failed, the back-end should fail to start early and exit with a status
   1561 != 0. It may also print a message to stderr for further details.
   1562 
   1563 The back-end program must not daemonize itself, but it may be
   1564 daemonized by the management layer. It may also have restricted
   1565 access to the system.
   1566 
   1567 File descriptors 0, 1 and 2 will exist, and have regular
   1568 stdin/stdout/stderr usage (they may have been redirected to /dev/null
   1569 by the management layer, or to a log handler).
   1570 
   1571 The back-end program must end (as quickly and cleanly as possible) when
   1572 the SIGTERM signal is received. It may then receive SIGKILL from
   1573 the management layer after a few seconds.
   1574 
   1575 The following command line options have an expected behaviour. They
   1576 are mandatory, unless explicitly stated otherwise:
   1577 
   1578 --socket-path=PATH
   1579 
   1580   This option specifies the location of the vhost-user Unix domain socket.
   1581   It is incompatible with --fd.
   1582 
   1583 --fd=FDNUM
   1584 
   1585   When this argument is given, the back-end program is started with the
   1586   vhost-user socket as file descriptor FDNUM. It is incompatible with
   1587   --socket-path.
   1588 
   1589 --print-capabilities
   1590 
   1591   Output to stdout the back-end capabilities in JSON format, and then
   1592   exit successfully. Other options and arguments should be ignored, and
   1593   the back-end program should not perform its normal function.  The
   1594   capabilities can be reported dynamically depending on the host
   1595   capabilities.
   1596 
   1597 The JSON output is described in the ``vhost-user.json`` schema, by
   1598 ``@VHostUserBackendCapabilities``.  Example:
   1599 
   1600 .. code:: json
   1601 
   1602   {
   1603     "type": "foo",
   1604     "features": [
   1605       "feature-a",
   1606       "feature-b"
   1607     ]
   1608   }
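
The option handling described above could be sketched as follows. The
program name ``vhost-user-foo``, the helper names and the capability values
are placeholders taken from the example output above, not an existing
back-end. The sketch treats the socket location as mandatory, which is one
reading of the rule that the options are mandatory while ``--socket-path``
and ``--fd`` exclude each other.

```python
import argparse
import json


def capabilities():
    """Capabilities advertised by a hypothetical back-end of type "foo"."""
    return {"type": "foo", "features": ["feature-a", "feature-b"]}


def build_parser():
    parser = argparse.ArgumentParser(prog="vhost-user-foo")
    # The socket comes from exactly one of --socket-path or --fd.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--socket-path", metavar="PATH")
    source.add_argument("--fd", metavar="FDNUM", type=int)
    parser.add_argument("--print-capabilities", action="store_true")
    return parser


def main(argv):
    # Checked before parsing so that other options and arguments are ignored.
    if "--print-capabilities" in argv:
        print(json.dumps(capabilities(), indent=2))
        return 0
    args = build_parser().parse_args(argv)  # exits with status 2 on misuse
    # A real back-end would now connect to args.socket_path or adopt args.fd.
    return 0


exit_code = main(["--print-capabilities"])
```

A real program would pass ``sys.argv[1:]`` to ``main`` and use the return
value as its exit status. In a real back-end the reported features could
also be computed at run time from what the host supports.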
   1609 
   1610 vhost-user-input
   1611 ----------------
   1612 
   1613 Command line options:
   1614 
   1615 --evdev-path=PATH
   1616 
   1617   Specify the Linux input device.
   1618 
   1619   (optional)
   1620 
   1621 --no-grab
   1622 
   1623   Do not request exclusive access to the input device.
   1624 
   1625   (optional)
   1626 
   1627 vhost-user-gpu
   1628 --------------
   1629 
   1630 Command line options:
   1631 
   1632 --render-node=PATH
   1633 
   1634   Specify the GPU DRM render node.
   1635 
   1636   (optional)
   1637 
   1638 --virgl
   1639 
   1640   Enable virgl rendering support.
   1641 
   1642   (optional)
   1643 
   1644 vhost-user-blk
   1645 --------------
   1646 
   1647 Command line options:
   1648 
   1649 --blk-file=PATH
   1650 
   1651   Specify block device or file path.
   1652 
   1653   (optional)
   1654 
   1655 --read-only
   1656 
   1657   Enable read-only.
   1658 
   1659   (optional)