==============
NVMe Emulation
==============

QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
``nvme-subsys`` devices.

See the following sections for specific information on:

  * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
  * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
    `Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
    Protection`_.

Adding NVMe Devices
===================

Controller Emulation
--------------------

The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
specification. All mandatory features are implemented, with a couple of
exceptions and limitations:

  * Accounting numbers in the SMART/Health log page are reset when the device
    is power cycled.
  * Interrupt Coalescing is not supported and is disabled by default.

The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the
following parameters:

.. code-block:: console

    -drive file=nvm.img,if=none,id=nvm
    -device nvme,serial=deadbeef,drive=nvm

There are a number of optional general parameters for the ``nvme`` device. Some
are mentioned here, but see ``-device nvme,help`` to list all possible
parameters.

``max_ioqpairs=UINT32`` (default: ``64``)
  Set the maximum number of allowed I/O queue pairs. This replaces the
  deprecated ``num_queues`` parameter.

``msix_qsize=UINT16`` (default: ``65``)
  The number of MSI-X vectors that the device should support.

``mdts=UINT8`` (default: ``7``)
  Set the Maximum Data Transfer Size of the device. The value is specified as
  a power of two (2^n) in units of the minimum memory page size (CAP.MPSMIN).

``use-intel-id`` (default: ``off``)
  Since QEMU 5.2, the device uses a QEMU-allocated "Red Hat" PCI Device and
  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
  previously used.
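
As a rough sketch of how these parameters relate, the following shell fragment
derives ``msix_qsize`` from ``max_ioqpairs`` and computes the transfer limit
implied by ``mdts``. The queue count and ``mdts`` value are hypothetical, and
both the one-vector-per-queue sizing and the 4 KiB minimum page size are
assumptions made for illustration.

.. code-block:: shell

   # Hypothetical sizing for a small guest.
   max_ioqpairs=8
   # Assumption: one MSI-X vector per I/O queue pair plus one for the admin
   # queue, mirroring the relationship between the defaults (64 and 65).
   msix_qsize=$((max_ioqpairs + 1))
   mdts=5
   # Assumption: minimum memory page size (CAP.MPSMIN) of 4 KiB.
   page_size=4096
   # mdts is a power-of-two exponent in units of the minimum page size.
   max_transfer_bytes=$(( (1 << mdts) * page_size ))

   nvme_opts="nvme,serial=deadbeef,drive=nvm"
   nvme_opts="${nvme_opts},max_ioqpairs=${max_ioqpairs}"
   nvme_opts="${nvme_opts},msix_qsize=${msix_qsize},mdts=${mdts}"
   printf '%s\n' "-device ${nvme_opts}"
   printf '%s\n' "maximum transfer: ${max_transfer_bytes} bytes"

With these values the second line reports 131072 bytes (128 KiB).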

Additional Namespaces
---------------------

In the simplest possible invocation sketched above, the device only supports a
single namespace with the namespace identifier ``1``. To support multiple
namespaces and additional features, the ``nvme-ns`` device must be used.

.. code-block:: console

   -device nvme,id=nvme-ctrl-0,serial=deadbeef
   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1
   -drive file=nvm-2.img,if=none,id=nvm-2
   -device nvme-ns,drive=nvm-2

The namespaces defined by the ``nvme-ns`` device will attach to the most
recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
identifiers are allocated automatically, starting from ``1``.

There are a number of parameters available:

``nsid`` (default: ``0``)
  Explicitly set the namespace identifier.

``uuid`` (default: *autogenerated*)
  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
  descriptor in the Namespace Identification Descriptor List.

``eui64``
  Set the EUI-64 of the namespace. This will be reported as an "IEEE Extended
  Unique Identifier" descriptor in the Namespace Identification Descriptor List.
  Since machine type 6.1, a non-zero default value is used if the parameter
  is not provided. For earlier machine types the field defaults to 0.

``bus``
  If more than one ``nvme`` device is defined, this parameter may be used to
  attach the namespace to a specific ``nvme`` device (identified by an ``id``
  parameter on the controller device).
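
The ``bus`` and ``nsid`` parameters can be combined to place namespaces on
specific controllers. The sketch below only prints the options it would pass to
QEMU. The ``ns_args`` helper, the second controller and its serial are
hypothetical names introduced for illustration.

.. code-block:: shell

   # Print the -drive/-device pair for one namespace.
   # Usage: ns_args IMAGE DRIVE_ID CONTROLLER_ID NSID
   ns_args() {
       printf '%s\n' "-drive file=$1,if=none,id=$2"
       printf '%s\n' "-device nvme-ns,drive=$2,bus=$3,nsid=$4"
   }

   # Two controllers; their ids are what the bus parameter refers to.
   printf '%s\n' "-device nvme,id=nvme-ctrl-0,serial=deadbeef"
   printf '%s\n' "-device nvme,id=nvme-ctrl-1,serial=deadc0de"
   # One namespace per controller, each with an explicit identifier.
   ns_args nvm-1.img nvm-1 nvme-ctrl-0 1
   ns_args nvm-2.img nvm-2 nvme-ctrl-1 1

Because each namespace names its controller explicitly, it does not depend on
being listed directly after the controller it belongs to.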

NVM Subsystems
--------------

Additional features become available if the controller device (``nvme``) is
linked to an NVM Subsystem device (``nvme-subsys``).

The NVM Subsystem emulation allows features such as shared namespaces and
multipath I/O.

.. code-block:: console

   -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0
   -device nvme,serial=deadbeef,subsys=nvme-subsys-0
   -device nvme,serial=deadbeef,subsys=nvme-subsys-0

This will create an NVM subsystem with two controllers. Having controllers
linked to an ``nvme-subsys`` device allows additional ``nvme-ns`` parameters:

``shared`` (default: ``on`` since 6.2)
  Specifies that the namespace will be attached to all controllers in the
  subsystem. If set to ``off``, the namespace will remain a private namespace
  and may only be attached to a single controller at a time. Shared namespaces
  are always automatically attached to all controllers (also when controllers
  are hotplugged).

``detached`` (default: ``off``)
  If set to ``on``, the namespace will be available in the subsystem, but
  not attached to any controllers initially. A shared namespace with this set
  to ``on`` will never be automatically attached to controllers.

Thus, adding

.. code-block:: console

   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1,nsid=1
   -drive file=nvm-2.img,if=none,id=nvm-2
   -device nvme-ns,drive=nvm-2,nsid=3,shared=off,detached=on

will cause NSID 1 to be a shared namespace that is initially attached to both
controllers. NSID 3 will be a private namespace due to ``shared=off`` and only
attachable to a single controller at a time. Additionally, it will not be
attached to any controller initially (due to ``detached=on``) or to hotplugged
controllers.

Optional Features
=================

Controller Memory Buffer
------------------------

``nvme`` device parameters related to the Controller Memory Buffer support:

``cmb_size_mb=UINT32`` (default: ``0``)
  This adds a Controller Memory Buffer of the given size at offset zero in BAR
  2.

``legacy-cmb`` (default: ``off``)
  By default, the device uses the "v1.4 scheme" for the Controller Memory
  Buffer support (i.e., the CMB is initially disabled and must be explicitly
  enabled by the host). Set this to ``on`` to behave as a v1.3 device with
  respect to the CMB.

Simple Copy
-----------

The device includes support for TP 4065 ("Simple Copy Command"). A number of
additional ``nvme-ns`` device parameters may be used to control the Copy
command limits:

``mssrl=UINT16`` (default: ``128``)
  Set the Maximum Single Source Range Length (``MSSRL``). This is the maximum
  number of logical blocks that may be specified in each source range.

``mcl=UINT32`` (default: ``128``)
  Set the Maximum Copy Length (``MCL``). This is the maximum number of logical
  blocks that may be specified in a Copy command (the total for all source
  ranges).

``msrc=UINT8`` (default: ``127``)
  Set the Maximum Source Range Count (``MSRC``). This is the maximum number of
  source ranges that may be used in a Copy command. This is a 0's based value.
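
To illustrate how the three limits interact, the sketch below checks a list of
source range lengths against the default values. ``copy_fits`` is a
hypothetical helper written for this example; it is not the device's
validation code.

.. code-block:: shell

   mssrl=128   # default: maximum logical blocks per source range
   mcl=128     # default: maximum logical blocks per Copy command
   msrc=127    # default: 0's based, so up to 128 source ranges

   # copy_fits LEN...: succeed if the given source range lengths (in logical
   # blocks) respect all three limits.
   copy_fits() {
       [ "$#" -le $((msrc + 1)) ] || return 1
       total=0
       for len in "$@"; do
           [ "$len" -le "$mssrl" ] || return 1
           total=$((total + len))
       done
       [ "$total" -le "$mcl" ]
   }

   copy_fits 64 64 && echo "64+64 blocks: within limits"
   copy_fits 128 1 || echo "128+1 blocks: exceeds mcl"
   copy_fits 129   || echo "129 blocks: exceeds mssrl"

With the defaults, a single 128-block range already uses the whole ``mcl``
budget, so raising ``mssrl`` alone does not allow larger copies.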

Zoned Namespaces
----------------

A namespace may be "Zoned" as defined by TP 4053 ("Zoned Namespaces"). Set
``zoned=on`` on an ``nvme-ns`` device to configure it as a zoned namespace.

The namespace may be configured with additional parameters:

``zoned.zone_size=SIZE`` (default: ``128MiB``)
  Define the zone size (``ZSZE``).

``zoned.zone_capacity=SIZE`` (default: ``0``)
  Define the zone capacity (``ZCAP``). If left at the default (``0``), the zone
  capacity will equal the zone size.

``zoned.descr_ext_size=UINT32`` (default: ``0``)
  Set the Zone Descriptor Extension Size (``ZDES``). Must be a multiple of 64
  bytes.

``zoned.cross_read=BOOL`` (default: ``off``)
  Set to ``on`` to allow reads to cross zone boundaries.

``zoned.max_active=UINT32`` (default: ``0``)
  Set the maximum number of active resources (``MAR``). The default (``0``)
  allows all zones to be active.

``zoned.max_open=UINT32`` (default: ``0``)
  Set the maximum number of open resources (``MOR``). The default (``0``)
  allows all zones to be open. If ``zoned.max_active`` is specified, this value
  must be less than or equal to that.

``zoned.zasl=UINT8`` (default: ``0``)
  Set the maximum data transfer size for the Zone Append command. Like
  ``mdts``, the value is specified as a power of two (2^n) and is in units of
  the minimum memory page size (CAP.MPSMIN). The default value (``0``)
  has this property inherit the ``mdts`` value.
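
The following sketch works through a zoned configuration before handing it to
QEMU. The image size and the ``max_active``/``max_open`` values are
hypothetical. It also assumes that the namespace is divided into whole zones of
``zoned.zone_size`` each and that the ``M`` size suffix is accepted for
``SIZE`` values.

.. code-block:: shell

   ns_size_mib=4096     # hypothetical 4 GiB backing image
   zone_size_mib=128    # zoned.zone_size default
   zone_cap_mib=0       # zoned.zone_capacity default
   max_active=16        # hypothetical
   max_open=8           # hypothetical

   # A zone capacity of 0 means "same as the zone size".
   if [ "$zone_cap_mib" -eq 0 ]; then
       zone_cap_mib=$zone_size_mib
   fi

   # Assumption: the namespace is split into whole zones of zone_size each.
   num_zones=$((ns_size_mib / zone_size_mib))

   # max_open must not exceed max_active when max_active is set.
   if [ "$max_active" -ne 0 ] && [ "$max_open" -gt "$max_active" ]; then
       echo "zoned.max_open must be <= zoned.max_active" >&2
       exit 1
   fi

   zoned_opts="zoned=on,zoned.zone_size=${zone_size_mib}M"
   zoned_opts="${zoned_opts},zoned.max_active=${max_active}"
   zoned_opts="${zoned_opts},zoned.max_open=${max_open}"
   printf '%s\n' "zones: ${num_zones}, capacity per zone: ${zone_cap_mib} MiB"
   printf '%s\n' "-device nvme-ns,drive=nvm-1,${zoned_opts}"

For these values the namespace holds 32 zones of 128 MiB each.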

Metadata
--------

The virtual namespace device supports LBA metadata in the form of separate
metadata (``MPTR``-based) and extended LBAs.

``ms=UINT16`` (default: ``0``)
  Defines the number of metadata bytes per LBA.

``mset=UINT8`` (default: ``0``)
  Set to ``1`` to enable extended LBAs.

End-to-End Data Protection
--------------------------

The virtual namespace device supports DIF- and DIX-based protection information
(depending on ``mset``).

``pi=UINT8`` (default: ``0``)
  Enable protection information of the specified type (type ``1``, ``2`` or
  ``3``).

``pil=UINT8`` (default: ``0``)
  Controls the location of the protection information within the metadata. Set
  to ``1`` to transfer protection information as the first eight bytes of
  metadata. Otherwise, the protection information is transferred as the last
  eight bytes.
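
The effect of ``pil`` can be sketched as a small calculation of where the eight
bytes of protection information sit within the per-LBA metadata. ``pi_offset``
is a hypothetical helper, and the requirement that ``ms`` be at least 8 is
inferred from the eight-byte size rather than stated above.

.. code-block:: shell

   # pi_offset MS PIL: print the byte offset of the 8-byte protection
   # information within MS bytes of per-LBA metadata.
   pi_offset() {
       ms=$1
       pil=$2
       if [ "$ms" -lt 8 ]; then
           echo "ms must be at least 8 to hold protection information" >&2
           return 1
       fi
       if [ "$pil" -eq 1 ]; then
           echo 0                # first eight bytes of metadata
       else
           echo $((ms - 8))      # last eight bytes of metadata
       fi
   }

   pi_offset 16 0    # prints 8
   pi_offset 16 1    # prints 0
   pi_offset 8 0     # prints 0

When ``ms`` is exactly 8 the protection information fills the whole metadata
area, so ``pil`` makes no difference.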

Virtualization Enhancements and SR-IOV (Experimental Support)
-------------------------------------------------------------

The ``nvme`` device supports Single Root I/O Virtualization and Sharing
along with Virtualization Enhancements. The controller has to be linked to
an NVM Subsystem device (``nvme-subsys``) for use with SR-IOV.

A number of parameters are present (**please note that they may be
subject to change**):

``sriov_max_vfs`` (default: ``0``)
  Indicates the maximum number of PCIe virtual functions supported
  by the controller. Specifying a non-zero value enables reporting of both
  SR-IOV and ARI (Alternative Routing-ID Interpretation) capabilities
  by the NVMe device. Virtual function controllers will not report SR-IOV.

``sriov_vq_flexible``
  Indicates the total number of flexible queue resources assignable to all
  the secondary controllers. Implicitly sets the number of primary
  controller's private resources to ``(max_ioqpairs - sriov_vq_flexible)``.

``sriov_vi_flexible``
  Indicates the total number of flexible interrupt resources assignable to
  all the secondary controllers. Implicitly sets the number of primary
  controller's private resources to ``(msix_qsize - sriov_vi_flexible)``.

``sriov_max_vi_per_vf`` (default: ``0``)
  Indicates the maximum number of virtual interrupt resources assignable
  to a secondary controller. The default ``0`` resolves to
  ``(sriov_vi_flexible / sriov_max_vfs)``.

``sriov_max_vq_per_vf`` (default: ``0``)
  Indicates the maximum number of virtual queue resources assignable to
  a secondary controller. The default ``0`` resolves to
  ``(sriov_vq_flexible / sriov_max_vfs)``.
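
The resource split described by these parameters can be worked through with
plain arithmetic. The sketch below uses the ``nvme`` defaults for
``max_ioqpairs`` and ``msix_qsize``; the SR-IOV values are hypothetical.

.. code-block:: shell

   max_ioqpairs=64        # nvme default
   msix_qsize=65          # nvme default
   sriov_max_vfs=2        # hypothetical
   sriov_vq_flexible=8    # hypothetical
   sriov_vi_flexible=4    # hypothetical

   # Resources that stay private to the primary controller.
   private_vq=$((max_ioqpairs - sriov_vq_flexible))
   private_vi=$((msix_qsize - sriov_vi_flexible))

   # What the per-VF limits resolve to when left at their default of 0.
   max_vq_per_vf=$((sriov_vq_flexible / sriov_max_vfs))
   max_vi_per_vf=$((sriov_vi_flexible / sriov_max_vfs))

   echo "primary: ${private_vq} queue, ${private_vi} interrupt resources"
   echo "per VF:  ${max_vq_per_vf} queue, ${max_vi_per_vf} interrupt resources"

Here the primary controller keeps 56 queue and 61 interrupt resources, and each
VF may be assigned up to 4 queue and 2 interrupt resources.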

The simplest possible invocation enables the capability to set up one VF
controller and assign an admin queue, an I/O queue, and an MSI-X interrupt.

.. code-block:: console

   -device nvme-subsys,id=subsys0
   -device nvme,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,
    sriov_vq_flexible=2,sriov_vi_flexible=1

The minimum steps required to configure a functional NVMe secondary
controller are:

  * unbind flexible resources from the primary controller

.. code-block:: console

   nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
   nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0

  * perform a Function Level Reset on the primary controller to actually
    release the resources

.. code-block:: console

   echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset

  * enable VF

.. code-block:: console

   echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs

  * assign the flexible resources to the VF and set it ONLINE

.. code-block:: console

   nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0

  * bind the NVMe driver to the VF

.. code-block:: console

   echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
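
The steps above can be collected into a single script, sketched here under the
assumption that the device node and PCI addresses match the example. The
``DRY_RUN`` switch and the ``run``, ``write_sysfs`` and ``setup_vf`` helpers
are hypothetical; by default the script only prints the commands.

.. code-block:: shell

   #!/bin/sh
   # Prints the commands by default; set DRY_RUN=0 to execute them as root.
   DRY_RUN=${DRY_RUN:-1}
   CTRL=/dev/nvme0      # primary controller, as in the steps above
   PF=0000:01:00.0      # PCI address of the primary controller
   VF=0000:01:00.1      # PCI address of the first VF

   run() {
       if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
   }

   write_sysfs() {      # write_sysfs VALUE PATH
       if [ "$DRY_RUN" = 1 ]; then
           echo "echo $1 > $2"
       else
           echo "$1" > "$2"
       fi
   }

   setup_vf() {
       # Unbind flexible resources from the primary controller.
       run nvme virt-mgmt "$CTRL" -c 0 -r 1 -a 1 -n 0 || return 1
       run nvme virt-mgmt "$CTRL" -c 0 -r 0 -a 1 -n 0 || return 1
       # Function Level Reset on the primary controller, then enable one VF.
       write_sysfs 1 "/sys/bus/pci/devices/$PF/reset" || return 1
       write_sysfs 1 "/sys/bus/pci/devices/$PF/sriov_numvfs" || return 1
       # Assign the flexible resources to the VF and set it online.
       run nvme virt-mgmt "$CTRL" -c 1 -r 1 -a 8 -n 1 || return 1
       run nvme virt-mgmt "$CTRL" -c 1 -r 0 -a 8 -n 2 || return 1
       run nvme virt-mgmt "$CTRL" -c 1 -r 0 -a 9 -n 0 || return 1
       # Bind the NVMe driver to the VF.
       write_sysfs "$VF" /sys/bus/pci/drivers/nvme/bind
   }

   setup_vf

Reviewing the dry-run output first is a cheap way to confirm the addresses
before anything is written to sysfs.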