..
    Copyright (C) 2017 Red Hat Inc.

    This work is licensed under the terms of the GNU GPL, version 2 or
    later.  See the COPYING file in the top-level directory.

============================
Live Block Device Operations
============================

The QEMU block layer currently (as of QEMU 2.9) supports four major kinds
of live block device jobs -- stream, commit, mirror, and backup.  These
can be used to manipulate disk image chains to accomplish certain tasks,
namely: live-copy data from backing files into overlays; shorten long
disk image chains by merging data from overlays into backing files; live
synchronize data from a disk image chain (including the current active
disk) to another target image; and take point-in-time (and incremental)
backups of a block device.  Below is a description of these block (QMP)
primitives, along with a non-exhaustive list of examples illustrating
their use.

.. note::
    The file ``qapi/block-core.json`` in the QEMU source tree has the
    canonical QEMU API (QAPI) schema documentation for the QMP
    primitives discussed here.

.. todo (kashyapc):: Remove the ".. contents::" directive when Sphinx is
                     integrated.

.. contents::

Disk image backing chain notation
---------------------------------

A simple disk image chain.  (This can be created live using QMP
``blockdev-snapshot-sync``, or offline via ``qemu-img``)::

                   (Live QEMU)
                        |
                        .
                        V

            [A] <----- [B]

    (backing file)    (overlay)

The arrow can be read as: image [A] is the backing file of disk image
[B].  Live QEMU is currently writing to image [B]; consequently, [B] is
also referred to as the "active layer".

There are two kinds of terminology that are common when referring to
files in a disk image backing chain:

(1) Directional: 'base' and 'top'.  Given the simple disk image chain
    above, image [A] can be referred to as 'base', and image [B] as
    'top'.  (This terminology can be seen in the QAPI schema file,
    block-core.json.)

(2) Relational: 'backing file' and 'overlay'.  Again, taking the same
    simple disk image chain from above, disk image [A] is referred to
    as the backing file, and image [B] as the overlay.

Throughout this document, we will use the relational terminology.

.. important::
    The overlay files can generally be any format that supports a
    backing file, although QCOW2 is the preferred format and the one
    used in this document.

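The offline route mentioned at the start of this section can be sketched
in Python as follows.  This is only an illustration, not part of QEMU:
the helper names ``overlay_create_argv`` and ``chain_create_argvs`` are
hypothetical, and the code merely builds ``qemu-img create`` argument
lists (using the ``-f``/``-b``/``-F`` options that appear later in this
document) without executing anything.  It also assumes that the backing
file and the overlay share the same format.

```python
def overlay_create_argv(backing_file, overlay_file, fmt="qcow2"):
    """Build the argv that creates `overlay_file` on top of `backing_file`.

    Hypothetical helper: it mirrors the `qemu-img create -f ... -b ... -F ...`
    form used elsewhere in this document and does not run the command.
    """
    return [
        "qemu-img", "create",
        "-f", fmt,            # format of the new overlay
        "-b", backing_file,   # backing file the overlay will point to
        "-F", fmt,            # format of the backing file (assumed identical)
        overlay_file,
    ]


def chain_create_argvs(files):
    """Return one argv per overlay for a chain listed base image first."""
    return [overlay_create_argv(backing, overlay)
            for backing, overlay in zip(files, files[1:])]


if __name__ == "__main__":
    # Prints the command for the simple [A] <-- [B] chain shown above.
    for argv in chain_create_argvs(["a.qcow2", "b.qcow2"]):
        print(" ".join(argv))
```

To actually create the overlay, each list could be handed to
``subprocess.run(argv, check=True)`` while the guest is not running.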

Brief overview of live block QMP primitives
-------------------------------------------

The following are the four different kinds of live block operations that
the QEMU block layer supports.

(1) ``block-stream``: Live copy of data from backing files into overlay
    files.

    .. note:: Once the 'stream' operation has finished, there are three
              things to note:

                (a) QEMU rewrites the backing chain to remove
                    reference to the now-streamed and redundant backing
                    file;

                (b) the streamed file *itself* won't be removed by QEMU,
                    and must be explicitly discarded by the user;

                (c) the streamed file remains valid -- i.e. further
                    overlays can be created based on it.  Refer to the
                    ``block-stream`` section further below for more
                    details.

(2) ``block-commit``: Live merge of data from overlay files into backing
    files (with the optional goal of removing the overlay file from the
    chain).  Since QEMU 2.0, this includes "active ``block-commit``"
    (i.e. merge the current active layer into the base image).

    .. note:: Once the 'commit' operation has finished, there are three
              things to note here as well:

                (a) QEMU rewrites the backing chain to remove reference
                    to now-redundant overlay images that have been
                    committed into a backing file;

                (b) the committed file *itself* won't be removed by QEMU
                    -- it ought to be manually removed;

                (c) however, unlike in the case of ``block-stream``, the
                    intermediate images will be rendered invalid -- i.e.
                    no further overlays can be created based on them.
                    Refer to the ``block-commit`` section further below
                    for more details.

(3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize a running
    disk to another image.

(4) ``blockdev-backup`` (and the deprecated ``drive-backup``):
    Point-in-time (live) copy of a block device to a destination.


.. _`Interacting with a QEMU instance`:

Interacting with a QEMU instance
--------------------------------

To demonstrate the commands, we will use the following invocation of
QEMU, with a QMP server listening on a UNIX socket:

.. parsed-literal::

  $ |qemu_system| -display none -no-user-config -nodefaults \\
    -m 512 -blockdev \\
    node-name=node-A,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./a.qcow2 \\
    -device virtio-blk,drive=node-A,id=virtio0 \\
    -monitor stdio -qmp unix:/tmp/qmp-sock,server=on,wait=off

The ``-blockdev`` command-line option, used above, is available from
QEMU 2.9 onwards.  In the above invocation, notice the ``node-name``
parameter that is used to refer to the disk image a.qcow2 ('node-A') --
this is a cleaner way to refer to a disk image (as opposed to referring
to it by spelling out file paths).  So, we will continue to designate a
``node-name`` to each further disk image created (either via
``blockdev-snapshot-sync``, or ``blockdev-add``) as part of the disk
image chain, and continue to refer to the disks using their
``node-name`` (where possible, because ``block-commit`` does not yet, as
of QEMU 2.9, accept a ``node-name`` parameter) when performing various
block operations.

To interact with the QEMU instance launched above, we will use the
``qmp-shell`` utility (located at: ``qemu/scripts/qmp``, as part of the
QEMU source directory), which takes key-value pairs for QMP commands.
Invoke it as below (which will also print out the complete raw JSON
syntax for reference -- examples in the following sections)::

    $ ./qmp-shell -v -p /tmp/qmp-sock
    (QEMU)

.. note::
    When a QMP command has to be repeated, we will show, for its first
    occurrence, the ``qmp-shell`` invocation *and* the corresponding raw
    JSON QMP syntax; for subsequent invocations, we present just the
    ``qmp-shell`` syntax and omit the equivalent JSON output.

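For readers who prefer scripting over ``qmp-shell``, the following is a
minimal sketch of a QMP client in Python.  It is not the ``qmp-shell``
implementation; the names ``QMPSession``, ``build_command`` and
``connect`` are hypothetical.  It assumes the default (non-pretty) QMP
monitor, which sends one JSON object per line, and it relies on the QMP
handshake: the server sends a greeting containing a ``QMP`` key, after
which the client must issue ``qmp_capabilities`` before any other
command.

```python
import json
import socket


def build_command(name, arguments=None):
    """Return the raw JSON-able form of a QMP command."""
    command = {"execute": name}
    if arguments:
        command["arguments"] = arguments
    return command


class QMPSession:
    """Tiny line-based QMP client (illustrative only)."""

    def __init__(self, sock):
        self.sock = sock
        self.reader = sock.makefile("r", encoding="utf-8")
        self.events = []  # asynchronous events seen while waiting for replies

    def handshake(self):
        greeting = json.loads(self.reader.readline())
        if "QMP" not in greeting:
            raise RuntimeError("unexpected greeting: %r" % (greeting,))
        return self.execute("qmp_capabilities")

    def execute(self, name, arguments=None):
        payload = json.dumps(build_command(name, arguments)) + "\r\n"
        self.sock.sendall(payload.encode("utf-8"))
        while True:
            line = self.reader.readline()
            if not line:
                raise ConnectionError("QMP connection closed")
            message = json.loads(line)
            if "event" in message:      # e.g. BLOCK_JOB_READY
                self.events.append(message)
            elif "error" in message:
                raise RuntimeError(message["error"])
            else:
                return message["return"]


def connect(path="/tmp/qmp-sock"):
    """Connect to the UNIX socket used in the invocation above."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    session = QMPSession(sock)
    session.handshake()
    return session
```

With the QEMU instance from this section running,
``connect().execute("query-block-jobs")`` would be expected to return
the list that ``qmp-shell`` prints under ``"return"``.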

Example disk image chain
------------------------

We will use the below disk image chain (occasionally spelling it out
where appropriate) when discussing various primitives::

    [A] <-- [B] <-- [C] <-- [D]

Where [A] is the original base image; [B] and [C] are intermediate
overlay images; image [D] is the active layer -- i.e. live QEMU is
writing to it.  (The rule of thumb is: live QEMU will always be pointing
to the rightmost image in a disk image chain.)

The above image chain can be created by invoking
``blockdev-snapshot-sync`` commands as follows (this shows the creation
of overlay image [B]) using the ``qmp-shell`` (our invocation also
prints the raw JSON invocation of it)::

    (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
    {
        "execute": "blockdev-snapshot-sync",
        "arguments": {
            "node-name": "node-A",
            "snapshot-file": "b.qcow2",
            "format": "qcow2",
            "snapshot-node-name": "node-B"
        }
    }

Here, "node-A" is the name QEMU internally uses to refer to the base
image [A] -- it is the backing file, based on which the overlay image,
[B], is created.

To create the rest of the overlay images, [C], and [D] (omitting the raw
JSON output for brevity)::

    (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2
    (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2

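The three ``blockdev-snapshot-sync`` invocations above follow one
pattern, which the Python sketch below spells out.  The function names
are hypothetical, and the mapping from an image letter to its file and
node name (``B`` to ``b.qcow2`` and ``node-B``) is simply the naming
convention of this document, not something QEMU requires.

```python
import json


def snapshot_command(backing, overlay):
    """QMP command that creates `overlay` on top of the node for `backing`.

    Assumes this document's naming: image "B" lives in b.qcow2 and is
    known to QEMU as node-B.
    """
    return {
        "execute": "blockdev-snapshot-sync",
        "arguments": {
            "node-name": "node-%s" % backing,
            "snapshot-file": "%s.qcow2" % overlay.lower(),
            "format": "qcow2",
            "snapshot-node-name": "node-%s" % overlay,
        },
    }


def chain_commands(images):
    """One command per overlay, for a chain listed base image first."""
    return [snapshot_command(backing, overlay)
            for backing, overlay in zip(images, images[1:])]


if __name__ == "__main__":
    for command in chain_commands(["A", "B", "C", "D"]):
        print(json.dumps(command))
```

The first generated command matches the raw JSON shown above for
overlay [B]; the remaining two correspond to the ``qmp-shell`` lines for
[C] and [D].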

A note on points-in-time vs file names
--------------------------------------

In our disk image chain::

    [A] <-- [B] <-- [C] <-- [D]

We have *three* points in time and an active layer:

- Point 1: Guest state when [B] was created is contained in file [A]
- Point 2: Guest state when [C] was created is contained in [A] + [B]
- Point 3: Guest state when [D] was created is contained in
  [A] + [B] + [C]
- Active layer: Current guest state is contained in [A] + [B] + [C] +
  [D]

Therefore, be careful with naming choices:

- Naming a file after the time it is created is misleading -- the
  guest data for that point in time is *not* contained in that file
  (as explained earlier)
- Rather, think of files as a *delta* from the backing file

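The rule in the list above can be stated as a small Python function.
This is a toy model of the reasoning only (the function name is
hypothetical and nothing here talks to QEMU): the state captured when an
overlay was created lives in all the files *below* that overlay.

```python
def files_holding_state(chain, created_overlay=None):
    """Return the files that together hold the guest state.

    chain:           image names, base image first, active layer last.
    created_overlay: the overlay whose creation marks the point in time;
                     None means "now", i.e. the current guest state.
    """
    if created_overlay is None:
        return list(chain)
    position = chain.index(created_overlay)
    if position == 0:
        raise ValueError("the base image is not an overlay")
    # Everything strictly below the overlay holds that point in time.
    return chain[:position]


if __name__ == "__main__":
    chain = ["A", "B", "C", "D"]
    for overlay in ("B", "C", "D", None):
        print(overlay, files_holding_state(chain, overlay))
```

Note that the overlay named in the argument never appears in its own
result, which is exactly why naming a file after its creation time is
misleading.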

Live block streaming --- ``block-stream``
-----------------------------------------

The ``block-stream`` command allows you to live-copy data from backing
files into overlay images.

Given our original example disk image chain from earlier::

    [A] <-- [B] <-- [C] <-- [D]

The disk image chain can be shortened in one of the following different
ways (not an exhaustive list).

.. _`Case-1`:

(1) Merge everything into the active layer: I.e. copy all contents from
    the base image, [A], and overlay images, [B] and [C], into [D],
    *while* the guest is running.  The resulting chain will be a
    standalone image, [D] -- with contents from [A], [B] and [C] merged
    into it (where live QEMU writes go to)::

        [D]

.. _`Case-2`:

(2) Taking the same example disk image chain mentioned earlier, merge
    only images [B] and [C] into [D], the active layer.  As a result,
    the contents of images [B] and [C] will be copied into [D], and the
    backing file pointer of image [D] will be adjusted to point to image
    [A].  The resulting chain will be::

        [A] <-- [D]

.. _`Case-3`:

(3) Intermediate streaming (available since QEMU 2.8): Starting afresh
    with the original example disk image chain, with a total of four
    images, it is possible to copy contents from image [B] into image
    [C].  Once the copy is finished, image [B] can now be (optionally)
    discarded; and the backing file pointer of image [C] will be
    adjusted to point to [A].  I.e. after performing "intermediate
    streaming" of [B] into [C], the resulting image chain will be (where
    live QEMU is writing to [D])::

        [A] <-- [C] <-- [D]

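The three cases above differ only in which overlay receives the data and
where streaming stops.  The following Python sketch models the resulting
chain; it is a toy model of the chain bookkeeping (the function
``stream`` is hypothetical and copies no data), with ``device`` and
``base`` standing in for the image that is streamed into and the image
that remains as its backing file.

```python
def stream(chain, device, base=None):
    """Model the effect of block-stream on a backing chain.

    chain:  image names, base image first, active layer last.
    device: the overlay that receives the data.
    base:   images above this one are streamed into `device`; None means
            everything below `device` is streamed.
    """
    top = chain.index(device)
    start = 0 if base is None else chain.index(base) + 1
    if start > top:
        raise ValueError("base must sit below device in the chain")
    # Images in chain[start:top] are copied into `device` and drop out of
    # the chain; `device` now points at `base` (or has no backing file).
    return chain[:start] + chain[top:]


if __name__ == "__main__":
    chain = ["A", "B", "C", "D"]
    print(stream(chain, "D"))             # Case 1
    print(stream(chain, "D", base="A"))   # Case 2
    print(stream(chain, "C", base="A"))   # Case 3
```

The images that disappear from the returned list are the ones the text
describes as remaining valid on disk but no longer referenced.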

QMP invocation for ``block-stream``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For `Case-1`_, to merge contents of all the backing files into the
active layer, where 'node-D' is the current active image (by default
``block-stream`` will flatten the entire chain); ``qmp-shell`` (and its
corresponding JSON output)::

    (QEMU) block-stream device=node-D job-id=job0
    {
        "execute": "block-stream",
        "arguments": {
            "device": "node-D",
            "job-id": "job0"
        }
    }

For `Case-2`_, merge contents of the images [B] and [C] into [D], where
image [D] ends up referring to image [A] as its backing file::

    (QEMU) block-stream device=node-D base-node=node-A job-id=job0

And for `Case-3`_, of "intermediate streaming", merge contents of
image [B] into [C], where [C] ends up referring to [A] as its backing
image::

    (QEMU) block-stream device=node-C base-node=node-A job-id=job0

Progress of a ``block-stream`` operation can be monitored via the QMP
command::

    (QEMU) query-block-jobs
    {
        "execute": "query-block-jobs",
        "arguments": {}
    }


Once the ``block-stream`` operation has completed, QEMU will emit an
event, ``BLOCK_JOB_COMPLETED``.  The intermediate overlays remain valid,
and can now be (optionally) discarded, or retained to create further
overlays based on them.  Finally, the ``block-stream`` jobs can be
restarted at any time.

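As a rough illustration of monitoring, the Python sketch below turns a
``query-block-jobs`` reply into a percentage.  The function name is
hypothetical; it uses only the ``device``, ``offset``, ``len`` and
``ready`` fields that appear in the sample replies later in this
document, and it assumes that ``offset`` divided by ``len`` is a
reasonable progress estimate.

```python
def job_progress(reply, job_id):
    """Return (percent, ready) for `job_id`, or None if it is not listed.

    `reply` is the decoded JSON reply of query-block-jobs.  `percent` is
    None when the job reports a zero length.
    """
    for job in reply["return"]:
        if job["device"] == job_id:
            total = job["len"]
            percent = 100.0 * job["offset"] / total if total else None
            return percent, job.get("ready", False)
    return None


if __name__ == "__main__":
    # Reply shaped like the samples in this document (values illustrative).
    reply = {"return": [{"device": "job0", "type": "stream",
                         "len": 1376256, "offset": 688128}]}
    print(job_progress(reply, "job0"))
```

A ``None`` result simply means the job is no longer listed, which is
what one would expect once ``BLOCK_JOB_COMPLETED`` has been emitted.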

Live block commit --- ``block-commit``
--------------------------------------

The ``block-commit`` command lets you merge live data from overlay
images into backing file(s).  Since QEMU 2.0, this includes "live active
commit" (i.e. it is possible to merge the "active layer", the right-most
image in a disk image chain to which live QEMU is writing, into the
base image).  This is analogous to ``block-stream``, but in the opposite
direction.

Again, starting afresh with our example disk image chain, where live
QEMU is writing to the right-most image in the chain, [D]::

    [A] <-- [B] <-- [C] <-- [D]

The disk image chain can be shortened in one of the following ways:

.. _`block-commit_Case-1`:

(1) Commit content from only image [B] into image [A].  The resulting
    chain is the following, where image [C] is adjusted to point at [A]
    as its new backing file::

        [A] <-- [C] <-- [D]

(2) Commit content from images [B] and [C] into image [A].  The
    resulting chain, where image [D] is adjusted to point to image [A]
    as its new backing file::

        [A] <-- [D]

.. _`block-commit_Case-3`:

(3) Commit content from images [B], [C], and the active layer [D] into
    image [A].  The resulting chain (in this case, a consolidated single
    image)::

        [A]

(4) Commit content from only image [C] into image [B].  The resulting
    chain::

        [A] <-- [B] <-- [D]

(5) Commit content from image [C] and the active layer [D] into image
    [B].  The resulting chain::

        [A] <-- [B]

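The five cases above can be summarized by a toy Python model of the
chain bookkeeping.  The function ``commit`` is hypothetical and moves no
data; ``top`` and ``base`` play the same roles as the ``top`` and
``base`` arguments of ``block-commit``.  The second return value flags
the cases where the active layer itself is committed.

```python
def commit(chain, top, base):
    """Model the effect of block-commit on a backing chain.

    chain: image names, base image first, active layer last.
    top:   the highest overlay whose data is committed.
    base:  the backing file that receives the data.
    Returns (new_chain, is_active_commit).
    """
    base_pos = chain.index(base)
    top_pos = chain.index(top)
    if base_pos >= top_pos:
        raise ValueError("base must sit below top in the chain")
    # Everything above `base` up to and including `top` is merged into
    # `base` and drops out of the chain.
    new_chain = chain[:base_pos + 1] + chain[top_pos + 1:]
    return new_chain, top_pos == len(chain) - 1


if __name__ == "__main__":
    chain = ["A", "B", "C", "D"]
    for top, base in (("B", "A"), ("C", "A"), ("D", "A"),
                      ("C", "B"), ("D", "B")):
        print(top, base, commit(chain, top, base))
```

Cases (3) and (5) are the ones reported as active commits, since their
``top`` is the image live QEMU is writing to.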

QMP invocation for ``block-commit``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For :ref:`Case-1 <block-commit_Case-1>`, to merge contents only from
image [B] into image [A], the invocation is as follows::

    (QEMU) block-commit device=node-D base=a.qcow2 top=b.qcow2 job-id=job0
    {
        "execute": "block-commit",
        "arguments": {
            "device": "node-D",
            "job-id": "job0",
            "top": "b.qcow2",
            "base": "a.qcow2"
        }
    }

Once the above ``block-commit`` operation has completed, a
``BLOCK_JOB_COMPLETED`` event will be issued, and no further action is
required.  As the end result, the backing file of image [C] is adjusted
to point to image [A], and the original 4-image chain will end up being
transformed to::

    [A] <-- [C] <-- [D]

.. note::
    The intermediate image [B] is invalid (as in: no further overlays
    based on it can be created).

    Reasoning: An intermediate image after a 'stream' operation still
    represents that old point-in-time, and may be valid in that context.
    However, an intermediate image after a 'commit' operation no longer
    represents any point-in-time, and is invalid in any context.


However, :ref:`Case-3 <block-commit_Case-3>` (also called: "active
``block-commit``") is a *two-phase* operation: In the first phase, the
content from the active overlay, along with the intermediate overlays,
is copied into the backing file (also called the base image).  In the
second phase, the said backing file is made the current active image;
this is done by issuing the command ``block-job-complete``.  Optionally,
the ``block-commit`` operation can be cancelled by issuing the command
``block-job-cancel``, but be careful when doing this.

Once the first phase of the ``block-commit`` operation has finished, the
event ``BLOCK_JOB_READY`` will be emitted, signalling that the
synchronization has finished.  Now the job can be gracefully completed
by issuing the command ``block-job-complete`` -- until such a command is
issued, the 'commit' operation remains active.

The following is the flow for :ref:`Case-3 <block-commit_Case-3>` to
convert a disk image chain such as this::

    [A] <-- [B] <-- [C] <-- [D]

Into::

    [A]

Where content from all the subsequent overlays, [B], and [C], including
the active layer, [D], is committed back to [A] -- which is where live
QEMU then performs all its writes.

Start the "active ``block-commit``" operation::

    (QEMU) block-commit device=node-D base=a.qcow2 top=d.qcow2 job-id=job0
    {
        "execute": "block-commit",
        "arguments": {
            "device": "node-D",
            "job-id": "job0",
            "top": "d.qcow2",
            "base": "a.qcow2"
        }
    }


Once the synchronization has completed, the event ``BLOCK_JOB_READY`` will
be emitted.

Then, optionally query for the status of the active block operations.
We can see the 'commit' job is now ready to be completed, as indicated
by the line *"ready": true*::

    (QEMU) query-block-jobs
    {
        "execute": "query-block-jobs",
        "arguments": {}
    }
    {
        "return": [
            {
                "busy": false,
                "type": "commit",
                "len": 1376256,
                "paused": false,
                "ready": true,
                "io-status": "ok",
                "offset": 1376256,
                "device": "job0",
                "speed": 0
            }
        ]
    }

Gracefully complete the 'commit' block device job::

    (QEMU) block-job-complete device=job0
    {
        "execute": "block-job-complete",
        "arguments": {
            "device": "job0"
        }
    }
    {
        "return": {}
    }

Finally, once the above job is completed, an event
``BLOCK_JOB_COMPLETED`` will be emitted.

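The second phase of the flow above (wait for ``BLOCK_JOB_READY``, issue
``block-job-complete``, wait for ``BLOCK_JOB_COMPLETED``) could be
scripted roughly as below.  This is a sketch under stated assumptions:
the function name, the ``send`` callable and the ``events`` iterable are
hypothetical stand-ins for whatever QMP transport is in use, and the
code assumes that each block job event carries the job ID in
``data.device`` and that a failed job reports an ``error`` entry in the
data of ``BLOCK_JOB_COMPLETED``.

```python
def finish_active_commit(send, events, job_id="job0"):
    """Drive phase two of an already started active block-commit.

    send:   callable that transmits one QMP command (a dict).
    events: iterable of decoded QMP events, in arrival order.
    Returns True once BLOCK_JOB_COMPLETED is seen for `job_id`, False if
    the event stream ends first.
    """
    completion_requested = False
    for event in events:
        data = event.get("data", {})
        if data.get("device") != job_id:
            continue  # event belongs to some other job
        name = event.get("event")
        if name == "BLOCK_JOB_READY" and not completion_requested:
            # Phase one is done; ask QEMU to pivot to the base image.
            send({"execute": "block-job-complete",
                  "arguments": {"device": job_id}})
            completion_requested = True
        elif name == "BLOCK_JOB_COMPLETED":
            if "error" in data:
                raise RuntimeError(data["error"])
            return True
    return False


if __name__ == "__main__":
    sent = []
    fake_events = [
        {"event": "BLOCK_JOB_READY", "data": {"device": "job0"}},
        {"event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0"}},
    ]
    print(finish_active_commit(sent.append, fake_events), sent)
```

Keeping the transport behind ``send`` and ``events`` makes the sequence
easy to exercise with canned events before pointing it at a real guest.
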
.. note::
    The invocation for the rest of the cases (2, 4, and 5), discussed in
    the previous section, is omitted for brevity.

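For orientation only, the omitted invocations would presumably follow
the same pattern as the two shown above.  The Python sketch below builds
them by extrapolation; the ``base``/``top`` pairings are inferred from
the case descriptions and this document's file naming, they are not
taken from the QEMU documentation and have not been run against a live
QEMU.  The helper name is hypothetical.

```python
def block_commit_command(base, top, device="node-D", job_id="job0"):
    """Build a block-commit command in the shape used in this section."""
    return {
        "execute": "block-commit",
        "arguments": {
            "device": device,
            "job-id": job_id,
            "top": top,
            "base": base,
        },
    }


# Assumed pairings for the omitted cases (inferred, not authoritative).
OMITTED_CASES = {
    2: block_commit_command(base="a.qcow2", top="c.qcow2"),  # [B], [C] into [A]
    4: block_commit_command(base="b.qcow2", top="c.qcow2"),  # [C] into [B]
    5: block_commit_command(base="b.qcow2", top="d.qcow2"),  # [C], [D] into [B]
}


if __name__ == "__main__":
    for case, command in sorted(OMITTED_CASES.items()):
        print(case, command["arguments"])
```

Because case (5) includes the active layer [D], it would be expected to
behave like the two-phase "active ``block-commit``" described above and
to need an explicit ``block-job-complete``.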

Live disk synchronization --- ``drive-mirror`` and ``blockdev-mirror``
----------------------------------------------------------------------

Synchronize a running disk image chain (all or part of it) to a target
image.

Again, given our familiar disk image chain::

    [A] <-- [B] <-- [C] <-- [D]

The ``drive-mirror`` command (and its newer equivalent
``blockdev-mirror``) allows you to copy data from the entire chain into
a single target image (which can be located on a different host), [E].

.. note::

    When you cancel an in-progress 'mirror' job *before* the source and
    target are synchronized, ``block-job-cancel`` will emit the event
    ``BLOCK_JOB_CANCELLED``.  However, note that if you cancel a
    'mirror' job *after* it has indicated (via the event
    ``BLOCK_JOB_READY``) that the source and target have reached
    synchronization, then the event emitted by ``block-job-cancel``
    changes to ``BLOCK_JOB_COMPLETED``.

    Besides the 'mirror' job, the "active ``block-commit``" is the only
    other block device job that emits the event ``BLOCK_JOB_READY``.
    The rest of the block device jobs ('stream', "non-active
    ``block-commit``", and 'backup') end automatically.

So there are two possible actions to take, after a 'mirror' job has
emitted the event ``BLOCK_JOB_READY``, indicating that the source and
target have reached synchronization:

(1) Issuing the command ``block-job-cancel`` (after which QEMU emits the
    event ``BLOCK_JOB_COMPLETED``) will create a point-in-time (which is
    at the time of *triggering* the cancel command) copy of the entire
    disk image chain (or only the top-most image, depending on the
    ``sync`` mode), contained in the target image [E].  One use case for
    this is live VM migration with non-shared storage.

(2) Issuing the command ``block-job-complete`` (after which QEMU emits
    the event ``BLOCK_JOB_COMPLETED``) will adjust the guest device
    (i.e. live QEMU) to point to the target image, [E], causing all the
    new writes from this point on to happen there.

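The combinations described in the note and the list above can be
tabulated with a small Python function.  It is a summary of this
section's prose, not QEMU code; the function name and the wording of the
outcome strings are hypothetical, and treating ``block-job-complete`` on
a job that is not yet ready as an error is an assumption of this model.

```python
def mirror_outcome(action, ready):
    """Return (event, outcome) for ending a 'mirror' job.

    action: "block-job-cancel" or "block-job-complete".
    ready:  whether BLOCK_JOB_READY has already been emitted.
    """
    if action == "block-job-cancel":
        if ready:
            return ("BLOCK_JOB_COMPLETED",
                    "target holds a point-in-time copy; guest stays on source")
        return ("BLOCK_JOB_CANCELLED",
                "source and target were not yet synchronized")
    if action == "block-job-complete":
        if not ready:
            # Assumption of this model: completion needs a ready job.
            raise ValueError("job has not reached BLOCK_JOB_READY")
        return ("BLOCK_JOB_COMPLETED",
                "guest device now points to the target")
    raise ValueError("unknown action: %s" % action)


if __name__ == "__main__":
    print(mirror_outcome("block-job-cancel", ready=False))
    print(mirror_outcome("block-job-cancel", ready=True))
    print(mirror_outcome("block-job-complete", ready=True))
```

The point worth noticing is that the same event name,
``BLOCK_JOB_COMPLETED``, covers two different end states, so a script
has to remember which command it issued.
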
About synchronization modes: The synchronization mode determines
*which* part of the disk image chain will be copied to the target.
Currently, there are four different kinds:

(1) ``full`` -- Synchronize the content of the entire disk image chain
    to the target.

(2) ``top`` -- Synchronize only the contents of the top-most disk image
    in the chain to the target.

(3) ``none`` -- Synchronize only the new writes from this point on.

    .. note:: In the case of ``blockdev-backup`` (or deprecated
              ``drive-backup``), the behavior of ``none``
              synchronization mode is different.  Normally, a
              ``backup`` job consists of two parts: Anything that is
              overwritten by the guest is first copied out to the
              backup, and in the background the whole image is copied
              from start to end. With ``sync=none``, it's only the
              first part.

(4) ``incremental`` -- Synchronize content that is described by the
    dirty bitmap.

.. note::
    Refer to the :doc:`bitmaps` document in the QEMU source
    tree to learn about the detailed workings of the ``incremental``
    synchronization mode.

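As a compact restatement of the modes above for a 'mirror' job, the
following Python sketch lists which images of a chain have their
existing contents copied to the target.  It is a toy model (the function
name is hypothetical); new guest writes, which are synchronized in every
mode, are deliberately left out, and ``incremental`` is not modelled
because it depends on a dirty bitmap rather than on the chain.

```python
def images_copied(chain, sync):
    """Return the images whose existing contents reach the target.

    chain: image names, base image first, active layer last.
    sync:  one of "full", "top", "none", "incremental".
    """
    if sync == "full":
        return list(chain)      # the entire chain
    if sync == "top":
        return [chain[-1]]      # only the top-most image
    if sync == "none":
        return []               # only new writes from this point on
    if sync == "incremental":
        raise NotImplementedError("depends on the dirty bitmap")
    raise ValueError("unknown sync mode: %s" % sync)


if __name__ == "__main__":
    chain = ["A", "B", "C", "D"]
    for mode in ("full", "top", "none"):
        print(mode, images_copied(chain, mode))
```

With ``sync=top`` only [D] is listed, which is the "shallow copy" used
for storage migration when the backing chain already exists at the
destination.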

QMP invocation for ``drive-mirror``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To copy the contents of the entire disk image chain, from [A] all the
way to [D], to a new target (``drive-mirror`` will create the destination
file, if it doesn't already exist), call it [E]::

    (QEMU) drive-mirror device=node-D target=e.qcow2 sync=full job-id=job0
    {
        "execute": "drive-mirror",
        "arguments": {
            "device": "node-D",
            "job-id": "job0",
            "target": "e.qcow2",
            "sync": "full"
        }
    }

The ``"sync": "full"``, from the above, means: copy the *entire* chain
to the destination.

Following the above, querying for active block jobs will show that a
'mirror' job is "ready" to be completed (and QEMU will also emit an
event, ``BLOCK_JOB_READY``)::

    (QEMU) query-block-jobs
    {
        "execute": "query-block-jobs",
        "arguments": {}
    }
    {
        "return": [
            {
                "busy": false,
                "type": "mirror",
                "len": 21757952,
                "paused": false,
                "ready": true,
                "io-status": "ok",
                "offset": 21757952,
                "device": "job0",
                "speed": 0
            }
        ]
    }

And, as noted in the previous section, there are two possible actions
at this point:

(a) Create a point-in-time snapshot by ending the synchronization.  The
    point-in-time is at the time of *ending* the sync.  (The result of
    the following being: the target image, [E], will be populated with
    content from the entire chain, [A] to [D])::

        (QEMU) block-job-cancel device=job0
        {
            "execute": "block-job-cancel",
            "arguments": {
                "device": "job0"
            }
        }

(b) Or, complete the operation and pivot the live QEMU to the target
    copy::

        (QEMU) block-job-complete device=job0

In either of the above cases, if you once again run the
``query-block-jobs`` command, there should not be any active block
operation.

Comparing 'commit' and 'mirror': In both cases, the overlay images can
be discarded.  However, with 'commit', the *existing* base image will be
modified (by updating it with contents from overlays); while in the case
of 'mirror', a *new* target image is populated with the data from the
disk image chain.


QMP invocation for live storage migration with ``drive-mirror`` + NBD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Live storage migration (without shared storage setup) is one of the most
common use-cases that takes advantage of the ``drive-mirror`` primitive
and QEMU's built-in Network Block Device (NBD) server.  Here's a quick
walk-through of this setup.

Given the disk image chain::

    [A] <-- [B] <-- [C] <-- [D]

Instead of copying content from the entire chain, synchronize *only* the
contents of the *top*-most disk image (i.e. the active layer), [D], to a
target, say, [TargetDisk].

.. important::
    The destination host must already have the contents of the backing
    chain, involving images [A], [B], and [C], visible via other means
    -- whether by ``cp``, ``rsync``, or by some storage array-specific
    command.

Sometimes, this is also referred to as "shallow copy" -- because only
the "active layer", and not the rest of the image chain, is copied to
the destination.

.. note::
    In this example, for the sake of simplicity, we'll be using the same
    ``localhost`` as both source and destination.

As noted earlier, on the destination host the contents of the backing
chain -- from images [A] to [C] -- are already expected to exist in some
form (e.g. in a file called ``Contents-of-A-B-C.qcow2``).  Now, on the
destination host, let's create a target overlay image (with the image
``Contents-of-A-B-C.qcow2`` as its backing file), to which the contents
of image [D] (from the source QEMU) will be mirrored::

    $ qemu-img create -f qcow2 -b ./Contents-of-A-B-C.qcow2 \
        -F qcow2 ./target-disk.qcow2

Now start the destination QEMU instance with the following invocation
(the source QEMU is already running, as discussed in the section
`Interacting with a QEMU instance`_).  As noted earlier, for
simplicity's sake, the destination QEMU is started on the same host, but
it could be located elsewhere:

.. parsed-literal::

  $ |qemu_system| -display none -no-user-config -nodefaults \\
    -m 512 -blockdev \\
    node-name=node-TargetDisk,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./target-disk.qcow2 \\
    -device virtio-blk,drive=node-TargetDisk,id=virtio0 \\
    -S -monitor stdio -qmp unix:./qmp-sock2,server=on,wait=off \\
    -incoming tcp:localhost:6666

Given the disk image chain on source QEMU::

    [A] <-- [B] <-- [C] <-- [D]

On the destination host, the contents of the chain
``[A] <-- [B] <-- [C]`` are expected to be *already* present; therefore,
copy *only* the content of image [D].

    717 (1) [On *destination* QEMU] As part of the first step, start the
    718     built-in NBD server on a given host (local host, represented by
    719     ``::``)and port::
    720 
    721         (QEMU) nbd-server-start addr={"type":"inet","data":{"host":"::","port":"49153"}}
    722         {
    723             "execute": "nbd-server-start",
    724             "arguments": {
    725                 "addr": {
    726                     "data": {
    727                         "host": "::",
    728                         "port": "49153"
    729                     },
    730                     "type": "inet"
    731                 }
    732             }
    733         }
    734 
    735 (2) [On *destination* QEMU] And export the destination disk image using
    736     QEMU's built-in NBD server::
    737 
    738         (QEMU) nbd-server-add device=node-TargetDisk writable=true
    739         {
    740             "execute": "nbd-server-add",
    741             "arguments": {
    742                 "device": "node-TargetDisk", "writable": true
    743             }
    744         }
    745 
    746 (3) [On *source* QEMU] Then, invoke ``drive-mirror`` with
    747     ``mode=existing`` (meaning: synchronize to a pre-created, hence
    748     'existing', file on the target host) and with the synchronization
    749     mode set to 'top' (``"sync": "top"``), so that only the content of
    750     the topmost image, [D], is copied::
    751 
    752         (QEMU) drive-mirror device=node-D target=nbd:localhost:49153:exportname=node-TargetDisk sync=top mode=existing job-id=job0
    753         {
    754             "execute": "drive-mirror",
    755             "arguments": {
    756                 "device": "node-D",
    757                 "mode": "existing",
    758                 "job-id": "job0",
    759                 "target": "nbd:localhost:49153:exportname=node-TargetDisk",
    760                 "sync": "top"
    761             }
    762         }
    763 
    764 (4) [On *source* QEMU] Once ``drive-mirror`` has copied all the data and
    765     the event ``BLOCK_JOB_READY`` is emitted, issue ``block-job-cancel`` to
    766     gracefully end the synchronization, from source QEMU::
    767 
    768         (QEMU) block-job-cancel device=job0
    769         {
    770             "execute": "block-job-cancel",
    771             "arguments": {
    772                 "device": "job0"
    773             }
    774         }
    775 
    776 (5) [On *destination* QEMU] Then, stop the NBD server::
    777 
    778         (QEMU) nbd-server-stop
    779         {
    780             "execute": "nbd-server-stop",
    781             "arguments": {}
    782         }
    783 
    784 (6) [On *destination* QEMU] Finally, resume the guest vCPUs by issuing the
    785     QMP command ``cont``::
    786 
    787         (QEMU) cont
    788         {
    789             "execute": "cont",
    790             "arguments": {}
    791         }
    792 
    793 .. note::
    794     Higher-level libraries (e.g. libvirt) automate the entire above
    795     process (although note that libvirt does not allow same-host
    796     migrations to localhost for other reasons).
    797 
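The six steps above can be sketched as an ordered list of QMP commands. This is a minimal illustration only, assuming the node names, port, and job ID used in this walkthrough. The helpers ``qmp_command`` and ``nbd_migration_plan`` are hypothetical names introduced here, not part of QEMU or its Python tooling. The sketch only builds the JSON payloads; sending them to the two QMP sockets is left out.

```python
import json


def qmp_command(name, **arguments):
    """Build one QMP command as a JSON-serializable dict.

    Python keyword names use underscores where QMP uses hyphens,
    so job_id becomes "job-id".
    """
    return {
        "execute": name,
        "arguments": {key.replace("_", "-"): value
                      for key, value in arguments.items()},
    }


def nbd_migration_plan(export="node-TargetDisk", port="49153", job_id="job0"):
    """Return the six steps above as (side, command) pairs, in order."""
    addr = {"type": "inet", "data": {"host": "::", "port": port}}
    target = f"nbd:localhost:{port}:exportname={export}"
    return [
        ("destination", qmp_command("nbd-server-start", addr=addr)),
        ("destination", qmp_command("nbd-server-add",
                                    device=export, writable=True)),
        ("source", qmp_command("drive-mirror", device="node-D",
                               target=target, sync="top",
                               mode="existing", job_id=job_id)),
        # Only issue this once BLOCK_JOB_READY has been seen on the source.
        ("source", qmp_command("block-job-cancel", device=job_id)),
        ("destination", qmp_command("nbd-server-stop")),
        ("destination", qmp_command("cont")),
    ]


if __name__ == "__main__":
    for side, command in nbd_migration_plan():
        print(f"[{side}] {json.dumps(command)}")
```

Keeping the side (source or destination) next to each command makes the ordering constraint explicit: the NBD export must exist before ``drive-mirror`` starts, and the server is stopped only after the mirror job has been ended.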
    798 
    799 Notes on ``blockdev-mirror``
    800 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    801 
    802 The ``blockdev-mirror`` command is equivalent in core functionality to
    803 ``drive-mirror``, except that it operates at node-level in a BDS graph.
    804 
    805 Also: for ``blockdev-mirror``, the 'target' image needs to be explicitly
    806 created (using ``qemu-img``) and attached to the live QEMU via
    807 ``blockdev-add``, which assigns a name to the newly added target node.
    808 
    809 E.g. the sequence of actions to create a point-in-time backup of an
    810 entire disk image chain, to a target, using ``blockdev-mirror`` would be:
    811 
    812 (0) Create the QCOW2 overlays, to arrive at a backing chain of desired
    813     depth
    814 
    815 (1) Create the target image (using ``qemu-img``), say, ``e.qcow2``
    816 
    817 (2) Attach the above created file (``e.qcow2``), run-time, using
    818     ``blockdev-add`` to QEMU
    819 
    820 (3) Perform ``blockdev-mirror`` (use ``"sync": "full"`` to copy the
    821     entire chain to the target).  And notice the event
    822     ``BLOCK_JOB_READY``
    823 
    824 (4) Optionally, query for active block jobs; there should be a 'mirror'
    825     job ready to be completed
    826 
    827 (5) Gracefully complete the 'mirror' block device job, and notice the
    828     event ``BLOCK_JOB_COMPLETED``
    829 
    830 (6) Shut down the guest by issuing the QMP ``quit`` command so that
    831     caches are flushed
    832 
    833 (7) Then, finally, compare the contents of the disk image chain, and
    834     the target copy with ``qemu-img compare``.  You should notice:
    835     "Images are identical"
    836 
    837 
    838 QMP invocation for ``blockdev-mirror``
    839 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    840 
    841 Given the disk image chain::
    842 
    843     [A] <-- [B] <-- [C] <-- [D]
    844 
    845 To copy the contents of the entire disk image chain, from [A] all the
    846 way to [D], to a new target (call it [E]), the flow is as follows.
    847 
    848 Create the overlay images, [B], [C], and [D]::
    849 
    850     (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
    851     (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2
    852     (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2
    853 
    854 Create the target image, [E]::
    855 
    856     $ qemu-img create -f qcow2 e.qcow2 39M
    857 
    858 Add the above created target image to QEMU, via ``blockdev-add``::
    859 
    860     (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"}
    861     {
    862         "execute": "blockdev-add",
    863         "arguments": {
    864             "node-name": "node-E",
    865             "driver": "qcow2",
    866             "file": {
    867                 "driver": "file",
    868                 "filename": "e.qcow2"
    869             }
    870         }
    871     }
    872 
    873 Perform ``blockdev-mirror``, and notice the event ``BLOCK_JOB_READY``::
    874 
    875     (QEMU) blockdev-mirror device=node-D target=node-E sync=full job-id=job0
    876     {
    877         "execute": "blockdev-mirror",
    878         "arguments": {
    879             "device": "node-D",
    880             "job-id": "job0",
    881             "target": "node-E",
    882             "sync": "full"
    883         }
    884     }
    885 
    886 Query for active block jobs; there should be a 'mirror' job ready::
    887 
    888     (QEMU) query-block-jobs
    889     {
    890         "execute": "query-block-jobs",
    891         "arguments": {}
    892     }
    893     {
    894         "return": [
    895             {
    896                 "busy": false,
    897                 "type": "mirror",
    898                 "len": 21561344,
    899                 "paused": false,
    900                 "ready": true,
    901                 "io-status": "ok",
    902                 "offset": 21561344,
    903                 "device": "job0",
    904                 "speed": 0
    905             }
    906         ]
    907     }
    908 
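As a small illustration of reading the reply above, the following sketch picks out the mirror jobs that report ``"ready": true``. The function name ``ready_mirror_jobs`` is hypothetical, and the sample reply is simply the one shown above, rewritten as a Python dict.

```python
def ready_mirror_jobs(reply):
    """Return the IDs of 'mirror' jobs marked ready in a query-block-jobs reply."""
    return [
        job["device"]  # for jobs started with job-id, this field holds the job ID
        for job in reply.get("return", [])
        if job.get("type") == "mirror" and job.get("ready")
    ]


# The reply shown above, as a Python dict.
sample_reply = {
    "return": [
        {
            "busy": False,
            "type": "mirror",
            "len": 21561344,
            "paused": False,
            "ready": True,
            "io-status": "ok",
            "offset": 21561344,
            "device": "job0",
            "speed": 0,
        }
    ]
}

if __name__ == "__main__":
    print(ready_mirror_jobs(sample_reply))
```

A job that appears in this list is one for which ``block-job-complete`` can be issued.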
    909 Gracefully complete the block device job operation, and notice the
    910 event ``BLOCK_JOB_COMPLETED``::
    911 
    912     (QEMU) block-job-complete device=job0
    913     {
    914         "execute": "block-job-complete",
    915         "arguments": {
    916             "device": "job0"
    917         }
    918     }
    919     {
    920         "return": {}
    921     }
    922 
    923 Shut down the guest by issuing the ``quit`` QMP command::
    924 
    925     (QEMU) quit
    926     {
    927         "execute": "quit",
    928         "arguments": {}
    929     }
    930 
    931 
    932 Live disk backup --- ``blockdev-backup`` and the deprecated ``drive-backup``
    933 ----------------------------------------------------------------------------
    934 
    935 The ``blockdev-backup`` command (and the deprecated ``drive-backup``)
    936 allows you to create a point-in-time snapshot.
    937 
    938 In this case, the point-in-time is when you *start* the
    939 ``blockdev-backup`` (or deprecated ``drive-backup``) command.
    940 
    941 
    942 QMP invocation for ``drive-backup``
    943 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    944 
    945 Note that the ``drive-backup`` command has been deprecated since QEMU 6.2
    946 and will be removed in the future.
    947 
    948 Yet again, starting afresh with our example disk image chain::
    949 
    950     [A] <-- [B] <-- [C] <-- [D]
    951 
    952 To create a target image [E], with content populated from image [A] to
    953 [D], from the above chain, the following is the syntax.  (If the target
    954 image does not exist, ``drive-backup`` will create it)::
    955 
    956     (QEMU) drive-backup device=node-D sync=full target=e.qcow2 job-id=job0
    957     {
    958         "execute": "drive-backup",
    959         "arguments": {
    960             "device": "node-D",
    961             "job-id": "job0",
    962             "sync": "full",
    963             "target": "e.qcow2"
    964         }
    965     }
    966 
    967 Once the above ``drive-backup`` has completed, a ``BLOCK_JOB_COMPLETED`` event
    968 will be issued, indicating the live block device job operation has
    969 completed, and no further action is required.
    970 
    971 
    972 Moving from the deprecated ``drive-backup`` to newer ``blockdev-backup``
    973 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    974 
    975 ``blockdev-backup`` differs from ``drive-backup`` in how you specify
    976 the backup target.  With ``blockdev-backup`` you can't specify a filename
    977 as the target.  Instead, you use the ``node-name`` of an existing block
    978 node, which you may add with the ``blockdev-add`` or ``blockdev-create``
    979 commands.  Correspondingly, ``blockdev-backup`` doesn't have the ``mode``
    980 and ``format`` arguments, which don't apply to an existing block node.
    981 See the following sections for details and examples.
    982 
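To make the difference concrete, here is a rough sketch of how the arguments of a ``drive-backup`` call might be rewritten into a ``blockdev-add`` plus ``blockdev-backup`` pair. The helper name ``split_drive_backup`` and the default node name ``node-E`` are assumptions for illustration. The sketch also assumes the target image already exists (e.g. created with ``qemu-img``), and that it is a qcow2 file when no ``format`` is given, as in the examples in this document.

```python
def split_drive_backup(drive_backup_args, target_node="node-E"):
    """Rewrite drive-backup arguments as (blockdev-add, blockdev-backup).

    The file named by "target" must already exist; blockdev-backup
    does not create it.
    """
    args = dict(drive_backup_args)            # do not modify the caller's dict
    filename = args.pop("target")             # filename becomes a node's file
    image_format = args.pop("format", "qcow2")  # assumption: qcow2 by default
    args.pop("mode", None)                    # not applicable to an existing node

    add = {
        "execute": "blockdev-add",
        "arguments": {
            "node-name": target_node,
            "driver": image_format,
            "file": {"driver": "file", "filename": filename},
        },
    }
    args["target"] = target_node              # the target is now a node name
    backup = {"execute": "blockdev-backup", "arguments": args}
    return add, backup


if __name__ == "__main__":
    old_args = {"device": "node-D", "job-id": "job0",
                "sync": "full", "target": "e.qcow2"}
    for command in split_drive_backup(old_args):
        print(command)
```

The remaining arguments (``device``, ``job-id``, ``sync``) carry over unchanged in this sketch; only the way the target is named differs.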
    983 
    984 Notes on ``blockdev-backup``
    985 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    986 
    987 The ``blockdev-backup`` command operates at node-level in a Block Driver
    988 State (BDS) graph.
    989 
    990 E.g. the sequence of actions to create a point-in-time backup
    991 of an entire disk image chain, to a target, using ``blockdev-backup``
    992 would be:
    993 
    994 (0) Create the QCOW2 overlays, to arrive at a backing chain of desired
    995     depth
    996 
    997 (1) Create the target image (using ``qemu-img``), say, ``e.qcow2``
    998 
    999 (2) Attach the above created file (``e.qcow2``), run-time, using
   1000     ``blockdev-add`` to QEMU
   1001 
   1002 (3) Perform ``blockdev-backup`` (use ``"sync": "full"`` to copy the
   1003     entire chain to the target).  And notice the event
   1004     ``BLOCK_JOB_COMPLETED``
   1005 
   1006 (4) Shut down the guest, by issuing the QMP ``quit`` command, so that
   1007     caches are flushed
   1008 
   1009 (5) Then, finally, compare the contents of the disk image chain, and
   1010     the target copy with ``qemu-img compare``.  You should notice:
   1011     "Images are identical"
   1012 
   1013 The following section shows an example QMP invocation for
   1014 ``blockdev-backup``.
   1015 
   1016 QMP invocation for ``blockdev-backup``
   1017 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1018 
   1019 Given a disk image chain of depth 1 where image [B] is the active
   1020 overlay (live QEMU is writing to it)::
   1021 
   1022     [A] <-- [B]
   1023 
   1024 The following is the procedure to copy the content from the entire chain
   1025 to a target image (say, [E]), which has the full content from [A] and
   1026 [B].
   1027 
   1028 Create the overlay [B]::
   1029 
   1030     (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
   1031     {
   1032         "execute": "blockdev-snapshot-sync",
   1033         "arguments": {
   1034             "node-name": "node-A",
   1035             "snapshot-file": "b.qcow2",
   1036             "format": "qcow2",
   1037             "snapshot-node-name": "node-B"
   1038         }
   1039     }
   1040 
   1041 
   1042 Create a target image that will contain the copy::
   1043 
   1044     $ qemu-img create -f qcow2 e.qcow2 39M
   1045 
   1046 Then add it to QEMU via ``blockdev-add``::
   1047 
   1048     (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"}
   1049     {
   1050         "execute": "blockdev-add",
   1051         "arguments": {
   1052             "node-name": "node-E",
   1053             "driver": "qcow2",
   1054             "file": {
   1055                 "driver": "file",
   1056                 "filename": "e.qcow2"
   1057             }
   1058         }
   1059     }
   1060 
   1061 Then invoke ``blockdev-backup`` to copy the contents from the entire
   1062 image chain, consisting of images [A] and [B] to the target image
   1063 'e.qcow2'::
   1064 
   1065     (QEMU) blockdev-backup device=node-B target=node-E sync=full job-id=job0
   1066     {
   1067         "execute": "blockdev-backup",
   1068         "arguments": {
   1069             "device": "node-B",
   1070             "job-id": "job0",
   1071             "target": "node-E",
   1072             "sync": "full"
   1073         }
   1074     }
   1075 
   1076 Once the above 'backup' operation has completed, the event
   1077 ``BLOCK_JOB_COMPLETED`` will be emitted, signalling successful
   1078 completion.
   1079 
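A client that wants to wait for this event could scan the QMP stream as sketched below. This is only an illustration: the function name ``wait_for_job_completion`` is hypothetical, the sketch assumes that QEMU sends one JSON message per line (the default, non-pretty QMP mode), and the sample messages are made up for the demonstration.

```python
import io
import json


def wait_for_job_completion(qmp_lines, job_id):
    """Scan QMP messages until BLOCK_JOB_COMPLETED arrives for job_id.

    Returns the event's "data" dict, raises RuntimeError if the event
    reports an error, and returns None if the stream ends first.
    """
    for line in qmp_lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("event") != "BLOCK_JOB_COMPLETED":
            continue                      # a command reply or another event
        data = message.get("data", {})
        if data.get("device") != job_id:
            continue                      # completion of a different job
        if "error" in data:
            raise RuntimeError(f"job {job_id} failed: {data['error']}")
        return data
    return None


# Illustrative stream: a command reply followed by the completion event.
sample_stream = io.StringIO(
    '{"return": {}}\n'
    '{"event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", '
    '"type": "backup", "len": 21561344, "offset": 21561344, "speed": 0}}\n'
)

if __name__ == "__main__":
    print(wait_for_job_completion(sample_stream, "job0"))
```

In a real client, ``qmp_lines`` would be a file object wrapped around the QMP socket rather than an in-memory string.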
   1080 Next, query for any active block device jobs (there should be none)::
   1081 
   1082     (QEMU) query-block-jobs
   1083     {
   1084         "execute": "query-block-jobs",
   1085         "arguments": {}
   1086     }
   1087 
   1088 Shut down the guest::
   1089 
   1090     (QEMU) quit
   1091     {
   1092         "execute": "quit",
   1093         "arguments": {}
   1094     }
   1095     {"return": {}}
   1097 
   1098 .. note::
   1099     The above step is really important; if forgotten, an error, "Failed
   1100     to get shared "write" lock on e.qcow2", will be thrown when you do
   1101     ``qemu-img compare`` to verify the integrity of the disk image
   1102     with the backup content.
   1103 
   1104 
   1105 The end result will be the image 'e.qcow2' containing a
   1106 point-in-time backup of the disk image chain -- i.e. contents from
   1107 images [A] and [B] at the time the ``blockdev-backup`` command was
   1108 initiated.
   1109 
   1110 One way to confirm that the backup disk image has content identical to
   1111 that of the disk image chain is to compare the two; you should see
   1112 "Images are identical".  (NB: this assumes that QEMU was launched with
   1113 the ``-S`` option, which does not start the CPUs at guest boot up, so
   1114 the guest wrote nothing after the backup was taken)::
   1115 
   1116     $ qemu-img compare b.qcow2 e.qcow2
   1117     Warning: Image size mismatch!
   1118     Images are identical.
   1119 
   1120 NOTE: The "Warning: Image size mismatch!" is expected, as we created the
   1121 target image (e.qcow2) with 39M size.