mirror of https://gitlab.com/qemu-project/qemu
You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
220 lines
8.4 KiB
ReStructuredText
220 lines
8.4 KiB
ReStructuredText
===================================
|
|
QED Image File Format Specification
|
|
===================================
|
|
|
|
The file format looks like this::
|
|
|
|
+----------+----------+----------+-----+
|
|
| cluster0 | cluster1 | cluster2 | ... |
|
|
+----------+----------+----------+-----+
|
|
|
|
The first cluster begins with the ``header``. The header contains information
|
|
about where regular clusters start; this allows the header to be extensible and
|
|
store extra information about the image file. A regular cluster may be
|
|
a ``data cluster``, an ``L2``, or an ``L1 table``. L1 and L2 tables are composed
|
|
of one or more contiguous clusters.
|
|
|
|
Normally the file size will be a multiple of the cluster size. If the file size
|
|
is not a multiple, extra information after the last cluster may not be preserved
|
|
if data is written. Legitimate extra information should use space between the header
|
|
and the first regular cluster.
|
|
|
|
All fields are little-endian.
|
|
|
|
Header
|
|
------
|
|
|
|
::
|
|
|
|
Header {
|
|
uint32_t magic; /* QED\0 */
|
|
|
|
uint32_t cluster_size; /* in bytes */
|
|
uint32_t table_size; /* for L1 and L2 tables, in clusters */
|
|
uint32_t header_size; /* in clusters */
|
|
|
|
uint64_t features; /* format feature bits */
|
|
uint64_t compat_features; /* compat feature bits */
|
|
uint64_t autoclear_features; /* self-resetting feature bits */
|
|
|
|
uint64_t l1_table_offset; /* in bytes */
|
|
uint64_t image_size; /* total logical image size, in bytes */
|
|
|
|
/* if (features & QED_F_BACKING_FILE) */
|
|
uint32_t backing_filename_offset; /* in bytes from start of header */
|
|
uint32_t backing_filename_size; /* in bytes */
|
|
}
|
|
|
|
Field descriptions:
|
|
~~~~~~~~~~~~~~~~~~~
|
|
|
|
- ``cluster_size`` must be a power of 2 in range [2^12, 2^26].
|
|
- ``table_size`` must be a power of 2 in range [1, 16].
|
|
- ``header_size`` is the number of clusters used by the header and any additional
|
|
information stored before regular clusters.
|
|
- ``features``, ``compat_features``, and ``autoclear_features`` are file format
|
|
extension bitmaps. They work as follows:
|
|
|
|
- An image with unknown ``features`` bits enabled must not be opened. File format
|
|
changes that are not backwards-compatible must use ``features`` bits.
|
|
- An image with unknown ``compat_features`` bits enabled can be opened safely.
|
|
The unknown features are simply ignored and represent backwards-compatible
|
|
changes to the file format.
|
|
- An image with unknown ``autoclear_features`` bits enable can be opened safely
|
|
after clearing the unknown bits. This allows for backwards-compatible changes
|
|
to the file format which degrade gracefully and can be re-enabled again by a
|
|
new program later.
|
|
- ``l1_table_offset`` is the offset of the first byte of the L1 table in the image
|
|
file and must be a multiple of ``cluster_size``.
|
|
- ``image_size`` is the block device size seen by the guest and must be a multiple
|
|
of 512 bytes.
|
|
- ``backing_filename_offset`` and ``backing_filename_size`` describe a string in
|
|
(byte offset, byte size) form. It is not NUL-terminated and has no alignment constraints.
|
|
The string must be stored within the first ``header_size`` clusters. The backing filename
|
|
may be an absolute path or relative to the image file.
|
|
|
|
Feature bits:
|
|
~~~~~~~~~~~~~
|
|
|
|
- ``QED_F_BACKING_FILE = 0x01``. The image uses a backing file.
|
|
- ``QED_F_NEED_CHECK = 0x02``. The image needs a consistency check before use.
|
|
- ``QED_F_BACKING_FORMAT_NO_PROBE = 0x04``. The backing file is a raw disk image
|
|
and no file format autodetection should be attempted. This should be used to
|
|
ensure that raw backing files are never detected as an image format if they happen
|
|
to contain magic constants.
|
|
|
|
There are currently no defined ``compat_features`` or ``autoclear_features`` bits.
|
|
|
|
Fields predicated on a feature bit are only used when that feature is set.
|
|
The fields always take up header space, regardless of whether or not the feature
|
|
bit is set.
|
|
|
|
Tables
|
|
------
|
|
|
|
Tables provide the translation from logical offsets in the block device to cluster
|
|
offsets in the file.
|
|
|
|
::
|
|
|
|
#define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
|
|
|
|
Table {
|
|
uint64_t offsets[TABLE_NOFFSETS];
|
|
}
|
|
|
|
The tables are organized as follows::
|
|
|
|
+----------+
|
|
| L1 table |
|
|
+----------+
|
|
,------' | '------.
|
|
+----------+ | +----------+
|
|
| L2 table | ... | L2 table |
|
|
+----------+ +----------+
|
|
,------' | '------.
|
|
+----------+ | +----------+
|
|
| Data | ... | Data |
|
|
+----------+ +----------+
|
|
|
|
A table is made up of one or more contiguous clusters. The ``table_size`` header
|
|
field determines table size for an image file. For example, ``cluster_size=64 KB``
|
|
and ``table_size=4`` results in 256 KB tables.
|
|
|
|
The logical image size must be less than or equal to the maximum possible size of
|
|
clusters rooted by the L1 table:
|
|
|
|
.. code::
|
|
|
|
header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
|
|
|
|
L1, L2, and data cluster offsets must be aligned to ``header.cluster_size``.
|
|
The following offsets have special meanings:
|
|
|
|
L2 table offsets
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
- 0 - unallocated. The L2 table is not yet allocated.
|
|
|
|
Data cluster offsets
|
|
~~~~~~~~~~~~~~~~~~~~
|
|
|
|
- 0 - unallocated. The data cluster is not yet allocated.
|
|
- 1 - zero. The data cluster contents are all zeroes and no cluster is allocated.
|
|
|
|
Future format extensions may wish to store per-offset information. The least
|
|
significant 12 bits of an offset are reserved for this purpose and must be set
|
|
to zero. Image files with ``cluster_size`` > 2^12 will have more unused bits
|
|
which should also be zeroed.
|
|
|
|
Unallocated L2 tables and data clusters
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Reads to an unallocated area of the image file access the backing file. If there
|
|
is no backing file, then zeroes are produced. The backing file may be smaller
|
|
than the image file and reads of unallocated areas beyond the end of the backing
|
|
file produce zeroes.
|
|
|
|
Writes to an unallocated area cause a new data clusters to be allocated, and a new
|
|
L2 table if that is also unallocated. The new data cluster is populated with data
|
|
from the backing file (or zeroes if no backing file) and the data being written.
|
|
|
|
Zero data clusters
|
|
~~~~~~~~~~~~~~~~~~
|
|
|
|
Zero data clusters are a space-efficient way of storing zeroed regions of the image.
|
|
|
|
Reads to a zero data cluster produce zeroes.
|
|
|
|
.. note::
|
|
The difference between an unallocated and a zero data cluster is that zero data
|
|
clusters stop the reading of contents from the backing file.
|
|
|
|
Writes to a zero data cluster cause a new data cluster to be allocated. The new
|
|
data cluster is populated with zeroes and the data being written.
|
|
|
|
Logical offset translation
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Logical offsets are translated into cluster offsets as follows::
|
|
|
|
table_bits table_bits cluster_bits
|
|
<--------> <--------> <--------------->
|
|
+----------+----------+-----------------+
|
|
| L1 index | L2 index | byte offset |
|
|
+----------+----------+-----------------+
|
|
|
|
Structure of a logical offset
|
|
|
|
offset_mask = ~(cluster_size - 1) # mask for the image file byte offset
|
|
|
|
def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
|
|
l2_offset = l1_table[l1_index]
|
|
l2_table = load_table(l2_offset)
|
|
cluster_offset = l2_table[l2_index] & offset_mask
|
|
return cluster_offset + byte_offset
|
|
|
|
Consistency checking
|
|
--------------------
|
|
|
|
This section is informational and included to provide background on the use
|
|
of the ``QED_F_NEED_CHECK features`` bit.
|
|
|
|
The ``QED_F_NEED_CHECK`` bit is used to mark an image as dirty before starting
|
|
an operation that could leave the image in an inconsistent state if interrupted
|
|
by a crash or power failure. A dirty image must be checked on open because its
|
|
metadata may not be consistent.
|
|
|
|
Consistency check includes the following invariants:
|
|
|
|
- Each cluster is referenced once and only once. It is an inconsistency to have
|
|
a cluster referenced more than once by L1 or L2 tables. A cluster has been leaked
|
|
if it has no references.
|
|
- Offsets must be within the image file size and must be ``cluster_size`` aligned.
|
|
- Table offsets must at least ``table_size`` * ``cluster_size`` bytes from the end
|
|
of the image file so that there is space for the entire table.
|
|
|
|
The consistency check process starts from ``l1_table_offset`` and scans all L2 tables.
|
|
After the check completes with no other errors besides leaks, the ``QED_F_NEED_CHECK``
|
|
bit can be cleared and the image can be accessed.
|