# XL Overview

## Requirements

JPEG XL was designed for two main requirements:

* high quality: visually lossless at reasonable bitrates;
* decoding speed: multithreaded decoding should be able to reach around
  400 Megapixels/s on large images.

These goals apply to various types of images, including HDR content, whose
support is made possible by full-precision (float32) computations and extensive
support for color spaces and transfer functions.

High performance is achieved by designing the format with careful consideration
of memory bandwidth usage and ease of SIMD/GPU implementation.

The full requirements for JPEG XL are listed in document wg1m82079.

## General architecture

The architecture follows the traditional block transform model with improvements
to the individual components. For a quick overview, we sketch a "block diagram"
of the lossy-format decoder in the form of module names in **bold** followed by
a brief description. Note that post-processing modules in [brackets] are
optional - they are unnecessary or even counterproductive at very high quality
settings.

**Header**: decode metadata (e.g. image dimensions) from compressed fields
(smaller than Exp-Golomb thanks to per-field encodings). The compression and
the small number of required fields enable very compact headers - much smaller
than JFIF and HEVC. The container supports multiple images (e.g.
animations/bursts) and passes (progressive).
**Bitstream**: decode transform coefficient residuals using rANS-encoded
<#bits,bits> symbols.

**Dequantize**: from adaptive quantization map side information, plus chroma
from luma.

**DC prediction**: expand DC residuals using adaptive (history-based)
predictors.

**Chroma from luma**: restore the predicted X and B from Y.

**IDCT**: 2x2..32x32, floating-point.

**[Gaborish]**: additional deblocking convolution with a 3x3 kernel.

**[Edge-preserving filter]**: nonlinear adaptive smoothing controlled by side
information.

**[Noise injection]**: add perceptually pleasing noise according to a per-image
noise model.

**Color space conversion**: from perceptual opsin XYB to linear RGB.

**[Conversion to other color spaces via ICC]**

The encoder is basically the reverse:

**Color space conversion**: from linear RGB to perceptual opsin XYB.

**[Noise estimation]**: compute a noise model for the image.

**[Gaborish]**: sharpening to counteract the blurring applied on the decoder
side.

**DCT**: transform sizes communicated via per-block side information.

**Chroma from luma**: find the best multipliers of Y for the X and B channels
of the entire image.

**Adaptive quantization**: iterative search for the quantization map that
yields the best perceived reconstruction.

**Quantize**: store 16-bit prediction residuals.

**DC prediction**: store residuals (prediction happens in quantized space).

**Entropy coding**: rANS and context modeling with clustering.

## File Structure

A codestream begins with a `FileHeader` followed by one or more "passes"
(= scans: e.g. DC or AC_LF), which are then added together (summing the
respective color components in opsin space) to form the final image.
There is no
limit to the number of passes, so an encoder could choose to send salient parts
first, followed by arbitrary decompositions of the final image (in terms of
resolution, bit depth, quality or spatial location).

Each pass contains groups of AC and DC data. A group is a subset of pixels that
can be decoded in parallel. DC groups contain 256x256 DCs (from 2048x2048 input
pixels); AC groups cover 256x256 input pixels.

Each pass starts with a table of contents (the sizes of each of its DC+AC
groups), which enables parallel decoding and/or decoding only a subset of the
groups. However, there is no higher-level TOC of passes, as that would prevent
appending additional images and could be too constraining for the encoder.

## Lossless

JPEG XL supports tools for lossless coding designed by Alexander Rhatushnyak
and Jon Sneyers. They produce files about 60-75% of the size of PNG, and
smaller than lossless WebP for photos.

An adaptive predictor computes 4 predictions from the NW, N, NE and W pixels
and combines them with weights based on previous errors. The error value is
encoded in a bucket chosen based on a heuristic maximum error. The result is
entropy-coded using the ANS encoder.

## Current Reference Implementation

### Conventions

The software is written in C++ and built using CMake 3.6 or later.

Error handling is done by having functions return values of type `jxl::Status`
(a thin wrapper around bool which checks that the value is not ignored). The
`JXL_RETURN_IF_ERROR` convenience macro automatically forwards errors, and the
`JXL_FAILURE` macro exits with an error message if reached, with no effect in
optimized builds.
To diagnose the cause of encoder/decoder failures (which often only result in a
generic "decode failed" message), build using the following command:

```bash
CMAKE_FLAGS="-DJXL_CRASH_ON_ERROR" ./ci.sh opt
```

In such builds, the first JXL_FAILURE will print a message identifying where
the problem is, and the program will exit immediately afterwards.

### Architecture

Getting back to the earlier block diagram:

**Header** handling is implemented in `headers.h` and `field*`.

**Bitstream**: `entropy_coder.h`, `dec_ans_*`.

**(De)quantize**: `quantizer.h`.

**DC prediction**: `predictor.h`.

**Chroma from luma**: `chroma_from_luma.h`.

**(I)DCT**: `dct*.h`. Instead of operating directly on blocks of memory, the
functions operate on thin wrappers which can handle blocks spread across
multiple image lines.

**DCT size selection**: `ac_strategy.cc`.

**[Gaborish]**: `enc_gaborish.h`.

**[Edge-preserving filter]**: `epf.h`.

**[Noise injection]**: `noise*` (currently disabled).

**Color space conversion**: `color_*`, `dec_xyb.h`.

## Decoder overview

After decoding the headers, the decoder begins processing frames
(`dec_frame.cc`).

For each pass, it reads the DC group table of contents (TOC) and starts
decoding, dequantizing and restoring the color correlation of each DC group
(covering 2048x2048 pixels of the input image) in parallel
(`compressed_dc.cc`). The DC is split into parts corresponding to each AC
group (with 1px of extra border); the AC group TOC is read and each AC group
(256x256 pixels) is processed in parallel (`dec_group.cc`).

In each AC group, the decoder reads per-block side information indicating the
kind of DCT transform; this is followed by the quantization field.
Then, AC 174 coefficients are read, dequantized and have color correlation restored on a 175 tile per tile basis for better locality. 176 177 After all the groups are read, postprocessing is applied: Gaborish smoothing 178 and edge preserving filter, to reduce blocking and other artifacts. 179 180 Finally, the image is converted back from the XYB color space 181 (`dec_xyb.cc`) and saved to the output image (`codec_*.cc`).