# XL Overview

## Requirements

JPEG XL was designed for two main requirements:

* high quality: visually lossless at reasonable bitrates;
* decoding speed: multithreaded decoding should be able to reach around
  400 Megapixels/s on large images.

These goals apply to various types of images, including HDR content, whose
support is made possible by full-precision (float32) computations and extensive
support for color spaces and transfer functions.

High performance is achieved by designing the format with careful consideration
of memory bandwidth usage and ease of SIMD/GPU implementation.

The full requirements for JPEG XL are listed in document wg1m82079.

## General architecture

The architecture follows the traditional block transform model with improvements
to the individual components. For a quick overview, we sketch a "block diagram"
of the lossy-format decoder in the form of module names in **bold** followed by
a brief description. Note that post-processing modules in [brackets] are
optional - they are unnecessary or even counterproductive at very high quality
settings.

**Header**: decode metadata (e.g. image dimensions) from compressed fields
(smaller than Exp-Golomb thanks to per-field encodings). The compression and
the small number of required fields enable very compact headers - much smaller
than JFIF and HEVC. The container supports multiple images (e.g.
animations/bursts) and passes (progressive).
**Bitstream**: decode transform coefficient residuals using rANS-encoded
<#bits,bits> symbols.

**Dequantize**: from adaptive quantization map side information, plus chroma
from luma.

**DC prediction**: expand DC residuals using adaptive (history-based)
predictors.

**Chroma from luma**: restore the predicted X and B from Y.

**IDCT**: 2x2..32x32, floating-point.

**[Gaborish]**: additional deblocking convolution with a 3x3 kernel.

**[Edge-preserving filter]**: nonlinear adaptive smoothing controlled by side
information.

**[Noise injection]**: add perceptually pleasing noise according to a per-image
noise model.

**Color space conversion**: from perceptual opsin XYB to linear RGB.

**[Conversion to other color spaces via ICC]**

The encoder is basically the reverse:

**Color space conversion**: from linear RGB to perceptual opsin XYB.

**[Noise estimation]**: compute a noise model for the image.

**[Gaborish]**: sharpening to counteract the blurring applied on the decoder
side.

**DCT**: transform sizes communicated via per-block side information.

**Chroma from luma**: find the best multipliers of Y for the X and B channels
of the entire image.

**Adaptive quantization**: iterative search for the quantization map that
yields the best perceived reconstruction.

**Quantize**: store 16-bit prediction residuals.

**DC prediction**: store residuals (prediction happens in quantized space).

**Entropy coding**: rANS and context modeling with clustering.

## File Structure

A codestream begins with a `FileHeader` followed by one or more "passes"
(= scans: e.g. DC or AC_LF), which are then added together (summing the
respective color components in opsin space) to form the final image.
There is no
limit to the number of passes, so an encoder could choose to send salient parts
first, followed by arbitrary decompositions of the final image (in terms of
resolution, bit depth, quality or spatial location).

Each pass contains groups of AC and DC data. A group is a subset of pixels that
can be decoded in parallel. DC groups contain 256x256 DCs (from 2048x2048 input
pixels); AC groups cover 256x256 input pixels.

Each pass starts with a table of contents (the sizes of each of its DC+AC
groups), which enables parallel decoding and/or decoding only a subset of the
groups. However, there is no higher-level TOC of passes, as that would prevent
appending additional images and could be too constraining for the encoder.

## Lossless

JPEG XL supports tools for lossless coding designed by Alexander Rhatushnyak
and Jon Sneyers. They produce files about 60-75% of the size of PNG, and
smaller than lossless WebP for photos.

An adaptive predictor computes 4 predictions from the NW, N, NE and W pixels
and combines them with weights based on previous errors. The error value is
encoded in a bucket chosen based on a heuristic maximum error. The result is
entropy-coded using the ANS encoder.

## Current Reference Implementation

### Conventions

The software is written in C++ and built using CMake 3.6 or later.

Error handling is done by having functions return values of type `jxl::Status`
(a thin wrapper around bool which checks that the value is not ignored). The
`JXL_RETURN_IF_ERROR` convenience macro automatically forwards errors, and the
`JXL_FAILURE` macro exits with an error message if reached, with no effect in
optimized builds.
To diagnose the cause of encoder/decoder failures (which often only result in a
generic "decode failed" message), build using the following command:

```bash
CMAKE_FLAGS="-DJXL_CRASH_ON_ERROR" ./ci.sh opt
```

In such builds, the first JXL_FAILURE will print a message identifying where
the problem is, and the program will exit immediately afterwards.

### Architecture

Getting back to the earlier block diagram:

**Header** handling is implemented in `headers.h` and `field*`.

**Bitstream**: `entropy_coder.h`, `dec_ans_*`.

**(De)quantize**: `quantizer.h`.

**DC prediction**: `predictor.h`.

**Chroma from luma**: `chroma_from_luma.h`.

**(I)DCT**: `dct*.h`. Instead of operating directly on blocks of memory, the
functions operate on thin wrappers which can handle blocks spread across
multiple image lines.

**DCT size selection**: `ac_strategy.cc`.

**[Gaborish]**: `enc_gaborish.h`.

**[Edge-preserving filter]**: `epf.h`.

**[Noise injection]**: `noise*` (currently disabled).

**Color space conversion**: `color_*`, `dec_xyb.h`.

## Decoder overview

After decoding the headers, the decoder begins processing frames
(`dec_frame.cc`).

For each pass, it reads the DC group table of contents (TOC) and starts
decoding, dequantizing and restoring the color correlation of each DC group
(covering 2048x2048 pixels of the input image) in parallel
(`compressed_dc.cc`). The DC is split into parts corresponding to each AC
group (with 1px of extra border); the AC group TOC is read and each AC group
(256x256 pixels) is processed in parallel (`dec_group.cc`).

In each AC group, the decoder reads per-block side information indicating the
kind of DCT transform; this is followed by the quantization field.
Then, AC 174 coefficients are read, dequantized and have color correlation restored on a 175 tile per tile basis for better locality. 176 177 After all the groups are read, postprocessing is applied: Gaborish smoothing 178 and edge preserving filter, to reduce blocking and other artifacts. 179 180 Finally, the image is converted back from the XYB color space 181 (`dec_xyb.cc`) and saved to the output image (`codec_*.cc`).