libjxl

FORK: libjxl patches used on blog
git clone https://git.neptards.moe/blog/libjxl.git

# Fuzzing

Fuzzing is a technique to find potential bugs by providing randomly generated
invalid inputs. To detect potential bugs such as programming errors we use
fuzzing in combination with ASan (Address Sanitizer), MSan (Memory Sanitizer),
UBSan (Undefined Behavior Sanitizer) and asserts in the code. An invalid input
will likely produce a decoding error (some API function returning an error),
which is not a problem at all. What it must not do is access memory out of
bounds, use uninitialized memory or trigger a failed assert.

## Automated Fuzzing with oss-fuzz

libjxl fuzzing is integrated into [oss-fuzz](https://github.com/google/oss-fuzz)
as the project `libjxl`. oss-fuzz regularly runs the fuzzers on the `main`
branch and reports bugs to its bug tracker, where each report remains private
until the bug is fixed in `main`.

## Fuzzer targets

There are several fuzzer executable targets defined in the `tools/` directory
to fuzz different parts of the code. The main one is `djxl_fuzzer`, which uses
the public C decoder API to attempt to decode an image. The fuzzer input is not
directly the .jxl file: the last few bytes of the fuzzer input are used to
decide *how* the API will be used (whether a preview is requested, which pixel
format is requested, whether the .jxl input data is provided all at once, etc.)
and the rest of the fuzzer input is provided as the .jxl file to the decoder.
Some bugs might reproduce only if the .jxl input is decoded in a certain way.

The remaining fuzzer targets execute a specific portion of the codec that might
be easier to fuzz independently from the whole codec.

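To get a feel for this input layout, the following sketch splits a test case into the part handed to the decoder and the trailing bytes that select how the API is used. The function name `split_fuzzer_input` and the value `FLAG_BYTES=4` are assumptions made for illustration; the real number of trailing bytes is defined in the `djxl_fuzzer` source.

```bash
# Hypothetical helper (not part of libjxl): split a fuzzer input into the
# .jxl payload and the trailing option bytes. FLAG_BYTES is an assumed value,
# not taken from the djxl_fuzzer source.
FLAG_BYTES=${FLAG_BYTES:-4}

split_fuzzer_input() {
  local input=$1 out_prefix=$2
  local size
  size=$(wc -c < "$input") || return 1
  size=$((size))  # normalize any whitespace padding printed by wc
  if [ "$size" -lt "$FLAG_BYTES" ]; then
    echo "input shorter than ${FLAG_BYTES} bytes: $input" >&2
    return 1
  fi
  # Everything except the last FLAG_BYTES bytes is the .jxl data.
  head -c "$((size - FLAG_BYTES))" "$input" > "${out_prefix}.jxl"
  # The last FLAG_BYTES bytes select how the decoder API is exercised.
  tail -c "$FLAG_BYTES" "$input" > "${out_prefix}.flags"
}
```

For example, `split_fuzzer_input path/to/testcase.bin /tmp/case` would write `/tmp/case.jxl` and `/tmp/case.flags`.
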
## Reproducing fuzzer bugs

A fuzzer target such as `djxl_fuzzer` accepts one or more files as parameters
and uses them as inputs. This runs the fuzzer program in test-only mode, where
no new inputs are generated and only the provided files are tested. This is the
easiest way to reproduce a bug found by the fuzzer using the generated test case
from the bug report.

oss-fuzz uses a specific compiler version and flags, and it is built using
Docker. Different compiler versions will have different support for detecting
certain actions as errors, so we want to reproduce the build from oss-fuzz as
closely as possible. To reproduce the build as generated by oss-fuzz there are a
few helper commands in `ci.sh`, as explained below.

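When a bug report comes with several test cases, the test-only mode described above can be wrapped in a small loop. The sketch below is a hypothetical helper, not part of libjxl: it runs the fuzzer once per file, so a crash on one input does not stop the remaining ones, and it stores each run's output in a `.log` file next to the test case.

```bash
# Hypothetical helper (not part of libjxl): run a fuzzer binary once per
# test case, so a crash on one file does not stop the remaining ones.
reproduce_cases() {
  local fuzzer=$1
  shift
  local failed=0 testcase
  for testcase in "$@"; do
    # Keep the sanitizer report next to the test case for later inspection.
    if "$fuzzer" "$testcase" > "${testcase}.log" 2>&1; then
      echo "PASS ${testcase}"
    else
      echo "FAIL ${testcase} (see ${testcase}.log)"
      failed=1
    fi
  done
  return "$failed"
}
```

A possible invocation is `reproduce_cases build-fuzzmsan/tools/djxl_fuzzer path/to/testcase.bin`; the function returns a non-zero status if any test case failed.
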
### Generate the gcr.io/oss-fuzz/libjxl image

First you need the oss-fuzz libjxl builder image. This is the base oss-fuzz
builder image with a few dependencies installed. To generate it you need to
check out the oss-fuzz project and build it:

```bash
git clone https://github.com/google/oss-fuzz.git ~/oss-fuzz
cd ~/oss-fuzz
sudo infra/helper.py build_image libjxl
```

This will create the `gcr.io/oss-fuzz/libjxl` docker image. You can check that
it was created by verifying that it is listed in the output of the
`sudo docker image ls` command.

### Build the fuzzer targets with oss-fuzz

To build the fuzzer targets from the current libjxl source checkout, use the
`./ci.sh ossfuzz_msan` command for MSan, `./ci.sh ossfuzz_asan` command for ASan
or `./ci.sh ossfuzz_ubsan` command for UBSan. All the `JXL_ASSERT` and
`JXL_DASSERT` calls are enabled in all three modes. These ci.sh helpers
reproduce the oss-fuzz docker call to build libjxl, mounting the current source
directory into the Docker container. Ideally you will run this command in a
build directory separate from your regular builds.

For example, for MSan builds run:

```bash
BUILD_DIR=build-fuzzmsan ./ci.sh ossfuzz_msan
```

After this, the fuzzer program will be generated in the build directory like
for other build modes: `build-fuzzmsan/tools/djxl_fuzzer`.

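To build all three sanitizer variants in one go, a loop along these lines may help. This is only a sketch: the function `build_all_sanitizers`, the `CI_SH` override and the `build-fuzzasan` and `build-fuzzubsan` directory names are assumptions made for illustration, modeled on the `build-fuzzmsan` example above. Any extra arguments (such as a single target name) are passed through to `ci.sh`.

```bash
# Hypothetical wrapper (not part of libjxl): build every sanitizer variant
# into its own directory. CI_SH defaults to the ci.sh in the current checkout;
# the directory names other than build-fuzzmsan are assumptions.
build_all_sanitizers() {
  local ci_sh=${CI_SH:-./ci.sh} san
  for san in asan msan ubsan; do
    echo "Building ossfuzz_${san} into build-fuzz${san}"
    # Stop at the first failing build so the error stays visible.
    BUILD_DIR="build-fuzz${san}" "$ci_sh" "ossfuzz_${san}" "$@" || return 1
  done
}
```
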
### Iterating changes with oss-fuzz builds

After modifying the source code to fix the fuzzer-found bug, or to include more
debug information, you can rebuild only a specific fuzzer target to save on
rebuilding time and immediately run the test case again. For example, to
rebuild and test only `djxl_fuzzer` in MSan mode, run:

```bash
BUILD_DIR=build-fuzzmsan ./ci.sh ossfuzz_msan djxl_fuzzer && build-fuzzmsan/tools/djxl_fuzzer path/to/testcase.bin
```

When MSan and ASan fuzzers fail they print a stack trace at the point where the
error occurred, along with some related information. To make these stack traces
useful we need to convert the addresses to function names, source file names
and line numbers, which is done with the "symbolizer". For UBSan to print a
stack trace we need to set the `UBSAN_OPTIONS` environment variable when running
the fuzzer.

Set the following environment variables when testing the fuzzer binaries. Here
`clang` should match the compiler version used by the container. To use a
different compiler version, first install the clang package for that version
outside the container and then use `clang-NN` (for example `clang-11`) instead
of `clang` in the following commands:

```bash
symbolizer=$($(realpath $(which clang)) -print-prog-name=llvm-symbolizer)
export MSAN_SYMBOLIZER_PATH="${symbolizer}"
export UBSAN_SYMBOLIZER_PATH="${symbolizer}"
export ASAN_SYMBOLIZER_PATH="${symbolizer}"
export ASAN_OPTIONS=detect_leaks=1
export UBSAN_OPTIONS=print_stacktrace=1
```

Note: The symbolizer binary must be a program called `llvm-symbolizer`; any
other file name will fail. There are normally symlinks already installed with
the right name, and that is the path `-print-prog-name` prints.

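Since a wrongly named symbolizer only shows up as unreadable stack traces, it can be worth checking the path before running the fuzzer. The following is a minimal sketch; `check_symbolizer` is a hypothetical helper, not part of libjxl or the sanitizer tooling. It verifies only the two conditions mentioned above: the file is executable and is named exactly `llvm-symbolizer`.

```bash
# Hypothetical check (not part of libjxl): confirm that a symbolizer path is
# an executable file named exactly "llvm-symbolizer".
check_symbolizer() {
  local path=$1
  if [ "$(basename "$path")" != "llvm-symbolizer" ]; then
    echo "wrong name: $(basename "$path")" >&2
    return 1
  fi
  if [ ! -x "$path" ]; then
    echo "not executable: $path" >&2
    return 1
  fi
  echo "ok: $path"
}
```

For example, `check_symbolizer "${MSAN_SYMBOLIZER_PATH}"` can be run right after exporting the variables.
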
## Running the fuzzers locally

Running the fuzzer targets in fuzzing mode can be achieved by running them with
no parameters, or better, with the path to a *directory* containing seed files
to use as a starting point. Note that a directory argument is treated as a
corpus to use for fuzzing, while a file argument is treated as an input to
evaluate. Multi-process fuzzing is also supported. For details about all the
fuzzing options run:

```bash
build-fuzzmsan/tools/djxl_fuzzer -help=1
```

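As a rough sketch of a local fuzzing session with a seed corpus, the helper below creates the corpus directory if needed and runs the fuzzer for a bounded time. It assumes the targets are libFuzzer binaries (which the `-help=1` flag suggests); `-max_total_time`, `-jobs` and `-workers` are libFuzzer flags, so check the `-help=1` output of your build before relying on them. The function name `run_fuzzer` and its defaults are hypothetical.

```bash
# Hypothetical helper (not part of libjxl): fuzz for a bounded time using a
# seed corpus directory. Assumes a libFuzzer-based binary.
run_fuzzer() {
  local fuzzer=$1 corpus=$2 seconds=${3:-60} jobs=${4:-1}
  # A directory argument is used as the corpus, so make sure it exists.
  mkdir -p "$corpus" || return 1
  if [ "$jobs" -gt 1 ]; then
    # Multi-process mode: run several jobs in parallel worker processes.
    "$fuzzer" "$corpus" "-max_total_time=${seconds}" \
      "-jobs=${jobs}" "-workers=${jobs}"
  else
    "$fuzzer" "$corpus" "-max_total_time=${seconds}"
  fi
}
```

For example, `run_fuzzer build-fuzzmsan/tools/djxl_fuzzer corpus/ 300 4` would fuzz for about five minutes per job with four parallel jobs.
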
## Writing fuzzer-friendly code

Fuzzing by itself can't find programming bugs unless an input makes the program
perform an invalid operation (read/write out of bounds, perform an undefined
behavior operation, etc). You can help the fuzzer find invalid situations by
adding asserts:

 * `JXL_ASSERT()` is enabled in Release mode by default. It can be disabled
   with `-DJXL_ENABLE_ASSERT=0` but the intention is that it will run for all
   the users in released code. If performance of the check is not an issue (like
   checks done once per image, once per channel, once per group, etc) a
   `JXL_ASSERT` is appropriate. A failed assert is preferable to an out of
   bounds write.

 * `JXL_DASSERT()` is only enabled in Debug builds, which includes all the ASan,
   MSan and UBSan builds. Performance of these checks is not an issue if kept
   within reasonable limits (automated MSan/ASan tests should finish within 1
   hour, for example). Fuzzing is more effective when the given input runs
   faster, so keep that in mind when adding a complex DASSERT that runs multiple
   times per output pixel.

 * For MSan builds it is also possible to specify that certain values must be
   initialized. This is automatic for values that are used to make decisions
   (like when used in an `if` statement or in the ternary operator condition)
   but those checks can be made explicit for image data using the
   `JXL_CHECK_IMAGE_INITIALIZED(image, rect)` macro. This helps document and
   check (only in MSan builds) that a given portion of the image is expected to
   be initialized, allowing errors to be caught earlier in the process.

## Dealing with use-of-uninitialized memory

In MSan builds it is considered an error to *use* uninitialized memory. Using
the memory normally means making a decision or branching based on the
uninitialized value; just running `memcpy()` or simple arithmetic over
uninitialized memory is not a problem. Notably, computing `DemoteTo()`,
`NearestInt()` or similar expressions that create a branch based on the value of
the uninitialized memory will trigger an MSan error.

In libjxl we often run vectorized operations over a series of values, rounding
up to the next multiple of a vector size, thus operating over uninitialized
values past the end of the requested region. These values are part of the image
padding but are not initialized. This behavior does not create an MSan error
unless the processing includes operations like `NearestInt()`. For such cases
the preferred solution is to use `msan::UnpoisonMemory` over the portion of
memory of the last SIMD vector before processing, and then run
`msan::PoisonMemory` over the corresponding values on the output side. A note
explaining why this is safe must be added, for example that the processing
doesn't involve any cross-lane computation.

Initializing padding memory in MSan builds is discouraged because it may hide
bugs in functions that weren't supposed to read from the padding. Initializing
padding memory in all builds, including Release builds, would mitigate the
potential security issue reported by MSan, but it would hide the logic bug for
a longer time and potentially incur a performance hit.