Block I/O error injection using blkdebug
----------------------------------------
Copyright (C) 2014-2015 Red Hat Inc

This work is licensed under the terms of the GNU GPL, version 2 or later. See
the COPYING file in the top-level directory.

The blkdebug block driver is a rule-based error injection engine. It can be
used to exercise error code paths in block drivers including ENOSPC (out of
space) and EIO.

This document gives an overview of the features available in blkdebug.

Background
----------
Block drivers have many error code paths that handle I/O errors. Image formats
are especially complex since metadata I/O errors during cluster allocation or
while updating tables happen halfway through request processing and require
discipline to keep image files consistent.

Error injection allows test cases to trigger I/O errors at specific points.
This way, all error paths can be tested to make sure they are correct.

Rules
-----
The blkdebug block driver takes a list of "rules" that tell the error injection
engine when to fail an I/O request.

Each I/O request is evaluated against the rules. If a rule matches the request
then its "action" is executed.

Rules can be placed in a configuration file; the configuration file
follows the same .ini-like format used by QEMU's -readconfig option, and
each section of the file represents a rule.

The following configuration file defines a single rule:

  $ cat blkdebug.conf
  [inject-error]
  event = "read_aio"
  errno = "28"

This rule fails all aio read requests with ENOSPC (28). Note that the errno
value depends on the host. On Linux, see
/usr/include/asm-generic/errno-base.h for errno values.
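
Because errno numbers are host-specific, a test script can look them up by
name instead of hard-coding 28. The following is a minimal sketch using
Python's standard errno and os modules; the helper name describe_errno is
hypothetical and not part of QEMU, and the script is assumed to run on the
same host as QEMU.

```python
import errno
import os


def describe_errno(name):
    """Return (number, message) for a symbolic errno name on this host."""
    number = getattr(errno, name)  # raises AttributeError for unknown names
    return number, os.strerror(number)


if __name__ == "__main__":
    for name in ("ENOSPC", "EIO"):
        number, message = describe_errno(name)
        # The number is what goes into the rule's errno attribute.
        print(f'{name}: errno = "{number}"  # {message}')
```

The printed number can then be copied into the errno attribute of a rule.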

Invoke QEMU as follows:

  $ qemu-system-x86_64 \
        -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
        -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0

Rules support the following attributes:

  event - which type of operation to match (e.g. read_aio, write_aio,
          flush_to_os, flush_to_disk). See the "Events" section for
          information on events.

  state - (optional) the engine must be in this state number in order for this
          rule to match. See the "State transitions" section for information
          on states.

  errno - the numeric errno value to return when a request matches this rule.
          The errno values depend on the host since the numeric values are not
          standardized in the POSIX specification.

  sector - (optional) a sector number that the request must overlap in order to
           match this rule

  once - (optional, default "off") only execute this action on the first
         matching request

  immediately - (optional, default "off") return a NULL BlockAIOCB
                pointer and fail without an errno instead. This
                exercises the code path where BlockAIOCB fails and the
                caller's BlockCompletionFunc is not invoked.

Events
------
Block drivers provide information about the type of I/O request they are about
to make so rules can match specific types of requests. For example, the qcow2
block driver tells blkdebug when it accesses the L1 table so rules can match
only L1 table accesses and not other metadata or guest data requests.

The core events are:

  read_aio - guest data read

  write_aio - guest data write

  flush_to_os - write out unwritten block driver state (e.g. cached metadata)

  flush_to_disk - flush the host block device's disk cache

See qapi/block-core.json:BlkdebugEvent for the full list of events.
You may need to grep block driver source code to understand the
meaning of specific events.
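
As a recap of the rule attributes and event names described so far, the sketch
below renders one rule as a configuration file section. It is an illustration
only: render_rule and INJECT_ERROR_ATTRS are hypothetical names, not part of
QEMU, and quoting every value simply follows the style of the examples in this
document.

```python
# Attributes that this document lists for error injection rules.
INJECT_ERROR_ATTRS = ("event", "state", "errno", "sector", "once", "immediately")


def render_rule(section, **attrs):
    """Render one rule as a section of a blkdebug configuration file."""
    if "event" not in attrs:
        raise ValueError("a rule needs an event")
    unknown = set(attrs) - set(INJECT_ERROR_ATTRS)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    lines = [f"[{section}]"]
    for name in INJECT_ERROR_ATTRS:  # emit attributes in a stable order
        if name in attrs:
            lines.append(f'{name} = "{attrs[name]}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Fail only the first flush_to_disk request with errno 5 (EIO on Linux).
    print(render_rule("inject-error", event="flush_to_disk", errno=5, once="on"),
          end="")
```

Rejecting unknown attribute names in the helper catches typos such as "sectors"
before the file is handed to QEMU.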

State transitions
-----------------
There are cases where more power is needed to match a particular I/O request in
a longer sequence of requests. For example:

  write_aio
  flush_to_disk
  write_aio

How do we match the 2nd write_aio but not the first? This is where state
transitions come in.

The error injection engine has an integer called the "state" that always starts
initialized to 1. The state integer is internal to blkdebug and cannot be
observed from outside but rules can interact with it for powerful matching
behavior.

Rules can be conditional on the current state and they can transition to a new
state.

When a rule's "state" attribute is non-zero then the current state must equal
the attribute in order for the rule to match.

For example, to match the 2nd write_aio:

  [set-state]
  event = "write_aio"
  state = "1"
  new_state = "2"

  [inject-error]
  event = "write_aio"
  state = "2"
  errno = "5"

The first write_aio request matches the set-state rule and transitions from
state 1 to state 2. Once state 2 has been entered, the set-state rule no
longer matches since it requires state 1. But the inject-error rule now
matches the next write_aio request and injects EIO (5).

State transition rules support the following attributes:

  event - which type of operation to match (e.g. read_aio, write_aio,
          flush_to_os, flush_to_disk). See the "Events" section for
          information on events.

  state - (optional) the engine must be in this state number in order for this
          rule to match

  new_state - transition to this state number

Suspend and resume
------------------
Exercising code paths in block drivers may require specific ordering amongst
concurrent requests. The "breakpoint" feature allows requests to be halted on
a blkdebug event and resumed later.
This makes it possible to achieve
deterministic ordering when multiple requests are in flight.

Breakpoints on blkdebug events are associated with a user-defined "tag" string.
This tag serves as an identifier by which the request can be resumed at a later
point.

See the qemu-io(1) break, resume, remove_break, and wait_break commands for
details.
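
Returning to the "State transitions" section, the matching behavior of the
set-state and inject-error example can be traced with a toy model. This is a
simplified sketch, not QEMU's implementation: run_events and the dict-based
rule format are hypothetical, and the model assumes that a state transition
takes effect only after all rules have been checked for the current event.

```python
def run_events(rules, events):
    """Replay events through a toy model of the rule engine.

    Each rule is a dict with "event", an optional "state", and either
    "new_state" (set-state rule) or "errno" (inject-error rule).
    Returns one entry per event: the injected errno, or None.
    """
    state = 1  # the engine always starts in state 1
    results = []
    for event in events:
        injected = None
        next_state = state
        for rule in rules:
            if rule["event"] != event:
                continue
            # A missing or zero "state" means the rule matches in any state.
            if rule.get("state", 0) not in (0, state):
                continue
            if "new_state" in rule:
                next_state = rule["new_state"]
            elif injected is None:
                injected = rule["errno"]
        state = next_state  # modelling assumption: applied after all rules
        results.append(injected)
    return results


if __name__ == "__main__":
    rules = [
        {"event": "write_aio", "state": 1, "new_state": 2},  # [set-state]
        {"event": "write_aio", "state": 2, "errno": 5},      # [inject-error]
    ]
    print(run_events(rules, ["write_aio", "flush_to_disk", "write_aio"]))
    # prints [None, None, 5]
```

Only the second write_aio receives errno 5, which matches the walk-through of
the example in the "State transitions" section.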