Copyright (c) 2014-2017 Red Hat Inc.

This work is licensed under the terms of the GNU GPL, version 2 or later. See
the COPYING file in the top-level directory.


This document explains the IOThread feature and how to write code that runs
outside the QEMU global mutex.

The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop. The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.

The default event loop is called the main loop (see main-loop.c). It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.

Side note: The main loop and IOThread are both event loops but their code is
not shared completely. Sometimes it is useful to remember that although they
are conceptually similar they are currently not interchangeable.

Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work. The main loop is a
scalability bottleneck on hosts with many CPUs. Work can be spread across
several IOThreads instead of just one main loop. When set up correctly this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also deeply associated with the QEMU global mutex, which is a
scalability bottleneck in itself. vCPU threads and the main loop use the QEMU
global mutex to serialize execution of QEMU code. This mutex is necessary
because a lot of QEMU's code historically was not thread-safe.

The fact that all I/O processing is done in a single main loop and that the
QEMU global mutex is contended by all vCPU threads and the main loop explain
why it is desirable to place work into IOThreads.

The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is dealing explicitly with the event loop object, AioContext
(see include/block/aio.h). Code that only works in the main loop
implicitly uses the main loop's AioContext. Code that supports running
in IOThreads must be aware of its AioContext.

AioContext supports the following services:
 * File descriptor monitoring (read/write/error on POSIX hosts)
 * Event notifiers (inter-thread signalling)
 * Timers
 * Bottom Halves (BH) deferred callbacks

There are several old APIs that use the main loop AioContext:
 * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
 * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
 * LEGACY timer_new_ms() - create a timer
 * LEGACY qemu_bh_new() - create a BH
 * LEGACY qemu_aio_wait() - run an event loop iteration

Since they implicitly work on the main loop they cannot be used in code that
runs in an IOThread. They might cause a crash or deadlock if called from an
IOThread since the QEMU global mutex is not held.

Instead, use the AioContext functions directly (see include/block/aio.h):
 * aio_set_fd_handler() - monitor a file descriptor
 * aio_set_event_notifier() - monitor an event notifier
 * aio_timer_new() - create a timer
 * aio_bh_new() - create a BH
 * aio_poll() - run an event loop iteration

The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
Code that takes an AioContext argument works both in IOThreads and in the main
loop, depending on which AioContext instance the caller passes in.
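
The idea of passing the event loop object explicitly can be illustrated with a
small stand-alone model. This is only a sketch: ToyContext, toy_bh_schedule(),
toy_poll() and toy_device_kick() are hypothetical names invented for this
example and are not part of QEMU. The model is single-threaded and leaves out
all locking. It only shows that a helper taking a context argument runs its
deferred callback in whichever event loop the caller selected.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an event loop object; this is NOT QEMU's AioContext. */
typedef void ToyBHFunc(void *opaque);

#define TOY_MAX_BH 8

typedef struct ToyContext {
    const char *name;              /* label only, e.g. "main-loop" */
    ToyBHFunc *cb[TOY_MAX_BH];     /* queued deferred callbacks */
    void *opaque[TOY_MAX_BH];      /* argument for each callback */
    size_t pending;                /* number of queued callbacks */
} ToyContext;

/* Queue a deferred callback on the context chosen by the caller. */
int toy_bh_schedule(ToyContext *ctx, ToyBHFunc *cb, void *opaque)
{
    if (ctx->pending == TOY_MAX_BH) {
        return -1; /* queue full */
    }
    ctx->cb[ctx->pending] = cb;
    ctx->opaque[ctx->pending] = opaque;
    ctx->pending++;
    return 0;
}

/* Run one event loop iteration; returns the number of callbacks invoked. */
size_t toy_poll(ToyContext *ctx)
{
    ToyBHFunc *cb[TOY_MAX_BH];
    void *opaque[TOY_MAX_BH];
    size_t n = ctx->pending;
    size_t i;

    /* Snapshot the queue so callbacks may safely schedule new work. */
    for (i = 0; i < n; i++) {
        cb[i] = ctx->cb[i];
        opaque[i] = ctx->opaque[i];
    }
    ctx->pending = 0;

    for (i = 0; i < n; i++) {
        cb[i](opaque[i]);
    }
    return n;
}

/* Deferred callback: count how often it ran. */
void toy_count(void *opaque)
{
    (*(int *)opaque)++;
}

/* Context-aware helper: it never assumes the main loop. */
int toy_device_kick(ToyContext *ctx, int *counter)
{
    return toy_bh_schedule(ctx, toy_count, counter);
}
```

In this model, a caller that passes the context standing in for an IOThread
sees the callback run only when that context is polled. Polling the context
standing in for the main loop does nothing.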

How to synchronize with an IOThread
-----------------------------------
AioContext is not thread-safe so some rules must be followed when using file
descriptors, event notifiers, timers, or BHs across threads:

1. AioContext functions can always be called safely. They handle their
own locking internally.

2. Other threads wishing to access the AioContext must use
aio_context_acquire()/aio_context_release() for mutual exclusion. Once the
context is acquired no other thread can access it or run event loop iterations
in this AioContext.

Legacy code sometimes nests aio_context_acquire()/aio_context_release() calls.
Do not nest these calls anymore: nesting is incompatible with the
BDRV_POLL_WHILE() macro used in the block layer and can lead to hangs.

There is currently no lock ordering rule if a thread needs to acquire multiple
AioContexts simultaneously. Therefore, it is only safe for code holding the
QEMU global mutex to acquire other AioContexts.

Side note: the best way to schedule a function call across threads is to call
aio_bh_schedule_oneshot(). No acquire/release or locking is needed.

AioContext and the block layer
------------------------------
The AioContext originates from the QEMU block layer, even though nowadays
AioContext is a generic event loop that can be used by any QEMU subsystem.

The block layer has support for AioContext integrated. Each BlockDriverState
is associated with an AioContext using bdrv_try_change_aio_context() and
bdrv_get_aio_context(). This allows block layer code to process I/O inside the
right AioContext. Other subsystems may wish to follow a similar approach.

Block layer code must therefore expect to run in an IOThread and avoid using
old APIs that implicitly use the main loop. See the "How to program for
IOThreads" section above for information on how to do that.

If main loop code such as a QMP function wishes to access a BlockDriverState
it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
that callbacks in the IOThread do not run in parallel.

Code running in the monitor typically needs to ensure that past
requests from the guest are completed. When a block device is running
in an IOThread, the IOThread can also process requests from the guest
(via ioeventfd). To achieve both objectives, wrap the code between
bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
section". The functions must be called between aio_context_acquire()
and aio_context_release(). You can freely release and re-acquire the
AioContext within a drained section.

Long-running jobs (usually in the form of coroutines) are best scheduled in
the BlockDriverState's AioContext to avoid the need to acquire/release around
each bdrv_*() call. The functions bdrv_add/remove_aio_context_notifier,
or alternatively blk_add/remove_aio_context_notifier if you use BlockBackends,
can be used to get a notification whenever bdrv_try_change_aio_context() moves a
BlockDriverState to a different AioContext.
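
The two goals of a drained section described above (past requests are
completed, and no new guest requests are processed while the section is open)
can be pictured with a small stand-alone model. This is a sketch under stated
assumptions, not QEMU code: ToyDisk and the toy_*() functions are hypothetical
names, the model is single-threaded, and it does not show how QEMU implements
bdrv_drained_begin() and bdrv_drained_end() internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy disk; this is NOT a BlockDriverState. All names are hypothetical. */
typedef struct ToyDisk {
    int in_flight;  /* requests started but not yet completed */
    int completed;  /* requests that have finished */
    int held_back;  /* requests that arrived during the drained section */
    bool drained;   /* true between toy_drained_begin() and toy_drained_end() */
} ToyDisk;

/* A guest request arrives: it is started only outside a drained section. */
void toy_submit(ToyDisk *d)
{
    if (d->drained) {
        d->held_back++;
    } else {
        d->in_flight++;
    }
}

/* Enter the drained section: stop starting requests, finish the old ones. */
void toy_drained_begin(ToyDisk *d)
{
    assert(!d->drained);
    d->drained = true;
    d->completed += d->in_flight;
    d->in_flight = 0;
}

/* Leave the drained section: held-back requests may start now. */
void toy_drained_end(ToyDisk *d)
{
    assert(d->drained);
    d->drained = false;
    d->in_flight += d->held_back;
    d->held_back = 0;
}
```

Code placed between toy_drained_begin() and toy_drained_end() in this model
always observes in_flight == 0, which is the property monitor code relies on
inside a real drained section.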