mirror of https://gitlab.com/qemu-project/qemu
Locked Counters (aka ``QemuLockCnt``)
=====================================

QEMU often uses reference counts to track data structures that are being
accessed and should not be freed.  For example, a loop that invokes
callbacks like this is not safe::

    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }

``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``)
by stashing away its ``next`` pointer.  However, ``ioh->fd_write`` could
actually delete the next node from the list.  The simplest way to
avoid this is to mark the node as deleted, and to remove it from the
list later, inside the walking loop itself::

    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            QLIST_REMOVE(ioh, next);
            g_free(ioh);
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }

If however this loop must also be reentrant, i.e. it is possible that
``ioh->fd_write`` invokes the loop again, some kind of counting is needed::

    walking_handlers++;
    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            if (walking_handlers == 1) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }
    walking_handlers--;

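The reentrant pattern above can be tried outside QEMU with a small
standalone program.  The following is a minimal sketch, not QEMU code: the
``Handler`` type, the hand-rolled singly linked list and all helper names
are hypothetical, and the code is single-threaded only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical handler node; this is not QEMU's handler record. */
typedef struct Handler {
    struct Handler *next;
    bool deleted;
    void (*cb)(struct Handler *self);
} Handler;

static Handler *handlers;      /* head of a singly linked list */
static int walking_handlers;   /* nesting depth of run_handlers() */
static int freed_nodes;        /* bookkeeping for the example only */
static int calls;              /* bookkeeping for the example only */

/* Insert a new handler at the head of the list. */
static Handler *handler_add(void (*cb)(Handler *self))
{
    Handler *h = calloc(1, sizeof(*h));
    assert(h != NULL);
    h->cb = cb;
    h->next = handlers;
    handlers = h;
    return h;
}

/* Safe to call from a callback: only marks the node. */
static void handler_del(Handler *h)
{
    h->deleted = true;
}

static void run_handlers(void)
{
    Handler **link = &handlers;

    walking_handlers++;
    while (*link != NULL) {
        Handler *h = *link;
        if (h->deleted) {
            if (walking_handlers == 1) {
                /* Outermost walk: no nested walk can still point at h. */
                *link = h->next;
                free(h);
                freed_nodes++;
                continue;
            }
        } else if (h->cb != NULL) {
            h->cb(h);
        }
        link = &h->next;
    }
    walking_handlers--;
}

/* Example callback: just counts invocations. */
static void cb_count(Handler *self)
{
    (void)self;
    calls++;
}

/* Example callback: deletes its successor, then re-enters the walk once. */
static void cb_delete_next_and_reenter(Handler *self)
{
    calls++;
    if (self->next != NULL) {
        handler_del(self->next);
    }
    if (walking_handlers == 1) {
        run_handlers();
    }
}
```

If three handlers are registered and the first one uses
``cb_delete_next_and_reenter``, the node it marks is skipped by both the
nested and the outer walk, and it is freed exactly once, by the outer walk.
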
One may think of using the RCU primitives, ``rcu_read_lock()`` and
``rcu_read_unlock()``; effectively, the RCU nesting count would take
the place of the ``walking_handlers`` global variable.  Indeed,
reference counting and RCU have similar purposes, but their usage in
general is complementary:

- reference counting is fine-grained and limited to a single data
  structure; RCU delays reclamation of *all* RCU-protected data
  structures;

- reference counting works even in the presence of code that keeps
  a reference for a long time; RCU critical sections in principle
  should be kept short;

- reference counting is often applied to code that is not thread-safe
  but is reentrant; in fact, usage of reference counting in QEMU predates
  the introduction of threads by many years.  RCU is generally used to
  protect readers from other threads freeing memory after concurrent
  modifications to a data structure.

- reclaiming data can be done by a separate thread in the case of RCU;
  this can improve performance, but also delay reclamation undesirably.
  With reference counting, reclamation is deterministic.

This file documents ``QemuLockCnt``, an abstraction for using reference
counting in code that has to be both thread-safe and reentrant.

``QemuLockCnt`` concepts
------------------------

A ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives
to increment and decrement the counter, and to take and release the
mutex.  The counter notes how many visits to the data structures are
taking place (the visits could be from different threads, or there could
be multiple reentrant visits from the same thread).  The basic rules
governing the counter/mutex pair then are the following:

- Data protected by the ``QemuLockCnt`` must not be freed unless the
  counter is zero and the mutex is taken.

- A new visit cannot be started while the counter is zero and the
  mutex is taken.

Most of the time, the mutex protects all writes to the data structure,
not just frees, though there could be cases where this is not necessary.

Reads, instead, can be done without taking the mutex, as long as the
readers and writers use the same macros that are used for RCU, for
example ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``,
etc.  This is because the reads are done outside a lock and a set
or ``QLIST_INSERT_HEAD`` can happen concurrently with the read.  The RCU
API ensures that the processor and the compiler see all required memory
barriers.

This could be implemented simply by protecting the counter with the
mutex, for example::

    // (1)
    qemu_mutex_lock(&walking_handlers_mutex);
    walking_handlers++;
    qemu_mutex_unlock(&walking_handlers_mutex);

    ...

    // (2)
    qemu_mutex_lock(&walking_handlers_mutex);
    if (--walking_handlers == 0) {
        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
            if (ioh->deleted) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        }
    }
    qemu_mutex_unlock(&walking_handlers_mutex);

Here, no frees can happen in the code represented by the ellipsis.
While another thread is executing critical section (2), that part of
the code cannot be entered, because critical section (1) blocks on the
mutex and the thread is not able to increment the ``walking_handlers``
variable.  And of course during the visit any other thread will see a
nonzero value for ``walking_handlers``, as in the single-threaded code.

Note that it is possible for multiple concurrent accesses to delay
the cleanup arbitrarily; in other words, for the ``walking_handlers``
counter to never become zero.  For this reason, this technique is
more easily applicable if concurrent access to the structure is rare.

However, critical sections are easy to forget since you have to do
them for each modification of the counter.  ``QemuLockCnt`` ensures that
all modifications of the counter take the lock appropriately, and it
can also be more efficient in two ways:

- it avoids taking the lock for many operations (for example
  incrementing the counter while it is non-zero);

- on some platforms, one can implement ``QemuLockCnt`` to hold the counter
  and the mutex in a single word, making the fast path no more expensive
  than simply managing a counter using atomic operations (see
  :doc:`atomics`).  This can be very helpful if concurrent access to
  the data structure is expected to be rare.

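The first point can be illustrated with C11 atomics and a POSIX mutex.
This is only a sketch under stated assumptions, not QEMU's implementation:
the ``SketchLockCnt`` type and the ``sketch_lockcnt_*`` functions are
hypothetical names, and the counter and mutex are kept in separate fields
rather than packed into a single word.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for QemuLockCnt; names are illustrative only. */
typedef struct SketchLockCnt {
    pthread_mutex_t mutex;
    atomic_uint count;
} SketchLockCnt;

static void sketch_lockcnt_init(SketchLockCnt *lc)
{
    int rc = pthread_mutex_init(&lc->mutex, NULL);
    assert(rc == 0);
    (void)rc;
    atomic_init(&lc->count, 0);
}

static void sketch_lockcnt_inc(SketchLockCnt *lc)
{
    unsigned old = atomic_load(&lc->count);

    /*
     * Fast path: a visit is already in progress, so no free can be
     * running and the counter can be bumped without the mutex.
     */
    while (old != 0) {
        if (atomic_compare_exchange_weak(&lc->count, &old, old + 1)) {
            return;
        }
        /* A failed compare-exchange reloaded 'old'; try again. */
    }

    /* Slow path: a 0 -> 1 transition must be serialized with frees. */
    pthread_mutex_lock(&lc->mutex);
    atomic_fetch_add(&lc->count, 1);
    pthread_mutex_unlock(&lc->mutex);
}

/* Returns true, with the mutex held, if this call ended the last visit. */
static bool sketch_lockcnt_dec_and_lock(SketchLockCnt *lc)
{
    unsigned old = atomic_load(&lc->count);

    /* Fast path: other visits remain, so there is nothing to clean up. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(&lc->count, &old, old - 1)) {
            return false;
        }
    }

    /* Slow path: a 1 -> 0 transition only happens under the mutex. */
    pthread_mutex_lock(&lc->mutex);
    if (atomic_fetch_sub(&lc->count, 1) == 1) {
        return true;
    }
    pthread_mutex_unlock(&lc->mutex);
    return false;
}

static void sketch_lockcnt_unlock(SketchLockCnt *lc)
{
    pthread_mutex_unlock(&lc->mutex);
}
```

In this sketch only the transitions between zero and one touch the mutex;
every other increment or decrement is a single compare-and-exchange.
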
Using the same mutex for frees and writes can still incur some small
inefficiencies; for example, a visit can never start if the counter is
zero and the mutex is taken -- even if the mutex is taken by a write,
which in principle need not block a visit of the data structure.
However, these are usually not a problem if any of the following
assumptions are valid:

- concurrent access is possible but rare

- writes are rare

- writes are frequent, but this kind of write (e.g. appending to a
  list) has a very small critical section.

For example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of
bottom halves and file descriptor handlers.  Modifications to the list
of file descriptor handlers are rare.  Creation of a new bottom half is
frequent and can happen on a fast path; however: 1) it is almost never
concurrent with a visit to the list of bottom halves; 2) it only has
three instructions in the critical path, two assignments and a ``smp_wmb()``.

``QemuLockCnt`` API
-------------------

.. kernel-doc:: include/qemu/lockcnt.h

``QemuLockCnt`` usage
---------------------

This section explains the typical usage patterns for ``QemuLockCnt`` functions.

Setting a variable to a non-NULL value can be done between
``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``::

    qemu_lockcnt_lock(&xyz_lockcnt);
    if (!xyz) {
        new_xyz = g_new(XYZ, 1);
        ...
        qatomic_rcu_set(&xyz, new_xyz);
    }
    qemu_lockcnt_unlock(&xyz_lockcnt);

Accessing the value can be done between ``qemu_lockcnt_inc`` and
``qemu_lockcnt_dec``::

    qemu_lockcnt_inc(&xyz_lockcnt);
    if (xyz) {
        XYZ *p = qatomic_rcu_read(&xyz);
        ...
        /* Accesses can now be done through "p".  */
    }
    qemu_lockcnt_dec(&xyz_lockcnt);

Freeing the object can similarly use ``qemu_lockcnt_lock`` and
``qemu_lockcnt_unlock``, but you also need to ensure that the count
is zero (i.e. there is no concurrent visit).  Because ``qemu_lockcnt_inc``
takes the ``QemuLockCnt``'s lock whenever the count is zero, the count
cannot become non-zero while the object is being freed.  Freeing an
object looks like this::

    qemu_lockcnt_lock(&xyz_lockcnt);
    if (!qemu_lockcnt_count(&xyz_lockcnt)) {
        g_free(xyz);
        xyz = NULL;
    }
    qemu_lockcnt_unlock(&xyz_lockcnt);

If an object has to be freed right after a visit, you can combine
the decrement, the locking and the check on count as follows::

    qemu_lockcnt_inc(&xyz_lockcnt);
    if (xyz) {
        XYZ *p = qatomic_rcu_read(&xyz);
        ...
        /* Accesses can now be done through "p".  */
    }
    if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
        g_free(xyz);
        xyz = NULL;
        qemu_lockcnt_unlock(&xyz_lockcnt);
    }

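The set, visit and free patterns above can be exercised together in a
standalone program.  The sketch below does not use QEMU's headers: it
assumes a deliberately simplified, mutex-only counter (``SimpleLockCnt``,
a hypothetical type) and C11 atomics in place of ``qatomic_rcu_set`` and
``qatomic_rcu_read``.  The helper names ``xyz_ensure`` and
``xyz_visit_and_release`` are likewise made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified, mutex-only stand-in for QemuLockCnt (hypothetical). */
typedef struct SimpleLockCnt {
    pthread_mutex_t mutex;
    unsigned count;             /* protected by mutex */
} SimpleLockCnt;

typedef struct XYZ {
    int value;
} XYZ;

static SimpleLockCnt xyz_lockcnt = { PTHREAD_MUTEX_INITIALIZER, 0 };
static _Atomic(XYZ *) xyz;

static void simple_lockcnt_inc(SimpleLockCnt *lc)
{
    pthread_mutex_lock(&lc->mutex);
    lc->count++;
    pthread_mutex_unlock(&lc->mutex);
}

/* Returns true, with the mutex held, if the count dropped to zero. */
static bool simple_lockcnt_dec_and_lock(SimpleLockCnt *lc)
{
    pthread_mutex_lock(&lc->mutex);
    assert(lc->count > 0);
    if (--lc->count == 0) {
        return true;
    }
    pthread_mutex_unlock(&lc->mutex);
    return false;
}

/* Publish a new object under the lock, unless one already exists. */
static void xyz_ensure(int value)
{
    pthread_mutex_lock(&xyz_lockcnt.mutex);
    if (atomic_load(&xyz) == NULL) {
        XYZ *new_xyz = malloc(sizeof(*new_xyz));
        assert(new_xyz != NULL);
        new_xyz->value = value;
        /* Release ordering publishes the initialized fields. */
        atomic_store_explicit(&xyz, new_xyz, memory_order_release);
    }
    pthread_mutex_unlock(&xyz_lockcnt.mutex);
}

/* Visit the object, then free it if this was the last visit. */
static int xyz_visit_and_release(void)
{
    int result = -1;            /* returned when there is no object */

    simple_lockcnt_inc(&xyz_lockcnt);
    XYZ *p = atomic_load_explicit(&xyz, memory_order_acquire);
    if (p != NULL) {
        result = p->value;      /* accesses go through "p" */
    }
    if (simple_lockcnt_dec_and_lock(&xyz_lockcnt)) {
        free(atomic_load(&xyz));
        atomic_store(&xyz, NULL);
        pthread_mutex_unlock(&xyz_lockcnt.mutex);
    }
    return result;
}
```

A second call to ``xyz_ensure`` while the object exists leaves it
untouched, and the first ``xyz_visit_and_release`` that finds no other
visit in progress frees the object and resets the pointer.
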
``QemuLockCnt`` can also be used to access a list as follows::

    qemu_lockcnt_inc(&io_handlers_lockcnt);
    QLIST_FOREACH_RCU(ioh, &io_handlers, next) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }

    if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
            if (ioh->deleted) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        }
        qemu_lockcnt_unlock(&io_handlers_lockcnt);
    }

Again, the RCU primitives are used because new items can be added to the
list during the walk.  ``QLIST_FOREACH_RCU`` ensures that the processor and
the compiler see the appropriate memory barriers.

An alternative pattern uses ``qemu_lockcnt_dec_if_lock``::

    qemu_lockcnt_inc(&io_handlers_lockcnt);
    QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
                qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
            }
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }
    qemu_lockcnt_dec(&io_handlers_lockcnt);

Here you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``,
because there is no special task to do if the count goes from 1 to 0.