Its rare that computing creates something as elegant as the Futex: A simple and highly elegant system on top of which all the important synchronization primitives can be built, which has minimal overhead, and which is screamingly fast.
Its even better when they work everywhere at the same efficiency – that is, whether placed in the process’ local memory, or a shared memory segment. And its such a shame that nobody has implemented them outside of Linux.
So, what is a Futex?
A futex is actually surprisingly simple. Creating one doesn’t require any system calls, and is in fact most trivial.
A futex is just an aligned, normally 32-bit, integer
Its truly quite simple. Now, this isn’t the whole story: you can’t implement an efficient mutex on just an integer. Its just that the other involved structures (such as wait queues) only get involved when someone needs to wait for the futex. Until then, it is just a piece of memory that someone allocated
(In case you’re wondering how they work in shared memory: Futexes are keyed upon their physical address)
If its so simple, how do I use one?
A futex is just a counter. In fact, it can be said that a futex supports just two operations: “down” and “up”; which decrement and increment it respectively. To begin this illustration, we are going to begin with the semaphore, which is nothing more than a generalisation of a mutex.
To begin, allocate your futex somewhere, and set its value to the maximum value of the semaphore. In other words, if you want to be able to acquire the Semaphore 3 times, set it to 3. Obviously, for a mutex you would set it to 1.
Now, to acquire the semaphore: Perform an atomic decrement and fetch. Look at the value you just set the semaphore to; if this is positive or zero, you succeeded, if it is negative then ask the kernel to block you, and we will get to that in a moment.
How about releasing the semaphore? This is also simple; do the opposite: an atomic fetch and increment. Again look at the value returned; if it was positive or zero, you’re done; and otherwise you need set the counter to one, then ask the kernel to wake people up.
Getting the Kernel involved
OK, so far we have been working entirely in user space; but we can’t have an efficient mutex without the kernel. Lets get on to that.
Firstly, blocking; blocking is done by the FUTEX_WAIT method. You pass the address of the futex and the value the decrement operation returned to the kernel, and it atomically checks that the futex still has the value you passed and blocks you
This check is required in order to avoid a race condition: Another process ups the futex before you get into the kernel. In the case the value has changed, you must spin around and retry to acquire again.
Once you have been unblocked, you need to attempt to lock the futex again.
Now, unblocking: this is done by the FUTEX_WAKE method. You pass the futex address, and the number of threads you wish to wake; 1 is obviously a good value most of the time (and should also be most fair).
There, that was simple enough, wasn’t it?
Expanding: Condition Variables
OK, that was pretty simple; but what about condition variables? They can be implemented in a rather simple manner too:
- Start the Futex at 0
- Down it each time a thread waits on it
- Up it each time you want to wake a single thread
- Exchange the current value for 0, then wake up -the_previous_value processes to wake up everyone
See? Those are pretty simple too!
In closing: The issues, and how to make them better
As you can see, Futexes are both delightfully simple, and beautifully efficient. However, there is an area I can see for improvement. It also corrects what I call a hole in POSIX: Or, that mutexes cannot be converted to file descriptors (To wait on using mechanisms like select).
(In fact, it is interesting to note that there was in the past an attempt to add this feature, FUTEX_FD, but it was racy).
I would add a single method, FUTEX_WAIT_FD. The behaviour of this is as follows:
FUTEX_WAIT_FD will atomically compare the value of the futex with the previous value (as passed in a parameter), and build a file descriptor. The returned file descriptor is read only; attempts to write to it will return an error. When another thread or process calls FUTEX_WAKE on the futex, this will cause the file descriptor to become readable; reading it will return the number of waiters that FUTEX_WAKE was asked to wake up. Once this file descriptor has become readable, the thread must re-attempt to lock the futex in the same way as is done when woken from FUTEX_WAIT. After the file descriptor has become readable, it will not be activated again and must be closed.
I can also see other possibilities for improving the futex system (and other locking systems in general); obviously, they should be adapted to the operating system that is using them.
Further Reading
- man futex(2)
- man futex(7)
- Fuss, futexes and furwocks: Fast Userlevel Locking in Linux (PDF, by Hubertus Franke, Rusty Russell, Matthew Kirkwood)
- Futexes are tricky (PDF, by Ulrich Drepper, going into how to use them, and how Glibc uses them, in depth)
