Mutex vs. binary semaphore
Mutices and semaphores are among the most basic tools in multithreaded programming. However most people I asked do not know what is the difference between them. So please let me introduce you to them.
Consider a resource which is shared between multiple threads. For example a container. You don’t want to have multiple threads modifying the container simultaneously, or one thread modifying the container while other threads are reading from it, otherwise you will end up with classical race conditions and unpredictable things will happen.
To guard a resource from other threads while you are accessing it, you use a synchronization primitive, which you conceptually associate with the guarded resource. Threads can lock the synchronization primitive when they need to access the resource. Then they can release the primitive after they are finished with accessing the resource. When the synchronization primitive is already locked, a thread trying to lock it will wait/stall until the other thread who locked it – unlocks it.
At the first glance, both mutex and binary semaphore fit the description of the above synchronization primitive. Well, not quite. Using binary semaphore in place of a mutex is a bad idea.
Conceptually a semaphore is like an integer. You can increment it and you can decrement it. If the semaphore’s value is 0, the thread trying to decrement it will wait/stall until somebody else increments it. This way, the semaphore never has a negative value.
A binary semaphore is just a semaphore capped at one, i.e. it’s value cannot exceed one. You can treat the decrement operation as “lock” and the increment operation as “unlock”.
The problem with the semaphore is that any thread can increment it or decrement it. In particular, if the semaphore’s value is 0 (“locked”), another thread can increment it (“unlock”), even if this is not the thread which locked it! It takes more discipline to write code which correctly uses binary semaphores for locking and there is still a potential for error.
Another problem is that most semaphore implementations allow sharing semaphores between processes. This makes them much heavier than e.g. POSIX mutices or critical sections on Windows, which are lightweight, because they only work within one process and don’t require calling into kernel space in most cases.
Unlike binary semaphores,mutices may also have another interesting property, depending on an implementation, i.e. they can be recursive. A recursive mutex can be locked twice from the same thread. This allows you to write an accessor function and not have to care whether the mutex is already locked by the current thread or not. However it is generally not recommended to use recursive mutices, the necessity for recursive mutices is a sign of a bad design and indicates that there may be potential problems with the interfaces or even hidden multithreading bugs.
In general mutices (or critical sections on Windows) are typically recommended over binary semaphores as synchronization primitives between multiple threads in the same process.
There is also one, much more nasty problem with the semaphores (both binary and counting). It is the priority inversion – a semaphore locked by a low priority process can block the whole OS if the higher priority processes need to wait for unlocking/releasing the semaphore. This condition happens though only for OS that uses fixed priority preemptive scheduling – most RTOS for embedded devices are prone.
Threading it’s an old school idea. Dogma which follows us every day in programming. Better is to avoid using mutex or anything like that. The best do not use threads at all – http://vimeo.com/71278954
The problem with the semaphore is that any thread can increment it or decrement it. In particular, if the semaphore’s value is 0 (“locked”), another thread can increment it (“unlock”), even if this is not the thread which locked it!
– Is it not that the same property of (binary) semaphores makes them more suitable for synchronization rather than protection? So for example, I want to share a task between two threads, for example task ‘a’ writes into a queue from which task ‘b’ must read. Here task ‘b’ waits on the binary semaphore before it can read from the queue. Whenever task ‘a’ writes into the queue it unlocks the semaphore thus ‘indicating’ to task ‘b’ that there is some data available on the queue for it to read.
Semaphores are more widely used for task-task sync or task-ISR sync, while it is better to use mutex for mutually exclusive access to resources. And task synchronization is not same thing as resource access.
http://www.writeulearn.com/binary-semaphore-mutex-semaphore/
Take a look at my efforts on explaining both of these concepts in a easy to understand way at https://theleancoder.wordpress.com/2015/10/14/diving-deep-with-semaphore-and-mutex/