Sunday, December 2, 2012

Thread safe programming

There are several ways in which a function can be thread safe.
It can be reentrant. This means that a function has no state, and does not touch any global or static variables, so it can be called from multiple threads simultaneously. The term comes from allowing one thread to enter the function while another thread is already inside it.
It can have a critical section. This term gets thrown around a lot, but frankly I prefer critical data. A critical section occurs any time your code touches data that is shared across multiple threads. So I prefer to put the focus on that critical data.
If you use a mutex properly, you can synchronize access to the critical data, properly protecting from thread unsafe modifications. Mutexes and Locks are very useful, but with great power comes great responsibility. You must not lock the same mutex twice within the same thread (that is a self-deadlock). You must be careful if you acquire more than one mutex, as it increases your risk for deadlock. You must consistently protect your data with mutexes.
If all of your functions are thread safe, and all of your shared data properly protected, your application should be thread safe.
As Crazy Eddie said, this is a huge subject. I recommend reading up on boost threads, and using them accordingly.
low-level caveat: compilers can reorder statements, which can break thread safety. With multiple cores, each core has its own cache, and you need to properly sync the caches to have thread safety. Also, even if the compiler doesn't reorder statements, the hardware might. So, full, guaranteed thread safety isn't actually possible today. You can get 99.99% of the way there though, and work is being done with compiler vendors and cpu makers to fix this lingering caveat.
Anyway, if you're looking for a checklist to make a class thread-safe:
  • Identify any data that is shared across threads (if you miss it, you can't protect it)
  • create a member boost::mutex m_mutex and use it whenever you try to access that shared member data (ideally the shared data is private to the class, so you can be more certain that you're protecting it properly).
  • clean up globals. Globals are bad anyways, and good luck trying to do anything thread-safe with globals.
  • Beware the static keyword. It's actually not thread safe. So if you're trying to do a singleton, it won't work right.
  • Beware the Double-Checked Lock Paradigm. Most people who use it get it wrong in some subtle ways, and it's prone to breakage by the low-level caveat.
That's an incomplete checklist. I'll add more if I think of it, but hopefully it's enough to get you started.

Thread safety is a computer programming concept applicable in the context of multi-threaded programs. A piece of code is thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time. There are various strategies for making thread-safe data structures.[1][2]
A key challenge in multi-threaded programming, thread safety was not a concern for most application developers until the 1990s when operating systems began to expose multiple threads for code execution. Today, a program may execute code on several threads simultaneously in a sharedaddress space where each of those threads has access to virtually all of the memory of every other thread. Thread safety is a property that allows code to run in multi-threaded environments by re-establishing some of the correspondences between the actual flow of control and the text of the program, by means of synchronization.

Levels of thread safety

Software libraries can provide certain thread-safety guarantees. For example, concurrent reads might be guaranteed to be thread-safe, but concurrent writes might not be. Whether or not a program using such a library is thread-safe depends on whether it uses the library in a manner consistent with those guarantees.
Different vendors use slightly different terminology for thread-safety:[3][4][5][6]
  • Thread safe: Implementation is guaranteed to be free of race conditions when accessed by multiple threads simultaneously.
  • Conditionally safe: Different threads can access different objects simultaneously, and access to shared data is protected from race conditions.
  • Not thread safe: Code should not be accessed simultaneously by different threads.
Thread safety guarantees usually also include design steps to prevent or limit the risk of different forms of deadlocks, as well as optimizations to maximize concurrent performance. However, deadlock-free guarantees can not always be given, since deadlocks can be caused by callbacks and violation of architectural layering independent of the library itself.

[edit]Implementation approaches

There are a several approaches for avoiding race conditions to achieve thread safety. The first class of approaches focuses on avoiding shared state, and includes:
Writing code in such a way that it can be partially executed by a thread, reexecuted by the same thread or simultaneously executed by another thread and still correctly complete the original execution. This requires the saving of state information in variables local to each execution, usually on a stack, instead of in static or global variables or other non-local state. All non-local state must be accessed through atomic operations and the data-structures must also be reentrant.
Thread-local storage 
Variables are localized so that each thread has its own private copy. These variables retain their values across subroutine and other code boundaries, and are thread-safe since they are local to each thread, even though the code which accesses them might be executed simultaneously by another thread.
The second class of approaches are synchronization-related, and are used in situations where shared state cannot be avoided:
Mutual exclusion
Access to shared data is serialized using mechanisms that ensure only one thread reads or writes to the shared data at any time. Incorporation of mutal exclusion needs to be well thought out, since improper usage can lead to side-effects like deadlockslivelocks and resource starvation.
Atomic operations 
Shared data are accessed by using atomic operations which cannot be interrupted by other threads. This usually requires using special machine language instructions, which might be available in a runtime library. Since the operations are atomic, the shared data are always kept in a valid state, no matter how other threads access it. Atomic operations form the basis of many thread locking mechanisms, and are used to implement mutual exclusion primitives.
Immutable objects 
The state of an object cannot be changed after construction. This implies that only read-only data is shared and inherent thread safety. Mutable (non-const) operations can then be implemented in such a way that they create new objects instead of modifying existing ones. This approach is used by the string implementations in Java, C# and python.[7]


In the following piece of C code, the function is thread-safe, but not reentrant:
int increment_counter ()
        static int counter = 0;
        static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
        // only allow one thread to increment at a time
        // store value before any other threads increment it further
        int result = counter;   
        return result;
In the above, increment_counter can be called by different threads without any problem since a mutex is used to synchronize all access to the shared counter variable. But if the function is used in a reentrant interrupt handler and a second interrupt arises inside the function, the second routine will hang forever. As interrupt servicing can disable other interrupts, the whole system could suffer.
The same function can be implemented to be both thread-safe and reentrant using the lock-free atomics in C++11:
int increment_counter ()
        static std::atomic<int> counter(0);
        // increment is guaranteed to be done atomically
        int result = ++counter;
        return result;