In my last article, C++: Introduction to Concurrency, I went through the fundamentals of concurrency and its benefits for application performance, with simple C++ examples. In this article I'll explain the pitfalls of sharing data between threads and how to avoid them.
1. Introduction
Imagine that you're sharing a small apartment with a friend. There's only one kitchen and one bathroom. Unless you're particularly friendly, you can't both use the bathroom at the same time, and it's frustrating when you have an important meeting to attend but can't get in because your friend has occupied the bathroom for a long time. Likewise, though it might be possible for both of you to cook at the same time, if you have a combined oven and grill, it's not going to end well if one of you tries to grill tandoori chicken while the other is baking a cake. We all know the frustration of sharing a space and getting halfway through a task only to find that someone has borrowed something we need, or changed it from how we left it.
The same applies to threads. If you're sharing data between threads, you need rules that define how and when each thread can access which part of the data, and a well-defined mechanism for communicating updates on the shared data to the other threads.
2. Problems With Sharing Data Between Threads
The problems with sharing data between threads are all due to the consequences of modifying data. If all shared data is read-only, there is no problem, because data read by one thread does not affect any other thread. The problems arise only when shared data is modified by one or more threads.
To understand this problem, let's consider an example that involves deleting a node from a doubly linked list in a multithreaded program. Deleting a node Nx from such a list requires the following steps:
Identify the node to delete: Nx
Update the link from the node prior to Nx, i.e. Nx-1, to point to the node after Nx, i.e. Nx+1
Update the link from the node after Nx, i.e. Nx+1, to point to the node prior to Nx, i.e. Nx-1
Delete node Nx
If we have two threads, one reading node values and another deleting nodes, this creates a problem: the reading thread may be at node Nx while the other thread deletes it, leaving the reader holding a node that is no longer part of the list, corrupting the data structure, and eventually crashing the program. Whatever the outcome, this is an example of one of the most common bugs in concurrent code, also known as a race condition.
When sharing data between threads, there are a number of potential problems, including:
Race conditions
Race conditions can occur when a thread accesses a data structure while another thread is modifying it. These can be difficult to find and reproduce because the window of opportunity is small.
False sharing
When multiple threads access the same cache line, it can cause costly invalidation misses and upgrades. This can happen even if the threads are accessing unrelated data that just happen to be allocated in the same cache line.
Data corruption and deadlocks
These can occur when multiple threads access shared data without careful synchronization.
Operating system priorities
The operating system's scheduler may prioritize a task that is still waiting for an element in a queue, even when another task is ready to proceed.
Difficulty writing, debugging, and testing
Multithreaded applications can be difficult to write, debug, manage, and test.
To prevent these problems, we have to carefully choose the synchronization mechanism appropriate for the use case.
3. Protecting Shared Data With Mutexes
Before jumping to mutexes, let's revisit why we need synchronization primitives like a mutex, with a simple example. Example 1 below contains a print_block() function that prints a supplied character to the standard console a given number of times.
// Example 1
#include <iostream>  // std::cout
#include <thread>    // std::thread

void print_block(int n, char c) {
    for (int i = 0; i < n; ++i) {
        std::cout << c;
    }
    std::cout << '\n';
}

int main()
{
    std::thread th1(print_block, 50, '*');
    std::thread th2(print_block, 50, '$');

    th1.join();
    th2.join();

    return 0;
}
Output:
****$$$$$$$$$$$$$$$$$$$$$**********************************************
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Because threads th1 and th2 are both competing to print to the console, we get interleaved output like that shown above, and it changes from run to run depending on how the threads are scheduled by the OS. This is clearly not the outcome we intended when writing the program.
3.1 What is Mutex?
A mutex, short for mutual exclusion, is a synchronization primitive: an object that synchronizes access to a resource. The mutex locking mechanism ensures that only one thread at a time can acquire the mutex and enter the critical section, and that thread releases the mutex only when it exits the critical section.
Let's rewrite Example 1 with the help of a mutex. To do so we need to #include <mutex>, which contains std::mutex. Once a std::mutex object is created, we can call lock() before entering the critical section and unlock() once execution of the critical section is complete.
// Example 2
#include <iostream>  // std::cout
#include <thread>    // std::thread
#include <mutex>     // std::mutex

std::mutex mtx;  // mutex for critical section

void print_block(int n, char c) {
    // critical section i.e. exclusive access
    // to std::cout signaled by locking mtx
    mtx.lock();
    for (int i = 0; i < n; ++i) {
        std::cout << c;
    }
    std::cout << '\n';
    mtx.unlock();
}

int main()
{
    std::thread th1(print_block, 50, '*');
    std::thread th2(print_block, 50, '$');

    th1.join();
    th2.join();

    return 0;
}
Output:
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
**************************************************
Although the mutex has solved the major problem of the earlier example, we still have another issue that can affect our program. Consider a run of Example 2 in which th1 starts executing and, after locking the mutex, throws an exception without unlocking it. When th2 starts executing, mtx is still locked, so th2 waits forever for th1 to release the lock, which will never happen because of the exception.
3.2 Using lock_guard with Mutex
The Standard C++ Library provides the std::lock_guard class template, which implements the RAII idiom for a mutex: it locks the supplied mutex on construction and unlocks it on destruction, ensuring a locked mutex is always correctly unlocked.
// Example 3
#include <list>       // std::list
#include <mutex>      // std::mutex, std::lock_guard
#include <algorithm>  // std::find

std::list<int> some_list;  // ----- 1
std::mutex some_mutex;     // ----- 2

void add_to_list(int new_value) {
    std::lock_guard<std::mutex> guard(some_mutex);  // ----- 3
    some_list.push_back(new_value);
}

bool list_contains(int value_to_find) {
    std::lock_guard<std::mutex> guard(some_mutex);  // ----- 4
    return std::find(some_list.begin(), some_list.end(), value_to_find)
           != some_list.end();
}
In Example 3, there is a single global list, highlighted by comment 1, and it's protected by a corresponding global instance of std::mutex, shown by comment 2. The use of std::lock_guard<std::mutex> in add_to_list(), shown by comment 3, and again in list_contains(), shown by comment 4, means that the accesses in these functions are mutually exclusive: list_contains() will never see the list partway through a modification by add_to_list().
Conclusion
In C++, you create a mutex by constructing an instance of std::mutex, lock it with a call to the lock() member function, and unlock it with a call to the unlock() member function. However, it isn't recommended practice to call these member functions directly, because then you have to remember to call unlock() on every code path out of a function, including those due to exceptions. Using the std::lock_guard class template, we can leverage the RAII idiom to use a mutex safely. This article covers only the basic challenges of multithreaded applications; other concurrency pitfalls will be covered in future articles.
Reference:
A Tour of C++ - Bjarne Stroustrup
C++ Concurrency In Action - Anthony Williams