Sunil Kumar Yadav

Debugging Data Race Conditions Using Thread Sanitizer


In the past, speeding up software meant either upgrading the underlying hardware (for example, moving to a microcontroller or microprocessor with a higher clock frequency) or rewriting the application to use multiple threads. Upgrading the hardware is not always an option, so multithreading is often the only cost-effective way to improve performance. As the number of threads in an application grows, we need appropriate synchronization techniques to avoid deadlocks and race conditions.



What Is a Data Race?

Because synchronizing shared resources in a multithreaded application is complex, it is important to ensure that no task or thread introduces a race condition or a data race. Developers working on multithreaded applications often use the two terms interchangeably, and both are the root cause of many bugs, but they are not the same thing.


Race Condition

A race condition occurs when the timing or order of events affects the correctness of a piece of code. It is a semantic error: a flaw in the timing or ordering of events that leads to erroneous program behavior.



Data Race

A data race occurs when one thread accesses a mutable object while another thread is writing to it. More precisely, a data race occurs when two instructions from different threads access the same memory location, at least one of the accesses is a write, and no synchronization enforces any particular order between them.

A race condition can occur without a data race, while a data race can occur without a race condition. For example, the order of events can be consistent, but if there’s always a read at the same time as a write, there’s still a data race.

Debugging such synchronization problems is hard. Traditional methods such as adding print statements or running under a debugger are of little help because these issues are often not reproducible. Adding print statements to threads, for example, changes the timing: the overhead of each print call can alter the interleaving and mask the root cause of the race. Using a debugger to investigate the root cause is even worse. Every breakpoint that is hit triggers the following chain of events:

  • context switch from running thread to debugger

  • debugger will stop all other threads (ex. default all-stop mode in GDB)

  • debugger will evaluate the breakpoint commands

  • debugger resumes all the threads, which in turn is a complicated multistep process.


ThreadSanitizer aka TSan

To debug such synchronization issues effectively, one approach is to use sanitizer tools. GCC and Clang both support AddressSanitizer and ThreadSanitizer. Passing the -fsanitize=thread flag at both the compile and link phases produces an instrumented binary that detects and reports data races at run time. To get reasonable performance, enable optimization with -O2, and add the -g flag to get file names and line numbers in the report. For more detailed information on this option, please refer to the TSan GitHub page.


Let's try to understand this option with the help of a simple example.

// example_data_race.c
#include <pthread.h>
#include <stdio.h>

int Global;                   // shared resource between threads

void* Thread1(void* x) {
  Global++;                   // line 7
  return NULL;
}

void* Thread2(void* x) {
  Global--;                  // line 12
  return NULL;
}

int main() {
  pthread_t t[2];
  pthread_create(&t[0], NULL, Thread1, NULL);
  pthread_create(&t[1], NULL, Thread2, NULL);
  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);
}

To detect the data race, we pass -g -O2 -pthread -fsanitize=thread as additional flags.

$ gcc -g -O2 -fsanitize=thread -pthread example_data_race.c

After a successful link, running the executable produces the output below. The line WARNING: ThreadSanitizer: data race (pid=12115) reports the data race, and the stack traces that follow locate it at line 7 and line 12 of example_data_race.c.

$ ./a.out
==================
WARNING: ThreadSanitizer: data race (pid=12115)
   Read of size 4 at 0x55a429f5d014 by thread T2:
    #0 Thread2 /home/sky/workspace/example_data_race.c:12 (a.out+0x135a)

  Previous write of size 4 at 0x55a429f5d014 by thread T1:
    #0 Thread1 /home/sky/workspace/example_data_race.c:7 (a.out+0x132f)

  Location is global 'Global' of size 4 at 0x55a429f5d014 (a.out+0x000000004014)

  Thread T2 (tid=12118, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
    #1 main /home/sky/workspace/example_data_race.c:19 (a.out+0x11b2)

  Thread T1 (tid=12117, finished) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
    #1 main /home/sky/workspace/example_data_race.c:18 (a.out+0x119f)
==================
ThreadSanitizer: reported 1 warnings

Thread Sanitizer can detect several kinds of data races and related concurrency bugs. A few of them are listed below, and the TSan GitHub page discusses them in detail.

  • Normal data races

  • Races on C++ object vptr

  • Use after free races

  • Races on mutexes

  • Races on file descriptors

  • Races on pthread_barrier_t

  • Destruction of a locked mutex

  • Leaked threads

  • Signal-unsafe malloc/free calls in signal handlers

  • Signal handler spoils errno

  • Potential deadlocks (lock order inversions)


So far we have discussed how to use TSan and its advantages, but there are a few downsides as well. Because the -fsanitize=thread option instruments the application, it adds overhead that varies from program to program. Typically, the instrumentation increases memory usage by 5 to 10 times and execution time by 2 to 20 times. Even so, Thread Sanitizer is usually a more practical way to find data races than print statements or a debugger.


Conclusion

As discussed above, concurrency bugs are difficult to debug with traditional methods. Using Thread Sanitizer and similar tools to detect data races is a cost-effective and time-effective solution.
