top of page
  • Writer's pictureSunil Kumar Yadav

Understanding the runtime behaviors of C++ programs using uftrace tool

Updated: Oct 3, 2021



Whether you are an expert programmer or a freshman in university, it's always fascinating to learn how systems work under the hood. Since we work at a much higher abstraction level, many a time we don't bother what goes once an application run on a machine. There are many tools available that can give us detailed information about subsystems or underlying OS on which our application runs. I myself was not aware of excellent tools like ftrace or strcace until one day one of the recorder applications which was built by my company was not working properly. With the help of ftrace, we could get access to kernel space calls and signals, which helped us to identify bottlenecks in our application.


To put it very simply: ftrace or function trace is a Linux kernel feature that lets you trace Linux kernel function calls. Essentially, it lets you look into the Linux kernel and see what it’s doing. Why would you want to do that? Well, most of the time you wouldn’t. But when you do, it is an essential ability to have!


Userspace Function Tracing using uftrace

Similarly, for our userspace program, we've many such tools which can give us detailed information about our application. Depending upon your toolchain and application i.e. whether your application runs on bare metal or on full-fledged OS, you may have multiple options to inspect the internals of your application and tune up the performance. For example, in our last article, we've discussed how to use the GNU compiler's profiler to get useful metrics. Even though gprof gives us useful metrics it's not exhaustive as strace or ftrace counterpart of kernel space tracing.


uftrace is a function tracing program for userspace. I got familiar after watching the CppCon 2017 conference where Honggyu Kim gone through how to utilize this tool to understand the runtime behavior of the application. This utility could be directly installed from your favorite Linux distro but in case you could not find or want to build it from the source then you can refer to this Github repo.


So let's try to quickly go through how to use uftrace with the help of a small example. In order to capture trace information from the running application, the user needs to enable -pg or -finstrument-functions flag while compiling their code using the GNU compiler. For our simple example, we would use the below factorial example.


// factorial.cpp
#include<iostream>
#include <cstdlib>
using namespace std;

int factorial(int n)
{
	int fact{1};
	while(n>0)
	{
		fact*=n;
		n--;
	}
	return fact;
}

int main(int argc, char** argv)
{
	int num;
	if(argc >= 2) 
	{
		num=atoi(argv[1]);  
	}
	else
	{
		num=1;
	}
	cout<<num<<" factorial: "<<factorial(num)<<endl;
	return 0;
}

Compiling the above example with -pg will create an elf file with the default name a.out on which we can run uftrace.

$ g++ -pg factorial.cpp 

Once we execute the uftrace command without specifying any option it produces the below output which contains the function call graph along with the time taken to execute specifies subroutine/function.

$ uftrace ./a.out 10
10 factorial: 3628800
# DURATION     TID     FUNCTION
            [  3964] | _GLOBAL__sub_I_factorial() {
            [  3964] |   __static_initialization_and_destruction_0() {
  84.647 us [  3964] |     std::ios_base::Init::Init();
   0.313 us [  3964] |     __cxa_atexit();
  86.627 us [  3964] |   } /* __static_initialization_and_destruction_0 */
  87.213 us [  3964] | } /* _GLOBAL__sub_I_factorial */
            [  3964] | main() {
   0.482 us [  3964] |   atoi();
  10.862 us [  3964] |   std::basic_ostream::operator<<();
   4.025 us [  3964] |   std::operator<<();
   0.132 us [  3964] |   factorial();
   0.373 us [  3964] |   std::basic_ostream::operator<<();
  38.927 us [  3964] |   std::basic_ostream::operator<<();
  56.628 us [  3964] | } /* main */

As you can see by default uftrace push data onto stdout, which may work for smaller example like ours but in actual projects, it might not be a sound strategy. Hence better alternative is to use the record sub-command and then use replay sub-command to view the results.


$ uftrace record ./a.out 10
10 factorial: 3628800
$ ls
a.out  factorial.cpp  gmon.out  uftrace.data
$ uftrace replay
# DURATION     TID     FUNCTION
            [  3988] | _GLOBAL__sub_I_factorial() {
            [  3988] |   __static_initialization_and_destruction_0() {
 177.500 us [  3988] |     std::ios_base::Init::Init();
   0.343 us [  3988] |     __cxa_atexit();
 180.354 us [  3988] |   } /* __static_initialization_and_destruction_0 */
 181.167 us [  3988] | } /* _GLOBAL__sub_I_factorial */
            [  3988] | main() {
   0.427 us [  3988] |   atoi();
  15.000 us [  3988] |   std::basic_ostream::operator<<();
   5.835 us [  3988] |   std::operator<<();
   0.239 us [  3988] |   factorial();
   0.579 us [  3988] |   std::basic_ostream::operator<<();
 137.461 us [  3988] |   std::basic_ostream::operator<<();
 161.879 us [  3988] | } /* main */

Once we've recorded trace on our application we can run other sub-command to get desired results. For example, executing the report sub-command will print various statistics and a summary of the recorded trace data.

$ uftrace report 
  Total time   Self time       Calls  Function
  ==========  ==========  ==========  ====================
  125.888 us    1.821 us           1  main
  119.215 us  119.215 us           3  std::basic_ostream::operator<<
   83.681 us    0.855 us           1  _GLOBAL__sub_I_factorial
   82.826 us    1.726 us           1  __static_initialization_and_destruction_0
   80.855 us   80.855 us           1  std::ios_base::Init::Init
    4.332 us    4.332 us           1  std::operator<<
    0.347 us    0.347 us           1  atoi
    0.245 us    0.245 us           1  __cxa_atexit
    0.173 us    0.173 us           1  factorial

By running graph sub-command uftrace can print function call graph.

$ uftrace  graph
# Function Call Graph for 'a.out' (session: 38b0e25dae5f4c63)
========== FUNCTION CALL GRAPH ==========
# TOTAL TIME   FUNCTION
  209.569 us : (1) a.out
   83.681 us :  +-(1) _GLOBAL__sub_I_factorial
   82.826 us :  | (1) __static_initialization_and_destruction_0
   80.855 us :  |  +-(1) std::ios_base::Init::Init
             :  |  | 
    0.245 us :  |  +-(1) __cxa_atexit
             :  | 
  125.888 us :  +-(1) main
    0.347 us :     +-(1) atoi
             :     | 
  119.215 us :     +-(3) std::basic_ostream::operator<<
             :     | 
    4.332 us :     +-(1) std::operator<<
             :     | 
    0.173 us :     +-(1) factorial

Another interesting sub-command of uftrace is info. This sub-command prints the system info like CPU, OS details, command-line option passed the application, etc.

$ uftrace info 
# system information
# ==================
# program version     : v0.10 ( x86_64 luajit perf sched dynamic )
# recorded on         : Thu Sep 23 02:06:14 2021
# cmdline             : uftrace record ./a.out 15
# cpu info            : Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz
# number of cpus      : 2 / 2 (online / possible)
# memory info         : 0.1 / 1.9 GB (free / total)
# system load         : 0.33 / 0.36 / 0.30 (1 / 5 / 15 min)
# kernel version      : Linux 5.11.0-34-generic
# hostname            : ubuntu
# distro              : "Ubuntu 20.04.3 LTS"
#
# process information
# ===================
# number of tasks     : 1
# task list           : 4157(a.out)
# exe image           : /home/sky/workspace/cpp/uftrace/a.out
# build id            : 1dd356a8961297905399f4860b4c62a2b596b085
# pattern             : regex
# exit status         : exited with code: 0
# elapsed time        : 0.005375767 sec
# cpu time            : 0.000 / 0.005 sec (sys / user)
# context switch      : 1 / 7 (voluntary / involuntary)
# max rss             : 4784 KB
# page fault          : 1 / 314 (major / minor)
# disk iops           : 0 / 8 (read / write)

uftrace also allows limited kernel tracing and users can enable this via -k option. In order to run tracing in this mode, we need to have superuser privileges. Hence we need to add sudo prefix before running the command.

$ sudo uftrace -k ./a.out 10
10 factorial: 3628800
# DURATION     TID     FUNCTION
            [  4029] | _GLOBAL__sub_I_factorial() {
            [  4029] |   __static_initialization_and_destruction_0() {
            [  4029] |     std::ios_base::Init::Init() {
   0.786 us [  4029] |       fault_in_kernel_space();
   0.634 us [  4029] |       down_read_trylock();
   0.993 us [  4029] |       _cond_resched();
   1.512 us [  4029] |       find_vma();
  26.528 us [  4029] |       handle_mm_fault();
   0.563 us [  4029] |       up_read();
   0.746 us [  4029] |       exit_to_user_mode_prepare();
   0.656 us [  4029] |       fault_in_kernel_space();
   0.639 us [  4029] |       down_read_trylock();
   0.743 us [  4029] |       _cond_resched();
   1.897 us [  4029] |       find_vma();
  19.806 us [  4029] |       handle_mm_fault();
   0.496 us [  4029] |       up_read();
   0.812 us [  4029] |       exit_to_user_mode_prepare();
   0.655 us [  4029] |       fault_in_kernel_space();
   0.644 us [  4029] |       down_read_trylock();
   0.875 us [  4029] |       _cond_resched();
   0.874 us [  4029] |       find_vma();
  24.613 us [  4029] |       handle_mm_fault();
   0.642 us [  4029] |       up_read();
   0.787 us [  4029] |       exit_to_user_mode_prepare();
   1.933 us [  4029] |       __x64_sys_futex();
   0.802 us [  4029] |       exit_to_user_mode_prepare();
   0.705 us [  4029] |       fault_in_kernel_space();
   0.614 us [  4029] |       down_read_trylock();
   1.163 us [  4029] |       _cond_resched();
   0.798 us [  4029] |       find_vma();
  24.491 us [  4029] |       handle_mm_fault();
   0.602 us [  4029] |       up_read();
   0.798 us [  4029] |       exit_to_user_mode_prepare();
   0.627 us [  4029] |       fault_in_kernel_space();
   0.614 us [  4029] |       down_read_trylock();
   0.680 us [  4029] |       _cond_resched();
   0.663 us [  4029] |       find_vma();
  53.540 us [  4029] |       handle_mm_fault();
   0.746 us [  4029] |       up_read();
   0.775 us [  4029] |       exit_to_user_mode_prepare();
   5.975 us [  4029] |       fault_in_kernel_space();
   0.708 us [  4029] |       irq_enter_rcu();
   3.410 us [  4029] |       __sysvec_irq_work();
   0.843 us [  4029] |       irq_exit_rcu();
   0.501 us [  4029] |       down_read_trylock();
   0.810 us [  4029] |       _cond_resched();
   0.806 us [  4029] |       find_vma();
  23.312 us [  4029] |       handle_mm_fault();
   0.707 us [  4029] |       up_read();
   0.849 us [  4029] |       exit_to_user_mode_prepare();
   0.656 us [  4029] |       fault_in_kernel_space();
   0.669 us [  4029] |       down_read_trylock();
   0.789 us [  4029] |       _cond_resched();
   0.861 us [  4029] |       find_vma();
  24.662 us [  4029] |       handle_mm_fault();
   0.759 us [  4029] |       irq_enter_rcu();
   2.671 us [  4029] |       __sysvec_irq_work();
   0.806 us [  4029] |       irq_exit_rcu();
   0.680 us [  4029] |       up_read();
   0.808 us [  4029] |       exit_to_user_mode_prepare();
   2.175 us [  4029] |       __x64_sys_futex();
   0.839 us [  4029] |       exit_to_user_mode_prepare();
 436.268 us [  4029] |     } /* std::ios_base::Init::Init */
   0.654 us [  4029] |     __cxa_atexit();
 440.760 us [  4029] |   } /* __static_initialization_and_destruction_0 */
 443.576 us [  4029] | } /* _GLOBAL__sub_I_factorial */
            [  4029] | main() {
   0.657 us [  4029] |   atoi();
            [  4029] |   std::basic_ostream::operator<<() {
   4.137 us [  4029] |     fault_in_kernel_space();
   0.767 us [  4029] |     irq_enter_rcu();
   2.917 us [  4029] |     __sysvec_irq_work();
   0.860 us [  4029] |     irq_exit_rcu();
   0.717 us [  4029] |     down_read_trylock();
   0.817 us [  4029] |     _cond_resched();
   1.030 us [  4029] |     find_vma();
  24.500 us [  4029] |     handle_mm_fault();
   0.665 us [  4029] |     up_read();
   0.829 us [  4029] |     exit_to_user_mode_prepare();
  14.539 us [  4029] |     __x64_sys_newfstat();
   0.815 us [  4029] |     exit_to_user_mode_prepare();
  79.476 us [  4029] |   } /* std::basic_ostream::operator<< */
            [  4029] |   std::operator<<() {
   0.767 us [  4029] |     fault_in_kernel_space();
   0.666 us [  4029] |     down_read_trylock();
   0.796 us [  4029] |     _cond_resched();
   2.156 us [  4029] |     find_vma();
   6.622 us [  4029] |     handle_mm_fault();
   0.649 us [  4029] |     up_read();
   0.839 us [  4029] |     exit_to_user_mode_prepare();
  23.466 us [  4029] |   } /* std::operator<< */
   0.294 us [  4029] |   factorial();
   0.840 us [  4029] |   std::basic_ostream::operator<<();
            [  4029] |   std::basic_ostream::operator<<() {
            [  4029] |     __x64_sys_write() {
  84.936 us [  4029] |       /* linux:schedule */
 159.048 us [  4029] |     } /* __x64_sys_write */
            [  4029] |     exit_to_user_mode_prepare() {
  24.570 us [  4029] |       /* linux:schedule */
  39.403 us [  4029] |     } /* exit_to_user_mode_prepare */
 213.680 us [  4029] |   } /* std::basic_ostream::operator<< */
 322.320 us [  4029] | } /* main */

From the above output, it's very interesting to know under the hood kernel does heavy lifting. There is another interesting command called dump which produces the raw output of each trace record and using --chrome option user can visualize trace data in the chrome browser. For example, below is the screenshot of clang compiling a small c++ template metaprogram using

$ uftrace dump --chrome



Users can get a detailed list of available options by running the below command:

$ uftrace --help

Below are few important subcommands available from uftrace with their description.

record Run a given command and save trace data in a data file or directory.
replay Print recorded function trace data with time durations.
live   Do live tracing.  Print function trace of the given command.
report Print various statistics and summary of the recorded trace data.
info   Print side-band information like OS version, CPU info, command line and so on.
dump   Print raw tracing data in the data files.
recv   Save tracing data sent to network
graph  Print function call graph
script Run a script for recorded function trace

As we can see using the uftrace tool we can get tons of information about our application and based on the results we can optimize the parts of the application to improve efficiency.







Recent Posts

See All

Comments


bottom of page