A few months ago, a friend noted that they saw a significant increase in the time required to read /proc/<pid>/maps in Linux due to a change from a few years ago.
A patch was introduced to the Linux kernel in 2012 (> 3.2) that marked thread stacks in /proc/<pid>/maps output. Previously, these regions were indistinguishable from other anonymous memory.
Unfortunately, this additional output comes at a high cost for applications that read maps. In the current implementation of maps, every anonymous memory region requires scanning every thread in the thread group to determine whether the VMA serves as a thread stack (see source). This means that a process with n thread stacks will require on the order of n^2 thread checks (n walks of an n-entry thread group list). This gets very slow very quickly.
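To make the quadratic growth concrete, here is a toy model in Python. It is not the kernel code: it assumes one anonymous stack region per thread and a full walk of the thread list for every region, and the names Vma, make_process and count_thread_checks are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vma:
    """Hypothetical stand-in for an anonymous mapping covering [start, end)."""
    start: int
    end: int


def count_thread_checks(vmas, stack_pointers):
    """Label each VMA with the thread whose stack pointer falls inside it.

    For every VMA the whole thread list is walked, with no early exit,
    so the returned count is an upper bound on the comparisons needed.
    """
    checks = 0
    labels = {}
    for vma in vmas:
        for tid, sp in stack_pointers.items():
            checks += 1
            if vma.start <= sp < vma.end:
                labels[vma] = tid
    return checks, labels


def make_process(n, stack_size=0x1000):
    """Build n non-overlapping stack VMAs and one stack pointer per thread."""
    vmas = [Vma(i * stack_size, (i + 1) * stack_size) for i in range(n)]
    sps = {tid: vma.start + stack_size // 2 for tid, vma in enumerate(vmas)}
    return vmas, sps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        vmas, sps = make_process(n)
        checks, _ = count_thread_checks(vmas, sps)
        print(f"{n} threads -> {checks} thread checks")
```

In this model, going from 10 to 1000 threads multiplies the work by 10,000 rather than 100. Stopping at the first matching thread would roughly halve the count, but the growth would stay quadratic.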
The good news is that this logic is not applied to /proc/<pid>/task/<tid>/maps, since it does not mark thread stacks of other threads. You can use this to your advantage with the bt --map option if you do not require distinguishing all thread stacks and prefer not to use cached maps.
shell$ uname -r
3.15.4-x86_64-linode45
shell$ time bt $pid --thread 2821
[...]
real 0m0.858s
user 0m0.013s
sys 0m0.370s
shell$ time bt $pid --thread 2821 --map /proc/$pid/task/$pid/maps
[...]
real 0m0.075s
user 0m0.003s
sys 0m0.027s
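If you want to check the difference on your own system without bt, the two files can be read and timed directly. The following is a minimal sketch in Python. The helpers maps_path and time_read are hypothetical names, and the script uses the thread group leader (whose tid equals the pid) for the per-task path, as in the transcript above. Results will vary with kernel version and thread count.

```python
import os
import sys
import time


def maps_path(pid, tid=None):
    """Return the process-wide maps path, or the per-task one if tid is given."""
    if tid is None:
        return f"/proc/{pid}/maps"
    return f"/proc/{pid}/task/{tid}/maps"


def time_read(path):
    """Read the whole file once and return (elapsed seconds, line count)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return time.perf_counter() - start, data.count(b"\n")


if __name__ == "__main__":
    # Use the pid given on the command line, or fall back to this process.
    has_pid = len(sys.argv) > 1 and sys.argv[1].isdigit()
    pid = int(sys.argv[1]) if has_pid else os.getpid()
    for path in (maps_path(pid), maps_path(pid, pid)):
        try:
            elapsed, lines = time_read(path)
        except OSError as exc:
            # Not Linux, no such pid, or insufficient permissions.
            print(f"{path}: {exc}")
            continue
        print(f"{path}: {lines} lines in {elapsed * 1000:.3f} ms")
```

Run it with the pid of a heavily threaded process as the only argument. A single read is noisy, so repeat the measurement a few times before drawing conclusions.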