A few months ago, a friend noticed that reading /proc/<pid>/maps on Linux had become significantly slower because of a kernel change made a few years earlier.
A patch introduced to the Linux kernel in 2012 (> 3.2) marked thread stacks in /proc/<pid>/maps output. Previously, these regions were indistinguishable from other anonymous memory.
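To check how a particular kernel labels these regions, it is enough to grep a process's maps for stack entries. This is a minimal sketch: `list_stacks` is a hypothetical helper name, and the exact label format (for example, whether a thread stack carries a thread ID) depends on the kernel version.

```shell
#!/bin/bash
# list_stacks PID: print the lines of /proc/PID/maps that are labelled as stacks.
# Hypothetical helper; the label format varies between kernel versions.
list_stacks() {
  grep '\[stack' "/proc/$1/maps"
}
```

Usage: `list_stacks "$pid"`. The main thread's stack is labelled `[stack]`; on kernels carrying the patch, other threads' stacks are labelled as well.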
Unfortunately, this additional output comes at a high cost for applications that read maps. In the current implementation, every anonymous memory region requires scanning every thread in the thread group to check whether that VMA serves as a thread stack (see the kernel source). A process with n thread stacks therefore requires n scans of the thread group list, each visiting n threads, for n² steps in total. This gets very slow very quickly.
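The cost model above can be turned into a rough estimate for a live process: multiply the number of threads by the number of anonymous mappings (unnamed or stack-labelled). This is a back-of-the-envelope sketch under that simplified model, not a reproduction of the kernel's loop; `estimate_scans` is a hypothetical helper, and counting the mappings requires reading maps, which pays the cost once.

```shell
#!/bin/bash
# estimate_scans PID: rough thread-visit estimate under the model
# "each anonymous VMA is checked against every thread". Hypothetical helper.
estimate_scans() {
  local pid=$1 threads anon
  # One directory entry per thread in /proc/PID/task.
  threads=$(ls "/proc/$pid/task" | wc -l)
  # Anonymous mappings: no pathname (5 fields) or a stack label in field 6.
  anon=$(awk '$6 == "" || $6 ~ /^\[stack/' "/proc/$pid/maps" | wc -l)
  echo "threads=$threads anon_vmas=$anon estimated_thread_visits=$((threads * anon))"
}
```

Usage: `estimate_scans "$pid"`. Because both factors grow with the thread count (each thread adds a stack mapping), the estimate grows roughly quadratically as threads are added.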
The good news is that this logic is not applied to /proc/<pid>/task/<tid>/maps, since that file does not mark the stacks of other threads. You can use this to your advantage with bt's --map option if you do not need to distinguish all thread stacks and prefer not to use cached maps.
shell$ uname -r
3.15.4-x86_64-linode45
shell$ time bt $pid --thread 2821
[...]
real    0m0.858s
user    0m0.013s
sys     0m0.370s
shell$ time bt $pid --thread 2821 --map /proc/$pid/task/$pid/maps
[...]
real    0m0.075s
user    0m0.003s
sys     0m0.027s
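The same comparison can be made without bt by timing plain reads of both files. A minimal sketch, assuming bash (for the `time` keyword) and using the main thread, whose TID equals the PID; `time_maps` is a hypothetical helper name.

```shell
#!/bin/bash
# time_maps PID: time a read of the process-wide maps file and of the
# main thread's per-thread maps file. Hypothetical helper.
time_maps() {
  local pid=$1
  echo "== /proc/$pid/maps"
  time cat "/proc/$pid/maps" > /dev/null
  echo "== /proc/$pid/task/$pid/maps"
  time cat "/proc/$pid/task/$pid/maps" > /dev/null
}
```

Usage: `time_maps "$pid"` against a process with many threads. Both files describe the same address space, since threads share it; on kernels that annotate thread stacks, the first read is the one expected to grow with the thread count.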