Earlier this year we published a post titled Implementing A Debugger: The Fundamentals. This post gave an overview of debuggers, what they do, and how they work. In today’s post, we build upon this knowledge and talk about our journey of extending Backtrace’s debugger to support Go.

Intro

If you have the time and haven’t read Implementing A Debugger: The Fundamentals, we highly recommend you do so. This post builds upon terms and knowledge discussed there. For the sake of those who don’t have the time, here are some of the high-level points:

  • Debuggers leverage debug information and a process’s state in memory to translate raw state into a form more easily understood by users, aka symbols. Think threads, callstack, variables, etc.
  • Stack unwinding gives us the callstack, which tells users the location in code the process is currently executing and the function calls that led up to it.
  • In order to synthesize variables, the debugger must evaluate debug information to understand characteristics of the variable (type, width, etc) and learn where variables are stored. This debug information is generated at compile time and is typically in a format called DWARF for Linux-based applications.

Finally, before we jump head first, a big thanks to everyone that helped complete this work and write this article.

Background

Backtrace’s journey in building a Go debugger didn’t start from square one. We already had a full-fledged debugger built to take snapshots of C/C++ applications at the time of error. But as any gopher out there knows, existing C/C++ tools like GDB don’t work with Go out of the box. There were existing debuggers like Delve, but these weren’t well suited for our use case, which was to quickly capture the state of the application at a point in time and send it into our turn-key debugging platform.

Go’s DWARF

Let’s start off by exploring some of the debug information (aka DWARF) generated by the Go compiler, gc, that caused our debugger to fail. Please note, this post focuses on gc and not gccgo (which would make an interesting follow-up post). For brevity, we’ll be referring to Backtrace’s debugger as BT for the rest of this article.

Offset encoding

From the start, BT had problems parsing the debug information generated by the Go compiler. While parsing individual DIEs, BT would throw the error: Invalid abbreviation code. Closer inspection of the abbreviation code table (readelf --debug-dump=abbrev) and debug info (readelf --debug-dump=info) didn’t show the abbreviation code in question.

This led us to the conclusion that we were traversing the DWARF information incorrectly, most likely because we were reading invalid offsets. After trying a variety of methods to pinpoint where parsing went awry, we landed on DW_FORM_ref_addr, specifically our handling of it, as the culprit.

DW_FORM_ref_addr links to other locations in the .debug_info section (where the DWARF debugging information entries are stored). The data associated with this form is a byte offset relative to the start of the .debug_info section. Debug information may use this form to link to type information common to many different variables (i.e., many variables will have a type attribute encoded as DW_FORM_ref_addr, the values of which resolve to the same offset).

The interpretation of this attribute changed between DWARF2 and DWARF3:

References that use the attribute form DW_FORM_ref_addr are specified to be four bytes in the DWARF 32-bit format and eight bytes in the DWARF 64-bit format, while DWARF Version 2 specifies that such references have the same size as an address on the target system (see Sections 7.4 and 7.5.4). Source: DWARF3 Standard, Page 4

Previously, we were relying purely on the detected DWARF file format to determine offset sizes. For DWARF3, the 32-bit format has offsets that are 4 bytes in size; in the 64-bit format, offsets are 8 bytes. For DWARF2, these offsets were the same size as a pointer on the target system. Go emits DWARF2 in the 32-bit format.

Thus, our first fix was to read pointer-size bytes for DW_FORM_ref_addr if DWARF2 is detected.

$ readelf --debug-dump=info consul | head -n 7
Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:        0x34ac98 (32-bit)
   Version:       2
   Abbrev Offset: 0x0
   Pointer Size:  8

Example DWARF info from a Go binary.
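A minimal sketch of that size rule in Go follows. The function name and its parameters are illustrative assumptions, not BT’s actual code; the rule itself is the one quoted from the DWARF3 standard above.

```go
package main

import "fmt"

// refAddrSize returns how many bytes a DW_FORM_ref_addr value occupies.
// version is the DWARF version from the compilation unit header, is64 reports
// whether the unit uses the 64-bit DWARF format, and addrSize is the unit's
// pointer size. All names here are hypothetical.
func refAddrSize(version int, is64 bool, addrSize int) int {
	if version <= 2 {
		// DWARF 2: the reference is as wide as an address on the target.
		return addrSize
	}
	// DWARF 3 and later: the reference is as wide as a section offset.
	if is64 {
		return 8
	}
	return 4
}

func main() {
	// The unit shown above: version 2, 32-bit format, pointer size 8.
	fmt.Println(refAddrSize(2, false, 8))
	// The same unit, had it been emitted as DWARF 3.
	fmt.Println(refAddrSize(3, false, 8))
}
```

For the consul unit above, reading 4 bytes instead of 8 would leave the parser 4 bytes short after every such reference, which is consistent with the invalid abbreviation codes we saw.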

Thread detection

On Linux, Go doesn’t use the standard NPTL to create threads and instead calls the clone system call directly. This broke BT, which previously relied on libthread_db to find and iterate over the threads of a process. Luckily, Linux provides an alternate way to accomplish this via the /proc filesystem. By scanning /proc/<pid>/task, BT is able to discover the set of threads in a Go process and proceed with unwinding. This work also gave us an alternate thread listing for cases where libthread_db’s internal structures are corrupt.

Type dereferencing

Dereferencing usually involves a pointer/reference. In this case the type of said pointer/reference indicates the type of the variable being pointed to. Dereferencing isn’t as simple with Go, where some complex types contain dynamic arrays which may have debug information with multiple, indirect qualifiers (e.g. typedefs.) In this situation, the type of the element for these arrays is stored as an attribute in the DIE belonging to the complex type. Below is an example of such attributes for a slice type:

 <1><4223d>: Abbrev Number: 19 (DW_TAG_structure_type)
    <4223e>   DW_AT_name        : []int
    <42244>   DW_AT_byte_size   : 24
    <42245>   Unknown AT value: 2900: 23
    <42246>   Unknown AT value: 2902: <0x40586>

The unknown attributes are Go-specific attributes not included in the DWARF specifications; here, 0x2900 holds the kind of Go type (a channel, a slice, etc.), independent of the DIE’s DWARF tag, and 0x2902 points to the element type of the underlying array.

This required us to add support for dereferencing arbitrary addresses using declared types rather than the type associated with a pointer variable.

Typename cache

We could stop there if we always had the type attribute available when dereferencing, but this isn’t always the case as is seen when dealing with Go interfaces.

A little background: In Go, variables can have an interface as their type. Such variables are called interface values. Interface values can be thought of as a pair: (interface type, concrete type). In gc, interface values are represented as:

  • A pointer to an entry in the interface table (itable), which contains the associated concrete type and a list of function pointers associated with the concrete type that satisfy the interface type.
  • A pointer to the interface’s data (the value of the concrete type).

Go also allows programmers to dynamically change/convert the interface associated with an interface value. This forces Go to generate the itable at runtime as it isn’t always tenable to precompute the set of (interface type, concrete type) pairs. If you are interested in learning more about the magic behind Go interfaces, we recommend reading Russ Cox’s article on them.

We’re sure this presented a particularly interesting challenge to the Go team when considering what debug information to generate for interface values. For gc, the debug type information for the associated interface value is represented by a plain string (not a DWARF type attribute). Note that aspects of this changed in Go 1.7, which are discussed in the Go 1.7 section later in this post.

To understand interfaces in BT, we needed to combine our ability to dereference data via type, described in the section above, with new functionality that maps the string representing a type to its DWARF type, if available. We called this map the typename cache and built it to leverage the way we already parse DWARF data, avoiding any additional passes over the debug information. We simply cache type DIEs as they are seen, and look them up whenever we encounter type strings.

Multidimensional arrays

For some Go types, Go emits the DWARF to describe a multidimensional array but does not follow the DWARF specification on multidimensional arrays. As specified by DWARF, a multidimensional array will be represented by an array type DIE with multiple child DW_TAG_subrange_type or DW_TAG_enumeration_type DIEs, each one representing a dimension of the array:

Each array dimension is described by a debugging information entry with either
the tag DW_TAG_subrange_type or the tag DW_TAG_enumeration_type. These entries
are children of the array type entry and are ordered to reflect the appearance
of the dimensions in the source program (i.e. leftmost dimension first, next
to leftmost second, and so on).

Source: DWARF2 Standard, Page 39

Instead of the array type DIE having a child DIE for each dimension, it has one subrange_type DIE for its leftmost dimension and links to an array type DIE of the subsequent dimension through its DW_AT_type attribute. This repeats until the base type of the array is reached. You can conceptualize this as representing a multidimensional array as a linked list of types instead of a type tree.

Thus, we had to add a special case to support this form of multidimensional array representation in BT. Below are some examples of multidimensional arrays in Go and C.

Example Go output

Multidimensional array variable declaration:

package main

func main() {
	multi_d3_v := [3][2][2]int{
		{ {3, 4}, {1, 2} },
		{ {3, 4}, {3, 4} },
		{ {3, 4}, {5, 6} },
	}
	// Reference the variable so the compiler accepts the declaration.
	_ = multi_d3_v
}

Variable information:

 <2><7ff>: Abbrev Number: 4 (DW_TAG_variable)
    <800>   DW_AT_name        : multi_d3_v
    <80b>   DW_AT_location    : ...
    <811>   DW_AT_type        : <0x4808d>

Type information:

 <1><48070>: Abbrev Number: 12 (DW_TAG_array_type)
    <48071>   DW_AT_name        : [2]int
    <48078>   DW_AT_type        : <0x40586>
    <48080>   DW_AT_byte_size   : 16
    <48081>   Unknown AT value: 2900: 17
 <2><48082>: Abbrev Number: 9 (DW_TAG_subrange_type)
    <48083>   DW_AT_type        : <0x3ef88>
    <4808b>   DW_AT_count       : 2
 <2><4808c>: Abbrev Number: 0
 <1><4808d>: Abbrev Number: 12 (DW_TAG_array_type)
    <4808e>   DW_AT_name        : [3][2][2]int
    <4809b>   DW_AT_type        : <0x480b0>
    <480a3>   DW_AT_byte_size   : 96
    <480a4>   Unknown AT value: 2900: 17
 <2><480a5>: Abbrev Number: 9 (DW_TAG_subrange_type)
    <480a6>   DW_AT_type        : <0x3ef88>
    <480ae>   DW_AT_count       : 3
 <2><480af>: Abbrev Number: 0
 <1><480b0>: Abbrev Number: 12 (DW_TAG_array_type)
    <480b1>   DW_AT_name        : [2][2]int
    <480bb>   DW_AT_type        : <0x48070>
    <480c3>   DW_AT_byte_size   : 32
    <480c4>   Unknown AT value: 2900: 17
 <2><480c5>: Abbrev Number: 9 (DW_TAG_subrange_type)
    <480c6>   DW_AT_type        : <0x3ef88>
    <480ce>   DW_AT_count       : 2
 <2><480cf>: Abbrev Number: 0
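The chain of array types in this dump can be flattened into a list of dimensions with a short loop. The sketch below uses a hypothetical, simplified arrayDIE record in place of real DIEs and is not BT’s implementation; the offsets and counts are copied from the dump above.

```go
package main

import "fmt"

// arrayDIE is a simplified view of a DW_TAG_array_type entry as gc emits it:
// the count from its single subrange child and the offset of its element type.
type arrayDIE struct {
	count    int64
	elemType uint32
}

// dimensions follows DW_AT_type links from one array type to the next. It
// returns the dimensions, leftmost first, and the offset of the first type
// that is not an array. The loop bound guards against a cyclic chain.
func dimensions(arrays map[uint32]arrayDIE, off uint32) ([]int64, uint32) {
	var dims []int64
	for len(dims) < len(arrays) {
		a, ok := arrays[off]
		if !ok {
			break
		}
		dims = append(dims, a.count)
		off = a.elemType
	}
	return dims, off
}

func main() {
	arrays := map[uint32]arrayDIE{
		0x4808d: {count: 3, elemType: 0x480b0}, // [3][2][2]int
		0x480b0: {count: 2, elemType: 0x48070}, // [2][2]int
		0x48070: {count: 2, elemType: 0x40586}, // [2]int
	}
	dims, base := dimensions(arrays, 0x4808d)
	fmt.Printf("%v %#x\n", dims, base)
}
```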

Example C output

Multidimensional array variable declaration:

int
main()
{
	int a[3][2] = {
	    [0] = {0, 1},
	    [1] = {1, 2},
	    [2] = {2, 3}
	};

	return 0;
}

Variable information:

 <2><b1>: Abbrev Number: 6 (DW_TAG_variable)
    <b2>   DW_AT_name        : a
    <b6>   DW_AT_type        : <0xbe>
    <ba>   DW_AT_location    : ...

Type information:

 <1><be>: Abbrev Number: 7 (DW_TAG_array_type)
    <bf>   DW_AT_type        : <0x57>
 <2><c3>: Abbrev Number: 8 (DW_TAG_subrange_type)
    <c4>   DW_AT_type        : <0x65>
    <c8>   DW_AT_upper_bound : 2
 <2><c9>: Abbrev Number: 8 (DW_TAG_subrange_type)
    <ca>   DW_AT_type        : <0x65>
    <ce>   DW_AT_upper_bound : 1

Goroutines

Goroutines are not typical threads. They aren’t listed in the /proc file system. They don’t follow the common structure associated with pthreads. We could go on, but you get the point. Regardless of this difference, it was important for BT to identify all goroutines (including callstacks, variables) associated with a Go process; otherwise BT snapshots would be incomplete.

Step 1 in this process was to gather the complete list of goroutines stored in the global slice allgs. This slice stores all goroutines including goroutines created by the Go runtime (garbage collection, signal handling) and ones marked “dead.” Other sources, such as the global or per-processor run queues, only provide runnable and ready goroutines.

Once this is done, BT filters out “dead” goroutines and system goroutines (if configured to do so). For each of the remaining goroutines, BT:

  • unwinds the callstack
  • extracts the address range and size of the goroutine’s stack
  • extracts the Go statement that created the goroutine
  • collects the wait reason and wait duration for any goroutine in the “waiting” state.

This is not an exhaustive list, but covers some of the more important per-goroutine processing BT does.
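The filtering step can be sketched as follows. The goroutine record and status values here are hypothetical simplifications for illustration; they do not mirror the runtime’s own structures or BT’s code.

```go
package main

import "fmt"

// status is a simplified, hypothetical goroutine state.
type status int

const (
	runnable status = iota
	running
	waiting
	dead
)

// goroutine is a simplified record of one entry read from allgs.
type goroutine struct {
	id     int64
	status status
	system bool // started by the runtime (garbage collection, signal handling)
}

// selectGoroutines drops dead goroutines and, unless includeSystem is set,
// goroutines created by the runtime.
func selectGoroutines(allgs []goroutine, includeSystem bool) []goroutine {
	var kept []goroutine
	for _, g := range allgs {
		if g.status == dead {
			continue
		}
		if g.system && !includeSystem {
			continue
		}
		kept = append(kept, g)
	}
	return kept
}

func main() {
	allgs := []goroutine{
		{id: 1, status: running},
		{id: 2, status: waiting, system: true},
		{id: 3, status: dead},
		{id: 4, status: waiting},
	}
	for _, g := range selectGoroutines(allgs, false) {
		fmt.Println("processing goroutine", g.id)
	}
}
```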

Special-case unwinding

The first step when processing a goroutine is to unwind the callstack. Unfortunately, even unwinding wasn’t straightforward. Some goroutines require special unwinding logic under certain conditions; one such condition is related to the garbage collector.

By default, Go’s garbage collector overwrites return PCs at exponentially spaced frames with the address of a special function (these overwritten slots are referred to as stack barriers). A stack barrier acts as a marker to the GC during the stack scanning/mark-termination phase: only the portion of the stack further down from the latest valid stack barrier must be scanned.

Any unwinding must resolve these stack barriers to accurately parse deep stacks. If you happen to run into the need to do this yourself, Go’s runtime.Callers(), which internally calls runtime.gentraceback (and is what you’d see by default when there’s an unhandled panic), handles this situation.

The solution here is straightforward. Stack barriers are stored at the base (uppermost region) of a goroutine’s stack, as can be seen in the following excerpt from runtime.stackalloc():

v := sysAlloc(round(uintptr(n), _PageSize), &memstats.stacks_sys)
if v == nil {
	throw("out of memory (stackalloc)")
}
top := uintptr(n) - nstkbar
stkbarSlice := slice{add(v, top), 0, maxstkbar}
return stack{uintptr(v), uintptr(v) + top}, *(*[]stkbar)(unsafe.Pointer(&stkbarSlice))

If the stackBarrierPC is encountered while unwinding, we access the appropriate stack barrier stored at the base of the goroutine stack and restore the original PC. We continue unwinding using the original PC.
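A hedged sketch of that lookup is shown below. The stkbar fields are named after the runtime’s own record as we understand it (the address of the overwritten return-PC slot and the value that was there); resolveReturnPC, its parameters and the addresses in main are hypothetical and only illustrate the idea.

```go
package main

import "fmt"

// stkbar describes one stack barrier: the address of the return-PC slot that
// was overwritten and the original return PC that was stored there.
type stkbar struct {
	savedLRPtr uint64
	savedLRVal uint64
}

// resolveReturnPC returns the PC to keep unwinding with. lr is the return PC
// read from the stack, lrPtr is the address it was read from, and barrierPC
// is the address of the runtime's stack barrier function.
func resolveReturnPC(lr, lrPtr, barrierPC uint64, barriers []stkbar) (uint64, error) {
	if lr != barrierPC {
		return lr, nil // ordinary frame, nothing to undo
	}
	for _, b := range barriers {
		if b.savedLRPtr == lrPtr {
			return b.savedLRVal, nil
		}
	}
	return 0, fmt.Errorf("no stack barrier recorded for slot %#x", lrPtr)
}

func main() {
	const barrierPC = 0x45f000 // made-up address of the barrier function
	barriers := []stkbar{{savedLRPtr: 0xc42003fe98, savedLRVal: 0x401234}}

	// The slot at 0xc42003fe98 holds barrierPC, so the original PC is restored.
	pc, err := resolveReturnPC(barrierPC, 0xc42003fe98, barrierPC, barriers)
	fmt.Printf("%#x %v\n", pc, err)
}
```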

There are other situations that require special unwinding, most of which are handled in runtime.gentraceback(). This is also a good time to shout out to the work being done by Austin Clements to remove these stack barriers after mark termination. Austin also has a new proposal to remove stack re-scanning which may remove stack barriers altogether.

Bonus: Pretty printing

At this point, we were able to interpret a Go program from BT, but we knew we weren’t done. To truly inspect raw program state in a form that more closely resembled the source code written, we added pretty printers for complex Go types like channels, maps, and slices. These pretty printers took the data associated with these complex types and generated alternate forms that could be easily displayed by our web and terminal UIs. Data structures that allow heterogeneous types in certain members, like keys of a Go map, meant that the pretty printers needed to be composable and context-aware.

[Figure: Go channel pretty printing]

Bonus: Scheduler information

Go’s scheduler has the task of scheduling goroutines across a set of Go runtime threads. Internally, it has a number of interesting data points that could prove incredibly valuable when investigating the state of a Go process, like the number of idle threads, whether the garbage collector is waiting, and global and per-processor run queues. BT also captures processor-specific information: the number of scheduler and syscall ticks, the state of the processor, etc. This information provides hints about things like imbalanced workloads, an unusual number of syscalls, and unexpected goroutine spawning.

Go 1.7

The release of Go 1.7 changed a couple of things that BT relied on. First, the implementation of interfaces changed, with some fields changing in meaning and name. A more in-depth analysis will have to wait until a follow-up blog post. At a high level, interfaces with types known at compile time have their type names stored in the global read-only moduledata. Types created at runtime are now stored on the heap and built on demand.

Second, in some cases, the kernel thread ID is not recorded in the runtime.goroutine structure. BT depended on this ID to associate goroutines with kernel threads. This association was used to mark goroutines as relevant to root-cause investigation if a fault occurred. We worked around this by inspecting a goroutine’s signal and capturing the goroutine that invoked our debugger (when using our library).

Outstanding issues

  • Variable location lists: Go’s currently emitted DWARF information does not properly indicate the scope of a variable. If a function has multiple variables of the same name with different scopes, any Go debugger will have multiple values for the same variable (unless it arbitrarily picks one). Variables not currently in scope according to the PC may have garbage values.

  • For Go 1.7, gc sometimes generates DWARF entries for local variables that are not declared in the original source program. We do not have a good way to differentiate them from actual local variables in the original source. This issue is still being investigated.

Conclusion

Several paragraphs later, Backtrace has a fully functioning debugger for Go. There are still improvements to be made, like capturing internal information from Go’s garbage collector, but what we have today gives users the ability to gain deep introspection into their Go applications at the time of error, across their environments.

Our Go debugger is bundled with the enterprise version of Backtrace. If you’re interested in trying it out, sign up for a free trial.