"Unstripping" binaries: Restoring debugging information in GDB with Pwndbg
Posted: 2024-9-9 09:59
By Jason An
GDB loses significant functionality when debugging binaries that lack debugging symbols (also known as “stripped binaries”). Function and variable names become meaningless addresses; setting breakpoints requires tracking down relevant function addresses from an external source; and printing out structured values involves staring at a memory dump trying to manually discern field boundaries.
That’s why this summer at Trail of Bits, I extended Pwndbg—a plugin for GDB maintained by my mentor, Dominik Czarnota—with two new features to bring the stripped debugging experience closer to what you’d expect from a debugger in an IDE. Pwndbg now integrates Binary Ninja for enhanced GDB+Pwndbg intelligence and enables dumping Go structures for improved Go binary debugging.
Binary Ninja integration
To help improve GDB+Pwndbg intelligence during debugging, I integrated Pwndbg with Binary Ninja, a popular decompiler with a versatile scripting API, by installing an XML-RPC server inside Binary Ninja, and then querying it from Pwndbg. This allows Pwndbg to access Binary Ninja’s analysis database, which is used for syncing symbols, function signatures, stack variable offsets, and more, recovering much of the debugging experience.
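As a rough illustration of the XML-RPC arrangement described above, the sketch below uses only Python's standard library. It is not Pwndbg's or Binary Ninja's actual code: the get_symbol method, the SYMBOLS table, and the addresses are hypothetical stand-ins for the decompiler's analysis database. Addresses travel as hex strings because XML-RPC integers are limited to 32 bits.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in for the decompiler's analysis database.
SYMBOLS = {0x401000: "main", 0x401050: "parse_args"}


def get_symbol(address_hex):
    """Decompiler side: return the name known for an address, or "" if unknown."""
    return SYMBOLS.get(int(address_hex, 16), "")


def start_server():
    """Serve get_symbol over XML-RPC on a free localhost port, in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_symbol)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query_symbol(port, address):
    """Debugger side: ask the server what it knows about an address."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.get_symbol(hex(address))
```

In this toy setup, the debugger side would call query_symbol with the port taken from server.server_address[1] whenever it needs a name for an address.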
For the decompilation, I pulled the tokens from Binary Ninja instead of serializing them to text first. This allows for fully syntax-highlighted decompilation, configurable to use any of Binary Ninja’s 3 IL levels. The decompilation is shown directly in the Pwndbg context, with the current line highlighted, just like in the assembly view.
I also implemented a feature to display the current program counter (PC) register as an arrow inside Binary Ninja and a feature to set breakpoints from within Binary Ninja to reduce the amount of switching to and from Pwndbg involved.
The most involved component of the integration is syncing stack variable names. Anywhere a stack address appears in Pwndbg, like in the register view, stack view, or function argument previews, the integration will check if it’s a named stack variable in Binary Ninja. If it is, it will show the proper label. It will even check parent stack frames so that variables from the caller will still be labeled properly.
The main difficulty in implementing this feature came from the fact that Binary Ninja only provides stack variables as an offset from the stack frame base, so the frame base needs to be deduced in order to compute absolute addresses. Most architectures, like x86, have a frame pointer register that conventionally points to the frame base, but few of them, x86 included, actually require it, so compilers are free to use it like any other register.
Fortunately, Binary Ninja has constant value propagation, so it can tell if registers are a predictable offset from the frame base. So, my implementation will first check if the frame pointer is actually the frame base, and if it’s not, it will see if the stack pointer advanced a predictable amount (which is usually true with modern compilers); otherwise, it will check every other general-purpose register to try to find one with a consistent offset. Technically, this approach won’t work all the time, but in practice, it should almost never fail.
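The fallback order described above can be sketched as follows. This is a simplified model, not the integration's actual code: reg_offsets stands in for whatever the decompiler's constant propagation reports (a register's known constant offset from the frame base at the current instruction), and all names and register values are hypothetical.

```python
def find_frame_base(reg_values, reg_offsets, frame_pointer="rbp", stack_pointer="rsp"):
    """Return the deduced frame base address, or None if no register qualifies.

    reg_values:  live register values read from the debugger.
    reg_offsets: register -> constant offset from the frame base, present only
                 for registers whose relation to the frame base is known.
    """
    # Try the frame pointer first, then the stack pointer, then every other register.
    candidates = [frame_pointer, stack_pointer]
    candidates += [reg for reg in reg_values if reg not in candidates]
    for reg in candidates:
        if reg in reg_values and reg in reg_offsets:
            return reg_values[reg] - reg_offsets[reg]
    return None


def label_address(address, frame_base, stack_vars):
    """Map an absolute address to a stack variable name, or None.

    stack_vars maps frame-base-relative offsets to variable names.
    """
    return stack_vars.get(address - frame_base)
```

Once a frame base is known, any address shown in the debugger can be checked against the frame-relative variable table with label_address; repeating the lookup with a caller's frame base would cover parent frames.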
Go debugging
A common pain point when debugging executables compiled from non-C programming languages (and sometimes even C) is that they tend to have complex memory layouts that make it hard to dump values. A benign example is dumping a slice in Go, which requires one command to dump the pointer and length, and another to examine the slice contents. Dumping a map, on the other hand, can require over ten commands for a small map, and hundreds for larger ones, which is completely impractical for a human.
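To make the manual two-step slice dump concrete, here is a small sketch that does it against a byte buffer instead of a live process. It assumes a 64-bit little-endian target, where a slice header is three machine words (data pointer, length, capacity), and treats buffer offsets as addresses; the function names are made up for illustration.

```python
import struct


def read_slice_header(memory, addr):
    """Step 1: read the (data pointer, length, capacity) words of a slice header."""
    return struct.unpack_from("<QQQ", memory, addr)


def dump_int64_slice(memory, addr):
    """Step 2: follow the data pointer and read `length` int64 elements."""
    ptr, length, _cap = read_slice_header(memory, addr)
    return list(struct.unpack_from(f"<{length}q", memory, ptr))
```

Even in this simplest case the element type has to be known up front, and a map adds several more levels of indirection on top of it.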
That’s why I created the go-dump command. Using the Go compiler’s source code as a reference, I implemented dumping for all of Go’s built-in types, including integers, strings, complex numbers, pointers, slices, arrays, and maps. The built-in types are notated just like they are in Go, so you don’t need to learn any new syntax to use the command properly.
The go-dump command is also capable of parsing and dumping arbitrarily nested types so that every type can be dumped with just one command.
Parsing Go’s runtime types
解析 Go 的运行时类型
While Go-specific dumping is much nicer than manual memory dumping, it still poses many usability concerns. You need to know the full type of the value you’re dumping, which can be hard to determine and usually involves a lot of guesswork, especially when dealing with structs that have many fields or nested structs. Even if you have deduced the full type, some things are still unknowable because they have no effect on compilation, like struct field names and type names for user-defined types.
Conveniently, the Go compiler emits a runtime type object for every type used in the program (to be used with the reflect package), which contains struct layouts for arbitrarily nested structs, type names, size, alignment, and more. These type objects can also be matched up to values of that type, as interface values store a pointer to the type object along with a pointer to the data, and heap-allocated values have their type object passed into their allocation function (usually runtime.newobject).
I wrote a parser capable of recursively extracting this information in order to process type information for arbitrarily nested types. This parser is exposed via the go-type command, which displays information about a runtime type given its address. For structs, this information includes the type, name, and offset of every field.
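The recursive walk can be illustrated with a toy model. The sketch below does not parse Go's real runtime type objects; the Type and Field classes are hypothetical, already-decoded stand-ins, and describe simply flattens nested struct fields into (offset, name, type) rows, similar in spirit to what the text says go-type reports for structs.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    name: str
    offset: int  # offset of the field within its parent struct
    type: "Type"


@dataclass
class Type:
    name: str
    size: int
    fields: list = field(default_factory=list)  # empty for non-struct types


def describe(t, base=0, prefix=""):
    """Recursively list (absolute offset, dotted field path, type name) rows."""
    rows = []
    for f in t.fields:
        path = prefix + f.name
        rows.append((base + f.offset, path, f.type.name))
        # Descend into nested structs, carrying the accumulated offset along.
        rows.extend(describe(f.type, base + f.offset, path + "."))
    return rows
```

For a struct that embeds another struct, describe returns one row for the embedded field itself followed by rows for each of its members at their absolute offsets.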
This can be used to dump values in two ways. The first, easier way only works for interface values, since the type pointer is stored along with the data pointer, making it easy to automatically retrieve. These can be dumped using Go’s any type for empty interfaces (ones with no methods), and the interface type for non-empty interfaces. When dumping, the command will automatically retrieve and parse the type, leading to a seamless dump without having to enter any type information.
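The automatic lookup for empty interfaces can be pictured roughly as follows. This sketch assumes a 64-bit little-endian target where an empty interface value is two words (type pointer, then data pointer). The known_types table is a hypothetical stand-in for the parsed runtime type objects, and the sketch ignores cases where the runtime keeps a pointer-shaped value directly in the data word.

```python
import struct


def dump_any(memory, addr, known_types):
    """Dump an empty interface value without being told its type.

    known_types maps a type pointer to (type name, struct format); here it
    replaces the real step of parsing the runtime type object at that pointer.
    """
    type_ptr, data_ptr = struct.unpack_from("<QQ", memory, addr)
    type_name, fmt = known_types[type_ptr]
    (value,) = struct.unpack_from(fmt, memory, data_ptr)
    return type_name, value
```

The caller only supplies the address of the interface value; the type comes from the first word, which is what makes this path need no type information from the user.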
The second way works for all values but requires you to find and specify the pointer to the type for the value. In many cases, it is as easy as looking for the pointer passed into the function that allocated the value, but for global variables or variables whose allocation may be hard to find, some guesswork may be involved in finding the type. However, this method is generally still easier than trying to manually deduce the type layout and is capable of dumping even the most complex types. I tested it on a few large struct types in a stripped build of the Go compiler, which is one of the largest and most complex open-source Go codebases, and it was able to dump all of them with no problem.
Recap and looking forward
This summer, I enhanced Pwndbg so it can be integrated with Binary Ninja to access its rich debugging information. I also added the go-dump command for dumping Go values. All of this is available on the Pwndbg dev branch and its latest release.
Moving forward, there’s even more that can be done to improve the debugging experience. I developed my Binary Ninja integration with a modular design so that it would be easy to add support for more decompilers in the future. I think it would be amazing to fully support Ghidra (the current integration only syncs decompilation), as Ghidra is a free and open-source decompiler, making it accessible to everyone who wants to use the functionality.
In terms of Go debugging, work can be done to add better support for displaying and working with goroutines, which is currently one of the major advantages of the Delve debugger (a debugger specialized for debugging Go) over GDB/Pwndbg. For example, Delve is capable of listing every goroutine and the instruction that created them and it also has a command to switch between goroutines.
Acknowledgments
Working at Trail of Bits this summer has been an absolutely amazing experience, and I would like to thank them for giving me the opportunity to work on Pwndbg. In particular, I would like to thank my manager, Dominik Czarnota, for being incredibly responsive about reviewing my code and giving me feedback and ideas about my work, and the Pwndbg community, as they have been incredibly helpful with answering any questions I had during the development process.
Comments: https://news.ycombinator.com/item?id=41481682
@bieganski:
Sounds like an interesting direction, but I don't understand why it should be coupled to a specific tool (Pwndbg). Why not implement a Binary Ninja plugin that dumps all user-defined names (function names, stack variables), together with the original (stripped) binary, into a new ELF/.exe file with a symbol table and presumably a DWARF section?
@boricj:
I've developed a Ghidra extension that exports object files. I've considered generating debugging symbols in order to improve the debugging experience when reusing these object files in new programs, but I keep postponing that feature for various reasons.
Executable formats have at least one and often multiple debugging data formats which are very different from each other: ELF has STABS and DWARF version 1 to 5, MSVC has at least COFF symbols and PDB (which isn't documented)... Even discarding the old or obsolete stuff, there's no universal solution here. gdb+pwndbg seems to side-step this issue by integrating the debugger with Binary Ninja.
Projecting reverse-engineered information into a debugging data format would also be a technical challenge once you go past global variables and type definitions. Debuggers already have a terrible user experience when stepping through functions in an optimized executable; I doubt that reverse-engineered debugging data would be any better.
Toolchains also don't do a lot of validation or diagnostics on their inputs and I can tell from experience that writing correct object files from scratch is already quite tricky. I expect that serializing correct and meaningful debugging data would be much harder than that.
Doing this at the native executable level has the obvious advantage of working out of the box with standard tooling, but it would be a lot of work. I've already taken 2 1/2 years to make an object file exporter that's good enough for my needs and I'm still balking at generating DWARF debugging data every time I've considered it. I'm resigned to a terrible debugging experience and so far I've managed to muddle through it.