[Translation] Binary Exploitation (Part 4): Process Memory and Memory Corruption
Posted: 2019-8-2 17:09


This chapter is the fourth part of the binary exploitation series. Original article: https://azeria-labs.com/process-memory-and-memory-corruption/

The prerequisite for this part of the tutorial is a basic understanding of ARM assembly (covered in the first tutorial series, “ARM Assembly Basics”). In this chapter you will get an introduction to the memory layout of a process in a 32-bit Linux environment. After that you will learn the fundamentals of Stack- and Heap-related memory corruptions and what they look like in a debugger.

The examples used in this tutorial are compiled on an ARMv6 32-bit processor. If you don’t have access to an ARM device, you can create your own lab and emulate a Raspberry Pi distro in a VM by following this tutorial: Emulate Raspberry Pi with QEMU. The debugger used here is GDB with GEF (GDB Enhanced Features). If you aren’t familiar with these tools, you can check out this tutorial: Debugging with GDB and GEF.

Every time we start a program, a memory area is reserved for it. This area is split into multiple regions, which are in turn split into smaller regions (segments), but we will stick with the general overview. The parts we are interested in are:

1. Program Image

2. Heap

3. Stack

In the picture below we can see a general representation of how those parts are laid out within the process memory. The addresses used to specify memory regions are only examples, because they differ from environment to environment, especially when ASLR is used.

The Program Image region basically holds the program’s executable file as loaded into memory. This region can be split into various segments: .plt, .text, .got, .data, .bss and so on. These are the most relevant. For example, .text contains the executable part of the program with all the Assembly instructions, .data and .bss hold the variables or pointers to variables used in the application, and .plt and .got store pointers to functions imported from, for example, shared libraries. From a security standpoint, if an attacker could affect the integrity of (rewrite) the .text section, they could execute arbitrary code. Similarly, corruption of the Procedure Linkage Table (.plt) and Global Offset Table (.got) could, under specific circumstances, lead to execution of arbitrary code.

The Stack and Heap regions are used by the application to store and operate on temporary data (variables) during the execution of the program. These regions are commonly exploited by attackers, because data in the Stack and Heap regions can often be modified by the user’s input, which, if not handled properly, can cause a memory corruption. We will look into such cases later in this chapter.

In addition to the mapping of the memory, we need to be aware of the attributes associated with different memory regions. A memory region can have one or a combination of the following attributes: Read, Write, Execute. Read allows the program to read data from a specific region, Write allows it to write data into that region, and Execute allows it to execute instructions stored there. We can see the process memory regions in GEF (a highly recommended extension for GDB) as shown below:

The Heap section in the vmmap command output appears only after some Heap-related function has been used. In this case we see the malloc function being used to create a buffer in the Heap region. So if you want to try this out, you need to debug a program that makes a malloc call (you can find some examples on this page; scroll down or use the find function).

Additionally, in Linux we can inspect the process’ memory layout by accessing a process-specific “file”:
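On Linux, the process-specific file in question is /proc/<pid>/maps, where each line describes one mapped region and its Read/Write/Execute attributes. The sketch below parses lines in that format. The sample lines, paths and addresses are hypothetical and only illustrate the layout; real values differ per run, especially with ASLR.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    perms: str   # e.g. "r-xp": read, no write, execute, private
    path: str    # backing file, or a pseudo-name such as [heap]


def parse_maps_line(line: str) -> Region:
    """Parse one line in the /proc/<pid>/maps format."""
    # Fields: address range, permissions, offset, device, inode, optional path.
    fields = line.split(None, 5)
    start_hex, end_hex = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Region(int(start_hex, 16), int(end_hex, 16), fields[1], path)


# Hypothetical sample content, NOT taken from the tutorial's screenshots.
SAMPLE = """\
00010000-00011000 r-xp 00000000 b3:02 1234 /home/pi/example
00021000-00042000 rw-p 00000000 00:00 0 [heap]
7efdf000-7f000000 rw-p 00000000 00:00 0 [stack]
"""

regions = [parse_maps_line(line) for line in SAMPLE.splitlines()]
for region in regions:
    print(f"{region.start:#010x}-{region.end:#010x} {region.perms} {region.path}")
```

To inspect a live process on Linux, the same parser can be fed the contents of `/proc/self/maps` (or `/proc/<pid>/maps` for another process) instead of `SAMPLE`.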

Most programs are compiled in a way that they use shared libraries. Those libraries are not part of the program image (even though it is possible to include them via static linking) and therefore have to be referenced (included) dynamically. As a result, we see the libraries (libc, ld, etc.) being loaded in the memory layout of a process. Roughly speaking, the shared libraries are loaded somewhere in memory (outside of the process’ control) and our program just creates virtual “links” to that memory region. This way we save memory, because the same library does not have to be loaded separately for every instance of a program.

A memory corruption is a type of software bug that allows memory to be modified in a way the programmer did not intend. In most cases, this condition can be exploited to execute arbitrary code, disable security mechanisms, etc. This is done by crafting and injecting a payload which alters certain memory sections of a running program. The following list contains the most common memory corruption types/vulnerabilities:

1.       Buffer Overflows

  a)         Stack Overflow

  b)         Heap Overflow

2.       Dangling Pointer (Use-after-free)

3.       Format String

In this chapter we will get familiar with the basics of Buffer Overflow memory corruption vulnerabilities (the remaining ones will be covered in the next chapter). In the examples we are about to cover, we will see that the main cause of memory corruption vulnerabilities is improper validation of user input, sometimes combined with a logical flaw. For a program, the input (or a malicious payload) might come in the form of a username, a file to be opened, a network packet, etc., and can often be influenced by the user. If a programmer did not put safety measures in place for potentially harmful user input, the target program will often be subject to some kind of memory-related issue.

Buffer overflows are one of the most widespread memory corruption classes and are usually caused by a programming mistake which allows the user to supply more data than the destination variable (buffer) can hold. This happens, for example, when vulnerable functions such as gets, strcpy, memcpy or others are used with data supplied by the user. These functions do not check the length of the user’s data against the size of the destination buffer, which can result in writing past (overflowing) the allocated buffer. To get a better understanding, we will look into the basics of Stack- and Heap-based buffer overflows.
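The missing length check can be modeled without touching real memory. The sketch below is a simulation, not the C library: a `bytearray` stands in for memory, `unchecked_copy` imitates a gets/strcpy-style write that ignores the destination size, and `bounded_copy` imitates a write that truncates to fit. Both function names and the "NEIGHBOR" marker are made up for illustration.

```python
BUF_SIZE = 8  # size of the destination buffer in this model


def unchecked_copy(memory: bytearray, dest: int, data: bytes) -> None:
    """Write data plus a terminating NUL, ignoring the buffer size."""
    payload = data + b"\x00"
    memory[dest:dest + len(payload)] = payload


def bounded_copy(memory: bytearray, dest: int, size: int, data: bytes) -> None:
    """Write at most size bytes in total, including the terminating NUL."""
    payload = data[:size - 1] + b"\x00"
    memory[dest:dest + len(payload)] = payload


def fresh_memory() -> bytearray:
    # An 8-byte buffer followed directly by unrelated neighboring data.
    return bytearray(BUF_SIZE) + bytearray(b"NEIGHBOR")


overflowed = fresh_memory()
unchecked_copy(overflowed, 0, b"A" * 12)
print(bytes(overflowed[BUF_SIZE:]))  # neighboring data has been overwritten

safe = fresh_memory()
bounded_copy(safe, 0, BUF_SIZE, b"A" * 12)
print(bytes(safe[BUF_SIZE:]))        # neighboring data is intact
```

The only difference between the two writers is whether the destination size takes part in the copy, which is the check the vulnerable functions above leave to the caller.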

Stack overflow, as the name suggests, is a memory corruption affecting the Stack. While arbitrary corruption of the Stack will in most cases simply crash the program, a carefully crafted Stack buffer overflow can lead to arbitrary code execution. The following picture shows an abstract overview of how the Stack can get corrupted.

As you can see in the picture above, the Stack frame (a small part of the whole Stack dedicated to a specific function) can have various components: user data, previous Frame Pointer, previous Link Register, etc. If the user provides too much data for a controlled variable, the FP and LR fields might get overwritten. This breaks the execution of the program, because the user corrupts the address to which the application will return/jump after the current function has finished.
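The frame layout described above can be sketched as a small simulation. It assumes a frame made of an 8-byte buffer followed by the saved FP and the saved LR, stored as little-endian 32-bit words; the saved values used here are hypothetical placeholders, and `gets_like` is only a stand-in for an unbounded write.

```python
import struct

BUF_SIZE = 8


def make_frame(saved_fp: int, saved_lr: int) -> bytearray:
    """Model a frame: [8-byte buffer][saved FP][saved LR], little-endian."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<II", saved_fp, saved_lr))


def gets_like(frame: bytearray, user_input: bytes) -> None:
    """Unbounded write starting at the buffer, with a terminating NUL."""
    payload = user_input + b"\x00"
    frame[0:len(payload)] = payload


def saved_registers(frame: bytearray):
    """Return the (FP, LR) values currently stored in the frame."""
    return struct.unpack_from("<II", frame, BUF_SIZE)


# Hypothetical saved values, chosen only for illustration.
SAVED_FP, SAVED_LR = 0x7EFFF1A4, 0x00010450

ok_frame = make_frame(SAVED_FP, SAVED_LR)
gets_like(ok_frame, b"A" * 7)    # 7 bytes + NUL fit into the 8-byte buffer
print([hex(v) for v in saved_registers(ok_frame)])

bad_frame = make_frame(SAVED_FP, SAVED_LR)
gets_like(bad_frame, b"A" * 16)  # spills over the saved FP and LR
print([hex(v) for v in saved_registers(bad_frame)])
```

In the second call both saved words end up holding the bytes of the input, which is why the function later "returns" to an address the user chose.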

To see what this looks like in practice, we can use this example:

Our sample program uses the variable “buffer”, with a length of 8 characters, and the function “gets” for the user’s input, which simply sets the value of “buffer” to whatever input the user provides. The disassembled code of this program looks like the following:

Here we suspect that a memory corruption could happen right after the function “gets” is completed. To investigate this, we place a breakpoint right after the branch instruction that calls “gets”, in our case at address 0x0001043c. To reduce the noise we configure GEF’s layout to show us only the code and the Stack (see the command in the picture below). Once the breakpoint is set, we proceed with the program and provide 7 A’s as the user’s input (we use 7 A’s because a null-byte will be automatically appended by the function “gets”).

When we investigate the Stack of our example we see (image above) that the Stack frame is not corrupted. This is because the input supplied by the user fits in the expected 8-byte buffer, so the previous FP and LR values within the Stack frame are left intact. Now let’s provide 16 A’s and see what happens.

In the second example we see (image above) that when we provide too much data to the function “gets”, it does not stop at the boundary of the target buffer and keeps writing “down the Stack”. This corrupts our previous FP and LR values. When we continue running the program, it crashes (causes a “Segmentation fault”), because during the epilogue of the current function the previous values of FP and LR are “popped” off the Stack into the R11 and PC registers, forcing the program to jump to address 0x41414140 (bit 0 of the loaded value 0x41414141 selects Thumb mode and is cleared from the address, so the last byte becomes 0x40), which in this case is an illegal address. The picture below shows us the values of the registers (take a look at $pc) at the time of the crash.
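The step from four bytes of "A" to the address 0x41414140 is plain arithmetic and can be checked with a short sketch. It is simplified: it models only the role of bit 0 when a value is loaded into PC on a little-endian ARM system, and `branch_target` is a made-up helper name, not part of any tool.

```python
from typing import Tuple


def branch_target(loaded_value: int) -> Tuple[int, bool]:
    """Return (pc, thumb) for a 32-bit value loaded into PC.

    Simplified model: bit 0 selects Thumb state and is cleared
    from the resulting address.
    """
    thumb = bool(loaded_value & 1)
    pc = loaded_value & 0xFFFFFFFE
    return pc, thumb


# Four 'A' bytes (0x41 each) read back as one little-endian word.
overwritten_lr = int.from_bytes(b"AAAA", "little")
pc, thumb = branch_target(overwritten_lr)
print(hex(overwritten_lr), hex(pc), thumb)
```

Because 0x41 is odd, any return address made entirely of "A" bytes ends up with the Thumb bit set, so the crash address shown by the debugger ends in 0x40 rather than 0x41.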

First of all, the Heap is a more complicated memory location, mainly because of the way it is managed. To keep things simple, we stick with the fact that every object placed in the Heap memory section is “packed” into a “chunk” having two parts: header and user data (which the user sometimes controls fully). In the Heap’s case, the memory corruption happens when the user is able to write more data than is expected. In that case, the corruption might happen within the chunk’s boundaries (intra-chunk Heap overflow), or across the boundaries of two (or more) chunks (inter-chunk Heap overflow). To put things in perspective, let’s take a look at the following illustration.

As shown in the illustration above, the intra-chunk heap overflow happens when the user can supply more data to u_data_1 than it can hold, crossing the boundary between u_data_1 and u_data_2. In this way the fields/properties of the current object get corrupted. If the user supplies more data than the current Heap chunk can accommodate, the overflow becomes inter-chunk and results in a corruption of the adjacent chunk(s).

To illustrate what an intra-chunk Heap overflow looks like in practice, we can use the following example and compile it with “-O” (optimization flag) to get a smaller binary that is easier to look through.

The program above does the following:

1.       Defines a data structure (u_data) with two fields

2.       Creates an object (in the Heap memory region) of type u_data

3.       Assigns a static value to the number field of the object

4.       Prompts the user to supply a value for the name field of the object

5.       Prints a string depending on the value of the number field
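The steps above can be modeled in a few lines. This is a simulation of the logic, not the tutorial's program: it assumes an 8-byte name field followed directly by a 4-byte little-endian number field (matching the 12 bytes of user data and the value 1234 mentioned in this chapter), and the two printed strings are placeholders.

```python
import struct
from typing import Tuple

NAME_SIZE = 8        # assumed size of the name field
STATIC_VALUE = 1234  # 0x4d2, the static value assigned to number


def run(user_input: bytes) -> Tuple[str, int]:
    """Model the object as 12 bytes: name[8] followed by a 32-bit number."""
    obj = bytearray(NAME_SIZE + 4)
    struct.pack_into("<i", obj, NAME_SIZE, STATIC_VALUE)

    # gets-like write into name: no bounds check, NUL appended.
    payload = user_input + b"\x00"
    obj[0:len(payload)] = payload

    (number,) = struct.unpack_from("<i", obj, NAME_SIZE)
    # Placeholder messages; the program prints a string based on number.
    message = "number intact" if number == STATIC_VALUE else "number corrupted"
    return message, number


print(run(b"A" * 7))  # input plus NUL fits exactly into name
print(run(b"A" * 8))  # the NUL lands on the first byte of number
```

In this model, one byte too many is enough to change the number field, because the terminating null-byte replaces its least significant byte (0xd2), leaving 0x400.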

So in this case we also suspect that the corruption might happen after the function “gets”. We disassemble the target program’s main function to get the address for a breakpoint.

In this case we set the breakpoint at address 0x00010498, right after the function “gets” is completed. We configure GEF to show us the code only. We then run the program and provide 7 A’s as the user input.

Once the breakpoint is hit, we quickly look up the memory layout of our program to find where our Heap is. We use the vmmap command and see that our Heap starts at address 0x00021000. Given that our object (objA) is the first and only one created by the program, we start analyzing the Heap right from the beginning.

The picture above shows us a detailed breakdown of the Heap chunk associated with our object. The chunk has a header (8 bytes) and the user’s data section (12 bytes) storing our object. We see that the name field properly stores the supplied string of 7 A’s, terminated by a null-byte. The number field stores 0x4d2 (1234 in decimal). So far so good. Let’s repeat these steps, but in this case enter 8 A’s.
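The addresses to look at in the debugger follow from the numbers given above. The sketch below does that arithmetic under two assumptions that are not stated in the text: the first chunk begins exactly at the Heap start reported by vmmap, and the 12 bytes of user data are split into an 8-byte name followed by a 4-byte number.

```python
HEAP_START = 0x00021000  # Heap start reported by vmmap in this walkthrough
HEADER_SIZE = 8          # chunk header size stated above
NAME_SIZE = 8            # assumption: 12 bytes of user data = 8 (name) + 4 (number)
NUMBER_SIZE = 4

layout = {
    "chunk header": HEAP_START,
    "name": HEAP_START + HEADER_SIZE,
    "number": HEAP_START + HEADER_SIZE + NAME_SIZE,
}
end_of_user_data = layout["number"] + NUMBER_SIZE

for field, address in layout.items():
    print(f"{field:<12} {address:#010x}")
print(f"{'end':<12} {end_of_user_data:#010x}")
```

Under these assumptions, the byte directly after the name field is the first byte of number, which is the location to watch when the input grows from 7 to 8 characters.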



Last edited by r0Cat on 2019-8-4 14:11
Latest replies (1)
So translating a post is all it takes to get tipped. Got it.
2019-10-2 17:00