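[Translation] Understanding Pool Corruption: A Three-Part Series
Posted: 2018-3-20 20:28
The official MSDN explanation of pool corruption.
The original text follows:
This is an old article (from late 2013). I translated it mainly because the pool overflow article in the kernel part of fuzzySecurity's Windows exploit development tutorial series refers to it, so this translation is meant as a companion reference.
See: Windows Exploit Development Tutorial Series Part 16: Kernel Exploitation, Pool Overflow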
Before we can discuss pool corruption we must understand what pool is. Pool is kernel mode memory used as a storage space for drivers. Pool is organized in a similar way to how you might use a notepad when taking notes from a lecture or a book. Some notes may be 1 line, others may be many lines. Many different notes are on the same page.
Memory is also organized into pages; a page of memory is typically 4KB. The Windows memory manager breaks up this 4KB page into smaller blocks. One block may be as small as 8 bytes or possibly much larger. Each of these blocks exists side by side with other blocks.
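As a rough illustration of blocks sitting side by side in one page, the following sketch lays out a few requests in a 4KB page. The function name `carve_page` and the round-up-to-8-bytes rule are assumptions made for this example; the real allocator keeps headers and free lists that are not modeled here.

```python
PAGE_SIZE = 4096    # typical page size mentioned in the text
GRANULARITY = 8     # smallest block size mentioned in the text


def carve_page(request_sizes):
    """Lay out requested sizes side by side in a single page.

    Returns a list of (offset, size) tuples. Requests that no longer
    fit in the remaining space of the page are skipped.
    """
    blocks = []
    offset = 0
    for size in request_sizes:
        # Round each request up to a multiple of the block granularity.
        rounded = (size + GRANULARITY - 1) // GRANULARITY * GRANULARITY
        if offset + rounded > PAGE_SIZE:
            continue
        blocks.append((offset, rounded))
        offset += rounded
    return blocks


layout = carve_page([8, 100, 2000, 30])
for offset, size in layout:
    print(f"block at +0x{offset:03x}, size {size}")
```

Each block starts exactly where the previous one ends, which is why a write past the end of one block lands in its neighbor.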
The !pool command can be used to see the pool blocks stored in a page.
Because many pool allocations are stored in the same page, it is critical that every driver use only the space it has allocated. If DriverA uses more space than it allocated, it will write into the next driver’s space (DriverB) and corrupt DriverB’s data. This overwrite into the next driver’s space is called a buffer overflow. Later either the memory manager or DriverB will attempt to use this corrupted memory and will encounter unexpected information. This unexpected information typically results in a blue screen.
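The overflow described above can be mimicked in user mode with a byte array standing in for the page. This is only a simulation: the offsets, the 8-byte header and the marker `POOLHDRB` are made up for the example and do not reflect the real pool header layout.

```python
PAGE_SIZE = 4096
HEADER_SIZE = 8            # hypothetical header size for this sketch

page = bytearray(PAGE_SIZE)

# DriverA owns bytes [8, 112); DriverB's header follows immediately.
A_DATA = 8
A_CAPACITY = 104
B_HEADER = A_DATA + A_CAPACITY
page[B_HEADER:B_HEADER + HEADER_SIZE] = b"POOLHDRB"


def driver_a_write(buf, payload):
    # Bug on purpose: the length is never checked against A_CAPACITY.
    buf[A_DATA:A_DATA + len(payload)] = payload


driver_a_write(page, b"\x4f" * 120)   # 16 bytes more than DriverA owns

b_header_now = bytes(page[B_HEADER:B_HEADER + HEADER_SIZE])
print(b_header_now)
```

Nothing fails at the moment of the write. The damage only becomes visible later, when something reads DriverB's header and finds the fill pattern instead.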
The NotMyFault application from Sysinternals has an option to force a buffer overflow. This can be used to demonstrate pool corruption. Choosing the “Buffer overflow” option and clicking “Crash” will cause a buffer overflow in pool. The system may not immediately blue screen after clicking the Crash button. The system will remain stable until something attempts to use the corrupted memory. Using the system will often eventually result in a blue screen.
Often pool corruption appears as a stop 0x19 BAD_POOL_HEADER or stop 0xC2 BAD_POOL_CALLER. These stop codes make it easy to determine that pool corruption is involved in the crash. However, the results of accessing unexpected memory can vary widely, so pool corruption can produce many different types of bugchecks.
As with any blue screen dump analysis the best place to start is with !analyze -v. This command will display the stop code and parameters, and do some basic interpretation of the crash.
In my example the bugcheck was a stop 0x3B SYSTEM_SERVICE_EXCEPTION. The first parameter of this stop code is c0000005, which is a status code for an access violation. An access violation is an attempt to access invalid memory (this error is not related to permissions). Status codes can be looked up in the WDK header ntstatus.h.
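For readers who want to see what is inside a status code such as c0000005, here is a small sketch that splits a 32-bit NTSTATUS value into its bit fields (severity in the top two bits, a customer bit, a 12-bit facility and a 16-bit code). The helper name `decode_ntstatus` is an assumption for this example; the symbolic name of the code still has to be looked up in ntstatus.h.

```python
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": (value >> 30) & 0x3,     # bits 31..30
        "customer": (value >> 29) & 0x1,     # bit 29
        "facility": (value >> 16) & 0xFFF,   # bits 27..16
        "code": value & 0xFFFF,              # bits 15..0
    }


fields = decode_ntstatus(0xC0000005)
print(SEVERITY_NAMES[fields["severity"]], hex(fields["code"]))
```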
The !analyze -v command also provides a helpful shortcut to get into the context of the failure.
CONTEXT: fffff88004763560 -- (.cxr 0xfffff88004763560;r)
Running this command shows us the registers at the time of the crash.
From the above output we can see that the crash occurred in ExAllocatePoolWithTag, which is a good indication that the crash is due to pool corruption. Often an engineer looking at a dump will stop at this point and conclude that a crash was caused by corruption; however, we can go further.
The instruction that we failed on was dereferencing rax+8. The rax register contains 4f4f4f4f4f4f4f4f, which does not fit with the canonical form required for pointers on x64 systems. This tells us that the system crashed because the data in rax is expected to be a pointer but it is not one.
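The canonical-form rule can be checked mechanically. The sketch below assumes 48-bit virtual addresses, where bits 63 through 47 must be all zeros or all ones; the function name `is_canonical` is made up for this example.

```python
def is_canonical(addr, va_bits=48):
    """Return True if a 64-bit address is in canonical form.

    Assumes va_bits implemented virtual address bits, so bits 63 down
    to (va_bits - 1) must all be equal.
    """
    top = addr >> (va_bits - 1)                 # bits 63..47 for 48-bit VA
    all_ones = (1 << (64 - va_bits + 1)) - 1    # 17 one-bits for 48-bit VA
    return top == 0 or top == all_ones


print(is_canonical(0x4F4F4F4F4F4F4F4F))   # value found in rax
print(is_canonical(0xFFFFFA8001A1B820))   # value found in r8
```

The pattern in rax fails the check, while the kernel address held in r8 passes it.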
To determine why rax does not contain the expected data we must examine the instructions prior to where the failure occurred.
The assembly shows that rax originated from the data pointed to by r8. The .cxr command we ran earlier shows that r8 is fffffa8001a1b820. If we examine the data at fffffa8001a1b820 we see that it matches the contents of rax, which confirms this memory is the source of the unexpected data in rax.
To determine if this unexpected data is caused by pool corruption we can use the !pool command.
The above output does not look like the !pool command we used earlier. This output shows corruption to the pool header which prevented the command from walking the chain of allocations.
The above output shows that there is an allocation at fffffa8001a1b000 of size 810. If we look at this memory we should see a pool header. Instead what we see is a pattern of 4f4f4f4f`4f4f4f4f.
At this point we can be confident that the system crashed because of pool corruption.
Because the corruption occurred in the past, and a dump is a snapshot of the current state of the system, there is no concrete evidence to indicate how the memory came to be corrupted. It is possible the driver that allocated the pool block immediately preceding the corruption is the one that wrote to the wrong location and caused this corruption. This pool block is marked with the tag “None”; we can search for this tag in memory to determine which drivers use it.
The file Pooltag.txt lists the pool tags used for pool allocations by kernel-mode components and drivers supplied with Windows, the associated file or component (if known), and the name of the component. Pooltag.txt is installed with Debugging Tools for Windows (in the triage folder) and with the Windows WDK (in \tools\other\<platform>\poolmon). Pooltag.txt shows the following for this tag:
None - <unknown> - call to ExAllocatePool
Unfortunately what we find is that this tag is used when a driver calls ExAllocatePool, which does not specify a tag. This does not allow us to determine what driver allocated the block prior to the corruption. Even if we could tie the tag back to a driver it may not be sufficient to conclude that the driver using this tag is the one that corrupted the memory.
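Looking up a tag can be scripted. The sketch below parses a single entry in the `tag - file - description` shape of the line quoted above; it assumes every entry uses " - " as the separator, which may not hold for every line of the real file, and the function name is hypothetical.

```python
def parse_pooltag_line(line):
    """Parse one 'tag - component - description' entry."""
    tag, component, description = (
        part.strip() for part in line.split(" - ", 2)
    )
    return {"tag": tag, "component": component, "description": description}


entry = parse_pooltag_line("None - <unknown> - call to ExAllocatePool")
print(entry["tag"], "->", entry["description"])
```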
The next step should be to enable special pool and hope to catch the corruptor in the act. We will discuss special pool in our next article.
In our previous article we discussed pool corruption that occurs when a driver writes too much data in a buffer. In this article we will discuss how special pool can help identify the driver that writes too much data.
Pool is typically organized to allow multiple drivers to store data in the same page of memory, as shown in Figure 1. By allowing multiple drivers to share the same page, pool provides for an efficient use of the available kernel memory space. However, this sharing requires that each driver be careful in how it uses pool: any bug where a driver uses pool improperly may corrupt the pool of other drivers and cause a crash.
With pool organized as shown in Figure 1, if DriverA allocates 100 bytes but writes 120 bytes it will overwrite the pool header and data stored by DriverB. In Part 1 we demonstrated this type of buffer overflow using NotMyFault, but we were not able to identify which code had corrupted the pool.
To catch the driver that corrupted pool we can use special pool. Special pool changes the organization of the pool so that each driver’s allocation is in a separate page of memory. This helps prevent drivers from accidentally writing to another driver’s memory. Special pool also configures the driver’s allocation at the end of the page and sets the next virtual page as a guard page by marking it as invalid. The guard page causes an attempt to write past the end of the allocation to result in an immediate bugcheck.
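The placement rule can be sketched as simple address arithmetic: the allocation ends exactly where the guard page begins, so the first byte written past the end faults. The class and exception names below are invented for this simulation, and the sketch ignores any alignment padding the real allocator may apply.

```python
PAGE_SIZE = 4096


class GuardPageFault(Exception):
    """Stands in for the bugcheck raised on touching the guard page."""


class SpecialPoolAllocation:
    def __init__(self, page_base, size):
        self.size = size
        # The buffer is pushed to the end of its own page ...
        self.start = page_base + PAGE_SIZE - size
        # ... and the next virtual page is the invalid guard page.
        self.guard_page = page_base + PAGE_SIZE

    def write(self, offset, length):
        end = self.start + offset + length
        if end > self.guard_page:
            raise GuardPageFault(f"write reaches 0x{end - 1:x}")
        return end


alloc = SpecialPoolAllocation(page_base=0x10000, size=100)
alloc.write(0, 100)            # stays inside the allocation
try:
    alloc.write(0, 120)        # runs 20 bytes into the guard page
    overflow_caught = False
except GuardPageFault:
    overflow_caught = True
print(overflow_caught)
```

The fault is raised at the moment of the bad write, which is what lets the debugger point at the offending driver rather than at a later victim.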
Special pool also fills the unused portion of the page with a repeating pattern, referred to as “slop bytes”. These slop bytes are checked when the page is freed; if any errors are found in the pattern, a bugcheck is generated to indicate that the memory was corrupted. This type of corruption is not a buffer overflow; it may be an underflow or some other form of corruption.
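The slop-byte check can be simulated as follows. The pattern value `0xA5` and the function names are assumptions for this sketch; the real pattern and the exact check performed at free time are not given in the text.

```python
PAGE_SIZE = 4096
PATTERN = 0xA5   # hypothetical fill value for this sketch


def allocate(size):
    """Return (page, start): buffer at the end, slop bytes before it."""
    page = bytearray([PATTERN]) * PAGE_SIZE
    start = PAGE_SIZE - size
    page[start:] = bytes(size)      # the caller's zeroed buffer
    return page, start


def slop_intact(page, start):
    """Check performed at free time: is the fill pattern untouched?"""
    return all(byte == PATTERN for byte in page[:start])


page, start = allocate(100)
ok_before = slop_intact(page, start)

# Underflow: the driver writes 4 bytes just before its buffer.
page[start - 4:start] = b"\x00" * 4
ok_after = slop_intact(page, start)
print(ok_before, ok_after)
```

Unlike the guard page, this check only fires when the block is freed, so it reports that corruption happened but not the instruction that caused it.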
Because special pool stores each pool allocation in its own 4KB page, it causes an increase in memory usage. When special pool is enabled the memory manager will configure a limit of how much special pool may be allocated on the system; when this limit is reached the normal pools will be used instead. This limitation may be especially pronounced on 32-bit systems, which have less kernel space than 64-bit systems.
Now that we have explained how special pool works, we should use it.
There are two methods to enable special pool. Driver verifier allows special pool to be enabled on specific drivers. The PoolTag registry value described in KB188831 allows special pool to be enabled for a particular pool tag. Starting in Windows Vista and Windows Server 2008, driver verifier captures additional information for special pool allocations so this is typically the recommended method.
To enable special pool using driver verifier, use the following command line or choose the option from the verifier GUI. Use the /driver flag to specify the drivers you want to verify; this is the place to list drivers you suspect as the cause of the problem. You may want to verify drivers you have written and want to test, or drivers you have recently updated on the system. In the command line below I am only verifying myfault.sys. A reboot is required to enable special pool.
verifier /flags 1 /driver myfault.sys
After enabling verifier and rebooting the system, repeat the activity that causes the crash. For some problems the activity may just be to wait for a period of time. For our demonstration we are running NotMyFault (see Part 1 for details).
The crash resulting from a buffer overflow in special pool will be a stop 0xD6, DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION.
We can debug this crash and determine that myfault.sys (the driver used by NotMyFault) wrote beyond its pool buffer.
The call stack shows that myfault.sys accessed invalid memory and this generated a page fault.
The !pool command shows that the address being referenced by myfault.sys is special pool.
The page table entry shows that the address is not valid. This is the guard page used by special pool to catch overruns.
The allocation prior to this memory is an 800 byte block of non paged pool tagged as “Wrap”. “Wrap” is the tag used by verifier when pool is allocated without a tag; it is equivalent to the “None” tag we saw in Part 1.
Special pool is an effective mechanism to track down buffer overflow pool corruption. It can also be used to catch other types of pool corruption which we will discuss in future articles.
In Part 1 and Part 2 of this series we discussed pool corruption and how special pool can be used to identify the cause of such corruption. In today’s article we will use special pool to catch a double free of pool memory.
A double free of pool will cause a system to blue screen, however the resulting crash may vary. In the most obvious scenario a driver that frees a pool allocation twice will cause the system to immediately crash with a stop code of C2 BAD_POOL_CALLER, and the first parameter will be 7 to indicate “Attempt to free pool which was already freed”. If you experience such a crash, enabling special pool should be high on your list of troubleshooting steps.
A less obvious crash would be if the pool has been reallocated. As we showed in Part 2, pool is structured so that multiple drivers share a page. When DriverA calls ExFreePool to free its pool block the block is made available for other drivers. If memory manager gives this memory to DriverF, and then DriverA frees it a second time, a crash may occur in DriverF when the pool allocation no longer contains the expected data. Such a problem may be difficult for the developer of DriverF to identify without special pool.
Special pool will place each driver’s allocation in a separate page of memory (as discussed in Part 2). When a driver frees a pool block in special pool the whole page will be freed, and any access to a free page will cause an immediate bugcheck. Additionally, special pool will place this page on the tail of the list of pages to be used again. This increases the likelihood that the page will still be free when it is freed a second time, decreasing the likelihood of the DriverA/DriverF scenario shown above.
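Why tail placement helps can be seen in a toy model of the free-page list. The class below is a simulation with invented names: freed pages go to the tail of a queue and new allocations come from the head, so a just-freed page is the last one to be handed out again.

```python
from collections import deque


class DoubleFree(Exception):
    """Stands in for the bugcheck on freeing an already-free page."""


class SpecialPoolPages:
    def __init__(self, page_count):
        self.free_pages = deque(range(page_count))
        self.in_use = set()

    def allocate(self):
        page = self.free_pages.popleft()   # reuse from the head
        self.in_use.add(page)
        return page

    def free(self, page):
        if page not in self.in_use:
            raise DoubleFree(f"page {page} is already free")
        self.in_use.remove(page)
        self.free_pages.append(page)       # freed pages go to the tail


pool = SpecialPoolPages(page_count=4)
page_a = pool.allocate()       # DriverA gets a page
pool.free(page_a)              # first free
page_f = pool.allocate()       # DriverF gets a different page

try:
    pool.free(page_a)          # second free of the same page
    caught = False
except DoubleFree:
    caught = True
print(page_a, page_f, caught)
```

Because DriverF received a different page, the second free hits a page that is still free and is reported immediately instead of silently releasing DriverF's memory.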
To demonstrate this failure we will once again use the Sysinternals tool NotMyFault. Choose the “Double free” option and click “Crash”. Most likely you will get the stop C2 bugcheck mentioned above. Enable special pool and reboot to get a more informative error.
verifier /flags 1 /driver myfault.sys
Choosing the “Double free” option with special pool enabled resulted in the following crash. The bugcheck code PAGE_FAULT_IN_NONPAGED_AREA means some driver tried to access memory that was not valid. This invalid memory was the freed special pool page.
Looking at the call stack we can see myfault.sys was freeing pool and ExFreePoolSanityChecks took a page fault that led to the crash.
Using the address from the bugcheck code, we can verify that the memory is in fact not valid:
So far we have enough evidence to prove that myfault.sys was freeing invalid memory, but how do we know this memory is being freed twice? If there was a double free we need to determine whether the first or the second call to ExFreePool was incorrect. To do so we need to determine what code freed the memory first.
Driver Verifier special pool keeps track of the last 0x10000 calls to allocate and free pool. You can dump this database with the !verifier 80 command. To limit the data output you can also pass this command the address of the memory you suspect was double freed.
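Conceptually, such a history is a fixed-size log in which the oldest entries are overwritten, and passing an address simply filters it. The sketch below models that idea with a bounded deque; the class name, the entry fields and the caller strings are all hypothetical and say nothing about the format verifier actually uses.

```python
from collections import deque

LOG_CAPACITY = 0x10000   # number of calls kept, per the text


class PoolCallLog:
    def __init__(self, capacity=LOG_CAPACITY):
        # A bounded deque drops the oldest entry once it is full.
        self.entries = deque(maxlen=capacity)

    def record(self, operation, address, caller):
        self.entries.append((operation, address, caller))

    def for_address(self, address):
        """Limit the output to one suspect address."""
        return [e for e in self.entries if e[1] == address]


log = PoolCallLog()
log.record("alloc", 0x1000, "driver_x+0x10")
log.record("alloc", 0x2000, "driver_y+0x20")
log.record("free", 0x1000, "driver_x+0x30")

history = log.for_address(0x1000)
for operation, address, caller in history:
    print(operation, hex(address), caller)
```

Because the log is bounded, a free that happened long before the crash may already have been overwritten, which is one reason to reproduce the problem soon after enabling verifier.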
Don’t assume the address in the bugcheck code is the address being freed; get the address from the function that called VerifierExFreePoolWithTag.
In the above call stack the call below VerifierExFreePoolWithTag is frame 9 (start counting with 0, or use kn).
On x64 systems the first parameter is passed in rcx. The below assembly shows that rcx originated from rbx.
Run !verifier 80 using the address from rbx:
The above output shows the pool block being allocated by myfault.sys and then freed by myfault.sys. If we combine this information with the call stack leading up to our bugcheck we can conclude that the pool was freed once in MyfaultDeviceControl at offset 0x2f2, then freed again in MyfaultDeviceControl at offset 0x2fd.
Now we know which driver is causing the problem, and if this is our driver we know which area of the code to investigate.