[翻译]《深入解析windows操作系统第6版下册》第10章:内存管理(第一部分)


缘起
前段时间买了一本中文版的《深入解析windows操作系统第6版上册》,价格不菲,128米,但是翻译的质量还是比较高的。作者是大名鼎鼎的windows内核先驱 Mark Russinovich,以及另外2位windows架构大师,译者也是有名的潘大(《windows内核原理与实现》一书作者)和范大;
在上册中文版的最后提到,下册中文版将于年内(当时是2014年)推出,可惜过了一年多,迟迟不见问世,
电子工业出版社网站也没有相关信息(只能搜索到上册中文版);
于是突发奇想自行翻译英文原版下册.我在网上找到了英文原版下册的PDF版,最近在尝试翻译第10章: 内存管理,由于之前学习了一些相关的知识,因此翻译起来还不算太吃力。有兴趣的朋友可以一起来翻译下册(借助搜索引擎可以找到下载资源,51cto以及csdn社区都有),提出你想要翻译的章节,然后在这个子板块发表。
众所周知,该系列书籍是探索 windows 内部机理的首选材料,我们可以在翻译的过程中学到很多知识,同时提高自身的水平,遇到翻译困难的部分也可以互相交流学习;

本帖为原创翻译,采用中英对照,译注是自行添加的说明,内容会持续更新,由于工作的关系,翻译速度大约是一天1~2页原文,本帖仅翻译下册第10章(内容较多,可能会发表后续部分),上册的所有章节已经有中文版实体书和PDF版,可以搜索相关信息;个人所学有限,译文的错误之处还请提出指正,不胜感激。第一部分译文预计翻译的内容如下:



此外,考虑到帖子中,能够作为附件插入的图片有其数量上限,所有图片都外链到我的51cto博客(http://shayi1983.blog.51cto.com/)的相应博文中,它们实际上是在那里被上传的,因此特定图片带有水印,并且原始博文的更新进度会比这里的慢一些,希望各位别介意。
下面是原文+译文



CHAPTER 10 Memory Management
In this chapter, you’ll learn how Windows implements virtual memory and how it manages the subset of virtual memory kept in physical memory. We’ll also describe the internal structure and components that make up the memory manager, including key data structures and algorithms. Before examining these mechanisms, we’ll review the basic services provided by the memory manager and key concepts such as reserved memory versus committed memory and shared memory.

第10章:内存管理
在本章中,你将学习windows如何实现虚拟内存,以及如何管理驻留在物理内存中的虚拟内存子集。我们也会描述组成“windows内存管理器”的内部结构和组件,包括关键数据结构和算法。
在考察这些机制之前,我们先回顾一下内存管理器提供的基础服务,以及诸如 reserved memory , committed memory ,shared memory 这些重要概念。


Introduction to the Memory Manager
By default, the virtual size of a process on 32-bit Windows is 2 GB. If the image is marked specifically as large address space aware, and the system is booted with a special option (described later in this chapter), a 32-bit process can grow to be 3 GB on 32-bit Windows and to 4 GB on 64-bit Windows.
The process virtual address space size on 64-bit Windows is 7,152 GB on IA64 systems and 8,192 GB on x64 systems. (This value could be increased in future releases.)

windows内存管理器简介
默认情况下,32位 windows上的一个进程的虚拟大小(地址空间)为2GB,如果该进程对应的二进制映像文件被特别标注了“large address space aware”(察觉到大地址空间),并且系统以特殊选项引导(本章稍后讨论),那么 32 位进程在 32 位 windows上的虚拟大小可达到 3GB;在64 位 windows 上的虚拟大小可达到 4GB;运行在 Intel IA-64 体系结构的 64 位 windows 上的 64 位进程的虚拟地址空间大小为 7152GB;在 x64 体系结构上则为 8192 GB;(在处理器硬件和操作系统软件的后续发布版中,这些值可能会增大)
(译注:用 visual studio 打开任意解决方案或工程文件,打开要进行编译的 .cpp 或其它格式的源文件,从主菜单中选择 “Project” -> “xxx Properties”,xxx是你的软件项目名称,将打开如下图所示的属性配置界面,从右侧的树型结构中选择“Linker” ->“System”,然后就可以配置链接器在生成该二进制文件时,标记启用大地址空间)
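(译注补充:下面是一个简单的示意程序(自行编写的假设性示例,并非书中代码),在运行时检查当前可执行文件的 PE 文件头中是否设置了 IMAGE_FILE_LARGE_ADDRESS_AWARE 标志,即上述链接器选项 /LARGEADDRESSAWARE 的效果;也可以用 Visual Studio 自带的 dumpbin /headers 命令查看同一标志:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // EXE 模块句柄就是其加载基址,起始处为 DOS 头
    BYTE *base = (BYTE *)GetModuleHandle(NULL);
    IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
    IMAGE_NT_HEADERS *nt  = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);

    // 文件头 Characteristics 字段中的 IMAGE_FILE_LARGE_ADDRESS_AWARE 位
    // 就是“large address space aware”标记
    if (nt->FileHeader.Characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
        printf("该映像已标记为 large address aware\n");
    else
        printf("该映像未标记为 large address aware\n");
    return 0;
}
)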



As you saw in Chapter 2, “System Architecture,” in Part 1 (specifically in Table 2-2), the maximum amount of physical memory currently supported by Windows ranges from 2 GB to 2,048 GB, depending on which version and edition of Windows you are running. Because the virtual address space
might be larger or smaller than the physical memory on the machine, the memory manager has two primary tasks:
■ Translating, or mapping, a process’s virtual address space into physical memory so that when a thread running in the context of that process reads or writes to the virtual address space, the correct physical address is referenced. (The subset of a process’s virtual address space that is
physically resident is called the working set. Working sets are described in more detail later in this chapter.)
■ Paging some of the contents of memory to disk when it becomes overcommitted—that is, when running threads or system code try to use more physical memory than is currently available— and bringing the contents back into physical memory when needed.

在本书上册第二章“系统架构”中的表2-2提到,windows 当前支持的最大物理内存从 2 GB 到 2048 GB 不等,这取决于你运行的 windows 版本与发行版别(edition);由于虚拟地址空间可能比机器上安装的物理内存总量要大,也有可能比它小;
(译注:在第三部分译文中会讲到:在写作原文的时间点上,Intel 与 AMD 的 x64 体系结构仅实现了 64 位虚拟地址中的 48 位,因此 x64 处理器当前仅支持最多 256 TB 的虚拟内存,即 2 的 48 次幂;而 64 位的 Windows 在此基础上,进一步限制了只能使用 16 TB 虚拟内存——用户空间与内核空间各占 8 TB,如前文所述。在这种情况下,64 位 Windows 支持的虚拟内存上限,即
16 TB——就比它支持的物理内存上限—— 2 TB—— 还要大;另外一种情况,即虚拟内存上限小于物理内存上限,大家应该都非常熟悉了——在你的 32 位 Windows 上,没有使用前述的特殊引导选项并标记“large address space aware”,那么无论你购置了多少 RAM 条,进程的虚拟内存上限顶多就 2 GB)

因此,内存管理器的两个主要任务为:
■ 将一个进程的虚拟地址空间翻译或映射成物理内存,从而使运行在该进程上下文中的线程读写虚拟地址空间时,能够引用正确的物理地址。
( windows 将驻留在物理内存中的进程虚拟地址空间子集称为“工作集”,它的更多细节将在本章后面描述)
■ 当物理内存过载时(也就是说,当运行中的线程或系统代码试图使用比当前可用物理内存更多的内存时),将内存中的部分内容换出(分页)至磁盘,并在需要时将这些内容重新换入物理内存。


In addition to providing virtual memory management, the memory manager provides a core set of services on which the various Windows environment subsystems are built. These services include memory mapped files (internally called section objects), copy-on-write memory, and support for applications using large, sparse address spaces. In addition, the memory manager provides a way for a process to allocate and use larger amounts of physical memory than can be mapped into the process virtual address space at one time (for example, on 32-bit systems with more than 3 GB of physical memory). This is explained in the section “Address Windowing Extensions” later in this chapter.
除了提供虚拟内存管理服务外,内存管理器还为构建在它上层的各种 windows 环境子系统提供一组核心服务;这些服务包括内存映射文件(在内部称为 section 对象),写时复制内存,以及支持应用程序使用大规模,稀疏(非连续)的地址空间。此外,内存管理器还提供了一种方式,让进程能够分配和使用的物理内存数量,超过同一时刻能够映射入该进程虚拟地址空间的数量(例如,在拥有超过 3 GB 物理内存的 32 位系统上)。本章后面的“Address Windowing Extensions”(地址窗口化扩展,AWE)一节将对其进行解释。

Note
There is a Control Panel applet that provides control over the size, number, and locations of the paging files, and its nomenclature suggests that “virtual memory” is the same thing as the paging file. This is not the case. The paging file is only one aspect of virtual memory. In fact, even if you run with no page file at all, Windows will still be using virtual memory. This distinction is explained in more detail later in this chapter.

注意:
在控制面板中有一个小程序,提供对页面文件的大小,数量以及位置的控制(译注:即“系统”-> 高级系统设置 -> “高级”选项卡 -> 单击“性能”栏目的“设置”,再切换到“高级”选项卡,单击“虚拟内存”栏目的“更改”);它的命名方式暗示“virtual memory”(虚拟内存)与分页文件是一回事,但事实并非如此。分页文件只是虚拟内存的一个方面。实际上,即便你完全不使用页面文件,Windows 仍然在使用虚拟内存。本章后面会更详细地解释这一区别。


Memory Manager Components
The memory manager is part of the Windows executive and therefore exists in the file Ntoskrnl.exe.
No parts of the memory manager exist in the HAL. The memory manager consists of the following components:
■ A set of executive system services for allocating, deallocating, and managing virtual memory,most of which are exposed through the Windows API or kernel-mode device driver interfaces
■ A translation-not-valid and access fault trap handler for resolving hardware-detected memory management exceptions and making virtual pages resident on behalf of a process

内存管理器的组件
内存管理器是 windows 执行体的一部分,因此它存在于文件 Ntoskrnl.exe 之中。内存管理器没有任何部分是位于 HAL(译注:硬件抽象层)中的。内存管理器由下列组件构成:
■ 一组用于分配,释放,以及管理虚拟内存的执行体系统服务,其中多数通过Windows API或者内核模式设备驱动接口对外暴露;
■ 一个处理“翻译无效”(translation-not-valid)与访问错误(access fault)的陷阱处理程序,用于解决硬件检测到的内存管理异常,并代表进程把虚拟页面调入物理内存,使其变为驻留页面;


■ Six key top-level routines, each running in one of six different kernel-mode threads in the System process (see the experiment “Mapping a System Thread to a Device Driver,” which shows how to identify system threads, in Chapter 2 in Part 1):
■  6个关键的顶级例程(译注:“顶级”指它们是各自线程的入口函数,即线程启动后位于调用栈最顶层的例程,而不是指调度优先级),每个例程运行在 System 进程中 6 个不同的内核模式线程之一里(回顾上册第2章的实验“将一个系统线程映射到一个设备驱动”,该实验展示了辨别系统线程的方法):
(译注:System 进程的 PID 为 4,它是一种特殊线程的宿主,这种特殊线程只能在内核模式下运行,称为“内核模式系统线程”。这意味着它们没有用户空间地址,所以这些线程临时申请任何存储空间时,通常会从内核模式堆/系统内存池中分配,具体参考第二部分译文。此外,各种执行体组件中的例程,以及设备驱动程序,都可以通过由执行体组件——进程/线程管理器——导出的 PsCreateSystemThread() 例程,在 System 进程中创建系统线程——这仅仅是默认行为,换言之,该例程也支持指定其它进程作为新建系统线程的宿主。通过 Sysinternals 的进程浏览器工具查看 System 进程的属性,选择“Threads”标签,可以看到当前运行在其中的所有系统线程,正常情况下应该有 100 多个;按照每个采样间隔内的上下文切换次数(即 CSwitch Delta 列)排序,可以找出上下文切换最频繁的系统线程。如下图所示,线程启动地址以 ntkrnlpa.exe! 开头的,一般就是各种执行体组件通过 PsCreateSystemThread() 创建的系统线程;而线程启动地址以某个 .sys 模块名开头的,表示由相应的设备驱动程序创建的系统线程。)
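(译注补充:作为用进程浏览器观察的补充,也可以用 Toolhelp 快照 API 以编程方式粗略统计 System 进程中的线程数量;下面是一个简单的假设性示例,变量命名仅为示意用途:

#include <windows.h>
#include <tlhelp32.h>
#include <stdio.h>

int main(void)
{
    // 对系统中所有线程做一次快照
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snap == INVALID_HANDLE_VALUE) return 1;

    THREADENTRY32 te = { sizeof(te) };
    int count = 0;
    if (Thread32First(snap, &te)) {
        do {
            if (te.th32OwnerProcessID == 4)   // System 进程的 PID 固定为 4
                ++count;
        } while (Thread32Next(snap, &te));
    }
    printf("System 进程(PID 4)当前包含 %d 个线程\n", count);
    CloseHandle(snap);
    return 0;
}
)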


(译注:需要特别指出,由于某种原因,你使用进程浏览器查看 System 进程中的系统线程时,可能无法找到下文描述的 6 个系统线程,例如平衡集管理器,即 KeBalanceSetManager() ,如果你要进一步研究这个线程的内部逻辑,来验证原文对它的描述是否正确,则可以使用 KD.exe/Windbg.exe 的 uf 调试器命令,反汇编该例程中的机器代码,当然,也可以使用 IDA PRO 打开 ntkrnlpa.exe 映像,然后分析相应的位置,如下图所示


另外一种更直观的方法是,用内核调试器的 !process 0 0 扩展命令列出当前系统上运行的所有进程,在其中找到 System 进程的 EPROCESS 结构的虚拟地址,然后用这个地址作为前述扩展命令的第一个参数;用 0xf 作为其第二个参数再次执行该命令,按照这种方式,能够列出进程浏览器不可见的所有 6 个下文描述的系统线程,包括进程浏览器无法查询的调用栈信息,下图展示了这种方法。)




如何理解上图中的线程启动地址“ntkrnlpa.exe!ObfDereferenceObjectWithTag+0xa9”?如果我们反汇编 ObfDereferenceObjectWithTag() 例程,可以发现地址 84295d7f 被标记为 nt!ObfDereferenceObjectWithTag+0xa7 ,该地址处的指令为 0x90,即 nop,或称空指令;
而紧接其后的地址——84295d80——也就是 KeBalanceSetManager() 例程的第一条机器指令的地址,按照这种偏移法来表达,就是“ntkrnlpa.exe!ObfDereferenceObjectWithTag+0xa8”,由此可见,进程浏览器的线程启动地址信息基本正确,只有一字节的误差,并且也验证了“ntkrnlpa.exe!ObfDereferenceObjectWithTag+0xa9”这个启动地址确实位于 KeBalanceSetManager() 例程起始处,因此这个系统线程就是平衡集管理器线程,参考下图。



1.  The balance set manager (KeBalanceSetManager, priority 16). It calls an inner routine, the working set manager (MmWorkingSetManager), once per second as well as when free memory falls below a certain threshold. The working set manager drives the overall memory management policies, such as working set trimming, aging, and modified page writing.
1.  balance set manager(KeBalanceSetManager,平衡集管理器,优先级16)。它每秒调用一次一个内部例程——working set manager(MmWorkingSetManager,工作集管理器);此外,当可用内存低于某个阈值时,也会调用它。
(注意例程名称前缀的暗示:平衡集管理器以 Ke 开头,表明它是在更底层的内核中实现的;工作集管理器以 Mm 开头,表明它是在执行体组件——内存管理器——中实现的)。
工作集管理器驱动整体的内存管理策略,例如工作集裁剪(trimming),页面老化(aging)(译注:用于页面置换算法确定最近最少使用的页面,作为牺牲页换出内存),以及已修改页面的写回(modified page writing):如果牺牲页已被修改过,需要先把修改后的数据写回磁盘上的分页文件,然后才能把该物理页面重新用于存储其它数据;这个写回过程由 MiModifiedPageWriter() 例程负责,参见下文。


2.  The process/stack swapper (KeSwapProcessOrStack, priority 23) performs both process and kernel thread stack inswapping and outswapping. The balance set manager and the thread-scheduling code in the kernel awaken this thread when an inswap or outswap operation needs to take place.
2.  process/stack swapper(KeSwapProcessOrStack,进程/栈交换器,优先级23)。执行进程与内核线程栈的换入换出操作。平衡集管理器与内核中的线程调度代码(译注:即操作系统调度器;在 Linux 和 UNIX 上,任务调度通常以进程为单位,而 windows 则支持线程粒度的调度)在需要发生进程或者内核线程栈的换入和换出操作时,就会唤醒这个线程。(译注:在 UNIX 变体如 4.3 BSD 上,执行相同任务的是一个叫做 swapper 的系统进程;通常在系统的空闲物理页框即物理内存不足,或者某些进程长时间没有获得调度从而变成非活动进程时,swapper 进程和 windows 的 process/stack swapper 线程就会被唤醒,将这些进程换出内存,从而释放空间。KeSwapProcessOrStack() 例程的操作会影响到线程的调度状态改变,例如,假设一个线程准备好执行,但它的内核栈[内核模式线程的情况]或调用栈/所属进程[用户模式线程的情况]被换出了内存,则该线程进入转换状态;一旦这些栈或所属进程被换回至内存中,该线程进入就绪状态。关于对线程调度状态的讨论,请参考本书上册第五章的 5.7 节——线程调度)


3.  The modified page writer (MiModifiedPageWriter, priority 17) writes dirty pages on the modified list back to the appropriate paging files. This thread is awakened when the size of the modified list needs to be reduced.
3.  modified page writer (MiModifiedPageWriter,已修改页面写回器,优先级 17)。将已修改页列表中的“脏”页写回适当的分页文件。(译注:通常由处理器的 MMU——内存管理单元——在向某个 PTE 页表项所映射的 4KB 地址范围内写入数据时,设置该 PTE 中的修改位,即 Dirty bit,脏位;如果设置了脏位,内核在把该物理页面挪作它用之前,应该先把页面内容写回硬盘上的分页文件,以保留修改结果;在 Intel x86/x64 体系结构中,内核通过改写 PTE 来清除该位,并用特权指令(如 INVLPG)使对应的 TLB 条目失效)。当需要减小已修改页列表的尺寸时,该线程就会被唤醒。

4.  The mapped page writer (MiMappedPageWriter, priority 17) writes dirty pages in mapped files to disk (or remote storage). It is awakened when the size of the modified list needs to be reduced or if pages for mapped files have been on the modified list for more than 5 minutes. This second modified page writer thread is necessary because it can generate page faults that result in requests for free pages. If there were no free pages and there was only one modified page writer thread, the system could deadlock waiting for free pages.
4.  mapped page writer (MiMappedPageWriter,映射页面写入器,优先级 17)。将位于内存中的“映射文件”的“脏”页写回磁盘(或远程存储),以更新修改结果。当需要减小已修改页列表的大小,或者映射文件中的脏页面位于已修改页列表超过 5 分钟,该线程就会被唤醒。
(译注:WRK1.2 版源码中的 mminit.c 模块包含内存管理器系统的初始化逻辑,其中第 171~178 行定义了这个 5 分钟时限,节录如下:)
[COLOR="Black"][B]// Default is a 300 second life span for modified mapped pages -
// This can be overridden in the registry.
//

[FONT="微软雅黑"]ULONG MmModifiedPageLifeInSeconds = 300;[/FONT][/B][/COLOR]


源码及其注释证实了,位于已修改页列表中的映射文件脏页的生存期就是 300 秒,超时后将被写回磁盘;注释还提到,该值可以通过编辑注册表的对应项来覆盖,但没有给出具体的键路径。MmModifiedPageLifeInSeconds 是一个全局变量,映射页面写入器可以访问它,并把它作为页面回写磁盘的超时标准。

这第二个已修改页面写入器线程是必需的,因为它(在写入映射文件页面时)自身就可能产生页面错误,进而请求空闲页面;如果当前没有空闲页面,而系统中又只有一个已修改页面写回器线程,那么该线程将阻塞在等待空闲页面上,却没有第二个能够把已修改页面写入磁盘从而释放出空闲页面的线程可供调度,于是整个系统会在等待空闲页面时陷入死锁。


(译注:我们分析 WRK 1.2 版源码中的 modwrite.c 模块,它实现了将已修改页面或已修改的映射文件写回磁盘的逻辑,从中可以学习到如何正确地初始化事件,等待事件变成有信号,然后继续执行;同时还可以检验原文讲到的相关内容,对内核工作机制有更直观的理解。
首先,在 MiModifiedPageWriter() 这个系统线程中(modwrite.c 的 2943 行),调用 KeInitializeEvent() 初始化全局的 MmMappedPageWriterEvent 事件。注意,这里使用的是通知事件(因此我们将在后面看到,需要在 KeSetEvent() 之后调用 KeClearEvent() 清除事件的有信号状态),且初始状态为无信号。然后它创建 MiMappedPageWriter() 系统线程,这就是原文提到的“第二个已修改页面写入器线程”——用于写入映射文件页面(第一个则是 MiModifiedPageWriter() 自身),并且初始化一个由 LIST_ENTRY 组成的全局双向链表头部,如下所示:





由 LIST_ENTRY 组成的双向链表示意图如下,在实践中通常会把 LIST_ENTRY 内嵌在一个更大的母结构体内部,每个这样的母结构体称为链表中的“表项”;一个表项可以通过自身的 LIST_ENTRY 成员引用前一个和后一个表项:



而在 MiMappedPageWriter() 中(modwrite.c 的 4397 行),它会调用并阻塞在 KeWaitForSingleObject() 内部,等待其它函数处理这个链表,如下所示:



MiModifiedPageWriter() 在创建 MiMappedPageWriter() 线程后,会调用 MiModifiedPageWriterWorker(),后者进一步调用 MiGatherMappedPages()——它在返回前调用 InsertTailList(),在全局双向链表尾部插入表项,然后调用 KeSetEvent() 把 MmMappedPageWriterEvent 事件设置为有信号,从而“唤醒”MiMappedPageWriter() 线程(使其从 KeWaitForSingleObject() 中返回),如下所示:



MiGatherMappedPages() 返回到它的调用者后,MiModifiedPageWriterWorker() 显式调用 KeClearEvent() 来清除事件。回忆一下,前面说到通知事件需要在设置后手动清除,如下所示:



另一方面,前几张截图显示,MiMappedPageWriter() 线程被唤醒后,会调用 IsListEmpty() 例程判断 MiGatherMappedPages() 处理过的全局双向链表是否为空,然后再进行相应的处理。
再次强调:本系列书籍的作者由于其职位的敏感性,不能总是把源码摆到台面上来分析系统机制,只能用一些技术性较强的句子来描述,这也造成了我们翻译上的困难。因此,当你对原文或译文中某些内容感觉云里雾里时,像这样搜索 WRK 源码中的相关实现逻辑,甚至结合内核调试器动态分析,对理解幕后原理和设计思想都会很有帮助,还能学习这些内核例程和数据结构的用法,提升你的驱动编程水平。)
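(译注补充:为了更直观地说明上面这种“通知事件 + 全局双向链表 + 手动清除”的协作模式,下面给出一个高度简化的示意代码。它只是基于公开的 WDK 内核 API 写成的假设性草图,并非 WRK 原始代码;其中的变量名,线程函数名以及池标记均为示意用途:

#include <ntddk.h>

static KEVENT     g_WorkEvent;       // 通知事件(NotificationEvent),需手动清除
static LIST_ENTRY g_WorkListHead;    // 全局双向链表头
static KSPIN_LOCK g_WorkListLock;    // 保护链表的自旋锁

typedef struct _WORK_ITEM {
    LIST_ENTRY Links;                // 内嵌的 LIST_ENTRY,用于挂入全局链表
    ULONG      Data;
} WORK_ITEM, *PWORK_ITEM;

// 消费者线程:对应上文 MiMappedPageWriter() 中的等待逻辑
VOID ConsumerThread(PVOID Context)
{
    UNREFERENCED_PARAMETER(Context);
    for (;;) {
        // 阻塞等待事件变为有信号
        KeWaitForSingleObject(&g_WorkEvent, Executive, KernelMode, FALSE, NULL);

        // 通知事件不会自动复位;WRK 中由 MiModifiedPageWriterWorker() 调用
        // KeClearEvent() 清除,这里为简化起见放在消费者一侧
        KeClearEvent(&g_WorkEvent);

        KIRQL irql;
        KeAcquireSpinLock(&g_WorkListLock, &irql);
        while (!IsListEmpty(&g_WorkListHead)) {
            PLIST_ENTRY entry = RemoveHeadList(&g_WorkListHead);
            PWORK_ITEM  item  = CONTAINING_RECORD(entry, WORK_ITEM, Links);
            // ...在这里处理 item...
            ExFreePoolWithTag(item, 'tIkW');
        }
        KeReleaseSpinLock(&g_WorkListLock, irql);
    }
}

// 生产者:对应上文 MiGatherMappedPages() 插入表项并置位事件的逻辑
VOID QueueWork(ULONG Data)
{
    PWORK_ITEM item = ExAllocatePoolWithTag(NonPagedPool, sizeof(WORK_ITEM), 'tIkW');
    if (item == NULL) return;
    item->Data = Data;

    KIRQL irql;
    KeAcquireSpinLock(&g_WorkListLock, &irql);
    InsertTailList(&g_WorkListHead, &item->Links);
    KeReleaseSpinLock(&g_WorkListLock, irql);

    KeSetEvent(&g_WorkEvent, IO_NO_INCREMENT, FALSE);   // 唤醒消费者
}

// 初始化:对应上文 MiModifiedPageWriter() 中的初始化逻辑
VOID InitWorkMachinery(VOID)
{
    InitializeListHead(&g_WorkListHead);
    KeInitializeSpinLock(&g_WorkListLock);
    // 通知事件,初始为无信号
    KeInitializeEvent(&g_WorkEvent, NotificationEvent, FALSE);
}
)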


5.  The segment dereference thread (MiDereferenceSegmentThread, priority 18) is responsible for cache reduction as well as for page file growth and shrinkage. (For example, if there is no virtual address space for paged pool growth, this thread trims the page cache so that the paged pool used to anchor it can be freed for reuse.)
5.  segment dereference thread (MiDereferenceSegmentThread,暂译为“内存段解引用线程”,优先级 18)。负责缩减系统缓存(译注:这里既不是 CPU 芯片内的硬件 L1~L3 cache,也不是分页文件,而是指由缓存管理器管理的系统文件缓存),以及负责分页文件的增长和收缩。(例如,若分页池已没有可供增长的虚拟地址空间,该线程会裁剪页面缓存,使得用于支撑(anchor)这部分缓存的分页池可以被释放并重用。)
(译注:有条件获取 WRK1.2 版源码的童鞋,可以参考 sectsup.c 源文件中,从第 805 行开始的 MiDereferenceSegmentThread() 例程定义,sectsup.c 源文件中的 MiSectionInitialization() 例程用于初始化“Section”对象类型——MmSectionObjectType ,并且它会调用 PsCreateSystemThread() ,后者在 System 进程中创建 MiDereferenceSegmentThread 线程,鉴于 WRK 的源码许可协议规定每次引用的代码量不能大于 50 行,下面节录了最相关的代码片段: )


[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]HANDLE ThreadHandle;
 if (!NT_SUCCESS(PsCreateSystemThread (&ThreadHandle,
                                          THREAD_ALL_ACCESS,
                                          &ObjectAttributes,
                                          0,               [B] //第 4 个参数(HANDLE  ProcessHandle)指明要在哪个进程中创建系统线程,如果此参数为 NULL,就在 System 进程中创建系统线程。[/B]
                                          NULL,
                                          MiDereferenceSegmentThread,
                                          NULL))) {
        return FALSE;
    }
  ZwClose (ThreadHandle);[/COLOR][/SIZE][/FONT]

(上面代码片段中,开始处定义的局部变量 ThreadHandle,此时保存着指向 MiDereferenceSegmentThread 线程的句柄。如原文所述,它是一个重要的系统线程,在必要时会释放(解引用)“系统缓存”类型的内核虚拟地址空间,从而保证其它类型的内核虚拟地址空间——例如分页池——有足够的虚拟内存可用。该线程会一直运行;由于内核此后不再需要通过这个句柄来操作它,这里直接把句柄关闭以免句柄泄漏——关闭句柄并不会终止线程本身。)

6.  The zero page thread (MmZeroPageThread, base priority 0) zeroes out pages on the free list so that a cache of zero pages is available to satisfy future demand-zero page faults.
Unlike the other routines described here, this routine is not a top-level thread function but is called by the top-level thread routine
Phase1Initialization. MmZeroPageThread never returns to its caller, so in effect the Phase 1 Initialization thread becomes the zero page thread by calling this routine. Memory zeroing in some cases is done by a faster function called MiZeroInParallel. See the note in the section “Page List Dynamics” later in this chapter.
Each of these components is covered in more detail later in the chapter.

6.  zero page thread (译注:MmZeroPageThread,零页线程,基础优先级为 0。该线程属于可变优先级类线程,此类线程的当前优先级以基础优先级为下限,可以动态变化,例如:一,如果此类线程因等待 I/O 等事件而阻塞,那么等待结束时内核会临时提升它的当前优先级;二,如果此类线程用完了分配给它的时间片,内核会逐步降低其当前优先级,直至回落到基础优先级)。
零页线程把空闲页列表中的页面全部清零,从而维护一个由全零页面组成的缓存(零页列表),用于满足将来的“按需清零”(demand-zero)类型的页面错误。与这里描述的其它 5 个例程不同,此例程并非顶级线程函数,而是由一个叫做 Phase1Initialization 的顶级线程例程调用;
(译注:内核与执行体的初始化分为阶段 0(Phase 0)和阶段 1(Phase 1)两个阶段。阶段 0 的执行体初始化中的函数调用流程如下:
ExpInitializeExecutive() -> PsInitSystem() -> PspInitPhase0()
PspInitPhase0() 运行在最初的引导进程/线程上下文中(它们随后成为 Idle 进程及其空闲线程),它创建 System 进程,并在其中创建系统线程 Phase1Initialization,最后通过上下文切换把控制转交给 Phase1Initialization 线程,由后者负责阶段 1 的执行体初始化;
而 Phase1Initialization 线程在最后则通过调用 MmZeroPageThread() ,将自身转变为零页线程。下面这张引用了《Windows 内核设计思想》一书中的图片,比较全面地概括了系统启动的流程:)



零页线程从不返回到它的调用者,因此实际上,Phase 1 初始化线程在最后通过调用此例程变成了零页线程。某些情况下,内存清零操作由一个叫 MiZeroInParallel 的速度更快的函数完成。更多细节请查看本章后面“Page List Dynamics”(页面列表的动态变化)一节中的注记;
本章后面会涵盖上述 6 个内存管理器组件的更多细节。
(译注:为了避免只见树木不见林的弊端,下图给出内存管理器在整个Windows系统架构中的位置。注意,为求简洁,这张系统架构图省略了许多用户模式子系统DLL,内核模式组件,TCP/IP协议栈如tcpip.sys等。重点突出函数调用的轨迹,以及模块之间的依赖性。完整的系统架构图,各位可以参考本书上册第2章“关键的系统组件”一节)




Internal Synchronization
Like all other components of the Windows executive, the memory manager is fully reentrant and supports simultaneous execution on multiprocessor systems—that is, it allows two threads to acquire resources in such a way that they don't corrupt each other's data. To accomplish the goal of being fully reentrant, the memory manager uses several different internal synchronization mechanisms, such as spinlocks, to control access to its own internal data structures. (Synchronization objects are discussed in Chapter 3, “System Mechanisms,” in Part 1.)
Some of the systemwide resources to which the memory manager must synchronize access include:
■ Dynamically allocated portions of the system virtual address space
■ System working sets
■ Kernel memory pools
■ The list of loaded drivers
■ The list of paging files
■ Physical memory lists
■ Image base randomization (ASLR) structures
■ Each individual entry in the page frame number (PFN) database
Per-process memory management data structures that require synchronization include the working set lock (held while changes are being made to the working set list) and the address space lock (held whenever the address space is being changed). Both these locks are implemented using pushlocks.

内部同步
正如 windows 执行体的所有其它组件一样,内存管理器是完全可重入的,并且支持在多处理器系统上同时执行——也就是说,它允许两个线程以互不损坏对方数据的方式获取资源。为了实现完全可重入这一目标,内存管理器使用了几种不同的内部同步机制(例如自旋锁)来控制对其自身内部数据结构的访问。(同步对象在本书上册第3章“系统机制”中讨论)
内存管理器必须对其访问进行同步化的一些系统范围资源包括:
■  系统虚拟地址空间的动态分配部分;
■ 系统工作集;
■ 内核内存池;
■ 已加载驱动程序的列表;
■ 分页文件列表;
■ 物理内存列表;
■ 地址空间布局随机化(ASLR)使用的相关结构;
■ 页框号(PFN)数据库中,每个单独的条目;(译注:页框号即物理页号;PFN 数据库是内存管理器用来描述每个物理页当前状态的一个大数组,每个物理页在其中对应一个条目。注意它与页表不同:页表(其条目称为页表项,PTE)负责虚拟地址到物理地址的映射,而 PFN 数据库记录的是物理页本身的状态,本章后面会详细介绍)
内存管理事务涉及的每进程数据结构中,需要同步的包括:工作集锁(当工作集列表正在变更时持有该锁),地址空间锁(每当地址空间正被改变时持有该锁)。这些锁都使用推锁(pushlocks)实现。


Examining Memory Usage
The Memory and Process performance counter objects provide access to most of the details about system and process memory utilization. Throughout the chapter, we'll include references to specific performance counters that contain information related to the component being described. We've included relevant examples and experiments throughout the chapter.
One word of caution, however:
different utilities use varying and sometimes inconsistent or confusing names when displaying memory information. The following experiment illustrates this point. (We'll explain the terms used in this example in subsequent sections.)

审查内存使用
Memory(内存)与 Process(进程)这两个性能计数器对象提供了有关系统和进程内存使用情况的绝大部分细节。贯穿本章,我们会引用特定的性能计数器,它们包含与所描述的内存管理器组件有关的信息;我们也在全章中穿插了相应的例子与实验。然而需要提醒一下:在显示内存信息时,不同的工具使用不同的——有时是不一致甚至让人困惑的——名称。下面的实验说明了这一点。(我们将在后续小节解释这个例子中使用的术语)


EXPERIMENT: Viewing System Memory Information
The Performance tab in the Windows Task Manager, shown in the following screen shot, displays basic system memory information. This information is a subset of the detailed memory information available through the performance counters. It includes data on both physical and virtual memory usage.

实验:查看系统内存信息
如下的屏幕截图所示,windows任务管理器中的性能标签,显示基本的系统内存信息,这个信息仅是性能计数器提供的详细内存信息的一组子集,它包含物理内存和虚拟内存使用率相关的数据。
(原文使用EN-US语系的系统截图,我把它替换成自己机器上的ZH-CN语系截图,主要是方便大家对照下面的表格来理解图中每个术语的含义)



The following table shows the meaning of the memory-related values.
下表解释任务管理器中使用的内存相关术语的含义:

Memory bar histogram
内存的条柱形图
Bar/chart line height shows physical memory in use by Windows (not available as a performance counter). The remaining height of the graph is equal to the Available counter in the Physical Memory section,
described later in the table. The total height of the graph is equal to the Total counter in that section. This represents the total RAM usable by the operating system, and does
not include BIOS shadow pages, device memory, and so on.

该条柱形图的行高显示windows使用的物理内存情况(亮绿色区域,该区域没有相应的性能计数器)。该图中剩余的高度(暗绿色区域)相当于“物理内存(MB)”栏位中的“可用”计数器,后续的表格会讲到。该图的总高度相当于栏位中的“总数”计数器;总数表示操作系统能够使用的物理内存总量,并且不包含BIOS shadow pages(直译为BIOS影子页面,也就是将一些外围硬件设备自带的 BIOS ROM 映射到系统内存)与设备内存等(将一些外围硬件设备自带的存储器或缓存映射到系统内存)。

Physical Memory (MB): Total
物理内存(以MB,百万字节为单位):总数
Physical memory usable by Windows
即windows可用的物理内存,如前所述,等于内存条形图的总高;

Physical Memory (MB): Cached
已缓存
Sum of the following performance counters in the Memory object:
Cache Bytes, Modified Page List Bytes, Standby Cache Core Bytes,
Standby Cache Normal Priority Bytes, and Standby Cache Reserve Bytes (all in Memory object)

内存对象中的一些性能计数器总合,包括Cache Bytes,Modified Page List Bytes,Standby Cache Core Bytes,Standby Cache Normal Priority Bytes,以及 Standby Cache Reserve Bytes(译注:这里保持原文,避免翻译引起的语义准确性争议)

Physical Memory (MB):Available
可用
Amount of memory that is immediately available for use by the operating system, processes, and drivers. Equal to the combined size of the standby, free, and zero page lists.
可以由操作系统,进程,驱动程序立即使用的物理内存数量,它等于备用(standby),空闲(free),以及零页列表(zero page lists)三者之和。这三者相加应等于前述的内存柱形图中的暗绿色(可用物理内存)区域。(译注:打开任务管理器中的资源监视器 ,在“物理内存”栏目中通过简单的加法即可验证,需要注意,简体中文语系windows 7 客户机系列的翻译出了一点小错误:最右边的方格图例应该是“空闲”,而非“可用”)

Physical Memory (MB): Free
空闲
Free and zero page list bytes
空闲页和零页列表中的页面总字节(译注:系统自身给出的解释为“不包含任何有价值数据<零页?>,以及当进程,驱动程序,操作系统需要更多内存时将首先使用的内存”)

Kernel Memory (MB): Paged
内核内存(以MB,百万字节为单位):分页数
Pool paged bytes. This is the total size of the pool, including both free and allocated regions
分页池的总字节,包含空闲和已分配区域;

Kernel Memory (MB): Nonpaged
未分页数
Pool nonpaged bytes. This is the total size of the pool, including both free and allocated regions
不可分页池的总字节,包含空闲和已分配区域;

System: Commit (two numbers shown)
系统栏位中的“提交”(以GB,十亿字节为单位)
Equal to performance counters Committed Bytes and Commit Limit, respectively
前后显示2个数字,分别等于Committed Bytes和Commit Limit这2个性能计数器;

To see the specific usage of paged and nonpaged pool, use the Poolmon utility, described in the “Monitoring Pool Usage” section.
使用“监控内存池的使用”(Monitoring Pool Usage)一节中介绍的 Poolmon 工具,可以查看分页池和非分页池的具体使用情况。

The Process Explorer tool from Windows Sysinternals (http://www.microsoft.com/technet/sysinternals) can show considerably more data about physical and virtual memory. On its main screen, click View and then System Information, and then choose the Memory tab. Here is an example display from a 32-bit Windows system:
来自Windows Sysinternals (http://www.microsoft.com/technet/sysinternals) 的Process Explorer(进程浏览器或进程资源管理器)能够显示更多有关物理内存和虚拟内存的数据。在其主界面中,单击View菜单->System Information,在打开的界面中选择Memory选项卡即可查看。下面这个显示的例子来自一个32位的windows系统:


We will explain most of these additional counters in the relevant sections later in this chapter.
Two other Sysinternals tools show extended memory information:
■ VMMap shows the usage of virtual memory within a process to an extremely fine level of detail.
■ RAMMap shows detailed physical memory usage.
These tools will be featured in experiments found later in this chapter.
Finally, the !vm command in the kernel debugger shows the basic memory management information available through the memory-related performance counters. This command can be useful if you're looking at a crash dump or hung system. Here's an example of its output from a 4-GB Windows client system:

我们将在本章后续相关部分解释这些附加的计数器。
另外2个Sysinternals工具能够显示扩展的内存信息:
■ VMMap将一个进程内的虚拟内存使用情况显示到一个极端细致的水平;
■ RAMMap显示物理内存使用情况的细节;
本章后续将通过实验来展示这些工具的特色。
最后,内核调试器中的 !vm 命令可以显示基本的内存管理信息,这些信息同样可以通过与内存相关的性能计数器获得。如果你正在检查一个崩溃转储或无响应(挂起)的系统,该命令会很有用。下面的示例输出来自一个拥有 4GB 物理内存的 Windows 客户端系统:


[FONT="微软雅黑"][SIZE="4"]1: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 851757 ( 3407028 Kb)
Page File: \??\C:\pagefile.sys
Current: 3407028 Kb Free Space: 3407024 Kb
Minimum: 3407028 Kb Maximum: 4193280 Kb
Available Pages: 699186 ( 2796744 Kb)
ResAvail Pages: 757454 ( 3029816 Kb)
Locked IO Pages: 0 ( 0 Kb)
Free System PTEs: 370673 ( 1482692 Kb)
Modified Pages: 9799 ( 39196 Kb)
Modified PF Pages: 9798 ( 39192 Kb)
NonPagedPool Usage: 0 ( 0 Kb)
NonPagedPoolNx Usage: 8735 ( 34940 Kb)
NonPagedPool Max: 522368 ( 2089472 Kb)
PagedPool 0 Usage: 17573 ( 70292 Kb)
PagedPool 1 Usage: 2417 ( 9668 Kb)
PagedPool 2 Usage: 0 ( 0 Kb)
PagedPool 3 Usage: 0 ( 0 Kb)
PagedPool 4 Usage: 28 ( 112 Kb)
PagedPool Usage: 20018 ( 80072 Kb)
PagedPool Maximum: 523264 ( 2093056 Kb)
Session Commit: 6218 ( 24872 Kb)
Shared Commit: 18591 ( 74364 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 2151 ( 8604 Kb)
PagedPool Commit: 20031 ( 80124 Kb)
Driver Commit: 4531 ( 18124 Kb)
Committed pages: 179178 ( 716712 Kb)
Commit limit: 1702548 ( 6810192 Kb)
Total Private: 66073 ( 264292 Kb)
0a30 CCC.exe 11078 ( 44312 Kb)
0548 dwm.exe 6548 ( 26192 Kb)
091c MOM.exe 6103 ( 24412 Kb)


We will describe many of the details of the output of this command later in this chapter.
我们将在本章稍后描述该命令输出的众多细节。

Services Provided by the Memory Manager
The memory manager provides a set of system services to allocate and free virtual memory, share memory between processes, map files into memory, flush virtual pages to disk, retrieve information about a range of virtual pages, change the protection of virtual pages, and lock the virtual pages into memory.
Like other Windows executive services, the memory management services allow their caller to supply a process handle indicating the particular process whose virtual memory is to be manipulated.
The caller can thus manipulate either its own memory or (with the proper permissions) the memory of another process. For example, if a process creates a child process, by default it has the right to manipulate the child process’s virtual memory. Thereafter, the parent process can allocate, deallocate, read, and write memory on behalf of the child process by calling virtual memory services and passing a handle to the child process as an argument. This feature is used by subsystems to manage the memory of their client processes. It is also essential for implementing debuggers because debuggers must be able to read and write to the memory of the process being debugged.

内存管理器提供的服务
内存管理器提供一组系统服务用于分配和释放虚拟内存,在进程间共享内存,将磁盘文件映射至内存,将虚拟页刷新到磁盘,检索一系列有关虚拟页的信息,更改虚拟页的保护属性,以及将虚拟页锁在内存中。与其它Windows执行体服务一样,内存管理服务允许它们的调用者提供一个进程句柄,用于指明要被操控虚拟内存的特定进程;调用者因而能够操控其自身内存,或者以适当的权限操纵其它进程的内存;例如,一个进程创建了一个子进程,默认情况下,父进程有权操控它的子进程的虚拟内存;随后,父进程可以通过调用虚拟内存服务并且传递一个该子进程的句柄作为参数,从而能够代表该子进程分配,释放,以及读写内存。这个特性被子系统用来管理它们“客户进程”的内存。这个特性对于实现调试器也是必需的,因为调试器必须能够读写被调试进程的内存


Most of these services are exposed through the Windows API. The Windows API has three groups of functions for managing memory in applications: heap functions (Heapxxx and the older interfaces Localxxx and Globalxxx, which internally make use of the Heapxxx APIs), which may be used for allocations
smaller than a page; virtual memory functions, which operate with page granularity (Virtualxxx); and memory mapped file functions (CreateFileMapping, CreateFileMappingNuma, MapViewOfFile, MapViewOfFileEx, and MapViewOfFileExNuma). (We’ll describe the heap manager later in this
chapter.)

The memory manager also provides a number of services (such as allocating and deallocating physical memory and locking pages in physical memory for direct memory access [DMA] transfers) to other kernel-mode components inside the executive as well as to device drivers. These functions begin with the prefix Mm. In addition, though not strictly part of the memory manager, some executive
support routines that begin with Ex are used to allocate and deallocate from the system heaps (paged and nonpaged pool) as well as to manipulate look-aside lists. We’ll touch on these topics later in this chapter in the section “Kernel-Mode Heaps (System Memory Pools).”

这些服务绝大多数通过Windows API对外暴露。Windows API中有3组函数用于管理应用程序内存:
堆函数(Heapxxx 以及旧版接口 Localxxx 与 Globalxxx,后2者在内部利用 Heapxxx APIs),它们可能被用来分配小于一页的内存;
虚拟内存函数(Virtualxxx),它们以页面为粒度进行操作(详情参考“大页面与小页面”一节);
内存映射文件函数(CreateFileMapping,CreateFileMappingNuma,MapViewOfFile,MapViewOfFileEx,以及 MapViewOfFileExNuma)(本章稍后会介绍堆管理)
内存管理器也向执行体内的其它内核模式组件以及设备驱动程序提供了若干服务(例如分配和释放物理内存;将页面锁定在物理内存中,用于直接存储器访问[DMA]传输),这些函数的名称以 Mm 为前缀。此外,尽管严格来说不属于内存管理器的一部分,一些以 Ex 为前缀的执行体支持例程可用于从系统堆(分页池和非分页池)分配和释放内存,以及操作后备列表(look-aside lists)。
我们将在本章后面的“Kernel-Mode Heaps (System Memory Pools)”(内核模式堆[系统内存池])部分中,讨论这些主题。
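(译注补充:下面给出一个极简的示意程序(假设性示例),演示上述第一组 API——堆函数——的基本用法;VirtualAlloc 与 CreateFileMapping 的用法在后文相应小节另有示例:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // 创建一个私有堆(初始大小 0,可自动增长),适合大量小于一页的分配
    HANDLE hHeap = HeapCreate(0, 0, 0);
    if (hHeap == NULL) return 1;

    // 从该堆分配 64 字节并清零(HEAP_ZERO_MEMORY)
    char *buf = (char *)HeapAlloc(hHeap, HEAP_ZERO_MEMORY, 64);
    if (buf != NULL) {
        lstrcpyA(buf, "hello heap");
        printf("%s\n", buf);
        HeapFree(hHeap, 0, buf);
    }

    // 销毁整个私有堆,其中的所有分配随之一并释放
    HeapDestroy(hHeap);
    return 0;
}
)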


Large and Small Pages
The virtual address space is divided into units called pages. That is because the hardware memory management unit translates virtual to physical addresses at the granularity of a page. Hence, a page is the smallest unit of protection at the hardware level. (The various page protection options are described in the section “Protecting Memory” later in the chapter.) The processors on which Windows runs support two page sizes, called small and large. The actual sizes vary based on the processor
architecture, and they are listed in Table 10-1.

大页面与小页面
虚拟地址空间被划分为叫做页面的单元。这是由于硬件内存管理单元(译注:即MMU)以页面为粒度将虚拟地址翻译成物理地址。于是,在硬件级别,一个页就是最小的保护单元。
(本章后续的“Protecting Memory”部分,将描述各种页面保护选项)。运行Windows的处理器支持2种页面尺寸,叫做小页和大页。页面的实际大小根据处理器体系结构而有所不同,表10-1列出了这些值:
TABLE 10-1 Page Sizes(页面大小)



Note IA64 processors support a variety of dynamically configurable page sizes, from 4 KB up to 256 MB. Windows on Itanium uses 8 KB and 16 MB for small and large pages, respectively, as a result of performance tests that confirmed these values as optimal. Additionally, recent x64 processors support a size of 1 GB for large pages, but Windows does not use this feature.
注意   IA64处理器支持各种动态可配置的页面大小——从4KB到最大256MB。运行在Itanium处理器上的Windows使用8 KB小页和16 MB大页,这是由于一些性能测试确认了这些值是最优的。
同时,最近的x64处理器支持1GB的大页尺寸,但Windows没有使用该特性。


The primary advantage of large pages is speed of address translation for references to other data within the large page. This advantage exists because the first reference to any byte within a large page will cause the hardware’s translation look-aside buffer (TLB, described in a later section) to have in its cache the information necessary to translate references to any other byte within the large page.
If small pages are used, more TLB entries are needed for the same range of virtual addresses, thus increasing recycling of entries as new virtual addresses require translation. This, in turn, means having to go back to the page table structures when references are made to virtual addresses outside the scope of a small page whose translation has been cached. The TLB is a very small cache, and thus large pages make better use of this limited resource.

大页面的主要优势是加快引用页面内其它数据的地址翻译速度。存在这个优势是因为首次引用一个大页面内的任意字节,将导致CPU内部的TLB(转换后援缓冲,后面会讲到)硬件缓存必要的信息,用于翻译对该页面中其它字节的引用。
如果我们使用小页面,需要更多的TLB条目来缓存相同范围的虚拟地址空间翻译结果(译注:例如,对于一个4MB地址空间范围,需要1024个4KB小页面;只需要1个4MB大页面就能覆盖),由于TLB条目数量是固定的,这导致可用于缓存其它范围虚拟地址空间翻译结果的条目越少。换言之,当引用的虚拟地址不在任何TLB条目缓存的小页面负责的范围内时,不仅需要回到页表结构中查找(译注:这就需要更多的CPU时钟周期,因为页表通常在内存中,内存访问比TLB访问至少慢上2个数量级),而且需要回收更多旧的TLB条目用于缓存新的翻译结果。TLB是一种非常小的缓存,在其中缓存大页面能够更高效使用这个有限的资源。
(译注:TLB一般可以存储64个PTE条目,如果使用每个PTE映射4KB地址空间的小页面,那么总共能缓存4*64 = 256KB的物理地址空间,如果使用4MB的大页面,则可以缓存4*64=256MB的物理地址空间,后者明显可以降低TLB未命中从而需要访问内存查询页表的几率,关于TLB缓存地址翻译结果的工作原理,可以参考下面这个视频解说:https://youtu.be/95QpHJX55bM)


To take advantage of large pages on systems with more than 2 GB of RAM, Windows maps with large pages the core operating system images (Ntoskrnl.exe and Hal.dll) as well as core operating system data (such as the initial part of nonpaged pool and the data structures that describe the state of each physical memory page). Windows also automatically maps I/O space requests (calls by
device drivers to MmMapIoSpace) with large pages if the request is of satisfactory large page length and alignment. In addition, Windows allows applications to map their images, private memory, and page-file-backed sections with large pages. (See the MEM_LARGE_PAGE flag on the VirtualAlloc, VirtualAllocEx, and VirtualAllocExNuma functions.) You can also specify other device drivers to be mapped with large pages by adding a multistring registry value to HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargePageDrivers and specifying the names of the
drivers as separately null-terminated strings.

为充分利用多于2GB物理内存系统上的大页面,Windows在核心操作系统映像(Ntoskrnl.exe 与 Hal.dll)以及核心操作系统数据(例如,非分页池的初始部分,描述每个物理内存页状态的数据结构)上,使用大页面映射。
如果 I/O 地址空间请求(设备驱动程序调用 MmMapIoSpace 函数)满足大页面的长度和对齐要求,Windows 也会自动使用大页面进行映射。此外,Windows 还允许应用程序对它们的映像文件,私有内存(private memory),以及由页面文件支持(page-file-backed)的节使用大页面映射。(参见 VirtualAlloc,VirtualAllocEx 以及 VirtualAllocExNuma 函数的 MEM_LARGE_PAGES 标志)
你也可以向注册表路径 HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management 下添加一个名为 LargePageDrivers 的多字符串值,把各个驱动程序的名称作为彼此以空字符(译注:0x00)结尾分隔的字符串写入其数据,从而指定这些设备驱动程序也使用大页面映射。
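(译注补充:下面是一个示意程序(假设性示例),演示应用程序用 MEM_LARGE_PAGES 标志向 VirtualAlloc 申请大页面的基本流程。注意,大页面分配要求当前帐户拥有并启用“锁定内存页”权限(SeLockMemoryPrivilege),这里省略了启用该权限的代码:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // 查询处理器支持的最小大页面尺寸(返回 0 表示不支持大页面)
    SIZE_T largePageSize = GetLargePageMinimum();
    if (largePageSize == 0) {
        printf("处理器不支持大页面\n");
        return 1;
    }

    // 大页面必须一次性保留并提交,且大小为大页面尺寸的整数倍
    void *p = VirtualAlloc(NULL, largePageSize,
                           MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                           PAGE_READWRITE);
    if (p == NULL) {
        // 常见失败原因:权限未启用,或物理内存已碎片化(见下文)
        printf("大页面分配失败,错误码 %lu\n", GetLastError());
        return 1;
    }

    printf("在 %p 处分配了 %Iu 字节的大页面内存\n", p, largePageSize);
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}
)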


Attempts to allocate large pages may fail after the operating system has been running for an extended period, because the physical memory for each large page must occupy a significant number (see Table 10-1) of physically contiguous small pages, and this extent of physical pages must furthermore
begin on a large page boundary. (For example, physical pages 0 through 511 could be used as a large page on an x64 system, as could physical pages 512 through 1,023, but pages 10 through 521 could not.) Free physical memory does become fragmented as the system runs. This is not a problem for allocations using small pages but can cause large page allocations to fail.

在操作系统已经运行了较长时间后,尝试分配大页面可能会失败,因为每个大页面所需的物理内存必须由数量可观(请回顾表10-1),且物理上连续的小页面组成,并且这段物理页面还必须从一个大页面边界开始。(例如,在 x64 系统上,物理页面 0~511 可以用作一个大页面,物理页面 512~1023 也可以用作另一个大页面,但物理页面 10~521 这一范围则不行,因为它没有按大页面边界对齐。)
随着系统的运行,空闲物理内存确实会逐渐变得碎片化。这对使用小页面的分配不成问题,但可能导致大页面分配失败。


It is not possible to specify anything but read/write access to large pages. The memory is also always nonpageable, because the page file system does not support large pages. And, because the memory is nonpageable, it is not considered part of the process working set (described later). Nor are large page allocations subject to job-wide limits on virtual memory usage.
对于大页面,只能指定读/写访问,无法指定其它保护属性。大页面内存也总是不可分页的,因为分页文件机制不支持大页面。而由于这些内存不可分页,它们不被算作进程工作集的一部分(稍后解释);大页面分配也不受作业(job)对虚拟内存使用量的限制(job-wide limits)。

There is an unfortunate side effect of large pages. Each page (whether large or small) must be
mapped with a single protection that applies to the entire page (because hardware memory protection is on a per-page basis). If a large page contains, for example, both read-only code and read/write data, the page must be marked as read/write, which means that the code will be writable. This means that device drivers or other kernel-mode code could, as a result of a bug, modify what is supposed to be read-only operating system or driver code without causing a memory access violation.
If small pages are used to map the operating system’s kernel-mode code, the read-only portions of Ntoskrnl.exe and Hal.dll can be mapped as read-only pages. Using small pages does reduce efficiency of address translation, but if a device driver (or other kernel-mode code) attempts to modify a readonly part of the operating system, the system will crash immediately with the exception information pointing at the offending instruction in the driver. If the write was allowed to occur, the system would likely crash later (in a harder-to-diagnose way) when some other component tried to use the corrupted data.

使用大页面有个令人遗憾的副作用。每个页面(不论大小)只能用应用于整个页面的单一保护属性进行映射(因为硬件内存保护是以页为单位的)。举例来讲,如果一个大页面同时包含只读代码与可读写的数据,该页面就必须被标记为可读写,这意味着其中的代码也将是可写的。于是,设备驱动程序或其它内核模式代码就可能因为自身的 bug 而修改了本应只读的操作系统或驱动程序代码,却不会引发内存访问违例。如果改用小页面映射操作系统的内核模式代码,Ntoskrnl.exe 与 Hal.dll 的只读部分就可以被映射为只读页面。使用小页面确实会降低地址翻译的效率,但如果某个设备驱动程序(或其它内核模式代码)尝试修改操作系统的只读部分,系统会立即崩溃,并且异常信息会指向驱动程序中的违规指令。反之,如果这样的写操作被允许发生,稍后当其它组件试图使用被破坏的数据时,系统很可能同样会崩溃,而且更难诊断。

If you suspect you are experiencing kernel code corruptions, enable Driver Verifier (described later in this chapter), which will disable the use of large pages.
如果怀疑自己遇到了内核代码损坏,启用 Driver Verifier(直译为“驱动验证器”)(本章后面解释),它将禁用大页面。

Reserving and Committing Pages
Pages in a process virtual address space are free, reserved, committed, or shareable. Committed and shareable pages are pages that, when accessed, ultimately translate to valid pages in physical memory.
Committed pages are also referred to as private pages. This reflects the fact that committed pages cannot be shared with other processes, whereas shareable pages can be (but, of course, might be in use by only one process).

保留的与提交的页面
一个进程虚拟地址空间中的页面可能属于下列一种:free(空闲的), reserved(保留的), committed(提交的), shareable(可共享的)。
其中,已提交页面和可共享页面在被访问时,最终会被翻译到物理内存中有效的页面。已提交页面也被称为 private(私有)页面,这反映了一个事实:已提交页面不能与其它进程共享,而可共享页面则可以(当然,它们也可能实际上只被一个进程使用)。


Private pages are allocated through the Windows VirtualAlloc, VirtualAllocEx, and VirtualAllocExNuma functions. These functions allow a thread to reserve address space and then commit portions of the reserved space. The intermediate “reserved” state allows the thread to set aside a range of contiguous virtual addresses for possible future use (such as an array), while consuming negligible system resources, and then commit portions of the reserved space as needed as the application runs. Or, if the size requirements are known in advance, a thread can reserve and commit in the same function call. In either case, the resulting committed pages can then be accessed by the thread. Attempting to access free or reserved memory results in an exception because the page isn’t mapped to any storage that can resolve the reference.
私有页面是通过 Windows 函数 VirtualAlloc,VirtualAllocEx 以及 VirtualAllocExNuma 分配的。这些函数允许一个线程先保留地址空间,然后再提交保留空间中的一部分(作为私有页面使用)。这个中间的“保留”状态允许线程预先留出一段连续的虚拟地址以备将来使用(例如用作数组),其消耗的系统资源可以忽略不计;随后可以根据应用程序运行时的需求,按需提交保留空间中的若干部分。或者,如果事先就知道所需的大小,线程也可以在同一次函数调用中同时完成保留和提交。无论哪种情况,由此产生的已提交页面随后即可被线程访问。试图访问空闲的(free)或保留的(reserved)内存则会导致异常,因为这些页面没有映射到任何能够解析该引用的存储位置。
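(译注补充:下面的示意程序(假设性示例)演示“先保留,后按需提交”的典型用法,以及访问仅保留而未提交的页面会触发异常:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    const SIZE_T RESERVE_SIZE = 1024 * 1024;   // 保留 1 MB 连续虚拟地址空间
    const SIZE_T COMMIT_SIZE  = 64 * 1024;     // 先只提交开头的 64 KB

    // 第一步:仅保留地址空间,几乎不消耗系统资源
    BYTE *base = (BYTE *)VirtualAlloc(NULL, RESERVE_SIZE, MEM_RESERVE, PAGE_NOACCESS);
    if (base == NULL) return 1;

    // 第二步:在保留区域内按需提交一部分,之后才能实际读写
    BYTE *committed = (BYTE *)VirtualAlloc(base, COMMIT_SIZE, MEM_COMMIT, PAGE_READWRITE);
    if (committed == NULL) return 1;

    committed[0] = 0x41;                       // 访问已提交页面:正常
    printf("committed[0] = 0x%02X\n", committed[0]);

    // 访问仅保留,未提交的页面会引发访问违例,这里用 SEH 捕获以作演示
    __try {
        base[COMMIT_SIZE] = 0x42;              // 落在保留但未提交的区域
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        printf("访问未提交的保留页面触发了异常 0x%08lX\n", GetExceptionCode());
    }

    VirtualFree(base, 0, MEM_RELEASE);         // 释放整个区域
    return 0;
}
)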

If committed (private) pages have never been accessed before, they are created at the time of first access as zero-initialized pages (or demand zero). Private committed pages may later be automatically written to the paging file by the operating system if required by demand for physical memory.
“Private” refers to the fact that these pages are normally inaccessible to any other process.

如果提交的(私有的)页面之前从未被访问过,在首次访问时,作为zero-initialized(初始化为零)的页面(或demand zero)来创建。
如果有对物理内存的需求,私有提交页面随后将自动被操作系统写入分页文件(从而释放内存空间)。“私有的”指一事实:即这些页面对任何其它进程而言,通常是不可访问的。


Note There are functions, such as
ReadProcessMemory and WriteProcessMemory, that apparently permit cross-process memory access, but these are implemented by running kernel-mode code in the context of the target process (this is referred to as attaching to the process). They also require that either the security descriptor of the target process grant the accessor the PROCESS_VM_READ or PROCESS_VM_WRITE right, respectively, or that
the accessor holds SeDebugPrivilege, which is by default granted only to members of the Administrators group.

注意   诸如 ReadProcessMemory 和 WriteProcessMemory 这样的函数,表面上允许跨进程的内存访问,但它们是通过在目标进程的上下文中运行内核模式代码实现的(这被称为“附加到该进程”)。此类函数要么要求目标进程的安全描述符分别授予访问者 PROCESS_VM_READ 或 PROCESS_VM_WRITE 权限,要么要求访问者持有 SeDebugPrivilege 特权,后者默认仅授予管理员组的成员。

Shared pages are usually mapped to a view of a section, which in turn is part or all of a file, but may instead represent a portion of page file space. All shared pages can potentially be shared with other processes. Sections are exposed in the Windows API as file mapping objects.
When a shared page is first accessed by any process, it will be read in from the associated mapped file (unless the section is associated with the paging file, in which case it is created as a zero-initialized page). Later, if it is still resident in physical memory, the second and subsequent processes accessing it can simply use the same page contents that are already in memory. Shared pages might also have been prefetched by the system.

共享页面通常被映射为某个 section(节)的一个视图,该 section 又对应一个磁盘文件的一部分或全部;不过,它也可以转而代表分页文件空间的一部分。所有共享页面都有可能与其它进程共享。Section 在 Windows API 中以“文件映射对象”(file mapping object)的形式对外暴露。
当一个共享页面首次被任何进程访问时,它会从(磁盘上)关联的映射文件读入内存(除非该 section 关联的是分页文件,在这种情况下,它被作为一个初始化为零的页面[zero-initialized page]创建)。
随后,如果此共享页面仍旧驻留在物理内存中,第二个以及后续的进程访问时,可以简单地使用已经在内存中的页面内容(与磁盘上的相同)。共享的页面也可能已经被系统预取(prefetched)进内存,甚至不需要等待进程首次访问才读入。


Two upcoming sections of this chapter, “Shared Memory and Mapped Files” and “Section Objects,” go into much more detail about shared pages. Pages are written to disk through a mechanism called modified page writing. This occurs as pages are moved from a process’s working set to a systemwide list called the modified page list; from there, they are written to disk (or remote storage). (Working
sets and the modified list are explained later in this chapter.) Mapped file pages can also be written back to their original files on disk as a result of an explicit call to FlushViewOfFile or by the mapped page writer as memory demands dictate.

本章即将讨论的两节——“共享内存和映射文件”与“Section Objects”(节对象)——会给出更多有关共享页面的细节。页面通过一种叫做 modified page writing(已修改页面写回)的机制被写入磁盘:当页面从某个进程的工作集移动到一个叫做 modified page list(已修改页列表)的系统级列表中之后,它们就会从那里被写入磁盘(或远程存储)。(本章稍后将解释工作集和已修改页列表。)映射文件的页面也可以被写回它们在磁盘上的原始文件:既可以通过显式调用 FlushViewOfFile 函数,也可以由映射页面写入器(参考前面讲到的 6 个顶级内核模式线程)根据内存需求情况来写回。
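(译注补充:下面的示意函数(假设性示例,路径参数由调用者给出)演示显式调用 FlushViewOfFile,把对映射视图的修改主动刷新回磁盘上的原始文件:

#include <windows.h>

BOOL ModifyAndFlush(LPCWSTR path)
{
    HANDLE hFile = CreateFileW(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return FALSE;

    // 以文件本身为后备存储创建 section(文件映射对象)
    HANDLE hMapping = CreateFileMappingW(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);
    if (hMapping == NULL) { CloseHandle(hFile); return FALSE; }

    // 映射整个文件的一个可读写视图
    BYTE *view = (BYTE *)MapViewOfFile(hMapping, FILE_MAP_WRITE, 0, 0, 0);
    if (view != NULL) {
        view[0] ^= 0xFF;                  // 修改第一个字节,产生“脏”的映射文件页面
        FlushViewOfFile(view, 0);         // 显式把脏页写回原文件
        UnmapViewOfFile(view);
    }

    CloseHandle(hMapping);
    CloseHandle(hFile);
    return (view != NULL);
}
)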

You can decommit private pages and/or release address space with the VirtualFree or VirtualFreeEx function. The difference between decommittal and release is similar to the difference between reservation and committal—decommitted memory is still reserved, but released memory has been freed;
it is neither committed nor reserved.

你可以通过 VirtualFree 或 VirtualFreeEx 函数取消提交(decommit)私有页面,并且/或者释放(release)地址空间。“取消提交”与“释放”之间的区别,类似于“保留”与“提交”之间的区别——被取消提交的内存仍然处于保留状态,而被释放的内存则是真正被释放了:它既不是已提交的,也不是保留的。

Reserving memory is a relatively inexpensive operation because it consumes very little actual memory. All that needs to be updated
or constructed is the relatively small internal data structures that represent the state of the process address space. (We’ll explain these data structures, called page tables and virtual address descriptors, or VADs, later in the chapter.)

保(预)留内存是相对廉价的操作,因为它只消耗非常少的内存。所有需要更新或构建的只是比较小的,用于表示进程地址空间状态的内部数据结构。(我们将在本章后面解释这些被称为“页表”和“虚拟地址描述符,VADs”的数据结构)

One extremely common use for reserving a large space and committing portions of it as needed is the user-mode stack for each thread. When a thread is created, a stack is created by reserving a contiguous portion of the process address space. (1 MB is the default; you can override this size with the CreateThread and CreateRemoteThread function calls or change it on an imagewide basis by using the /STACK linker flag.) By default, the initial page in the stack is committed and the next page is marked as a guard page (which isn’t committed) that traps references beyond the end of the committed portion of the stack and expands it.
对于“保留一大段地址空间,再按需提交其中一部分”这种用法,有一个极其常见的例子——每个线程的用户模式栈。当一个线程被创建时,系统通过在进程地址空间中保留一段连续区域来创建栈(默认为 1 MB;你可以在调用 CreateThread 与 CreateRemoteThread 时覆盖这个默认值,或者使用 /STACK 链接器选项在整个映像的范围内更改它)。默认情况下,栈的初始页面是已提交的,紧随其后的页面被标记为保护页(guard page,未提交);当引用越过栈中已提交部分的末尾时,会触发保护页异常,系统借此捕获该引用并扩展栈的已提交部分。
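(译注补充:下面的示意程序(假设性示例)演示在创建线程时覆盖默认的 1 MB 栈保留大小;STACK_SIZE_PARAM_IS_A_RESERVATION 标志表示参数指定的是保留大小,而非初始提交大小:

#include <windows.h>
#include <stdio.h>

DWORD WINAPI Worker(LPVOID arg)
{
    UNREFERENCED_PARAMETER(arg);
    printf("工作线程运行中,线程 ID = %lu\n", GetCurrentThreadId());
    return 0;
}

int main(void)
{
    // 把该线程用户模式栈的“保留”大小从默认的 1 MB 改为 8 MB
    HANDLE hThread = CreateThread(NULL,
                                  8 * 1024 * 1024,
                                  Worker,
                                  NULL,
                                  STACK_SIZE_PARAM_IS_A_RESERVATION,
                                  NULL);
    if (hThread == NULL) return 1;

    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    return 0;
}
)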

EXPERIMENT: Reserved vs. Committed Pages
The TestLimit utility (which you can download from the Windows Internals book webpage) can be used to allocate large amounts of either reserved or private committed virtual memory, and the difference can be observed via Process Explorer. First, open two Command Prompt windows.
Invoke TestLimit in one of them to create a large amount of reserved memory:

实验:保留的与提交的页面
TestLimit工具(您可以从本书的web页面下载)可以用于分配大量的保留,或者私有提交虚拟内存,这两者的差异可以通过进程浏览器(Process Explorer)来观察。首先,打开2个命令行提示符窗口。在其中一个调用TestLimit命令创建一段大量的保留内存:

[FONT="微软雅黑"][SIZE="4"]C:\temp>testlimit -r 1 -c 800
Testlimit v5.2 - test Windows limits
Copyright (C) 2012 Mark Russinovich
Sysinternals - wwww.sysinternals.com
Process ID: 1544
Reserving private bytes 1 MB at a time ...
Leaked 800 MB of reserved memory (800 MB total leaked). Lasterror: 0
The operation completed successfully.[/SIZE][/FONT]

In the other window, create a similar amount of committed memory:
在另一个窗口,创建相同数量的提交内存:
[FONT="微软雅黑"][SIZE="4"]C:\temp>testlimit -m 1 -c 800
Testlimit v5.2 - test Windows limits
Copyright (C) 2012 Mark Russinovich
Sysinternals - wwww.sysinternals.com
Process ID: 2828
Leaking private bytes 1 KB at a time ...
Leaked 800 MB of private memory (800 MB total leaked). Lasterror: 0
The operation completed successfully.[/SIZE][/FONT]


Now run Task Manager, go to the Processes tab, and use the Select Columns command on the View menu to include Memory—Commit Size in the display. Find the two instances of TestLimit in the list. They should appear something like the following figure.
现在,打开Windows任务管理器,切换到进程选项卡,在主菜单的“查看”->“选择列”,勾选“提交大小”来显示提交的内存。在进程列表中找到2个TestLimit实例,应该如下图所示:


Task Manager shows the committed size, but it has no counters that will reveal the reserved memory in the other TestLimit process.
Finally, invoke Process Explorer. Choose View, Select Columns, select the Process Memory tab, and enable the Private Bytes and Virtual Size counters. Find the two TestLimit processes in the main display:

任务管理器显示出提交大小,但是没有计数器揭示关于保留内存的信息。最后,我们调用进程浏览器,在主菜单的“查看”->“选择列”,切换到进程内存选项卡,然后勾选Private Bytes与Virtual Size计数器的复选框。在主界面中找出2个TestLimit进程:


Notice that the virtual sizes of the two processes are identical, but only one shows a value for Private Bytes comparable to that for Virtual Size. The large difference in the other TestLimit process (process ID 1544) is due to the reserved memory.
注意,2 个 TestLimit 进程的 Virtual Size 完全相同,但只有其中一个显示出与 Virtual Size 接近的 Private Bytes 值。进程 ID 为 1544 的 TestLimit 进程的 Private Bytes 与 Virtual Size 差别如此之大,是因为该进程分配的是保留内存:这些内存尚未实际提交,因此不会被计入 Private Bytes 计数器。
(译注:前文提到,“保[预]留内存是相对廉价的操作,因为它只消耗非常少的内存”,也许就是它的Private Bytes只有2.8M,而不是822M的原因。而对于Virtual Size计数器,不论是保留的还是提交的,都会将其考虑在内,这一点是需要注意的。下面提供在我自己机器上的实验截图,结果基本与上面描述的类似:)






The same comparison could be made in Performance Monitor by looking at the Process | Virtual Bytes and Process | Private Bytes counters.
可以在性能监视器中,查看 Process | Virtual Bytes and Process | Private Bytes 计数器,作出同样的比较。

Commit Limit
On Task Manager’s Performance tab, there are two numbers following the legend Commit. The memory manager keeps track of private committed memory usage on a global basis, termed commitment or commit charge; this is the first of the two numbers, which represents the total of all committed virtual memory in the system.
There is a systemwide limit, called the system commit limit or simply the commit limit,  This limit corresponds to the current total size of all paging files, plus the amount of RAM that is usable by the operating system.
This is the second of the two numbers displayed as Commit on Task Manager’s Performance tab. The memory manager can increase the commit limit automatically by expanding one or more of the paging files, if they are not already at their configured maximum size.
Commit charge and the system commit limit will be explained in more detail in a later section.

提交限制
在任务管理器的性能选项卡中,“提交”字样后面跟着 2 个数值。内存管理器在全局范围内跟踪记录私有提交内存的使用情况,这称为 commitment 或 commit charge(提交量),即第一个数值,它代表系统中所有已提交虚拟内存的总和。还有一个系统级的限制,叫做 system commit limit 或简称 commit limit(提交限制),它等于当前所有分页文件的总大小,加上操作系统可使用的物理内存数量;这就是第二个数值。如果分页文件尚未达到其配置的最大大小,内存管理器可以通过扩展一个或多个分页文件来自动提高提交限制。本章后面将更详细地解释 commit charge 与 system commit limit。
(译注:经过实践发现,所有进程能够提交的虚拟内存总量受限于“system commit limit”的大小,即系统探测到的物理内存大小加上分页文件的大小。例如,在32位系统上,假设识别到的物理内存总量为3GB,并且设置了初始大小为2GB的分页文件,那么system commit limit的值为5GB;如果当前运行的所有进程已提交的虚拟内存总和达到这个限制,新的进程将无法运行,windows会给出“页面文件太小,无法执行应用程序”的错误提示。例如,运行vmware至少需要2GB的空闲system commit,以前面的5GB上限为例,如果当前的system commit已经使用了3GB,那么将无法运行vmware,即便勉强启动程序,系统的响应速度也会变得异常缓慢。此时可以通过增大页面文件的默认最小值来扩大“交换区”的大小,好让更多当前用不到的物理内存页能够交换到磁盘[因为磁盘的页面文件交换区已经扩大],从而给vmware的驻留内存准备更多的空间。配置的方法:在计算机上右击 -> 属性 -> 高级系统设置 -> “高级”选项卡 -> 单击“性能”栏目的“设置”,再切换到“高级”选项卡,单击“虚拟内存”栏目的“更改”。“虚拟内存”这个用词可能会产生误导,实际上它就是用来配置磁盘页面文件大小的。
使用 Sysinternals 的进程浏览器,单击最上方工具栏右侧的第二个矩形区域,就能显示system commit的统计数据。另外还要注意一点:
由于32位windows的内核代码限制了系统识别的物理内存上限为4GB,再扣掉为硬件寻址[CPU芯片内的核心显卡共享系统内存,外围I/O设备自带的硬件缓冲区映射系统内存等等]保留的地址空间,实际能使用的仅有3GB左右的物理内存,下图给出为了寻址各种总线,总线控制器,以及充当集成显卡显存,而保留的内存范围例子:



因此无法通过增加物理内存的容量来提高 system commit 上限,只能通过增大页面交换文件;而推荐的页面文件大小通常是识别出的物理内存的 1.5~2 倍;
这意味着 32 位 windows 的页面文件大小约为 4.5~6GB,则 system commit limit 的值就为 7.5~9GB。相反,64 位 windows 支持 x64 处理器的 48 位虚拟地址模式[Intel x64 体系结构当前实现了 64 位虚拟地址中的 48 位,理论上支持 256TB 虚拟地址空间,但当时的 64 位 windows 仅使用其中的 44 位,因此虚拟地址空间上限为 16TB],并且能够识别远多于 4GB 的物理内存;而通过各种非官方渠道开启物理地址扩展[PAE 补丁]的 32 位 windows,可以使用 36 位物理地址,也就是最多能够识别 64GB 物理内存。所以在 64 位 windows 和 32 位 PAE windows 上,可以通过增加物理内存的容量来提高 system commit 上限,从而能够“同时”运行更多应用程序;一般而言,增大物理内存会比增大页面交换文件来得有效。)
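(译注补充:在代码中可以用 GlobalMemoryStatusEx 查询与“提交”相关的全局数字;下面是一个简单的假设性示例。注意 ullTotalPageFile/ullAvailPageFile 这两个成员的名称有一定误导性,它们实际对应的是系统提交限制与剩余可提交量,而不是字面意义上的分页文件大小:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORYSTATUSEX ms = { sizeof(ms) };   // 第一个成员 dwLength 必须先填好
    if (!GlobalMemoryStatusEx(&ms)) return 1;

    printf("物理内存总量:    %llu MB\n", ms.ullTotalPhys / (1024 * 1024));
    printf("系统提交限制:    %llu MB\n", ms.ullTotalPageFile / (1024 * 1024));
    printf("剩余可提交数量:  %llu MB\n", ms.ullAvailPageFile / (1024 * 1024));
    return 0;
}
)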


Locking Memory
In general, it’s better to let the memory manager decide which pages remain in physical memory.
However, there might be special circumstances where it might be necessary for an application or device driver to lock pages in physical memory. Pages can be locked in memory in two ways:
■ Windows applications can call the VirtualLock function to lock pages in their process working set. Pages locked using this mechanism remain in memory until explicitly unlocked or until the process that locked them terminates. The number of pages a process can lock can’t exceed its minimum working set size minus eight pages. Therefore, if a process needs to lock more pages, it can increase its working set minimum with the SetProcessWorkingSetSizeEx function (referred to in the section “Working Set Management”).
■ Device drivers can call the kernel-mode functions MmProbeAndLockPages, MmLockPagableCodeSection, MmLockPagableDataSection, or
MmLockPagableSectionByHandle. Pages locked using this mechanism remain in memory until explicitly unlocked. The last three of these APIs enforce no quota on the number of pages that can be locked in memory because the resident available page charge is obtained when the driver first loads; this ensures that it can never
cause a system crash due to overlocking. For the first API, quota charges must be obtained or the API will return a failure status.

锁定内存
一般而言,最好让内存管理器决定哪些页面保留在物理内存中。然而,可能出现应用程序或设备驱动程序有必要在物理内存中锁定页面的特殊情况。有2种途径可以将页面锁在内存中:
■ Windows 应用程序可以调用 VirtualLock 函数,在它们的进程工作集中锁定页面。用这种机制锁定的页面会一直保留在内存中,直到被显式解锁,或者直到锁定它们的进程终止。一个进程可以锁定的页面数量不能超过它的最小工作集大小减去 8 个页面。因此,如果一个进程需要锁定更多页面,它可以通过 SetProcessWorkingSetSizeEx 函数增大它的最小工作集(该函数在“工作集管理”一节中会提到)。
■ 对于设备驱动程序,可以调用内核模式函数MmProbeAndLockPages,MmLockPagableCodeSection,MmLockPagableDataSection,或者 MmLockPagableSectionByHandle。
用这种机制锁定的页面会一直保留在内存中,直到被显式解锁。后三个 API 对可锁定在内存中的页面数量不施加配额限制,因为所需的“驻留可用页面”配额(charge)在驱动程序首次加载时就已经获取;这确保了它们绝不会因为锁定过多页面而导致系统崩溃。而对于第一个 API(MmProbeAndLockPages),则必须先成功获得配额,否则该 API 会返回失败状态。
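(译注补充:针对上面第一种途径,下面给出一个用户模式的示意程序(假设性示例),先用 SetProcessWorkingSetSize(SetProcessWorkingSetSizeEx 的简化版本)增大工作集,再用 VirtualLock 锁定页面:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    const SIZE_T size = 16 * 4096;        // 想要锁定 16 个 4 KB 页面(64 KB)

    // 先增大工作集上下限,否则可锁定的页面数受
    // “最小工作集大小减 8 页”的限制
    if (!SetProcessWorkingSetSize(GetCurrentProcess(),
                                  4 * 1024 * 1024,      // 新的最小工作集:4 MB
                                  16 * 1024 * 1024)) {  // 新的最大工作集:16 MB
        printf("SetProcessWorkingSetSize 失败:%lu\n", GetLastError());
    }

    void *buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (buf == NULL) return 1;

    if (VirtualLock(buf, size)) {
        printf("已将 %Iu 字节锁定在物理内存中\n", size);
        VirtualUnlock(buf, size);         // 显式解锁
    } else {
        printf("VirtualLock 失败:%lu\n", GetLastError());
    }

    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}
)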


Allocation Granularity
Windows aligns each region of reserved process address space to begin on an integral boundary defined by the value of the system allocation granularity, which can be retrieved from the Windows GetSystemInfo or GetNativeSystemInfo function. This value is 64 KB, a granularity that is used by the memory manager to efficiently allocate metadata (for example, VADs, bitmaps, and so on) to support various process operations. In addition, if support were added for future processors with larger page
sizes (for example, up to 64 KB) or virtually indexed caches that require systemwide physical-to-virtual page alignment, the risk of requiring changes to applications that made assumptions about allocation alignment would be reduced.

分配粒度
Windows 将每个保留的进程地址空间区域对齐到由“系统分配粒度”值定义的整数倍边界上,该值可以通过 Windows 函数 GetSystemInfo 或 GetNativeSystemInfo 获取。
这个值为 64KB。内存管理器利用这一粒度来高效地分配支持各种进程操作所需的元数据(例如 VAD,位图等)。此外,如果将来需要支持具有更大页面尺寸(例如最大 64KB)的处理器,或者支持要求全系统范围内物理页到虚拟页对齐的虚拟索引缓存(virtually indexed cache),那么那些对分配对齐方式做了假设的应用程序需要被修改的风险也会随之降低。


Note Windows kernel-mode code isn’t subject to the same restrictions; it can reserve memory on a single-page granularity (although this is not exposed to device drivers for the reasons detailed earlier). This level of granularity is primarily used to pack TEB allocations more densely, and because this mechanism is internal only, this code can easily be changed if a future platform requires different values. Also, for the purposes of supporting 16-bit and MS-DOS applications on x86 systems only, the memory manager provides the MEM_DOS_LIM flag to the MapViewOfFileEx API, which is used to force the use of single page
granularity.

注意   Windows 内核模式代码不受相同的限制;内核代码可以按单页粒度保留内存(尽管由于稍早详细说明过的原因,这一能力并没有暴露给设备驱动程序)。这个级别的粒度主要用于把 TEB 的分配排布得更加紧密;并且由于该机制仅在内部使用,如果将来的平台需要不同的值,相关代码可以很容易地修改。同时,仅为了在 x86 系统上支持 16 位和 MS-DOS 应用程序,内存管理器为 MapViewOfFileEx API 提供了 MEM_DOS_LIM 标志,用于强制使用单页粒度。

Finally, when a region of address space is reserved, Windows ensures that the size and base of the region is a multiple of the system page size, whatever that might be. For example, because x86 systems use 4-KB pages, if you tried to reserve a region of memory 18 KB in size, the actual amount reserved on an x86 system would be 20 KB. If you specified a base address of 3 KB for an 18-KB
region, the actual amount reserved would be 24 KB. Note that the VAD for the allocation would then also be rounded to 64-KB alignment/length, thus making the remainder of it inaccessible. (VADs will be described later in this chapter.)

最后,当保留地址空间中的一个区域时,Windows 会确保该区域的大小和基址都是系统页面大小的整数倍(无论页面大小是多少)。举例来说,由于 x86 系统使用 4KB 页面,如果你尝试保留一个 18KB 的内存区域,在 x86 系统上实际保留的大小将是 20KB;如果你为这个 18KB 的区域指定 3KB 的基址,实际保留的范围将是 24KB。注意,该分配对应的 VAD 随后也会按 64KB 的对齐/长度取整,从而使其余部分无法访问。(本章稍后将介绍 VAD。)
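(译注补充:下面的示意程序(假设性示例)查询系统页面大小与分配粒度,并观察按 18 KB 请求保留内存时返回的基址是按 64 KB 分配粒度对齐的:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    printf("页面大小(dwPageSize):           %lu 字节\n", si.dwPageSize);
    printf("分配粒度(dwAllocationGranularity):%lu 字节\n", si.dwAllocationGranularity);

    // 请求保留 18 KB:返回的基址按分配粒度(64 KB)对齐,
    // 实际保留的大小向上取整为页面大小的整数倍
    void *p = VirtualAlloc(NULL, 18 * 1024, MEM_RESERVE, PAGE_NOACCESS);
    if (p != NULL) {
        printf("请求保留 18 KB,返回基址 %p\n", p);
        VirtualFree(p, 0, MEM_RELEASE);
    }
    return 0;
}
)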

Shared Memory and Mapped Files
As is true with most modern operating systems, Windows provides a mechanism to share memory among processes and the operating system. Shared memory can be defined as memory that is visible to more than one process or that is present in more than one process virtual address space. For example, if two processes use the same DLL, it would make sense to load the referenced code pages for that DLL into physical memory only once and share those pages between all processes that map the DLL, as illustrated in Figure 10-1.

共享内存与文件映射
与多数现代操作系统一样,Windows 提供一种机制用于在进程和操作系统之间共享内存。共享内存可以被定义为:多于一个进程可见的,或者存在于多个进程虚拟地址空间中的内存。举例来说,如果 2 个进程使用相同的 DLL,合理的做法是,只需要将该 DLL 中被引用的代码页加载进物理内存一次,并且在所有映射该 DLL 的进程间共享那些页面,如图 10-1 所示:



Each process would still maintain its private memory areas in which to store private data, but the DLL code and unmodified data pages could be shared without harm. As we’ll explain later, this kind of sharing happens automatically because the code pages in executable images (.exe and .dll files, and
several other types like screen savers (.scr), which are essentially DLLs under other names) are mapped as execute-only and writable pages are mapped as copy-on-write. (See the section “Copy-on-Write” for more information.)
The underlying primitives in the memory manager used to implement shared memory are called section objects, which are exposed as file mapping objects in the Windows API. The internal structure and implementation of section objects are described in the section “Section Objects” later in this chapter.
This fundamental primitive in the memory manager is used to map virtual addresses, whether in main memory, in the page file, or in some other file that an application wants to access as if it were in memory. A section can be opened by one process or by many; in other words, section objects don’t necessarily equate to shared memory.

每个进程仍然维护自己的私有内存区域,在其中存储私有数据,但 DLL 的代码页面和未被修改过的数据页面可以被无害地共享。正如我们稍后将解释的,这种共享会自动发生,因为可执行映像(.exe 和 .dll 文件,以及其它几种本质上是换了名字的 DLL 的文件类型,例如屏幕保护程序[.scr])中的代码页被映射为仅执行,而可写页被映射为写时复制。(更多内容参见“写时复制”一节)
内存管理器中用来实现共享内存的底层原语叫做 section objects(暂译为"section 对象" ),它作为(或通过)Windows API 中的文件映射对象对外暴露。
本章后面的“Section Objects”小节将解释 section 对象的内部结构与实现。
内存管理器中的这个基本原语用于映射虚拟地址,无论是在主存中,分页文件中,或者在应用程序要访问的一些其它文件中,通过 section 对象,在进程看来就好像在内存中一样。一个 section 可以被一个或多个进程开启;换句话说,section 对象不一定等同于共享内存


A section object can be connected to an open file on disk (called a mapped file) or to committed memory (to provide shared memory). Sections mapped to committed memory are called page-filebacked sections because the pages are written to the paging file (as opposed to a mapped file) if demands on physical memory require it. (Because Windows can run with no paging file, page-filebacked
sections might in fact be “backed” only by physical memory.) As with any other empty page that is made visible to user mode (such as private committed pages), shared committed pages are always zero-filled when they are first accessed to ensure that no sensitive data is ever leaked.

一个 section 对象可以被连接到一个打开的磁盘文件(叫做映射文件),或者连接到提交的内存(用于提供共享内存);后者叫做 page-file-backed sections(页面文件备份 sections),这是由于在物理内存紧张需要换出时,这些页面会被写入分页文件(而不是写入某个映射文件)。(由于 Windows 可以在完全不使用分页文件的情况下运行,页面文件备份 sections 实际上可能仅由物理内存来"备份")与任何其它对用户模式可见的空页面(例如私有提交页面)一样,共享的提交页面在首次被访问时总是用零填充,以确保不会泄漏任何敏感数据。

To create a section object, call the Windows CreateFileMapping or CreateFileMappingNuma function, specifying the file handle to map it to (or INVALID_HANDLE_VALUE for a page-file-backed section) and optionally a name and security descriptor. If the section has a name, other processes
can open it with OpenFileMapping. Or you can grant access to section objects through either handle inheritance (by specifying that the handle be inheritable when opening or creating the handle) or handle duplication (by using DuplicateHandle). Device drivers can also manipulate section objects with the ZwOpenSection, ZwMapViewOfSection, and ZwUnmapViewOfSection functions.

使用 Windows 函数 CreateFileMapping 或 CreateFileMappingNuma 创建 section 对象,指定要映射到的文件句柄(对于页面文件备份 sections,则指定 INVALID_HANDLE_VALUE),以及可选的名称和安全描述符。如果该 section 有名称,其它进程可以通过 OpenFileMapping 函数将其打开。或者,你也可以通过句柄继承(在打开或创建句柄时指定该句柄为可继承的)或句柄复制(使用 DuplicateHandle 函数)来授予对该 section 对象的访问。设备驱动程序也可以使用 ZwOpenSection、ZwMapViewOfSection 以及 ZwUnmapViewOfSection 等函数操纵 section 对象。
(译注:CreateFileMapping 函数原型如下:
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]HANDLE CreateFileMapping(
  HANDLE hFile,
  LPSECURITY_ATTRIBUTES lpFileMappingAttributes,
  DWORD flProtect,
  DWORD dwMaximumSizeHigh,
  DWORD dwMaximumSizeLow,
  LPCTSTR lpName 
);[/COLOR][/SIZE][/FONT]

第一个参数就是要映射到的文件句柄,当其值为 INVALID_HANDLE_VALUE(即无效的句柄值)时,实际上是创建了用于实现共享内存的内核对象(section 对象),因为没有任何文件句柄与一个打开的磁盘文件关联;第二个参数是一个指向 SECURITY_ATTRIBUTES 结构的指针,该结构定义如下:
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]typedef struct _SECURITY_ATTRIBUTES {
  DWORD  nLength;
  LPVOID lpSecurityDescriptor;      //只有该成员与安全性有关
  BOOL   bInheritHandle;
} SECURITY_ATTRIBUTES, *PSECURITY_ATTRIBUTES, *LPSECURITY_ATTRIBUTES;[/COLOR][/SIZE][/FONT]

其中第二个成员 lpSecurityDescriptor 就是原文提及的"安全描述符";如果我们想对创建的内核对象施加访问控制,就必须先构造一个安全描述符(下面示例中的 pSD),再分配并初始化 SECURITY_ATTRIBUTES 结构:
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]SECURITY_ATTRIBUTES sa;
sa.nLength = sizeof(sa);
sa.lpSecutrtyDescripter = pSD;
sa.bInheritHandle = FALSE;[/COLOR][/SIZE][/FONT]

调用创建内核对象的函数时,将 SECURITY_ATTRIBUTES 结构变量的地址作为第二个实参传入:
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]HANDLE hFileMapping = CreateFileMapping(INVALID_HANDLE_VALUE, &sa, PAGE_READWRITE, 0, 1024, TEXT("MyFileMapping"));[/COLOR][/SIZE][/FONT]

最后一个参数就是原文提及的可选的 section 名称,使用 TEXT 宏的目的在于根据编译器的UNICODE 设置情况,自动转换名称为 ASCII/ANSI 或 unicode 字符串。原文提到,如果有section 名称,其它进程可以通过 OpenFileMapping 函数将其打开,也就是如下调用:
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]HANDLE hFileMapping = OpenFileMapping(FILE_MAP_READ, FALSE, TEXT("MyFileMapping"));[/COLOR][/SIZE][/FONT]

将 FILE_MAP_READ 作为第一个参数传给 OpenFileMapping,表明在获得对这个内核对象的访问权后,要从中读取数据。OpenFileMapping 函数在返回一个有效的句柄值前,会先执行一次安全检查。如果当前登录的用户或以特定用户帐户身份执行的进程被允许访问该文件映射内核对象(section 对象),OpenFileMapping 返回一个有效的句柄值;如果访问被拒绝,则返回 NULL;此时调用 GetLastError,返回值为 5(ERROR_ACCESS_DENIED)。同样的,如果利用返回的有效句柄调用其它 Windows API,但被调函数需要的权限不是 FILE_MAP_READ,也会发生拒绝访问错误。
下面的例子中,父进程在创建一个互斥量内核对象时,使用了句柄继承:
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]SECURITY_ATTRIBUTES sa;
sa.nLength = sizeof(sa);
sa.lpSecutrtyDescripter = NULL;
sa.bInheritHandle = TRUE;

HANDLE hMutex = CreateMutex(&sa, FALSE, NULL);[/COLOR][/SIZE][/FONT]

根据 MSDN 上相关内容的解释,在调用需要 SECURITY_ATTRIBUTES 结构实例作为参数的函数时(多数创建内核对象的函数都要求此参数),如果将该参数设置为 NULL,或者将其 lpSecurityDescriptor 成员设置为 NULL(如上述代码所示),那么创建的对象将获得默认的安全描述符,其 ACL(访问控制列表)来自创建者的主令牌或模拟令牌。
其次,如果将 SECURITY_ATTRIBUTES 结构的 bInheritHandle 成员设置为 TRUE,然后向 CreateMutex() 传入这个结构实例的地址,那么该函数返回的互斥量句柄值是可继承的。如此一来,在父进程的内核对象句柄表中将添加一项互斥量句柄值,其“可继承的标志位” = 1。
接着,父进程调用 CreateProcess() 创建子进程时,可以把前面的句柄值转换成字符串,通过 CreateProcess() 的第2个参数 lpCommandLine 传给子进程(让子进程知道应使用哪个句柄值),并且把第5个参数 bInheritHandles 的值设为 TRUE(表明"主动"请求继承;反之如果 bInheritHandles = FALSE,则无论 SECURITY_ATTRIBUTES 结构的 bInheritHandle 成员值为何,都不会发生继承),这样子进程就会继承父进程句柄表中"可继承标志位 = 1"的句柄。然后父子进程就可以使用相同的句柄值访问同一个互斥量对象。CreateProcess() 函数原型如下:
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]BOOL WINAPI CreateProcess(
  _In_opt_       LPCTSTR                                     lpApplicationName,
  _Inout_opt_  LPTSTR                                       lpCommandLine,
  _In_opt_       LPSECURITY_ATTRIBUTES           lpProcessAttributes,
  _In_opt_       LPSECURITY_ATTRIBUTES           lpThreadAttributes,
  _In_              BOOL                                          bInheritHandles,
  _In_              DWORD                                      dwCreationFlags,
  _In_opt_       LPVOID                                       lpEnvironment,
  _In_opt_       LPCTSTR                                     lpCurrentDirectory,
  _In_              LPSTARTUPINFO                         lpStartupInfo,
  _Out_            LPPROCESS_INFORMATION        lpProcessInformation
);
[/COLOR][/SIZE][/FONT]

CreateProcess() 是一个复杂的函数,它拥有 10 个参数,每个参数都足够复杂,因此想要用好 CreateProcess() 并不容易。详细用法可以参考 MSDN 网站上的原文;日后有机会再发一帖该原文的原创翻译。
句柄继承导致增加内核对象的使用计数,因为父子进程引用相同的对象;仅当父子进程都调用 CloseHandle() 关闭引用该对象的句柄,或者父子进程都退出,让使用计数 = 0,该对象才会实际在内核空间中被销毁。强调一下,如果此后父进程又创建新的内核对象并将其设置为可继承的,现存子进程并不会自动继承新的对象,句柄继承仅发生在 CreateProcess() 调用并传入了相关参数时。
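(译注:下面给出一个父进程端句柄继承的最小示意代码,综合了前文 SECURITY_ATTRIBUTES 与 CreateProcess() 的用法;其中子进程的文件名 child.exe 只是假设,实际项目中子进程还需要自行从命令行中解析出句柄值再使用:)
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]#include <windows.h>
#include <tchar.h>

int main(void)
{
    // 创建一个可继承的互斥量句柄:句柄表项的"可继承"标志位 = 1
    SECURITY_ATTRIBUTES sa;
    sa.nLength = sizeof(sa);
    sa.lpSecurityDescriptor = NULL;   // 使用默认安全性
    sa.bInheritHandle = TRUE;
    HANDLE hMutex = CreateMutex(&sa, FALSE, NULL);
    if (hMutex == NULL)
        return 1;

    // 把句柄值格式化进子进程的命令行,让子进程知道应当使用哪个句柄值
    TCHAR cmdLine[MAX_PATH];
    _stprintf_s(cmdLine, MAX_PATH, TEXT("child.exe %p"), hMutex);

    STARTUPINFO si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    // 第5个参数 bInheritHandles 必须为 TRUE,继承才会真正发生
    if (CreateProcess(NULL, cmdLine, NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi))
    {
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
    CloseHandle(hMutex);
    return 0;
}[/COLOR][/SIZE][/FONT]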


A section object can refer to files that are much larger than can fit in the address space of a process.
(If the paging file backs a section object, sufficient space must exist in the paging file and/or RAM to contain it.) To access a very large section object, a process can map only the portion of the section object that it requires (called a view of the section) by calling the MapViewOfFile, MapViewOfFileEx, or MapViewOfFileExNuma function and then specifying the range to map. Mapping views permits processes to conserve address space because only the views of the section object needed at the time must be mapped into memory.

一个 section 对象可以引用远大于一个进程地址空间所能容纳的文件。(如果由分页文件来备份一个 section 对象,分页文件以及/或者物理内存中必须有足够的空间来容纳它。)如果进程要访问一个非常大的 section 对象,可以只映射其实际需要的那一部分(称为该 section 的一个视图),这通过调用 MapViewOfFile、MapViewOfFileEx 或者 MapViewOfFileExNuma 函数并指定要映射的范围来完成。映射视图允许进程节约地址空间,因为只有当时需要的那些 section 对象视图才必须被映射进内存。
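(译注:下面是映射"节视图"的一个最小示意代码,演示如何只映射大文件中的一小部分;文件路径、偏移量和视图大小都是假设值,仅用于说明 MapViewOfFile 的偏移参数用法。注意视图的起始偏移必须按系统分配粒度(通常为 64KB)对齐:)
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE hFile = CreateFile(TEXT("C:\\temp\\huge.dat"), GENERIC_READ,
                              FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return 1;

    // 最大大小传 0 表示使用文件的当前大小
    HANDLE hMapping = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hMapping == NULL)
        return 1;

    // 只映射从 1GB 偏移处开始的 64MB 作为视图,而不是映射整个文件
    ULONGLONG offset  = 1ULL << 30;          // 1 GB
    SIZE_T    viewLen = 64 * 1024 * 1024;    // 64 MB
    const unsigned char *view = (const unsigned char *)MapViewOfFile(
        hMapping, FILE_MAP_READ,
        (DWORD)(offset >> 32), (DWORD)(offset & 0xFFFFFFFF), viewLen);
    if (view == NULL)
        return 1;

    printf("file[1GB] = 0x%02x\n", view[0]);

    UnmapViewOfFile(view);
    CloseHandle(hMapping);
    CloseHandle(hFile);
    return 0;
}[/COLOR][/SIZE][/FONT]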

Windows applications can use mapped files to conveniently perform I/O to files by simply making them appear in their address space. User applications aren’t the only consumers of section objects: the image loader uses section objects to map executable images, DLLs, and device drivers into memory, and the cache manager uses them to access data in cached files. (For information on how
the cache manager integrates with the memory manager, see Chapter 11, “Cache Manager.”) The implementation of shared memory sections, both in terms of address translation and the internal data structures, is explained later in this chapter.

Windows 应用程序可以使用映射文件,通过简单地让它们出现在调用进程地址空间中,方便对其执行 I/O 操作。用户应用程序并不是唯一的 section 对象“消费者”:映像加载器也使用 section 对象来将可执行映像,DLLs,以及设备驱动程序映射进内存,还有缓存管理器也使用 section 对象访问缓存文件中的数据。(更多有关缓存管理器如何与内存管理器集成在一起的信息,参见第11章“缓存管理”)
本章后面会解释共享内存部分的实现,包括地址翻译和内部数据结构两方面。


EXPERIMENT: Viewing Memory Mapped Files
You can list the memory mapped files in a process by using Process Explorer from Sysinternals.
To view the memory mapped files by using Process Explorer, configure the lower pane to show the DLL view. (Click on View, Lower Pane View, DLLs.) Note that this is more than just a list of DLLs—it represents all memory mapped files in the process address space. Some of these are DLLs, one is the image file (EXE) being run, and additional entries might represent memory mapped data files.
For example, the following display from Process Explorer shows a WinDbg process using
several different memory mappings to access the memory dump file being examined. Like most Windows programs, it (or one of the Windows DLLs it is using) is also using memory mapping to access a Windows data file called Locale.nls, which is part of the internationalization support in Windows. You can also search for memory mapped files by clicking Find, DLL. This can be useful when trying to determine which process(es) are using a DLL or a memory mapped file that you are trying to replace.

实验:查看内存映射文件
你可以通过使用来自 Sysinternals 的进程浏览器(Process Explorer),列出一个进程中的内存映射文件。要这么做,将进程浏览器的下窗格配置成显示"DLL view"(点击主菜单中的"View"->"Lower Pane View"->"DLLs")。注意,DLL view 不仅仅是一份 DLL 列表,它代表的是该进程地址空间中的所有内存映射文件。其中一些是 DLL,有一个是正在运行进程对应的磁盘映像文件(EXE),其它项目则可能代表内存映射数据文件。举例来说,下面的进程浏览器截图显示出一个 WinDbg 进程使用几种不同的内存映射来访问被检查的内存转储文件。如同多数 Windows 程序,该进程(或者它正使用的某个 Windows DLL)也使用内存映射来访问一个叫做 Locale.nls 的 Windows 数据文件;Locale.nls 是 Windows 国际化支持的组成部分。你也可以通过点击主菜单"Find"->"DLL"来搜索内存映射文件。当你试图确定哪个(或哪些)进程正在使用你想要替换的某个 DLL 或内存映射文件时,这会很有帮助。




Protecting Memory
As explained in Chapter 1, “Concepts and Tools,” in Part 1, Windows provides memory protection so that no user process can inadvertently or deliberately corrupt the address space of another process or of the operating system. Windows provides this protection in four primary ways.
First, all systemwide data structures and memory pools used by kernel-mode system components can be accessed only while in kernel mode—user-mode threads can’t access these pages. If they attempt to do so, the hardware generates a fault, which in turn the memory manager reports to the thread as an access violation.

内存保护
如同在本书上册第一章"概念和工具"中解释的,Windows 提供内存保护机制,使得任何用户进程都无法无意或故意地破坏另一个进程或操作系统的地址空间。Windows 通过 4 种主要方式提供这种保护。
第一,所有内核模式系统组件使用的系统级数据结构和内存池只能在内核模式下被访问,用户模式线程不能访问这些页面。如果它们试图这么做,硬件会产生一个错误(fault),内存管理器进而将其作为非法访问(access violation)报告给该线程。


Second, each process has a separate, private address space, protected from being accessed by any thread belonging to another process. Even shared memory is not really an exception to this because each process accesses the shared regions using addresses that are part of its own virtual address space. The only exception is if another process has virtual memory read or write access to the process
object (or holds SeDebugPrivilege) and thus can use the ReadProcessMemory or WriteProcessMemory function. Each time a thread references an address, the virtual memory hardware, in concert with the memory manager, intervenes and translates the virtual address into a physical one. By controlling how virtual addresses are translated, Windows can ensure that threads running in one process don’t
inappropriately access a page belonging to another process.
Third, in addition to the implicit protection virtual-to-physical address translation offers, all processors supported by Windows provide some form of hardware-controlled memory protection (such as read/write, read-only, and so on); the exact details of such protection vary according to the processor.
For example, code pages in the address space of a process are marked read-only and are thus protected from modification by user threads.
Table 10-2 lists the memory protection options defined in the Windows API. (See the VirtualProtect, VirtualProtectEx, VirtualQuery, and VirtualQueryEx functions.)

第二,每个进程都有一个独立、私有的地址空间,防止被属于另一进程的任何线程访问。甚至共享内存也不算真正的例外,因为每个进程是使用其自身虚拟地址空间中的地址来访问共享区域的。唯一的例外是,如果另一个进程对该进程对象拥有虚拟内存读或写访问权限(或者持有 SeDebugPrivilege 特权),就可以使用 ReadProcessMemory 或 WriteProcessMemory 函数。每当一个线程引用一个地址,虚拟内存硬件(译注:通常是处理器内置的 MMU,即内存管理单元)就会与内存管理器配合,介入其中并把虚拟地址翻译成物理地址。通过控制虚拟地址被翻译的方式,Windows 可以确保运行在一个进程中的线程不会不当地访问属于另一个进程的页面。
第三,除了虚拟到物理地址翻译所提供的隐式保护之外,所有 Windows 支持的处理器都提供某种形式的硬件控制的内存保护(例如读/写、只读等等);这种保护的具体细节因处理器而异。
例如,进程地址空间中的代码页被标记为只读,因而能够防止被用户线程修改。
表10-2列出了Windows API 中定义的内存保护选项(参见 VirtualProtect,VirtualProtectEx,VirtualQuery,以及VirtualQueryEx函数的说明文档)






And finally, shared memory section objects have standard Windows access control lists (ACLs) that are checked when processes attempt to open them, thus limiting access of shared memory to those processes with the proper rights. Access control also comes into play when a thread creates a section
to contain a mapped file. To create the section, the thread must have at least read access to the underlying file object or the operation will fail.
Once a thread has successfully opened a handle to a section, its actions are still subject to the memory manager and the hardware-based page protections described earlier. A thread can change the page-level protection on virtual pages in a section if the change doesn’t violate the permissions in the ACL for that section object. For example, the memory manager allows a thread to change the pages of a read-only section to have copy-on-write access but not to have read/write access. The copy-on-write access is permitted because it has no effect on other processes sharing the data.

最后,共享内存 section 对象拥有标准的 Windows 访问控制列表(ACL),当进程尝试打开它们时会检查这些 ACL,从而把对共享内存的访问限制在拥有适当权限的那些进程范围内。当一个线程创建一个用来包含映射文件的 section 时,访问控制同样会发挥作用:该线程必须至少拥有对底层文件对象的读访问权限,否则操作将失败。
一旦线程成功地打开了一个指向 section 的句柄,其行为仍旧受到内存管理器以及前面描述的基于硬件的页面保护的约束。一个线程可以更改 section 中虚拟页的页级保护属性,前提是这个更改没有违反该 section 对象 ACL 中的权限。举例来说,内存管理器允许一个线程把只读 section 中的页面改为写时复制访问,但不允许改为读/写访问。允许改为写时复制访问,是因为这对共享这些数据的其它进程没有影响。(译注:即前文讲过的,在线程请求写入时,创建一个只有该线程可见的私有副本页,对其进行写操作不会改变原始的只读共享页)


No Execute Page Protection
No execute page protection (also referred to as data execution prevention, or DEP) causes an attempt to transfer control to an instruction in a page marked as “no execute” to generate an access fault.
This can prevent certain types of malware from exploiting bugs in the system through the execution of code placed in a data page such as the stack. DEP can also catch poorly written programs that don’t correctly set permissions on pages from which they intend to execute code. If an attempt is made in kernel mode to execute code in a page marked as no execute, the system will crash with the ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY bugcheck code. (See Chapter 14, “Crash Dump Analysis,” for an explanation of these codes.) If this occurs in user mode, a STATUS_ACCESS_VIOLATION (0xc0000005) exception is delivered to the thread attempting the illegal reference.
If a process allocates memory that needs to be executable, it must explicitly mark such pages by specifying the PAGE_EXECUTE, PAGE_EXECUTE_READ,
PAGE_EXECUTE_READWRITE, or PAGE_EXECUTE_WRITECOPY flags on the page granularity memory allocation functions.

不可执行页保护
不可执行页保护(也被称为数据执行保护,或DEP)导致试图转移(对CPU的)控制到一个标记为“不可执行”页中的指令时,生成一个访问错误。这可以防止某些类型的恶意软件通过将可执行代码放置在比如栈这种数据页中,从而利用系统中的缺陷或漏洞。DEP也能够捕捉到那些编写不当的程序,这些程序没有为它们打算从中执行代码的页面设置正确的权限。假设在内核模式下尝试在一个标记为不可执行的页中执行代码,系统会崩溃,并且错误检查码为ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY。(参见第14章,“崩溃转储分析”,其中有对这些代码的解释)
如果上述情况发生在用户模式,会传递一个STATUS_ACCESS_VIOLATION(0xc0000005)异常到尝试非法引用的线程。如果一个进程申请分配的内存需要能够被执行,进程必须通过在页面粒度内存分配函数中指定PAGE_EXECUTE,PAGE_EXECUTE_READ,PAGE_EXECUTE_READWRITE,或者PAGE_EXECUTE_WRITECOPY标志,来显式标记此类页面。
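(译注:下面的示意代码演示"显式把页面标记为可执行"的典型做法:先以可读写保护分配并写入代码,再用 VirtualProtect 改为可执行后调用。示例中的机器码只有一条 ret 指令(0xC3,x86/x64 通用),仅用于说明流程:)
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]#include <windows.h>
#include <string.h>

static const unsigned char code[] = { 0xC3 };   // ret

int main(void)
{
    SIZE_T size = sizeof(code);

    // 先以 PAGE_READWRITE 分配,写入要执行的代码
    void *mem = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (mem == NULL)
        return 1;
    memcpy(mem, code, size);

    // 再把保护改为可执行;若启用了 DEP 而页面未标记为可执行,
    // 直接跳转到该页将触发访问违例(用户模式下为 0xC0000005)
    DWORD oldProtect;
    if (!VirtualProtect(mem, size, PAGE_EXECUTE_READ, &oldProtect))
        return 1;
    FlushInstructionCache(GetCurrentProcess(), mem, size);

    ((void (*)(void))mem)();        // 此时可以执行该页中的代码

    VirtualFree(mem, 0, MEM_RELEASE);
    return 0;
}[/COLOR][/SIZE][/FONT]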


On 32-bit x86 systems that support DEP, bit 63 in the page table entry (PTE) is used to mark a page as nonexecutable. Therefore, the DEP feature is available only when the processor is running in Physical Address Extension (PAE) mode, without which page table entries are only 32 bits wide. (See the
section “Physical Address Extension (PAE)” later in this chapter.) Thus, support for hardware DEP on 32-bit systems requires loading the PAE kernel (%SystemRoot%\System32\Ntkrnlpa.exe), even if that system does not require extended physical addressing (for example, physical addresses greater than 4 GB). The operating system loader automatically loads the PAE kernel on 32-bit systems that support hardware DEP. To force the non-PAE kernel to load on a system that supports hardware DEP, the BCD option nx must be set to AlwaysOff, and the pae option must be set to ForceDisable.

在支持 DEP 的 32 位 x86 系统上,页表条目(PTE)中的比特位 63(以及页目录项 PDE 中的比特位 63)用于标记一个页(对于 PDE 而言,则是该 PDE 所引用的所有页)是否为不可执行。因此,只有当处理器以物理地址扩展(PAE)模式运行时,DEP 功能才可用;如果不使用 PAE,页表条目仅有 32 位宽,没有位置容纳该标志位。(参见本章后面的"物理地址扩展(PAE)"一节。)于是,要在 32 位系统上支持硬件 DEP,就需要加载 PAE 内核(即 %SystemRoot%\System32\Ntkrnlpa.exe),即使该系统并不需要扩展的物理寻址能力(例如大于 4GB 的物理地址)。在支持硬件 DEP 的 32 位系统上,操作系统加载器会自动载入 PAE 内核。要在支持硬件 DEP 的系统上强制加载非 PAE 内核,BCD 选项 nx 必须设置成 AlwaysOff,并且 pae 选项必须设置成 ForceDisable。(译注:以管理员身份打开 CMD 命令提示符,执行 bcdedit 命令,即可显示当前 BCD 选项 nx 的值,并对其进行更改;另外,该命令也可以强制启用或禁用操作系统的 PAE 功能,如下图所示)

此外,还需要检查注册表项HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management,其右侧名称为 PhysicalAddressExtension 的 REG_DWORD 类型值是否为16进制的1,1表示启用PAE,0表示禁用PAE。最后,当IA32_EFER 寄存器的 NXE 标志置1,并且PDE/PTE 的比特位63(XD,第64位)置1时,就无法从被引用的页中取指令,这些页可能用于栈,数据段,以及堆。


On 64-bit versions of Windows, execution protection is always applied to all 64-bit processes and device drivers and can be disabled only by setting the nx BCD option to AlwaysOff. Execution protection for 32-bit programs depends on system configuration settings, described shortly. On 64-bit Windows, execution protection is applied to thread stacks (both user and kernel mode), usermode
pages not specifically marked as executable, kernel paged pool, and kernel session pool (for a description of kernel memory pools, see the section “Kernel-Mode Heaps (System Memory Pools).”
However, on 32-bit Windows, execution protection is applied only to thread stacks and user-mode pages, not to paged pool and session pool.

在64位版本的Windows上,所有的64位进程和设备驱动程序总是应用了执行保护机制,并且只能通过将 nx BCD 选项设置成 AlwaysOff 来禁用。而对于32位程序的执行保护则取决于系统配置的设置,这将在稍后描述。在64位Windows上,执行保护被应用于线程栈(包括用户与内核模式),没有明确标记为可执行的用户模式页面(译注:请参考表10-2),内核可分页池,以及内核会话池(参见本章后面的“内核模式堆[系统内存池]”小节。)然而,在32位Windows上,执行保护仅被应用于线程栈与用户模式页面,对于内核可分页池以及会话池,则没有应用执行保护。

The application of execution protection for 32-bit processes depends on the value of the BCD nx option. The settings can be changed by going to the Data Execution Prevention tab under Computer,
Properties, Advanced System Settings, Performance Settings. (See Figure 10-2.) When you configure no execute protection in the Performance Options dialog box, the BCD nx option is set to the appropriate value. Table 10-3 lists the variations of the values and how they correspond to the DEP settings tab. The registry lists 32-bit applications that are excluded from execution protection under the key
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers, with the value name being the full path of the executable and the data set to “DisableNXShowUI”.

对于32位进程的应用程序执行保护,取决于 BCD nx 选项的值。此设置可以通过如下操作来更改:在桌面上右击“计算机”图标->“属性”->“高级系统设置”->点击“高级”选项卡的“性能”栏目的“设置”->“数据执行保护”选项卡。(参见下图)
当您在性能选项对话框中配置不可执行保护时,BCD nx 选项会被设置成相应的值。表10-3列出了它的各种取值,以及它们与 DEP 设置选项卡的对应关系。在注册表键 HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers 下,列出了被排除在执行保护之外的 32 位应用程序,其中值的名称为该可执行文件的完整路径,而其数值数据被设置为"DisableNXShowUI"。








On Windows client versions (both 64-bit and 32-bit) execution protection for 32-bit processes is configured by default to apply only to core Windows operating system executables (the nx BCD option is set to OptIn) so as not to break 32-bit applications that might rely on being able to execute code in pages not specifically marked as executable, such as self-extracting or packed applications.
On Windows server systems, execution protection for 32-bit applications is configured by default to apply to all 32-bit programs (the nx BCD option is set to OptOut).

对于 Windows 的客户机版本(包括 64 位与 32 位),默认情况下,32 位进程的执行保护被配置为仅应用于核心的 Windows 操作系统可执行文件(nx BCD 选项被设置为 OptIn),以免破坏那些可能依赖于"能够在未明确标记为可执行的页中执行代码"的 32 位应用程序,例如自解压的或加壳的应用程序。(译注:查看其 PE 文件结构即可知道,某些加壳程序需要运行的脱壳代码可能就没有显式标记为可执行,而默认配置在方便一些正常自解压程序的同时,也会增加恶意代码被成功执行的风险)对于 Windows 的服务器版本,默认情况下,32 位应用程序的执行保护被配置为应用于所有 32 位程序(nx BCD 选项被设置为 OptOut)。

Note To obtain a complete list of which programs are protected, install the Windows Application Compatibility Toolkit (downloadable from www.microsoft.com) and run the Compatibility Administrator Tool. Click System Database, Applications, and then Windows Components. The pane at the right shows the list of protected executables.
注意    要获得一份受保护程序的完整列表,请安装从www.microsoft.com下载的Windows Application Compatibility Toolkit,然后运行Compatibility Administrator Tool,点击 “System Database”->“Applications”->“Windows Components”,右侧面板显示出受保护的可执行文件列表。

Even if you force DEP to be enabled, there are still other methods through which applications can disable DEP for their own images. For example, regardless of the execution protection options that are enabled, the image loader (see Chapter 3 in Part 1 for more information about the image loader)
will verify the signature of the executable against known copy-protection mechanisms (such as SafeDisc and SecuROM) and disable execution protection to provide compatibility with older copyprotected software such as computer games.

即便你强制启用DEP,应用程序仍然可以通过其它办法为其自身映像禁用DEP。举例来说,无论是否启用执行保护选项,映像加载器(参见本书上册第3章)都将针对已知的防拷贝机制(例如SafeDisc 和 SecuROM)验证可执行文件的签名,并且禁用执行保护,提供对较早的防拷贝软件,如计算机游戏等的兼容性。(译注:一个广为人知的例子是即时战略游戏“星际争霸”的硬盘版,如果没有破解补丁去除它的防拷贝机制,运行时会提示插入光盘,也就不是完美的硬盘版)

EXPERIMENT: Looking at DEP Protection on Processes
Process Explorer can show you the current DEP status for all the processes on your system,including whether the process is opted in or benefiting from permanent protection. To look at the DEP status for processes, right-click any column in the process tree, choose Select Columns, and then select DEP Status on the Process Image tab. Three values are possible:
■  DEP (permanent)   This means that the process has DEP enabled because it is a “necessary Windows program or service.”
■  DEP   This means that the process opted in to DEP. This may be due to a systemwide
policy to opt in all 32-bit processes, an API call such as SetProcessDEPPolicy, or setting the linker flag /NXCOMPAT when the image was built.
■  Nothing   If the column displays no information for this process, DEP is disabled, either because of a systemwide policy or an explicit API call or shim.

实验    查看进程的DEP信息
进程浏览器可以显示您系统上所有进程的当前DEP状态,包括该进程是"选择加入"了DEP,还是受益于永久性保护。要查看进程的DEP状态,在进程浏览器主界面进程树列表的任意列上右击,选择"Select Columns"(选择列),在打开的对话框中切换到"Process Image"选项卡,勾选其中的"DEP Status"复选框即可。可能的取值有3种:
■  DEP(永久性)    表示该进程启用了DEP,因为它是一个“必要的”Windows程序或服务。
■  DEP    表示该进程"选择加入"了DEP。这可能是由于让所有32位进程都加入DEP的系统级策略,也可能是由于调用了SetProcessDEPPolicy这样的API函数(动态DEP配置),或者在构建该进程对应的映像文件时设置了/NXCOMPAT 链接器标志(静态DEP配置)。(译注:设置方法参考下图)


■  Nothing    如果该进程的DEP状态列没有显示任何信息,表示DEP被禁用了,这要么是因为系统级策略,要么是因为显式的API调用或应用程序兼容性shim(垫片)所致。(译注:shim 指通过应用程序兼容性数据库为特定程序注入的兼容性修补层,它可以在不修改程序本身的情况下改变程序的行为,例如为其禁用 DEP)


The following Process Explorer window shows an example of a system on which DEP is set to OptOut, Turn On DEP For All Programs And Services Except Those That I Select. Note that two processes running in the user’s login, a third-party sound-card manager and a USB port monitor,
show simply DEP, meaning that DEP can be turned off for them via the dialog box shown in Figure 10-2. The other processes shown are running Windows in-box programs and show DEP (Permanent), indicating that DEP cannot be disabled for them.

下面的进程浏览器截图显示一个DEP设置成OptOut(即“为除选定程序之外的所有程序和服务启用DEP”)的系统上的例子。注意到在user1用户登录后运行的2个进程:一个第三方声卡管理器,以及一个USB端口监视器,它们的DEP状态仅为简单的DEP,这意味着可以通过前面那个设置DEP的对话框来关闭它们的DEP(选择“仅为基本Windows程序和服务启用DEP”即可)。其余进程的DEP状态为永久的,这表明不能禁用它们的DEP。



Additionally, to provide compatibility with older versions of the Active Template Library (ATL) framework (version 7.1 or earlier), the Windows kernel provides an ATL thunk emulation environment.
This environment detects ATL thunk code sequences that have caused the DEP exception and emulates the expected operation. Application developers can request that ATL thunk emulation not be applied by using the latest Microsoft C++ compiler and specifying the /NXCOMPAT flag (which sets the IMAGE_DLLCHARACTERISTICS_NX_COMPAT flag in the PE header), which tells the system that the executable fully supports DEP. Note that ATL thunk emulation is permanently disabled if the AlwaysOn value is set.

此外,为提供对较早版本的活动模板库(ATL)框架的兼容性(7.1或更早版本),Windows内核提供了一个ATL thunk 模拟环境。该环境检测导致DEP异常的ATL thunk代码序列,并且模拟预期的操作。应用程序开发人员可以通过最新的Microsoft C++ 编译器,指定 /NXCOMPAT 标志(前文提到过,并且这将在PE文件头中设置 IMAGE_DLLCHARACTERISTICS_NX_COMPAT 标志),用于告知系统这个可执行文件完全支持DEP;如此就能够向内核请求不对该程序应用ATL thunk模拟环境。需要注意,如果BCD nx 值被设置为AlwaysOn,将永久禁用ATL thunk模拟。

Finally, if the system is in OptIn or OptOut mode and executing a 32-bit process, the SetProcessDEPPolicy function allows a process to dynamically disable DEP or to permanently enable it. (Once enabled through this API, DEP cannot be disabled programmatically for the lifetime of the process.)
This function can also be used to dynamically disable ATL thunk emulation in case the image wasn’t compiled with the /NXCOMPAT flag. On 64-bit processes or systems booted with AlwaysOff or AlwaysOn, the function always returns a failure. The GetProcessDEPPolicy function returns the 32-bit per-process DEP policy (it fails on 64-bit systems, where the policy is always the same—enabled),
while GetSystemDEPPolicy can be used to return a value corresponding to the policies in Table 10-3.

最后,如果系统处于 OptIn 或 OptOut 模式,并且正在执行的是一个32位进程,SetProcessDEPPolicy 函数允许该进程动态禁用DEP,或者为自身永久启用DEP。(译注:请参考表10-3)(一旦通过此API启用,在该进程的生命周期内,DEP就无法再通过编程方式禁用。)在映像没有使用 /NXCOMPAT 标志编译的情况下,此函数也可用于动态禁用 ATL thunk 模拟。SetProcessDEPPolicy() 的原型声明在 Winbase.h 头文件中:
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]BOOL WINAPI SetProcessDEPPolicy(__in  DWORD dwFlags);[/COLOR][/SIZE][/FONT]

如果 dwFlags 的值为1,则在进程生命周期内永久启用DEP。
对于64位进程,或者系统以AlwaysOff/AlwaysOn 选项启动,则该函数总是返回失败。函数GetProcessDEPPolicy返回每个32位进程的DEP策略(在64位系统上调用此函数会失败,因为此时DEP策略总是“启用”),而函数GetSystemDEPPolicy可用于返回(查询)与表10-3中列出策略相对应的值。
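(译注:下面的示意代码演示如何查询系统与当前进程的 DEP 策略,并在32位进程中为自身永久启用 DEP;如原文所述,GetProcessDEPPolicy 与 SetProcessDEPPolicy 在64位进程中会直接返回失败:)
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]#include <windows.h>
#include <stdio.h>

int main(void)
{
    // 0=AlwaysOff, 1=AlwaysOn, 2=OptIn, 3=OptOut(与表10-3对应)
    DEP_SYSTEM_POLICY_TYPE sysPolicy = GetSystemDEPPolicy();
    printf("System DEP policy: %d\n", (int)sysPolicy);

    DWORD flags = 0;
    BOOL  permanent = FALSE;
    if (GetProcessDEPPolicy(GetCurrentProcess(), &flags, &permanent))
    {
        printf("DEP enabled: %d, ATL thunk emulation disabled: %d, permanent: %d\n",
               (flags & PROCESS_DEP_ENABLE) != 0,
               (flags & PROCESS_DEP_DISABLE_ATL_THUNK_EMULATION) != 0,
               permanent);
    }

    // 在 OptIn/OptOut 系统上,32位进程可以为自身永久启用 DEP(之后无法再禁用)
    if (SetProcessDEPPolicy(PROCESS_DEP_ENABLE))
        printf("DEP permanently enabled for this process\n");

    return 0;
}[/COLOR][/SIZE][/FONT]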


Software Data Execution Prevention
For older processors that do not support hardware no execute protection, Windows supports limited software data execution prevention (DEP). One aspect of software DEP reduces exploits of the exception handling mechanism in Windows. (See Chapter 3 in Part 1 for a description of structured exception handling.) If the program’s image files are built with safe structured exception handling (a feature in the Microsoft Visual C++ compiler that is enabled with the /SAFESEH flag), before an exception is dispatched, the system verifies that the exception handler is registered in the function table
(built by the compiler) located within the image file.

软件数据执行保护
对于不支持硬件不可执行保护的较早处理器,Windows也提供了有限的软件数据执行保护(DEP)。软件DEP的一个方面是减少对Windows中异常处理机制的漏洞利用可能性。(参见本书上册第3章对结构化异常处理的介绍。)如果程序的二进制映像文件通过安全的结构化异常处理(Microsoft Visual C++ 编译器中的一个功能,通过 /SAFESEH 标志启用)构建,在分发一个异常前,系统会验证该异常处理程序是否在位于映像文件内的函数表(由编译器构建)中注册。
(译注:下图给出在 Visual Studio 2010 中,对欲构建的PE文件,配置启用/SAFESEH 标志的方法。)


在 VS 2015 中,该选项的中文解释大致相同:





The previous mechanism depends on the program’s image files being built with safe structured exception handling. If they are not, software DEP guards against overwrites of the structured exception handling chain on the stack in x86 processes via a mechanism known as Structured Exception Handler
Overwrite Protection (SEHOP). A new symbolic exception registration record is added on the stack when a thread first begins user-mode execution. The normal exception registration chain will lead to this record. When an exception occurs, the exception dispatcher will first walk the list of exception handler registration records to ensure that the chain leads to this symbolic record. If it does not, the exception chain must have been corrupted (either accidentally or deliberately), and the exception dispatcher
will simply terminate the process without calling any of the exception handlers described on the stack. Address Space Layout Randomization (ASLR) contributes to the robustness of this method by making it more difficult for attacking code to know the location of the function pointed to by the symbolic exception registration record, and so to construct a fake symbolic record of its own.

前述的机制依赖于程序的映像文件是否通过安全的结构化异常处理来构建。如果不是,软件DEP保护针对x86进程栈上的结构化异常处理链,阻止其被覆写,这是通过一个被称为Structured Exception Handler Overwrite Protection(结构化异常处理程序覆写保护,SEHOP)的机制完成的。
当一个线程首次开始在用户模式下执行时,一个新的符号异常注册记录(symbolic exception registration record)会被加入到栈上。正常的异常注册链最终将指向这条记录。当发生一个异常时,异常分发器会首先遍历异常处理程序注册记录的列表,确保该链能够通向这条符号记录。如果不能,该异常链必定已被破坏(不是无意就是有意),此时异常分发器将直接终止进程,而不会调用栈上描述的任何异常处理程序。地址空间布局随机化(ASLR)使攻击代码更难得知符号异常注册记录所指向的函数的位置,从而也更难构造出一条自己的假符号记录,这进一步增强了此方法的健壮性。


To further validate the SEH handler when /SAFESEH is not present, a mechanism called Image Dispatch Mitigation ensures that the SEH handler is located within the same image section as the function that raised an exception, which is normally the case for most programs (although not necessarily,
since some DLLs might have exception handlers that were set up by the main executable, which is why this mitigation is off by default). Finally, Executable Dispatch Mitigation further makes sure that the SEH handler is located within an executable page—a less strong requirement than Image Dispatch Mitigation, but one with fewer compatibility issues.

为了在没有指定 /SAFESEH 标志时进一步验证 SEH 处理程序,一个叫做 Image Dispatch Mitigation(映像分发缓解)的机制会确保 SEH 处理程序与引发异常的函数位于同一个映像节中。对多数程序而言通常正是如此(尽管这并非必然,因为某些 DLL 可能包含由主可执行文件设置的异常处理程序,这也是此缓解机制默认关闭的原因)。最后,Executable Dispatch Mitigation(可执行分发缓解)进一步确保 SEH 处理程序位于一个可执行页内。这是一个比映像分发缓解更弱的要求,但兼容性问题也更少。

Two other methods for software DEP that the system implements are stack cookies and pointer encoding.
The first relies on the compiler to insert special code at the beginning and end of each potentially exploitable function. The code saves a special numerical value (the cookie) on the stack on entry and validates the cookie’s value before returning to the caller saved on the stack (which would have now been corrupted to point to a piece of malicious code). If the cookie value is mismatched, the application is terminated and not allowed to continue executing. The cookie value is computed for each boot when executing the first user-mode thread, and it is saved in the KUSER_SHARED_DATA structure. The image loader reads this value and initializes it when a process starts executing in user mode. (See Chapter 3 in Part 1 for more information on the shared data section and the image loader.)

系统实现的另外2种软件DEP方法是栈 cookie 与 pointer encoding(指针编码)。第一种方法依赖编译器在每个可能被漏洞利用的函数的起始和结尾处插入特殊代码。(这些函数在内部可能调用了诸如 strcpy() 等不安全的 C 运行时库字符串复制函数,从而可能导致函数栈上的缓冲区被溢出为 shellcode)插入的代码在函数入口处把一个特殊数值(cookie)保存到栈上,并在返回到保存在栈上的调用者之前验证该 cookie 的值(此时栈上保存的返回地址可能已被破坏,指向一段恶意代码)。(译注:从编译器生成的汇编代码中可以看出,程序从数据段取出一个叫做 __security_cookie 的 4 字节双字,复制到 EAX 寄存器中,然后与 EBP 寄存器中的值(即当前栈帧基址)进行 XOR 异或运算,再把结果保存到栈上作为实际的 cookie。这个 cookie 在函数栈帧内位于数组等本地缓冲区与返回地址之间、靠近内存高址的方向,这样当缓冲区从低址向高址溢出时,会先覆盖这个 cookie,从而达到检测的目的。当被编译器在其栈上加入 cookie 的函数返回时,再次将栈上的 cookie 与栈帧基址进行异或运算,然后调用例程 __security_check_cookie() 进行验证)如果 cookie 值不匹配,说明发生了溢出,应用程序会被终止(__security_check_cookie() -> report_failure() -> __security_error_handler()),不允许继续执行。
每次系统启动,执行首个用户模式线程时,都计算该cookie值,并且保存在KUSER_SHARED_DATA结构体中。当一个进程开始在用户模式中执行时,映像加载器读取该值并对其进行初始化。(参见本书上册第3章有关共享数据节和映像加载器的更多内容)
(译注:栈cookies 通过设置 Visual C/C++ 编译器的 /GS 编译选项来实现,具体参考下图)



(要自行研究栈 cookies的内部机制也非常容易,只要以IDA等工具反汇编通过启用/GS 编译选项构建的PE文件即可,另外,在源代码的开头添加如下指令,可以让编译器在绝大多数的例程中添加 cookie)
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]#pragma strict_gs_check(on)[/COLOR][/SIZE][/FONT]



The cookie value that is calculated is also saved for use with the EncodeSystemPointer and DecodeSystemPointer APIs, which implement pointer encoding. When an application or a DLL has static pointers that are dynamically called, it runs the risk of having malicious code overwrite the pointer values with code that the malware controls. By encoding all pointers with the cookie value and then decoding them, when malicious code sets a nonencoded pointer, the application will still
attempt to decode the pointer, resulting in a corrupted value and causing the program to crash. The EncodePointer and DecodePointer APIs provide similar protection but with a per-process cookie (created on demand) instead of a per-system cookie.

计算出的 cookie 值也会被保存下来,供实现指针编码的 EncodeSystemPointer 与 DecodeSystemPointer 这两个 API 使用。当一个应用程序或 DLL 拥有会被动态调用的静态指针时,就存在恶意代码把这些指针值覆写为指向恶意软件所控制代码的风险。通过先用 cookie 值对所有指针进行编码、使用前再解码,当恶意代码设置了一个未经编码的指针时,应用程序仍旧会尝试对该指针解码,从而得到一个已损坏的值并导致程序崩溃。EncodePointer 与 DecodePointer 这两个 API 提供类似的保护,但使用的是每进程 cookie(按需创建)而不是每系统 cookie。
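(译注:下面是 EncodePointer/DecodePointer 的一个最小示意用法,演示"以编码形式保存指针、调用前再解码"的模式;其中回调函数名等均为假设,仅用于说明:)
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]#include <windows.h>
#include <stdio.h>

typedef void (*CALLBACK_FN)(void);

static void MyCallback(void)
{
    printf("callback invoked\n");
}

// 以编码后的形式长期保存函数指针,降低其被覆写后仍可被利用的风险
static PVOID g_encodedCallback;

int main(void)
{
    // 保存时编码(EncodePointer 使用每进程 cookie;EncodeSystemPointer 则使用每系统 cookie)
    g_encodedCallback = EncodePointer((PVOID)MyCallback);

    // 调用前解码;如果该指针被恶意代码以未编码的值覆写,
    // 解码会得到一个损坏的地址,调用时进程将崩溃而不是执行攻击者的代码
    CALLBACK_FN fn = (CALLBACK_FN)DecodePointer(g_encodedCallback);
    fn();
    return 0;
}[/COLOR][/SIZE][/FONT]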

Note  The system cookie is a combination of the system time at generation, the stack value of the saved system time, the number of page faults, and the current interrupt time.
注意    系统 cookie 由以下几项组合而成:生成 cookie 时的系统时间、保存该系统时间的那个栈变量的值、缺页次数,以及当前的中断时间(interrupt time)。
实战:结合进程浏览器与内核调试器查看并验证进程的DEP状态
1。首先在进程浏览器中,查看 Visual Studio 2010 主进程 devenv.exe 的 DEP 状态为“永久启用”(permanent),如下所示:



2。启动 KD.exe,执行  !process 0 0 devenv.exe  命令,列出进程的 EPROCESS 结构信息,然后显式设置调试上下文为该进程:


3。由于每个进程的 EPROCESS 结构(执行体进程块)中的第一个成员就是 KPROCESS 结构(内核进程块),而 KPROCESS 结构中的 KEXECUTE_OPTIONS 结构,存储了该进程的 DEP 信息,因此,我们可以使用  dt nt!_KPROCESS  命令,后面直接带 EPROCESS 结构的线性地址,然后加上  -r  开关,递归遍历其中的所有成员,从而找出 KEXECUTE_OPTIONS 结构:







Copy-on-Write
Copy-on-write page protection is an optimization the memory manager uses to conserve physical memory. When a process maps a copy-on-write view of a section object that contains read/write pages, instead of making a process private copy at the time the view is mapped, the memory manager defers making a copy of the pages until the page is written to. For example, as shown in Figure 10-3, two processes are sharing three pages, each marked copy-on-write, but neither of the two processes has attempted to modify any data on the pages.

写时复制
写时复制页面保护是内存管理器用来节约物理内存的一种优化手段。当一个进程映射了某个包含读/写页面的 section 对象的写时复制视图时,内存管理器并不会在映射视图的当时就为进程制作私有副本,而是把复制页面的动作推迟到该页被写入时才进行。例如,如图10-3所示,2个进程共享着3个页面,每个页面都被标记为写时复制,但这2个进程都还没有尝试修改这些页面上的任何数据。




If a thread in either process writes to a page, a memory management fault is generated. The memory manager sees that the write is to a copy-on-write page, so instead of reporting the fault as an access violation, it allocates a new read/write page in physical memory, copies the contents of the original page to the new page, updates the corresponding page-mapping information (explained later in this chapter) in this process to point to the new location, and dismisses the exception, thus
causing the instruction that generated the fault to be reexecuted. This time, the write operation succeeds, but as shown in Figure 10-4, the newly copied page is now private to the process that did the writing and isn’t visible to the other process still sharing the copy-on-write page. Each new process that writes to that same shared page will also get its own private copy.

如果任一进程中的某个线程向其中一个页面写入,就会产生一个内存管理错误。内存管理器发现这次写操作针对的是一个写时复制页,因此它不会把该错误作为非法访问报告出去,而是在物理内存中分配一个新的读/写页,把原始页的内容拷贝到新页,更新该进程中相应的页面映射信息(本章稍后解释)使其指向新的位置,然后驳回(dismiss)该异常,使产生错误的那条指令被重新执行。这一次写操作就会成功,但如图10-4所示,新复制出来的页面现在是执行写操作的那个进程的私有页面,对仍在共享原写时复制页的其它进程不可见。之后每个向同一共享页写入的新进程,也都会得到它自己的私有副本。



One application of copy-on-write is to implement breakpoint support in debuggers. For example, by default, code pages start out as execute-only. If a programmer sets a breakpoint while debugging a program, however, the debugger must add a breakpoint instruction to the code. It does this by first changing the protection on the page to PAGE_EXECUTE_READWRITE and then changing the instruction stream. Because the code page is part of a mapped section, the memory manager creates a private copy for the process with the breakpoint set, while other processes continue using the unmodified code page.
Copy-on-write is one example of an evaluation technique known as lazy evaluation that the memory manager uses as often as possible. Lazy-evaluation algorithms avoid performing an expensive operation until absolutely required—if the operation is never required, no time is wasted on it.
To examine the rate of copy-on-write faults, see the performance counter Memory: Write Copies/sec.

写时复制的一个应用是实现调试器的断点支持。例如,默认情况下代码页一开始是仅执行的。然而,如果程序员在调试程序时设置了一个断点,调试器就必须向代码中加入一条断点指令。它的做法是先把该页的保护改为 PAGE_EXECUTE_READWRITE(译注:请参考表10-2),然后修改指令流。由于代码页是映射 section 的一部分(译注:即通过节视图映射到可执行文件中类似 .text 这样的节),内存管理器会为设置了断点的进程创建该代码页的私有副本,而其它进程继续使用未修改的原始代码页。
写时复制是一类被称为 lazy evaluation(惰性求值)的求值技术的例子之一,内存管理器会尽可能多地使用这类技术。惰性求值算法把开销很大的操作推迟到绝对必要时才执行;如果该操作从未被需要,就不会在它上面浪费任何时间。
要考察写时复制错误发生的频率,请查看性能计数器 Memory: Write Copies/sec。
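(译注:结合前文 CreateFileMapping 的用法,下面的示意代码在同一进程内对一个页面文件备份 section 同时映射一个可写视图和一个写时复制视图(FILE_MAP_COPY),用来直观观察写时复制的效果;大小等参数仅作演示:)
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    // 创建一个 4KB 的页面文件备份 section
    HANDLE hSection = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                                        PAGE_READWRITE, 0, 4096, NULL);
    if (hSection == NULL)
        return 1;

    // 映射两个视图:一个可写视图,一个写时复制视图
    char *writable = (char *)MapViewOfFile(hSection, FILE_MAP_WRITE, 0, 0, 0);
    char *cowView  = (char *)MapViewOfFile(hSection, FILE_MAP_COPY,  0, 0, 0);
    if (writable == NULL || cowView == NULL)
        return 1;

    strcpy(writable, "shared");                        // 两个视图此时共享同一物理页
    printf("cowView  before write: %s\n", cowView);    // 输出 "shared"

    strcpy(cowView, "private");                        // 触发写时复制:cowView 得到私有副本
    printf("writable after  write: %s\n", writable);   // 仍是 "shared"
    printf("cowView  after  write: %s\n", cowView);    // 变为 "private"

    UnmapViewOfFile(writable);
    UnmapViewOfFile(cowView);
    CloseHandle(hSection);
    return 0;
}[/COLOR][/SIZE][/FONT]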

Address Windowing Extensions
Although the 32-bit version of Windows can support up to 64 GB of physical memory (as shown in Table 2-2 in Part 1), each 32-bit user process has by default only a 2-GB virtual address space. (This can be configured up to 3 GB when using the increaseuserva BCD option, described in the upcoming section “User Address Space Layout.”) An application that needs to make more than 2 GB (or 3 GB) of data easily available in a single process could do so via file mapping, remapping a part of its address space into various portions of a large file. However, significant paging would be involved upon each remap.
For higher performance (and also more fine-grained control), Windows provides a set of functions called Address Windowing Extensions (AWE). These functions allow a process to allocate more physical memory than can be represented in its virtual address space. It then can access the physical memory by mapping a portion of its virtual address space into selected portions of the physical memory at various times.

地址窗口化扩展
尽管 Windows 的32位版本能够支持最多 64GB 的物理内存(如本书上册表2-2所示),每个32位用户进程默认只有 2GB 的虚拟地址空间。(使用 increaseuserva BCD 选项时,最多可以配置成 3GB,本章后面的"用户地址空间布局"小节将对此进行论述。)一个需要在单一进程中方便地使用超过 2GB(或 3GB)数据的应用程序,可以通过文件映射做到这一点,也就是把自身地址空间的一部分反复重新映射到一个大型文件的不同部分。然而,每次重映射都会涉及大量的分页操作。为了获得更高的性能(以及更细粒度的控制),Windows 提供了一组叫做地址窗口化扩展(AWE)的函数。这些函数允许一个进程分配比其虚拟地址空间所能表示的更多的物理内存。(译注:AWE 函数的原型声明在 winbase.h 头文件中。这些 API 允许程序访问的物理内存总量大于它的线性地址空间对其限定的物理内存量;"线性地址"是英特尔文档中的用语,微软将其称为"虚拟地址"。在处理器同时启用分段与分页的场景下,虚拟地址先经过分段得到线性地址,线性地址再经过分页得到物理地址,这就是地址翻译的过程;在仅启用分页的情况下,一般可以认为虚拟地址就是线性地址。)
接着,在不同时间点上,进程都可以通过将自身虚拟地址空间的一部分映射到选定部分的物理内存,来对其进行访问。
(译注:increaseuserva BCD 选项的设置方法如下图所示)





Allocating and using memory via the AWE functions is done in three steps:
1. Allocating the physical memory to be used. The application uses the Windows functions AllocateUserPhysicalPages or AllocateUserPhysicalPagesNuma. (These require the Lock Pages In Memory user right.)
2. Creating one or more regions of virtual address space to act as windows to map views of the physical memory. The application uses the Win32 VirtualAlloc, VirtualAllocEx, or VirtualAllocExNuma function with the MEM_PHYSICAL flag.
3. The preceding steps are, generally speaking, initialization steps. To actually use the memory, the application uses MapUserPhysicalPages or MapUserPhysicalPagesScatter to map a portion of the physical region allocated in step 1 into one of the virtual regions, or windows, allocated in step 2.

通过AWE函数分配和使用内存需要如下3个步骤:
1.   分配要使用的物理内存。应用程序使用Windows函数AllocateUserPhysicalPages 或 AllocateUserPhysicalPagesNuma(这要求调用方具有内存中锁定页面的用户权限)。
(译注:这是由于后文将提及的:通过AWE分配的物理内存永远不会换出到磁盘,必须锁定在内存中,因此调用AWE函数的应用程序需要获得“锁定内存页”权限)
2.   创建一个或者更多的虚拟地址空间区域,作为要映射物理内存视图的窗口(译注:原文中的windows为小写,因此假定作者原意指窗口,而非Windows操作系统)。应用程序使用Win32 API 函数VirtualAlloc,VirtualAllocEx,或者 VirtualAllocExNuma,并且指定 MEM_PHYSICAL 标志。
3.   一般而言,前面2步属于初始化操作。为了实际使用这些内存,应用程序需要使用 MapUserPhysicalPages 或 MapUserPhysicalPagesScatter 函数,把第1步分配的物理区域的一部分映射到第2步分配的某个虚拟区域(即窗口)中。(这3个步骤的示意代码参见下面的译注。)
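(译注:下面按原文的3个步骤给出一段 AWE 的示意代码。页面数量等参数仅作演示,并且假定当前账户已被授予"在内存中锁定页面"(Lock Pages In Memory)权限且该权限已在进程令牌中启用,示例省略了调整令牌权限的代码:)
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]#include <windows.h>
#include <stdio.h>
#include <string.h>

#define NUM_PAGES 256   // 演示用:申请 256 个物理页

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    ULONG_PTR numPages = NUM_PAGES;
    ULONG_PTR pfnArray[NUM_PAGES];

    // 第1步:分配物理页(需要"在内存中锁定页面"权限)
    if (!AllocateUserPhysicalPages(GetCurrentProcess(), &numPages, pfnArray))
    {
        printf("AllocateUserPhysicalPages failed: %lu\n", GetLastError());
        return 1;
    }

    // 第2步:保留一段虚拟地址区域作为"窗口",必须带 MEM_PHYSICAL 标志
    SIZE_T windowSize = (SIZE_T)numPages * si.dwPageSize;
    void *window = VirtualAlloc(NULL, windowSize,
                                MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);
    if (window == NULL)
        return 1;

    // 第3步:把第1步分配的物理页映射进窗口,之后即可通过 window 指针访问
    if (!MapUserPhysicalPages(window, numPages, pfnArray))
        return 1;
    memset(window, 0, windowSize);

    // 解除映射(PFN 数组传 NULL),随后释放物理页与虚拟地址窗口
    MapUserPhysicalPages(window, numPages, NULL);
    FreeUserPhysicalPages(GetCurrentProcess(), &numPages, pfnArray);
    VirtualFree(window, 0, MEM_RELEASE);
    return 0;
}[/COLOR][/SIZE][/FONT]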


Figure 10-5 shows an example. The application has created a 256-MB window in its address space and has allocated 4 GB of physical memory (on a system with more than 4 GB of physical memory).
It can then use MapUserPhysicalPages or MapUserPhysicalPagesScatter to access any portion of the physical memory by mapping the desired portion of memory into the 256-MB window. The size of the application’s virtual address space window determines the amount of physical memory that the application can access with any given mapping. To access another portion of the allocated RAM, the application can simply remap the area.

图10-5展示了一个例子。应用程序在其地址空间中创建了一个 256MB 大小的窗口,并分配了 4GB 的物理内存(在一个拥有超过 4GB 物理内存的系统上)。接着,应用程序可以使用 MapUserPhysicalPages 或 MapUserPhysicalPagesScatter 函数,把所需的那部分物理内存映射进这个 256MB 的窗口,从而访问这 4GB 物理内存中的任意部分。应用程序虚拟地址空间窗口的大小,决定了它通过任意一次给定映射所能访问的物理内存数量。要访问已分配 RAM 中的其它部分,应用程序只需重新映射该区域即可。



The AWE functions exist on all editions of Windows and are usable regardless of how much physical memory a system has. However, AWE is most useful on 32-bit systems with more than 2 GB of physical memory because it provides a way for a 32-bit process to access more RAM than its virtual address space would otherwise allow. Another use is for security purposes: because AWE memory is
never paged out, the data in AWE memory can never have a copy in the paging file that someone could examine by rebooting into an alternate operating system. (VirtualLock provides the same guarantee for pages in general.)

所有版本的 Windows 中都提供了 AWE 函数,并且无论系统有多少物理内存,它们都可以使用。然而,AWE 在拥有超过 2GB 物理内存的32位系统上最为有用,因为它为32位进程提供了一种访问比其虚拟地址空间所允许的更多 RAM 的途径。另一个用途与安全有关:由于 AWE 内存永远不会被换出(到磁盘上的分页文件),AWE 内存中的数据就不会在分页文件中留下副本,别有用心的人也就无法通过重启进入另一个操作系统来检查分页文件中的这份副本。(译注:即把安装在其它硬盘上的操作系统加载后,查看目标硬盘分页文件的脱机攻击)(一般而言,VirtualLock 函数可以为页面提供同样的保证。)
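(译注:下面是 VirtualLock 的一个最小示意用法,展示如何把存放敏感数据的页面锁定在物理内存中,避免其被写入分页文件;大小仅作演示,且要注意进程工作集配额有限,锁定过多页面会失败:)
[FONT="微软雅黑"][SIZE="4"][COLOR="Black"]#include <windows.h>

int main(void)
{
    SIZE_T size = 4096;
    void *secret = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (secret == NULL)
        return 1;

    // 把该页锁定在物理内存(工作集)中,保证其内容不会被换出到分页文件
    if (!VirtualLock(secret, size))
        return 1;

    // ... 在这里存放并使用敏感数据(例如密钥)...

    // 用完后先清零再解锁、释放,避免内容残留
    SecureZeroMemory(secret, size);
    VirtualUnlock(secret, size);
    VirtualFree(secret, 0, MEM_RELEASE);
    return 0;
}[/COLOR][/SIZE][/FONT]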

Finally, there are some restrictions on memory allocated and mapped by the AWE functions:
■ Pages can’t be shared between processes.
■ The same physical page can’t be mapped to more than one virtual address in the same process.
■ Page protection is limited to read/write, read-only, and no access.

最后,在使用AWE函数进行内存分配和映射方面,有一些限制:
■ 页面不能在进程间共享;
■ 同一个物理页不能在同一进程中被映射到多于一个虚拟地址上(即同一进程内的多个虚拟地址窗口不能同时映射相同的物理页,参考图10-5很容易理解);
■ AWE的页面保护选项被限制为:读/写,只读,以及不允许访问。


AWE is less useful on x64 or IA64 Windows systems because these systems support 8 TB or 7 TB (respectively) of virtual address space per process, while allowing a maximum of only 2 TB of RAM.
Therefore, AWE is not necessary to allow an application to use more RAM than it has virtual address space; the amount of RAM on the system will always be smaller than the process virtual address space. AWE remains useful, however, for setting up nonpageable regions of a process address space. It provides finer granularity than the file mapping APIs (the system page size, 4 KB or 8 KB, versus 64 KB).
For a description of the page table data structures used to map memory on systems with more than 4 GB of physical memory, see the section “Physical Address Extension (PAE).”

在 x64 或 IA64 Windows 系统上,AWE 的用处不大,因为这些系统分别支持每进程 8TB 或 7TB 的虚拟地址空间,而最多只允许 2TB 的 RAM。因此,就"让应用程序使用比其虚拟地址空间更多的 RAM"而言,AWE 并不是必需的;这类系统上的 RAM 数量总是小于进程的虚拟地址空间。即便如此,AWE 仍然有用,例如用于在进程地址空间中建立不可分页的区域。
在这方面,AWE提供比文件映射 APIs 更细的粒度(AWE提供 4KB 或 8KB 的系统页尺寸分配粒度,而文件映射 APIs 只能基于 64KB 边界对齐的分配粒度)。
参见本章“物理地址扩展”一节,其中讨论了带有4GB以上物理内存的系统用于映射内存的页表数据结构。


至此,第一部分就结束了,感谢各位能够坚持把这篇陋文读完,第二部分涉及更深入的内容,包括在第一部分多处提到的一些内存管理器组件和服务的具体数据结构实现,算法描述;其利用的硬件特性与 Intel x86/x64 体系结构紧密相关,尤其是对地址翻译,内存保护以及分页相关机制的解析,因此若能够与《Intel 64 与 IA-32 架构软件开发者手册》系列配套阅读能够有助于理解(后者也是英文原版,有1000多页需要翻译。。。)另外,实验解析了各种场景下Windows内核调试器的输出,因此第二部分的翻译将更有难度,期待你们的反馈信息!


最新回复 (29)
2015-9-17 01:38  建议读原版吧,翻译的过程有时会曲解一些东西。
2015-9-17 08:58  心有余而力不足,看都看不懂,更别说翻译了~默默支持樓主!
2015-9-17 09:16  请给个英文原版的下载地址
2015-9-17 09:29  希望有余力的朋友可以一起支持
2015-9-17 10:38  这倒是由于中西文化的用语差异造成的,除非能和作者交流沟通,但也需要作者能够理解中文用语才行.
2015-9-17 10:58  楼主不愧是楼主,精神上支持一下!
2015-9-17 11:01  很支持……我觉的有中文版的是非常必要的,个别差异对于新手而言没什么差别,能理解那些差异的也是个中好手了,因为你就算英语好,不懂计算机,不照样一头雾水?而且中文版对快速索引到自己需要的内容也很方便,毕竟是工具书,通读一遍之后剩下的就是碰到具体问题的时候看
2015-9-17 11:03  您好,这里是第6版原版上册的百度网盘链接,提取码为ei8a:http://pan.baidu.com/s/1jGH8JQ2 ;这是下册的链接,提取码为mnge:http://pan.baidu.com/s/1pJN4jY7
2015-9-17 15:55  支持楼主。
2015-9-17 16:59  都英文的就行吧
2015-10-13 15:34  支持一下楼主。
2015-10-13 21:07  MARK一下
2015-10-17 10:36  这本书原翻译者太监了,明年出第七版了。
2015-10-17 14:21  哪来的消息?
2015-10-23 21:40  佩服佩服!
2016-1-8 12:13  mark
2016-1-15 13:22  支持楼主!
2016-3-3 14:33  解析windows操作系统,好文章
2016-4-21 09:32  感谢楼主!!!
2016-4-21 11:57  感谢翻译,能够分享出来。刚好需要英文对照。
2016-4-26 12:55  支持支持,中英对照这种方式很好!感谢楼主的无私奉献
2016-4-28 18:01  感谢无私的分享,话说书中约好的下册怎么没出版呢。
2016-4-28 20:51  可能是下册的翻译难度更大一些,没官方翻译我们只有自力更生翻译了