Author: hhstudy
Blog: http://hi.baidu.com/hhstudy
When reposting, please credit the source at the top of the article.
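Vertical sync is short for vertical synchronization. The basic idea is to synchronize the monitor's refresh cycle with the cadence at which the graphics card outputs frames, in order to avoid an artifact known as "screen tearing". As the name suggests, tearing shows up when the picture moves quickly: the frame looks split in two, as if the top half and the bottom half came from two slightly different images and were stitched together. Anyone who plays games has probably seen it. For how vertical sync works and why tearing happens, please search online.
What follows mainly looks at how a certain vendor's driver sneakily overrides the application's vertical sync setting, and then at writing our own override of that setting.
Environment
OS: win8.1_x86, GPU: Nvidia GTX 660
In user mode, vertical sync can be set in at least two places (in practice these are usually the only two).
1. Requested by the application: when calling CreateDevice, the program sets pPresentationParameters->PresentationInterval = D3DPRESENT_INTERVAL_ONE (TWO, THREE, or FOUR also work).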
HRESULT CreateDevice(
[in] UINT Adapter,
[in] D3DDEVTYPE DeviceType,
[in] HWND hFocusWindow,
[in] DWORD BehaviorFlags,
[in, out] D3DPRESENT_PARAMETERS *pPresentationParameters,
[out, retval] IDirect3DDevice9 **ppReturnedDeviceInterface
);
2. Requested by the user-mode driver.
PFND3DDDI_PRESENTCB pfnPresentCb
__checkReturn HRESULT APIENTRY CALLBACK pfnPresentCb(
_In_ HANDLE hDevice,
_In_ const D3DDDICB_PRESENT *pData
);
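For the application-side path (item 1 above), here is a minimal sketch of what the PresentationInterval values mean. The constants carry the values from d3d9.h but are redefined here so the sketch builds without the SDK; the helper name vblanks_per_present is hypothetical, and how the runtime forwards this setting to the driver is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in d3d9.h, redefined so this builds without the SDK. */
#define D3DPRESENT_INTERVAL_DEFAULT   0x00000000u
#define D3DPRESENT_INTERVAL_ONE       0x00000001u
#define D3DPRESENT_INTERVAL_TWO       0x00000002u
#define D3DPRESENT_INTERVAL_THREE     0x00000004u
#define D3DPRESENT_INTERVAL_FOUR      0x00000008u
#define D3DPRESENT_INTERVAL_IMMEDIATE 0x80000000u

/* ONE..FOUR are bit flags, not the counts 1..4. */
static_assert((D3DPRESENT_INTERVAL_ONE | D3DPRESENT_INTERVAL_TWO |
               D3DPRESENT_INTERVAL_THREE | D3DPRESENT_INTERVAL_FOUR) == 0xFu,
              "interval flags occupy the low four bits");

/* Hypothetical helper: number of vertical blanks each Present waits for.
 * Returns 0 for "no vsync" and -1 for an unrecognized value. */
int vblanks_per_present(uint32_t presentation_interval)
{
    switch (presentation_interval) {
    case D3DPRESENT_INTERVAL_IMMEDIATE: return 0;
    case D3DPRESENT_INTERVAL_DEFAULT:   /* roughly equivalent to ONE */
    case D3DPRESENT_INTERVAL_ONE:       return 1;
    case D3DPRESENT_INTERVAL_TWO:       return 2;
    case D3DPRESENT_INTERVAL_THREE:     return 3;
    case D3DPRESENT_INTERVAL_FOUR:      return 4;
    default:                            return -1;
    }
}
```

An application that wants vertical sync off passes D3DPRESENT_INTERVAL_IMMEDIATE, which is the "application vertical sync off" case used in the comparisons later on.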
...there are a few other related functions as well; first look at the call stack below.
To start, break on one frame in the kernel...
ChildEBP RetAddr Args to Child
bffe7b90 8c6422d9 8110d858 bffe7cd0 bffe7c8c dxgkrnl!DXGCONTEXT::Present (FPO: [Non-Fpo])
bffe7d38 92350ea0 00000000 c1dc1db8 bffe7d54 dxgkrnl!DxgkPresent+0x24d (FPO: [Non-Fpo])
bffe7d48 81d1b377 00fbb170 0394f3f8 778b2da4 win32k!NtGdiDdDDIPresent+0x14 (FPO: [1,0,0])
bffe7d48 778b2da4 00fbb170 0394f3f8 778b2da4 nt!KiSystemServicePostCall (FPO: [0,3] TrapFrame @ bffe7d54)
0394f130 76896bdc 70747363 00fbb170 00fbb170 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
0394f134 70747363 00fbb170 00fbb170 0587d000 GDI32!NtGdiDdDDIPresent+0xa (FPO: [1,0,0])
0394f3f8 53134e24 00fbb0b0 0394f540 00fbb170 d3d9!PresentCB+0xe2 (FPO: [2,171,4])
WARNING: Stack unwind information not available. Following frames may be wrong.
0394f424 53134b58 0394f540 00000000 0587d000 nvd3dum!QueryOglResource+0x21b84
0394f46c 53134d1b 0394f540 0587d000 025b5140 nvd3dum!QueryOglResource+0x218b8
0394f484 531079ab 0394f540 00000504 0587d000 nvd3dum!QueryOglResource+0x21a7b
0394f760 531094db 00000504 036c9040 03789818 nvd3dum+0x6979ab
0394f778 70870b2a 0587d000 03789818 00000001 nvd3dum+0x6994db
0394f7a0 70871665 03789314 00000001 00000000 d3d9!CBatchFilterI::ProcessBatch+0x4c2 (FPO: [2,3,4])
0394f7b8 70870472 0394f7cc 76ae17ad 036c9040 d3d9!CBatchFilterI::WorkerThread+0x2d (FPO: [0,0,4])
0394f7c0 76ae17ad 036c9040 0394f810 77893af4 d3d9!CBatchFilterI::LHBatchWorkerThread+0xd (FPO: [1,0,0])
0394f7cc 77893af4 036c9040 2cb22ace 00000000 KERNEL32!BaseThreadInitThunk+0xe (FPO: [1,0,0])
0394f810 77893acd ffffffff 778c42a6 00000000 ntdll!__RtlUserThreadStart+0x20 (FPO: [Non-Fpo])
0394f820 00000000 70870465 036c9040 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [2,2,0])
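To begin with, I did not suspect hooking at all. A vendor generally cares about stability and about its relationship with MS, so it is unlikely to resort to hooks. My first guess was therefore that the driver changes a parameter at some key call. The next step was to compare the arguments of each function on the call stack in two cases: vertical sync forced on by the driver, and the driver default.
The first function compared is the GDI function NtGdiDdDDIPresent.
Unfortunately, I could not find any documentation for NtGdiDdDDIPresent. Fortunately, I did find a similar function documented for OpenGL drivers: D3DKMTPresent. MSDN describes it as follows: The D3DKMTPresent function submits a present command to the Microsoft DirectX graphics kernel subsystem (Dxgkrnl.sys). The call stack above also shows NtGdiDdDDIPresent calling straight into dxgkrnl!DxgkPresent. So my guess is that Microsoft uses the same kernel interface for OpenGL and DX.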
NTSTATUS APIENTRY D3DKMTPresent(
_In_ const D3DKMT_PRESENT *pData
);
typedef struct _D3DKMT_PRESENT {
union {
D3DKMT_HANDLE hDevice;
D3DKMT_HANDLE hContext;
};
HWND hWindow;
D3DDDI_VIDEO_PRESENT_SOURCE_ID VidPnSourceId;
D3DKMT_HANDLE hSource;
D3DKMT_HANDLE hDestination;
UINT Color;
RECT DstRect;
RECT SrcRect;
UINT SubRectCnt;
const RECT *pSrcSubRects;
UINT PresentCount;
D3DDDI_FLIPINTERVAL_TYPE FlipInterval;
D3DKMT_PRESENTFLAGS Flags;
ULONG BroadcastContextCount;
D3DKMT_HANDLE BroadcastContext[D3DDDI_MAX_BROADCAST_CONTEXT];
HANDLE PresentLimitSemaphore;
D3DKMT_PRESENTHISTORYTOKEN PresentHistoryToken;
#if (DXGKDDI_INTERFACE_VERSION >= DXGKDDI_INTERFACE_VERSION_WIN8)
D3DKMT_PRESENT_RGNS *pPresentRegions;
#endif
} D3DKMT_PRESENT;
FlipInterval (offset 0x44) and Flags (offset 0x48) are the fields to focus on. If the Flip bit is set in Flags and FlipInterval > 0, vertical sync is on. In every other case it is off. MSDN reference: http://msdn.microsoft.com/zh-cn/library/ff548168(v=vs.85).aspx
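The two offsets can be checked against the structure definition with a small model. This is only a sketch: KmtPresent32 and Rect32 are hypothetical stand-ins that use uint32_t for every handle, HWND, and pointer (all 4 bytes on x86), and PRESENT_FLAG_FLIP assumes Flip is bit 2 of D3DKMT_PRESENTFLAGS (after Blt and ColorFill), which is not stated in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RECT: four 32-bit integers. */
typedef struct { int32_t left, top, right, bottom; } Rect32;

/* 32-bit model of the leading fields of D3DKMT_PRESENT. */
typedef struct {
    uint32_t hContext;       /* union of hDevice / hContext */
    uint32_t hWindow;
    uint32_t VidPnSourceId;
    uint32_t hSource;
    uint32_t hDestination;
    uint32_t Color;
    Rect32   DstRect;
    Rect32   SrcRect;
    uint32_t SubRectCnt;
    uint32_t pSrcSubRects;
    uint32_t PresentCount;
    uint32_t FlipInterval;
    uint32_t Flags;
} KmtPresent32;

/* The compiler confirms the offsets quoted in the text. */
static_assert(offsetof(KmtPresent32, FlipInterval) == 0x44, "FlipInterval offset");
static_assert(offsetof(KmtPresent32, Flags) == 0x48, "Flags offset");

/* Assumption: Flip is bit 2 of D3DKMT_PRESENTFLAGS. */
#define PRESENT_FLAG_FLIP 0x4u

/* Rule from the text: vsync is on only for a flip with FlipInterval > 0. */
int vsync_enabled(const KmtPresent32 *p)
{
    assert(p != NULL);
    return (p->Flags & PRESENT_FLAG_FLIP) != 0 && p->FlipInterval > 0;
}
```

Under that bit assumption, the Flags value 0x1004 seen in both dumps below has Flip set, so the two cases differ only in FlipInterval.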
Driver default vertical sync, application vertical sync off: the NtGdiDdDDIPresent argument
1: kd> dd 00fbb170
00fbb170 800014c0 00500442 00000000 40002840
00fbb180 00000000 00000000 00000000 00000000
00fbb190 00000000 00000000 00000000 00000000
00fbb1a0 00000000 00000000 00000001 00000000
00fbb1b0 00097f37 00000000 00001004 00000000
00fbb1c0 00000000 00000000 00000000 00000000
00fbb1d0 00000000 00000000 00000000 00000000
00fbb1e0 00000000 00000000 00000000 00000000
Driver forces vertical sync on, application vertical sync off: the NtGdiDdDDIPresent argument
0: kd> dd 03405328
03405328 80001540 001f0446 00000000 80001f00
03405338 00000000 00000000 00000000 00000000
03405348 00000000 00000000 00000000 00000000
03405358 00000000 00000000 00000001 00000000
03405368 0000614a 00000001 00001004 00000000
03405378 00000000 00000000 00000000 00000000
03405388 00000000 00000000 00000000 00000000
03405398 00000000 00000000 00000000 00000000
As the dumps show, FlipInterval is 1 when the driver forces vertical sync on and 0 under the driver default. So set a memory write breakpoint, ba w4 03405328+0x44, and press F5 to let it run.
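The comparison of the two dumps can also be done mechanically. Below is a minimal sketch, with the first 0x50 bytes of each `dd` output copied into arrays; diff_dwords and the array names are hypothetical helpers, not part of any tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 0x50 bytes of each dump, copied from the two `dd` outputs above. */
static const uint32_t dump_default[20] = {
    0x800014c0, 0x00500442, 0x00000000, 0x40002840,
    0, 0, 0, 0,
    0, 0, 0, 0,
    0, 0, 0x00000001, 0,
    0x00097f37, 0x00000000, 0x00001004, 0
};
static const uint32_t dump_forced[20] = {
    0x80001540, 0x001f0446, 0x00000000, 0x80001f00,
    0, 0, 0, 0,
    0, 0, 0, 0,
    0, 0, 0x00000001, 0,
    0x0000614a, 0x00000001, 0x00001004, 0
};

/* Stores the byte offsets of differing dwords in out (at most max entries)
 * and returns how many dwords differ in total. */
size_t diff_dwords(const uint32_t *a, const uint32_t *b, size_t n,
                   size_t *out, size_t max)
{
    size_t found = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            if (out != NULL && found < max)
                out[found] = i * sizeof(uint32_t);
            found++;
        }
    }
    return found;
}
```

Five dwords differ: offsets 0x0, 0x4, and 0xC are handles that naturally change between runs, 0x40 is PresentCount, and 0x44 is FlipInterval, the only difference that relates to vertical sync.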
The breakpoint was hit here:
nvd3dum!QueryOglResource+0x218f5:
001b:53134b95 8b5508 mov edx,dword ptr [ebp+8]
Look at the disassembly just before it:
2: kd> ub
nvd3dum!QueryOglResource+0x218e0:
001b:53134b80 eb02 jmp nvd3dum!QueryOglResource+0x218e4 (53134b84)
001b:53134b82 33c0 xor eax,eax
001b:53134b84 3bf8 cmp edi,eax
001b:53134b86 7510 jne nvd3dum!QueryOglResource+0x218f8 (53134b98)
001b:53134b88 8b55d0 mov edx,dword ptr [ebp-30h]
001b:53134b8b 8d4f44 lea ecx,[edi+44h]
001b:53134b8e 8b45dc mov eax,dword ptr [ebp-24h]
001b:53134b91 f00fb111 lock cmpxchg dword ptr [ecx],edx // this is the write that hit the breakpoint set earlier
So who called this?
2: kd> kv
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
0353f62c 53134d1b 00000000 02241800 0216d0d0 nvd3dum!QueryOglResource+0x218f5
0353f644 531079ab 0353f700 00000504 02241800 nvd3dum!QueryOglResource+0x21a7b
0353f920 531094db 00000504 032bb040 032bb7d4 nvd3dum+0x6979ab
0353f93c 70870b2a 02241800 032bb7d4 00000001 nvd3dum+0x6994db
0353f964 70871665 032bb2d0 00000001 00000000 d3d9!CBatchFilterI::ProcessBatch+0x4c2 (FPO: [2,3,4])
0353f97c 70870472 0353f990 76ae17ad 032bb040 d3d9!CBatchFilterI::WorkerThread+0x2d (FPO: [0,0,4])
0353f984 76ae17ad 032bb040 0353f9d4 77893af4 d3d9!CBatchFilterI::LHBatchWorkerThread+0xd (FPO: [1,0,0])
0353f990 77893af4 032bb040 012387ce 00000000 KERNEL32!BaseThreadInitThunk+0xe (FPO: [1,0,0])
0353f9d4 77893acd ffffffff 778c4293 00000000 ntdll!__RtlUserThreadStart+0x20 (FPO: [Non-Fpo])
0353f9e4 00000000 70870465 032bb040 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [2,2,0])
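This looks very much like the call stack captured earlier when Present broke in the kernel. In fact, the driver changes FlipInterval through the parameters it passes to the runtime, before calling d3d9!PresentCB. The driver is rather large... so that part of the analysis is omitted.
Next, let's see how FlipInterval can be reached through d3d9!PresentCB.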
.text:1000783D ; signed int __stdcall PresentCB(HANDLE hDevice, int pPresentCB)
.text:1000783D _PresentCB@8 proc near ; DATA XREF: _CreateDeviceLHDDI+5917o
.text:1000783D
.text:1000783D var_2A8 = dword ptr -2A8h
.text:1000783D var_2A4 = dword ptr -2A4h
.text:1000783D var_2A0 = dword ptr -2A0h
.text:1000783D var_29C = dword ptr -29Ch
.text:1000783D var_298 = dword ptr -298h
.text:1000783D var_218 = dword ptr -218h
.text:1000783D var_214 = dword ptr -214h
.text:1000783D var_1D8 = dword ptr -1D8h
.text:1000783D var_1D4 = dword ptr -1D4h
.text:1000783D var_1D0 = dword ptr -1D0h
.text:1000783D var_150 = dword ptr -150h
.text:1000783D var_14C = dword ptr -14Ch
.text:1000783D var_48 = dword ptr -48h
.text:1000783D var_44 = dword ptr -44h
.text:1000783D var_4 = dword ptr -4
.text:1000783D hDevice = dword ptr 8
.text:1000783D pPresentCB = dword ptr 0Ch
.text:1000783D
.text:1000783D ; FUNCTION CHUNK AT .text:1004A634 SIZE 00000096 BYTES
.text:1000783D ; FUNCTION CHUNK AT .text:1004B6A1 SIZE 00000010 BYTES
.text:1000783D ; FUNCTION CHUNK AT .text:1005191D SIZE 0000000F BYTES
.text:1000783D ; FUNCTION CHUNK AT .text:100B0490 SIZE 00000190 BYTES
.text:1000783D
.text:1000783D mov edi, edi
.text:1000783F push ebp
.text:10007840 mov ebp, esp
.text:10007842 and esp, 0FFFFFFF8h
.text:10007845 sub esp, 2ACh
.text:1000784B mov eax, ___security_cookie
.text:10007850 xor eax, esp
.text:10007852 mov [esp+2ACh+var_4], eax
.text:10007859 mov edx, [ebp+pPresentCB]
.text:1000785C push ebx
.text:1000785D push esi
; Save hDevice in esi. hDevice points to a data structure that MS does not document.
.text:1000785E mov esi, [ebp+hDevice]
.text:10007861 push edi
.text:10007862 xor edi, edi
.text:10007864 push 1
.text:10007866 mov eax, [esi+12C8h]
.text:1000786C or eax, [esi+12CCh]
.text:10007872 pop ebx
.text:10007873 jnz loc_100B0490
.text:10007879
.text:10007879 loc_10007879: ; CODE XREF: PresentCB(x,x)+A8C5Aj
.text:10007879 mov [esp+2B8h+var_2A4], ebx
.text:1000787D
.text:1000787D loc_1000787D: ; CODE XREF: PresentCB(x,x)+A8C64j
.text:1000787D cmp _g_ForceDeviceRemoved, edi
.text:10007883 jnz loc_100B04A6
.text:10007889
.text:10007889 loc_10007889: ; CODE XREF: PresentCB(x,x)+A8C70j
.text:10007889 test byte ptr [esi+84h], 2
.text:10007890 jnz loc_10007964
.text:10007896 mov eax, [esi+108h]
.text:1000789C test al, 7
.text:1000789E jz loc_1004B6A1
.text:100078A4
.text:100078A4 loc_100078A4: ; CODE XREF: PresentCB(x,x)+43E69j
.text:100078A4 mov ecx, [esi+11E8h]
.text:100078AA cmp ecx, ebx
.text:100078AC jbe loc_100B04BC
.text:100078B2 mov eax, [edx+8]
.text:100078B5
.text:100078B5 loc_100078B5: ; CODE XREF: PresentCB(x,x)+A8C85j
.text:100078B5 mov [esp+2B8h+var_2A8], eax
.text:100078B9 mov eax, [eax]
.text:100078BB mov [esi+0C0h], eax
.text:100078C1 mov eax, [edx]
.text:100078C3 mov [esi+0CCh], eax
.text:100078C9 mov eax, [edx+4]
.text:100078CC and [esi+67Ch], edi
.text:100078D2 mov [esi+0D0h], eax
.text:100078D8 cmp ecx, 4
.text:100078DB jbe loc_100B04EB
.text:100078E1 mov eax, [edx+0Ch]
.text:100078E4 cmp eax, 40h
.text:100078E7 ja loc_100B04EB
.text:100078ED xor ecx, ecx
.text:100078EF mov [esi+10Ch], eax
.text:100078F5 test eax, eax
.text:100078F7 jnz loc_100B04C7
.text:100078FD
.text:100078FD loc_100078FD: ; CODE XREF: PresentCB(x,x)+A8CA6j
.text:100078FD ; PresentCB(x,x)+A8CB4j
.text:100078FD mov ecx, [esi+12C0h]
.text:10007903 test ecx, ecx
.text:10007905 jnz loc_100B04F6
.text:1000790B
.text:1000790B loc_1000790B: ; CODE XREF: PresentCB(x,x)+A8CC9j
.text:1000790B ; PresentCB(x,x)+A8D00j
.text:1000790B cmp [esp+2B8h+var_2A4], 0
.text:10007910 jz short loc_10007921
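; Load the address esi+0C0h into eax. Yes, this eax points to the D3DKMT_PRESENT. Now think of _pfnOsThunkDDIPresent below as NtGdiDdDDIPresent. Looks familiar, right? This is a section of the call stack captured at the very beginning.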
;bffe7d48 81d1b377 00fbb170 0394f3f8 778b2da4 win32k!NtGdiDdDDIPresent+0x14 (FPO: [1,0,0])
;bffe7d48 778b2da4 00fbb170 0394f3f8 778b2da4 nt!KiSystemServicePostCall (FPO: [0,3] TrapFrame @ bffe7d54)
;0394f130 76896bdc 70747363 00fbb170 00fbb170 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
;0394f134 70747363 00fbb170 00fbb170 0587d000 GDI32!NtGdiDdDDIPresent+0xa (FPO: [1,0,0])
;0394f3f8 53134e24 00fbb0b0 0394f540 00fbb170 d3d9!PresentCB+0xe2 (FPO: [2,171,4])
.text:10007912 lea eax, [esi+0C0h]
.text:10007918 push eax ; _DWORD
.text:10007919 call _pfnOsThunkDDIPresent
.text:1000791F mov edi, eax
.text:10007921
.text:10007921 loc_10007921: ; CODE XREF: PresentCB(x,x)+D3j
.text:10007921 test edi, edi
.text:10007923 js loc_1004A634
.text:10007929
.text:10007929 loc_10007929: ; CODE XREF: PresentCB(x,x)+42E81j
.text:10007929 ; PresentCB(x,x)+4A0EAj ...
.text:10007929 mov ecx, [esi+12C0h]
.text:1000792F test ecx, ecx
.text:10007931 jnz loc_100B05BA
.text:10007937
.text:10007937 loc_10007937: ; CODE XREF: PresentCB(x,x)+A8D8Dj
.text:10007937 ; PresentCB(x,x)+A8DDEj
.text:10007937 and dword ptr [esi+12C0h], 0
.text:1000793E and dword ptr [esi+12C8h], 0
.text:10007945 and dword ptr [esi+12CCh], 0
.text:1000794C and dword ptr [esi+12D0h], 0
.text:10007953 and dword ptr [esi+12D4h], 0
.text:1000795A mov dword ptr [esi+678h], 0
.text:10007964
.text:10007964 loc_10007964: ; CODE XREF: PresentCB(x,x)+53j
.text:10007964 xor eax, eax
.text:10007966
.text:10007966 loc_10007966: ; CODE XREF: PresentCB(x,x)+A8C7Aj
.text:10007966 mov ecx, [esp+2B8h+var_4]
.text:1000796D pop edi
.text:1000796E pop esi
.text:1000796F pop ebx
.text:10007970 xor ecx, esp
Now that we know where the D3DKMT_PRESENT lives, the rest is easy.
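Put together, the listing says the D3DKMT_PRESENT sits at offset 0C0h inside the object that hDevice points to, so FlipInterval should be at 0C0h + 44h = 104h from hDevice. Here is a minimal sketch of that arithmetic; the helper names are hypothetical, and both offsets apply only to this d3d9.dll build on win8.1_x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    DEVICE_KMT_PRESENT_OFFSET = 0xC0, /* from `lea eax, [esi+0C0h]` in the listing */
    KMT_FLIP_INTERVAL_OFFSET  = 0x44  /* FlipInterval inside D3DKMT_PRESENT (x86) */
};

/* Reads FlipInterval from a raw copy of the runtime's device object. */
uint32_t read_flip_interval(const uint8_t *device)
{
    uint32_t value;
    assert(device != NULL);
    memcpy(&value,
           device + DEVICE_KMT_PRESENT_OFFSET + KMT_FLIP_INTERVAL_OFFSET,
           sizeof value);
    return value;
}

/* Overwrites FlipInterval in the same place. */
void write_flip_interval(uint8_t *device, uint32_t interval)
{
    assert(device != NULL);
    memcpy(device + DEVICE_KMT_PRESENT_OFFSET + KMT_FLIP_INTERVAL_OFFSET,
           &interval, sizeof interval);
}
```

The buffer passed in must cover at least 0x108 bytes of the device object.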
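I came up with two approaches, plus a third worth mentioning:
1. Hook PresentCB in d3d9.dll
2. Hook GDI32!NtGdiDdDDIPresent or win32k!NtGdiDdDDIPresent
3. For the record: change the value directly through the hDevice parameter inside the user-mode driver. We can't do that, though; only the vendor gets to, o(╯□╰)o ...
Now for the hands-on part.
Here I hooked win32k!NtGdiDdDDIPresent. I did not write a program; in windbg I simply found an unused region, jumped to it, and resumed the original function once the hook code had run. MS has not documented NtGdiDdDDIPresent; its prototype can be taken from D3DKMTPresent in the OpenGL driver documentation, which MSDN describes in detail.
Here is the code as patched in windbg...
I took a shortcut and used space in beep.sys to hold the hook code... quick and easy.
(With forcing vertical sync on under DX9 figured out, forcing it on or off in the other DX9, DX10, and DX11 cases should work much the same way...)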
win32k!NtGdiDdDDIPresent:
90971e8c e96f910af6 jmp Beep!DriverEntry <PERF> (Beep+0x0) (86a1b000)
Beep!DriverEntry <PERF> (Beep+0x0):
86a1b000 50 push eax
86a1b001 8b442408 mov eax,dword ptr [esp+8]
86a1b005 0544000000 add eax,44h ; offset of FlipInterval; modify it in place
86a1b00a c70001000000 mov dword ptr [eax],1
86a1b010 58 pop eax
86a1b011 55 push ebp
86a1b012 8bec mov ebp,esp
86a1b014 e9786ef509 jmp win32k!NtGdiDdDDIPresent+0x5 (90971e91)
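The two jmp encodings above can be verified by hand: an E9 jump is 5 bytes long and its operand is the target address minus the address of the next instruction. The sketch below recomputes both displacements and restates the stub body in C; jmp_rel32 and force_flip_interval_one are hypothetical names, and the buffer stands in for the structure that [esp+8] points to after the push eax.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* rel32 operand of a 5-byte `E9` jmp located at `from` that lands on `to`. */
uint32_t jmp_rel32(uint32_t from, uint32_t to)
{
    return to - (from + 5u);   /* wraps modulo 2^32, as on the CPU */
}

/* C equivalent of the stub body: set FlipInterval (offset 0x44) to 1. */
void force_flip_interval_one(uint8_t *present_args)
{
    const uint32_t one = 1;
    assert(present_args != NULL);
    memcpy(present_args + 0x44, &one, sizeof one);
}
```

jmp_rel32(0x90971e8c, 0x86a1b000) gives 0xf60a916f, stored little-endian as 6f 91 0a f6, and jmp_rel32(0x86a1b014, 0x90971e91) gives 0x09f56e78, stored as 78 6e f5 09; both match the bytes in the listing. Note that the stub writes FlipInterval unconditionally and does not look at Flags.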
The result after the patch is as follows (see the attachment to the original post).