[转帖]VMProtect 逆向分析-软件逆向-看雪-安全社区|安全招聘|kanxue.com

[转帖]VMProtect 逆向分析

发表于: 2008-8-7 10:06 31241

[转帖]VMProtect 逆向分析

Bughoho

2008-8-7 10:06

31241

VMProtect, Part 0: Basics

Author: RolfRolles

VMProtect is a virtualization protector. Like other protections in the genre, among others
ReWolf's x86 Virtualizer and CodeVirtualizer, it works by disassembling the x86 bytecode of
the target executable and compiling it into a proprietary, polymorphic bytecode which is
executed in a custom interpreter at run-time. This is unlike the traditional notions of
packing, in which the x86 bytecode is simply encrypted and/or compressed: with virtualization,
the original x86 bytecode in the protected areas is gone, never to be seen again. Or so the
idea goes.

If you've never looked at VMProtect before, I encourage you to take a five-minute look in
IDA (here's a sample packed binary). As far as VMs go, it is particularly skeletal and
easily comprehended. The difficulty lies in recreating working x86 bytecode from the VM
bytecode. Here's a two-minute analysis of its dispatcher.

push edi ; push all registers
push ecx
push edx
push esi
push ebp
push ebx
push eax
push edx
pushf
push 0 ; imagebase fixup
mov esi, [esp+8+arg_0] ; esi = pointer to VM bytecode
mov ebp, esp ; ebp = VM's "stack" pointer
sub esp, 0C0h
mov edi, esp ; edi = "scratch" data area

VM__FOLLOW__Update:
add esi, [ebp+0]

VM__FOLLOW__Regular:
mov al, [esi] ; read a byte from EIP
movzx eax, al
sub esi, -1 ; increment EIP
jmp ds:VM__HandlerTable[eax*4] ; execute instruction handler

A feature worth discussing is the "scratch space", referenced by the register edi throughout
the dispatch loop. This is a 16-dword-sized area on the stack where VMProtect saves the
registers upon entering the VM, modifies them throughout the course of a basic block,
and from whence it restores the registers upon exit. For each basic block protected by the VM,
the layout of the registers in the scratch space can potentially be different.

Here's a disassembly of some instruction handlers. Notice that A) VMProtect is a stack machine
and that B) each handler -- though consisting of scant few instructions -- performs several
tasks, e.g. popping several values, performing multiple operations, pushing one or more values.

#00: x = [EIP-1] & 0x3C; y = popd; [edi+x] = y

.text:00427251 and al, 3Ch ; al = instruction number
.text:00427254 mov edx, [ebp+0] ; grab a dword off the stack
.text:00427257 add ebp, 4 ; pop the stack
.text:0042725A mov [edi+eax], edx ; store the dword in the scratch space

#01: x = [EIP-1] & 0x3C; y = [edi+x]; pushd y

.vmp0:0046B0EB and al, 3Ch ; al = instruction number
.vmp0:0046B0EE mov edx, [edi+eax] ; grab a dword out of the scratch space
.vmp0:0046B0F1 sub ebp, 4 ; subtract 4 from the stack pointer
.vmp0:0046B0F4 mov [ebp+0], edx ; push the dword onto the stack

#02: x = popw, y = popw, z = x + y, pushw z, pushf

.text:004271FB mov ax, [ebp+0] ; pop a word off the stack
.text:004271FF sub ebp, 2
.text:00427202 add [ebp+4], ax ; add it to another word on the stack
.text:00427206 pushf
.text:00427207 pop dword ptr [ebp+0] ; push the flags

#03: x = [EIP++]; w = popw; [edi+x] = Byte(w)

.vmp0:0046B02A movzx eax, byte ptr [esi] ; read a byte from EIP
.vmp0:0046B02D mov dx, [ebp+0] ; pop a word off the stack
.vmp0:0046B031 inc esi ; EIP++
.vmp0:0046B032 add ebp, 2 ; adjust stack pointer
.vmp0:0046B035 mov [edi+eax], dl ; write a byte into the scratch area

#04: x = popd, y = popw, z = x << y, pushd z, pushf

.vmp0:0046B095 mov eax, [ebp+0] ; pop a dword off the stack
.vmp0:0046B098 mov cl, [ebp+4] ; pop a word off the stack
.vmp0:0046B09B sub ebp, 2
.vmp0:0046B09E shr eax, cl ; shr the dword by the word
.vmp0:0046B0A0 mov [ebp+4], eax ; push the result
.vmp0:0046B0A3 pushf
.vmp0:0046B0A4 pop dword ptr [ebp+0] ; push the flags

#05: x = popd, pushd ss:[x]

.vmp0:0046B5F7 mov eax, [ebp+0] ; pop a dword off the stack
.vmp0:0046B5FA mov eax, ss:[eax] ; read a dword from ss
.vmp0:0046B5FD mov [ebp+0], eax ; push that dword

[招生]科锐逆向工程师培训(2025年3月11日实地，远程教学同时开班, 第52期)！

#VM保护

收藏・40

免费・7

支持

赞赏记录

参与人

雪币

留言

时间

Youlor

为你点赞~

2024-1-4 00:07

伟叔叔

为你点赞~

2023-11-12 00:10

QinBeast

为你点赞~

2023-8-21 00:18

PLEBFE

为你点赞~

2023-8-19 00:22

shinratensei

为你点赞~

2023-7-25 04:07

心游尘世外

为你点赞~

2023-7-16 03:54

飘零丶

为你点赞~

2023-7-4 03:12

最新回复 (43) 1 2 ▶
Bughoho 雪币： 1946 活跃值： (283) 能力值： (RANK：330 ) 在线值：发帖 72 回帖 1217 粉丝 27 关注私信	Bughoho 8 2 楼 Part 1: Bytecode and IR Author: RolfRolles The approach I took with ReWolf's x86 Virtualizer is also applicable here, although a more sophisticated compiler is required. What follows is some preliminary notes on the design and implementation of such a component. These are not complete details on breaking the protection; I confess to having only looked at a few samples, and I am not sure which protection options were enabled. As before, we begin by constructing a disassembler for the interpreter. This is immediately problematic, since the bytecode language is polymorphic. I have created an IDA plugin that automatically constructs OCaml source code for a bytecode disassembler. In a production-quality implementation, this should be implemented as a standalone component that returns a closure. The generated disassembler, then, looks like this: let disassemble bytearray index = match (bytearray.(index) land 0xff) with 0x0 -> (VM__Handler0__PopIntoRegister(0),[index+1]) \| 0x1 -> (VM__Handler1__PushDwordFromRegister(0),[index+1]) \| 0x2 -> (VM__Handler2__AddWords,[index+1]) \| 0x3 -> (VM__Handler3__StoreByteIntoRegister(bytearray.(index+1)),[index+2]) \| 0x4 -> (VM__Handler0__PopIntoRegister(4),[index+1]) \| 0x5 -> (VM__Handler1__PushDwordFromRegister(4),[index+1]) \| 0x6 -> (VM__Handler4__ShrDword,[index+1]) \| 0x7 -> (VM__Handler5__ReadDword__FromStackSegment,[index+1]) \| ... -> ... Were we to work with the instructions individually in their natural granularity, depicted above, the bookkeeping on the semantics of each would likely prove tedious. For illustration, compare and contrast handlers #02 and #04. Both have the same basic pattern: pop two values (words vs. dwords), perform a binary operation (add vs. shr), push the result, then push the flags. The current representation of instructions does not express these, or any, similarities. Handler #02: Handler #04: mov ax, [ebp+0] mov eax, [ebp+0] sub ebp, 2 mov cl, [ebp+4] add [ebp+4], ax sub ebp, 2 pushf shr eax, cl pop dword ptr [ebp+0] mov [ebp+4], eax pushf pop dword ptr [ebp+0] Therefore, we pull a standard compiler-writer's trick and translate the VMProtect instructions into a simpler, "intermediate" language (hereinafter "IR") which resembles the pseudocode snippets atop the handlers in part zero. Below is a fragment of that language's abstract syntax. type size = B \| W \| D \| Q type temp = int * size type seg = Scratch \| SS \| FS \| Regular type irbinop = Add \| And \| Shl \| Shr \| MakeQword type irunop = Neg \| MakeByte \| TakeHighDword \| Flags type irexpr = Reg of register \| Temp of int \| Const of const \| Deref of seg * irexpr * size \| Binop of irexpr * irbinop * irexpr \| Unop of irexpr * irunop type ir = DeclareTemps of temp list \| Assign of irexpr * irexpr \| Push of irexpr \| Pop of irexpr \| Return A portion of the VMProtect -> IR translator follows; compare the translation for handlers #02 and #04. let make_microcode = function VM__Handler0__PopIntoRegister(b) -> [Pop(Deref(Scratch, Const(Dword(zero_extend_byte_dword(b land 0x3C))), D))] \| VM__Handler2__AddWords -> [DeclareTemps([(0, W);(1, W);(2, W)]); Pop(Temp(0)); Pop(Temp(1)); Assign(Temp(2), Binop(Temp(0), Add, Temp(1))); Push(Temp(2)); Push(Unop(Temp(2), Flags))] \| VM__Handler4__ShrDword -> [DeclareTemps([(0, D);(1, W);(2, D)]); Pop(Temp(0)); Pop(Temp(1)); Assign(Temp(2), Binop(Temp(0), Shr, Temp(1))); Push(Temp(2)); Push(Unop(Temp(2), Flags))] \| VM__Handler7__PushESP -> [Push(Reg(Esp))] \| VM__Handler23__WriteDwordIntoFSSegment -> [DeclareTemps([(0, D);(1, D)]); Pop(Temp(0)); Pop(Temp(1)); Assign(Deref(FS, Temp(0), D), Temp(1))] \| ... -> ... To summarize the process, below is a listing of VMProtect instructions, followed by the assembly code that is executed for each, and to the right is the IR translation. VM__Handler1__PushDwordFromRegister 32 and al, 3Ch ; al = 32 mov edx, [edi+eax] sub ebp, 4 mov [ebp+0], edx Push (Deref (Scratch, Const (Dword 32l), D)); VM__Handler7__PushESP mov eax, ebp sub ebp, 4 mov [ebp+0], eax Push (Reg Esp); VM__Handler0__PopIntoRegister 40 and al, 3Ch mov edx, [ebp+0] add ebp, 4 mov [edi+eax], edx Pop (Deref (Scratch, Const (Dword 40l), D)); VM__Handler19__PushSignedByteAsDword (-1l) movzx eax, byte ptr [esi] ; esi = -1 sub esi, 0FFFFFFFFh cbw cwde sub ebp, 4 mov [ebp+0], eax Push (Const (Dword (-1l))); VM__Handler9__PushDword 4525664l mov eax, [esi] ; esi = 4525664l add esi, 4 sub ebp, 4 mov [ebp+0], eax Push (Const (Dword 4525664l)); VM__Handler9__PushDword 4362952l}; mov eax, [esi] ; esi = 4362952l add esi, 4 sub ebp, 4 mov [ebp+0], eax Push (Const (Dword 4362952l)); VM__Handler19__PushSignedByteAsDword 0l}; movzx eax, byte ptr [esi] ; esi = 0 sub esi, 0FFFFFFFFh cbw cwde sub ebp, 4 mov [ebp+0], eax Push (Const (Dword (0l))); VM__Handler42__ReadDwordFromFSSegment}; mov eax, [ebp+0] DeclareTemps([(0,D)]); Pop (Temp 0); mov eax, fs:[eax] mov [ebp+0], eax Push (Deref (FS, Temp 0, D)); 2008-8-7 10:09 0
Bughoho 雪币： 1946 活跃值： (283) 能力值： (RANK：330 ) 在线值：发帖 72 回帖 1217 粉丝 27 关注私信	Bughoho 8 3 楼 Part 2: Introduction to Optimization Author: RolfRolles Basically, VMProtect bytecode and the IR differ from x86 assembly language in four ways: 1) It's a stack machine; 2) The IR contains "temporary variables"; 3) It contains what I've called a "scratch" area, upon which computations are performed rather than in the registers; 4) In the case of VMProtect Ultra, it's obfuscated (or rather, "de-optimized") in certain ways. It turns out that removing these four aspects from the IR is sufficient preparation for compilation into sensible x86 code. We accomplish this via standard compiler optimizations applied locally to each basic block. In general, there are a few main compiler optimizations used in this process. The first one is "constant propagation". Consider the following C code. int x = 1; function(x); Clearly x will always be 1 when function is invoked: it is defined on the first line, and is not re-defined before it is used in the following line (the definition in line one "reaches" the use in line two; alternatively, the path between the two is "definition-clear" with respect to x). Thus, the code can be safely transformed into "function(1)". If the first line is the only definition of the variable x, then we can replace all uses of x with the integer 1. If the second line is the only use of the variable x, then we can eliminate the variable. The next is "constant folding". Consider the following C code. int x = 1024; function(x1024); By the above remarks, we know we can transform the second line into "function(10241024);". It would be silly to generate code that actually performed this multiplication at run-time: the value is known at compile-time, and should be computed then. We can replace the second line with "function(1048576);", and in general we can replace any binary operation performed upon constant values with the computed result of that operation. Similar to constant propagation is "copy propagation", as in the below. void f(int i) { int j = i; g(j); } The variable j is merely a copy of the variable i, and so the variable i can be substituted in for j until the point where either is redefined. Lacking redefinitions, j can be eliminated entirely. The next optimization is "dead-store elimination". Consider the following C code. int y = 2; y = 1; The definition of the variable y on line one is immediately un-done by the one on line two. Therefore, there is no reason to actually assign the value 2 to the variable; the first store to y is a "dead store", and can be eliminated by the standard liveness-based compiler optimization known as "dead-store elimination", or more generally "dead-code elimination". Here's an example from the VMProtect IR. ecx = DWORD Scratch:[Dword(44)] ecx = DWORD Scratch:[Dword(20)] After dead-store-eliminating the first instruction, it turns out no other instructions use Scratch:[Dword(44)], and so its previous definition can be eliminated as well. 2008-8-7 10:11 0
Bughoho 雪币： 1946 活跃值： (283) 能力值： (RANK：330 ) 在线值：发帖 72 回帖 1217 粉丝 27 关注私信	Bughoho 8 4 楼 Part 3: Optimizing and Compiling Author: RolfRolles The reader should inspect this unoptimized IR listing before continuing. In an attempt to keep this entry from becoming unnecessarily long, the example snippets will be small, but for completeness a more thorough running example is linked throughout the text. We begin by removing the stack machine features of the IR. Since VMProtect operates on disassembled x86 code, and x86 itself is not a stack machine, this aspect of the protection is unnatural and easily removed. Here is a 15-line fragment of VMProtect IR. push Dword(-88) push esp push Dword(4) pop t3 pop t4 t5 = t3 + t4 push t5 push flags t5 pop DWORD Scratch:[Dword(52)] pop t6 pop t7 t8 = t6 + t7 push t8 push flags t8 pop DWORD Scratch:[Dword(12)] pop esp All but two instructions are pushes or pops, and the pushes can be easily matched up with the pops. Tracking the stack pointer, we see that, for example, t3 = Dword(4). A simple analysis allows us to "optimize away" the push/pop pairs into assignment statements. Simply iterate through each instruction in a basic block and keep a stack describing the source of each push. For every pop, ensure that the sizes match and record the location of the corresponding push. We wish to replace the pop with an assignment to the popped expression from the pushed expression, as in t3 = Dword(4) t4 = esp t7 = Dword(-88) With the stack aspects removed, we are left with a more conventional listing containing many assignment statements. This optimization substantially reduces the number of instructions in a given basic block (~40% for the linked example) and opens the door for other optimizations. The newly optimized code is eight lines, roughly half of the original: t3 = Dword(4) t4 = esp t5 = t3 + t4 DWORD Scratch:[Dword(52)] = flags t5 t6 = t5 t7 = Dword(-88) t8 = t6 + t7 DWORD Scratch:[Dword(12)] = flags t8 esp = t8 A complete listing of the unoptimized IR versus the one with the stack machine features removed is here, which should be perused before proceeding. Now we turn our attention to the temporary variables and the scratch area. Recall that the former were not part of the pre-protected x86 code, nor the VMProtect bytecode -- they were introduced in order to ease the IR translation. The latter is part of the VMProtect bytecode, but was not part of the original pre-protected x86 code. Since these are not part of the languages we are modelling, we shall eliminate them wholesale. On a high level, we treat each temporary variable, each byte of the scratch space, and each register as being a variable defined within a basic block, and then eliminate the former two via the compiler optimizations previously discussed. Looking again at the last snippet of IR, we can see several areas for improvement. First, consider the variable t6. It is clearly just a copy of t5, neither of which are redefined before the next use in the assignment to t8. Copy propagation will replace variable t6 with t5 and eliminate the former. More generally, t3, t4, and t7 contain either constants or values that are not modified between their uses. Constant and copy propagation will substitute the assignments to these variables in for their uses and eliminate them. The newly optimized code is a slender three lines compared to the original 15; we have removed 80% of the IR for the running example. DWORD Scratch:[Dword(52)] = flags Dword(4) + esp esp = Dword(4) + esp + Dword(-88) DWORD Scratch:[Dword(12)] = flags Dword(4) + esp + Dword(-88) The side-by-side comparison can be found here. The IR now looks closer to x86, with the exception that the results of computations are being stored in the scratch area, not into registers. As before, we apply dead-store elimination, copy and constant propagation to the scratch area, removing dependence upon it entirely in the process. See here for a comparison with the last phase. Here is a comparison of the final, optimized code against the original x86: push ebp push ebp ebp = esp mov ebp, esp push Dword(-1) push 0FFFFFFFFh push Dword(4525664) push 450E60h push Dword(4362952) push offset sub_4292C8 eax = DWORD FS:[Dword(0)] mov eax, large fs:0 push eax push eax DWORD FS:[Dword(0)] = esp mov large fs:0, esp eflags = flags esp + Dword(-88) esp = esp + Dword(-88) add esp, 0FFFFFFA8h push ebx push ebx push esi push esi push edi push edi DWORD SS:[Dword(-24) + ebp] = esp mov [ebp-18h], esp call DWORD [Dword(4590300)] call dword ptr ds:unk_460ADC vmreturn Dword(0) + Dword(4638392) Code generation is an afterthought. 2008-8-7 10:12 0
Bughoho 雪币： 1946 活跃值： (283) 能力值： (RANK：330 ) 在线值：发帖 72 回帖 1217 粉丝 27 关注私信	Bughoho 8 5 楼 vmp这次要蛋疼了 2008-8-7 10:15 0
yingyue 雪币： 1844 活跃值： (35) 能力值： ( LV3，RANK：30 ) 在线值：发帖 16 回帖 2595 粉丝 3 关注私信	yingyue 6 楼谢，收下，。。。。。。。。。。。。。。。。 2008-8-7 11:21 0
heXer 雪币： 254 活跃值： (126) 能力值： ( LV8，RANK：130 ) 在线值：发帖 12 回帖 802 粉丝 2 关注私信	heXer 3 7 楼比发哥的挫啊 2008-8-7 11:54 0
lunglungyu 雪币： 207 活跃值： (10) 能力值： ( LV4，RANK：50 ) 在线值：发帖 5 回帖 339 粉丝 0 关注私信	lunglungyu 1 8 楼佔楼将来再看。 2008-8-7 14:05 0
壹只老虎雪币： 112 活跃值： (16) 能力值： ( LV9，RANK：290 ) 在线值：发帖 38 回帖 709 粉丝 3 关注私信	壹只老虎 7 9 楼广告位出租! 2008-8-7 14:26 0
wangdell 雪币： 87 活跃值： (47) 能力值： ( LV12，RANK：250 ) 在线值：发帖 52 回帖 328 粉丝 3 关注私信	wangdell 6 10 楼虽然很详细，但这也不是新东西了。类似的工作我们已经做到了,opcode我们也已经全部解开了，块结构和RISC指令集也基本搞清楚。代码还原也手工做到了。虽然我们之前的分析只是1.2x，但和1.5的demo也区别不大。这里好像是demo版，对release版如此分析不是这么容易了。这还是白盒分析法，费体力和精力，除非有完美工具。以前用白盒分析，很辛苦，而vm多样，如何和机器对抗? 对vm,目前我觉得黑盒分析是最好的方法。不知fg的工具是何样的。 2008-8-7 15:37 0
Bughoho 雪币： 1946 活跃值： (283) 能力值： (RANK：330 ) 在线值：发帖 72 回帖 1217 粉丝 27 关注私信	Bughoho 8 11 楼我只是觉得RolfRolles的文章写的比之前出现的文章都更容易懂一点，这是我转帖的目的。既然你对vmp分析得如此透彻，你也可以做出来，不必指望fg的工具 2008-8-7 16:01 0
cd37ycs 雪币： 299 活跃值： (1805) 能力值： ( LV2，RANK：10 ) 在线值：发帖 4 回帖 411 粉丝 0 关注私信	cd37ycs 12 楼看样子再过个一年半载工具就会公开了。 2008-8-7 16:07 0
wangdell 雪币： 87 活跃值： (47) 能力值： ( LV12，RANK：250 ) 在线值：发帖 52 回帖 328 粉丝 3 关注私信	wangdell 6 13 楼文章不错，只是我期望过高，看后有点小失望。工具一是没时间，每天能有2个小时让我搞crack就不错；二是没水平，如果有这水平，我会先考虑写个vm。 vm千变万化，如果写修复工具只能针对某一特定vm或某一特定版本，还未必完美，是个费力且收效小的活。即使出了成品，随即该vmprotect过时(有了破解工具，就没人用了），vm立即升级，工具也会立即消亡，昙花一现。对fg的工具，是有点好奇和怀疑，会有人写如此工具吗？会完美吗？会通用吗？另如我是fg，即使有此工具，我也不会公布出来，那样该版本vmprotect随即死亡，该工具也立刻无用。 2008-8-7 16:37 0
海风月影雪币： 7327 活跃值： (3813) 能力值： (RANK：1130 ) 在线值：发帖 93 回帖 2310 粉丝 123 关注私信	海风月影 22 14 楼我一直认为如果vmprotect的作者应该来中国接某些业务，可能比他卖vmp还要赚钱 2008-8-7 16:41 0
KuNgBiM 雪币： 817 活跃值： (1927) 能力值： ( LV12，RANK：2670 ) 在线值：发帖 500 回帖 2517 粉丝 22 关注私信	KuNgBiM 66 15 楼呵呵深入人心！ 2008-8-7 19:46 0
cpuArt 雪币： 150 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 7 回帖 58 粉丝 0 关注私信	cpuArt 16 楼占楼看看~~ 2008-8-7 20:42 0
鸡蛋壳雪币： 427 活跃值： (412) 能力值： ( LV2，RANK：10 ) 在线值：发帖 456 回帖 3147 粉丝 9 关注私信	鸡蛋壳 17 楼 VMP并不能有效的阻止软件破解。 2008-8-7 21:30 0
游侠201 雪币： 201 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 0 回帖 11 粉丝 0 关注私信	游侠201 18 楼广告位出租! 2008-8-7 21:39 0
yangjt 雪币： 609 活跃值： (247) 能力值： ( LV12，RANK：441 ) 在线值：发帖 51 回帖 459 粉丝 1 关注私信	yangjt 10 19 楼强大……19楼广告位招租…… 2008-8-7 21:55 0
cater 雪币： 109 活跃值： (608) 能力值： ( LV12，RANK：220 ) 在线值：发帖 58 回帖 415 粉丝 3 关注私信	cater 5 20 楼高级逆向虚拟打人七夕膜拜七次 2008-8-7 22:06 0
uvbs 雪币： 30 活跃值： (987) 能力值： ( LV4，RANK：50 ) 在线值：发帖 4 回帖 200 粉丝 2 关注私信	uvbs 21 楼 a basic introduction to stack machine 2008-8-8 00:08 0
ucantseeme 雪币： 442 活跃值： (108) 能力值： ( LV2，RANK：10 ) 在线值：发帖 46 回帖 990 粉丝 1 关注私信	ucantseeme 22 楼插入先,最近想搞VMP 2008-9-6 08:23 0
Aker 雪币： 2134 活跃值： (14) 能力值： (RANK：170 ) 在线值：发帖 50 回帖 773 粉丝 3 关注私信	Aker 4 23 楼研究研究~~上次没mark 2008-9-7 20:47 0
q3 watcher 雪币： 215 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 23 回帖 902 粉丝 0 关注私信	q3 watcher 24 楼没有什么能阻止强大的蛋蛋 2008-9-8 08:14 0
zcfxdsh 雪币： 200 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 5 回帖 20 粉丝 0 关注私信	zcfxdsh 25 楼看来VMP确实很利害了。 2008-10-13 16:29 0
	游客登录 \| 注册方可回帖回帖表情雪币赚取及消费高级回复

Bughoho

发帖

1217

回帖

330

RANK

关注

私信

他的文章

关于我们

联系我们

企业服务

看雪公众号

最新回复 (43) 1 2 ▶
Bughoho 雪币： 1946 活跃值： (283) 能力值： (RANK：330 ) 在线值：发帖 72 回帖 1217 粉丝 27 关注私信	Bughoho 8 2 楼 Part 1: Bytecode and IR Author: RolfRolles The approach I took with ReWolf's x86 Virtualizer is also applicable here, although a more sophisticated compiler is required. What follows is some preliminary notes on the design and implementation of such a component. These are not complete details on breaking the protection; I confess to having only looked at a few samples, and I am not sure which protection options were enabled. As before, we begin by constructing a disassembler for the interpreter. This is immediately problematic, since the bytecode language is polymorphic. I have created an IDA plugin that automatically constructs OCaml source code for a bytecode disassembler. In a production-quality implementation, this should be implemented as a standalone component that returns a closure. The generated disassembler, then, looks like this: let disassemble bytearray index = match (bytearray.(index) land 0xff) with 0x0 -> (VM__Handler0__PopIntoRegister(0),[index+1]) \| 0x1 -> (VM__Handler1__PushDwordFromRegister(0),[index+1]) \| 0x2 -> (VM__Handler2__AddWords,[index+1]) \| 0x3 -> (VM__Handler3__StoreByteIntoRegister(bytearray.(index+1)),[index+2]) \| 0x4 -> (VM__Handler0__PopIntoRegister(4),[index+1]) \| 0x5 -> (VM__Handler1__PushDwordFromRegister(4),[index+1]) \| 0x6 -> (VM__Handler4__ShrDword,[index+1]) \| 0x7 -> (VM__Handler5__ReadDword__FromStackSegment,[index+1]) \| ... -> ... Were we to work with the instructions individually in their natural granularity, depicted above, the bookkeeping on the semantics of each would likely prove tedious. For illustration, compare and contrast handlers #02 and #04. Both have the same basic pattern: pop two values (words vs. dwords), perform a binary operation (add vs. shr), push the result, then push the flags. The current representation of instructions does not express these, or any, similarities. Handler #02: Handler #04: mov ax, [ebp+0] mov eax, [ebp+0] sub ebp, 2 mov cl, [ebp+4] add [ebp+4], ax sub ebp, 2 pushf shr eax, cl pop dword ptr [ebp+0] mov [ebp+4], eax pushf pop dword ptr [ebp+0] Therefore, we pull a standard compiler-writer's trick and translate the VMProtect instructions into a simpler, "intermediate" language (hereinafter "IR") which resembles the pseudocode snippets atop the handlers in part zero. Below is a fragment of that language's abstract syntax. type size = B \| W \| D \| Q type temp = int * size type seg = Scratch \| SS \| FS \| Regular type irbinop = Add \| And \| Shl \| Shr \| MakeQword type irunop = Neg \| MakeByte \| TakeHighDword \| Flags type irexpr = Reg of register \| Temp of int \| Const of const \| Deref of seg * irexpr * size \| Binop of irexpr * irbinop * irexpr \| Unop of irexpr * irunop type ir = DeclareTemps of temp list \| Assign of irexpr * irexpr \| Push of irexpr \| Pop of irexpr \| Return A portion of the VMProtect -> IR translator follows; compare the translation for handlers #02 and #04. let make_microcode = function VM__Handler0__PopIntoRegister(b) -> [Pop(Deref(Scratch, Const(Dword(zero_extend_byte_dword(b land 0x3C))), D))] \| VM__Handler2__AddWords -> [DeclareTemps([(0, W);(1, W);(2, W)]); Pop(Temp(0)); Pop(Temp(1)); Assign(Temp(2), Binop(Temp(0), Add, Temp(1))); Push(Temp(2)); Push(Unop(Temp(2), Flags))] \| VM__Handler4__ShrDword -> [DeclareTemps([(0, D);(1, W);(2, D)]); Pop(Temp(0)); Pop(Temp(1)); Assign(Temp(2), Binop(Temp(0), Shr, Temp(1))); Push(Temp(2)); Push(Unop(Temp(2), Flags))] \| VM__Handler7__PushESP -> [Push(Reg(Esp))] \| VM__Handler23__WriteDwordIntoFSSegment -> [DeclareTemps([(0, D);(1, D)]); Pop(Temp(0)); Pop(Temp(1)); Assign(Deref(FS, Temp(0), D), Temp(1))] \| ... -> ... To summarize the process, below is a listing of VMProtect instructions, followed by the assembly code that is executed for each, and to the right is the IR translation. VM__Handler1__PushDwordFromRegister 32 and al, 3Ch ; al = 32 mov edx, [edi+eax] sub ebp, 4 mov [ebp+0], edx Push (Deref (Scratch, Const (Dword 32l), D)); VM__Handler7__PushESP mov eax, ebp sub ebp, 4 mov [ebp+0], eax Push (Reg Esp); VM__Handler0__PopIntoRegister 40 and al, 3Ch mov edx, [ebp+0] add ebp, 4 mov [edi+eax], edx Pop (Deref (Scratch, Const (Dword 40l), D)); VM__Handler19__PushSignedByteAsDword (-1l) movzx eax, byte ptr [esi] ; esi = -1 sub esi, 0FFFFFFFFh cbw cwde sub ebp, 4 mov [ebp+0], eax Push (Const (Dword (-1l))); VM__Handler9__PushDword 4525664l mov eax, [esi] ; esi = 4525664l add esi, 4 sub ebp, 4 mov [ebp+0], eax Push (Const (Dword 4525664l)); VM__Handler9__PushDword 4362952l}; mov eax, [esi] ; esi = 4362952l add esi, 4 sub ebp, 4 mov [ebp+0], eax Push (Const (Dword 4362952l)); VM__Handler19__PushSignedByteAsDword 0l}; movzx eax, byte ptr [esi] ; esi = 0 sub esi, 0FFFFFFFFh cbw cwde sub ebp, 4 mov [ebp+0], eax Push (Const (Dword (0l))); VM__Handler42__ReadDwordFromFSSegment}; mov eax, [ebp+0] DeclareTemps([(0,D)]); Pop (Temp 0); mov eax, fs:[eax] mov [ebp+0], eax Push (Deref (FS, Temp 0, D)); 2008-8-7 10:09 0
Bughoho 雪币： 1946 活跃值： (283) 能力值： (RANK：330 ) 在线值：发帖 72 回帖 1217 粉丝 27 关注私信	Bughoho 8 3 楼 Part 2: Introduction to Optimization Author: RolfRolles Basically, VMProtect bytecode and the IR differ from x86 assembly language in four ways: 1) It's a stack machine; 2) The IR contains "temporary variables"; 3) It contains what I've called a "scratch" area, upon which computations are performed rather than in the registers; 4) In the case of VMProtect Ultra, it's obfuscated (or rather, "de-optimized") in certain ways. It turns out that removing these four aspects from the IR is sufficient preparation for compilation into sensible x86 code. We accomplish this via standard compiler optimizations applied locally to each basic block. In general, there are a few main compiler optimizations used in this process. The first one is "constant propagation". Consider the following C code. int x = 1; function(x); Clearly x will always be 1 when function is invoked: it is defined on the first line, and is not re-defined before it is used in the following line (the definition in line one "reaches" the use in line two; alternatively, the path between the two is "definition-clear" with respect to x). Thus, the code can be safely transformed into "function(1)". If the first line is the only definition of the variable x, then we can replace all uses of x with the integer 1. If the second line is the only use of the variable x, then we can eliminate the variable. The next is "constant folding". Consider the following C code. int x = 1024; function(x1024); By the above remarks, we know we can transform the second line into "function(10241024);". It would be silly to generate code that actually performed this multiplication at run-time: the value is known at compile-time, and should be computed then. We can replace the second line with "function(1048576);", and in general we can replace any binary operation performed upon constant values with the computed result of that operation. Similar to constant propagation is "copy propagation", as in the below. void f(int i) { int j = i; g(j); } The variable j is merely a copy of the variable i, and so the variable i can be substituted in for j until the point where either is redefined. Lacking redefinitions, j can be eliminated entirely. The next optimization is "dead-store elimination". Consider the following C code. int y = 2; y = 1; The definition of the variable y on line one is immediately un-done by the one on line two. Therefore, there is no reason to actually assign the value 2 to the variable; the first store to y is a "dead store", and can be eliminated by the standard liveness-based compiler optimization known as "dead-store elimination", or more generally "dead-code elimination". Here's an example from the VMProtect IR. ecx = DWORD Scratch:[Dword(44)] ecx = DWORD Scratch:[Dword(20)] After dead-store-eliminating the first instruction, it turns out no other instructions use Scratch:[Dword(44)], and so its previous definition can be eliminated as well. 2008-8-7 10:11 0
Bughoho 雪币： 1946 活跃值： (283) 能力值： (RANK：330 ) 在线值：发帖 72 回帖 1217 粉丝 27 关注私信	Bughoho 8 4 楼 Part 3: Optimizing and Compiling Author: RolfRolles The reader should inspect this unoptimized IR listing before continuing. In an attempt to keep this entry from becoming unnecessarily long, the example snippets will be small, but for completeness a more thorough running example is linked throughout the text. We begin by removing the stack machine features of the IR. Since VMProtect operates on disassembled x86 code, and x86 itself is not a stack machine, this aspect of the protection is unnatural and easily removed. Here is a 15-line fragment of VMProtect IR. push Dword(-88) push esp push Dword(4) pop t3 pop t4 t5 = t3 + t4 push t5 push flags t5 pop DWORD Scratch:[Dword(52)] pop t6 pop t7 t8 = t6 + t7 push t8 push flags t8 pop DWORD Scratch:[Dword(12)] pop esp All but two instructions are pushes or pops, and the pushes can be easily matched up with the pops. Tracking the stack pointer, we see that, for example, t3 = Dword(4). A simple analysis allows us to "optimize away" the push/pop pairs into assignment statements. Simply iterate through each instruction in a basic block and keep a stack describing the source of each push. For every pop, ensure that the sizes match and record the location of the corresponding push. We wish to replace the pop with an assignment to the popped expression from the pushed expression, as in t3 = Dword(4) t4 = esp t7 = Dword(-88) With the stack aspects removed, we are left with a more conventional listing containing many assignment statements. This optimization substantially reduces the number of instructions in a given basic block (~40% for the linked example) and opens the door for other optimizations. The newly optimized code is eight lines, roughly half of the original: t3 = Dword(4) t4 = esp t5 = t3 + t4 DWORD Scratch:[Dword(52)] = flags t5 t6 = t5 t7 = Dword(-88) t8 = t6 + t7 DWORD Scratch:[Dword(12)] = flags t8 esp = t8 A complete listing of the unoptimized IR versus the one with the stack machine features removed is here, which should be perused before proceeding. Now we turn our attention to the temporary variables and the scratch area. Recall that the former were not part of the pre-protected x86 code, nor the VMProtect bytecode -- they were introduced in order to ease the IR translation. The latter is part of the VMProtect bytecode, but was not part of the original pre-protected x86 code. Since these are not part of the languages we are modelling, we shall eliminate them wholesale. On a high level, we treat each temporary variable, each byte of the scratch space, and each register as being a variable defined within a basic block, and then eliminate the former two via the compiler optimizations previously discussed. Looking again at the last snippet of IR, we can see several areas for improvement. First, consider the variable t6. It is clearly just a copy of t5, neither of which are redefined before the next use in the assignment to t8. Copy propagation will replace variable t6 with t5 and eliminate the former. More generally, t3, t4, and t7 contain either constants or values that are not modified between their uses. Constant and copy propagation will substitute the assignments to these variables in for their uses and eliminate them. The newly optimized code is a slender three lines compared to the original 15; we have removed 80% of the IR for the running example. DWORD Scratch:[Dword(52)] = flags Dword(4) + esp esp = Dword(4) + esp + Dword(-88) DWORD Scratch:[Dword(12)] = flags Dword(4) + esp + Dword(-88) The side-by-side comparison can be found here. The IR now looks closer to x86, with the exception that the results of computations are being stored in the scratch area, not into registers. As before, we apply dead-store elimination, copy and constant propagation to the scratch area, removing dependence upon it entirely in the process. See here for a comparison with the last phase. Here is a comparison of the final, optimized code against the original x86: push ebp push ebp ebp = esp mov ebp, esp push Dword(-1) push 0FFFFFFFFh push Dword(4525664) push 450E60h push Dword(4362952) push offset sub_4292C8 eax = DWORD FS:[Dword(0)] mov eax, large fs:0 push eax push eax DWORD FS:[Dword(0)] = esp mov large fs:0, esp eflags = flags esp + Dword(-88) esp = esp + Dword(-88) add esp, 0FFFFFFA8h push ebx push ebx push esi push esi push edi push edi DWORD SS:[Dword(-24) + ebp] = esp mov [ebp-18h], esp call DWORD [Dword(4590300)] call dword ptr ds:unk_460ADC vmreturn Dword(0) + Dword(4638392) Code generation is an afterthought. 2008-8-7 10:12 0
Bughoho 雪币： 1946 活跃值： (283) 能力值： (RANK：330 ) 在线值：发帖 72 回帖 1217 粉丝 27 关注私信	Bughoho 8 5 楼 vmp这次要蛋疼了 2008-8-7 10:15 0
yingyue 雪币： 1844 活跃值： (35) 能力值： ( LV3，RANK：30 ) 在线值：发帖 16 回帖 2595 粉丝 3 关注私信	yingyue 6 楼谢，收下，。。。。。。。。。。。。。。。。 2008-8-7 11:21 0
heXer 雪币： 254 活跃值： (126) 能力值： ( LV8，RANK：130 ) 在线值：发帖 12 回帖 802 粉丝 2 关注私信	heXer 3 7 楼比发哥的挫啊 2008-8-7 11:54 0
lunglungyu 雪币： 207 活跃值： (10) 能力值： ( LV4，RANK：50 ) 在线值：发帖 5 回帖 339 粉丝 0 关注私信	lunglungyu 1 8 楼佔楼将来再看。 2008-8-7 14:05 0
壹只老虎雪币： 112 活跃值： (16) 能力值： ( LV9，RANK：290 ) 在线值：发帖 38 回帖 709 粉丝 3 关注私信	壹只老虎 7 9 楼广告位出租! 2008-8-7 14:26 0
wangdell 雪币： 87 活跃值： (47) 能力值： ( LV12，RANK：250 ) 在线值：发帖 52 回帖 328 粉丝 3 关注私信	wangdell 6 10 楼虽然很详细，但这也不是新东西了。类似的工作我们已经做到了,opcode我们也已经全部解开了，块结构和RISC指令集也基本搞清楚。代码还原也手工做到了。虽然我们之前的分析只是1.2x，但和1.5的demo也区别不大。这里好像是demo版，对release版如此分析不是这么容易了。这还是白盒分析法，费体力和精力，除非有完美工具。以前用白盒分析，很辛苦，而vm多样，如何和机器对抗? 对vm,目前我觉得黑盒分析是最好的方法。不知fg的工具是何样的。 2008-8-7 15:37 0
Bughoho 雪币： 1946 活跃值： (283) 能力值： (RANK：330 ) 在线值：发帖 72 回帖 1217 粉丝 27 关注私信	Bughoho 8 11 楼我只是觉得RolfRolles的文章写的比之前出现的文章都更容易懂一点，这是我转帖的目的。既然你对vmp分析得如此透彻，你也可以做出来，不必指望fg的工具 2008-8-7 16:01 0
cd37ycs 雪币： 299 活跃值： (1805) 能力值： ( LV2，RANK：10 ) 在线值：发帖 4 回帖 411 粉丝 0 关注私信	cd37ycs 12 楼看样子再过个一年半载工具就会公开了。 2008-8-7 16:07 0
wangdell 雪币： 87 活跃值： (47) 能力值： ( LV12，RANK：250 ) 在线值：发帖 52 回帖 328 粉丝 3 关注私信	wangdell 6 13 楼文章不错，只是我期望过高，看后有点小失望。工具一是没时间，每天能有2个小时让我搞crack就不错；二是没水平，如果有这水平，我会先考虑写个vm。 vm千变万化，如果写修复工具只能针对某一特定vm或某一特定版本，还未必完美，是个费力且收效小的活。即使出了成品，随即该vmprotect过时(有了破解工具，就没人用了），vm立即升级，工具也会立即消亡，昙花一现。对fg的工具，是有点好奇和怀疑，会有人写如此工具吗？会完美吗？会通用吗？另如我是fg，即使有此工具，我也不会公布出来，那样该版本vmprotect随即死亡，该工具也立刻无用。 2008-8-7 16:37 0
海风月影雪币： 7327 活跃值： (3813) 能力值： (RANK：1130 ) 在线值：发帖 93 回帖 2310 粉丝 123 关注私信	海风月影 22 14 楼我一直认为如果vmprotect的作者应该来中国接某些业务，可能比他卖vmp还要赚钱 2008-8-7 16:41 0
KuNgBiM 雪币： 817 活跃值： (1927) 能力值： ( LV12，RANK：2670 ) 在线值：发帖 500 回帖 2517 粉丝 22 关注私信	KuNgBiM 66 15 楼呵呵深入人心！ 2008-8-7 19:46 0
cpuArt 雪币： 150 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 7 回帖 58 粉丝 0 关注私信	cpuArt 16 楼占楼看看~~ 2008-8-7 20:42 0
鸡蛋壳雪币： 427 活跃值： (412) 能力值： ( LV2，RANK：10 ) 在线值：发帖 456 回帖 3147 粉丝 9 关注私信	鸡蛋壳 17 楼 VMP并不能有效的阻止软件破解。 2008-8-7 21:30 0
游侠201 雪币： 201 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 0 回帖 11 粉丝 0 关注私信	游侠201 18 楼广告位出租! 2008-8-7 21:39 0
yangjt 雪币： 609 活跃值： (247) 能力值： ( LV12，RANK：441 ) 在线值：发帖 51 回帖 459 粉丝 1 关注私信	yangjt 10 19 楼强大……19楼广告位招租…… 2008-8-7 21:55 0
cater 雪币： 109 活跃值： (608) 能力值： ( LV12，RANK：220 ) 在线值：发帖 58 回帖 415 粉丝 3 关注私信	cater 5 20 楼高级逆向虚拟打人七夕膜拜七次 2008-8-7 22:06 0
uvbs 雪币： 30 活跃值： (987) 能力值： ( LV4，RANK：50 ) 在线值：发帖 4 回帖 200 粉丝 2 关注私信	uvbs 21 楼 a basic introduction to stack machine 2008-8-8 00:08 0
ucantseeme 雪币： 442 活跃值： (108) 能力值： ( LV2，RANK：10 ) 在线值：发帖 46 回帖 990 粉丝 1 关注私信	ucantseeme 22 楼插入先,最近想搞VMP 2008-9-6 08:23 0
Aker 雪币： 2134 活跃值： (14) 能力值： (RANK：170 ) 在线值：发帖 50 回帖 773 粉丝 3 关注私信	Aker 4 23 楼研究研究~~上次没mark 2008-9-7 20:47 0
q3 watcher 雪币： 215 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 23 回帖 902 粉丝 0 关注私信	q3 watcher 24 楼没有什么能阻止强大的蛋蛋 2008-9-8 08:14 0
zcfxdsh 雪币： 200 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 5 回帖 20 粉丝 0 关注私信	zcfxdsh 25 楼看来VMP确实很利害了。 2008-10-13 16:29 0
	游客登录 \| 注册方可回帖回帖表情雪币赚取及消费高级回复

[转帖]VMProtect 逆向分析

账号登录 验证码登录

账号登录

验证码登录