[Original] Deobfuscation with IDA Python
Posted: 2023-8-29 10:57 17967

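The code of a program can be viewed as having the structure shown in the figure below:

A program consists of multiple functions, and each function consists of multiple branches. Functions and branches are defined as follows:

Deobfuscation can therefore follow the code framework below: run a BFS (breadth-first search) over the functions, and inside each function run another BFS over all of its branches, concatenating the deobfuscated code as the search proceeds. The advantage is that the code of one function stays as contiguous as possible, which makes it easier for IDA to recognize when decompiling.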
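
The emission order this produces can be tried out without IDA. The sketch below is only an illustration, assuming a hypothetical toy program described by a dict PROGRAM that maps each branch address to its call targets and jump targets; PROGRAM and layout_order are made-up names and do not appear in the script of this post.

```python
from collections import deque

# Hypothetical toy program: branch address -> (call targets, jump targets).
PROGRAM = {
    0x100: ([0x300], [0x120]),   # function A entry: calls B, jumps to 0x120
    0x120: ([], [0x100]),        # branch of A: loops back to the entry
    0x300: ([], [0x320]),        # function B entry
    0x320: ([0x100], []),        # branch of B: calls A again
}


def layout_order(entry_point):
    """Return branch addresses in the order they would be emitted."""
    order = []
    seen = set()
    func_queue = deque([entry_point])
    while func_queue:
        func_address = func_queue.popleft()
        if func_address in seen:
            continue                      # function already laid out
        branch_queue = deque([func_address])
        while branch_queue:
            branch_address = branch_queue.popleft()
            if branch_address in seen:
                continue                  # e.g. a loop back edge
            seen.add(branch_address)
            order.append(branch_address)
            calls, jumps = PROGRAM[branch_address]
            func_queue.extend(calls)      # CALL target: another function
            branch_queue.extend(jumps)    # JCC target: same function
    return order
```

Because the inner queue is drained before the next function is taken from the outer queue, all branches of function A are emitted before any branch of function B.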

When code is moved, CALL, JCC, and other jump instructions must be fixed up if they are still to reach their original targets. keystone-engine and capstone can be used to do this.
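
Why a moved instruction needs fixing shows up directly in the encoding of a relative call. The following is a minimal sketch and not the keystone/capstone round trip used in this post: it patches only the 5-byte E8 rel32 form by hand, and the helper name move_call_rel32 is an assumption made for illustration.

```python
import struct


def move_call_rel32(code, old_ea, new_ea):
    """Re-encode a 5-byte `call rel32` (opcode E8) for a new address.

    rel32 is relative to the end of the instruction, so the absolute
    target is old_ea + 5 + rel32 and must be preserved after the move.
    """
    if len(code) != 5 or code[0] != 0xE8:
        raise ValueError("only E8 rel32 is handled in this sketch")
    (rel,) = struct.unpack("<i", code[1:])
    target = old_ea + 5 + rel
    new_rel = target - (new_ea + 5)
    return b"\xE8" + struct.pack("<i", new_rel)
```

Disassembling at the old address and reassembling at the new one, as mov_code does in this post, achieves the same thing for every instruction form that the assembler supports.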

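Moreover, once deobfuscation is finished most of the code in the program has moved, so the target addresses of all CALL, JCC, and other jump instructions have to be corrected, that is, relocated.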

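These instruction fix-ups can be maintained with a disjoint-set union (union-find).

The jump instructions of a program can be viewed as the structure on the left of the figure above, in which one jump instruction may jump to another jump instruction. With a disjoint set, the real addresses of instructions A, B, C, D, and E can all be corrected to the real address of instruction E.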
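
The chain A, B, C, D, E can be reproduced with a plain union-find. This is a simplified sketch under my own naming (JumpDSU, find, merge); unlike the RelocDSU used in this post it does not query IDA at all, it only shows how path compression makes every address resolve to the end of the chain.

```python
class JumpDSU:
    """Tiny union-find keyed by address, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, ea):
        self.parent.setdefault(ea, ea)
        root = ea
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[ea] != root:
            self.parent[ea], ea = root, self.parent[ea]
        return root

    def merge(self, ea, target_ea):
        """Make the final address of ea resolve to that of target_ea."""
        self.parent[self.find(ea)] = self.find(target_ea)


dsu = JumpDSU()
A, B, C, D, E = 0x10, 0x20, 0x30, 0x40, 0x50  # hypothetical addresses
for src, dst in [(A, B), (B, C), (C, D), (D, E)]:
    dsu.merge(src, dst)
```

Note that merge is directed here: the first argument is redirected to the second, which matches how a jump source is redirected to its destination.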

Note the following points when using a disjoint set to maintain relocations:

Attachment download link

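Inspection shows that the program is built from code blocks like the following:

Tracing how this block executes shows that it essentially looks up the real instruction in a switch. This block can be replaced by the instruction lea ecx, [esp+4].

First, we extract the code blocks from the program and record a few useful pieces of information:

While extracting a block's useful information we can also check whether the block is valid, because analysis shows that the program inserts some genuinely functional code directly among the code blocks.

Once a block has been extracted, the extracted information can be used to look up the block's real code in call_target. There are a few special cases here: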

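This involves RelocDSU, the disjoint set that maintains relocations; the corresponding code is as follows. In the get function, if a jmp instruction whose operand is an immediate is encountered, the path is compressed to the jump's target address, and this repeats until the address lies in .got.plt or the instruction is not a jmp. In addition, whether code has already been processed is decided by whether the final address that an address maps to lies outside the .text segment.

Next comes extracting the code of one branch. As mentioned earlier, the program inserts functional code directly among the code blocks, so try:...except:... together with assert is needed to handle this. Beyond that there are a few more special cases:

Once a branch can be processed, we can BFS through all functions and branches in turn. There are a few more special cases here:

After the code has been deobfuscated it must be relocated. During relocation, watch out for changes in the length of jmp instructions.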
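
The length issue can be illustrated with the two common jmp encodings. The relocation code in this post lets keystone choose the encoding and then pads the result with NOPs up to the original instruction size; the sketch below does the same by hand for an unconditional jmp only, and encode_jmp is a hypothetical helper, not part of the script.

```python
import struct

NOP = b"\x90"


def encode_jmp(src_ea, dst_ea, slot_size):
    """Encode `jmp dst_ea` at src_ea and pad it with NOPs to slot_size bytes.

    Uses the 2-byte short form (EB rel8) when the displacement fits in a
    signed byte, otherwise the 5-byte near form (E9 rel32).
    """
    rel8 = dst_ea - (src_ea + 2)
    if -128 <= rel8 <= 127:
        code = b"\xEB" + struct.pack("<b", rel8)
    else:
        code = b"\xE9" + struct.pack("<i", dst_ea - (src_ea + 5))
    if len(code) > slot_size:
        raise ValueError("jmp no longer fits in its original slot")
    return code.ljust(slot_size, NOP)
```

A jump that shrinks is harmless once padded. A jump that grows would overwrite the following instruction, which is why the sketch raises instead of writing past the slot.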

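Finally, the deobfuscated switch is not recognized properly by IDA. The specific cause is that the function that fetches the return address eip was patched into a mov reg, xxx instruction, which makes the assembly differ from what the compiler emits by default (the program has PIE enabled, and IDA cannot correctly recognize code that accesses the jump table address directly). The code here therefore has to be patched back.

To avoid disturbing the program's original data, I placed the repaired jump table at a different location. Two string global variables were also moved to the correct location.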
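
The table repair comes down to re-expressing each entry relative to a new base. The sketch below shows that arithmetic on raw bytes, assuming 32-bit little-endian entries that store target minus base, as the script in this post does; rebase_jump_table and its relocate callback are hypothetical names standing in for the IDA patch calls and reloc.get.

```python
import struct

MASK32 = 0xFFFFFFFF


def rebase_jump_table(table, old_base, new_base, relocate):
    """Rewrite a table of 32-bit base-relative offsets.

    table    -- raw bytes of the original table (little-endian dwords)
    relocate -- callable mapping an old target address to its new address
    """
    out = bytearray()
    for (offset,) in struct.iter_unpack("<I", table):
        old_target = (offset + old_base) & MASK32
        new_target = relocate(old_target)
        out += struct.pack("<I", (new_target - new_base) & MASK32)
    return bytes(out)
```

The masking matters because the targets lie below the base, so the stored offsets are negative numbers in two's complement.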

The final deobfuscation script is as follows:

Attachment download link

First, convert the data starting at 0x140010074, 0x140017EFA, and 0x140018C67 into assembly.

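Looking at the assembly, many code blocks jump to one another, so we first split the code into blocks at each retn. Observing the blocks shows that they fall into three kinds, according to how many times call $+5; pop rax (that is, E8 00 00 00 00 58) appears:

0 occurrences:

This is essentially other operations + retn

1 occurrence:

This kind of block is essentially other operations + jmp target. Note that the other operations may contain a branch.

2 occurrences:

This can be seen as two 1-occurrence blocks joined together, with the retn removed from the first one. After the first block runs, target1 stays on the stack because there is no retn. The second block then jumps to target2, and when the target2 block returns, it returns to target1. This kind of block is therefore essentially equivalent to other operations + call target2, with target1 as the next block to execute.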
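
The three-way split above can be sketched as a byte count. This is a simplified illustration that works on the raw bytes of one block; classify_block and the labels it returns are my own naming, and a real implementation would count decoded instructions, since the same six bytes could in principle also occur inside an immediate.

```python
GET_RIP = bytes.fromhex("e8 00 00 00 00 58")  # call $+5; pop rax


def classify_block(block_bytes):
    """Classify a retn-terminated block by how often GET_RIP occurs."""
    count = block_bytes.count(GET_RIP)
    kinds = {0: "ret", 1: "jmp", 2: "call"}
    if count not in kinds:
        raise ValueError("unexpected number of call $+5; pop rax: %d" % count)
    return kinds[count]
```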

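We define several key pieces of information for a code block, Block:

The get_block function fetches the code block at a given address and extracts the relevant information. A block may contain meaningless instruction pairs such as push xxx; pop xxx;, which can be removed by simulating the stack.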
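
A minimal sketch of that stack simulation follows. It assumes instructions are already decoded into (mnemonic, operand) tuples and that the stale value left below the stack pointer does not matter; strip_push_pop is a hypothetical helper rather than the function used in the full script.

```python
def strip_push_pop(insns):
    """Drop `push X` ... `pop X` pairs that cancel out.

    insns is a list of (mnemonic, operand) tuples. A pop only cancels the
    push directly beneath it on the simulated stack, and only when both
    name the same operand and nothing else was emitted in between.
    """
    out = []
    for mnem, op in insns:
        if mnem == "pop" and out and out[-1] == ("push", op):
            out.pop()           # the pair is a no-op: remove the push too
        else:
            out.append((mnem, op))
    return out
```

Because a cancelled pair exposes the instruction before it, nested pairs such as push rax; push rbx; pop rbx; pop rax are removed as well, while push rax; pop rbx (a register move) is kept.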

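Once block information can be obtained, we can BFS the functions and all the branches within them, extract the assembly, and write it into the newcode segment. Note the following points:

Finally the code is relocated. Note that the effective instructions inside a block may themselves include call instructions. Here a call goes to a PLT-like structure that jumps straight to the function pointed to by the function address table in the import table, and this case needs special handling.

Finally, the complete code: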

from queue import Queue

import idc

func_queue = Queue()
func_queue.put(entry_point)

while not func_queue.empty():
    func_address = func_queue.get()

    branch_queue = Queue()
    branch_queue.put(func_address)
    while not branch_queue.empty():
        branch_address = branch_queue.get()
        ...  # deobfuscation code; then, for each emitted instruction at ea:
        if idc.print_insn_mnem(ea) == 'call':  # CALL: a function
            func_queue.put(call_target)
        elif idc.print_insn_mnem(ea)[0] == 'j':  # JCC: a branch
            branch_queue.put(jcc_target)
...  # relocation code
def mov_code(ea, new_code_ea):
    return asm(disasm(idc.get_bytes(ea, idc.get_item_size(ea)), ea), new_code_ea)
.text:000048F4 pushf
.text:000048F5 pusha
.text:000048F6 mov     cl, 3Fh ; '?'
.text:000048F8 call    sub_44FA
.text:000048F8
.text:000048FD pop     eax
class Block:
    def __init__(self, start_ea, end_ea, imm, reg, call_target):
        self.start_ea = start_ea        # first instruction of the block
        self.end_ea = end_ea            # address just past the block
        self.imm = imm                  # immediate loaded by the mov
        self.reg = reg                  # register loaded by the mov
        self.call_target = call_target  # routine holding the switch with the real code


def get_block(start_ea):
    # The asserts double as validation: the caller catches the failure
    # when start_ea is not the start of an obfuscation block.
    imm = reg = call_target = None
    mnem_list = ['pushf', 'pusha', 'mov', 'call', 'pop']
    ea = start_ea
    for i in range(5):
        mnem = idc.print_insn_mnem(ea)
        assert mnem == mnem_list[i]
        if mnem == 'mov':
            imm = idc.get_operand_value(ea, 1)
            reg = idc.print_operand(ea, 0)
        elif mnem == 'call':
            call_target = idc.get_operand_value(ea, 0)
        ea += idc.get_item_size(ea)
    return Block(start_ea, ea, imm, reg, call_target)
.text:000045CC popa
.text:000045CD popf
.text:000045CE pushf
.text:000045CF pusha
.text:000045D0 call    dec_index
.text:000045D0
.text:000045D5 popa
.text:000045D6 popf
.text:000045D7 retn
def get_real_code(block, new_code_ea):
    ea = block.call_target
    while True:
        if idc.print_insn_mnem(ea) == 'cmp':
            reg = idc.print_operand(ea, 0)
            imm = idc.get_operand_value(ea, 1)
            if reg == block.reg and imm == block.imm:
                ea += idc.get_item_size(ea)
                break
        ea += idc.get_item_size(ea)
 
    # Once the cmp has found the matching entry, the jnz, popa and popf instructions execute in turn
    assert idc.print_insn_mnem(ea) == 'jnz'
    ea += idc.get_item_size(ea)
 
    assert idc.print_insn_mnem(ea) == 'popa'
    ea += idc.get_item_size(ea)
 
    assert idc.print_insn_mnem(ea) == 'popf'
    ea += idc.get_item_size(ea)
 
    if idc.print_insn_mnem(ea) == 'pushf':  # First special case: the real instruction is ret.
        return True, asm('ret')

    new_code = b''
    while True:
        if idc.print_insn_mnem(ea) == 'jmp':  # Second special case: the jump target may hold a few more functional instructions.
            jmp_ea = idc.get_operand_value(ea, 0)
            if idc.print_insn_mnem(jmp_ea) == 'pushf':
                break
            ea = jmp_ea
        else:
            code = mov_code(ea, new_code_ea)
            new_code += code
            new_code_ea += len(code)
            ea += get_item_size(ea)
    return False, new_code
class RelocDSU:
 
    def __init__(self):
        self.reloc = {}
 
    def get(self, ea):
        if ea not in self.reloc:
            if idc.print_insn_mnem(ea) == 'jmp' and idc.get_operand_type(ea, 0) != idc.o_reg:
                jmp_ea = idc.get_operand_value(ea, 0)
 
                if idc.get_segm_name(jmp_ea) == '.got.plt':
                    self.reloc[ea] = ea
                    return self.reloc[ea], False
 
                self.reloc[ea], need_handle = self.get(idc.get_operand_value(ea, 0))
                return self.reloc[ea], need_handle
            else:
                self.reloc[ea] = ea
        if self.reloc[ea] != ea: self.reloc[ea] = self.get(self.reloc[ea])[0]
        return self.reloc[ea], idc.get_segm_name(self.reloc[ea]) == '.text'
 
    def merge(self, ea, reloc_ea):
        self.reloc[self.get(ea)[0]] = self.get(reloc_ea)[0]
 
 
reloc = RelocDSU()
def handle_one_branch(branch_address, new_code_ea):
    new_code = b''
    ea = branch_address
    while True:
        try:
            block = get_block(ea)
            is_ret, real_code = get_real_code(block, new_code_ea)
            reloc.merge(ea, new_code_ea)
            ea = block.end_ea
            new_code_ea += len(real_code)
            new_code += real_code
            if is_ret: break
        except:
            get_eip_func = {0x900: 'ebx', 0x435c: 'eax'}
            if idc.print_insn_mnem(ea) == 'call' and get_operand_value(ea, 0) in get_eip_func:
                reloc.merge(ea, new_code_ea)
                real_code = asm('mov %s, 0x%x' % (get_eip_func[get_operand_value(ea, 0)], ea + 5), new_code_ea)
            else:
                if idc.print_insn_mnem(ea) == 'jmp' and idc.get_operand_type(ea, 0) != idc.o_reg:
                    reloc.merge(new_code_ea, ea)
                else:
                    reloc.merge(ea, new_code_ea)
                real_code = mov_code(ea, new_code_ea)
 
            new_code += real_code
            if real_code == asm('ret'): break
            new_code_ea += len(real_code)
            if idc.print_insn_mnem(ea) == 'jmp' and idc.get_operand_type(ea, 0) != idc.o_reg:  # jmp reg is a switch
                jmp_ea = idc.get_operand_value(ea, 0)
                if reloc.get(jmp_ea)[1] == False: break  # jumping back to already-processed code means this is a loop
                ea = reloc.get(jmp_ea)[0]
            else:
                ea += get_item_size(ea)
    return new_code
func_queue = Queue()
func_queue.put(entry_point)
 
while not func_queue.empty():
    func_address = func_queue.get()
    if reloc.get(func_address)[1] == False: continue
    reloc.merge(func_address, new_code_ea)
    branch_queue = Queue()
    branch_queue.put(func_address)
    if func_address == 0x4148:  # Special case: the function at 0x4148 reads the jump table.
        assert new_code_ea == 0x963d0
        for eax in range(0x20):
            jmp_target = (ida_bytes.get_dword(jmp_table[0] + eax * 4) + jmp_table[1]) & 0xFFFFFFFF
            new_jmp_target, need_handle = reloc.get(jmp_target)
            if need_handle: branch_queue.put(jmp_target)
 
    while not branch_queue.empty():
        branch_address = branch_queue.get()
        new_code = handle_one_branch(branch_address, new_code_ea)
        ida_bytes.patch_bytes(new_code_ea, new_code)
 
        # After the current branch is deobfuscated, walk the new code for call and jmp instructions to discover further functions and branches.
        ea = new_code_ea
        while ea < new_code_ea + len(new_code):
            idc.create_insn(ea)
            if idc.print_insn_mnem(ea) == 'call':
                call_target, need_handle = reloc.get(get_operand_value(ea, 0))
                if need_handle: func_queue.put(call_target)
            elif idc.print_insn_mnem(ea)[0] == 'j' and idc.get_operand_type(ea, 0) != idc.o_reg:
                jcc_target, need_handle = reloc.get(get_operand_value(ea, 0))
                if need_handle == True:
                    branch_queue.put(jcc_target)
            ea += get_item_size(ea)
        new_code_ea += len(new_code)
ea = new_code_start
while ea < new_code_ea:
    idc.create_insn(ea)
    mnem = idc.print_insn_mnem(ea)
 
    if mnem == 'call':
        call_target, need_handle = reloc.get(get_operand_value(ea, 0))
        assert need_handle == False
        ida_bytes.patch_bytes(ea, asm('call 0x%x' % (call_target), ea))
    elif mnem[0] == 'j' and idc.get_operand_type(ea, 0) != idc.o_reg:
        jcc_target, need_handle = reloc.get(get_operand_value(ea, 0))
        assert need_handle == False
        ida_bytes.patch_bytes(ea, asm('%s 0x%x' % (mnem, jcc_target), ea).ljust(idc.get_item_size(ea), b'\x90'))
    elif mnem == 'pushf':
        ida_bytes.patch_bytes(ea, b'\x90' * 9)
        ea += 9
        continue
    ea += get_item_size(ea)
new_jmp_table = (0xA6000 - 0x2D54, 0xA6000)
 
# Move and repair the jump table
for eax in range(0x20):
    jmp_target = (ida_bytes.get_dword(jmp_table[0] + eax * 4) + jmp_table[1]) & 0xFFFFFFFF
    new_jmp_target, need_handle = reloc.get(jmp_target)
    assert need_handle == False
    ida_bytes.patch_dword(new_jmp_table[0] + eax * 4, (new_jmp_target - new_jmp_table[1]) & 0xFFFFFFFF)
 
need_patch_addr = 0x963D7
ida_bytes.patch_bytes(need_patch_addr, asm('call 0x900;add ebx, 0x%x' % (new_jmp_table[1] - (need_patch_addr + 5)), need_patch_addr))  # restore the original instruction
ida_bytes.patch_bytes(new_jmp_table[1] - 0x2d7a, ida_bytes.get_bytes(jmp_table[1] - 0x2d7a, 0x26))  # copy the strings to the correct location
from queue import Queue
import ida_bytes
import ida_funcs  # needed by ida_funcs.add_func() in solve()
from idc import *
import idc
from keystone import *
from capstone import *
 
asmer = Ks(KS_ARCH_X86, KS_MODE_32)
disasmer = Cs(CS_ARCH_X86, CS_MODE_32)
 
 
def disasm(machine_code, addr=0):
    l = ""
    for i in disasmer.disasm(machine_code, addr):
        l += "{:8s} {};\n".format(i.mnemonic, i.op_str)
    return l.strip('\n')
 
 
def asm(asm_code, addr=0):
    l = b''
    for i in asmer.asm(asm_code, addr)[0]:
        l += bytes([i])
    return l
 
 
def print_asm(ea):
    print(disasm(idc.get_bytes(ea, idc.get_item_size(ea)), ea))
 
 
class RelocDSU:
 
    def __init__(self):
        self.reloc = {}
 
    def get(self, ea):
        if ea not in self.reloc:
            if idc.print_insn_mnem(ea) == 'jmp' and idc.get_operand_type(ea, 0) != idc.o_reg:
                jmp_ea = idc.get_operand_value(ea, 0)
 
                if idc.get_segm_name(jmp_ea) == '.got.plt':
                    self.reloc[ea] = ea
                    return self.reloc[ea], False
 
                self.reloc[ea], need_handle = self.get(idc.get_operand_value(ea, 0))
                return self.reloc[ea], need_handle
            else:
                self.reloc[ea] = ea
        if self.reloc[ea] != ea: self.reloc[ea] = self.get(self.reloc[ea])[0]
        return self.reloc[ea], idc.get_segm_name(self.reloc[ea]) == '.text'
 
    def merge(self, ea, reloc_ea):
        self.reloc[self.get(ea)[0]] = self.get(reloc_ea)[0]
 
 
reloc = RelocDSU()
 
 
class Block:
    def __init__(self, start_ea, end_ea, imm, reg, call_target):
        self.start_ea = start_ea
        self.end_ea = end_ea
        self.imm = imm
        self.reg = reg
        self.call_target = call_target
 
 
def mov_code(ea, new_code_ea):
    return asm(disasm(idc.get_bytes(ea, idc.get_item_size(ea)), ea), new_code_ea)
 
 
def get_real_code(block, new_code_ea):
    ea = block.call_target
    while True:
        if idc.print_insn_mnem(ea) == 'cmp':
            reg = idc.print_operand(ea, 0)
            imm = idc.get_operand_value(ea, 1)
            if reg == block.reg and imm == block.imm:
                ea += idc.get_item_size(ea)
                break
        ea += idc.get_item_size(ea)
 
    # Once the cmp has found the matching entry, the jnz, popa and popf instructions execute in turn
    assert idc.print_insn_mnem(ea) == 'jnz'
    ea += idc.get_item_size(ea)
 
    assert idc.print_insn_mnem(ea) == 'popa'
    ea += idc.get_item_size(ea)
 
    assert idc.print_insn_mnem(ea) == 'popf'
    ea += idc.get_item_size(ea)
 
    if idc.print_insn_mnem(ea) == 'pushf':  # First special case: the real instruction is ret.
        return True, asm('ret')

    new_code = b''
    while True:
        if idc.print_insn_mnem(ea) == 'jmp':  # Second special case: the jump target may hold a few more functional instructions.
            jmp_ea = idc.get_operand_value(ea, 0)
            if idc.print_insn_mnem(jmp_ea) == 'pushf':
                break
            ea = jmp_ea
        else:
            code = mov_code(ea, new_code_ea)
            new_code += code
            new_code_ea += len(code)
            ea += get_item_size(ea)
    return False, new_code
 
 
def get_block(start_ea):
    global imm, reg, call_target
    mnem_list = ['pushf', 'pusha', 'mov', 'call', 'pop']
    ea = start_ea
    for i in range(5):
        mnem = idc.print_insn_mnem(ea)
        assert mnem == mnem_list[i]
        if mnem == 'mov':
            imm = idc.get_operand_value(ea, 1)
            reg = idc.print_operand(ea, 0)
        elif mnem == 'call':
            call_target = idc.get_operand_value(ea, 0)
        ea += idc.get_item_size(ea)
    return Block(start_ea, ea, imm, reg, call_target)
 
 
def handle_one_branch(branch_address, new_code_ea):
    new_code = b''
    ea = branch_address
    while True:
        try:
            block = get_block(ea)
            is_ret, real_code = get_real_code(block, new_code_ea)
            reloc.merge(ea, new_code_ea)
            ea = block.end_ea
            new_code_ea += len(real_code)
            new_code += real_code
            if is_ret: break
        except:
            get_eip_func = {0x900: 'ebx', 0x435c: 'eax'}
            if idc.print_insn_mnem(ea) == 'call' and get_operand_value(ea, 0) in get_eip_func:
                reloc.merge(ea, new_code_ea)
                real_code = asm('mov %s, 0x%x' % (get_eip_func[get_operand_value(ea, 0)], ea + 5), new_code_ea)
            else:
                if idc.print_insn_mnem(ea) == 'jmp' and idc.get_operand_type(ea, 0) != idc.o_reg:
                    reloc.merge(new_code_ea, ea)
                else:
                    reloc.merge(ea, new_code_ea)
                real_code = mov_code(ea, new_code_ea)
 
            new_code += real_code
            if real_code == asm('ret'): break
            new_code_ea += len(real_code)
            if idc.print_insn_mnem(ea) == 'jmp' and idc.get_operand_type(ea, 0) != idc.o_reg:  # jmp reg is a switch
                jmp_ea = idc.get_operand_value(ea, 0)
                if reloc.get(jmp_ea)[1] == False: break  # jumping back to already-processed code means this is a loop
                ea = reloc.get(jmp_ea)[0]
            else:
                ea += get_item_size(ea)
    return new_code
 
 
def solve():
    entry_point = 0x48F4
    new_code_start = 0x96150
    new_code_ea = new_code_start
 
    jmp_table = (0x892ac, 0x8c000)  # [0x8c000 + eax * 4 - 0x2d54] + 0x8c000
 
    for _ in range(0x10000): idc.del_items(new_code_ea + _)
    ida_bytes.patch_bytes(new_code_ea, 0x10000 * b'\x90')
 
    func_queue = Queue()
    func_queue.put(entry_point)
 
    while not func_queue.empty():
        func_address = func_queue.get()
        if reloc.get(func_address)[1] == False: continue
        reloc.merge(func_address, new_code_ea)
        branch_queue = Queue()
        branch_queue.put(func_address)
        if func_address == 0x4148:  # Special case: the function at 0x4148 reads the jump table.
            assert new_code_ea == 0x963d0
            for eax in range(0x20):
                jmp_target = (ida_bytes.get_dword(jmp_table[0] + eax * 4) + jmp_table[1]) & 0xFFFFFFFF
                new_jmp_target, need_handle = reloc.get(jmp_target)
                if need_handle: branch_queue.put(jmp_target)
 
        while not branch_queue.empty():
            branch_address = branch_queue.get()
            new_code = handle_one_branch(branch_address, new_code_ea)
            ida_bytes.patch_bytes(new_code_ea, new_code)
 
            # After the current branch is deobfuscated, walk the new code for call and jmp instructions to discover further functions and branches.
            ea = new_code_ea
            while ea < new_code_ea + len(new_code):
                idc.create_insn(ea)
                if idc.print_insn_mnem(ea) == 'call':
                    call_target, need_handle = reloc.get(get_operand_value(ea, 0))
                    if need_handle: func_queue.put(call_target)
                elif idc.print_insn_mnem(ea)[0] == 'j' and idc.get_operand_type(ea, 0) != idc.o_reg:
                    jcc_target, need_handle = reloc.get(get_operand_value(ea, 0))
                    if need_handle == True:
                        branch_queue.put(jcc_target)
                ea += get_item_size(ea)
            new_code_ea += len(new_code)
 
    ea = new_code_start
    while ea < new_code_ea:
        idc.create_insn(ea)
        mnem = idc.print_insn_mnem(ea)
 
        if mnem == 'call':
            call_target, need_handle = reloc.get(get_operand_value(ea, 0))
            assert need_handle == False
            ida_bytes.patch_bytes(ea, asm('call 0x%x' % (call_target), ea))
        elif mnem[0] == 'j' and idc.get_operand_type(ea, 0) != idc.o_reg:
            jcc_target, need_handle = reloc.get(get_operand_value(ea, 0))
            assert need_handle == False
            ida_bytes.patch_bytes(ea, asm('%s 0x%x' % (mnem, jcc_target), ea).ljust(idc.get_item_size(ea), b'\x90'))
        elif mnem == 'pushf':
            ida_bytes.patch_bytes(ea, b'\x90' * 9)
            ea += 9
            continue
        ea += get_item_size(ea)
 
    new_jmp_table = (0xA6000 - 0x2D54, 0xA6000)
 
    # Move and repair the jump table
    for eax in range(0x20):
        jmp_target = (ida_bytes.get_dword(jmp_table[0] + eax * 4) + jmp_table[1]) & 0xFFFFFFFF
        new_jmp_target, need_handle = reloc.get(jmp_target)
        assert need_handle == False
        ida_bytes.patch_dword(new_jmp_table[0] + eax * 4, (new_jmp_target - new_jmp_table[1]) & 0xFFFFFFFF)
 
    need_patch_addr = 0x963D7
    ida_bytes.patch_bytes(need_patch_addr, asm('call 0x900;add ebx, 0x%x' % (new_jmp_table[1] - (need_patch_addr + 5)), need_patch_addr))  # restore the original instruction
    ida_bytes.patch_bytes(new_jmp_table[1] - 0x2d7a, ida_bytes.get_bytes(jmp_table[1] - 0x2d7a, 0x26))  # copy the strings to the correct location
 
    for _ in range(0x10000): idc.del_items(new_code_ea + _)
    idc.jumpto(new_code_start)
    ida_funcs.add_func(new_code_start)
 
    print("finish")
 
 
solve()
from queue import *
import ida_bytes
from idc import *
import idc
from keystone import *
from capstone import *
 
asmer = Ks(KS_ARCH_X86, KS_MODE_32)
disasmer = Cs(CS_ARCH_X86, CS_MODE_32)
 
 
def disasm(machine_code, addr=0):
    l = ""
    for i in disasmer.disasm(machine_code, addr):
        l += "{:8s} {};\n".format(i.mnemonic, i.op_str)
    return l.strip('\n')
 
 
def asm(asm_code, addr=0):
    l = b''
    for i in asmer.asm(asm_code, addr)[0]:
        l += bytes([i])
    return l
 
 
def print_asm(ea):
    print(disasm(idc.get_bytes(ea, idc.get_item_size(ea)), ea))
 
 
class RelocDSU:
 
    def __init__(self):
        self.reloc = {}
 
    def get(self, ea):
        if ea not in self.reloc:
            if idc.print_insn_mnem(ea) == 'jmp' and idc.get_operand_type(ea, 0) != idc.o_reg:
                jmp_ea = idc.get_operand_value(ea, 0)
 
                if idc.get_segm_name(jmp_ea) == '.got.plt':
                    self.reloc[ea] = ea
                    return self.reloc[ea], False
 
                self.reloc[ea], need_handle = self.get(idc.get_operand_value(ea, 0))
                return self.reloc[ea], need_handle
            else:
                self.reloc[ea] = ea
        if self.reloc[ea] != ea: self.reloc[ea] = self.get(self.reloc[ea])[0]
        return self.reloc[ea], idc.get_segm_name(self.reloc[ea]) == '.text'
 
    def merge(self, ea, reloc_ea):
        self.reloc[self.get(ea)[0]] = self.get(reloc_ea)[0]
 
 
reloc = RelocDSU()
 
 
class Block:
    def __init__(self, start_ea, end_ea, imm, reg, call_target):
        self.start_ea = start_ea
        self.end_ea = end_ea
        self.imm = imm
        self.reg = reg
        self.call_target = call_target
 
 
def mov_code(ea, new_code_ea):
    return asm(disasm(idc.get_bytes(ea, idc.get_item_size(ea)), ea), new_code_ea)
 
 
def get_real_code(block, new_code_ea):
    ea = block.call_target
    while True:
        if idc.print_insn_mnem(ea) == 'cmp':
            reg = idc.print_operand(ea, 0)
            imm = idc.get_operand_value(ea, 1)
            if reg == block.reg and imm == block.imm:
                ea += idc.get_item_size(ea)
                break
        ea += idc.get_item_size(ea)
 
    # 在 cmp 判断找到对应位置后会依次执行 jnz,popa,popf 三条指令
    assert idc.print_insn_mnem(ea) == 'jnz'
    ea += idc.get_item_size(ea)
 
    assert idc.print_insn_mnem(ea) == 'popa'
    ea += idc.get_item_size(ea)
 
    assert idc.print_insn_mnem(ea) == 'popf'
    ea += idc.get_item_size(ea)
 
    if idc.print_insn_mnem(ea) == 'pushf'# 第一种特殊情况,实际是 ret 指令。
        return True, asm('ret')
 
    new_code = b''
    while True:
        if idc.print_insn_mnem(ea) == 'jmp'# 第二种特殊情况,跳转过去可能还会有几条实际功能指令。
            jmp_ea = idc.get_operand_value(ea, 0)
            if idc.print_insn_mnem(jmp_ea) == 'pushf':
                break
            ea = jmp_ea
        else:
            code = mov_code(ea, new_code_ea)
            new_code += code
            new_code_ea += len(code)
            ea += get_item_size(ea)
    return False, new_code
 
 
def get_block(start_ea):
    global imm, reg, call_target
    mnem_list = ['pushf', 'pusha', 'mov', 'call', 'pop']
    ea = start_ea
    for i in range(5):
        mnem = idc.print_insn_mnem(ea)
        assert mnem == mnem_list[i]
        if mnem == 'mov':
            imm = idc.get_operand_value(ea, 1)
            reg = idc.print_operand(ea, 0)
        elif mnem == 'call':
            call_target = idc.get_operand_value(ea, 0)
        ea += idc.get_item_size(ea)
    return Block(start_ea, ea, imm, reg, call_target)
 
 
def handle_one_branch(branch_address, new_code_ea):
    new_code = b''
    ea = branch_address
    while True:
        try:
            block = get_block(ea)
            is_ret, real_code = get_real_code(block, new_code_ea)
            reloc.merge(ea, new_code_ea)
            ea = block.end_ea
            new_code_ea += len(real_code)
            new_code += real_code
            if is_ret: break
        except:
            get_eip_func = {0x900: 'ebx', 0x435c: 'eax'}
            if idc.print_insn_mnem(ea) == 'call' and get_operand_value(ea, 0) in get_eip_func:
                reloc.merge(ea, new_code_ea)
                real_code = asm('mov %s, 0x%x' % (get_eip_func[get_operand_value(ea, 0)], ea + 5), new_code_ea)
            else:
                if idc.print_insn_mnem(ea) == 'jmp' and idc.get_operand_type(ea, 0) != idc.o_reg:
                    reloc.merge(new_code_ea, ea)
                else:
                    reloc.merge(ea, new_code_ea)
                real_code = mov_code(ea, new_code_ea)
 
            new_code += real_code
            if real_code == asm('ret'): break
            new_code_ea += len(real_code)
            if idc.print_insn_mnem(ea) == 'jmp' and idc.get_operand_type(ea, 0) != idc.o_reg:  # jmp reg is a switch
                jmp_ea = idc.get_operand_value(ea, 0)
                if reloc.get(jmp_ea)[1] == False: break  # jumping back into already-processed code means this is a loop
                ea = reloc.get(jmp_ea)[0]
            else:
                ea += get_item_size(ea)
    return new_code
 
 
def solve():
    entry_point = 0x48F4
    new_code_start = 0x96150
    new_code_ea = new_code_start
 
    jmp_table = (0x892ac, 0x8c000)  # [0x8c000 + (eax>>2) - 0x2d54] + 0x8c000
 
    for _ in range(0x10000): idc.del_items(new_code_ea + _)
    ida_bytes.patch_bytes(new_code_ea, 0x10000 * b'\x90')
 
    func_queue = Queue()
    func_queue.put(entry_point)
 
    while not func_queue.empty():
        func_address = func_queue.get()
        if reloc.get(func_address)[1] == False: continue
        reloc.merge(func_address, new_code_ea)
        branch_queue = Queue()
        branch_queue.put(func_address)
        if func_address == 0x4148:  # special-case the function at 0x4148: read its jump table
            assert new_code_ea == 0x963d0
            for eax in range(0x20):
                jmp_target = (ida_bytes.get_dword(jmp_table[0] + eax * 4) + jmp_table[1]) & 0xFFFFFFFF
                new_jmp_target, need_handle = reloc.get(jmp_target)
                if need_handle: branch_queue.put(jmp_target)
 
        while not branch_queue.empty():
            branch_address = branch_queue.get()
            new_code = handle_one_branch(branch_address, new_code_ea)
            ida_bytes.patch_bytes(new_code_ea, new_code)
 
            # After the current branch is deobfuscated, scan its code for call and jmp instructions to discover further functions and branches.
            ea = new_code_ea
            while ea < new_code_ea + len(new_code):
                idc.create_insn(ea)
                if idc.print_insn_mnem(ea) == 'call':
                    call_target, need_handle = reloc.get(get_operand_value(ea, 0))
                    if need_handle: func_queue.put(call_target)
                elif idc.print_insn_mnem(ea)[0] == 'j' and idc.get_operand_type(ea, 0) != idc.o_reg:
                    jcc_target, need_handle = reloc.get(get_operand_value(ea, 0))
                    if need_handle == True:
                        branch_queue.put(jcc_target)
                ea += get_item_size(ea)
            new_code_ea += len(new_code)
 
    ea = new_code_start
    while ea < new_code_ea:
        idc.create_insn(ea)
        mnem = idc.print_insn_mnem(ea)
 
        if mnem == 'call':
            call_target, need_handle = reloc.get(get_operand_value(ea, 0))
            assert need_handle == False
            ida_bytes.patch_bytes(ea, asm('call 0x%x' % (call_target), ea))
        elif mnem[0] == 'j' and idc.get_operand_type(ea, 0) != idc.o_reg:
            jcc_target, need_handle = reloc.get(get_operand_value(ea, 0))
            assert need_handle == False
            ida_bytes.patch_bytes(ea, asm('%s 0x%x' % (mnem, jcc_target), ea).ljust(idc.get_item_size(ea), b'\x90'))
        elif mnem == 'pushf':
            ida_bytes.patch_bytes(ea, b'\x90' * 9)
            ea += 9
            continue
        ea += get_item_size(ea)
 
    new_jmp_table = (0xA6000 - 0x2D54, 0xA6000)
 
    # Move and fix up the jump table
    for eax in range(0x20):
        jmp_target = (ida_bytes.get_dword(jmp_table[0] + eax * 4) + jmp_table[1]) & 0xFFFFFFFF
        new_jmp_target, need_handle = reloc.get(jmp_target)

Last edited by sky_123 on 2023-8-29 19:13.
Latest replies (22)
2
Learning from this.
2023-8-29 23:17
3
Thanks for sharing.
2023-8-30 09:07
4
Thanks for sharing.
2023-8-30 10:21
5
tql (awesome)
2023-8-30 10:42
6
Thanks for sharing, impressive.
2023-8-30 13:37
7
A rare article that actually explains things clearly. Impressive.
2023-8-30 16:20
8
Thanks for sharing!
2023-8-30 21:49
9
Thanks for sharing.
2023-8-31 09:49
10
1024
2023-8-31 09:55
11
Just reheating old leftovers.
2023-8-31 11:17
12
Could the OP say which tool was used to draw the flowchart?
2023-8-31 17:17
13
Reading this makes my head spin.
2023-8-31 17:25
14
mb_rhynjqzk: Could the OP say which tool was used to draw the flowchart?
draw.io
2023-9-1 07:59
15
sky_123: draw.io
Thanks a lot!
2023-9-1 11:22
16
Which IDA version is this for? Why doesn't it work on 7.7?
2023-9-2 02:46
17
Bookmarking this for now; I'll come back once I've learned enough.
2023-9-2 10:27
18
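方向感: Which IDA version is this for? Why doesn't it work on 7.7?
7.7 should work. Did you forget to create a new segment after the end of the program to hold the deobfuscated code?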
2023-9-2 23:11
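For readers hitting the same problem, here is a minimal sketch of the segment-creation step this reply refers to. It is not the author's code. The addresses come from `solve()` in the first script (output at 0x96150, 0x10000 bytes filled with NOPs). The helper names, the 0x10 alignment, and the segment name `newcode` for this script are assumptions. The calls `ida_segment.getseg` and `ida_segment.add_segm` should be checked against your IDA version, and the layout in the attached idb may differ.

```python
try:
    import ida_segment  # available only inside IDA
except ImportError:
    ida_segment = None  # lets the range arithmetic run outside IDA


def segment_range(start_ea, size, align=0x10):
    """Return an aligned [start, end) range covering start_ea .. start_ea + size."""
    seg_start = start_ea & ~(align - 1)
    seg_end = (start_ea + size + align - 1) & ~(align - 1)
    return seg_start, seg_end


def ensure_code_segment(start_ea, size, name="newcode"):
    """Create a CODE segment for the output if no segment exists at start_ea yet."""
    seg_start, seg_end = segment_range(start_ea, size)
    created = False
    if ida_segment is not None and ida_segment.getseg(start_ea) is None:
        # para=0: flat segment base; "CODE" is the segment class
        created = bool(ida_segment.add_segm(0, seg_start, seg_end, name, "CODE"))
    return seg_start, seg_end, created


# Values taken from solve(): output starts at 0x96150, 0x10000 bytes are NOP-filled.
print([hex(v) for v in segment_range(0x96150, 0x10000)])
```

Calling `ensure_code_segment(0x96150, 0x10000)` in the IDA Python console before running the deobfuscation script would then make sure the target range exists. Outside IDA the function only reports the range it would use.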
19
mark
2023-9-4 15:25
20
Thanks for sharing.
2023-9-19 12:21
ac_
21
I created the new segment for the second challenge's script, so why do I still get an assertion error?
2024-3-4 00:02
22
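ac_: I created the new segment for the second challenge's script, so why do I still get an assertion error?
The idb in the attachment deobfuscates correctly; compare your setup against it to see which setting is wrong.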
2024-3-4 12:46
23
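The script for Qiangwang Cup 2022 find_basic still seems to contain a logic error. For example, on the first call to get_real_code the script directly checks whether the instruction is cmp, but at that point the address is 0x44fa and the instruction there is pop edx. The following assert idc.print_insn_mnem(ea) == 'jnz' then fails with an assertion error.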
2024-11-14 12:31
0
游客
登录 | 注册 方可回帖
返回
//