[Original] Python 3.10 New Feature -> A Neat Use of Structural Pattern Matching in VM Reverse Engineering
2023-11-2 16:24 6831


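Preface:
This style of writing first appeared in 2022-GoogleCTF-eldar, in the writeup by hgarrereyn of the overseas team DiceGang: https://ctf.harrisongreen.me/2022/googlectf/eldar/. It was used there to parse a virtual machine too, though that one was an ELF metadata-driven Turing weird machine.
Later, deeprev in the domestic 2022 Qiangwang Cup (强网杯) set the same ELF metadata-driven Turing weird machine again. I used this approach to write a parser for that relocation machine as well, and it worked remarkably well; it is no exaggeration to say it felt like magic.
At the time I wrote in my Todolist that parsing an ordinary virtual machine with the new Structural Pattern Matching feature would surely be effortless. Then work got in the way and I never finished it, so it gathered dust in my Todolist for nearly a year. All this year I was pushed along by work, executing each day, like a robot, the instructions I had written the day before. My memory seems to have got worse too, and I often forget things. Only at the end of the year, with part of the project delivered, did I have time for my own things. Starting a business is really hard.
Back to the point: the writeup for dicectf-2022-breach, https://github.com/reductor/dice-ctf-2022-breach-writeup, then properly applied the technique to parsing a conventional virtual machine.
It is only today that I have come back to write this up.
I have in fact posted quite a few VM analyses before:
[Original] An analysis of VM reversing (CTF) (a fairly classic VM reversing challenge)
[Original] A special method and line of thought for handling VMs
[Original] Automated VM analysis in CTF
In short, this method is an upgraded disassembler, far better than the disassemblers I posted before. Is it better than a decompiler? I cannot give a definite answer; after all, a decompiler takes the approach of abstracting into a high-level language.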

python310 Structural Pattern Matching

Learn Structural Pattern Matching

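Introduction to Structural Pattern Matching

PEP 634 – Structural Pattern Matching: Specification: introduces the match syntax and the supported patterns

PEP 635 – Structural Pattern Matching: Motivation and Rationale: explains why the syntax was designed this way

PEP 636 – Structural Pattern Matching: Tutorial: a tutorial covering the concepts, syntax and semantics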

match patterns:

Mapping patterns: match mapping structures like dictionaries.
Sequence patterns: match sequence structures like tuples and lists.
Capture patterns: bind values to names.
AS patterns: bind the value of subpatterns to names.
OR patterns: match one of several different subpatterns.
Wildcard patterns: match anything.
Class patterns: match class structures.
Value patterns: match values stored in attributes.
Literal patterns: match literal values.

Capture patterns

Match a pattern and bind the matched value to a name.

def sum_list(numbers):
    match numbers:
        case []:  # matches an empty list
            return 0
        case [first, *rest]:  # a sequence pattern made of two capture patterns; *rest matches all remaining elements
            return first + sum_list(rest)

def average(*args):
    match args:
        case [x, y]:           # captures the two elements of a sequence
            return (x + y) / 2
        case [x]:              # captures the only element of a sequence
            return x
        case []:
            return 0
        case a:                # captures the entire sequence
            return sum(a) / len(a)

Guards (adding a condition to a pattern)

Used to restrict a pattern further, as follows:

# sort in ascending order
def sort(seq):
    match seq:
        case [] | [_]:   # matches an empty sequence [] or any single-element sequence [_]
            return seq
        case [x, y] if x <= y:
            return seq
        case [x, y]:
            return [y, x]
        case [x, y, z] if x <= y <= z:
            return seq
        case [x, y, z] if x >= y >= z:
            return [z, y, x]
        case [p, *rest]:
            a = sort([x for x in rest if x <= p])     # sort the elements not greater than the pivot p
            b = sort([x for x in rest if p < x])      # sort the elements greater than the pivot p
            return a + [p] + b

AS Patterns

Give a subpattern an alias so that the value it matched is bound to a name.

Subpatterns can be combined flexibly in the match syntax.

In : def as_pattern(obj):
...:     match obj:
...:         case str() as s:
...:             print(f'Got str: {s=}')
...:         case [0, int() as i]:
...:             print(f'Got int: {i=}')
...:         case [tuple() as tu]:
...:             print(f'Got tuple: {tu=}')
...:         case list() | set() | dict() as iterable:
...:             print(f'Got iterable: {iterable=}')
...:
...:
 
In : as_pattern('sss')
Got str: s='sss'
 
In : as_pattern([0, 1])
Got int: i=1
 
In : as_pattern([(1,)])
Got tuple: tu=(1,)
 
In : as_pattern([1, 2, 3])
Got iterable: iterable=[1, 2, 3]
 
In : as_pattern({'a': 1})
Got iterable: iterable={'a': 1}

def simplify_expr(tokens):
    match tokens:
        case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
            return simplify_expr(expr)
        case [0, ('+'|'-') as op, right]:
            return UnaryOp(op, right)
        case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
            return Num(left + right)
        case [(int() | float()) as value]:
            return Num(value)

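OR Patterns

Only the | form is valid in Python 3.10. The first three forms below are not OR patterns in the released syntax (they resemble alternatives that were considered and rejected during the design, see PEP 622); they are listed only for comparison.

Form 1, comma-separated. This parses, but as a sequence pattern that matches a 3-element sequence, not as "any of":

case 401, 403, 404:
    print("Some HTTP error")

Form 2, stacked cases as in C. This is a SyntaxError, because every case needs its own block:

case 401:
case 403:
case 404:
    print("Some HTTP error")

Form 3, case in. This is also a SyntaxError:

case in 401, 403, 404:
    print("Some HTTP error")

Form 4, the actual OR pattern:

case ("a"|"b"|"c"):

Form 5, an OR pattern combined with an AS pattern:

case ("a"|"b"|"c") as letter: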

Literal Patterns

Use Python's built-in basic values such as strings, numbers, booleans and None.

match number:
    case 0:
        print('zero')
    case 1:
        print('one')
    case 2:
        print('two')

def simplify(expr):
    match expr:
        case ('+', 0, x):   # 0 + x
            return x
        case ('+' | '-', x, 0):  # x + 0 or x - 0
            return x
        case ('and', True, x):   # True and x
            return x
        case ('and', False, x):
            return False
        case ('or', False, x):
            return x
        case ('or', True, x):
            return True
        case ('not', ('not', x)):
            return x
    return expr

Wildcard Pattern

The wildcard pattern is a special capture pattern: it accepts any value but does not bind it to a name (in effect, it ignores the positions you do not care about).

def is_closed(sequence):
    match sequence:
        case [_]:               # any sequence with a single element
            return True
        case [start, *_, end]:  # a sequence with at least two elements
            return start == end
        case _:                 # anything
            return False

Value Patterns

This pattern mainly matches constants or enum values from the enum module:

In : from enum import Enum

In : class Color(Enum):
...:     RED = 1
...:     GREEN = 2
...:     BLUE = 3
...:
 
In : class NewColor:
...:     YELLOW = 4
...:
 
In : def constant_value(color):
...:     match color:
...:         case Color.RED:
...:             print('Red')
...:         case NewColor.YELLOW:
...:             print('Yellow')
...:         case new_color:
...:             print(new_color)
...:
 
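In : constant_value(Color.RED)  # matches the first case
Red

In : constant_value(NewColor.YELLOW)  # matches the second case
Yellow

In : constant_value(Color.GREEN)  # matches the third case
Color.GREEN

In : constant_value(4)  # an equal constant value also matches the second case
Yellow

In : constant_value(10)  # any other constant
10

Note that because a bare name in a case clause binds a value, a constant such as YELLOW cannot be used directly, as in the following:
YELLOW = 4

def constant_value(color):
    match color:
        case YELLOW:
            print('Yellow')
# This is wrong: YELLOW here is a capture pattern, so it matches any value and binds it to YELLOW
# (and it becomes a SyntaxError if more case clauses follow it)

So when a pattern should compare against the value of another variable, how is that variable told apart from the binding name of a capture pattern? By the ".".

For now, only dotted names (names containing '.') can be used as constants.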

class Codes:
    SUCCESS = 200
    NOT_FOUND = 404
 
def handle(retcode):
    match retcode:
        case Codes.SUCCESS:
            print('success')
        case Codes.NOT_FOUND:
            print('not found')
        case _:
            print('unknown')

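Sequence Patterns

List-style or tuple-style patterns can be used in match.

[a, b, c], (a, b, c) and a, b, c are not distinguished; they are equivalent. To check the type explicitly, use list([a, b, c]).
A starred subpattern matches any number of elements, and a sequence pattern may contain at most one of them, for example (first, *_, last). A pattern such as (*_, 3, *_) for "any sequence containing 3" is therefore not valid; use a guard for that.
Iterators are not consumed: elements are accessed by index and slice, so only real sequences match (str, bytes and bytearray are excluded).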

In : def sequence(collection):
...:     match collection:
...:         case 1, [x, *others]:
...:             print(f"Got 1 and a nested sequence: {x=}, {others=}")
...:         case (1, x):
...:             print(f"Got 1 and {x}")
...:         case [x, y, z]:
...:             print(f"{x=}, {y=}, {z=}")
...:
 
In : sequence([1])
 
In : sequence([1, 2])
Got 1 and 2
 
In : sequence([1, 2, 3])
x=1, y=2, z=3
 
In : sequence([1, [2, 3]])
Got 1 and a nested sequence: x=2, others=[3]
 
In : sequence([1, [2, 3, 4]])
Got 1 and a nested sequence: x=2, others=[3, 4]
 
In : sequence([2, 3])
 
In : sequence((1, 2))
Got 1 and 2

Mapping Patterns

For efficiency, keys must be constants (literals or value patterns).

In other words, a case clause can match against a dictionary-style pattern.

In : def mapping(config):
...:     match config:
...:         case {'sub': sub_config, **rest}:
...:             print(f'Sub: {sub_config}')
...:             print(f'OTHERS: {rest}')
...:         case {'route': route}:
...:             print(f'ROUTE: {route}')
...:
 
In : mapping({})
 
In : mapping({'route': '/auth/login'})
ROUTE: /auth/login
 
# Matches a dict that has a 'sub' key: its value is bound to sub_config and the rest of the dict to rest
In : mapping({'route': '/auth/login', 'sub': {'a': 1}})
Sub: {'a': 1}
OTHERS: {'route': '/auth/login'}

def change_red_to_blue(json_obj):
    match json_obj:
        case { 'color': ('red' | '#FF0000') }:
            json_obj['color'] = 'blue'
        case { 'children': children }:
            for child in children:
                change_red_to_blue(child)

Class Patterns

Class patterns serve two purposes: checking that an object is an instance of a class, and extracting data from specific attributes of the object.

# A case clause can match against any object. Let's start with an example that fails:
 
In : class Point:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:
 
In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:
 
In : class_pattern(Point(1, 2))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [], in <cell line: 1>()
----> 1 class_pattern(Point(1, 2))
 
Input In [], in class_pattern(obj)
      1 def class_pattern(obj):
      2     match obj:
----> 3         case Point(x, y):
      4             print(f'Point({x=},{y=})')
 
TypeError: Point() accepts 0 positional sub-patterns (2 given)
 
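# This is because a positional sub-pattern needs to know which attribute each position maps to; without that, use keyword patterns to name the attributes: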
 
In : def class_pattern(obj):
...:     match obj:
...:         case Point(x=1, y=2):
...:             print(f'match')
...:
 
In : class_pattern(Point(1, 2))
match
 
# Another way to make positional sub-patterns work for a custom class is to define __match_args__,
# a tuple of attribute names in positional order, like this:
In : class Point:
...:     __match_args__ = ('x', 'y')
...:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:
 
# Alternatively, use a dataclass. Point2 here uses the standard library's dataclasses.dataclass decorator,
# which provides the __match_args__ attribute, so it can be used directly
In : from dataclasses import dataclass
 
In : @dataclass
...: class Point2:
...:     x: int
...:     y: int
...:
 
In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:         case Point2(x, y):
...:             print(f'Point2({x=},{y=})')
...:
 
In : class_pattern(Point(1, 2))
Point(x=1,y=2)
 
In : class_pattern(Point2(1, 2))
Point2(x=1,y=2)

def eval_expr(expr):
    """Evaluate an expression and return the result."""
    match expr:
        case BinaryOp('+', left, right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp('-', left, right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp('*', left, right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp('/', left, right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp('+', arg):
            return eval_expr(arg)
        case UnaryOp('-', arg):
            return -eval_expr(arg)
        case VarExpr(name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case _:
            raise ValueError(f"Invalid expression value: {repr(expr)}")

Another example:

match media_object:
    case Image(type="jpg"):
        return media_object
    case Image(type="png") | Image(type="gif"):
        return render_as(media_object, "jpg")
    case Video():
        raise ValueError("Can't extract frames from video yet")
    case other_type:
        raise Exception(f"Media type {media_object} can't be handled yet")

A namedtuple example, which is also a class pattern:

from collections import namedtuple
Mov = namedtuple('mov', ['dst', 'src', 'sz', 'ridx'])
match op:
    case Mov(dst, src, 8, ridx):
        pass

Type Unions, Aliases, and Guards

numbers is annotated as a list whose elements may be float or int.

def mean(numbers: list[float | int]) -> float:
    return sum(numbers) / len(numbers)

Type aliases can be defined; both type checkers and programmers can recognise this pattern:

from typing import TypeAlias
 
Card: TypeAlias = tuple[str, str]          # ('', '')
Deck: TypeAlias = list[Card]               # [('', '')]

Type guards are used to narrow a type union.

new disassembler of 2020GKCTF-EzMachine

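A disassembler like this is usually refined step by step. Once it has been refined far enough, its output can be assembled directly into an ELF with https://docs.pwntools.com/en/stable/asm.html#pwnlib.asm.make_elf_from_assembly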

1: Define the instruction types and write parse

Opcode | Instruction | Notes
0 | nop | pc+=1
1 | mov dst, imm | imm is a 1-byte immediate value
2 | push imm | imm is a 1-byte immediate value
3 | push reg | push (1-byte) reg onto the stack
4 | pop reg | pop (1-byte) from the stack into reg
5 | PrintStr | print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
6 | add reg1, reg2 | reg1 += reg2
7 | sub reg1, reg2 | reg1 -= reg2
8 | mul reg1, reg2 | reg1 *= reg2
9 | div reg1, reg2 | reg1 /= reg2 (the quotient goes into eax and the remainder into ebx)
10 | xor reg1, reg2 | reg1 ^= reg2
11 | jmp addr | jump directly to address
12 | cmp reg1, reg2 | edx=reg1-reg2
13 | jz addr | jump if edx==0 (reg1==reg2)
14 | jnz addr | jump if edx!=0 (reg1!=reg2)
15 | jg addr | jump if edx>0 (reg1>reg2)
16 | jl addr | jump if edx<0 (reg1<reg2)
17 | InputStr | gets(mem); eax=strlen(mem);
18 | InitMem | memset(mem_addr, 0, sz)
19 | MovRegStack | mov reg, [ebp-src]
20 | MovRegMem | mov reg, mem[src]
0xff | Exit | exit(0)
  • Ezmachine-disassembler-parsefunc.py

    from collections import namedtuple
    from dataclasses import dataclass
     
    @dataclass
    class Regs(object):
        idx: int
     
        def __repr__(self):
            if self.idx == 0:
                return "eax"
            elif self.idx == 1:
                return "ebx"
            elif self.idx == 2:
                return "ecx"
            elif self.idx == 3:
                return "edx"
            else:
                return "unknown reg {}".format(self.idx)
     
    Nop = namedtuple("Nop", ["addr"])  # case 0: nop
    MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
    PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
    PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
    PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
    # case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
    PrintStr = namedtuple("PrintStr", ["addr"])
     
    AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
    SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
    MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
    DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
    XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
     
    Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
    Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
    Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
    Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
    Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
    Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
     
    # case 17: gets(mem); eax=strlen(mem);
    InputStr = namedtuple("InputStr", ["addr"])
     
    InitMem = namedtuple(
        "InitMem", ["addr", "mem_addr", "sz"]
    )  # case 18: memset(mem_addr, 0, sz)
     
    MovRegStack = namedtuple(
        "MovRegStack", ["addr", "dst", "src"]
    )  # case 19: mov reg, [ebp-src]
     
    MovRegMem = namedtuple(
        "MovRegMem", ["addr", "dst", "src"]
    )  # case 20: mov reg, mem[src]
     
    Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)
     
    def parse(buffer):
        instructions = []
     
        pc = 0
        while pc < len(buffer):
            opcode = buffer[pc]
     
            match opcode:
                case 0:
                    instructions.append(Nop(pc))
                    pc += 1
                case 1:
                    dst = buffer[pc + 1]
                    imm = buffer[pc + 2]
                    instructions.append(MovReg(pc, Regs(dst), imm))
                    pc += 3
                case 2:
                    imm = buffer[pc + 1]
                    instructions.append(PushImm(pc, imm))
                    pc += 3
                case 3:
                    reg = buffer[pc + 1]
                    instructions.append(PushReg(pc, Regs(reg)))
                    pc += 3
                case 4:
                    reg = buffer[pc + 1]
                    instructions.append(PopReg(pc, Regs(reg)))
                    pc += 3
                case 5:
                    instructions.append(PrintStr(pc))
                    pc += 3
                case 6:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 7:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 8:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 9:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 10:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 11:
                    target = 3 * buffer[pc + 1] - 3
                    instructions.append(Jmp(pc, target))
                    pc += 3
                case 12:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 13:
                    target = 3 * buffer[pc + 1] - 3
                    instructions.append(Jz(pc, target))
                    pc += 3
                case 14:
                    target = 3 * buffer[pc + 1] - 3
                    instructions.append(Jnz(pc, target))
                    pc += 3
                case 15:
                    target = 3 * buffer[pc + 1] - 3
                    instructions.append(Jg(pc, target))
                    pc += 3
                case 16:
                    target = 3 * buffer[pc + 1] - 3
                    instructions.append(Jl(pc, target))
                    pc += 3
                case 17:
                    instructions.append(InputStr(pc))
                    pc += 3
                case 18:
                    mem_addr = buffer[pc + 1]
                    sz = buffer[pc + 2]
                    instructions.append(InitMem(pc, mem_addr, sz))
                    pc += 3
                case 19:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(MovRegStack(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 20:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(MovRegMem(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 255:
                    instructions.append(Exit(pc))
                    pc += 3
                case _:
                    raise Exception(f"unknown opcode: {opcode} at {pc}")
             
        return instructions
     
    if __name__ == '__main__':
        opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
        instructions = parse(opcode)
        for ins in instructions:
            print(ins)
  • Ezmachine-disassembler-parsefunc.out

    MovReg(addr=0, dst=edx, imm=3)
    PrintStr(addr=3)
    InputStr(addr=6)
    MovReg(addr=9, dst=ebx, imm=17)
    Cmp(addr=12, dst=eax, src=ebx)
    Jz(addr=15, target=27)
    MovReg(addr=18, dst=edx, imm=1)
    PrintStr(addr=21)
    Exit(addr=24)
    MovReg(addr=27, dst=ecx, imm=0)
    MovReg(addr=30, dst=eax, imm=17)
    Cmp(addr=33, dst=eax, src=ecx)
    Jz(addr=36, target=126)
    MovRegMem(addr=39, dst=eax, src=ecx)
    MovReg(addr=42, dst=ebx, imm=97)
    Cmp(addr=45, dst=eax, src=ebx)
    Jl(addr=48, target=75)
    MovReg(addr=51, dst=ebx, imm=122)
    Cmp(addr=54, dst=eax, src=ebx)
    Jg(addr=57, target=75)
    MovReg(addr=60, dst=ebx, imm=71)
    XorReg(addr=63, dst=eax, src=ebx)
    MovReg(addr=66, dst=ebx, imm=1)
    AddReg(addr=69, dst=eax, src=ebx)
    Jmp(addr=72, target=105)
    MovReg(addr=75, dst=ebx, imm=65)
    Cmp(addr=78, dst=eax, src=ebx)
    Jl(addr=81, target=105)
    MovReg(addr=84, dst=ebx, imm=90)
    Cmp(addr=87, dst=eax, src=ebx)
    Jg(addr=90, target=105)
    MovReg(addr=93, dst=ebx, imm=75)
    XorReg(addr=96, dst=eax, src=ebx)
    MovReg(addr=99, dst=ebx, imm=1)
    SubReg(addr=102, dst=eax, src=ebx)
    MovReg(addr=105, dst=ebx, imm=16)
    DivReg(addr=108, dst=eax, src=ebx)
    PushReg(addr=111, reg=ebx)
    PushReg(addr=114, reg=eax)
    MovReg(addr=117, dst=ebx, imm=1)
    AddReg(addr=120, dst=ecx, src=ebx)
    Jmp(addr=123, target=30)
    PushImm(addr=126, imm=7)
    PushImm(addr=129, imm=13)
    PushImm(addr=132, imm=0)
    PushImm(addr=135, imm=5)
    PushImm(addr=138, imm=1)
    PushImm(addr=141, imm=12)
    PushImm(addr=144, imm=1)
    PushImm(addr=147, imm=0)
    PushImm(addr=150, imm=0)
    PushImm(addr=153, imm=13)
    PushImm(addr=156, imm=5)
    PushImm(addr=159, imm=15)
    PushImm(addr=162, imm=0)
    PushImm(addr=165, imm=9)
    PushImm(addr=168, imm=5)
    PushImm(addr=171, imm=15)
    PushImm(addr=174, imm=3)
    PushImm(addr=177, imm=0)
    PushImm(addr=180, imm=2)
    PushImm(addr=183, imm=5)
    PushImm(addr=186, imm=3)
    PushImm(addr=189, imm=3)
    PushImm(addr=192, imm=1)
    PushImm(addr=195, imm=7)
    PushImm(addr=198, imm=7)
    PushImm(addr=201, imm=11)
    PushImm(addr=204, imm=2)
    PushImm(addr=207, imm=1)
    PushImm(addr=210, imm=2)
    PushImm(addr=213, imm=7)
    PushImm(addr=216, imm=2)
    PushImm(addr=219, imm=12)
    PushImm(addr=222, imm=2)
    PushImm(addr=225, imm=2)
    MovReg(addr=228, dst=ecx, imm=1)
    MovRegStack(addr=231, dst=ebx, src=ecx)
    PopReg(addr=234, reg=eax)
    Cmp(addr=237, dst=eax, src=ebx)
    Jnz(addr=240, target=270)
    MovReg(addr=243, dst=ebx, imm=34)
    Cmp(addr=246, dst=ecx, src=ebx)
    Jz(addr=249, target=264)
    MovReg(addr=252, dst=ebx, imm=1)
    AddReg(addr=255, dst=ecx, src=ebx)
    Jmp(addr=258, target=231)
    MovReg(addr=261, dst=edx, imm=0)
    PrintStr(addr=264)
    Exit(addr=267)
    MovReg(addr=270, dst=edx, imm=1)
    PrintStr(addr=273)
    Exit(addr=276)
    Nop(addr=279)

The point of dumping parsefunc.out is to check that parse and the instruction type definitions are sensible.

2: Write a first-pass dump

  • Ezmachine-disassembler-version0.py

    from collections import namedtuple
    from dataclasses import dataclass
     
    @dataclass
    class Regs(object):
        idx: int
     
        def __repr__(self):
            if self.idx == 0:
                return "eax"
            elif self.idx == 1:
                return "ebx"
            elif self.idx == 2:
                return "ecx"
            elif self.idx == 3:
                return "edx"
            else:
                return "unknown reg {}".format(self.idx)
     
    Nop = namedtuple("Nop", ["addr"])  # case 0: nop
    MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
    PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
    PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
    PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
    # case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
    PrintStr = namedtuple("PrintStr", ["addr"])
     
    AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
    SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
    MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
    DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
    XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
     
    Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
    Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
    Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
    Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
    Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
    Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
     
    # case 17: gets(mem); eax=strlen(mem);
    InputStr = namedtuple("InputStr", ["addr"])
     
    InitMem = namedtuple(
        "InitMem", ["addr", "mem_addr", "sz"]
    )  # case 18: memset(mem_addr, 0, sz)
     
    MovRegStack = namedtuple(
        "MovRegStack", ["addr", "dst", "src"]
    )  # case 19: mov reg, [ebp-src]
     
    MovRegMem = namedtuple(
        "MovRegMem", ["addr", "dst", "src"]
    )  # case 20: mov reg, mem[src]
     
    Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)
     
    def parse(buffer):
        instructions = []
     
        pc = 0
        while pc < len(buffer):
            opcode = buffer[pc]
     
            match opcode:
                case 0:
                    instructions.append(Nop(pc))
                    pc += 1
                case 1:
                    dst = buffer[pc + 1]
                    imm = buffer[pc + 2]
                    instructions.append(MovReg(pc, Regs(dst), imm))
                    pc += 3
                case 2:
                    imm = buffer[pc + 1]
                    instructions.append(PushImm(pc, imm))
                    pc += 3
                case 3:
                    reg = buffer[pc + 1]
                    instructions.append(PushReg(pc, Regs(reg)))
                    pc += 3
                case 4:
                    reg = buffer[pc + 1]
                    instructions.append(PopReg(pc, Regs(reg)))
                    pc += 3
                case 5:
                    instructions.append(PrintStr(pc))
                    pc += 3
                case 6:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 7:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 8:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 9:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 10:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 11:
                    target = 3 * buffer[pc + 1] - 3
                    instructions.append(Jmp(pc, target))
                    pc += 3
                case 12:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                    pc += 3
                case 13:
                    target = 3 * buffer[pc + 1] - 3
                    instructions.append(Jz(pc, target))
                    pc += 3
                case 14:
                    target = 3 * buffer[pc + 1] - 3
                    instructions.append(Jnz(pc, target))
                    pc += 3
                case 15:
                    target = 3 * buffer[pc + 1] - 3
                    instructions.append(Jg(pc, target))
                    pc += 3
                case 16:
                    target = 3 * buffer[pc + 1] - 3
                    instructions.append(Jl(pc, target))
                    pc += 3
                case 17:
                    instructions.append(InputStr(pc))
                    pc += 3
                case 18:
                    mem_addr = buffer[pc + 1]
                    sz = buffer[pc + 2]
                    instructions.append(InitMem(pc, mem_addr, sz))
                    pc += 3
                case 19:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(MovRegStack(pc, Regs(dst), src))
                    pc += 3
                case 20:
                    dst = buffer[pc + 1]
                    src = buffer[pc + 2]
                    instructions.append(MovRegMem(pc, Regs(dst), src))
                    pc += 3
                case 255:
                    instructions.append(Exit(pc))
                    pc += 3
                case _:
                    raise Exception(f"unknown opcode: {opcode} at {pc}")
             
        return instructions
     
    def dump(instructions):
        for ins in instructions:
            match ins:
                case Nop(addr):
                    print(f"_0x{addr:04x}: nop")
                case MovReg(addr, dst, imm):
                    print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
                case PushImm(addr, imm):
                    print(f"_0x{addr:04x}: push 0x{imm:02x}")
                case PushReg(addr, reg):
                    print(f"_0x{addr:04x}: push {reg}")
                case PopReg(addr, reg):
                    print(f"_0x{addr:04x}: pop {reg}")
                case PrintStr(addr):
                    print(f"_0x{addr:04x}: print_str")
                case AddReg(addr, dst, src):
                    print(f"_0x{addr:04x}: add {dst}, {src}")
                case SubReg(addr, dst, src):
                    print(f"_0x{addr:04x}: sub {dst}, {src}")
                case MulReg(addr, dst, src):
                    print(f"_0x{addr:04x}: mul {dst}, {src}")
                case DivReg(addr, dst, src):
                    print(f"_0x{addr:04x}: div {dst}, {src}")
                case XorReg(addr, dst, src):
                    print(f"_0x{addr:04x}: xor {dst}, {src}")
                case Jmp(addr, target):
                    print(f"_0x{addr:04x}: jmp _0x{target:04x}")
                case Cmp(addr, dst, src):
                    print(f"_0x{addr:04x}: cmp {dst}, {src}")
                case Jz(addr, target):
                    print(f"_0x{addr:04x}: jz _0x{target:04x}")
                case Jnz(addr, target):
                    print(f"_0x{addr:04x}: jnz _0x{target:04x}")
                case Jg(addr, target):
                    print(f"_0x{addr:04x}: jg _0x{target:04x}")
                case Jl(addr, target):
                    print(f"_0x{addr:04x}: jl _0x{target:04x}")
                case InputStr(addr):
                    print(f"_0x{addr:04x}: input_str")
                case InitMem(addr, mem_addr, sz):
                    print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
                case MovRegStack(addr, dst, src):
                    print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
                case MovRegMem(addr, dst, src):
                    print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
                case Exit(addr):
                    print(f"_0x{addr:04x}: exit(0)")
                case _:
                    raise Exception(f"unknown instruction: {ins}")
     
    if __name__ == '__main__':
        opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
        instructions = parse(opcode)
        dump(instructions)
  • Ezmachine-disassembler-dumpfunc-version0.out

    _0x0000: mov edx, 0x03
    _0x0003: print_str
    _0x0006: input_str
    _0x0009: mov ebx, 0x11
    _0x000c: cmp eax, ebx
    _0x000f: jz _0x001b
    _0x0012: mov edx, 0x01
    _0x0015: print_str
    _0x0018: exit(0)
    _0x001b: mov ecx, 0x00
    _0x001e: mov eax, 0x11
    _0x0021: cmp eax, ecx
    _0x0024: jz _0x007e
    _0x0027: mov eax, mem[2]
    _0x002a: mov ebx, 0x61
    _0x002d: cmp eax, ebx
    _0x0030: jl _0x004b
    _0x0033: mov ebx, 0x7a
    _0x0036: cmp eax, ebx
    _0x0039: jg _0x004b
    _0x003c: mov ebx, 0x47
    _0x003f: xor eax, ebx
    _0x0042: mov ebx, 0x01
    _0x0045: add eax, ebx
    _0x0048: jmp _0x0069
    _0x004b: mov ebx, 0x41
    _0x004e: cmp eax, ebx
    _0x0051: jl _0x0069
    _0x0054: mov ebx, 0x5a
    _0x0057: cmp eax, ebx
    _0x005a: jg _0x0069
    _0x005d: mov ebx, 0x4b
    _0x0060: xor eax, ebx
    _0x0063: mov ebx, 0x01
    _0x0066: sub eax, ebx
    _0x0069: mov ebx, 0x10
    _0x006c: div eax, ebx
    _0x006f: push ebx
    _0x0072: push eax
    _0x0075: mov ebx, 0x01
    _0x0078: add ecx, ebx
    _0x007b: jmp _0x001e
    _0x007e: push 0x07
    _0x0081: push 0x0d
    _0x0084: push 0x00
    _0x0087: push 0x05
    _0x008a: push 0x01
    _0x008d: push 0x0c
    _0x0090: push 0x01
    _0x0093: push 0x00
    _0x0096: push 0x00
    _0x0099: push 0x0d
    _0x009c: push 0x05
    _0x009f: push 0x0f
    _0x00a2: push 0x00
    _0x00a5: push 0x09
    _0x00a8: push 0x05
    _0x00ab: push 0x0f
    _0x00ae: push 0x03
    _0x00b1: push 0x00
    _0x00b4: push 0x02
    _0x00b7: push 0x05
    _0x00ba: push 0x03
    _0x00bd: push 0x03
    _0x00c0: push 0x01
    _0x00c3: push 0x07
    _0x00c6: push 0x07
    _0x00c9: push 0x0b
    _0x00cc: push 0x02
    _0x00cf: push 0x01
    _0x00d2: push 0x02
    _0x00d5: push 0x07
    _0x00d8: push 0x02
    _0x00db: push 0x0c
    _0x00de: push 0x02
    _0x00e1: push 0x02
    _0x00e4: mov ecx, 0x01
    _0x00e7: mov ebx, [ebp-2]
    _0x00ea: pop eax
    _0x00ed: cmp eax, ebx
    _0x00f0: jnz _0x010e
    _0x00f3: mov ebx, 0x22
    _0x00f6: cmp ecx, ebx
    _0x00f9: jz _0x0108
    _0x00fc: mov ebx, 0x01
    _0x00ff: add ecx, ebx
    _0x0102: jmp _0x00e7
    _0x0105: mov edx, 0x00
    _0x0108: print_str
    _0x010b: exit(0)
    _0x010e: mov edx, 0x01
    _0x0111: print_str
    _0x0114: exit(0)
    _0x0117: nop

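The Ezmachine-disassembler-dumpfunc-version0.out we get here is in fact about the same as what our earlier disassemblers produced.

The point of producing dumpfunc-version0.out is to use it as the reference for the optimizations that follow.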

3: Optimization

- (1) Add a function prologue and epilogue

![Image description](upload/attach/202311/921830_QUTPQFYHWRH7AJ4.webp)

The listing starts and ends with bare instructions and has no stack frame, so we add one.
 
```python
from collections import namedtuple
from dataclasses import dataclass
 
......
 
# Optimization (1): add a prologue and epilogue for main
prologue = namedtuple("prologue", [])
epilogue = namedtuple("epilogue", [])
def add_main_prologue_epilogue(instructions):
    instructions.insert(0, prologue())
    instructions.append(epilogue())
    return instructions
 
def dump(instructions):
    for ins in instructions:
        match ins:
            case prologue():
                print(f"push ebp")
                print(f"mov ebp, esp")
            case epilogue():
                print(f"mov esp, ebp")
                print(f"pop ebp")
                print(f"ret")
           ......
            case _:
                raise Exception(f"unknown instruction: {ins}")
 
if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
```

- (2) Handle the VM's mem and strings

```python
.....
 
# Memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n    .asciz "right" """)
    print("""wrong:\n    .asciz "wrong" """)
    print("""plz_input:\n    .asciz "plz input:" """)
    print("""hacker:\n    .asciz "hacker" """)
    print("""mem:\n    .space 0x100 """)
 
if __name__ == '__main__':
    opcode = [...]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
    dump_data()
```
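
The labels and their contents can also be kept in one table, so that the data section is generated rather than typed out. A minimal sketch, where `STRINGS` and `render_data` are hypothetical names and the keys are the edx values that print_str uses:

```python
# Hypothetical table: print_str index -> (label, text).
STRINGS = {
    0: ("right", "right"),
    1: ("wrong", "wrong"),
    3: ("plz_input", "plz input:"),
    4: ("hacker", "hacker"),
}

def render_data(mem_size=0x100):
    """Return the data section as text: one .asciz per string, then mem."""
    lines = []
    for label, text in STRINGS.values():
        lines.append(f"{label}:")
        lines.append(f'    .asciz "{text}"')
    lines.append("mem:")
    lines.append(f"    .space {mem_size:#x}")
    return "\n".join(lines)

print(render_data())
```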

- (3) Handle print_str

![Image description](upload/attach/202311/921830_4U5PDW3GE26ETHF.webp)

The assembly we generated contains statements like this:
 
```python
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
```
 
It prints a different string depending on the value of edx.

A function call is unavoidable here, and we can borrow pwntools' shellcraft to generate it: https://docs.pwntools.com/en/stable/shellcraft/i386.html#module-pwnlib.shellcraft.i386.linux
 
```python
from collections import namedtuple
from dataclasses import dataclass
 
.....
write_func_call = namedtuple("write_func_call", ["addr", "str_idx"])
# Optimization (3): handle print_str
def handle_print_str(instructions):
    """
    _0x0000: mov edx, 0x03
    _0x0003: print_str
 
    _0x0012: mov edx, 0x01
    _0x0015: print_str
 
    _0x0105: mov edx, 0x00
    _0x0108: print_str
 
    _0x010e: mov edx, 0x01
    _0x0111: print_str
    """
    idx = 0
    while idx < len(instructions):
        match instructions[idx: idx+2]:
            case [
                MovReg(addr1, Regs(3), imm),
                PrintStr(addr2)
            ] if (imm == 0x00 or imm == 0x01 or imm == 0x03 or imm == 0x04):
                instructions[idx: idx+2] = [write_func_call(addr2, imm)]
        idx += 1
 
def dump(instructions):
    for ins in instructions:
        match ins:
                        ......
            case write_func_call(addr, str_idx):
                if str_idx == 0:
                    print_right = f"""/* write(fd=1, buf='right', n=5) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, right
    push 5
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_right)
                elif str_idx == 1:
                    print_wrong = f"""/* write(fd=1, buf='wrong', n=5) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, wrong
    push 5
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_wrong)
                elif str_idx == 3:
                    print_plz_input = f"""/* write(fd=1, buf='plz input:', n=10) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, plz_input
    push 10
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_plz_input)
                elif str_idx == 4:
                    print_hacker = f"""/* write(fd=1, buf='hacker', n=6) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, hacker
    push 6
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_hacker)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")
 
......
```
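
The four branches of the write_func_call case differ only in the label and the byte count, so they could be produced from a single template. A sketch under that assumption: `WRITE_TARGETS` and `render_write` are hypothetical helpers, and the length is computed with len() instead of being typed by hand.

```python
# Hypothetical table: print_str index -> (label, text).
WRITE_TARGETS = {
    0: ("right", "right"),
    1: ("wrong", "wrong"),
    3: ("plz_input", "plz input:"),
    4: ("hacker", "hacker"),
}

def render_write(addr, str_idx):
    """Return the write(1, label, len) stub for one print_str site."""
    label, text = WRITE_TARGETS[str_idx]
    n = len(text)  # byte count passed to write()
    return f"""/* write(fd=1, buf={text!r}, n={n}) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, {label}
    push {n}
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""

print(render_write(0x0003, 3))
```

With this helper the dump case would reduce to a single `print(render_write(addr, str_idx))`.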

- (4) Handle input_str

![Image description](upload/attach/202311/921830_X3P96QF84NKZ4PR.webp)

```python
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
```

![Image description](upload/attach/202311/921830_JAXQMQ975NMVWXV.webp)
 
```python
from collections import namedtuple
from dataclasses import dataclass
 
......
 
read_strlen_func_call = namedtuple("read_strlen_func_call", ["addr"])
# Optimization (4): handle input_str
def handle_input_str(instructions):
    """
    _0x0006: input_str
    """
    idx = 0
    while idx < len(instructions):
        match instructions[idx: idx+1]:
            case [
                InputStr(addr)
            ]:
                instructions[idx: idx+1] = [read_strlen_func_call(addr)]
        idx += 1
 
def dump(instructions):
    for ins in instructions:
        match ins:
                        ......
            case read_strlen_func_call(addr):
                print_read_strlen = f"""/* read(fd=0, buf=mem, n=0x100) */
_0x{addr:04x}: push eax
    push ebx
    push ecx
    push edx
    xor ebx, ebx
    mov ecx, mem
    push 0x100
    pop edx
    push SYS_read  /* 3 */
    pop eax
    int 0x80
 
    /* strlen(mem) */
    mov edi, mem
    xor eax, eax
    push -1
    pop ecx
    repnz scas al, BYTE PTR [edi]
    inc ecx
    inc ecx
    neg ecx
    /* keep the length in edi while the saved registers are restored */
    mov edi, ecx
    pop edx
    pop ecx
    pop ebx
    pop eax
    mov eax, edi
"""
                print(print_read_strlen)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")
 
# Optimization (2): memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n    .asciz "right" """)
    print("""wrong:\n    .asciz "wrong" """)
    print("""plz_input:\n    .asciz "plz input:" """)
    print("""hacker:\n    .asciz "hacker" """)
    print("""mem:\n    .space 0x100 """)
 
if __name__ == '__main__':
    opcode = [.....]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    handle_print_str(instructions)
    handle_input_str(instructions)
    dump(instructions)
    dump_data()
```

- (5) Handle exit(0)

![Image description](upload/attach/202311/921830_XYKFP696Y5UC9UJ.webp)

```python
case Exit(addr):
    print(f"""/* exit(status=0) */
_0x{addr:04x}: xor ebx, ebx
    push SYS_exit  /* 1 */
    pop eax
    int 0x80
""")
```

- (6) Fix up mov ebx, [ebp-ecx]

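This kind of asm fails to assemble, because an x86 memory operand cannot subtract a register:

![Image description](upload/attach/202311/921830_MVYBK4BXJK84HFF.webp)

Replace it with the following:

![Image description](upload/attach/202311/921830_9RHVZ9B8HUX8NJE.webp)

![Image description](upload/attach/202311/921830_4BK2G6J8HPXACSM.webp)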
 
```python
case MovRegStack(addr, dst, src):
    # print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
    print(f"_0x{addr:04x}: mov {dst}, ebp")
    print(f"    sub {dst}, {src}")
    print(f"    mov {dst}, [{dst}]")
```
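
The three-instruction replacement can be wrapped in a small helper that returns the lines. This sketch assumes `src` has already been turned into a register name such as ecx (the version-0 listing prints the raw index instead); `emit_mov_reg_stack` is a hypothetical name.

```python
def emit_mov_reg_stack(addr, dst, src):
    """Emit dst = [ebp - src] using only addressing forms x86 accepts."""
    if dst == src:
        # "mov dst, ebp" would overwrite src before it is subtracted.
        raise ValueError("dst and src must be different registers")
    return [
        f"_0x{addr:04x}: mov {dst}, ebp",
        f"    sub {dst}, {src}",
        f"    mov {dst}, [{dst}]",
    ]

print("\n".join(emit_mov_reg_stack(0x00E7, "ebx", "ecx")))
```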

- (7) Fix up _0x006c: div eax, ebx

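![Image description](upload/attach/202311/921830_2T8PCPWA695R3CV.webp)

After a normal div ebx executes, the quotient is stored in eax and the remainder in edx.

This VM's div is different: its results are stored in eax and ebx.

![Image description](upload/attach/202311/921830_P25BCYVPP4BEK4M.webp)

So after div eax, ebx we also need to add a mov ebx, edx.

![Image description](upload/attach/202311/921830_PV5TZ5K9T466DWK.webp)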
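
In text form, the adjustment could look like the sketch below. It assumes the VM only ever divides eax by ebx, as in the listing, and `emit_div` is a hypothetical helper. Zeroing edx before the division and saving and restoring it around the sequence are additions made by this sketch (x86 div takes edx:eax as the dividend, and the VM's own div does not touch edx); they are not taken from the original script.

```python
def emit_div(addr):
    """Emit the VM's 'div eax, ebx': quotient -> eax, remainder -> ebx."""
    return [
        f"_0x{addr:04x}: push edx",  # keep the VM's edx intact (sketch assumption)
        "    xor edx, edx",          # x86 div uses edx:eax as the dividend
        "    div ebx",               # eax = quotient, edx = remainder
        "    mov ebx, edx",          # move the remainder where the VM expects it
        "    pop edx",
    ]

print("\n".join(emit_div(0x006C)))
```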

Ezmachine-disassembler.py

Ezmachine-disassembler-out.asm

4: Call pwntools make_elf

Ezmachine-asm_compile.py

    from pwn import *

    code = """
    push ebp
    mov ebp, esp
    .....
    ret

    right:
        .asciz "right"
    wrong:
        .asciz "wrong"
    plz_input:
        .asciz "plz input:"
    hacker:
        .asciz "hacker"
    mem:
        .space 0x100
    """

    elf = make_elf_from_assembly(code)
    print(elf)
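
Rather than pasting the disassembler's output into `code` by hand, the printed text can be captured in the same process. A sketch using only the standard library: `dump` and `dump_data` here are tiny stand-ins for the article's functions, `capture_asm` is a hypothetical helper, and the pwntools call is left as a comment because it needs pwntools and an i386 toolchain installed.

```python
import io
from contextlib import redirect_stdout

def dump(instructions):  # stand-in for the real dump()
    for line in instructions:
        print(line)

def dump_data():  # stand-in for the real dump_data()
    print("mem:\n    .space 0x100")

def capture_asm(instructions):
    """Collect everything dump() and dump_data() print into one string."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        dump(instructions)
        dump_data()
    return buf.getvalue()

code = capture_asm(["push ebp", "mov ebp, esp", "ret"])
print(code)
# elf = make_elf_from_assembly(code)  # requires: from pwn import *
```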

[image]

Result

[image]
[image]



Last edited by SYJ-Re on 2023-11-2 16:31
Latest replies (2)
秋狝 2023-11-3 09:29
Thanks for sharing.
wx_李长宇 2024-4-3 12:20
Are these two packages included with Python 3.10?