[Original] A new Python 3.10 feature -> clever uses of Structural Pattern Matching in VM reverse engineering
2023-11-2 16:24
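Preface:
I first saw this style of code in the 2022 GoogleCTF eldar writeup by hgarrereyn of DiceGang: https://ctf.harrisongreen.me/2022/googlectf/eldar/. There, too, it was used to parse a virtual machine, although that machine was an ELF metadata-driven Turing weird machine.
Later, the deeprev challenge of the Chinese 2022 QWB CTF (强网杯) used this ELF metadata-driven Turing weird machine again, and I used the same technique to write a parser for that relocation machine. It worked extremely well; without exaggeration, it felt like magic.
At the time I put an item on my to-do list: use Structural Pattern Matching to write a parser for an ordinary VM, which should be easy. Then work took over, and the item gathered dust on the list for almost a year. All year I have been pushed along by work, executing each day, like a robot, the instructions I wrote for myself the day before; my memory seems to have gotten worse and I often forget things. Only now, with some projects delivered at year end, do I have time for my own work. Running a startup is really hard.
Back to the topic: later, the writeup for the dicectf-2022 breach challenge (https://github.com/reductor/dice-ctf-2022-breach-writeup) used the technique to parse a conventional VM.
It has taken until today for me to come back and write this up.
I have already published quite a few posts on VM analysis:
[Original] An analysis of VM reversing (CTF) (a fairly classic VM reversing challenge)
[Original] A special method and approach for handling VMs
[Original] Automated VM analysis in CTFs
In short, this method is an upgraded disassembler and is far better than the disassemblers I published before. Is it better than a decompiler? I can't give a definite answer, since a decompiler takes a different approach: it abstracts the code into a high-level language.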
python310 Structural Pattern Matching
Learn Structural Pattern Matching
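Introduction to Structural Pattern Matching
PEP 634 – Structural Pattern Matching: Specification: describes the match syntax and the supported patterns
PEP 635 – Structural Pattern Matching: Motivation and Rationale: explains why the syntax was designed this way
PEP 636 – Structural Pattern Matching: Tutorial: a tutorial covering the concepts, syntax and semantics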
match patterns:
- Mapping patterns: match mapping structures like dictionaries.
- Sequence patterns: match sequence structures like tuples and lists.
- Capture patterns: bind values to names.
- AS patterns: bind the value of subpatterns to names.
- OR patterns: match one of several different subpatterns.
- Wildcard patterns: match anything.
- Class patterns: match class structures.
- Value patterns: match values stored in attributes.
- Literal patterns: match literal values.
Capture patterns
Match a pattern and bind the matched value to a name.
```python
def sum_list(numbers):
    match numbers:
        case []:  # matches an empty list
            return 0
        case [first, *rest]:  # a sequence pattern built from two capture patterns; *rest captures all remaining elements
            return first + sum_list(rest)
```
```python
def average(*args):
    match args:
        case [x, y]:  # captures the two elements of a sequence
            return (x + y) / 2
        case [x]:  # captures the only element of a sequence
            return x
        case []:
            return 0
        case a:  # captures the entire sequence
            return sum(a) / len(a)
```
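Guards (adding conditions to a pattern)
A guard (an if clause after the pattern) further restricts a case: the case is taken only if the pattern matches and the guard is true, as below: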
```python
# Sort in ascending order
def sort(seq):
    match seq:
        case [] | [_]:  # an empty sequence, or a sequence with exactly one element
            return seq
        case [x, y] if x <= y:
            return seq
        case [x, y]:
            return [y, x]
        case [x, y, z] if x <= y <= z:
            return seq
        case [x, y, z] if x >= y >= z:
            return [z, y, x]
        case [p, *rest]:
            a = sort([x for x in rest if x <= p])  # sort the elements <= p
            b = sort([x for x in rest if p < x])   # sort the elements > p
            return a + [p] + b
```
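AS patterns
Bind the value matched by a subpattern (for example a type check such as str() or an OR pattern) to a name, so that a constraint and a binding work together.
Subpatterns can be combined freely in match syntax.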
```python
In : def as_pattern(obj):
...:     match obj:
...:         case str() as s:
...:             print(f'Got str: {s=}')
...:         case [0, int() as i]:
...:             print(f'Got int: {i=}')
...:         case [tuple() as tu]:
...:             print(f'Got tuple: {tu=}')
...:         case list() | set() | dict() as iterable:
...:             print(f'Got iterable: {iterable=}')
...:

In : as_pattern('sss')
Got str: s='sss'

In : as_pattern([0, 1])
Got int: i=1

In : as_pattern([(1,)])
Got tuple: tu=(1,)

In : as_pattern([1, 2, 3])
Got iterable: iterable=[1, 2, 3]

In : as_pattern({'a': 1})
Got iterable: iterable={'a': 1}
```
```python
# Num and UnaryOp are assumed to be classes defined elsewhere with __match_args__ (e.g. dataclasses)
def simplify_expr(tokens):
    match tokens:
        case [('(' | '[') as l, *expr, (')' | ']') as r] if (l + r) in ('()', '[]'):
            return simplify_expr(expr)
        case [0, ('+' | '-') as op, right]:
            return UnaryOp(op, right)
        case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
            return Num(left + right)
        case [(int() | float()) as value]:
            return Num(value)
```
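OR patterns
Note that the first three forms below are NOT OR patterns in Python 3.10.
Form 1, comma-separated: this is actually parsed as a sequence pattern, so it only matches a 3-element sequence equal to (401, 403, 404), never a single status code:
```python
case 401, 403, 404:
    print("Some HTTP error")
```
Form 2, C-style fallthrough: a SyntaxError, because every case needs its own body and there is no fallthrough:
```python
case 401:
case 403:
case 404:
    print("Some HTTP error")
```
Form 3: also a SyntaxError:
```python
case in 401, 403, 404:
    print("Some HTTP error")
```
Form 4, the real OR pattern, using |:
```python
case ("a" | "b" | "c"):
```
Form 5, an OR pattern combined with as to bind whichever alternative matched:
```python
case ("a" | "b" | "c") as letter:
```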
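Literal patterns
Match Python's built-in literal values such as strings, numbers, booleans and None. True, False and None are compared by identity (is); other literals are compared with ==.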
```python
match number:
    case 0:
        print('zero')
    case 1:
        print('one')
    case 2:
        print('two')
```
```python
def simplify(expr):
    match expr:
        case ('+', 0, x):  # 0 + x
            return x
        case ('+' | '-', x, 0):  # x +- 0
            return x
        case ('and', True, x):  # True and x
            return x
        case ('and', False, x):
            return False
        case ('or', False, x):
            return x
        case ('or', True, x):
            return True
        case ('not', ('not', x)):
            return x
    return expr
```
Wildcard pattern
The wildcard pattern _ behaves like a special capture pattern: it accepts any value but does not bind it to any name (in effect, it ignores the positions you don't care about).
```python
def is_closed(sequence):
    match sequence:
        case [_]:  # any sequence with a single element
            return True
        case [start, *_, end]:  # a sequence with at least two elements
            return start == end
        case _:  # anything
            return False
```
Value patterns
This pattern mainly matches constants or members of an enum defined with the enum module:
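```python
In : from enum import Enum

In : class Color(Enum):
...:     RED = 1
...:     GREEN = 2
...:     BLUE = 3
...:

In : class NewColor:
...:     YELLOW = 4
...:

In : def constant_value(color):
...:     match color:
...:         case Color.RED:
...:             print('Red')
...:         case NewColor.YELLOW:
...:             print('Yellow')
...:         case new_color:
...:             print(new_color)
...:

In : constant_value(Color.RED)  # matches the first case
Red

In : constant_value(NewColor.YELLOW)  # matches the second case
Yellow

In : constant_value(Color.GREEN)  # matches the third case
Color.GREEN

In : constant_value(4)  # same value as NewColor.YELLOW, so it also matches the second case
Yellow

In : constant_value(10)  # any other constant
10
```

Note that because a bare name in a case is a capture pattern (it binds the subject to that name), you cannot match against a plain constant such as YELLOW directly:

```python
YELLOW = 4

def constant_value(color):
    match color:
        case YELLOW:  # capture pattern: matches ANY value and rebinds YELLOW; it does not compare with 4
            print('Yellow')
```

This is not a syntax error on its own, but it is wrong: case YELLOW matches every value. If another case followed it, Python would reject the code with "SyntaxError: name capture 'YELLOW' makes remaining patterns unreachable".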
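A value pattern uses the value of some other variable inside a pattern. How does Python tell such a variable apart from a capture pattern's binding name? By the dot ".".
Only dotted names (such as Codes.SUCCESS) are looked up as values; a bare name is always a capture.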
```python
class Codes:
    SUCCESS = 200
    NOT_FOUND = 404


def handle(retcode):
    match retcode:
        case Codes.SUCCESS:
            print('success')
        case Codes.NOT_FOUND:
            print('not found')
        case _:
            print('unknown')
```
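Sequence patterns
A case can match lists or tuples.
[a, b, c], (a, b, c) and a, b, c are not distinguished: they are equivalent. To require a specific type, use a class pattern such as list([a, b, c]).
A starred subpattern (*rest, or *_ to ignore the values) matches any number of elements, e.g. [first, *_, last]. Only one starred subpattern is allowed per sequence pattern, so something like (*_, 3, *_) ("any sequence containing 3") is a SyntaxError; use a guard for that instead.
Matching never iterates over an iterator: the subject must be a sequence (iterators and generators do not match, and str, bytes and bytearray are excluded), and elements are accessed by index and slicing.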
```python
In : def sequence(collection):
...:     match collection:
...:         case 1, [x, *others]:
...:             print(f"Got 1 and a nested sequence: {x=}, {others=}")
...:         case (1, x):
...:             print(f"Got 1 and {x}")
...:         case [x, y, z]:
...:             print(f"{x=}, {y=}, {z=}")
...:

In : sequence([1])

In : sequence([1, 2])
Got 1 and 2

In : sequence([1, 2, 3])
x=1, y=2, z=3

In : sequence([1, [2, 3]])
Got 1 and a nested sequence: x=2, others=[3]

In : sequence([1, [2, 3, 4]])
Got 1 and a nested sequence: x=2, others=[3, 4]

In : sequence([2, 3])

In : sequence((1, 2))
Got 1 and 2
```
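Mapping patterns
For efficiency, keys must be constants (literal patterns or value patterns).
In other words, a case can match against a dict: keys not listed in the pattern are ignored, and **rest captures the remaining items.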
```python
In : def mapping(config):
...:     match config:
...:         case {'sub': sub_config, **rest}:
...:             print(f'Sub: {sub_config}')
...:             print(f'OTHERS: {rest}')
...:         case {'route': route}:
...:             print(f'ROUTE: {route}')
...:

In : mapping({})

In : mapping({'route': '/auth/login'})
ROUTE: /auth/login

# Matches a dict that has a 'sub' key: its value is bound to sub_config, the rest of the dict to rest
In : mapping({'route': '/auth/login', 'sub': {'a': 1}})
Sub: {'a': 1}
OTHERS: {'route': '/auth/login'}
```
```python
def change_red_to_blue(json_obj):
    match json_obj:
        case {'color': ('red' | '#FF0000')}:
            json_obj['color'] = 'blue'
        case {'children': children}:
            for child in children:
                change_red_to_blue(child)
```
Class patterns
Class patterns serve two purposes: checking that an object is an instance of a given class, and extracting data from specific attributes of that object.
```python
# A case can match any object. Let's start with an example that fails:
In : class Point:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:

In : class_pattern(Point(1, 2))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [], in <cell line: 1>()
----> 1 class_pattern(Point(1, 2))

Input In [], in class_pattern(obj)
      1 def class_pattern(obj):
      2     match obj:
----> 3         case Point(x, y):
      4             print(f'Point({x=},{y=})')

TypeError: Point() accepts 0 positional sub-patterns (2 given)

# Positional sub-patterns need to know which attribute each position corresponds to,
# and this Point does not declare that, so use keyword sub-patterns instead:
In : def class_pattern(obj):
...:     match obj:
...:         case Point(x=1, y=2):
...:             print(f'match')
...:

In : class_pattern(Point(1, 2))
match

# Another fix for a custom class: define __match_args__, a tuple naming the attributes
# that positional sub-patterns map to, like this:
In : class Point:
...:     __match_args__ = ('x', 'y')
...:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:

# Or use a dataclass: Point2 below uses the standard library's dataclasses.dataclass
# decorator, which generates __match_args__, so it works directly
In : from dataclasses import dataclass

In : @dataclass
...: class Point2:
...:     x: int
...:     y: int
...:

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:         case Point2(x, y):
...:             print(f'Point2({x=},{y=})')
...:

In : class_pattern(Point(1, 2))
Point(x=1,y=2)

In : class_pattern(Point2(1, 2))
Point2(x=1,y=2)
```
```python
from dataclasses import dataclass

# Minimal expression node classes, assumed here so the example runs (@dataclass provides __match_args__)
@dataclass
class BinaryOp:
    op: str
    left: object
    right: object

@dataclass
class UnaryOp:
    op: str
    arg: object

@dataclass
class VarExpr:
    name: str


def eval_expr(expr):
    """Evaluate an expression and return the result."""
    match expr:
        case BinaryOp('+', left, right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp('-', left, right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp('*', left, right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp('/', left, right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp('+', arg):
            return eval_expr(arg)
        case UnaryOp('-', arg):
            return -eval_expr(arg)
        case VarExpr(name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case _:
            raise ValueError(f"Invalid expression value: {repr(expr)}")
```
Another example:
```python
# Image, Video and render_as are assumed to be defined elsewhere; the hypothetical
# handle_media wrapper exists only so that the return statements are valid
def handle_media(media_object):
    match media_object:
        case Image(type="jpg"):
            return media_object
        case Image(type="png") | Image(type="gif"):
            return render_as(media_object, "jpg")
        case Video():
            raise ValueError("Can't extract frames from video yet")
        case other_type:
            raise Exception(f"Media type {media_object} can't be handled yet")
```
A namedtuple example, which is also a class pattern:
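```python
from collections import namedtuple

Mov = namedtuple('mov', ['dst', 'src', 'sz', 'ridx'])
op = Mov('rax', 'rbx', 8, 0)  # example value, assumed for illustration

match op:  # Python uses match, not switch
    case Mov(dst, src, 8, ridx):  # matches a Mov whose sz field is 8
        pass
```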
Type Unions, Aliases, and Guards
numbers is annotated as a list whose elements may be float or int.
```python
def mean(numbers: list[float | int]) -> float:
    return sum(numbers) / len(numbers)
```
You can declare explicit type aliases (typing.TypeAlias), so that both type checkers and programmers can recognize the intent:
```python
from typing import TypeAlias

Card: TypeAlias = tuple[str, str]  # e.g. ('', '')
Deck: TypeAlias = list[Card]       # e.g. [('', '')]
```
Type guards are used to narrow a type union.
new disassembler of 2020GKCTF-EzMachine
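A disassembler like this is usually refined step by step, until its output is clean enough to be assembled directly into an ELF with pwntools' make_elf_from_assembly: https://docs.pwntools.com/en/stable/asm.html#pwnlib.asm.make_elf_from_assembly
1: Define the instruction types and write parse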
Opcode | Instruction | Notes |
---|---|---|
0 | nop | pc+=1 |
1 | mov dst, imm | imm is a 1-byte immediate value |
2 | push imm | imm is a 1-byte immediate value |
3 | push reg | push the (1-byte) reg onto the stack |
4 | pop reg | pop 1 byte from the stack into reg |
5 | PrintStr | print the string selected by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker' |
6 | add reg1, reg2 | reg1 += reg2 |
7 | sub reg1, reg2 | reg1 -= reg2 |
8 | mul reg1, reg2 | reg1 *= reg2 |
9 | div reg1, reg2 | reg1 /= reg2 (Put the quotient into eax and the remainder into ebx) |
10 | xor reg1, reg2 | reg1 ^= reg2 |
11 | jmp addr | directly jump to address |
12 | cmp reg1, reg2 | edx=reg1-reg2 |
13 | jz addr | jump if edx==0(reg1==reg2) |
14 | jnz addr | jump if edx!=0(reg1≠reg2) |
15 | jg addr | jump if edx>0(reg1>reg2) |
16 | jl addr | jump if edx<0 (reg1 < reg2) |
17 | InputStr | gets(mem); eax=strlen(mem); |
18 | InitMem | memset(mem_addr, 0, sz) |
19 | MovRegStack | mov reg, [ebp-src] |
20 | MovRegMem | mov reg, mem[src] |
0xff | Exit | exit(0) |
Ezmachine-disassembler-parsefunc.py
```python
from collections import namedtuple
from dataclasses import dataclass


@dataclass
class Regs(object):
    idx: int

    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)


Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
InitMem = namedtuple("InitMem", ["addr", "mem_addr", "sz"])  # case 18: memset(mem_addr, 0, sz)
MovRegStack = namedtuple("MovRegStack", ["addr", "dst", "src"])  # case 19: mov reg, [ebp-src]
MovRegMem = namedtuple("MovRegMem", ["addr", "dst", "src"])  # case 20: mov reg, mem[src]
Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)


def parse(buffer):
    instructions = []
    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]
        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                # byte address = 3 * (operand - 1): the operand acts like a 1-based index of
                # 3-byte instructions (the same formula is used for jz/jnz/jg/jl below)
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), Regs(src)))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), Regs(src)))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
    return instructions


if __name__ == '__main__':
    opcode = [
        0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00,
        0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02,
        0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A,
        0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01,
        0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01,
        0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10,
        0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00,
        0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00,
        0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00,
        0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00,
        0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00,
        0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00,
        0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02,
        0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00,
        0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00,
        0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00,
    ]
    instructions = parse(opcode)
    for ins in instructions:
        print(ins)
```
Ezmachine-disassembler-parsefunc.out
```
MovReg(addr=0, dst=edx, imm=3)
PrintStr(addr=3)
InputStr(addr=6)
MovReg(addr=9, dst=ebx, imm=17)
Cmp(addr=12, dst=eax, src=ebx)
Jz(addr=15, target=27)
MovReg(addr=18, dst=edx, imm=1)
PrintStr(addr=21)
Exit(addr=24)
MovReg(addr=27, dst=ecx, imm=0)
MovReg(addr=30, dst=eax, imm=17)
Cmp(addr=33, dst=eax, src=ecx)
Jz(addr=36, target=126)
MovRegMem(addr=39, dst=eax, src=ecx)
MovReg(addr=42, dst=ebx, imm=97)
Cmp(addr=45, dst=eax, src=ebx)
Jl(addr=48, target=75)
MovReg(addr=51, dst=ebx, imm=122)
Cmp(addr=54, dst=eax, src=ebx)
Jg(addr=57, target=75)
MovReg(addr=60, dst=ebx, imm=71)
XorReg(addr=63, dst=eax, src=ebx)
MovReg(addr=66, dst=ebx, imm=1)
AddReg(addr=69, dst=eax, src=ebx)
Jmp(addr=72, target=105)
MovReg(addr=75, dst=ebx, imm=65)
Cmp(addr=78, dst=eax, src=ebx)
Jl(addr=81, target=105)
MovReg(addr=84, dst=ebx, imm=90)
Cmp(addr=87, dst=eax, src=ebx)
Jg(addr=90, target=105)
MovReg(addr=93, dst=ebx, imm=75)
XorReg(addr=96, dst=eax, src=ebx)
MovReg(addr=99, dst=ebx, imm=1)
SubReg(addr=102, dst=eax, src=ebx)
MovReg(addr=105, dst=ebx, imm=16)
DivReg(addr=108, dst=eax, src=ebx)
PushReg(addr=111, reg=ebx)
PushReg(addr=114, reg=eax)
MovReg(addr=117, dst=ebx, imm=1)
AddReg(addr=120, dst=ecx, src=ebx)
Jmp(addr=123, target=30)
PushImm(addr=126, imm=7)
PushImm(addr=129, imm=13)
PushImm(addr=132, imm=0)
PushImm(addr=135, imm=5)
PushImm(addr=138, imm=1)
PushImm(addr=141, imm=12)
PushImm(addr=144, imm=1)
PushImm(addr=147, imm=0)
PushImm(addr=150, imm=0)
PushImm(addr=153, imm=13)
PushImm(addr=156, imm=5)
PushImm(addr=159, imm=15)
PushImm(addr=162, imm=0)
PushImm(addr=165, imm=9)
PushImm(addr=168, imm=5)
PushImm(addr=171, imm=15)
PushImm(addr=174, imm=3)
PushImm(addr=177, imm=0)
PushImm(addr=180, imm=2)
PushImm(addr=183, imm=5)
PushImm(addr=186, imm=3)
PushImm(addr=189, imm=3)
PushImm(addr=192, imm=1)
PushImm(addr=195, imm=7)
PushImm(addr=198, imm=7)
PushImm(addr=201, imm=11)
PushImm(addr=204, imm=2)
PushImm(addr=207, imm=1)
PushImm(addr=210, imm=2)
PushImm(addr=213, imm=7)
PushImm(addr=216, imm=2)
PushImm(addr=219, imm=12)
PushImm(addr=222, imm=2)
PushImm(addr=225, imm=2)
MovReg(addr=228, dst=ecx, imm=1)
MovRegStack(addr=231, dst=ebx, src=ecx)
PopReg(addr=234, reg=eax)
Cmp(addr=237, dst=eax, src=ebx)
Jnz(addr=240, target=270)
MovReg(addr=243, dst=ebx, imm=34)
Cmp(addr=246, dst=ecx, src=ebx)
Jz(addr=249, target=264)
MovReg(addr=252, dst=ebx, imm=1)
AddReg(addr=255, dst=ecx, src=ebx)
Jmp(addr=258, target=231)
MovReg(addr=261, dst=edx, imm=0)
PrintStr(addr=264)
Exit(addr=267)
MovReg(addr=270, dst=edx, imm=1)
PrintStr(addr=273)
Exit(addr=276)
Nop(addr=279)
```
The point of producing parsefunc.out is to check that parse and the instruction type definitions are sound.
2: Write a first version of dump
Ezmachine-disassembler-version0.py
```python
from collections import namedtuple
from dataclasses import dataclass


@dataclass
class Regs(object):
    idx: int

    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)


Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
InitMem = namedtuple("InitMem", ["addr", "mem_addr", "sz"])  # case 18: memset(mem_addr, 0, sz)
MovRegStack = namedtuple("MovRegStack", ["addr", "dst", "src"])  # case 19: mov reg, [ebp-src]
MovRegMem = namedtuple("MovRegMem", ["addr", "dst", "src"])  # case 20: mov reg, mem[src]
Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)


def parse(buffer):
    instructions = []
    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]
        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                # byte address = 3 * (operand - 1) (same formula for jz/jnz/jg/jl below)
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                # NOTE: unlike parsefunc.py (which used Regs(src)), src is kept as the raw operand
                # byte here, so dump() prints [ebp-2] / mem[2]; 2 is the register index of ecx
                instructions.append(MovRegStack(pc, Regs(dst), src))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), src))  # src kept raw, as in case 19
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
    return instructions


def dump(instructions):
    for ins in instructions:
        match ins:
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")


if __name__ == '__main__':
    opcode = [
        0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00,
        0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02,
        0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A,
        0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01,
        0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01,
        0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10,
        0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00,
        0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00,
        0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00,
        0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00,
        0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00,
        0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00,
        0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02,
        0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00,
        0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00,
        0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00,
    ]
    instructions = parse(opcode)
    dump(instructions)
```
Ezmachine-disassembler-dumpfunc-version0.out
```
_0x0000: mov edx, 0x03
_0x0003: print_str
_0x0006: input_str
_0x0009: mov ebx, 0x11
_0x000c: cmp eax, ebx
_0x000f: jz _0x001b
_0x0012: mov edx, 0x01
_0x0015: print_str
_0x0018: exit(0)
_0x001b: mov ecx, 0x00
_0x001e: mov eax, 0x11
_0x0021: cmp eax, ecx
_0x0024: jz _0x007e
_0x0027: mov eax, mem[2]
_0x002a: mov ebx, 0x61
_0x002d: cmp eax, ebx
_0x0030: jl _0x004b
_0x0033: mov ebx, 0x7a
_0x0036: cmp eax, ebx
_0x0039: jg _0x004b
_0x003c: mov ebx, 0x47
_0x003f: xor eax, ebx
_0x0042: mov ebx, 0x01
_0x0045: add eax, ebx
_0x0048: jmp _0x0069
_0x004b: mov ebx, 0x41
_0x004e: cmp eax, ebx
_0x0051: jl _0x0069
_0x0054: mov ebx, 0x5a
_0x0057: cmp eax, ebx
_0x005a: jg _0x0069
_0x005d: mov ebx, 0x4b
_0x0060: xor eax, ebx
_0x0063: mov ebx, 0x01
_0x0066: sub eax, ebx
_0x0069: mov ebx, 0x10
_0x006c: div eax, ebx
_0x006f: push ebx
_0x0072: push eax
_0x0075: mov ebx, 0x01
_0x0078: add ecx, ebx
_0x007b: jmp _0x001e
_0x007e: push 0x07
_0x0081: push 0x0d
_0x0084: push 0x00
_0x0087: push 0x05
_0x008a: push 0x01
_0x008d: push 0x0c
_0x0090: push 0x01
_0x0093: push 0x00
_0x0096: push 0x00
_0x0099: push 0x0d
_0x009c: push 0x05
_0x009f: push 0x0f
_0x00a2: push 0x00
_0x00a5: push 0x09
_0x00a8: push 0x05
_0x00ab: push 0x0f
_0x00ae: push 0x03
_0x00b1: push 0x00
_0x00b4: push 0x02
_0x00b7: push 0x05
_0x00ba: push 0x03
_0x00bd: push 0x03
_0x00c0: push 0x01
_0x00c3: push 0x07
_0x00c6: push 0x07
_0x00c9: push 0x0b
_0x00cc: push 0x02
_0x00cf: push 0x01
_0x00d2: push 0x02
_0x00d5: push 0x07
_0x00d8: push 0x02
_0x00db: push 0x0c
_0x00de: push 0x02
_0x00e1: push 0x02
_0x00e4: mov ecx, 0x01
_0x00e7: mov ebx, [ebp-2]
_0x00ea: pop eax
_0x00ed: cmp eax, ebx
_0x00f0: jnz _0x010e
_0x00f3: mov ebx, 0x22
_0x00f6: cmp ecx, ebx
_0x00f9: jz _0x0108
_0x00fc: mov ebx, 0x01
_0x00ff: add ecx, ebx
_0x0102: jmp _0x00e7
_0x0105: mov edx, 0x00
_0x0108: print_str
_0x010b: exit(0)
_0x010e: mov edx, 0x01
_0x0111: print_str
_0x0114: exit(0)
_0x0117: nop
```
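The Ezmachine-disassembler-dumpfunc-version0.out obtained here is essentially the same as the output of the disassemblers from my earlier posts.
The point of producing dumpfunc-version0.out is to have a reference to optimize against.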
3: Optimizations
- (1) Add a function prologue and epilogue
![image](upload/attach/202311/921830_QUTPQFYHWRH7AJ4.webp)

Both the start and the end of the program are bare instructions with no stack frame, so we add a prologue and an epilogue:

```python
from collections import namedtuple
from dataclasses import dataclass
......

# Optimization (1): add a prologue and an epilogue to main
prologue = namedtuple("prologue", [])
epilogue = namedtuple("epilogue", [])

def add_main_prologue_epilogue(instructions):
    instructions.insert(0, prologue())
    instructions.append(epilogue())
    return instructions

def dump(instructions):
    for ins in instructions:
        match ins:
            case prologue():
                print("push ebp")
                print("mov ebp, esp")
            case epilogue():
                print("mov esp, ebp")
                print("pop ebp")
                print("ret")
            ......
            case _:
                raise Exception(f"unknown instruction: {ins}")

if __name__ == '__main__':
    opcode = [
        0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11,
        0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00,
        0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02,
        0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01,
        0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00,
        0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01,
        0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00,
        0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B,
        0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10,
        0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01,
        0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00,
        0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00,
        0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00,
        0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00,
        0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00,
        0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00,
        0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00,
        0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00,
        0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00,
        0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01,
        0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00,
        0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00,
        0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00,
        0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
```
- (2) Handle the VM's mem buffer and strings
```python
.....
# Memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n .asciz "right" """)
    print("""wrong:\n .asciz "wrong" """)
    print("""plz_input:\n .asciz "plz input:" """)
    print("""hacker:\n .asciz "hacker" """)
    print("""mem:\n .space 0x100 """)

if __name__ == '__main__':
    opcode = [...]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
    dump_data()
```
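dump_data() and the write() stubs in step (3) both hard-code the strings, and the byte counts n=5, 10 and 6 in those stubs have to be kept in sync by hand. A hedged alternative, not the author's code, keeps a single table and derives both from it. VM_STRINGS, data_section() and write_len() are hypothetical names; the index-to-string mapping is the one given in the PrintStr comment in step (3).

```python
# Hypothetical single source of truth: VM string index -> (asm label, text).
# Assumes the texts contain no quotes or backslashes that would need escaping.
VM_STRINGS = {
    0: ("right", "right"),
    1: ("wrong", "wrong"),
    3: ("plz_input", "plz input:"),
    4: ("hacker", "hacker"),
}
MEM_SIZE = 0x100  # size of the VM's input buffer, as in dump_data()


def data_section():
    """Emit the same labels as dump_data(), one directive per line."""
    lines = []
    for label, text in VM_STRINGS.values():
        lines.append(f"{label}:")
        lines.append(f'    .asciz "{text}"')
    lines.append("mem:")
    lines.append(f"    .space 0x{MEM_SIZE:x}")
    return "\n".join(lines)


def write_len(str_idx):
    """Byte count for write(): the NUL added by .asciz is not printed."""
    return len(VM_STRINGS[str_idx][1])


print(data_section())
print([write_len(i) for i in VM_STRINGS])
```

The write stubs could then format n from write_len(str_idx), so changing a string updates its length in one place.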
- (3) Handle print_str
![image](upload/attach/202311/921830_4U5PDW3GE26ETHF.webp)

The assembly we generated contains statements like this:

```python
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
```

All it does is print a different string depending on the value of edx.

A function call is unavoidable here, so we can borrow pwntools' shellcraft to generate it: https://docs.pwntools.com/en/stable/shellcraft/i386.html#module-pwnlib.shellcraft.i386.linux

The buffer address is loaded with mov ecx, offset right: in GNU as Intel syntax, a bare label operand (mov ecx, right) loads the dword stored at that label instead of its address.

```python
from collections import namedtuple
from dataclasses import dataclass
.....

write_func_call = namedtuple("write_func_call", ["addr", "str_idx"])

# Optimization (3): handle print_str
def handle_print_str(instructions):
    """
    _0x0000: mov edx, 0x03
    _0x0003: print_str
    _0x0012: mov edx, 0x01
    _0x0015: print_str
    _0x0105: mov edx, 0x00
    _0x0108: print_str
    _0x010e: mov edx, 0x01
    _0x0111: print_str
    """
    idx = 0
    while idx < len(instructions):
        match instructions[idx: idx + 2]:
            case [MovReg(addr1, Regs(3), imm), PrintStr(addr2)] if imm in (0x00, 0x01, 0x03, 0x04):
                instructions[idx: idx + 2] = [write_func_call(addr2, imm)]
        idx += 1

def dump(instructions):
    for ins in instructions:
        match ins:
            ......
            case write_func_call(addr, str_idx):
                if str_idx == 0:
                    print_right = f"""/* write(fd=1, buf='right', n=5) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, offset right
    push 5
    pop edx
    push SYS_write /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_right)
                elif str_idx == 1:
                    print_wrong = f"""/* write(fd=1, buf='wrong', n=5) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, offset wrong
    push 5
    pop edx
    push SYS_write /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_wrong)
                elif str_idx == 3:
                    print_plz_input = f"""/* write(fd=1, buf='plz input:', n=10) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, offset plz_input
    push 10
    pop edx
    push SYS_write /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_plz_input)
                elif str_idx == 4:
                    print_hacker = f"""/* write(fd=1, buf='hacker', n=6) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, offset hacker
    push 6
    pop edx
    push SYS_write /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_hacker)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")
......
```
- (4) Handle input_str
![image](upload/attach/202311/921830_X3P96QF84NKZ4PR.webp)

```python
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
```

![image](upload/attach/202311/921830_JAXQMQ975NMVWXV.webp)

```python
from collections import namedtuple
from dataclasses import dataclass
......

read_strlen_func_call = namedtuple("read_strlen_func_call", ["addr"])

# Optimization (4): handle input_str
def handle_input_str(instructions):
    """
    _0x0006: input_str
    """
    idx = 0
    while idx < len(instructions):
        match instructions[idx: idx + 1]:
            case [InputStr(addr)]:
                instructions[idx: idx + 1] = [read_strlen_func_call(addr)]
        idx += 1

def dump(instructions):
    for ins in instructions:
        match ins:
            ......
            case read_strlen_func_call(addr):
                print_read_strlen = f"""/* read(fd=0, buf=mem, n=0x100) */
_0x{addr:04x}: push eax
    push ebx
    push ecx
    push edx
    xor ebx, ebx
    mov ecx, offset mem
    push 0x100
    pop edx
    push SYS_read /* 3 */
    pop eax
    int 0x80
    /* strlen(mem) */
    mov edi, offset mem
    xor eax, eax
    push -1
    pop ecx
    repnz scas al, BYTE PTR [edi]
    inc ecx
    inc ecx
    neg ecx
    /* moving ecx into ecx, but this is a no-op */
    mov edi, ecx
    pop edx
    pop ecx
    pop ebx
    pop eax
    mov eax, edi
"""
                print(print_read_strlen)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")

# Optimization (2): memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n .asciz "right" """)
    print("""wrong:\n .asciz "wrong" """)
    print("""plz_input:\n .asciz "plz input:" """)
    print("""hacker:\n .asciz "hacker" """)
    print("""mem:\n .space 0x100 """)

if __name__ == '__main__':
    opcode = [.....]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    handle_print_str(instructions)
    handle_input_str(instructions)
    dump(instructions)
    dump_data()
```
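The strlen part of the stub is the repnz scasb idiom: ecx starts at -1 and is decremented once per byte scanned, including the terminating NUL, so after the scan ecx = -1 - (len + 1) = -(len + 2), and the two inc ecx followed by neg ecx turn that into len. A small Python model of that arithmetic on 32-bit registers; scasb_strlen is a hypothetical name, and the direction flag is assumed to be clear.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit register wraparound


def scasb_strlen(buf: bytes) -> int:
    """Model of: xor eax, eax / push -1 / pop ecx / repnz scasb / inc ecx / inc ecx / neg ecx."""
    al = 0                      # xor eax, eax: scan for a NUL byte
    ecx = -1 & MASK32           # push -1 / pop ecx
    edi = 0                     # offset into buf (mov edi, offset mem)
    while ecx != 0:             # repnz gives up when the counter reaches 0 ...
        zf = buf[edi] == al     # scasb compares al with the byte at [edi]
        edi += 1                # ... advances edi (direction flag assumed clear)
        ecx = (ecx - 1) & MASK32
        if zf:                  # ... and stops once a match sets ZF
            break
    ecx = (ecx + 2) & MASK32    # inc ecx / inc ecx
    return -ecx & MASK32        # neg ecx


print(scasb_strlen(b"plz input:\x00"))
print(scasb_strlen(b"abc\n\x00"))
```

The second call is worth keeping in mind: unlike the gets() named in the VM comment for case 17, read() on a terminal returns the line including its trailing newline, so this stub counts the newline as part of the input length.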
- (5) Handle exit(0)
![image](upload/attach/202311/921830_XYKFP696Y5UC9UJ.webp)

```python
case Exit(addr):
    print(f"""/* exit(status=0) */
_0x{addr:04x}: xor ebx, ebx
    push SYS_exit /* 1 */
    pop eax
    int 0x80
""")
```
- (6) Optimize mov ebx, [ebp-ecx]
This asm fails to assemble:

![image](upload/attach/202311/921830_MVYBK4BXJK84HFF.webp)

x86 memory operands can only add a scaled index register to the base, never subtract one, so [ebp-ecx] has no encoding. Replace it with the following:

![image](upload/attach/202311/921830_9RHVZ9B8HUX8NJE.webp)

![image](upload/attach/202311/921830_4BK2G6J8HPXACSM.webp)

```python
case MovRegStack(addr, dst, src):
    # print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
    print(f"_0x{addr:04x}: mov {dst}, ebp")
    print(f"    sub {dst}, {src}")
    print(f"    mov {dst}, [{dst}]")
```
- (7) Optimize _0x006c: div eax, ebx
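![image](upload/attach/202311/921830_2T8PCPWA695R3CV.webp)

After a normal div ebx, the quotient is stored in eax and the remainder in edx.
The VM's div is different: it stores them in eax and ebx.

![image](upload/attach/202311/921830_P25BCYVPP4BEK4M.webp)

So after div eax, ebx we also need to add a mov ebx, edx. Note that x86 div ebx divides the 64-bit value edx:eax by ebx, so edx must also be cleared (for example with xor edx, edx) before the division; otherwise whatever edx still holds becomes the high half of the dividend.

![image](upload/attach/202311/921830_PV5TZ5K9T466DWK.webp)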
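To see why both the cleared edx and the trailing mov ebx, edx matter, here is a small Python model of x86 div r32 and of the lowered sequence xor edx, edx / div ebx / mov ebx, edx. It assumes, as stated above, that the VM leaves the quotient in eax and the remainder in ebx; x86_div() and vm_div_lowered() are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def x86_div(edx, eax, src):
    """div r32: divide the 64-bit edx:eax by src; quotient -> eax, remainder -> edx."""
    if src == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    q, r = divmod((edx << 32) | eax, src)
    if q > MASK32:
        raise OverflowError("#DE: quotient does not fit in eax")
    return q, r


def vm_div_lowered(eax, ebx, edx):
    """Model of: xor edx, edx / div ebx / mov ebx, edx."""
    edx = 0                              # xor edx, edx
    eax, edx = x86_div(edx, eax, ebx)    # div ebx
    ebx = edx                            # mov ebx, edx
    return eax, ebx                      # VM convention: quotient, remainder


# A stale edx silently corrupts the quotient instead of faulting ...
print(hex(x86_div(3, 0x27, 0x10)[0]))
# ... while the lowered sequence behaves like the VM's div
print(vm_div_lowered(0x27, 0x10, 3))
```

If the stale edx is at least as large as the divisor, the quotient no longer fits in 32 bits and the CPU raises a divide error instead, which the OverflowError branch models.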
Ezmachine-disassembler-out.asm
4: Call pwntools make_elf
from pwn import *

code = """
push ebp
mov ebp, esp
.....
ret

right:
 .asciz "right"
wrong:
 .asciz "wrong"
plz_input:
 .asciz "plz input:"
hacker:
 .asciz "hacker"
mem:
 .space 0x100
"""
# make_elf_from_assembly() returns the path of the generated ELF file
elf = make_elf_from_assembly(code)
print(elf)
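Because dump() and dump_data() print their output, one way to build the code string without copying it by hand is to capture stdout with contextlib.redirect_stdout. A minimal sketch; capture() is a hypothetical helper, and fake_dump()/fake_dump_data() stand in for the real functions from the steps above.

```python
import io
from contextlib import redirect_stdout


def capture(*emitters):
    """Call print()-based emitters and return everything they printed as one string."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        for emit in emitters:
            emit()
    return buf.getvalue()


# Hypothetical stand-ins for dump(instructions) and dump_data()
def fake_dump():
    print("push ebp")
    print("mov ebp, esp")


def fake_dump_data():
    print("""right:\n .asciz "right" """)


code = capture(fake_dump, fake_dump_data)
print(code)
```

With the real functions this would read code = capture(lambda: dump(instructions), dump_data), and the resulting string can be passed to make_elf_from_assembly() as in the script above.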
Result