[Original] Python 3.10 new feature -> Structural Pattern Matching and its clever use in VM reverse engineering
Posted: 2023-11-2 16:24 8712


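Preface:
I first saw this style in the write-up for 2022 GoogleCTF eldar by hgarrereyn of DiceGang: https://ctf.harrisongreen.me/2022/googlectf/eldar/. It was also used there to parse a virtual machine, although in that case the machine was an ELF metadata-driven Turing weird machine.
Later, deeprev in the 2022 Qiangwang Cup (强网杯) in China reused this ELF metadata-driven Turing weird machine. I used the same style to write a parser for that relocation machine, and it worked extremely well. It is no exaggeration to say it felt like magic.
At the time I wrote in my to-do list that using the new Structural Pattern Matching feature to parse an ordinary virtual machine would surely be easy. Then work took over and I never finished it, so the item gathered dust in my to-do list for nearly a year. All year I was pushed along by work, executing each day the instructions I had written for myself the day before, like a robot. My memory seems to have gotten worse too, and I often forget things. Only at the end of the year, once some projects had been delivered, did I find time for my own things. Starting a business really is hard.
Back to the topic. Later, in the write-up for dicectf-2022-breach: https://github.com/reductor/dice-ctf-2022-breach-writeup, the technique was properly used to parse a conventional virtual machine.
Only today have I come back to write this up.
I have actually posted quite a few VM analyses before:
[Original] An analysis of VM reverse engineering (CTF) (a fairly classic VM reversing challenge)
[Original] A special method and approach for handling VMs
[Original] Automated VM analysis in CTF
In summary, this method is an upgraded disassembler and is far better than the disassemblers I posted before. Is it better than a decompiler? I cannot give a definite answer, since a decompiler takes the different approach of abstracting the code into a high-level language.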

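PEP 634 – Structural Pattern Matching: Specification: introduces the match syntax and the supported patterns

PEP 635 – Structural Pattern Matching: Motivation and Rationale: explains why the syntax is designed this way

PEP 636 – Structural Pattern Matching: Tutorial: a tutorial that introduces the concepts, syntax and semantics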

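match patterns:

Capture pattern: matches a pattern and binds the matched value to a name.

Guard: an if condition after the pattern that further restricts when the case matches, as follows.

AS pattern: gives the restricting subpattern an alias, so that it can work together with a bound name.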

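Subpatterns can be combined flexibly in match syntax. An OR pattern matches when any one of its alternatives matches. Of the five spellings listed here, only the forms that use | are OR patterns in Python 3.10.

First form, separated by commas (this parses, but as a sequence pattern that matches a three-element sequence, not as an OR pattern):

Second form, similar to C (a syntax error in Python):

Third form (also a syntax error in Python):

Fourth form:

Fifth form: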

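Literal pattern: uses Python's built-in basic values, such as strings, numbers, booleans and None.

Wildcard pattern: a special kind of capture pattern that accepts any value but does not bind it to any name (in other words, it ignores the positions you do not care about).

Value pattern: mainly matches constants or enumeration values from the enum module:

That is, the pattern uses the value of another variable. How is such a variable told apart from the binding name of a capture pattern? By the ".": a dotted name is looked up and compared by equality, while a bare name is a capture.

Currently only constants written with a '.' can be used.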

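Sequence pattern: a case can match list-shaped or tuple-shaped subjects.

[a, b, c], (a, b, c) and a, b, c are not distinguished; they are equivalent. To check the type explicitly, write list([a, b, c]).
A starred subpattern matches any number of elements, including zero, for example [first, *rest]. A sequence pattern may contain at most one starred subpattern, so a pattern such as (*_, 3, *_) meaning "any sequence containing 3" is not allowed.
Iterators are not consumed: a sequence pattern only matches real sequences, and their elements are accessed by index and slice.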

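Mapping pattern: for efficiency, keys must be constants (literals or value patterns).

In short, a case can use a dictionary for matching.

Class patterns serve two main purposes: checking that an object is an instance of a class, and extracting data from specific attributes of the object.

Another example:

A namedtuple example, which is also a class pattern: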

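numbers is annotated as a list whose elements may be float or int.

Type aliases can be defined; both type checkers and programmers can recognize this pattern:

Type guards are used to narrow the range of a type union.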

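This kind of disassembler is usually refined step by step. In the end it can be refined far enough to use https://docs.pwntools.com/en/stable/asm.html#pwnlib.asm.make_elf_from_assembly
and assemble the output directly into an ELF.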

Ezmachine-disassembler-parsefunc.py

Ezmachine-disassembler-parsefunc.out

The purpose of parsefunc.out is to check that parse and the declared types are reasonable.

Ezmachine-disassembler-version0.py

Ezmachine-disassembler-dumpfunc-version0.out

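The Ezmachine-disassembler-dumpfunc-version0.out obtained here is in fact about the same as what our earlier disassemblers produced.

The purpose of this dumpfunc-version0.out is to serve as a reference for further optimization.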

Ezmachine-disassembler.py

Ezmachine-disassembler-out.asm

Ezmachine-asm_compile.py


Mapping patterns: match mapping structures like dictionaries.
Sequence patterns: match sequence structures like tuples and lists.
Capture patterns: bind values to names.
AS patterns: bind the value of subpatterns to names.
OR patterns: match one of several different subpatterns.
Wildcard patterns: match anything.
Class patterns: match class structures.
Value patterns: match values stored in attributes.
Literal patterns: match literal values.
def sum_list(numbers):
    match numbers:
        case []:  # matches an empty list
            return 0
        case [first, *rest]:  # a sequence pattern made of two capture patterns; *rest matches all remaining elements
            return first + sum_list(rest)
def average(*args):
    match args:
        case [x, y]:           # captures the two elements of a sequence
            return (x + y) / 2
        case [x]:              # captures the only element of a sequence
            return x
        case []:
            return 0
        case a:                # captures the entire sequence
            return sum(a) / len(a)
# Sort in ascending order
def sort(seq):
    match seq:
        case [] | [_]:   # matches the empty sequence [] or any single-element sequence [_]
            return seq
        case [x, y] if x <= y:
            return seq
        case [x, y]:
            return [y, x]
        case [x, y, z] if x <= y <= z:
            return seq
        case [x, y, z] if x >= y >= z:
            return [z, y, x]
        case [p, *rest]:
            a = sort([x for x in rest if x <= p])     # sort the elements not greater than p
            b = sort([x for x in rest if p < x])      # sort the elements greater than p
            return a + [p] + b
In : def as_pattern(obj):
...:     match obj:
...:         case str() as s:
...:             print(f'Got str: {s=}')
...:         case [0, int() as i]:
...:             print(f'Got int: {i=}')
...:         case [tuple() as tu]:
...:             print(f'Got tuple: {tu=}')
...:         case list() | set() | dict() as iterable:
...:             print(f'Got iterable: {iterable=}')
...:
...:
 
In : as_pattern('sss')
Got str: s='sss'
 
In : as_pattern([0, 1])
Got int: i=1
 
In : as_pattern([(1,)])
Got tuple: tu=(1,)
 
In : as_pattern([1, 2, 3])
Got iterable: iterable=[1, 2, 3]
 
In : as_pattern({'a': 1})
Got iterable: iterable={'a': 1}
def simplify_expr(tokens):
    match tokens:
        case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
            return simplify_expr(expr)
        case [0, ('+'|'-') as op, right]:
            return UnaryOp(op, right)
        case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
            return Num(left + right)
        case [(int() | float()) as value]:
            return Num(value)
case 401, 403, 404:  # valid syntax, but a sequence pattern matching a 3-element sequence, not an OR pattern
    print("Some HTTP error")
case 401:  # not valid Python: every case needs its own indented block
case 403:
case 404:
    print("Some HTTP error")
case in 401, 403, 404:  # not valid Python
    print("Some HTTP error")
case ("a"|"b"|"c"):
case ("a"|"b"|"c") as letter:
match number:
    case 0:
        print('zero')
    case 1:
        print('one')
    case 2:
        print('two')
def simplify(expr):
    match expr:
        case ('+', 0, x):   # 0 + x
            return x
        case ('+' | '-', x, 0):  # x + 0 or x - 0
            return x
        case ('and', True, x):   # True and x
            return x
        case ('and', False, x):
            return False
        case ('or', False, x):
            return x
        case ('or', True, x):
            return True
        case ('not', ('not', x)):
            return x
    return expr
def is_closed(sequence):
    match sequence:
        case [_]:               # any sequence with a single element
            return True
        case [start, *_, end]:  # a sequence with at least two elements
            return start == end
        case _:                 # anything
            return False
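In : from enum import Enum
 
In : class Color(Enum):
...:     RED = 1
...:     GREEN = 2
...:     BLUE = 3
...:
 
In : class NewColor:
...:     YELLOW = 4
...:
 
In : def constant_value(color):
...:     match color:
...:         case Color.RED:
...:             print('Red')
...:         case NewColor.YELLOW:
...:             print('Yellow')
...:         case new_color:
...:             print(new_color)
...:
 
In : constant_value(Color.RED)  # matches the first case
Red
 
In : constant_value(NewColor.YELLOW)  # matches the second case
Yellow
 
In : constant_value(Color.GREEN)  # matches the third case
Color.GREEN
 
In : constant_value(4)  # an equal constant value also matches the second case
Yellow
 
In : constant_value(10)  # any other constant
10
 
Note that because a bare name in a case is a binding, a constant such as YELLOW cannot be used directly, for example:
YELLOW = 4
 
def constant_value(color):
    match color:
        case YELLOW:
            print('Yellow')
# This does not compare against the constant: YELLOW is treated as a capture pattern that matches any value
# (and it is a SyntaxError if another case follows it, because that case would be unreachable).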
class Codes:
    SUCCESS = 200
    NOT_FOUND = 404
 
def handle(retcode):
    match retcode:
        case Codes.SUCCESS:
            print('success')
        case Codes.NOT_FOUND:
            print('not found')
        case _:
            print('unknown')
In : def sequence(collection):
...:     match collection:
...:         case 1, [x, *others]:
...:             print(f"Got 1 and a nested sequence: {x=}, {others=}")
...:         case (1, x):
...:             print(f"Got 1 and {x}")
...:         case [x, y, z]:
...:             print(f"{x=}, {y=}, {z=}")
...:
 
In : sequence([1])
 
In : sequence([1, 2])
Got 1 and 2
 
In : sequence([1, 2, 3])
x=1, y=2, z=3
 
In : sequence([1, [2, 3]])
Got 1 and a nested sequence: x=2, others=[3]
 
In : sequence([1, [2, 3, 4]])
Got 1 and a nested sequence: x=2, others=[3, 4]
 
In : sequence([2, 3])
 
In : sequence((1, 2))
Got 1 and 2
In : def mapping(config):
...:     match config:
...:         case {'sub': sub_config, **rest}:
...:             print(f'Sub: {sub_config}')
...:             print(f'OTHERS: {rest}')
...:         case {'route': route}:
...:             print(f'ROUTE: {route}')
...:
 
In : mapping({})
 
In : mapping({'route': '/auth/login'})
ROUTE: /auth/login
 
# Matches a dict that has a 'sub' key: its value is bound to sub_config and the rest of the dict to rest
In : mapping({'route': '/auth/login', 'sub': {'a': 1}})
Sub: {'a': 1}
OTHERS: {'route': '/auth/login'}
def change_red_to_blue(json_obj):
    match json_obj:
        case { 'color': ('red' | '#FF0000') }:
            json_obj['color'] = 'blue'
        case { 'children': children }:
            for child in children:
                change_red_to_blue(child)
# A case can match against any object. Let us start with an example that fails:
 
In : class Point:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:
 
In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:
 
In : class_pattern(Point(1, 2))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [], in <cell line: 1>()
----> 1 class_pattern(Point(1, 2))
 
Input In [], in class_pattern(obj)
      1 def class_pattern(obj):
      2     match obj:
----> 3         case Point(x, y):
      4             print(f'Point({x=},{y=})')
 
TypeError: Point() accepts 0 positional sub-patterns (2 given)
 
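# This is because Point does not say which attributes the positional subpatterns refer to, so use keyword patterns to name the attributes: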
 
In : def class_pattern(obj):
...:     match obj:
...:         case Point(x=1, y=2):
...:             print(f'match')
...:
 
In : class_pattern(Point(1, 2))
match
 
# Another solution, which lets a custom class be matched positionally, is to define __match_args__ as a tuple of attribute names in positional order,
# like this:
In : class Point:
...:     __match_args__ = ('x', 'y')
...:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:
 
# Alternatively, use a dataclass. Here Point2 uses the standard library's dataclasses.dataclass decorator,
# which provides the __match_args__ attribute, so it can be used directly
In : from dataclasses import dataclass
 
In : @dataclass
...: class Point2:
...:     x: int
...:     y: int
...:
 
In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:         case Point2(x, y):
...:             print(f'Point2({x=},{y=})')
...:
 
In : class_pattern(Point(1, 2))
Point(x=1,y=2)
 
In : class_pattern(Point2(1, 2))
Point2(x=1,y=2)
def eval_expr(expr):
    """Evaluate an expression and return the result."""
    match expr:
        case BinaryOp('+', left, right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp('-', left, right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp('*', left, right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp('/', left, right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp('+', arg):
            return eval_expr(arg)
        case UnaryOp('-', arg):
            return -eval_expr(arg)
        case VarExpr(name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case _:
            raise ValueError(f"Invalid expression value: {repr(expr)}")
match media_object:
    case Image(type="jpg"):
        return media_object
    case Image(type="png") | Image(type="gif"):
        return render_as(media_object, "jpg")
    case Video():
        raise ValueError("Can't extract frames from video yet")
    case other_type:
        raise Exception(f"Media type {media_object} can't be handled yet")
from collections import namedtuple
Mov = namedtuple('mov', ['dst', 'src', 'sz', 'ridx'])
match op:
    case Mov(dst, src, 8, ridx):
        pass
def mean(numbers: list[float | int]) -> float:
    return sum(numbers) / len(numbers)
from typing import TypeAlias
 
Card: TypeAlias = tuple[str, str]          # ('', '')
Deck: TypeAlias = list[Card]               # [('', '')]
Opcode  Instruction       Notes
0       nop               pc+=1
1       mov dst, imm      imm is a 1-byte immediate value
2       push imm          imm is a 1-byte immediate value
3       push reg          push (1-byte) reg onto the stack
4       pop reg           pop (1-byte) from the stack into reg
5       PrintStr          print str selected by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
6       add reg1, reg2    reg1 += reg2
7       sub reg1, reg2    reg1 -= reg2
8       mul reg1, reg2    reg1 *= reg2
9       div reg1, reg2    reg1 /= reg2 (put the quotient into eax and the remainder into ebx)
10      xor reg1, reg2    reg1 ^= reg2
11      jmp addr          jump directly to address
12      cmp reg1, reg2    edx=reg1-reg2
13      jz addr           jump if edx==0 (reg1==reg2)
14      jnz addr          jump if edx!=0 (reg1≠reg2)
15      jg addr           jump if edx>0 (reg1>reg2)
16      jl addr           jump if edx<0 (reg1<reg2)
17      InputStr          gets(mem); eax=strlen(mem);
18      InitMem           memset(mem_addr, 0, sz)
19      MovRegStack       mov reg, [ebp-src]
20      MovRegMem         mov reg, mem[src]
0xff    Exit              exit(0)
from collections import namedtuple
from dataclasses import dataclass
 
@dataclass
class Regs(object):
    idx: int
 
    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)
 
Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
 
AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
 
Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
 
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
 
InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"]
)  # case 18: memset(mem_addr, 0, sz)
 
MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"]
)  # case 19: mov reg, [ebp-src]
 
MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"]
)  # case 20: mov reg, mem[src]
 
Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)
 
def parse(buffer):
    instructions = []
 
    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]
 
        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), Regs(src)))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), Regs(src)))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
         
    return instructions
 
if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    for ins in instructions:
        print(ins)
from collections import namedtuple
from dataclasses import dataclass
 
@dataclass
class Regs(object):
    idx: int
 
    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)
 
Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
 
AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
 
Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
 
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
 
InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"]
# case 18: memset(mem_addr, 0, sz)
 
MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"]
# case 19: mov reg, [ebp-src]
 
MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"]
# case 20: mov reg, mem[src]
 
Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)
 
def parse(buffer):
    """Decode the VM bytecode in `buffer` into a list of instruction tuples."""
    instructions = []
 
    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]
 
        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                # the jump operand counts 3-byte slots starting from 1,
                # so the byte offset is 3 * operand - 3
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), Regs(src)))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), Regs(src)))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
         
    return instructions
 
if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    for ins in instructions:
        print(ins)
MovReg(addr=0, dst=edx, imm=3)
PrintStr(addr=3)
InputStr(addr=6)
MovReg(addr=9, dst=ebx, imm=17)
Cmp(addr=12, dst=eax, src=ebx)
Jz(addr=15, target=27)
MovReg(addr=18, dst=edx, imm=1)
PrintStr(addr=21)
Exit(addr=24)
MovReg(addr=27, dst=ecx, imm=0)
MovReg(addr=30, dst=eax, imm=17)
Cmp(addr=33, dst=eax, src=ecx)
Jz(addr=36, target=126)
MovRegMem(addr=39, dst=eax, src=ecx)
MovReg(addr=42, dst=ebx, imm=97)
Cmp(addr=45, dst=eax, src=ebx)
Jl(addr=48, target=75)
MovReg(addr=51, dst=ebx, imm=122)
Cmp(addr=54, dst=eax, src=ebx)
Jg(addr=57, target=75)
MovReg(addr=60, dst=ebx, imm=71)
XorReg(addr=63, dst=eax, src=ebx)
MovReg(addr=66, dst=ebx, imm=1)
AddReg(addr=69, dst=eax, src=ebx)
Jmp(addr=72, target=105)
MovReg(addr=75, dst=ebx, imm=65)
Cmp(addr=78, dst=eax, src=ebx)
Jl(addr=81, target=105)
MovReg(addr=84, dst=ebx, imm=90)
Cmp(addr=87, dst=eax, src=ebx)
Jg(addr=90, target=105)
MovReg(addr=93, dst=ebx, imm=75)
XorReg(addr=96, dst=eax, src=ebx)
MovReg(addr=99, dst=ebx, imm=1)
SubReg(addr=102, dst=eax, src=ebx)
MovReg(addr=105, dst=ebx, imm=16)
DivReg(addr=108, dst=eax, src=ebx)
PushReg(addr=111, reg=ebx)
PushReg(addr=114, reg=eax)
MovReg(addr=117, dst=ebx, imm=1)
AddReg(addr=120, dst=ecx, src=ebx)
Jmp(addr=123, target=30)
PushImm(addr=126, imm=7)
PushImm(addr=129, imm=13)
PushImm(addr=132, imm=0)
PushImm(addr=135, imm=5)
PushImm(addr=138, imm=1)
PushImm(addr=141, imm=12)
PushImm(addr=144, imm=1)
PushImm(addr=147, imm=0)
PushImm(addr=150, imm=0)
PushImm(addr=153, imm=13)
PushImm(addr=156, imm=5)
PushImm(addr=159, imm=15)
PushImm(addr=162, imm=0)
PushImm(addr=165, imm=9)
PushImm(addr=168, imm=5)
PushImm(addr=171, imm=15)
PushImm(addr=174, imm=3)
PushImm(addr=177, imm=0)
PushImm(addr=180, imm=2)
PushImm(addr=183, imm=5)
PushImm(addr=186, imm=3)
PushImm(addr=189, imm=3)
PushImm(addr=192, imm=1)
PushImm(addr=195, imm=7)
PushImm(addr=198, imm=7)
PushImm(addr=201, imm=11)
PushImm(addr=204, imm=2)
PushImm(addr=207, imm=1)
PushImm(addr=210, imm=2)
PushImm(addr=213, imm=7)
PushImm(addr=216, imm=2)
PushImm(addr=219, imm=12)
PushImm(addr=222, imm=2)
PushImm(addr=225, imm=2)
MovReg(addr=228, dst=ecx, imm=1)
MovRegStack(addr=231, dst=ebx, src=ecx)
PopReg(addr=234, reg=eax)
Cmp(addr=237, dst=eax, src=ebx)
Jnz(addr=240, target=270)
MovReg(addr=243, dst=ebx, imm=34)
Cmp(addr=246, dst=ecx, src=ebx)
Jz(addr=249, target=264)
MovReg(addr=252, dst=ebx, imm=1)
AddReg(addr=255, dst=ecx, src=ebx)
Jmp(addr=258, target=231)
MovReg(addr=261, dst=edx, imm=0)
PrintStr(addr=264)
Exit(addr=267)
MovReg(addr=270, dst=edx, imm=1)
PrintStr(addr=273)
Exit(addr=276)
Nop(addr=279)
from collections import namedtuple
from dataclasses import dataclass
 
@dataclass
class Regs(object):
    idx: int
 
    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)
 
Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
 
AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
 
Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
 
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
 
InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"]
)  # case 18: memset(mem_addr, 0, sz)
 
MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"]
)  # case 19: mov reg, [ebp-src]
 
MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"]
)  # case 20: mov reg, mem[src]
 
Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)
 
def parse(buffer):
    """Decode the VM bytecode in `buffer` into a list of instruction tuples."""
    instructions = []
 
    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]
 
        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                # the jump operand counts 3-byte slots starting from 1,
                # so the byte offset is 3 * operand - 3
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), src))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), src))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
         
    return instructions
 
def dump(instructions):
    """Print each parsed instruction as one line of assembly-like text."""
    for ins in instructions:
        match ins:
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")
 
if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    dump(instructions)
from collections import namedtuple
from dataclasses import dataclass
 
@dataclass
class Regs(object):
    idx: int
 
    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)
 
Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
 
AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
 
Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
 
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
 
InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"]
)  # case 18: memset(mem_addr, 0, sz)
 
MovRegStack = namedtuple(

Last edited by SYJ-Re on 2023-11-2 16:31.
Latest replies (2)
Thanks for sharing.
2023-11-3 09:29
Do these two packages ship with Python 3.10 itself?
2024-4-3 12:20