[Translation] Fuzzing Like a Caveman - Part 2: Improving Performance
Posted: 2024-8-29 09:07


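Introduction

In this installment of "Fuzzing Like a Caveman", we focus on improving the performance of our previous fuzzer. That means no sweeping changes; we simply want to improve on what we already have. By the end of this post we will still have a very basic mutation fuzzer (hopefully a faster one!) that can hopefully find more bugs across different targets. We won't touch multi-threading or multi-processing here; those are left for later fuzzing posts.

I need to add a disclaimer here: I am not a professional developer, far from it. I don't yet have the programming experience to spot opportunities for performance improvements the way a more seasoned programmer would. I'll be using my crude skills and limited programming knowledge to improve our previous fuzzer, and that's it. The resulting code won't be pretty and it won't be perfect, but it will be better than what we had in the last post. It should also be mentioned that all testing was done on an x86 Kali VM with 1 CPU and 1 core, running under VMWare Workstation.

We also need to define what "better" means in the context of this post. By "better" I mean that we can get through n fuzzing iterations faster, and nothing more. Some other time we'll rewrite the fuzzer from scratch in a cool language, pick a hardened target, and use more advanced fuzzing techniques. :)

Obviously, if you haven't read the previous post, you'll be lost!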

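Analyzing Our Fuzzer

Our last fuzzer was quite simple, but it worked! We found some bugs in our target. We also knew we had left some optimizations on the table when we handed in our homework. Let's look at the fuzzer from the previous post again (with a few small changes for testing purposes):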

#!/usr/bin/env python3
import sys
import random
from pexpect import run
from shlex import quote  # same function as pipes.quote; the pipes module was removed in Python 3.13
 
# read bytes from our valid JPEG and return them in a mutable bytearray
def get_bytes(filename):
 
    f = open(filename, "rb").read()
 
    return bytearray(f)
 
def bit_flip(data):
 
    num_of_flips = int((len(data) - 4) * .01)
 
    indexes = range(4, (len(data) - 4))
 
    chosen_indexes = []
 
    # iterate selecting indexes until we've hit our num_of_flips number
    counter = 0
    while counter < num_of_flips:
        chosen_indexes.append(random.choice(indexes))
        counter += 1
 
    for x in chosen_indexes:
        current = data[x]
        current = (bin(current).replace("0b",""))
        current = "0" * (8 - len(current)) + current
         
        indexes = range(0,8)
 
        picked_index = random.choice(indexes)
 
        new_number = []
 
        # our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
        for i in current:
            new_number.append(i)
 
        # if the number at our randomly selected index is a 1, make it a 0, and vice versa
        if new_number[picked_index] == "1":
            new_number[picked_index] = "0"
        else:
            new_number[picked_index] = "1"
 
        # create our new binary string of our bit-flipped number
        current = ''
        for i in new_number:
            current += i
 
        # convert that string to an integer
        current = int(current,2)
 
        # change the number in our byte array to our new number we just constructed
        data[x] = current
 
    return data
 
def magic(data):
 
    magic_vals = [
    (1, 255),
    (1, 255),
    (1, 127),
    (1, 0),
    (2, 255),
    (2, 0),
    (4, 255),
    (4, 0),
    (4, 128),
    (4, 64),
    (4, 127)
    ]
 
    picked_magic = random.choice(magic_vals)
 
    length = len(data) - 8
    index = range(0, length)
    picked_index = random.choice(index)
 
    # here we are hardcoding all the byte overwrites for all of the tuples that begin (1, )
    if picked_magic[0] == 1:
        if picked_magic[1] == 255:          # 0xFF
            data[picked_index] = 255
        elif picked_magic[1] == 127:        # 0x7F
            data[picked_index] = 127
        elif picked_magic[1] == 0:          # 0x00
            data[picked_index] = 0
 
    # here we are hardcoding all the byte overwrites for all of the tuples that begin (2, )
    elif picked_magic[0] == 2:
        if picked_magic[1] == 255:          # 0xFFFF
            data[picked_index] = 255
            data[picked_index + 1] = 255
        elif picked_magic[1] == 0:          # 0x0000
            data[picked_index] = 0
            data[picked_index + 1] = 0
 
    # here we are hardcoding all of the byte overwrites for all of the tuples that begin (4, )
    elif picked_magic[0] == 4:
        if picked_magic[1] == 255:          # 0xFFFFFFFF
            data[picked_index] = 255
            data[picked_index + 1] = 255
            data[picked_index + 2] = 255
            data[picked_index + 3] = 255
        elif picked_magic[1] == 0:          # 0x00000000
            data[picked_index] = 0
            data[picked_index + 1] = 0
            data[picked_index + 2] = 0
            data[picked_index + 3] = 0
        elif picked_magic[1] == 128:        # 0x80000000
            data[picked_index] = 128
            data[picked_index + 1] = 0
            data[picked_index + 2] = 0
            data[picked_index + 3] = 0
        elif picked_magic[1] == 64:         # 0x40000000
            data[picked_index] = 64
            data[picked_index + 1] = 0
            data[picked_index + 2] = 0
            data[picked_index + 3] = 0
        elif picked_magic[1] == 127:        # 0x7FFFFFFF
            data[picked_index] = 127
            data[picked_index + 1] = 255
            data[picked_index + 2] = 255
            data[picked_index + 3] = 255
         
    return data
 
# create new jpg with mutated data
def create_new(data):
 
    f = open("mutated.jpg", "wb+")
    f.write(data)
    f.close()
 
def exif(counter,data):
 
    command = "exif mutated.jpg -verbose"
 
    out, returncode = run("sh -c " + quote(command), withexitstatus=1)
 
    if b"Segmentation" in out:
        f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
        f.write(data)
        print("Segfault!")
 
    #if counter % 100 == 0:
    #   print(counter, end="\r")
 
if len(sys.argv) < 2:
    print("Usage: JPEGfuzz.py <valid_jpg>")
 
else:
    filename = sys.argv[1]
    counter = 0
    while counter < 1000:
        data = get_bytes(filename)
        functions = [0, 1]
        picked_function = random.choice(functions)
        picked_function = 1
        if picked_function == 0:
            mutated = magic(data)
            create_new(mutated)
            exif(counter,mutated)
        else:
            mutated = bit_flip(data)
            create_new(mutated)
            exif(counter,mutated)
 
        counter += 1

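You may have noticed a few changes. We have:

  • commented out the statement that prints the counter every 100 iterations,
  • added a print statement to notify us when a segfault occurs,
  • hardcoded 1,000 iterations,
  • temporarily added the line picked_function = 1 so that our testing has no randomness in the choice of mutation method and we only ever use bit_flip().

Let's run this version of the fuzzer under some profiling tooling so we can really analyze where the program spends most of its time.

We can use Python's cProfile module to see where we spend our time during 1,000 fuzzing iterations. If you recall, the program takes a path to a valid JPEG file as an argument, so the full command line is: python3 -m cProfile -s cumtime JPEGfuzzer.py ~/jpegs/Canon_40D.jpg

Note that adding cProfile instrumentation can itself slow things down. I tested without it, and for the iteration counts we use in this post it did not seem to make a significant difference.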
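If you would rather profile a single function than the whole script, the same report can be produced from inside Python. The following is a minimal sketch, not part of the original fuzzer: busy_work() and profile_call() are hypothetical names, and busy_work() is only a stand-in for the fuzzing loop.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Stand-in workload (hypothetical); replace with the fuzzing loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile and return the report text,
    sorted by cumulative time like `-s cumtime` on the command line."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_call(busy_work, 100_000))
```

The top argument limits the report to the first few rows, which is roughly what trimming the output by hand achieves.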

After running it, we can look at the output and see where execution spent the most time.

2476093 function calls (2474812 primitive calls) in 122.084 seconds
 
   Ordered by: cumulative time
 
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     33/1    0.000    0.000  122.084  122.084 {built-in method builtins.exec}
        1    0.108    0.108  122.084  122.084 blog.py:3(<module>)
     1000    0.090    0.000  118.622    0.119 blog.py:140(exif)
     1000    0.080    0.000  118.452    0.118 run.py:7(run)
     5432  103.761    0.019  103.761    0.019 {built-in method time.sleep}
     1000    0.028    0.000  100.923    0.101 pty_spawn.py:316(close)
     1000    0.025    0.000  100.816    0.101 ptyprocess.py:387(close)
     1000    0.061    0.000    9.949    0.010 pty_spawn.py:36(__init__)
     1000    0.074    0.000    9.764    0.010 pty_spawn.py:239(_spawn)
     1000    0.041    0.000    8.682    0.009 pty_spawn.py:312(_spawnpty)
     1000    0.266    0.000    8.641    0.009 ptyprocess.py:178(spawn)
     1000    0.011    0.000    7.491    0.007 spawnbase.py:240(expect)
     1000    0.036    0.000    7.479    0.007 spawnbase.py:343(expect_list)
     1000    0.128    0.000    7.409    0.007 expect.py:91(expect_loop)
     6432    6.473    0.001    6.473    0.001 {built-in method posix.read}
     5432    0.089    0.000    3.818    0.001 pty_spawn.py:415(read_nonblocking)
     7348    0.029    0.000    3.162    0.000 utils.py:130(select_ignore_interrupts)
     7348    3.127    0.000    3.127    0.000 {built-in method select.select}
     1000    0.790    0.001    1.777    0.002 blog.py:15(bit_flip)
     1000    0.015    0.000    1.311    0.001 blog.py:134(create_new)
     1000    0.100    0.000    1.101    0.001 pty.py:79(fork)
     1000    1.000    0.001    1.000    0.001 {built-in method posix.forkpty}
-----SNIP-----

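For this kind of analysis we don't much care how many segfaults occurred, since we're not really tweaking the mutation methods or comparing different ones. There will of course be some randomness, because a crash causes extra processing, but this is good enough for now.

I've only included the entries where cumulative time exceeded 1.0 seconds. You can see we spent by far the most time in blog.py:140(exif): 118 of the 122 total seconds went to that function. Clearly exif() is our main performance problem.

Most of the time spent under that function traces back to our use of pexpect. The profile shows a long chain of calls into its pty machinery, and roughly 104 seconds went to time.sleep alone, apparently almost all of it reached through pexpect's close() path (about 101 seconds cumulative). Let's rewrite the function using Popen from the subprocess module and see whether that improves performance!

Here is the redefined exif() function: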

from subprocess import Popen, PIPE

def exif(counter,data):
 
    p = Popen(["exif", "mutated.jpg", "-verbose"], stdout=PIPE, stderr=PIPE)
    (out,err) = p.communicate()
 
    # a negative returncode means the child was killed by that signal; -11 is SIGSEGV
    if p.returncode == -11:
        with open("crashes2/crash.{}.jpg".format(str(counter)), "ab+") as f:
            f.write(data)
        print("Segfault!")
 
    #if counter % 100 == 0:
    #   print(counter, end="\r")

Here is our performance report:

2065580 function calls (2065443 primitive calls) in 2.756 seconds
 
   Ordered by: cumulative time
 
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000    2.756    2.756 {built-in method builtins.exec}
        1    0.038    0.038    2.756    2.756 subpro.py:3(<module>)
     1000    0.020    0.000    1.917    0.002 subpro.py:139(exif)
     1000    0.026    0.000    1.121    0.001 subprocess.py:681(__init__)
     1000    0.099    0.000    1.045    0.001 subprocess.py:1412(_execute_child)
 -----SNIP-----

What a difference. With exif() redefined, this fuzzer did the same amount of work in under 3 seconds! Incredible! The old fuzzer needed 122 seconds; the new one needs only 2.7.
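The Popen-based exif() detects a crash by comparing the return code with -11 instead of searching the output for "Segmentation". The sketch below shows where that number comes from. It is an illustration under assumptions, not the fuzzer's code: run_and_classify() is a hypothetical helper, and the "target" is just a child Python process that sends SIGSEGV to itself. It relies on POSIX signal semantics.

```python
import signal
import subprocess
import sys


def run_and_classify(argv):
    """Run argv without a shell and report which signal, if any, killed it.

    On POSIX, subprocess sets returncode to -N when the child died from
    signal N. Returns the signal name (for example "SIGSEGV") or None for
    a normal exit.
    """
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name
    return None


if __name__ == "__main__":
    # Hypothetical stand-in for a crashing target: a child Python process
    # that sends SIGSEGV to itself.
    crasher = [sys.executable, "-c",
               "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_and_classify(crasher))
    print(run_and_classify([sys.executable, "-c", "pass"]))
```

Checking the sign of the return code this way would also catch crashes caused by other signals, which a string match on "Segmentation" would miss.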

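Further Optimizing the Python Code

Let's keep trying to optimize our fuzzer in Python. First, let's get a good benchmark to compare against. We'll have the optimized Python fuzzer run 50,000 iterations and again use the cProfile module to get fine-grained statistics on where we spend our time.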

102981395 function calls (102981258 primitive calls) in 141.488 seconds
 
   Ordered by: cumulative time
 
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  141.488  141.488 {built-in method builtins.exec}
        1    1.724    1.724  141.488  141.488 subpro.py:3(<module>)
    50000    0.992    0.000  102.588    0.002 subpro.py:139(exif)
    50000    1.248    0.000   61.562    0.001 subprocess.py:681(__init__)
    50000    5.034    0.000   57.826    0.001 subprocess.py:1412(_execute_child)
    50000    0.437    0.000   39.586    0.001 subprocess.py:920(communicate)
    50000    2.527    0.000   39.064    0.001 subprocess.py:1662(_communicate)
   208254   37.508    0.000   37.508    0.000 {built-in method posix.read}
   158238    0.577    0.000   28.809    0.000 selectors.py:402(select)
   158238   28.131    0.000   28.131    0.000 {method 'poll' of 'select.poll' objects}
    50000   11.784    0.000   25.819    0.001 subpro.py:14(bit_flip)
  7950000    3.666    0.000   10.431    0.000 random.py:256(choice)
    50000    8.421    0.000    8.421    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.162    0.000    7.358    0.000 subpro.py:133(create_new)
  7950000    4.096    0.000    6.130    0.000 random.py:224(_randbelow)
   203090    5.016    0.000    5.016    0.000 {built-in method io.open}
    50000    4.211    0.000    4.211    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.643    0.000    4.194    0.000 os.py:617(get_exec_path)
    50000    1.733    0.000    3.356    0.000 subpro.py:8(get_bytes)
 35866791    2.635    0.000    2.635    0.000 {method 'append' of 'list' objects}
   100000    0.070    0.000    1.960    0.000 subprocess.py:1014(wait)
   100000    0.252    0.000    1.902    0.000 selectors.py:351(register)
   100000    0.444    0.000    1.890    0.000 subprocess.py:1621(_wait)
   100000    0.675    0.000    1.583    0.000 selectors.py:234(register)
   350000    0.432    0.000    1.501    0.000 subprocess.py:1471()
 12074141    1.434    0.000    1.434    0.000 {method 'getrandbits' of '_random.Random' objects}
    50000    0.059    0.000    1.358    0.000 subprocess.py:1608(_try_wait)
    50000    1.299    0.000    1.299    0.000 {built-in method posix.waitpid}
   100000    0.488    0.000    1.058    0.000 os.py:674(__getitem__)
   100000    1.017    0.000    1.017    0.000 {method 'close' of '_io.BufferedReader' objects}
-----SNIP-----

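50,000 iterations took 141 seconds in total, which is a great result compared with before: we previously needed 122 seconds for just 1,000 iterations! Again showing only the entries that took more than 1.0 seconds, we see that most of the time is still spent in exif(), but bit_flip() also has some performance problems, since we spent 25 cumulative seconds there. Let's try to optimize that function.

As a refresher, here is what the old bit_flip() function looked like: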

def bit_flip(data):
 
    num_of_flips = int((len(data) - 4) * .01)
 
    indexes = range(4, (len(data) - 4))
 
    chosen_indexes = []
 
    # iterate selecting indexes until we've hit our num_of_flips number
    counter = 0
    while counter < num_of_flips:
        chosen_indexes.append(random.choice(indexes))
        counter += 1
 
    for x in chosen_indexes:
        current = data[x]
        current = (bin(current).replace("0b",""))
        current = "0" * (8 - len(current)) + current
         
        indexes = range(0,8)
 
        picked_index = random.choice(indexes)
 
        new_number = []
 
        # our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
        for i in current:
            new_number.append(i)
 
        # if the number at our randomly selected index is a 1, make it a 0, and vice versa
        if new_number[picked_index] == "1":
            new_number[picked_index] = "0"
        else:
            new_number[picked_index] = "1"
 
        # create our new binary string of our bit-flipped number
        current = ''
        for i in new_number:
            current += i
 
        # convert that string to an integer
        current = int(current,2)
 
        # change the number in our byte array to our new number we just constructed
        data[x] = current
 
    return data

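This function is admittedly clunky. We can simplify it a lot with better logic. In my limited programming experience I've found this to be common: you can have all the esoteric programming knowledge in the world, but if the logic behind the program is unsound, the program's performance will suffer.

Let's reduce the number of type conversions (for example, from integers to strings and back) and write less code. We can do that by redefining bit_flip() as follows: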

def bit_flip(data):
 
    length = len(data) - 4
 
    num_of_flips = int(length * .01)
 
    picked_indexes = []
     
    flip_array = [1,2,4,8,16,32,64,128]
 
    counter = 0
    while counter < num_of_flips:
        picked_indexes.append(random.choice(range(0,length)))
        counter += 1
 
 
    for x in picked_indexes:
        mask = random.choice(flip_array)
        data[x] = data[x] ^ mask
 
    return data

If we use this new function and monitor the results, we get the following profile:

59376275 function calls (59376138 primitive calls) in 135.582 seconds
 
   Ordered by: cumulative time
 
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     15/1    0.000    0.000  135.582  135.582 {built-in method builtins.exec}
        1    1.940    1.940  135.582  135.582 subpro.py:3(<module>)
    50000    0.978    0.000  107.857    0.002 subpro.py:111(exif)
    50000    1.450    0.000   64.236    0.001 subprocess.py:681(__init__)
    50000    5.566    0.000   60.141    0.001 subprocess.py:1412(_execute_child)
    50000    0.534    0.000   42.259    0.001 subprocess.py:920(communicate)
    50000    2.827    0.000   41.637    0.001 subprocess.py:1662(_communicate)
   199549   38.249    0.000   38.249    0.000 {built-in method posix.read}
   149537    0.555    0.000   30.376    0.000 selectors.py:402(select)
   149537   29.722    0.000   29.722    0.000 {method 'poll' of 'select.poll' objects}
    50000    3.993    0.000   14.471    0.000 subpro.py:14(bit_flip)
  7950000    3.741    0.000   10.316    0.000 random.py:256(choice)
    50000    9.973    0.000    9.973    0.000 {built-in method _posixsubprocess.fork_exec}
    50000    0.163    0.000    7.034    0.000 subpro.py:105(create_new)
  7950000    3.987    0.000    5.952    0.000 random.py:224(_randbelow)
   202567    4.966    0.000    4.966    0.000 {built-in method io.open}
    50000    4.042    0.000    4.042    0.000 {method 'close' of '_io.BufferedRandom' objects}
    50000    1.539    0.000    3.828    0.000 os.py:617(get_exec_path)
    50000    1.843    0.000    3.607    0.000 subpro.py:8(get_bytes)
   100000    0.074    0.000    2.133    0.000 subprocess.py:1014(wait)
   100000    0.463    0.000    2.059    0.000 subprocess.py:1621(_wait)
   100000    0.274    0.000    2.046    0.000 selectors.py:351(register)
   100000    0.782    0.000    1.702    0.000 selectors.py:234(register)
    50000    0.055    0.000    1.507    0.000 subprocess.py:1608(_try_wait)
    50000    1.452    0.000    1.452    0.000 {built-in method posix.waitpid}
   350000    0.424    0.000    1.436    0.000 subprocess.py:1471()
 12066317    1.339    0.000    1.339    0.000 {method 'getrandbits' of '_random.Random' objects}
   100000    0.466    0.000    1.048    0.000 os.py:674(__getitem__)
   100000    1.014    0.000    1.014    0.000 {method 'close' of '_io.BufferedReader' objects}
-----SNIP-----

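As the metrics show, we now spend only 14 cumulative seconds in bit_flip()! In the previous run we spent 25 seconds there, so it is now almost twice as fast. I think we did a good job optimizing here.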
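It may not be obvious that XOR-ing a byte with one of the values in flip_array does exactly what the old string-based code did. Here is a small sketch that checks this for every byte value and bit position. The helper names flip_via_string() and flip_via_xor() are made up for this illustration and do not appear in the fuzzer.

```python
def flip_via_string(byte, bit_index):
    """Flip one bit the old way: int -> binary string -> list -> string -> int."""
    bits = list(format(byte, "08b"))
    bits[bit_index] = "0" if bits[bit_index] == "1" else "1"
    return int("".join(bits), 2)


def flip_via_xor(byte, bit_index):
    """Flip the same bit with a mask; index 0 is the most significant bit,
    matching the left-to-right string indexing above."""
    return byte ^ (1 << (7 - bit_index))


if __name__ == "__main__":
    # Exhaustively compare both approaches over every byte value and bit position.
    mismatches = [
        (b, i)
        for b in range(256)
        for i in range(8)
        if flip_via_string(b, i) != flip_via_xor(b, i)
    ]
    print("mismatches:", len(mismatches))
```

The masks 1, 2, 4, ..., 128 in flip_array are the eight possible values of 1 << n, so picking one at random is the same as picking a random bit to flip, without any string conversions.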

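Now that we have a solid Python benchmark (keep in mind there may still be opportunities for multi-processing or multi-threading, but we'll save that idea for later), let's port the fuzzer to a new language, C++, and test its performance.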

A New Fuzzer in C++

First, let's run our newly optimized Python fuzzer for 100,000 iterations and see how long it takes in total.

118749892 function calls (118749755 primitive calls) in 256.881 seconds

100,000 iterations took only 256 seconds! That's far faster than our earlier fuzzer.
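Since the runs so far used different iteration counts, it can help to put them on a common scale. The sketch below converts the totals reported by cProfile in this post into iterations per second. The labels are my own shorthand, and the figures include whatever overhead the profiler itself adds.

```python
# (label, iterations, total seconds) taken from the cProfile summaries in this post
runs = [
    ("pexpect exif(), 1,000 iterations", 1_000, 122.084),
    ("Popen exif(), 1,000 iterations", 1_000, 2.756),
    ("Popen exif(), 50,000 iterations", 50_000, 141.488),
    ("XOR bit_flip(), 50,000 iterations", 50_000, 135.582),
    ("XOR bit_flip(), 100,000 iterations", 100_000, 256.881),
]


def iterations_per_second(iterations, seconds):
    """Throughput of a run: how many fuzz cases completed per second."""
    return iterations / seconds


if __name__ == "__main__":
    for label, n, secs in runs:
        print(f"{label}: {iterations_per_second(n, secs):.1f} iterations/s")
```

By this measure the fuzzer went from roughly 8 iterations per second with pexpect to somewhere in the range of 350 to 390 with Popen and the simplified bit_flip().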

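This is the benchmark we'll try to beat in C++. Now, as unfamiliar as I am with the nuances of Python development, multiply that by 10 and you have my unfamiliarity with C++. This code may be laughable to some, but it's the best I can manage right now, and we can explain how each function relates to our earlier Python code.

Let's go through the functions one at a time.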

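// headers needed by the functions shown in this post
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

//
// this function simply creates a stream by opening a file in binary mode;
// finds the end of the file, creates a string 'data', resizes data to be the same
// size as the file, moves the file pointer back to the beginning of the file;
// reads the data from the file into the data string;
//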
std::string get_bytes(std::string filename)
{
    std::ifstream fin(filename, std::ios::binary);
 
    if (fin.is_open())
    {
        fin.seekg(0, std::ios::end);
        std::string data;
        data.resize(fin.tellg());
        fin.seekg(0, std::ios::beg);
        fin.read(&data[0], data.size());
 
        return data;
    }
 
    else
    {
        std::cout << "Failed to open " << filename << ".\n";
        exit(1);
    }
 
}

As my comment says, this function simply retrieves a byte string from the target file, which in our testing is still Canon_40D.jpg.

//
// this will take 1% of the bytes from our valid jpeg and
// flip a random bit in the byte and return the altered string
//
std::string bit_flip(std::string data)
{
     
    int size = (data.length() - 4);
    int num_of_flips = (int)(size * .01);
 
    // get a vector full of 1% of random byte indexes
    std::vector<int> picked_indexes;
    for (int i = 0; i < num_of_flips; i++)
    {
        int picked_index = rand() % size;
        picked_indexes.push_back(picked_index);
    }
 
    // iterate through the data string at those indexes and flip a bit
    for (int i = 0; i < picked_indexes.size(); ++i)
    {
        int index = picked_indexes[i];
        char current = data.at(index);
        int decimal = ((int)current & 0xff);
         
        int bit_to_flip = rand() % 8;
         
        decimal ^= 1 << bit_to_flip;
        decimal &= 0xff;
         
        data[index] = (char)decimal;
    }
 
    return data;
}

This function is a direct equivalent of the bit_flip() function in our Python script.

//
// takes mutated string and creates new jpeg with it;
//
void create_new(std::string mutated)
{
    std::ofstream fout("mutated.jpg", std::ios::binary);
 
    if (fout.is_open())
    {
        fout.seekp(0, std::ios::beg);
        fout.write(&mutated[0], mutated.size());
    }
    else
    {
        std::cout << "Failed to create mutated.jpg" << ".\n";
        exit(1);
    }
}

This function simply creates a temporary mutated.jpg file, similar to the create_new() function in our Python script.

//
// function to run a system command and store the output as a string;
// https://www.jeremymorgan.com/tutorials/c-programming/how-to-capture-the-output-of-a-linux-command-in-c/
//
std::string get_output(std::string cmd)
{
    std::string output;
    FILE * stream;
    char buffer[256];
 
    stream = popen(cmd.c_str(), "r");
    if (stream)
    {
        while (!feof(stream))
            if (fgets(buffer, 256, stream) != NULL) output.append(buffer);
        pclose(stream);
    }
 
    return output;
 
}
 
//
// we actually run our exif command via the get_output() func;
// retrieve the output in the form of a string and then we can parse the string;
// we'll save all the outputs that result in a segfault or floating point exception;
//
void exif(std::string mutated, int counter)
{
    std::string command = "exif mutated.jpg -verbose 2>&1";
 
    std::string output = get_output(command);
 
    std::string segfault = "Segmentation";
    std::string floating_point = "Floating";
 
    std::size_t pos1 = output.find(segfault);
    std::size_t pos2 = output.find(floating_point);
 
    if (pos1 != std::string::npos)
    {
        std::cout << "Segfault!\n";
        std::ostringstream oss;
        oss << "/root/cppcrashes/crash." << counter << ".jpg";
        std::string filename = oss.str();
        std::ofstream fout(filename, std::ios::binary);
 
        if (fout.is_open())
            {
                fout.seekp(0, std::ios::beg);
                fout.write(&mutated[0], mutated.size());
            }
        else
        {
            std::cout << "Failed to create " << filename << ".jpg" << ".\n";
            exit(1);
        }
    }
    else if (pos2 != std::string::npos)
    {
        std::cout << "Floating Point!\n";
        std::ostringstream oss;
        oss << "/root/cppcrashes/crash." << counter << ".jpg";
        std::string filename = oss.str();
        std::ofstream fout(filename, std::ios::binary);
 
        if (fout.is_open())
            {
                fout.seekp(0, std::ios::beg);
                fout.write(&mutated[0], mutated.size());
            }
        else
        {
            std::cout << "Failed to create " << filename << ".jpg" << ".\n";
            exit(1);
        }
    }
}

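These two functions work together. get_output() takes a C++ string as a parameter, runs that command on the operating system, and captures the output. It then returns the output as a string to the calling function, exif().

exif() takes the output and looks for a Segmentation fault or Floating point exception error; if it finds one, it writes the mutated bytes to a file saved as crash.<counter>.jpg. This is very similar to our Python fuzzer.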
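For readers more comfortable with Python, the same capture-and-search idea can be sketched as follows. This is only an approximation of the C++ pair above, not a translation the author provided: get_output() here goes through subprocess with a shell rather than popen(), classify() is a hypothetical helper, and echo stands in for a crashing target.

```python
import subprocess


def get_output(cmd):
    """Run cmd through the shell and return stdout and stderr together
    as one string, similar in spirit to the popen()-based C++ helper."""
    proc = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT)
    return proc.stdout.decode(errors="replace")


def classify(output):
    """Return "segfault", "floating point", or None, checking in the same
    order as the C++ exif()."""
    if "Segmentation" in output:
        return "segfault"
    if "Floating" in output:
        return "floating point"
    return None


if __name__ == "__main__":
    # echo stands in for a crashing target here (hypothetical command).
    print(classify(get_output("echo 'Segmentation fault'")))
```

Matching on text like this depends on the crash message actually reaching the captured output, which is one reason the Python version earlier in the post switched to checking the return code.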

//
// simply generates a vector of strings that are our 'magic' values;
//
std::vector<std::string> vector_gen()
{
    std::vector<std::string> magic;
 
    using namespace std::string_literals;
 
    magic.push_back("\xff");
    magic.push_back("\x7f");
    magic.push_back("\x00"s);
    magic.push_back("\xff\xff");
    magic.push_back("\x7f\xff");
    magic.push_back("\x00\x00"s);
    magic.push_back("\xff\xff\xff\xff");
    magic.push_back("\x80\x00\x00\x00"s);
    magic.push_back("\x40\x00\x00\x00"s);
    magic.push_back("\x7f\xff\xff\xff");
 
    return magic;
}
 
//
// randomly picks a magic value from the vector and overwrites that many bytes in the image;
//
std::string magic(std::string data, std::vector<std::string> magic)
{
     
    int vector_size = magic.size();
    int picked_magic_index = rand() % vector_size;
    std::string picked_magic = magic[picked_magic_index];
    int size = (data.length() - 4);
    int picked_data_index = rand() % size;
    data.replace(picked_data_index, magic[picked_magic_index].length(), magic[picked_magic_index]);
 
    return data;
 
}
 
//
// returns 0 or 1;
//
int func_pick()
{
    int result = rand() % 2;
 
    return result;
}

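These functions are also very similar to our Python implementation. vector_gen() pretty much just creates our vector of "magic values", and subsequent functions like magic() use that vector to randomly pick an index and overwrite the valid JPEG data with the chosen value accordingly.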
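The C++ magic() replaces a run of bytes in one call instead of using the long if/elif chain from the Python version. The same idea can be sketched in Python with slice assignment on a bytearray. This is an illustration under assumptions rather than the author's code: MAGIC_VALUES mirrors the C++ list, and the rng parameter is added only so the behaviour can be made repeatable.

```python
import random

# Byte strings mirroring the C++ vector_gen() list.
MAGIC_VALUES = [
    b"\xff", b"\x7f", b"\x00",
    b"\xff\xff", b"\x7f\xff", b"\x00\x00",
    b"\xff\xff\xff\xff", b"\x80\x00\x00\x00",
    b"\x40\x00\x00\x00", b"\x7f\xff\xff\xff",
]


def magic(data, rng=random):
    """Overwrite len(value) bytes of data (a bytearray) at a random offset
    with a randomly chosen magic value, keeping the overall length unchanged."""
    value = rng.choice(MAGIC_VALUES)
    # Same bound as the C++ code: offsets stop 4 bytes short of the end,
    # so even a 4-byte value never runs past the buffer.
    offset = rng.randrange(len(data) - 4)
    data[offset:offset + len(value)] = value
    return data


if __name__ == "__main__":
    sample = bytearray(b"A" * 32)
    print(magic(sample, random.Random(1)))
```

Because the slice being replaced has the same length as the value written, the buffer never grows or shrinks, which matches what std::string::replace does in the C++ code when the two lengths are equal.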

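func_pick() is very simple: it just returns 0 or 1, so that our fuzzer can randomly pick bit_flip() or magic() to mutate our valid JPEG. For consistency, let's have the fuzzer pick only bit_flip() for now by adding a temporary line, function = 1, so that we match our Python testing.

Here is our main() function, which executes all of the code so far: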

int main(int argc, char** argv)
{
 
    if (argc < 3)
    {
        std::cout << "Usage: ./cppfuzz <valid_jpeg> <iterations>\n";
        std::cout << "Usage: ./cppfuzz Canon_40D.jpg 10000\n";
        return 1;
    }
 
    // start timer
    auto start = std::chrono::high_resolution_clock::now();
 
    // initialize our random seed
    srand((unsigned)time(NULL));
 
    // generate our vector of magic numbers
    std::vector<std::string> magic_vector = vector_gen();
 
    std::string filename = argv[1];
    int iterations = atoi(argv[2]);
 
    int counter = 0;
    while (counter < iterations)
    {
 
        std::string data = get_bytes(filename);
 
        int function = func_pick();
        function = 1;
        if (function == 0)
        {
            // utilize the magic mutation method; create new jpg; send to exif
            std::string mutated = magic(data, magic_vector);
            create_new(mutated);
            exif(mutated,counter);
            counter++;
        }
        else
        {
            // utilize the bit flip mutation; create new jpg; send to exif
            std::string mutated = bit_flip(data);
            create_new(mutated);
            exif(mutated,counter);
            counter++;
        }
    }
 
    // stop timer and print execution time
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    std::cout << "Execution Time: " << duration.count() << "ms\n";
 
    return 0;
}


Last edited by pureGavin on 2024-8-29 09:11. Reason: content revision