首页
社区
课程
招聘
[原创] 从代码的角度学习AFLSMART的特性(The Novelties of AFLSMART From a code perspective)
2021-11-11 22:01 20656

[原创] 从代码的角度学习AFLSMART的特性(The Novelties of AFLSMART From a code perspective)

2021-11-11 22:01
20656

从代码的角度学习AFLSMART的特性(The Novelties of AFLSMART From a code perspective)

一、简介

基于覆盖率的灰盒测试工具,诸如AFL、AFLFast、AFLGo、Angora等,通过预先定义好的通用变异操作符对种子进行突变,检查输入在经过变异后是否产生了新的分支覆盖,是否足够“有趣”。轻量级的程序插桩将辅助fuzzer对输入用例的“有趣”性做出评估,并挑选其中足够新颖的输入作为下一轮突变的种子文件。

 

由于传统的基于覆盖率的灰盒测试工具,其突变都是比特级别的种子文件内容突变,缺乏对输入结构层面的一定感知能力,因此对于有高度结构化输入的程序,fuzzer经过突变产生的大量畸形输入难以通过格式文件的文件解析阶段。目前聚焦此类问题有两种主要的解决方案:基于字典的突变及动态污点分析。但这两种方案都只是进行文件内容上比特位层级的突变,且字典操作上难以构建合法语义块,故均未能实现对文件结构层级上的突变。

 

因此,针对当前工作对文件结构缺乏感知的问题,Van-Thuan Pham等人在AFL的基础上推出了AFLSMART,可以根据文件内部的层级结构信息产生新的种子文件,提升fuzzer的有效性与可用性。在AFL的基础上,AFLSMART参照了 Peach 的 File Cracker 组件,将文件按块划分,抽象为一个可以用 xml 文件描述的树形的结构,并引入了如下特性:

  • 虚拟结构表示(Virtual Structure)

  • 文件块层次的突变(Smart Mutation Operators)

  • 基于有效性的种子能量调度(Valid-based Power Schedule)

二、虚拟结构表示(Virtual Structure)

针对有高度结构化输入的被测程序,AFLSMART的有效性来自于其文件块层级的突变。为了此种突变的实现,需要对文件的层级结构的表示进行设计。AFLSMART参照了 Peach 的 File Cracker 组件,将文件按块划分,块与块之间按层级结构组成一颗二叉树。File Cracker组件的代码可见于peach-3.0.202.patch。根据在smart-chunks.h中的声明,块内数据域包含id、类型、起始字节偏移、末尾字节偏移、修改标志位,指针域包含同级块、后继块指针:

1
2
3
4
5
6
7
8
9
10
11
12
struct chunk {
  unsigned long
      id; /* The id of the chunk, which either equals its pointer value or, when
             loaded from chunks file, equals to the hashcode of its chunk
             identifer string casted to unsigned long. */
  int type;                /* The hashcode of the chunk type. */
  int start_byte;          /* The start byte, negative if unknown. */
  int end_byte;            /* The last byte, negative if unknown. */
  char modifiable;         /* The modifiable flag. */
  struct chunk *next;      /* The next sibling child. */
  struct chunk *children;  /* The children chunks linked list. */
};

由上可知,chunk在数据结构的层面上其实就是一个二叉树的节点。

 

在AFLSMART的启动过程中,需要用-g选项来指定.xml模板格式文件,此处-g的处理代码在afl-fuzz.c中:

 

在一开始AFLSMART的参数配置阶段,仅用fopen()的方式,检查指定的模板格式文件是否存在

1
2
3
4
5
6
7
8
9
10
11
case 'g':        //han:set up input_model_file
  if (input_model_file) FATAL("Multiple -g options not supported");
 
  input_model_file = optarg;
 
  /* Check if exists */
  FILE * f = fopen(input_model_file, "r");
  if (f) fclose(f);
  else FATAL("File %s does not exist!", input_model_file);
 
  break;

接着,虚拟结构树的构建发生在在afl-fuzz.cfuzz_one()CALIBRATION阶段,其调用代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
/*******************************************
 * CALIBRATION (only if failed earlier on) *
 *******************************************/
 
if (queue_cur->cal_failed) {
 
  u8 res = FAULT_TMOUT;
 
  if (queue_cur->cal_failed < CAL_CHANCES) {
 
    res = calibrate_case(argv, queue_cur, in_buf, queue_cycle - 1, 0);
 
    if (res == FAULT_ERROR)
      FATAL("Unable to execute target application");
 
  }
 
  if (stop_soon || res != crash_mode) {
    cur_skipped_paths++;
    goto abandon_entry;
  }
 
}
 
/* Deferred cracking */
if (smart_mode && !queue_cur->chunk && (initial_queue_num > 0
    || UR(100) < (get_cur_time() - last_path_time) / 50)) {
  update_input_structure(queue_cur->fname, queue_cur); 
  --initial_queue_num;
}

其中update_input_structure()进行的即使虚拟结构树的构建工作,其函数实现定义在afl-fuzz.c中,代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
/* Add input structure information to the queue entry */
static void update_input_structure(u8* fname, struct queue_entry* q) {
  pid_t pid = 0;
  int pipefd[2];
  FILE* output;
  char line[256];
  int status;
  u8* ifname;
  u8* ofname;
 
  if (model_type == MODEL_PEACH) {
 
    if (pipe(pipefd) < 0) {
            PFATAL("AFLSmart cannot create a pipe to communicate with Peach");
      exit(1);
    }
 
    pid = fork();
    if (pid == 0) {
      close(pipefd[0]);
      dup2(pipefd[1], STDOUT_FILENO);
      dup2(pipefd[1], STDERR_FILENO);
      ifname = alloc_printf("-inputFilePath=%s", fname);
      ofname = alloc_printf("-outputFilePath=%s/chunks/%s.repaired", out_dir,
                            basename(fname));
 
      execlp("peach", "peach", "-1", ifname, ofname, input_model_file, (char*) NULL);
      exit(1); /* Stop the child process upon failure. */
    } else {
      close(pipefd[1]);
      output = fdopen(pipefd[0], "r");
 
      while (fgets(line, sizeof(line), output)) {
        /* Extract validity percentage and update the current queue entry. */
        q->validity = 0;
        if (!strncmp(line, "ok", 2)) {
          q->validity = 100;
          break;
        } else if (!strncmp(line, "error", 5)) {
          char *s = line + 5;
          while (isspace(*s)) { s++; }
          char *start = s;
          while (isdigit(*s)) { s++; }
          *s = '\0';
          if (s != start) {
            q->validity = (u8) atoi(start);
          }
          break;
        }
      }
 
      waitpid(pid, &status, 0);
 
      u8* chunks_fname = alloc_printf("%s/chunks/%s.repaired.chunks", out_dir, basename(fname));
      struct chunk *chunk;
 
      get_chunks(chunks_fname, &chunk);
      q->chunk = chunk;
      q->cached_chunk = copy_chunks(chunk);
 
      fclose(output);
      ck_free(chunks_fname);
    }
 
  } else {
    /// NOT SUPPORTED
    PFATAL("AFLSmart currently only supports Peach models! Please use -w peach option");
  }
 
  parsed_inputs++;
  validity_avg += (s8)(q->validity - validity_avg) / parsed_inputs;
  q->parsed = 1;
}

由此,我们可以看到,在CALIBRATION阶段,AFLSMART从种子队列获取一个种子文件,根据先前指定的.xml模板文件,利用peach的File Cracker工具解析该种子文件,并将解析结果格式化地输出为一个.repaired.chunks文件。以wav文件为例,解析完毕后的输出文件内容如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
0,352119,Wav,Enabled
0,3,Wav~Signature,Enabled
4,7,Wav~Number,Enabled
8,11,Wav~WAVE,Enabled
12,352119,Wav~Chunks,Enabled
12,35,Wav~Chunks~Chunks,Enabled
12,35,Wav~Chunks~Chunks~OtherChunk,Enabled
12,15,Wav~Chunks~Chunks~OtherChunk~ID,Enabled
16,19,Wav~Chunks~Chunks~OtherChunk~Size,Enabled
20,35,Wav~Chunks~Chunks~OtherChunk~Data,Enabled
36,352031,Wav~Chunks~Chunks_1,Enabled
36,352031,Wav~Chunks~Chunks_1~DataChunk,Enabled
36,39,Wav~Chunks~Chunks_1~DataChunk~ID,Enabled
40,43,Wav~Chunks~Chunks_1~DataChunk~Size,Enabled
44,352031,Wav~Chunks~Chunks_1~DataChunk~Data,Enabled
352032,352071,Wav~Chunks~Chunks_2,Enabled
352032,352071,Wav~Chunks~Chunks_2~OtherChunk,Enabled
352032,352035,Wav~Chunks~Chunks_2~OtherChunk~ID,Enabled
352036,352039,Wav~Chunks~Chunks_2~OtherChunk~Size,Enabled
352040,352071,Wav~Chunks~Chunks_2~OtherChunk~Data,Enabled
352072,352119,Wav~Chunks~Chunks_3,Enabled
352072,352119,Wav~Chunks~Chunks_3~OtherChunk,Enabled
352072,352075,Wav~Chunks~Chunks_3~OtherChunk~ID,Enabled
352076,352079,Wav~Chunks~Chunks_3~OtherChunk~Size,Enabled
352080,352119,Wav~Chunks~Chunks_3~OtherChunk~Data,Enabled

接着在update_input_structure()内,AFLSMART会根据解析过程File Cracker的输出信息,计算种子文件的有效度,并根据.repaired.chunks文件,调用get_chunks()展开虚拟结构树的构建工作。get_chunks()的实现在smart-chunks.c中,代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
void get_chunks(char *filespec, struct chunk **data_chunks) {
  FILE *chunk_file;
  char *line;
  size_t n;
 
  *data_chunks = NULL;
 
  if ((chunk_file = fopen(filespec, "r")) == NULL)
    return;
 
  do {
    line = NULL;
    n = 0;
    if (getline(&line, &n, chunk_file) == -1) {
      free(line);
      line = NULL;
    } else {
      add_path(data_chunks, line);
      if (line != NULL) {
        free(line);
      }
    }
  } while (line != NULL);
 
  fclose(chunk_file);
}

可以看出,get_chunks()内调用add_path(),逐行解析.repaired.chunks文件的内容,构建相应的虚拟结构树,其构建过程即数据结构中常见的二叉树的构建,在此不再赘述。以上文中的.repaired.chunks文件为例,其虚拟结构树的图示如下:

 

p01

三、文件块层次的突变(Smart Mutation Operators)

基于上节的虚拟结构表示,AFLSMART定义了三种文件块层次的突变:块删除(smart deletion)、同类块添加(smart addition)、同类块替换(smart splicing)。

 

其实现代码可见afl-fuzz.c中的higher_order_fuzzing(),代码很长,在此不完全列举,下文介绍各个文件块突变因子时逐步列举。

 

流程上,higher_order_fuzzing()的主要步骤有如下几个:

  1. 随机挑选突变因子
  2. 虚拟结构树结构的线性化收集
  3. 根据突变因子,从各级数组中随机挑选一个块,进行突变操作
  4. 调整块起始字节偏移与末尾字节偏移

3.1 随机挑选突变因子

有50%的概率不进行突变,块删除、块添加、同类块替换的概率均为16.7%,其实现代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
u32 r = UR(12);
    u32 s = 3;
 
    if (model_type == MODEL_PEACH) {
      if (r <= 5) {
        if (r <= 1) {
          s = 0;
        } else if (r <= 3) {
          s = 1;
        } else {
          s = 2;
        }
      }
    }
 
    switch (s) {
 
    /* 50% chance of no higher-order mutation */
    /*han: r = 0 ... 1 ->  s = 0, delete chunk, 16.7% chance
      han: r = 2 ... 3 ->  s = 1, splice chunk, 16.7% chance
      han: r = 4 ... 5 ->  s = 2, add chunk,    16.7% chance*/
    case 3 ... 5:
      break;
 
    case 0: {
        /* Delete chunk */
    }break;
    case 1: {
        /* Splice chunk */
    }break;
    case 2: {
        /* Adopt a child from a chunk of the same type */
    }break;

3.2 虚拟结构树节点的线性化收集

higher_order_fuzzing()中,AFLSMART调用linearize_chunks(),将当前虚拟结构树中第一、二层级及更深层级的chunk的指针分别收集到first_chunks_arraysecond_chunks_arraydeeper_chunks_array中,并将对应的chunk数目收集到total_first_chunkstotal_second_chunkstotal_deeper_chunks

 

linearize_chunks()的实现代码在afl-fuzz.c中,实现如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
void linearize_chunks(struct chunk *c, struct chunk ***first_chunks_arr,
                      struct chunk ***second_chunks_arr,
                      struct chunk ***deeper_chunks_arr,
                      u32 *first_chunks_number, u32 *second_chunks_number,
                      u32 *deeper_chunks_number) {
  u32 first_level, second_level;
 
  *first_chunks_number = 0;
  *second_chunks_number = 0;
  *deeper_chunks_number = 0;
  *first_chunks_arr = (struct chunk **)malloc((1 << LINEARIZATION_UNIT) *
                                              sizeof(struct chunk *));
  *second_chunks_arr = (struct chunk **)malloc((1 << LINEARIZATION_UNIT) *
                                               sizeof(struct chunk *));
  *deeper_chunks_arr = (struct chunk **)malloc((1 << LINEARIZATION_UNIT) *
                                               sizeof(struct chunk *));
  if (model_type == MODEL_PEACH) {
    first_level = 1;
    if (composite_mode) first_level += 2;
  }
 
  second_level = first_level + 1;
 
  linearize_chunks_recursively(first_level, second_level, c, first_chunks_arr,
                               second_chunks_arr, deeper_chunks_arr,
                               first_chunks_number, second_chunks_number,
                               deeper_chunks_number, 0);
}
 
struct chunk *copy_children_with_new_offset(int new_start_byte,
                                            int old_start_byte,
                                            struct chunk *c) {
  struct chunk *sibling = c;
  struct chunk *ret = NULL;
 
  while (sibling) {
    struct chunk *children = copy_children_with_new_offset(
        new_start_byte, old_start_byte, sibling->children);
 
    struct chunk *new = (struct chunk *)malloc(sizeof(struct chunk));
    new->id = sibling->id;
    new->type = sibling->type;
    new->start_byte = (sibling->start_byte - old_start_byte) + new_start_byte;
    new->end_byte = (sibling->end_byte - old_start_byte) + new_start_byte;
    new->modifiable = sibling->modifiable;
    new->next = ret;
    new->children = children;
    ret = new;
 
    sibling = sibling->next;
  }
 
  return ret;
}
 
void linearize_chunks_recursively(
    u32 first_level, u32 second_level, struct chunk *c,
    struct chunk ***first_chunks_arr, struct chunk ***second_chunks_arr,
    struct chunk ***deeper_chunks_arr, u32 *first_chunks_number,
    u32 *second_chunks_number, u32 *deeper_chunks_number, u32 depth) {
  struct chunk *sibling = c;
 
  while (sibling) {
    linearize_chunks_recursively(
        first_level, second_level, sibling->children, first_chunks_arr,
        second_chunks_arr, deeper_chunks_arr, first_chunks_number,
        second_chunks_number, deeper_chunks_number, depth + 1);
 
    if (depth == first_level) {
      if (add_chunk_to_array(sibling, first_chunks_arr, first_chunks_number))
        return;
    } else if (depth == second_level) {
      if (add_chunk_to_array(sibling, second_chunks_arr, second_chunks_number))
        return;
    } else if (depth > second_level) {
      if (add_chunk_to_array(sibling, deeper_chunks_arr, deeper_chunks_number))
        return;
    }
    sibling = sibling->next;
  }
}

由此可见,linearize_chunks()会根据虚拟结构树中各个chunk的层级,将其指针递归地收集到各级数组中,实现线性化。倘若在执行命令中加入了-c选项,即启用composite_mode时,AFLSMART将对更深层级的块进行线性化收集工作。

3.3 突变操作:块删除(Smart Deletion)

顾名思义,块删除操作即移除种子文件中的某个块。首先会进行虚拟树节点的线性化收集工作,即调用linearize_chunks()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
u32 del_from, del_len;
struct chunk **first_chunks_array = NULL;
struct chunk **second_chunks_array = NULL;
struct chunk **deeper_chunks_array = NULL;
u32 total_first_chunks = 0;
u32 total_second_chunks = 0;
u32 total_deeper_chunks = 0;
 
if (*temp_len < 2)
  break;
 
del_from = del_len = 0;
linearize_chunks(current_chunk, &first_chunks_array, &second_chunks_array,
                 &deeper_chunks_array, &total_first_chunks,
                 &total_second_chunks, &total_deeper_chunks);
if (total_first_chunks <= 0) {
  if (first_chunks_array != NULL)
    free(first_chunks_array);
 
  if (second_chunks_array != NULL)
    free(second_chunks_array);
 
  if (deeper_chunks_array != NULL)
    free(deeper_chunks_array);
 
  break;
}

在完成线性化收集工作后,AFLSMART将依次从first_chunks_arraysecond_chunks_arraydeeper_chunks_array中,调用get_chunk_to_delete(),随机挑选出一个起始字节偏移、末尾字节偏移已知的块,作为待删除的目标块,并获取该块的起始字节偏移以及长度,便于块删除后的字节偏移调整工作。这个过程的代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
struct chunk *chunk_to_delete = NULL;
 
/* Make sure we initialize */
del_len = 0;
 
if (total_first_chunks > 1)
  chunk_to_delete = get_chunk_to_delete(
      first_chunks_array, total_first_chunks, &del_from, &del_len);
 
if (first_chunks_array != NULL)
  free(first_chunks_array);
 
/* If chunk not found, we try the second-level chunks */
if (del_len == 0 &&  total_second_chunks > 1) {
  chunk_to_delete = get_chunk_to_delete(
      second_chunks_array, total_second_chunks, &del_from, &del_len);
}
 
if (second_chunks_array != NULL)
  free(second_chunks_array);
 
/* If chunk not found, we try the deeper-level chunks */
if (del_len == 0 && total_deeper_chunks > 1) {
  chunk_to_delete = get_chunk_to_delete(
      deeper_chunks_array, total_deeper_chunks, &del_from, &del_len);
}
 
if (deeper_chunks_array != NULL)
  free(deeper_chunks_array);
 
if (del_len != 0 && del_len < *temp_len) {
  if (smart_log_mode) {
    smart_log("BEFORE DELETION:\n");
    if (model_type == MODEL_PEACH)
      smart_log_tree_with_data_hex(current_queue_entry->chunk, (*out_buf));
 
    smart_log("DELETED CHUNK:\n");
    smart_log("Type: %d Start: %d End: %d Modifiable: %d\n",
              chunk_to_delete->type, chunk_to_delete->start_byte,
              chunk_to_delete->end_byte, chunk_to_delete->modifiable);
    if (model_type == MODEL_PEACH)
      smart_log_n_hex(del_len, (*out_buf) + del_from);
  }

get_chunk_to_delete()实现在afl-fuzz.c中,代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
struct chunk *get_chunk_to_delete(struct chunk **chunks_array, u32 total_chunks,
                                  u32 *del_from, u32 *del_len) {
  struct chunk *chunk_to_delete = NULL;
  u8 i;
 
  *del_from = 0;
  *del_len = 0;
 
  for (i = 0; i < 3; ++i) {
    int start_byte;
    u32 chunk_id = UR(total_chunks);
 
    chunk_to_delete = chunks_array[chunk_id];
    start_byte = chunk_to_delete->start_byte;
 
    /* It is possible that either the start or the end bytes are
       unknown (has negative values), so we actually perform the
       deletion only when these bounds are known. */
    if (chunk_to_delete->modifiable && start_byte >= 0 &&
        chunk_to_delete->end_byte >= start_byte) {
      /* Note that the length to be deleted here is 1 more than
         end_byte - start_byte, since the end_byte is exactly the
         position of the last byte, not one more than the last
         byte. */
      *del_from = start_byte;
      *del_len = chunk_to_delete->end_byte - start_byte + 1;
      break;
    }
  }
 
  return chunk_to_delete;
}

待删除的块挑选完毕后,即开始块删除操作。调用memmove(),将待删除目标块之后的内存,拷贝至待删目标块的位置;调用search_and_destory_chunk(),在虚拟结构树上删除待删除目标块对应的子树,并调整相关chunk的起始字节偏移与末尾字节偏移。此部分的代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
memmove((*out_buf) + del_from, (*out_buf) + del_from + del_len,
        (*temp_len) - del_from - del_len + 1);
(*temp_len) -= del_len;
 
current_queue_entry->chunk = search_and_destroy_chunk(
    current_queue_entry->chunk, chunk_to_delete, del_from, del_len);
changed_structure = 1;
 
if (smart_log_mode) {
  smart_log("AFTER DELETION:\n");
  if (model_type == MODEL_PEACH)
    smart_log_tree_with_data_hex(current_queue_entry->chunk, (*out_buf));

其中,search_and_destory_chunk()负责虚拟结构树上递归地进行目标块及其子块的删除与块偏移调整工作,在smart_chunks.c中实现,代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
struct chunk *search_and_destroy_chunk(struct chunk *c,
                                       struct chunk *target_chunk,
                                       int start_byte, unsigned size) {
  struct chunk *sibling = c;
  struct chunk *ret = c;
  struct chunk *prev = NULL;
 
  while (sibling) {
 
    if (sibling == target_chunk) {
      struct chunk *tmp = sibling->next;
 
      if (ret == sibling)
        ret = tmp;
      else
        prev->next = tmp;
 
      delete_chunks(sibling->children);
      free(sibling);
 
      reduce_byte_positions(tmp, start_byte, size);
      sibling = tmp;
      continue;
    }
 
    if (sibling->start_byte >= 0 && sibling->start_byte > start_byte)
      (sibling->start_byte) -= size;
 
    if (sibling->end_byte >= 0 && sibling->end_byte >= start_byte)
      (sibling->end_byte) -= size;
 
    sibling->children = search_and_destroy_chunk(
        sibling->children, target_chunk, start_byte, size);
 
    prev = sibling;
    sibling = sibling->next;
  }
 
  return ret;
}

最终实现的效果如其论文中所示的一致:

 

p02

3.4 突变操作:同类块替换(Smart Deletion)

同类块替换操作,可以把当前种子文件中的某一块替换为另一种子文件的同类型块。

 

该突变操作首先会随机选择一个种子文件作为源种子文件,其代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
struct queue_entry *source_entry;
u32 tid;
u8 attempts = 20;
u32 type, target_len;
u32 smart_splicing_with = -1;
int target_start_byte = 0;
int source_start_byte = 0;
struct worklist *source;
struct chunk **first_chunks_array = NULL;
struct chunk **second_chunks_array = NULL;
struct chunk **deeper_chunks_array = NULL;
u32 total_first_chunks = 0;
u32 total_second_chunks = 0;
u32 total_deeper_chunks = 0;
 
do {
  tid = UR(queued_paths);
  smart_splicing_with = tid;
  source_entry = queue;
 
  while (tid >= 100) {
    source_entry = source_entry->next_100;
    tid -= 100;
  }
  while (tid--)
    source_entry = source_entry->next;
 
  while (source_entry &&
         (!source_entry->chunk || source_entry == current_queue_entry)) {
    source_entry = source_entry->next;
    smart_splicing_with++;
  }
  attempts--;
 
} while (!source_entry && attempts);
 
if (attempts == 0)
  break;

可以看出,代码中通过随机选择一个tid、在种子队列中遍历查找的形式来随机挑选一个测试用例,作为同类块替换的源种子文件。

 

接着,进行虚拟树节点的线性化收集工作,调用linearize_chunks(),此部分在上文中已有叙述。

 

在线性化收集虚拟树节点后,AFLSMART开始随机地从当前种子文件中挑选一个将被替换的目标块,其流程代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
struct chunk *target_chunk =
    get_target_to_splice(first_chunks_array, total_first_chunks,
                         &target_start_byte, &target_len, &type);
 
if (first_chunks_array != NULL)
  free(first_chunks_array);
 
if (target_len <= 0 && total_second_chunks > 0) {
  target_chunk =
      get_target_to_splice(second_chunks_array, total_second_chunks,
                           &target_start_byte, &target_len, &type);
}
 
if (second_chunks_array != NULL)
  free(second_chunks_array);
 
if (target_len <= 0 && total_deeper_chunks > 0) {
  target_chunk =
      get_target_to_splice(deeper_chunks_array, total_deeper_chunks,
                           &target_start_byte, &target_len, &type);
}
 
if (deeper_chunks_array != NULL)
  free(deeper_chunks_array);

其中get_target_to_splice()是一个实现在afl-fuzz.c的一个功能函数,其功能在于随机地从块指针数组中挑选一个可修改的块,输出该块的指针、起始字节偏移、块长度以及块的类型。get_target_to_splice()的代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
struct chunk *get_target_to_splice(struct chunk **chunks_array,
                                   u32 total_chunks, int *target_start_byte,
                                   u32 *target_len, u32 *type) {
  struct chunk *target_chunk = NULL;
  u8 i;
 
  *target_start_byte = 0;
  *target_len = 0;
  *type = 0;
 
  for (i = 0; i < 3; ++i) {
    u32 chunk_id = UR(total_chunks);
    target_chunk = chunks_array[chunk_id];
    *target_start_byte = target_chunk->start_byte;
 
    if (target_chunk->modifiable && *target_start_byte >= 0 &&
        target_chunk->end_byte >= *target_start_byte) {
      *target_len = target_chunk->end_byte - *target_start_byte + 1;
      *type = target_chunk->type;
      break;
    }
  }
 
  return target_chunk;
}

挑选完当前种子文件的待替换块后,根据待替换块的类型,AFLSMART调用get_chunks_of_type(),在源种子文件的虚拟结构树中递归地寻找与源块类型相同、长度不大于目标块的块,复制这些块并将其副本链接成单链表结构的worklist。同类块替换操作的源块就在这个worklist中随机挑选。其流程代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
struct worklist *source_init;
int same_type_chunks_num = 0;
u32 source_len = 0;
 
/* Find same type and non-bigger size in source */
source = get_chunks_of_type(source_entry->chunk, type, target_len,
                            &same_type_chunks_num);
source_init = source;
 
if (source != NULL && same_type_chunks_num > 0) {
  /* Insert source chunk into out_buf. */
  u32 chunk_id = UR(same_type_chunks_num);
 
  source_len = 0;
  while (source) {
    if (chunk_id == 0) {
      source_start_byte = source->chunk->start_byte;
      if (source_start_byte >= 0 &&
          source->chunk->end_byte >= source_start_byte) {
        source_len = source->chunk->end_byte - source_start_byte + 1;
      }
      break;
    }
 
    chunk_id--;
    source = source->next;
  }

随后,同类型的源块以及目标块已经挑选完毕,正式开始同类块的替换操作:

  1. 读取源种子文件,便于获取源块的数据内容,代码如下:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    s32 fd;
    u8 *source_buf;
     
    /* Read the testcase into a new buffer. */
     
    fd = open(source_entry->fname, O_RDONLY);
     
    if (fd < 0)
      PFATAL("Unable to open '%s'", source_entry->fname);
     
    source_buf = ck_alloc_nozero(source_entry->len);
     
    ck_read(fd, source_buf, source_entry->len, source_entry->fname);
     
    close(fd);
  2. 调用memcpy()memmove(),在当前种子文件的缓冲区内进行块内容的替换以及种子文件大小的调整:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    /* Apply the splicing to the output buffer */
    u32 move_amount = target_len - source_len;
     
    memcpy((*out_buf) + target_start_byte,
           source_buf + source_start_byte, source_len);
     
    memmove((*out_buf) + target_start_byte + source_len,
            (*out_buf) + target_start_byte + target_len,
            (*temp_len) - target_start_byte - target_len + 1);
     
    (*temp_len) -= move_amount;
  3. 块内容替换完毕后,调整当前种子文件的虚拟结构树,其中的操作包括删除目标块节点及其子树(delete_chunks())、块内字节偏移的调整(reduce_byte_positions())、新块节点及其子树的整合(copy_children_with_new_offset()):

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    //han: update the virtual structure
    struct chunk *target_next = target_chunk->next;
    unsigned long target_id = target_chunk->id;
    delete_chunks(target_chunk->children);
    target_chunk->children = NULL;
    reduce_byte_positions(current_queue_entry->chunk, target_start_byte,
                          move_amount);
     
    memcpy(target_chunk, source->chunk, sizeof(struct chunk));
    target_chunk->id = target_id;
    target_chunk->start_byte = target_start_byte;
    target_chunk->end_byte = target_start_byte + source_len - 1;
    target_chunk->next = target_next;
    target_chunk->children = copy_children_with_new_offset(
        target_start_byte, source->chunk->start_byte,
        source->chunk->children);
    changed_structure = 1;
     
    /* The source buffer is no longer needed */
    ck_free(source_buf);

最终实现的效果如其论文中所示的一致:

 

p03

 

与图片稍许不同的,代码中要求替换过程中,源块的长度不应当大于目标块的长度,而论文里的图片刚好相反了←_←

3.5 突变操作:同类块添加(smart addition)

同类块添加操作,可以在当前种子文件中的某一块(头块)的末尾,添加另一种子文件的同类型块(尾块)。

 

同样地,该突变操作首先进行虚拟树节点的线性化收集工作,调用linearize_chunks(),此部分在上文中已有叙述。

 

接着调用get_parent_to_insert_child(),从各级块指针数组中随机挑选一个有子块的块作为块添加的父块,并获取其起始字节偏移、块长以及块类别,此部分代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
struct chunk *target_parent_chunk =
    get_parent_to_insert_child(first_chunks_array, first_total_chunks,
                               &target_start_byte, &target_len, &type);
 
if (first_chunks_array != NULL)
  free(first_chunks_array);
 
if (target_len <= 0 && second_total_chunks > 0) {
  target_parent_chunk =
      get_parent_to_insert_child(second_chunks_array, second_total_chunks,
                                 &target_start_byte, &target_len, &type);
}
 
if (second_chunks_array != NULL)
  free(second_chunks_array);
 
if (target_len <= 0 && deeper_total_chunks > 0) {
  target_parent_chunk =
      get_parent_to_insert_child(deeper_chunks_array, deeper_total_chunks,
                                 &target_start_byte, &target_len, &type);
}
 
if (deeper_chunks_array != NULL)
  free(deeper_chunks_array);

get_parent_to_insert_child()afl-fuzz.c中有所实现,代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
struct chunk *get_parent_to_insert_child(struct chunk **chunks_array,
                                         u32 total_chunks,
                                         int *target_start_byte,
                                         u32 *target_len, u32 *type) {
  struct chunk *target_parent_chunk = NULL;
 
  *target_start_byte = 0;
  *target_len = 0;
  *type = 0;
 
  for (u8 i = 0; i < 3; ++i) {
    u32 chunk_id = UR(total_chunks);
    target_parent_chunk = chunks_array[chunk_id];
    *target_start_byte = target_parent_chunk->start_byte;
    if (*target_start_byte >= 0 &&
        target_parent_chunk->end_byte >= *target_start_byte &&
        target_parent_chunk->children != NULL) {
      *target_len = target_parent_chunk->end_byte - *target_start_byte + 1;
      *type = target_parent_chunk->type;
      break;
    }
  }
 
  return target_parent_chunk;
}

在获取头块成功后,AFLSMART从当前种子队列中随机挑选一个种子文件作为同类块添加的源种子文件,并根据头块的类型,从源种子文件中列举出同类型的worklist块单链表,随机挑选出一个尾块。此步骤与同类块替换中的一致,不再赘述。

 

选定头块和尾块完毕后,则正式开始进行同类块的添加操作:

  1. 读取源种子文件,便于获取尾块的数据内容,代码如下:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    /* Read the testcase into a new buffer. */
     
    fd = open(source_entry->fname, O_RDONLY);
     
    if (fd < 0)
      PFATAL("Unable to open '%s'", source_entry->fname);
     
    source_buf = ck_alloc_nozero(source_entry->len);
     
    ck_read(fd, source_buf, source_entry->len, source_entry->fname);
     
    close(fd);
  2. 调用memmove()memcpy(),在当前种子文件的缓冲区内腾出头块末尾的内存空间,将尾块数据复制进去,并对种子文件大小进行调整:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    /* Move the data around */
    if (target_parent_chunk->end_byte + 1 < *temp_len) {
      memmove((*out_buf) + target_parent_chunk->end_byte +
                  source_child_chunk_size + 1,
              (*out_buf) + target_parent_chunk->end_byte + 1,
              *temp_len - (target_parent_chunk->end_byte + 1));
    }
    memcpy((*out_buf) + target_parent_chunk->end_byte + 1,
           source_buf + source_child_chunk->start_byte,
           source_child_chunk_size);
     
    *temp_len += source_child_chunk_size;
  3. 当前种子文件缓冲区调整完毕后,即开始对其虚拟结构树进行调整。该过程涉及到块内字节偏移的调整(increase_byte_positions_except_target_children)、新块节点及其子树的构建(copy_children_with_new_offset):

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    /* Update the chunks */
    increase_byte_positions_except_target_children(
        current_queue_entry->chunk, target_parent_chunk,
        target_start_byte, source_child_chunk_size);
     
    /* Create new chunk node */
    struct chunk *new_child_chunk =
        (struct chunk *)malloc(sizeof(struct chunk));
    new_child_chunk->id = (unsigned long)new_child_chunk;
    new_child_chunk->type = source_child_chunk->type;
    new_child_chunk->start_byte = target_parent_end_byte + 1;
    new_child_chunk->end_byte =
        target_parent_end_byte + source_child_chunk_size;
    new_child_chunk->modifiable = source_child_chunk->modifiable;
    new_child_chunk->next = target_parent_chunk->children;
    new_child_chunk->children = copy_children_with_new_offset(
        new_child_chunk->start_byte, source_child_chunk->start_byte,
        source_child_chunk->children);
    target_parent_chunk->children = new_child_chunk;
     
    /* Flag that we have changed the structure */
    changed_structure = 1;
     
    /* Free the source buffer */
    ck_free(source_buf);

最终实现的效果如其论文中所示的一致:

 

p04

四、基于有效性的种子能量调度(Valid-based Power Schedule)

在AFLSMART中,能量指一个种子文件在种子队列中被选中所需的等待时间,能量越高,等待时间越短,优先级越高。在AFL中,体积小、执行时间短的种子文件将被分配更多的能量;而AFLSMART通过Peach获取种子文件的有效性,有效性越高的种子将获得越高的能量,具体的能量分配公式如下:

 

p05

 

其中,$s$表示种子文件;$p_v(s)$表示种子文件$s$在AFLSMART中被分配的能量;$p(s)$表示种子文件$s$在AFL中被分配的能量;$v(s)$表示种子文件$s$的有效性,由peach获取;$U$表示设定的最大能量。

 

可以看出,种子的有效性越高,AFLSMART为其分配的能量越多。

 

种子有效性计算的对应代码位于update_input_structure(),实现在afl-fuzz.c,在前文虚拟结构表示一节中已有表述。

后记

相较于传统的对文件内容进行比特、字节级别突变的fuzzer,AFLSMART实现的块级突变还是挺有意思的,在此写下自己的一些学习总结。新人帖,无论是内容还是排版等,都欢迎大家批评指正。(~ ̄▽ ̄)~

 

AFLSMART项目地址:https://github.com/aflsmart/aflsmart

 

AFLSMART文章:https://thuanpv.github.io/publications/TSE19_aflsmart.pdf


[培训]内核驱动高级班,冲击BAT一流互联网大厂工作,每周日13:00-18:00直播授课

最后于 2021-11-12 09:25 被gaveu屯烫烫编辑 ,原因: 增加关于Peach中File Cracker组件源码的说明
收藏
点赞5
打赏
分享
最新回复 (3)
雪    币: 12744
活跃值: (16282)
能力值: (RANK:730 )
在线值:
发帖
回帖
粉丝
有毒 10 2021-11-12 11:44
2
0
牛皮!
雪    币: 12744
活跃值: (16282)
能力值: (RANK:730 )
在线值:
发帖
回帖
粉丝
有毒 10 2021-11-12 11:51
4
0
写这论文的大佬的其他几个项目也很厉害,可以把他的论文全部搞一遍
游客
登录 | 注册 方可回帖
返回