[Original] Reading the Source Code of Dobby, an Instruction-Level Tool
2022-6-29 20:14 22149


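Out of necessity I recently read through Dobby's source code, so I'm writing up my notes here to reinforce what I learned and, hopefully, to give a little help to whoever needs it later. Dobby has two features: inline hooking and instruction instrumentation. Both work on similar principles, so this post focuses on instrumentation. Instruction instrumentation means placing a probe at an arbitrary instruction (a function entry or anywhere inside a function): when execution reaches that instruction, our callback runs first, and then the original instruction stream resumes. Usage: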

// handler is our custom callback
int res_instrument = DobbyInstrument((void *) addr, offset_name_handler);

// RegisterContext is the saved register context; HookEntryInfo carries the
// information the hook needs, such as the hook address
void offset_name_handler(RegisterContext *ctx, const HookEntryInfo *info);

// 32-bit ARM layout
typedef struct _RegisterContext {
  uint32_t dummy_0;
  uint32_t dummy_1;

  uint32_t dummy_2;
  uint32_t sp;

  union {
    uint32_t r[13];
    struct {
      uint32_t r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12;
    } regs;
  } general;

  uint32_t lr;
} RegisterContext;

// HookEntryInfo holds the hook id and address
typedef struct _HookEntryInfo {
  int hook_id;
  union {
    void *target_address;
    void *function_address;
    void *instruction_address;
  };
} HookEntryInfo;

## How It Works

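They say one conversation with a wise person is worth ten years of study; well, one picture is worth ten conversations. A diagram is the best way to explain this, and I drew one as I worked through the code.
[Figure: flow diagram of Dobby's instruction instrumentation path]
The instrumented instruction is overwritten with: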

------------------------------------------------------------------------------
process 6089
0x9d639d32 nop
0x9d639d34 ldr.w pc, [pc, #-0x0]
0x9d639d38  //address 0xcea0a0ac
0x9d639d38            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  ac a0 a0 ce                                      ....
------------------------------------------------------------------------------
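A minimal C++ sketch of how bytes like the ones above can be produced, assuming a Thumb target and the absolute form shown in the dump (an alignment nop when needed, ldr.w pc, [pc, #-0], then the 32-bit target). make_thumb_trampoline is a hypothetical name, not Dobby's API; the encodings are the standard Thumb ones (NOP T1 = 0xbf00, LDR literal T2 with U = 0 = 0xf85f 0xf000).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Dobby's API): builds the bytes written over a Thumb
// instruction at patch_addr so that execution jumps to target.
// Layout: [nop if needed] ldr.w pc, [pc, #-0] ; .word target
std::vector<uint8_t> make_thumb_trampoline(uint32_t patch_addr, uint32_t target) {
  assert(patch_addr % 2 == 0);  // Thumb code is halfword aligned (Thumb bit already stripped)
  std::vector<uint8_t> out;
  auto emit16 = [&out](uint16_t hw) {  // halfwords are stored little-endian
    out.push_back(static_cast<uint8_t>(hw & 0xff));
    out.push_back(static_cast<uint8_t>(hw >> 8));
  };
  // A Thumb literal load uses PC rounded down to a multiple of 4 as its base, so the
  // ldr.w must start on a 4-byte boundary for "#-0" to hit the word right after it.
  if (patch_addr % 4 != 0) emit16(0xbf00);  // NOP (T1)
  emit16(0xf85f);  // LDR (literal) T2, U = 0, first halfword
  emit16(0xf000);  // Rt = PC (15), imm12 = 0
  for (int i = 0; i < 4; ++i)  // .word target, little-endian
    out.push_back(static_cast<uint8_t>((target >> (8 * i)) & 0xff));
  return out;
}
```

For the example, make_thumb_trampoline(0x9d639d32, 0xcea0a0ac) returns 10 bytes whose last four are ac a0 a0 ce, the same literal shown in the dump; at a 4-byte-aligned address the nop is dropped and the patch is 8 bytes.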

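ARM processors are pipelined: fetch, decode, and execute overlap, so the PC register points at the instruction being fetched rather than the one executing. In ARM state PC reads as the address of the current instruction + 8; in Thumb state it reads as the current address + 4. So when the Thumb ldr above executes, PC equals the ldr's address + 4 (0x9d639d38), and ldr.w pc, [pc, #-0x0] loads exactly the next word into PC, i.e. it jumps. Because the target is a full register-width (32-bit) value, this form can reach any address in a 4 GB space, which is the whole virtual address space of a 32-bit Linux process, so it can jump anywhere in the process. Where does it jump? To prologue_dispatch_bridge:
prologue_dispatch_bridge (0xcea0a0ac)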

0xcea0a0ac ldr ip, [pc]
0xcea0a0b0 ldr pc, [pc]
0xcea0a0b4  //address 0xa2305b80
0xcea0a0b8  //address 0xcea0a000
0xcea0a0b4            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  80 5b 30 a2 00 a0 a0 ce

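It does two things: first, it loads 0xa2305b80 into the ip register; second, it jumps to 0xcea0a000. Note that these are ARM-state instructions, so the PC offset is 8. Judging by CreateClosureTrampoline further down, 0xa2305b80 is the closure trampoline entry emitted right after the two ldr instructions; the closure bridge later passes it to the handler as its second argument (mov r1, ip).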
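The four words of prologue_dispatch_bridge can be rebuilt by hand. The sketch below assumes ARM state and the standard A1 encoding of LDR (literal) with U = 1 (0xe59f0000 | Rt << 12 | imm12); make_dispatch_bridge, arm_ldr_literal and arm_literal_address are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// In ARM state an instruction reads PC as its own address + 8, so a literal
// load "ldr Rt, [pc, #imm]" reads from insn_addr + 8 + imm.
uint32_t arm_literal_address(uint32_t insn_addr, int32_t imm) {
  return insn_addr + 8 + static_cast<uint32_t>(imm);  // wraps correctly for negative imm
}

// Encodes "ldr Rt, [pc, #imm12]" (A1 encoding, cond = AL, U = 1: add the offset).
uint32_t arm_ldr_literal(uint32_t rt, uint32_t imm12) {
  assert(rt < 16 && imm12 < 4096);
  return 0xe59f0000u | (rt << 12) | imm12;
}

// Hypothetical rebuild of the 16-byte prologue_dispatch_bridge:
//   ldr ip, [pc]        ; ip = closure trampoline entry (reads word 2)
//   ldr pc, [pc]        ; jump to the shared closure bridge (reads word 3)
//   .word closure_entry
//   .word closure_bridge
std::array<uint32_t, 4> make_dispatch_bridge(uint32_t closure_entry, uint32_t closure_bridge) {
  return {arm_ldr_literal(12, 0), arm_ldr_literal(15, 0), closure_entry, closure_bridge};
}
```

Placed at 0xcea0a0ac, word 2 sits at 0xcea0a0b4 and word 3 at 0xcea0a0b8, matching the dump; by the same rule, the ldr pc, [pc, #-4] at 0xcea0a05c in the closure bridge reads 0xcea0a060.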
Here, 0xcea0a000 is:
closure bridge, first half

0xcea0a000 sub sp, sp, #0x38
0xcea0a004 str lr, [sp, #0x34]
0xcea0a008 str ip, [sp, #0x30]
0xcea0a00c str fp, [sp, #0x2c]
0xcea0a010 str sl, [sp, #0x28]
0xcea0a014 str sb, [sp, #0x24]
0xcea0a018 str r8, [sp, #0x20]
0xcea0a01c str r7, [sp, #0x1c]
0xcea0a020 str r6, [sp, #0x18]
0xcea0a024 str r5, [sp, #0x14]
0xcea0a028 str r4, [sp, #0x10]
0xcea0a02c str r3, [sp, #0xc]
0xcea0a030 str r2, [sp, #8]
0xcea0a034 str r1, [sp, #4]
0xcea0a038 str r0, [sp]
0xcea0a03c add r0, sp, #0x38
0xcea0a040 sub sp, sp, #8
0xcea0a044 str r0, [sp, #4]
0xcea0a048 sub sp, sp, #8
0xcea0a04c mov r0, sp
0xcea0a050 mov r1, ip
0xcea0a054 bl #0xcea0a05c
0xcea0a058 b #0xcea0a064
0xcea0a05c ldr pc, [pc, #-4]
0xcea0a060 //address 0x9d2b43e1
0xcea0a060            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  e1 43 2b 9d

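The word 0x9d2b43e1 at 0xcea0a060 is the address of the higher-level handler (judging by get_closure_bridge further down, intercept_routing_common_bridge_handler), which in turn ends up calling our custom handler. The function that finally invokes it is this one:
instrument_call_forward_handler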

void instrument_call_forward_handler(RegisterContext *ctx, HookEntry *entry) {
  DynamicBinaryInstrumentRouting *route = (DynamicBinaryInstrumentRouting *)entry->route;
  if (route->handler) {
    DBICallTy handler;
    HookEntryInfo entry_info;
    entry_info.hook_id = entry->id;
    entry_info.instruction_address = entry->instruction_address;
    handler = (DBICallTy)route->handler;
    (*handler)(ctx, (const HookEntryInfo *)&entry_info);
  }
 
  // set prologue bridge next hop address with origin instructions that have been relocated(patched)
  set_routing_bridge_next_hop(ctx, entry->relocated_origin_instructions);
}

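Besides calling our handler, this function also does something sneaky, which I'll come back to shortly.
To recap the first half of the closure bridge: it saves the register state; at 0xcea0a054, bl jumps to 0xcea0a05c, which loads the higher-level handler's address with ldr and jumps to it. Note that bl puts the address of the next instruction, 0xcea0a058, into lr, so when the called function returns it comes back to 0xcea0a058, b #0xcea0a064. Here is what is at 0xcea0a064:
closure bridge, second half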

0xcea0a064 add sp, sp, #8
0xcea0a068 add sp, sp, #8
0xcea0a06c pop {r0}
0xcea0a070 pop {r1}
0xcea0a074 pop {r2}
0xcea0a078 pop {r3}
0xcea0a07c pop {r4}
0xcea0a080 pop {r5}
0xcea0a084 pop {r6}
0xcea0a088 pop {r7}
0xcea0a08c pop {r8}
0xcea0a090 pop {sb}
0xcea0a094 pop {sl}
0xcea0a098 pop {fp}
0xcea0a09c pop {ip}
0xcea0a0a0 pop {lr}
0xcea0a0a4 mov pc, ip

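This half is routine: it pops the registers saved by the first half and restores the stack pointer. The one unusual thing is the last instruction, mov pc, ip, which jumps to the address held in ip. So what does ip hold? Remember the sneaky thing mentioned earlier? It is the last statement of instrument_call_forward_handler: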

// set prologue bridge next hop address with origin instructions that have been relocated(patched)
  set_routing_bridge_next_hop(ctx, entry->relocated_origin_instructions);
 
void set_routing_bridge_next_hop(RegisterContext *ctx, void *address) {
  *reinterpret_cast<void **>(&ctx->general.regs.r12) = address;
}

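That is, it stores entry->relocated_origin_instructions into r12. Because ctx points at the register frame that the first half of the closure bridge saved on the stack, writing ctx->general.regs.r12 overwrites the saved ip, and that is exactly the value the second half pops into ip right before mov pc, ip. entry->relocated_origin_instructions is where the relocated original instructions live: the original instructions were overwritten with the jump (in this Thumb example, ldr.w pc, [pc, #-0x0] plus an address word), so the overwritten instructions are fixed up and placed at entry->relocated_origin_instructions (instruction fixup is covered below). After the fixed-up instructions run, execution jumps back to the instructions that follow the patched region and continues. The process looks roughly like this: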
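Original instructions
[Figure: the original instructions at the instrumentation point]
Because the patch needs at least 8 bytes and the original code here is Thumb, four instructions were overwritten (10 bytes in this case, 0x9d639d32 to 0x9d639d3b, including the alignment nop). The fixed-up copies:
Relocated instructions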

------------------------------------------------------------------------------
process 6089
0xcea0a0c0 nop
0xcea0a0c2 nop
0xcea0a0c4 push {r0, r1, r2, lr}
0xcea0a0c6 nop
0xcea0a0c8 cbz r0, #0xcea0a0cc
0xcea0a0ca nop
0xcea0a0cc b.w #0xcea0a0d0
0xcea0a0d0 ldr.w pc, [pc, #0x14]  ; literal at 0xcea0a0d0 + 4 (Thumb PC offset) + 0x14 = 0xcea0a0e8, i.e. 0x9d639d45
0xcea0a0d4 nop
0xcea0a0d6 nop
0xcea0a0d8 add r2, sp, #8
0xcea0a0da nop
0xcea0a0dc str r1, [r2, #-0x4]!
0xcea0a0e0 ldr.w pc, [pc, #-0x0]   ; likewise, loads 0x9d639d3d
0xcea0a0e4 //address 0x9d639d3d
0xcea0a0e8 //address 0x9d639d45
0xcea0a0e4            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  3d 9d 63 9d 45 9d 63 9d                          =.c.E.c.
------------------------------------------------------------------------------

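The fixup logic is: PC-relative instructions are rewritten so that an ldr sends them to the correct location, and PC-independent instructions are copied as-is. Here the push, add, and str.w instructions are copied verbatim. The inserted nops keep 4-byte alignment; I wasn't sure why that is needed, since as I recall Thumb instructions only need 2-byte alignment. (A likely reason: a Thumb literal load uses PC rounded down to a multiple of 4 as its base, so ldr.w pc, [pc, #-0x0] only picks up the word stored right after it when the ldr itself starts on a 4-byte boundary.) Of the original instructions, only cbz is PC-relative. It means: if r0 is zero, branch to the given target. In this example the target is the cbz address + 0x10 (the Thumb PC offset of 4 plus an encoded offset of 0xc), i.e. module offset 0x25ED44. Dobby fixes it by rewriting the cbz: if r0 is zero, the ldr at 0xcea0a0d0 (ldr.w pc, [pc, #0x14]) jumps back to offset 0x25ED44 (0x9d639d45); if r0 is non-zero, execution is meant to continue with the copied add and str and then reach the ldr at 0xcea0a0e0 (ldr.w pc, [pc, #-0x0]), which jumps to the instruction right after the patched region, offset 0x25ED3C (0x9d639d3d). Both addresses have 1 added because the original code is Thumb: when an address is loaded into PC, the processor uses its least significant bit to choose the state, and 1 means Thumb. Note, though, that read literally the listing above sends both outcomes to the same place: cbz branches to 0xcea0a0cc, the fall-through nop also reaches 0xcea0a0cc, and the b.w there lands on the ldr at 0xcea0a0d0, so the non-zero path would also go to 0x9d639d45; for it to run the add and str, that b.w would have to target 0xcea0a0d4. That completes the logic of Dobby's instruction instrumentation.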
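The "four instructions, 10 bytes" above follows from how Thumb instruction lengths are decoded: in general a relocator has to take whole instructions until the trampoline is covered. A sketch of that rule; thumb_span_to_relocate is a hypothetical helper, and the halfwords in the usage example are my own hand-assembled encodings of the four original instructions (the post only shows the relocated copies, not the original bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A Thumb halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the
// first half of a 32-bit (Thumb-2) instruction; anything else is a 16-bit instruction.
size_t thumb_insn_size(uint16_t first_halfword) {
  uint16_t top5 = static_cast<uint16_t>(first_halfword >> 11);
  return (top5 == 0x1d || top5 == 0x1e || top5 == 0x1f) ? 4 : 2;
}

struct RelocSpan {
  size_t insns;  // number of whole instructions covered
  size_t bytes;  // their total size, which may exceed the trampoline size
};

// Hypothetical helper: how many whole instructions starting at code must be
// moved out of the way so that a trampoline of tramp_size bytes fits.
RelocSpan thumb_span_to_relocate(const uint16_t *code, size_t n_halfwords, size_t tramp_size) {
  RelocSpan span{0, 0};
  size_t i = 0;
  while (span.bytes < tramp_size) {
    assert(i < n_halfwords);  // ran out of decoded code
    size_t len = thumb_insn_size(code[i]);
    span.bytes += len;
    span.insns += 1;
    i += len / 2;
  }
  return span;
}
```

With {0xb507, 0xb130, 0xaa02, 0xf842, 0x1d04} (push {r0, r1, r2, lr}; cbz r0, +0xc; add r2, sp, #8; str r1, [r2, #-4]!), a 10-byte trampoline covers exactly 4 instructions; even an 8-byte trampoline would still need all 10 bytes, because the 32-bit str cannot be split.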

## Code Walkthrough

PUBLIC int DobbyInstrument(void *address, DBICallTy handler) {
  if (!address) {
    ERROR_LOG("the function address is 0x0.\n");
    return RS_FAILED;
  }
 
  RAW_LOG(1, "\n\n");
  DLOG(0, "[DobbyInstrument] Initialize at %p", address);
 
  // check if we already instrument this address
  HookEntry *entry = Interceptor::SharedInstance()->FindHookEntry(address);
  if (entry) {
    DynamicBinaryInstrumentRouting *route = (DynamicBinaryInstrumentRouting *)entry->route;
    if (route->handler == handler) {
      ERROR_LOG("instruction %p already been instrumented.", address);
      return RS_FAILED;
    }
  }
 
  entry = new HookEntry();
  entry->id = Interceptor::SharedInstance()->GetHookEntryCount();
  entry->type = kDynamicBinaryInstrument;
  entry->instruction_address = address;
 
  DynamicBinaryInstrumentRouting *route = new DynamicBinaryInstrumentRouting(entry, (void *)handler);
  route->Prepare();
 
  // the key method
  route->DispatchRouting();
  Interceptor::SharedInstance()->AddHookEntry(entry);
  route->Commit();
 
  return RS_SUCCESS;
}

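DobbyInstrument first searches a list of HookEntry records. The list stores the information for every instrumentation: each instrumented instruction gets a HookEntry that is added to the list, so searching it tells us whether the current instruction has already been instrumented (note that the code above only refuses when the existing entry has the same handler). route->DispatchRouting() is the key method and does almost all of the instrumentation work; it calls two methods, BuildDynamicBinaryInstrumentRouting() and GenerateRelocatedCode(trampoline_buffer_->getSize()):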

void DynamicBinaryInstrumentRouting::DispatchRouting() {
  BuildDynamicBinaryInstrumentRouting();
 
  // generate relocated code which size == trampoline size
  GenerateRelocatedCode(trampoline_buffer_->getSize());
}

BuildDynamicBinaryInstrumentRouting()

void DynamicBinaryInstrumentRouting::BuildDynamicBinaryInstrumentRouting() {
  // create closure trampoline jump to prologue_routing_dispath with the `entry_` data
 
  ClosureTrampolineEntry *closure_trampoline;
 
  void *handler = (void *)instrument_routing_dispatch;
#if __APPLE__
#if __has_feature(ptrauth_calls)
  handler = __builtin_ptrauth_strip(handler, ptrauth_key_asia);
#endif
#endif
 
 
  closure_trampoline = ClosureTrampoline::CreateClosureTrampoline(entry_, handler);
  this->SetTrampolineTarget(closure_trampoline->address);
 
 
  DLOG(0, "[closure bridge] Carry data %p ", entry_);
 
 
  DLOG(0, "[closure bridge] Create prologue_dispatch_bridge %p", closure_trampoline->address);
 
  // generate trampoline buffer, run before `GenerateRelocatedCode`
  GenerateTrampolineBuffer(entry_->target_address, GetTrampolineTarget());
}

Here, closure_trampoline = ClosureTrampoline::CreateClosureTrampoline(entry_, handler); generates the prologue_dispatch_bridge instructions. The key line inside it is _ EmitAddress((uint32_t)get_closure_bridge()): the closure_bridge instructions are generated in get_closure_bridge():

ClosureTrampolineEntry *ClosureTrampoline::CreateClosureTrampoline(void *carry_data, void *carry_handler) {
 
 
  ClosureTrampolineEntry *entry = nullptr;
  entry = new ClosureTrampolineEntry;
 
#ifdef ENABLE_CLOSURE_TRAMPOLINE_TEMPLATE
#define CLOSURE_TRAMPOLINE_SIZE (7 * 4)
  // use closure trampoline template code, find the executable memory and patch it.
  Code *code = Code::FinalizeCodeFromAddress(closure_trampoline_template, CLOSURE_TRAMPOLINE_SIZE);
#else
 
// use assembler and codegen modules instead of template_code
#include "TrampolineBridge/ClosureTrampolineBridge/AssemblyClosureTrampoline.h"
#define _ turbo_assembler_.
  TurboAssembler turbo_assembler_(0);
 
  PseudoLabel entry_label;
  PseudoLabel forward_bridge_label;
 
 
 
  _ Ldr(r12, &entry_label);
  _ Ldr(pc, &forward_bridge_label);
  _ PseudoBind(&entry_label);
  _ EmitAddress((uint32_t)entry);
  _ PseudoBind(&forward_bridge_label);
  _ EmitAddress((uint32_t)get_closure_bridge());
 
  AssemblyCodeChunk *code = nullptr;
  code = AssemblyCodeBuilder::FinalizeFromTurboAssembler(&turbo_assembler_);
 
  entry->address = (void *)code->raw_instruction_start();
  entry->size = code->raw_instruction_size();
  entry->carry_data = carry_data;
  entry->carry_handler = carry_handler;
 
  delete code;
  return entry;
#endif
}
void *get_closure_bridge() {
 
  // if already initialized, just return.
  if (closure_bridge)
    return closure_bridge;
 
// check if enable the inline-assembly closure_bridge_template
#if ENABLE_CLOSURE_BRIDGE_TEMPLATE
  extern void closure_bridge_template();
  closure_bridge = (void *)closure_bridge_template;
// otherwise, use the Assembler build the closure_bridge
#else
#define _ turbo_assembler_.
  TurboAssembler turbo_assembler_(0);
 
  _ sub(sp, sp, Operand(14 * 4));
  _ str(lr, MemOperand(sp, 13 * 4));
  _ str(r12, MemOperand(sp, 12 * 4));
  _ str(r11, MemOperand(sp, 11 * 4));
  _ str(r10, MemOperand(sp, 10 * 4));
  _ str(r9, MemOperand(sp, 9 * 4));
  _ str(r8, MemOperand(sp, 8 * 4));
  _ str(r7, MemOperand(sp, 7 * 4));
  _ str(r6, MemOperand(sp, 6 * 4));
  _ str(r5, MemOperand(sp, 5 * 4));
  _ str(r4, MemOperand(sp, 4 * 4));
  _ str(r3, MemOperand(sp, 3 * 4));
  _ str(r2, MemOperand(sp, 2 * 4));
  _ str(r1, MemOperand(sp, 1 * 4));
  _ str(r0, MemOperand(sp, 0 * 4));
 
  // store sp
  _ add(r0, sp, Operand(14 * 4));
  _ sub(sp, sp, Operand(8));
  _ str(r0, MemOperand(sp, 4));
 
  // stack align
  _ sub(sp, sp, Operand(8));
 
  _ mov(r0, Operand(sp));
  _ mov(r1, Operand(r12));
 
  _ CallFunction(ExternalReference((void *)intercept_routing_common_bridge_handler));
 
  // stack align
  _ add(sp, sp, Operand(8));
 
  // restore sp placeholder stack
  _ add(sp, sp, Operand(8));
 
  _ ldr(r0, MemOperand(sp, 4, PostIndex));
  _ ldr(r1, MemOperand(sp, 4, PostIndex));
  _ ldr(r2, MemOperand(sp, 4, PostIndex));
  _ ldr(r3, MemOperand(sp, 4, PostIndex));
  _ ldr(r4, MemOperand(sp, 4, PostIndex));
  _ ldr(r5, MemOperand(sp, 4, PostIndex));
  _ ldr(r6, MemOperand(sp, 4, PostIndex));
  _ ldr(r7, MemOperand(sp, 4, PostIndex));
  _ ldr(r8, MemOperand(sp, 4, PostIndex));
  _ ldr(r9, MemOperand(sp, 4, PostIndex));
  _ ldr(r10, MemOperand(sp, 4, PostIndex));
  _ ldr(r11, MemOperand(sp, 4, PostIndex));
  _ ldr(r12, MemOperand(sp, 4, PostIndex));
  _ ldr(lr, MemOperand(sp, 4, PostIndex));
 
  // auto switch A32 & T32 with `least significant bit`, refer `docs/A32_T32_states_switch.md`
  _ mov(pc, Operand(r12));
 
  AssemblyCodeChunk *code = AssemblyCodeBuilder::FinalizeFromTurboAssembler(&turbo_assembler_);
  closure_bridge = (void *)code->raw_instruction_start();
 
  DLOG(0, "[closure bridge] Build the closure bridge at %p", closure_bridge);
#endif
  return (void *)closure_bridge;
}
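Why can the handler treat the stack as RegisterContext *ctx? The sketch below replays the first half of the bridge above on a fake stack and checks that the resulting frame lines up with the 32-bit RegisterContext from the start of the post. FakeStack and run_bridge_prologue are illustrative names, not Dobby code, and the struct is a copy of the one shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy of the 32-bit ARM RegisterContext shown at the top of the post.
struct RegisterContext {
  uint32_t dummy_0, dummy_1, dummy_2, sp;
  union {
    uint32_t r[13];
    struct {
      uint32_t r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12;
    } regs;
  } general;
  uint32_t lr;
};
static_assert(sizeof(RegisterContext) == 18 * 4, "4 header words + r0..r12 + lr");
static_assert(offsetof(RegisterContext, sp) == 12, "sp placeholder at ctx + 12");
static_assert(offsetof(RegisterContext, general) == 16, "r0 at ctx + 16");
// set_routing_bridge_next_hop writes ctx + 0x40: the word stored by "str ip, [sp, #0x30]",
// which the second half pops into ip right before "mov pc, ip".
static_assert(offsetof(RegisterContext, general) + 12 * 4 == 0x40, "saved ip slot");

// Fake memory covering the 128 words just below top (illustrative only).
struct FakeStack {
  uint32_t top;
  uint32_t words[128];
  uint32_t &at(uint32_t addr) {
    uint32_t base = top - static_cast<uint32_t>(sizeof(words));
    assert(addr >= base && addr < top && addr % 4 == 0);
    return words[(addr - base) / 4];
  }
};

// Replays the first half of the closure bridge and returns the struct that
// r0 (= sp) points to when the handler is called.
RegisterContext run_bridge_prologue(FakeStack &mem, const uint32_t r[13], uint32_t lr) {
  uint32_t sp = mem.top;
  sp -= 14 * 4;                          // sub sp, sp, #0x38
  mem.at(sp + 13 * 4) = lr;              // str lr, [sp, #0x34]
  for (uint32_t i = 0; i < 13; ++i)      // str r12..r0, [sp, #i*4]
    mem.at(sp + i * 4) = r[i];
  uint32_t caller_sp = sp + 14 * 4;      // add r0, sp, #0x38
  sp -= 8;                               // sub sp, sp, #8
  mem.at(sp + 4) = caller_sp;            // str r0, [sp, #4]
  sp -= 8;                               // sub sp, sp, #8 (stack alignment)
  RegisterContext ctx;                   // mov r0, sp
  std::memcpy(&ctx, &mem.at(sp), sizeof ctx);
  return ctx;
}
```

The two unwritten words at the bottom become dummy_0/dummy_1, the unwritten slot of the first sub sp, sp, #8 becomes dummy_2, and the stored caller sp lands in ctx->sp.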

BuildDynamicBinaryInstrumentRouting() also calls GenerateTrampolineBuffer(entry_->target_address, GetTrampolineTarget()). This method builds the TrampolineBuffer, i.e. the instructions that overwrite the original instructions (the second small box in the flow diagram):

bool InterceptRouting::GenerateTrampolineBuffer(void *src, void *dst) {
  CodeBufferBase *trampoline_buffer = NULL;
  // if near branch trampoline plugin enabled
  if (RoutingPluginManager::near_branch_trampoline) {
    RoutingPluginInterface *plugin = NULL;
    plugin = reinterpret_cast<RoutingPluginInterface *>(RoutingPluginManager::near_branch_trampoline);
    if (plugin->GenerateTrampolineBuffer(this, src, dst) == false) {
      DLOG(0, "Failed enable near branch trampoline plugin");
    }
  }
 
  if (this->GetTrampolineBuffer() == NULL) {
    trampoline_buffer = GenerateNormalTrampolineBuffer((addr_t)src, (addr_t)dst);
    this->SetTrampolineBuffer(trampoline_buffer);
 
    DLOG(0, "[trampoline] Generate trampoline buffer %p -> %p", src, dst);
  }
  return true;
}
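GenerateTrampolineBuffer first tries the near-branch plugin and falls back to the normal (absolute) trampoline. The post does not show what the near-branch plugin emits; assuming it is a single Thumb B.W (an assumption, not confirmed by the source above), the choice comes down to a range check like this hypothetical one. The normal-trampoline size matches the dump: 8 bytes, plus a nop when the address is not 4-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

// Thumb B.W (T4) encodes a signed, halfword-aligned 25-bit offset relative to
// PC = insn + 4, i.e. targets in [insn + 4 - 16 MiB, insn + 4 + 16 MiB - 2].
bool thumb_bw_reachable(uint32_t src, uint32_t dst) {
  int64_t off = static_cast<int64_t>(dst) - (static_cast<int64_t>(src) + 4);
  return off % 2 == 0 && off >= -(int64_t{1} << 24) && off <= (int64_t{1} << 24) - 2;
}

// Assumed patch sizes: 4 bytes for a near B.W (assumption), otherwise the absolute
// "ldr.w pc, [pc, #-0] + literal" form (8 bytes) plus an alignment nop when needed.
uint32_t thumb_trampoline_size(uint32_t src, uint32_t dst, bool near_plugin_enabled) {
  if (near_plugin_enabled && thumb_bw_reachable(src, dst)) return 4;
  return (src % 4 ? 2u : 0u) + 8u;
}
```

For the example (0x9d639d32 to 0xcea0a0ac), the distance is far beyond 16 MiB, so even with a near-branch option the absolute 10-byte form is needed.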

GenerateRelocatedCode(trampoline_buffer_->getSize())

bool InterceptRouting::GenerateRelocatedCode(int tramp_size) {
  // generate original code
  AssemblyCodeChunk *origin = NULL;
  origin = AssemblyCodeBuilder::FinalizeFromAddress((addr_t)entry_->target_address, tramp_size);
  origin_ = origin;
 
  // generate the relocated code
  AssemblyCodeChunk *relocated = NULL;
  relocated = AssemblyCodeBuilder::FinalizeFromAddress(0, 0);
  relocated_ = relocated;
 
  void *relocate_buffer = NULL;
  relocate_buffer = entry_->target_address;
 
  GenRelocateCodeAndBranch(relocate_buffer, origin, relocated);
  if (relocated->raw_instruction_start() == 0)
    return false;
 
  // set the relocated instruction address
  entry_->relocated_origin_instructions = (void *)relocated->raw_instruction_start();
 
  DLOG(0, "[insn relocate] origin %p - %d", origin->raw_instruction_start(), origin->raw_instruction_size());
 
 
  DLOG(0, "[insn relocate] relocated %p - %d", relocated->raw_instruction_start(), relocated->raw_instruction_size());
 
 
  // save original prologue
  memcpy((void *)entry_->origin_chunk_.chunk_buffer, (void *)origin_->raw_instruction_start(),
         origin_->raw_instruction_size());
  entry_->origin_chunk_.chunk.re_init_region_range(origin_);
  return true;
}

The key call is GenRelocateCodeAndBranch(relocate_buffer, origin, relocated);, which generates the relocated code and places it in the memory region described by relocated.

void GenRelocateCodeAndBranch(void *buffer, AssemblyCodeChunk *origin, AssemblyCodeChunk *relocated) {
  CodeBuffer *code_buffer = new CodeBuffer(64);
 
  ThumbTurboAssembler thumb_turbo_assembler_(0, code_buffer);
#define thumb_ thumb_turbo_assembler_.
  TurboAssembler arm_turbo_assembler_(0, code_buffer);
#define arm_ arm_turbo_assembler_.
 
  Assembler *curr_assembler_ = NULL;
 
  AssemblyCodeChunk origin_chunk;
  origin_chunk.init_region_range(origin->raw_instruction_start(), origin->raw_instruction_size());
 
  bool entry_is_thumb = origin->raw_instruction_start() % 2;
  if (entry_is_thumb) {
    origin->re_init_region_range(origin->raw_instruction_start() - THUMB_ADDRESS_FLAG, origin->raw_instruction_size());
  }
 
  LiteMutableArray relo_map(8);
 
relocate_remain:
  addr32_t execute_state_changed_pc = 0;
 
  bool is_thumb = origin_chunk.raw_instruction_start() % 2;
  if (is_thumb) {
    curr_assembler_ = &thumb_turbo_assembler_;
 
    buffer = (void *)((addr_t)buffer - THUMB_ADDRESS_FLAG);
 
    addr32_t origin_code_start_aligned = origin_chunk.raw_instruction_start() - THUMB_ADDRESS_FLAG;
    // remove thumb address flag
    origin_chunk.re_init_region_range(origin_code_start_aligned, origin_chunk.raw_instruction_size());
 
    gen_thumb_relocate_code(&relo_map, &thumb_turbo_assembler_, buffer, &origin_chunk, relocated,
                            &execute_state_changed_pc);
    if (thumb_turbo_assembler_.GetExecuteState() == ARMExecuteState) {
      // relocate interrupt as execute state changed
      if (execute_state_changed_pc < origin_chunk.raw_instruction_start() + origin_chunk.raw_instruction_size()) {
        // re-init the origin
        int relocate_remain_size =
            origin_chunk.raw_instruction_start() + origin_chunk.raw_instruction_size() - execute_state_changed_pc;
        // current execute state is ARMExecuteState, so not need `+ THUMB_ADDRESS_FLAG`
        origin_chunk.re_init_region_range(execute_state_changed_pc, relocate_remain_size);
 
        // update buffer
        buffer = (void *)((addr_t)buffer + (execute_state_changed_pc - origin_code_start_aligned));
 
        // add nop to align ARM
        if (thumb_turbo_assembler_.pc_offset() % 4)
          thumb_turbo_assembler_.t1_nop();
        goto relocate_remain;
      }
    }
  } else {
    curr_assembler_ = &arm_turbo_assembler_;
 
    gen_arm_relocate_code(&relo_map, &arm_turbo_assembler_, buffer, &origin_chunk, relocated,
                          &execute_state_changed_pc);
    if (arm_turbo_assembler_.GetExecuteState() == ThumbExecuteState) {
      // relocate interrupt as execute state changed
      if (execute_state_changed_pc < origin_chunk.raw_instruction_start() + origin_chunk.raw_instruction_size()) {
        // re-init the origin
        int relocate_remain_size =
            origin_chunk.raw_instruction_start() + origin_chunk.raw_instruction_size() - execute_state_changed_pc;
        // current execute state is ThumbExecuteState, add THUMB_ADDRESS_FLAG
        origin_chunk.re_init_region_range(execute_state_changed_pc + THUMB_ADDRESS_FLAG, relocate_remain_size);
 
        // update buffer
        buffer = (void *)((addr_t)buffer + (execute_state_changed_pc - origin_chunk.raw_instruction_start()));
        goto relocate_remain;
      }
    }
  }
 
  // TODO:
  // if last instr is unlink branch, skip
  //dkl: jump back to just after the instrumentation point and continue executing
  addr32_t rest_instr_addr = origin_chunk.raw_instruction_start() + origin_chunk.raw_instruction_size();
  if (curr_assembler_ == &thumb_turbo_assembler_) {
    // Branch to the rest of instructions
    thumb_ AlignThumbNop();
    thumb_ t2_ldr(pc, MemOperand(pc, 0));
    // Get the real branch address
    thumb_ EmitAddress(rest_instr_addr + THUMB_ADDRESS_FLAG);
  } else {
    // Branch to the rest of instructions
    CodeGen codegen(&arm_turbo_assembler_);
    // Get the real branch address
    codegen.LiteralLdrBranch(rest_instr_addr);
  }
 
  // Realize all the Pseudo-Label-Data
  thumb_turbo_assembler_.RelocBind();
 
  // Realize all the Pseudo-Label-Data
  //dkl: this is where the ldr instructions previously linked to labels get fixed up
  arm_turbo_assembler_.RelocBind();
 
  // Generate executable code
  {
    // assembler without specific memory address
    AssemblyCodeChunk *cchunk;
    cchunk = MemoryArena::AllocateCodeChunk(code_buffer->getSize());
    if (cchunk == nullptr)
      return;
 
    thumb_turbo_assembler_.SetRealizedAddress(cchunk->address);
    arm_turbo_assembler_.SetRealizedAddress(cchunk->address);
 
    // fixup the instr branch into trampoline(has been modified)
 
    reloc_label_fixup(origin, &relo_map, &thumb_turbo_assembler_, &arm_turbo_assembler_);
 
    AssemblyCodeChunk *code = NULL;
    code = AssemblyCodeBuilder::FinalizeFromTurboAssembler(curr_assembler_);
    relocated->re_init_region_range(code->raw_instruction_start(), code->raw_instruction_size());
    delete code;
  }
 
  // thumb
  if (entry_is_thumb) {
    // add thumb address flag
    relocated->re_init_region_range(relocated->raw_instruction_start() + THUMB_ADDRESS_FLAG,
                                    relocated->raw_instruction_size());
  }
 
  // clean
  {
    thumb_turbo_assembler_.ClearCodeBuffer();
    arm_turbo_assembler_.ClearCodeBuffer();
 
    delete code_buffer;
  }
}
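The RelocBind comments above describe a two-pass scheme: each ldr is emitted with a placeholder offset and patched only after all instructions exist and the literal addresses are known. A minimal sketch of that idea for Thumb ldr.w pc, [pc, #imm], assuming the literals are appended after the code, word-aligned, and that the buffer starts on a 4-byte boundary. ThumbLiteralBuffer is a hypothetical class, not Dobby's ThumbTurboAssembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class ThumbLiteralBuffer {
 public:
  // Emits a 16-bit instruction (or one half of a 32-bit one).
  void emit16(uint16_t hw) { code_.push_back(hw); }

  // Emits "ldr.w pc, [pc, #imm]" with imm not known yet and records the target.
  void emit_ldr_pc(uint32_t target) {
    if (code_.size() % 2) emit16(0xbf00);  // NOP so the ldr.w starts 4-byte aligned
    pending_.push_back({code_.size(), target});
    emit16(0xf8df);  // LDR (literal) T2, U = 1
    emit16(0xf000);  // Rt = PC, imm12 = 0 placeholder
  }

  // Second pass (call once): append the literal pool and patch every imm12.
  std::vector<uint16_t> finalize() {
    if (code_.size() % 2) emit16(0xbf00);  // pool must be word aligned
    for (const Pending &p : pending_) {
      size_t literal_off = code_.size() * 2;                       // byte offset of the literal
      size_t pc = (p.halfword_index * 2 + 4) & ~static_cast<size_t>(3);  // Align(insn + 4, 4)
      size_t imm = literal_off - pc;
      assert(imm < 4096);
      code_[p.halfword_index + 1] = static_cast<uint16_t>(0xf000 | imm);
      emit16(static_cast<uint16_t>(p.target & 0xffff));  // little-endian word
      emit16(static_cast<uint16_t>(p.target >> 16));
    }
    return code_;
  }

 private:
  struct Pending {
    size_t halfword_index;  // where the ldr.w starts
    uint32_t target;        // value the ldr must load into PC
  };
  std::vector<uint16_t> code_;
  std::vector<Pending> pending_;
};
```

For example, emitting a push and then two ldr.w placeholders for 0x9d639d45 and 0x9d639d3d inserts one nop, patches both ldr.w to #4, and appends the two target words, so each ldr lands exactly on its literal.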

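This is getting long-winded, so let me focus on instruction fixup. In our example the instruction to fix is a 16-bit Thumb (Thumb1) instruction, which ends up here. I've left out the fixups for other instruction types and kept only cbz, and I won't go into every detail. The general idea is to jump with ldr pc, [pc, #xxx], but when the ldr is first emitted, xxx is not meaningful yet. Once all the relocated instructions have been generated, these ldr instructions are fixed up, because the addresses they load are stored after all the instructions; you can see this in the flow diagram and in the assembly of each block above, where the addresses are always stored at the end.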

static void Thumb1RelocateSingleInstr(ThumbTurboAssembler *turbo_assembler, LiteMutableArray *thumb_labels,
                                      int16_t instr, addr32_t from_pc, addr32_t to_pc,
                                      addr32_t *execute_state_changed_pc_ptr) {
  bool is_instr_relocated = false;
 
  _ AlignThumbNop();
 
  uint32_t val = 0, op = 0, rt = 0, rm = 0, rn = 0, rd = 0, shift = 0, cond = 0;
  int32_t offset = 0;
 
  int32_t op0 = 0, op1 = 0;
  op0 = bits(instr, 10, 15);
  // [F3.2.3 Special data instructions and branch and exchange]
  if (op0 == 0b010001) {
    op0 = bits(instr, 8, 9);
    // [Add, subtract, compare, move (two high registers)]
    if (op0 != 0b11) {
      int rs = bits(instr, 3, 6);
      // rs is PC register
      if (rs == 15) {
        val = from_pc;
 
        uint16_t rewrite_inst = 0;
        rewrite_inst = (instr & 0xff87) | LeftShift((VOLATILE_REGISTER.code()), 4, 3);
 
        ThumbRelocLabelEntry *label = new ThumbRelocLabelEntry(val, false);
        _ AppendRelocLabelEntry(label);
 
        _ T2_Ldr(VOLATILE_REGISTER, label);
        _ EmitInt16(rewrite_inst);
 
        is_instr_relocated = true;
      }
    }
  }
  // ... (other Thumb1 instruction cases omitted) ...
 
  // compare branch (cbz, cbnz)
  if ((instr & 0xf500) == 0xb100) {
    uint16_t imm5 = bits(instr, 3, 7);
    uint16_t i = bit(instr, 9);
    uint32_t offset = (i << 6) | (imm5 << 1);
    val = from_pc + offset;
    rn = bits(instr, 0, 2);
 
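    // ThumbTurboAssembler's data_labels_ records every ThumbRelocLabelEntry. Each entry stores the branch target and is bound to the
    // branch instruction; once a suitable place to store the targets is found, they are all fixed up together,
    // i.e. before the fixup: ldr pc, xxx; after the fixup: ldr pc, [pc, #offset], where pc + offset is the memory holding the target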
    ThumbRelocLabelEntry *label = new ThumbRelocLabelEntry(val + 1, true);
    _ AppendRelocLabelEntry(label);
 
//    imm5 = bits(0x4 >> 1, 1, 5);
    //dkl fix
    imm5 = bits(0, 1, 5);
    i = bit(0x4 >> 1, 6);
 
    _ EmitInt16((instr & 0xfd07) | imm5 << 3 | i << 9);
    _ t1_nop(); // manual align
    _ t2_b(0);
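    // This label holds the address to jump to, and the jump is done with ldr pc. The label is also bound to the instruction through the
    // PseudoLabelInstruction struct, so all the information needed for the jump is already there. What remains is to store the target
    // somewhere suitable and fix up the ldr; that seems to happen later in one pass, in thumb_turbo_assembler_.RelocBind()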
    _ T2_Ldr(pc, label);
 
    is_instr_relocated = true;
  }
 
 
 
  // if the instr do not needed relocate, just rewrite the origin
  if (!is_instr_relocated) {
#if 0
        if (from_pc % Thumb2_INST_LEN)
            _ t1_nop();
#endif
    _ EmitInt16(instr);
  }
}
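To make the cbz arithmetic in Thumb1RelocateSingleInstr concrete, here is a standalone decoder/encoder using the same masks (0xf500/0xb100 to recognize cbz/cbnz, 0xfd07 to clear the offset bits). The function names are hypothetical, and 0xb130 is my own hand encoding of cbz r0 with an offset of 0xc, since the original instruction bytes are not shown in the post.

```cpp
#include <cassert>
#include <cstdint>

// CBZ/CBNZ (T1): 1011 op 0 i 1 imm5 Rn; offset = i:imm5:'0', forward only (0..126).
bool is_cbz_or_cbnz(uint16_t insn) { return (insn & 0xf500) == 0xb100; }

// Branch target for an instruction at insn_addr (Thumb PC = address + 4).
uint32_t cbz_target(uint16_t insn, uint32_t insn_addr) {
  assert(is_cbz_or_cbnz(insn));
  uint32_t imm5 = (insn >> 3) & 0x1f;
  uint32_t i = (insn >> 9) & 1;
  uint32_t offset = (i << 6) | (imm5 << 1);  // same formula as in the Dobby code above
  return insn_addr + 4 + offset;
}

// Re-encodes the same cbz/cbnz (same op and Rn) with a new forward offset.
uint16_t cbz_with_offset(uint16_t insn, uint32_t offset) {
  assert(is_cbz_or_cbnz(insn) && offset % 2 == 0 && offset <= 126);
  uint16_t imm5 = static_cast<uint16_t>((offset >> 1) & 0x1f);
  uint16_t i = static_cast<uint16_t>((offset >> 6) & 1);
  return static_cast<uint16_t>((insn & 0xfd07) | (imm5 << 3) | (i << 9));
}
```

With the example addresses, cbz_target(0xb130, 0x9d639d34) gives 0x9d639d44, and the rewritten cbz r0 with offset 0 at 0xcea0a0c8 targets 0xcea0a0cc. In a cbz; nop; b.w; ldr.w sequence the ldr.w starts 8 bytes after the cbz, so reaching it directly would take an offset of 4 (cbz_with_offset(0xb100, 4) == 0xb110).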

That concludes the code walkthrough. Most of the fixup work is really instruction decoding, which is the somewhat tedious part.

## Takeaways

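The main takeaway is that I finally did a complete source-code reading, and picked up some engineering techniques along the way. One is a C++ linked-list trick. First, define a generic list head:
[Figure: definition of the generic list head]
Then the concrete data node:
[Figure: definition of the concrete data node]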

 

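The benefit of this layout is that you traverse the list through NodHead pointers and, when you need the data, simply cast the NodHead pointer to EntryNod. Because a pointer to a struct is the address of its first member, the NodHead and EntryNod pointers have the same value. This gives a generic list template that can be reused for any kind of list; you only change EntryNod.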
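A small sketch of the first-member trick, since the original structs are only in the (missing) figures: ListHead and HookNode are hypothetical stand-ins for NodHead and EntryNod. The cast back from ListHead * to HookNode * is valid because HookNode is a standard-layout type whose first member is the head.

```cpp
#include <cassert>
#include <cstddef>

// Generic list head: embedded as the FIRST member of every node type.
struct ListHead {
  ListHead *next;
};

// A concrete node (hypothetical): the head comes first, then the payload.
struct HookNode {
  ListHead head;  // must stay first so that &node == &node.head
  int hook_id;
  void *address;
};
static_assert(offsetof(HookNode, head) == 0, "head must be the first member");

// Generic code only ever sees ListHead.
void list_push_front(ListHead **list, ListHead *node) {
  assert(node != nullptr);
  node->next = *list;
  *list = node;
}

// Walks generic heads and casts back to the node type only to read the data.
HookNode *find_hook(ListHead *list, int id) {
  for (ListHead *it = list; it != nullptr; it = it->next) {
    HookNode *node = reinterpret_cast<HookNode *>(it);  // same address as it
    if (node->hook_id == id) return node;
  }
  return nullptr;
}
```

The same list_push_front works for any node type that starts with a ListHead; only the cast in the lookup function is type-specific.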

 

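The second takeaway is some classic macros. For example, ## pastes two tokens together, and the macro below recovers a pointer to the enclosing object (its this pointer) from the object's type, the member's name, and the member's address; see the container_of macro: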

#define offsetof(t, d) __builtin_offsetof(t, d)
 
#define container_of(ptr, type, member)                                                                                \
  ({                                                                                                                   \
    const __typeof(((type *)0)->member) *__mptr = (ptr);                                                               \
    (type *)((char *)__mptr - offsetof(type, member));                                                                 \
  })
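The macro above relies on GNU statement expressions and __typeof. A simplified variant using only the standard offsetof (and dropping the type check) looks like this; CONTAINER_OF, ListHead and Entry are hypothetical names for illustration. Unlike the first-member trick, it works for a member at any offset.

```cpp
#include <cassert>
#include <cstddef>

// Simplified container_of: subtract the member's offset from the member's address.
// Unlike the GNU version above, it does not check that ptr has the member's type.
#define CONTAINER_OF(ptr, type, member) \
  (reinterpret_cast<type *>(reinterpret_cast<char *>(ptr) - offsetof(type, member)))

struct ListHead {
  ListHead *next;
};

struct Entry {
  int id;
  ListHead link;  // the embedded head does not have to be the first member
};

// Given a pointer to the embedded link, recover the Entry that contains it.
Entry *entry_from_link(ListHead *link) { return CONTAINER_OF(link, Entry, link); }
```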

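Dobby also has its own memory allocation module. It records every block it has allocated with a given set of attributes, and when memory is requested it first checks whether an existing block still has room, which avoids frequent allocations.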
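The post does not show Dobby's allocator (only MemoryArena::AllocateCodeChunk appears in the code above), so here is a hypothetical sketch of the reuse-first idea: pages are grouped by their protection flags, and a new page is created only when no page with the same flags has room. Real code would map pages with mmap and set permissions with mprotect; here a page is just a heap buffer so the logic runs anywhere, and the prot values are plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical reuse-first arena (not Dobby's MemoryArena).
class PageArena {
 public:
  explicit PageArena(size_t page_size) : page_size_(page_size) {}

  // Returns size bytes from a page with matching prot, creating a page only if needed.
  void *allocate(size_t size, int prot) {
    assert(size > 0 && size <= page_size_);
    for (Page &p : pages_) {
      if (p.prot == prot && page_size_ - p.used >= size) {
        void *out = p.mem.get() + p.used;  // bump-allocate inside an existing page
        p.used += size;
        return out;
      }
    }
    pages_.push_back(Page{prot, std::make_unique<uint8_t[]>(page_size_), size});
    return pages_.back().mem.get();
  }

  size_t page_count() const { return pages_.size(); }

 private:
  struct Page {
    int prot;                        // e.g. read+exec vs read+write
    std::unique_ptr<uint8_t[]> mem;  // stands in for an mmap'ed page
    size_t used;                     // bytes already handed out
  };
  size_t page_size_;
  std::vector<Page> pages_;
};
```

Two small requests with the same flags share one page; a request with different flags, or one that no longer fits, gets a new page.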

## Problems I Ran Into Using Dobby

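I ran into three problems. The first: if the instruction being instrumented happens to be executing at the moment it is patched, things go wrong. There are two fixes: instrument as early as possible, right when the .so is loaded; or stop the process with an exception, install a custom signal handler, and do the patching inside the handler. I used the first approach and instrument as soon as the .so is loaded. On Android every .so is ultimately loaded through the linker's do_dlopen, and do_dlopen calls
soinfo* si = find_library(ns, translated_name, flags, extinfo, caller);, which yields the soinfo pointer. Once you have the soinfo you have everything, so you only need to hook this function. In practice, on AOSP 10 this function is inlined, so I hooked si->increment_ref_count(); inside find_library to get the soinfo pointer.
The second problem is mprotect. Patching the original instructions requires write access, but code is normally mapped without write permission, so mprotect is used to make it writable. mprotect works in whole pages, and Dobby changes the permissions of the page containing the instrumented instruction. That works most of the time, but occasionally the instrumented instruction sits at the very end of a page; since the patch needs at least 8 bytes, it then spans two pages while Dobby has only changed one. Watch out for this.
The third problem is SIGILL. It comes from instruction fixup generating incorrect assembly that jumps to the wrong place. This has to be fixed case by case in the source, which is why I read Dobby's source in the first place. Tough going...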
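For the second problem, the fix is to change the permissions of every page the patch touches rather than just the first one. A sketch assuming POSIX mprotect and sysconf; pages_covering and make_patchable are hypothetical helpers, not Dobby functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

struct PageRange {
  uintptr_t start;
  size_t length;
};

// Smallest page-aligned range covering [addr, addr + len); page_size must be a power of two.
PageRange pages_covering(uintptr_t addr, size_t len, size_t page_size) {
  assert(len > 0 && page_size > 0 && (page_size & (page_size - 1)) == 0);
  uintptr_t mask = ~static_cast<uintptr_t>(page_size - 1);
  uintptr_t start = addr & mask;
  uintptr_t end = (addr + len + page_size - 1) & mask;
  return {start, static_cast<size_t>(end - start)};
}

// Hypothetical helper: makes every page touched by a len-byte patch at addr
// writable (keeping it readable and executable). Returns true on success.
bool make_patchable(void *addr, size_t len) {
  size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  PageRange r = pages_covering(reinterpret_cast<uintptr_t>(addr), len, page);
  return mprotect(reinterpret_cast<void *>(r.start), r.length,
                  PROT_READ | PROT_WRITE | PROT_EXEC) == 0;
}
```

A 10-byte patch starting at 0x1ffa with 4 KB pages needs both 0x1000 and 0x2000, so the range is 0x2000 bytes long; the same patch at 0x1000 needs only one page.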

## Summary

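Among today's reverse-engineering tools, IDA is the king of static analysis and Frida is (probably) the king of dynamic analysis. But Frida works at function granularity, which isn't fine-grained enough; used together with Dobby, you get instruction-level dynamic analysis.
A debugger can also do the job, but it tends to introduce many other problems. I started with gdb and ran into plenty of trouble: gdb pauses the process, some Android broadcast times out, and my process gets killed; or I accidentally touch the screen, the input response times out, and it gets killed again. Sometimes gdb couldn't recognize Thumb instructions and I had to set the mode manually. A really bad experience. That said, gdb has memory breakpoints, which you may occasionally have no choice but to use.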



Last edited by KerryS on 2022-6-29 20:18
Latest replies (12)
#2 duanDbg 2022-6-29 20:22
Countless likes from me...
#3 Oday小斯 2022-6-29 20:44
Thanks for sharing
#4 GitRoy 2022-6-29 21:17
Thanks to the OP for sharing
#5 ArmVMP 2022-6-29 22:33
Everyone says Stalker is great to use; maybe the OP could study it and share what you find.
#6 王麻子本人 2022-6-29 22:37
> ArmVMP: Everyone says Stalker is great to use; maybe the OP could study it and share what you find.
I'm writing one on Stalker; it's only half done.
#7 saloyun 2022-6-30 08:58
Frida can also hook arbitrary memory addresses, and the implementation approach is similar.
#8 fengyunabc 2022-6-30 09:39
Thanks for sharing!
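#9 KerryS 2022-6-30 09:42
> saloyun: Frida can also hook arbitrary memory addresses, and the implementation approach is similar.
I only learned that Frida could do this after finishing this work. I tried it, and it does work, but after hooking instructions inside a function, Ctrl+S becomes very slow and takes half a minute to a minute to respond. Also, I'm not sure how stable Frida is under large-scale instrumentation, say thousands of instructions.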
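#10 saloyun 2022-6-30 15:24
> KerryS: I only learned that Frida could do this after finishing this work. I tried it, and it does work, but after hooking instructions inside a function, Ctrl+S becomes very slow and takes half a minute to a minute to respond. Also, I'm not sure how stable Frida is ...
Not sure why it would take half a minute; judging by the implementation, it should be similar to yours. Someone else has also analyzed its source:
https://bbs.pediy.com/thread-273273.htm
I'd guess the OP isn't a Windows programmer, otherwise you'd have met LIST_ENTRY and CONTAINING_RECORD on day one. Anyway, kudos for the dedication.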
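#11 hackdaliu 2022-8-12 17:11
> ArmVMP: Everyone says Stalker is great to use; maybe the OP could study it and share what you find.
Stalker works by copying the current function into new memory and executing it there. So far I've hit two main problems with it:
1. It only copies the current function and doesn't trace into callees; I suppose that's inherent to the approach.
2. Many jumps crash, apparently mostly jumps to addresses within the function's own range.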
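#12 hackdaliu 2022-8-12 17:36
> ArmVMP: Everyone says Stalker is great to use; maybe the OP could study it and share what you find.
Boss, I'm curious: does Dobby likewise hook a particular instruction and then do something there? Does it have an instruction-trace feature?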
#13 KerryS 2022-8-15 15:11
> hackdaliu: Boss, I'm curious: does Dobby likewise hook a particular instruction and then do something there? Does it have an instruction-trace feature?
Last edited by KerryS on 2022-8-16 00:20