[Original] Android ELF Series, Part 1: From a Brief Look at the ELF File Format to How the Linker Links .so Files
Posted: 2019-2-22 17:53

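  1. Android ELF Series: A brief look at the ELF file format and how the linker links .so files
  2. Android ELF Series: Implementing a .so file loader
  3. Android ELF Series: Hand-writing a .so file (with two exported functions)
  4. Android ELF Series: Implementing a few small features
  5. Android ELF Series: Implementing an encrypting packer for .so files
  6. Android ELF Series: To be continued...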


ELF File Format

File Types

  1. Relocatable file
  2. Executable file
  3. Shared object file

ELF Header

#define EI_NIDENT (16)

typedef struct {
  unsigned char    e_ident[EI_NIDENT];
  Elf32_Half    e_type;
  Elf32_Half    e_machine;
  Elf32_Word    e_version;
  Elf32_Addr    e_entry;
  Elf32_Off        e_phoff;
  Elf32_Off        e_shoff;
  Elf32_Word    e_flags;
  Elf32_Half    e_ehsize;
  Elf32_Half    e_phentsize;
  Elf32_Half    e_phnum;
  Elf32_Half    e_shentsize;
  Elf32_Half    e_shnum;
  Elf32_Half    e_shstrndx;
} Elf32_Ehdr;

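Meaning of the ELF Header Fields

  • e_ident

    The e_ident array holds identification information about the ELF file.

| Name | Index | Description |
|---|---|---|
| EI_MAG0 - EI_MAG3 | 0 - 3 | Magic number marking the file as an ELF file: ELFMAG "\177ELF" |
| EI_CLASS | 4 | File class: ELFCLASSNONE (0) invalid class; ELFCLASS32 (1) 32-bit objects; ELFCLASS64 (2) 64-bit objects |
| EI_DATA | 5 | Encoding of the processor-specific data: ELFDATANONE (0) invalid encoding; ELFDATA2LSB (1) little-endian, least significant byte first; ELFDATA2MSB (2) big-endian, most significant byte first |
| EI_VERSION | 6 | ELF header version number; currently must be EV_CURRENT |
| EI_OSABI | 7 | OS/ABI identification |
| EI_ABIVERSION | 8 | ABI version |
| EI_PAD | 9 | Start of the unused bytes in e_ident; initialized to 0 |

  • e_type: object file type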
#define ET_NONE      0       // No file type
#define ET_REL       1       // Relocatable file
#define ET_EXEC      2       // Executable file
#define ET_DYN       3       // Shared object file
#define ET_CORE      4       // Core file (core dump format)
#define ET_NUM       5
#define ET_LOOS      0xfe00
#define ET_HIOS      0xfeff
#define ET_LOPROC    0xff00  // Start of the processor-specific range
#define ET_HIPROC    0xffff  // End of the processor-specific range
  • e_machine: target architecture
#define EM_NONE         0
#define EM_M32         1
#define EM_SPARC     2
#define EM_386         3
#define EM_68K         4
#define EM_88K         5
#define EM_860         7
#define EM_MIPS         8
#define EM_S370         9
#define EM_MIPS_RS3_LE    10

#define EM_PARISC    15
#define EM_VPP500    17
#define EM_SPARC32PLUS    18
#define EM_960        19
#define EM_PPC        20
#define EM_PPC64    21
#define EM_S390        22

#define EM_V800        36
#define EM_FR20        37
#define EM_RH32        38
#define EM_RCE        39
#define EM_ARM        40
#define EM_FAKE_ALPHA    41
#define EM_SH        42
#define EM_SPARCV9    43
#define EM_TRICORE    44
#define EM_ARC        45
#define EM_H8_300    46
#define EM_H8_300H    47
#define EM_H8S        48
#define EM_H8_500    49
#define EM_IA_64    50
#define EM_MIPS_X    51
#define EM_COLDFIRE    52
#define EM_68HC12    53
#define EM_MMA        54
#define EM_PCP        55
#define EM_NCPU        56
#define EM_NDR1        57
#define EM_STARCORE    58
#define EM_ME16        59
#define EM_ST100    60
#define EM_TINYJ    61
#define EM_X86_64    62
#define EM_PDSP        63

#define EM_FX66        66
#define EM_ST9PLUS    67
#define EM_ST7        68
#define EM_68HC16    69
#define EM_68HC11    70
#define EM_68HC08    71
#define EM_68HC05    72
#define EM_SVX        73
#define EM_ST19        74
#define EM_VAX        75
#define EM_CRIS        76
#define EM_JAVELIN    77
#define EM_FIREPATH    78
#define EM_ZSP        79
#define EM_MMIX        80
#define EM_HUANY    81
#define EM_PRISM    82
#define EM_AVR        83
#define EM_FR30        84
#define EM_D10V        85
#define EM_D30V        86
#define EM_V850        87
#define EM_M32R        88
#define EM_MN10300    89
#define EM_MN10200    90
#define EM_PJ        91
#define EM_OR1K        92
#define EM_ARC_A5    93
#define EM_XTENSA    94
#define EM_AARCH64    183
#define EM_TILEPRO    188
#define EM_MICROBLAZE    189
#define EM_TILEGX    191
#define EM_NUM        192
#define EM_ALPHA    0x9026
  • e_version: object file version
#define EV_NONE        0  // Invalid version
#define EV_CURRENT    1  // Current version
#define EV_NUM        2
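  • Remaining fields

| Member | Description |
|---|---|
| e_entry | Virtual address of the program entry point. 0 if the file has no entry point. |
| e_phoff | File offset, in bytes, of the program header table. 0 if the file has no program header table. |
| e_shoff | File offset, in bytes, of the section header table. 0 if the file has no section header table. |
| e_flags | Processor-specific flags associated with the file. Flag names take the form EF_machine_flag. |
| e_ehsize | Size of the ELF header, in bytes. |
| e_phentsize | Size, in bytes, of one entry in the program header table. |
| e_phnum | Number of entries in the program header table. May be 0. |
| e_shentsize | Size, in bytes, of one entry in the section header table. |
| e_shnum | Number of entries in the section header table. May be 0. |
| e_shstrndx | Section header table index of the entry for the section name string table. SHN_UNDEF if the file has no section name string table. |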
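To make the e_ident layout concrete, here is a minimal sketch that classifies a raw buffer by its first EI_NIDENT bytes. The IdentInfo struct and the inspect_ident helper are hypothetical names introduced for illustration; only the constants come from <elf.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <elf.h>

// What we learn from the identification bytes alone.
struct IdentInfo {
    bool is_elf;         // magic matched "\177ELF"
    bool is_32bit;       // EI_CLASS == ELFCLASS32
    bool little_endian;  // EI_DATA == ELFDATA2LSB
};

// Hypothetical helper (not part of bionic): inspect e_ident in a raw buffer.
inline IdentInfo inspect_ident(const unsigned char* buf, size_t len) {
    IdentInfo info = {false, false, false};
    if (buf == nullptr || len < EI_NIDENT) {
        return info;  // too short to hold e_ident
    }
    if (std::memcmp(buf, ELFMAG, SELFMAG) != 0) {
        return info;  // bad magic
    }
    info.is_elf = true;
    info.is_32bit = (buf[EI_CLASS] == ELFCLASS32);
    info.little_endian = (buf[EI_DATA] == ELFDATA2LSB);
    return info;
}
```

A loader would typically run a check like this before trusting any other header field, since the class and byte order decide how the rest of the header must be read.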

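Sections

Sections hold all the information in an object file except the ELF header, the program header table, and the section header table. Sections satisfy the following conditions:

  1. Every section in an object file has exactly one section header describing it. The converse does not hold: a section header may exist without a section.
  2. Each section occupies one contiguous (possibly empty) sequence of bytes in the file.
  3. Sections in a file may not overlap. No byte in a file belongs to more than one section.
  4. An object file may contain inactive space. This space belongs to no header and no section, and its contents are unspecified.

Structure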

typedef struct {
  Elf32_Word    sh_name;
  Elf32_Word    sh_type;
  Elf32_Word    sh_flags;
  Elf32_Addr    sh_addr;
  Elf32_Off    sh_offset;
  Elf32_Word    sh_size;
  Elf32_Word    sh_link;
  Elf32_Word    sh_info;
  Elf32_Word    sh_addralign;
  Elf32_Word    sh_entsize;
} Elf32_Shdr;

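Field Descriptions

| Member | Description |
|---|---|
| sh_name | Name of the section. The value is an index into the section header string table section and gives the location of a NUL-terminated string. |
| sh_type | Categorizes the section's contents and semantics. See the section types below. |
| sh_flags | 1-bit flags that describe miscellaneous attributes of the section. |
| sh_addr | If the section appears in the memory image of a process, the address at which its first byte should reside. Otherwise 0. |
| sh_offset | Byte offset from the beginning of the file to the first byte of the section. |
| sh_size | Size of the section in bytes. |
| sh_link | A section header table index link. Its interpretation depends on the section type. |
| sh_info | Extra information. Its interpretation depends on the section type. |
| sh_addralign | Some sections have address alignment constraints. For example, if a section holds a doubleword, the system must ensure doubleword alignment for the whole section. sh_addr modulo sh_addralign must be 0. Only 0 and positive powers of two are allowed. The values 0 and 1 mean the section has no alignment constraint. |
| sh_entsize | Some sections hold a table of fixed-size entries, such as a symbol table. For such a section this member gives the size in bytes of each entry. It is 0 if the section does not hold a table of fixed-size entries. |

  • sh_type: section type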
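#define SHT_NULL      0             // Marks the section header as inactive; it has no associated section and its other members are undefined.
#define SHT_PROGBITS  1             // Holds information defined by the program, whose format and meaning are determined solely by the program.
#define SHT_SYMTAB    2             // Holds a symbol table.
#define SHT_STRTAB    3             // Holds a string table.
#define SHT_RELA      4             // Holds relocation entries with explicit addends.
#define SHT_HASH      5             // Holds a symbol hash table. Every object taking part in dynamic linking must contain one.
#define SHT_DYNAMIC   6             // Holds information for dynamic linking.
#define SHT_NOTE      7             // Holds information that marks the file in some way.
#define SHT_NOBITS    8             // Occupies no space in the file.
#define SHT_REL       9             // Holds relocation entries without explicit addends.
#define SHT_SHLIB     10            // Reserved, with unspecified semantics. Programs containing a section of this type do not conform to the ABI.
#define SHT_DYNSYM    11            // Holds a minimal set of dynamic linking symbols, to save space.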
#define SHT_INIT_ARRAY      14
#define SHT_FINI_ARRAY      15
#define SHT_PREINIT_ARRAY 16
#define SHT_GROUP      17
#define SHT_SYMTAB_SHNDX  18
#define    SHT_NUM          19
#define SHT_LOOS      0x60000000
#define SHT_GNU_ATTRIBUTES 0x6ffffff5
#define SHT_GNU_HASH      0x6ffffff6
#define SHT_GNU_LIBLIST      0x6ffffff7
#define SHT_CHECKSUM      0x6ffffff8
#define SHT_LOSUNW      0x6ffffffa
#define SHT_SUNW_move      0x6ffffffa
#define SHT_SUNW_COMDAT   0x6ffffffb
#define SHT_SUNW_syminfo  0x6ffffffc
#define SHT_GNU_verdef      0x6ffffffd
#define SHT_GNU_verneed      0x6ffffffe
#define SHT_GNU_versym      0x6fffffff
#define SHT_HISUNW      0x6fffffff
#define SHT_HIOS      0x6fffffff
#define SHT_LOPROC      0x70000000
#define SHT_HIPROC      0x7fffffff
#define SHT_LOUSER      0x80000000  // Lower bound of the range reserved for application programs
#define SHT_HIUSER      0x8fffffff  // Upper bound of the range reserved for application programs
  • sh_flags
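#define SHF_WRITE         (1 << 0)           // The section holds data that is writable during process execution
#define SHF_ALLOC         (1 << 1)           // The section occupies memory during process execution
#define SHF_EXECINSTR         (1 << 2)       // The section holds executable machine instructions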
#define SHF_MERGE         (1 << 4)
#define SHF_STRINGS         (1 << 5)
#define SHF_INFO_LINK         (1 << 6)
#define SHF_LINK_ORDER         (1 << 7)
#define SHF_OS_NONCONFORMING (1 << 8)

#define SHF_GROUP         (1 << 9)
#define SHF_TLS             (1 << 10)
#define SHF_MASKOS         0x0ff00000
#define SHF_MASKPROC         0xf0000000     // All four bits in this mask are reserved for processor-specific semantics
#define SHF_ORDERED         (1 << 30)
#define SHF_EXCLUDE         (1U << 31)
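  • sh_link and sh_info

| sh_type | sh_link | sh_info |
|---|---|---|
| SHT_DYNAMIC | Section header index of the string table used by the entries in this section | 0 |
| SHT_HASH | Section header index of the symbol table to which the hash table applies | 0 |
| SHT_REL or SHT_RELA | Section header index of the associated symbol table | Section header index of the section to which the relocation applies |
| SHT_SYMTAB or SHT_DYNSYM | Section header index of the associated string table | One greater than the symbol table index of the last local symbol (binding STB_LOCAL) |
| Other | SHN_UNDEF | 0 |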
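As a small illustration of how the sh_flags bits described above combine, the sketch below renders the three basic attributes as letters, in the spirit of the W/A/X flag column printed by readelf. The function name section_flags_to_string is a hypothetical helper, not something from bionic or binutils.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <elf.h>

// Hypothetical helper: turn the three basic sh_flags bits into a short string.
inline std::string section_flags_to_string(uint32_t sh_flags) {
    std::string out;
    if (sh_flags & SHF_WRITE)     out += 'W';  // writable at run time
    if (sh_flags & SHF_ALLOC)     out += 'A';  // occupies memory at run time
    if (sh_flags & SHF_EXECINSTR) out += 'X';  // holds machine instructions
    return out;
}
```

A typical .text section carries SHF_ALLOC | SHF_EXECINSTR and so prints as "AX", while .data carries SHF_WRITE | SHF_ALLOC and prints as "WA".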

Program Header

Structure

typedef struct {
  Elf32_Word    p_type;
  Elf32_Off        p_offset;
  Elf32_Addr    p_vaddr;
  Elf32_Addr    p_paddr;
  Elf32_Word    p_filesz;
  Elf32_Word    p_memsz;
  Elf32_Word    p_flags;
  Elf32_Word    p_align;
} Elf32_Phdr;
  • p_type

The kind of segment this array element describes, or how to interpret the array element's information.

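#define PT_NULL        0           // Unused entry
#define PT_LOAD        1           // A loadable segment
#define PT_DYNAMIC    2           // Dynamic linking information
#define PT_INTERP    3           // Location and size of a NUL-terminated path name to invoke as the interpreter
#define PT_NOTE        4           // Location and size of auxiliary information
#define PT_SHLIB    5           // Reserved, with unspecified semantics. Programs containing a segment of this type do not conform to the ABI.
#define PT_PHDR        6           // If present, the location and size of the program header table itself, both in the file and in the memory image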
#define PT_TLS        7
#define    PT_NUM        8
#define PT_LOOS        0x60000000
#define PT_GNU_EH_FRAME    0x6474e550
#define PT_GNU_STACK    0x6474e551
#define PT_GNU_RELRO    0x6474e552
#define PT_LOSUNW    0x6ffffffa
#define PT_SUNWBSS    0x6ffffffa
#define PT_SUNWSTACK    0x6ffffffb
#define PT_HISUNW    0x6fffffff
#define PT_HIOS        0x6fffffff
#define PT_LOPROC    0x70000000  // Start of the range reserved for processor-specific semantics
#define PT_HIPROC    0x7fffffff  // End of the range reserved for processor-specific semantics
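  • p_offset

Offset from the beginning of the file to the first byte of the segment.

  • p_vaddr

Virtual address at which the first byte of the segment resides in memory.

  • p_paddr

Only relevant on systems where physical addressing matters. Because System V ignores physical addressing for application programs, this member has unspecified contents for executable files and shared objects.

  • p_filesz

Number of bytes the segment occupies in the file image. May be 0.

  • p_memsz

Number of bytes the segment occupies in the memory image. May be 0.

  • p_flags

Flags relevant to the segment.

  • p_align

Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size. This member gives the value to which the segment is aligned in the file and in memory. The values 0 and 1 mean no alignment is required. Otherwise p_align should be a positive power of two, and p_vaddr should equal p_offset modulo p_align.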
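The p_align rule can be expressed as a few lines of code. This is a sketch under the assumptions stated in the text; segment_alignment_ok is a hypothetical name, and the linker shown later in this article does not perform this exact check.

```cpp
#include <cassert>
#include <cstdint>
#include <elf.h>

// Hypothetical helper: does a program header satisfy the p_align rule?
inline bool segment_alignment_ok(const Elf32_Phdr& ph) {
    const uint32_t align = ph.p_align;
    if (align == 0 || align == 1) {
        return true;  // no alignment required
    }
    if ((align & (align - 1)) != 0) {
        return false;  // not a power of two
    }
    // p_vaddr and p_offset must be congruent modulo p_align.
    return (ph.p_vaddr % align) == (ph.p_offset % align);
}
```

The congruence is what lets a loader mmap a file page directly at the segment's page: the byte at file offset p_offset lands exactly at p_vaddr plus the load bias.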

 


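ELF Tables

String Table

A string table section holds NUL-terminated character sequences.

Conventionally the first byte (index 0) holds a NUL character, that is, an empty string. Likewise the last byte of a string table is NUL, so that every string is terminated. Depending on context, the string at index 0 stands for either no name or an empty name.

For example:

\0name\0chp\0seg\0\0xxx

Besides the empty string at index 0, it contains three strings:

  1. name
  2. chp
  3. seg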
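A string table is addressed by byte index, not by string number. The sketch below looks up a string by index with bounds checking; strtab_get is a hypothetical helper written for this example. Note that the trailing xxx in the sample bytes has no terminator as shown, so the helper rejects it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: return the string that starts at `index`, or nullptr
// when the index is out of range or no NUL follows before the table ends.
inline const char* strtab_get(const char* table, size_t size, size_t index) {
    if (table == nullptr || index >= size) {
        return nullptr;
    }
    const void* nul = std::memchr(table + index, '\0', size - index);
    return nul != nullptr ? table + index : nullptr;
}
```

With the sample table, index 1 yields "name", index 6 yields "chp", and index 10 yields "seg". An index may also point into the middle of a string: index 3 yields "me".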

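Symbol Table

An object file's symbol table holds the information needed to locate and relocate a program's symbolic definitions and references. A symbol table index is a subscript into this array. Index 0 both designates the first entry in the table and serves as the undefined symbol index.

A symbol table entry is defined as follows: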

typedef struct {
    Elf32_Word  st_name;
    Elf32_Addr  st_value;
    Elf32_Word  st_size;
    unsigned char   st_info;
    unsigned char   st_other;
    Elf32_Half  st_shndx;
} Elf32_Sym;
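| Name | Meaning |
|---|---|
| st_name | Index into the object file's symbol string table, which holds the character representations of the symbol names. If the value is nonzero, it is the string table index of the symbol name. Otherwise the entry has no name. Note: external C symbols have the same names in C and in the object file's symbol table. |
| st_value | Value of the associated symbol. Depending on the context this may be an absolute value, an address, and so on. |
| st_size | Many symbols have an associated size. For example, a data object's size is the number of bytes it contains. 0 if the symbol has no size or an unknown size. |
| st_info | The symbol's type and binding attributes. The values and meanings are listed below. |
| st_other | In the original specification this member holds 0 and has no defined meaning (newer ABI revisions use it for symbol visibility). |
| st_shndx | Every symbol table entry is defined in relation to some section. This member holds the relevant section header table index. Some indexes have special meanings. |

  • About st_info

st_info holds the symbol's type and binding information.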

#define ELF32_ST_BIND(i)    ((i)>>4)        // extract the binding
#define ELF32_ST_TYPE(i)    ((i)&0xf)       // extract the symbol type
#define ELF32_ST_INFO(b, t) (((b)<<4) + ((t)&0xf))
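  • Binding (ST_BIND)

| Name | Value | Description |
|---|---|---|
| STB_LOCAL | 0 | Local symbols are not visible outside the object file containing their definition. Local symbols of the same name may exist in multiple files without interfering with each other. |
| STB_GLOBAL | 1 | Global symbols are visible to all object files being combined. One file's definition of a global symbol satisfies another file's undefined reference to the same global symbol. |
| STB_WEAK | 2 | Weak symbols resemble global symbols, but their definitions have lower precedence. |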
 

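Global and weak symbols differ in two major ways:

(1) When the link editor combines several relocatable object files, it does not allow multiple definitions of STB_GLOBAL symbols with the same name. On the other hand, if a defined global symbol exists, the appearance of a weak symbol with the same name does not cause an error. The link editor honors the global definition and ignores the weak ones. Similarly, if a common symbol exists (a symbol whose st_shndx holds SHN_COMMON), the appearance of a weak symbol with the same name does not cause an error. The link editor honors the common definition and ignores the weak one.

(2) When the link editor searches archive libraries, it extracts archive members that contain definitions of undefined global symbols. The member's definition may be either a global or a weak symbol. The link editor does not extract archive members to resolve undefined weak symbols. Unresolved weak symbols have the value 0.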

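  • Symbol type

| Name | Value | Description |
|---|---|---|
| STT_NOTYPE | 0 | The symbol's type is not specified. |
| STT_OBJECT | 1 | The symbol is associated with a data object, such as a variable or an array. |
| STT_FUNC | 2 | The symbol is associated with a function or other executable code. |
| STT_SECTION | 3 | The symbol is associated with a section. Symbol table entries of this type exist primarily for relocation and normally have STB_LOCAL binding. |
| STT_FILE | 4 | Conventionally, the symbol's name gives the name of the source file associated with the object file. A file symbol has STB_LOCAL binding, its section index is SHN_ABS, and it precedes the other STB_LOCAL symbols for the file, if any. |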
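The packing of binding and type into the single st_info byte is easy to check in code. This is a minimal sketch; SymbolKind, decode_st_info, and is_exported_function are hypothetical names, and the "exported function" test is only one plausible filter a loader might apply.

```cpp
#include <cassert>
#include <elf.h>

// Binding and type, unpacked from st_info.
struct SymbolKind {
    unsigned bind;  // STB_*
    unsigned type;  // STT_*
};

inline SymbolKind decode_st_info(unsigned char st_info) {
    SymbolKind kind;
    kind.bind = ELF32_ST_BIND(st_info);  // upper four bits
    kind.type = ELF32_ST_TYPE(st_info);  // lower four bits
    return kind;
}

// Hypothetical filter: a defined, non-local function symbol.
inline bool is_exported_function(const Elf32_Sym& sym) {
    const SymbolKind kind = decode_st_info(sym.st_info);
    const bool visible = (kind.bind == STB_GLOBAL || kind.bind == STB_WEAK);
    return visible && kind.type == STT_FUNC && sym.st_shndx != SHN_UNDEF;
}
```

For example, a global function has st_info equal to ELF32_ST_INFO(STB_GLOBAL, STT_FUNC), which is (1 << 4) + 2 = 0x12.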

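Relocation Information

Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution.

Relocation Entries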

typedef struct {
    Elf32_Addr r_offset;
    Elf32_Word r_info;
} Elf32_Rel;
typedef struct {
    Elf32_Addr r_offset;
    Elf32_Word r_info;
    Elf32_Sword r_addend;
} Elf32_Rela;
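| Name | Description |
|---|---|
| r_offset | Location at which to apply the relocation. For a relocatable file, the value is the byte offset from the beginning of the section to the storage unit affected by the relocation. For an executable file or a shared object, the value is the virtual address of the storage unit affected by the relocation. |
| r_info | Both the symbol table index with respect to which the relocation must be made and the type of relocation to apply. For example, a call instruction's relocation entry holds the symbol table index of the function being called. If the index is STN_UNDEF, the relocation uses 0 as the symbol value. Relocation types are processor-specific. When the text refers to a relocation entry's type or symbol table index, it means the result of applying ELF32_R_TYPE or ELF32_R_SYM to the entry's r_info member. |
| r_addend | A constant addend used to compute the value to be stored into the relocatable field. |

#define ELF32_R_SYM(i) ((i)>>8)
#define ELF32_R_TYPE(i) ((unsigned char)(i))
#define ELF32_R_INFO(s, t) (((s)<<8) + (unsigned char)(t))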
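The split of r_info into a symbol index and a relocation type can be sketched as follows. RelInfo and decode_rel are hypothetical names; the type value 22 used in the usage note is R_ARM_JUMP_SLOT on ARM and is written as a literal so the sketch does not depend on architecture-specific macros.

```cpp
#include <cassert>
#include <cstdint>
#include <elf.h>

// Symbol index and relocation type, unpacked from r_info.
struct RelInfo {
    uint32_t sym;   // index into the symbol table (upper 24 bits)
    uint32_t type;  // processor-specific relocation type (lower 8 bits)
};

inline RelInfo decode_rel(const Elf32_Rel& rel) {
    RelInfo info;
    info.sym = ELF32_R_SYM(rel.r_info);
    info.type = ELF32_R_TYPE(rel.r_info);
    return info;
}
```

An entry with r_info = 0x516 therefore refers to symbol 5 with relocation type 0x16 (22). A linker uses the symbol index to find the name in the symbol table, resolves it, and then patches the location named by r_offset according to the type.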

How a .so Is Loaded

Analyzing the linker

Loading a .so file goes through dlopen. Reading the source shows that dlopen is a thin wrapper around do_dlopen, which lives in the linker.

void* dlopen(const char* filename, int flags) { 
  ScopedPthreadMutexLocker locker(&gDlMutex); 
  soinfo* result = do_dlopen(filename, flags); 
  if (result == NULL) { 
    __bionic_format_dlerror("dlopen failed", linker_get_error_buffer());
    return NULL;
  } 
  return result;
}

Let's look at do_dlopen first (in the Android source tree at xref/bionic/linker/linker.cpp):

soinfo* do_dlopen(const char* name, int flags) {
  if ((flags & ~(RTLD_NOW|RTLD_LAZY|RTLD_LOCAL|RTLD_GLOBAL)) != 0) {
    DL_ERR("invalid flags to dlopen: %x", flags);
    return NULL;
  }
  set_soinfo_pool_protection(PROT_READ | PROT_WRITE);
  soinfo* si = find_library(name);
  if (si != NULL) {
    si->CallConstructors(); // run the library's initialization functions
  }
  set_soinfo_pool_protection(PROT_READ);
  return si;
}
  • Call find_library, which returns a soinfo structure.
  • Call CallConstructors on it.
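The first thing do_dlopen does is reject unknown flag bits. The mask test can be tried in isolation; dlopen_flags_valid below is a hypothetical stand-alone copy of that check, using the RTLD_* constants from <dlfcn.h> on whatever platform compiles it.

```cpp
#include <cassert>
#include <dlfcn.h>

// Hypothetical stand-alone version of the flag check at the top of do_dlopen:
// any bit outside the four supported RTLD_* values makes the call fail.
inline bool dlopen_flags_valid(int flags) {
    const int allowed = RTLD_NOW | RTLD_LAZY | RTLD_LOCAL | RTLD_GLOBAL;
    return (flags & ~allowed) == 0;
}
```

Because the check is a pure bit mask, a flags value of 0 passes it, which is why the dlopen(0, 0) call used later in this article is accepted.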

First, a look at the soinfo structure, which runs through the whole analysis.

struct soinfo {
     public:
      char name[SOINFO_NAME_LEN];
      const Elf32_Phdr* phdr;
      size_t phnum;
      Elf32_Addr entry;
      Elf32_Addr base;
      unsigned size;

      uint32_t unused1;  // DO NOT USE, maintained for compatibility.

      Elf32_Dyn* dynamic;

      uint32_t unused2; // DO NOT USE, maintained for compatibility
      uint32_t unused3; // DO NOT USE, maintained for compatibility

      soinfo* next;
      unsigned flags;

      const char* strtab;
      Elf32_Sym* symtab;

      size_t nbucket;
      size_t nchain;
      unsigned* bucket;
      unsigned* chain;

      unsigned* plt_got;

      Elf32_Rel* plt_rel;
      size_t plt_rel_count;

      Elf32_Rel* rel;
      size_t rel_count;

      linker_function_t* preinit_array;
      size_t preinit_array_count;

      linker_function_t* init_array;
      size_t init_array_count;
      linker_function_t* fini_array;
      size_t fini_array_count;

      linker_function_t init_func;
      linker_function_t fini_func;

    #if defined(ANDROID_ARM_LINKER)
      // ARM EABI section used for stack unwinding.
      unsigned* ARM_exidx;
      size_t ARM_exidx_count;
    #elif defined(ANDROID_MIPS_LINKER)
      unsigned mips_symtabno;
      unsigned mips_local_gotno;
      unsigned mips_gotsym;
    #endif

      size_t ref_count;
      link_map_t link_map;

      bool constructors_called;

      // When you read a virtual address from the ELF file, add this
      // value to get the corresponding address in the process' address space.
      Elf32_Addr load_bias;

      bool has_text_relocations;
      bool has_DT_SYMBOLIC;

      void CallConstructors();
      void CallDestructors();
      void CallPreInitConstructors();

     private:
      void CallArray(const char* array_name, linker_function_t* functions, size_t count, bool reverse);
      void CallFunction(const char* function_name, linker_function_t function);
};

Let's write a small NDK program to check this soinfo structure:

#include <jni.h>
#include <string>
#include <elf.h>
#include <dlfcn.h>

#define SOINFO_NAME_LEN 128
struct link_map_t {
    uintptr_t l_addr;
    char*  l_name;
    uintptr_t l_ld;
    link_map_t* l_next;
    link_map_t* l_prev;
};
typedef void (*linker_function_t)();
struct soinfo {
public:
    char name[SOINFO_NAME_LEN];
    const Elf32_Phdr* phdr;
    size_t phnum;
    Elf32_Addr entry;
    Elf32_Addr base;
    unsigned size;

    uint32_t unused1;  // DO NOT USE, maintained for compatibility.

    Elf32_Dyn* dynamic;

    uint32_t unused2; // DO NOT USE, maintained for compatibility
    uint32_t unused3; // DO NOT USE, maintained for compatibility

    soinfo* next;
    unsigned flags;

    const char* strtab;
    Elf32_Sym* symtab;

    size_t nbucket;
    size_t nchain;
    unsigned* bucket;
    unsigned* chain;

    unsigned* plt_got;

    Elf32_Rel* plt_rel;
    size_t plt_rel_count;

    Elf32_Rel* rel;
    size_t rel_count;

    linker_function_t* preinit_array;
    size_t preinit_array_count;

    linker_function_t* init_array;
    size_t init_array_count;
    linker_function_t* fini_array;
    size_t fini_array_count;

    linker_function_t init_func;
    linker_function_t fini_func;

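// Note: ANDROID_ARM_LINKER is set by the linker's own build. Unless it is also
// defined here when targeting ARM, the fields after fini_func will not line up
// with the linker's layout.
#if defined(ANDROID_ARM_LINKER)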
    // ARM EABI section used for stack unwinding.
      unsigned* ARM_exidx;
      size_t ARM_exidx_count;
#elif defined(ANDROID_MIPS_LINKER)
    unsigned mips_symtabno;
      unsigned mips_local_gotno;
      unsigned mips_gotsym;
#endif

    size_t ref_count;
    link_map_t link_map;

    bool constructors_called;

    // When you read a virtual address from the ELF file, add this
    // value to get the corresponding address in the process' address space.
    Elf32_Addr load_bias;

    bool has_text_relocations;
    bool has_DT_SYMBOLIC;

    void CallConstructors();
    void CallDestructors();
    void CallPreInitConstructors();

private:
    void CallArray(const char* array_name, linker_function_t* functions, size_t count, bool reverse);
    void CallFunction(const char* function_name, linker_function_t function);
};


extern "C" JNIEXPORT jstring JNICALL
Java_com_example_mi_testlinkers_MainActivity_stringFromJNI(
        JNIEnv *env,
        jobject /* this */) {

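    // In the linker version analyzed here, the handle returned by dlopen is the
    // soinfo pointer itself, and a NULL name yields somain.
    soinfo *somain = (soinfo *)dlopen(0,0);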

    std::string hello = "Hello from C++";
    return env->NewStringUTF(hello.c_str());
}

soinfo

find_library

static soinfo* find_library(const char* name) {
  soinfo* si = find_library_internal(name);
  if (si != NULL) {
    si->ref_count++;
  }
  return si;
}
  • Call find_library_internal.
  • If loading succeeds, increment the reference count (ref_count).

find_library_internal

static soinfo* find_library_internal(const char* name) {
  if (name == NULL) {
    return somain;
  }

  soinfo* si = find_loaded_library(name);
  if (si != NULL) {
    if (si->flags & FLAG_LINKED) {
      return si;
    }
    DL_ERR("OOPS: recursive link to \"%s\"", si->name);
    return NULL;
  }

  TRACE("[ '%s' has not been loaded yet.  Locating...]", name);
  si = load_library(name); // load the .so file into memory
  if (si == NULL) {
    return NULL;
  }

  // At this point we know that whatever is loaded @ base is a valid ELF
  // shared library whose segments are properly mapped in.
  TRACE("[ init_library base=0x%08x sz=0x%08x name='%s' ]",
        si->base, si->size, si->name);

  if (!soinfo_link_image(si)) { // link the .so (apply relocations, etc.)
    munmap(reinterpret_cast<void*>(si->base), si->size);
    soinfo_free(si);
    return NULL;
  }
  return si;
}
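  • If name is NULL, return somain, the soinfo of the main executable (for an app process this is "/system/bin/app_process").
  • Otherwise call find_loaded_library to search the libraries that are already loaded, and return the match if one is found.
  • If none is found, call load_library to load the .so file.
  • Call soinfo_link_image to link the .so (apply relocations, resolve symbol addresses, and so on).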

Let's look at load_library first.

static soinfo* load_library(const char* name) {
    // Open the file.
    int fd = open_library(name); // open the .so file and get its file descriptor
    if (fd == -1) {
        DL_ERR("library \"%s\" not found", name);
        return NULL;
    }

    // Read the ELF header and load the segments.
    ElfReader elf_reader(name, fd); // create an ElfReader object
    if (!elf_reader.Load()) { // load the .so file into memory
        return NULL;
    }

    const char* bname = strrchr(name, '/');
    soinfo* si = soinfo_alloc(bname ? bname + 1 : name);
    if (si == NULL) {
        return NULL;
    }
    si->base = elf_reader.load_start();
    si->size = elf_reader.load_size();
    si->load_bias = elf_reader.load_bias();
    si->flags = 0;
    si->entry = 0;
    si->dynamic = NULL;
    si->phnum = elf_reader.phdr_count();
    si->phdr = elf_reader.loaded_phdr();
    return si;
}
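  • Open the .so with open_library to get a file descriptor.
  • Create an ElfReader object.
  • Load the .so into memory with the ElfReader's Load method.
  • The remaining code records some of the library's parameters in the soinfo.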

ElfReader::Load

bool ElfReader::Load() {
  return ReadElfHeader() &&
         VerifyElfHeader() &&
         ReadProgramHeader() &&
         ReserveAddressSpace() &&
         LoadSegments() &&
         FindPhdr();
}
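  • Read the ELF header: ReadElfHeader()
  • Verify the ELF header: VerifyElfHeader()
  • Read the program header table: ReadProgramHeader()
  • Reserve enough address space: ReserveAddressSpace()
  • Load the .so segment by segment: LoadSegments()
  • Locate the program header table inside the loaded image: FindPhdr()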

First, ReadElfHeader:

bool ElfReader::ReadElfHeader() {
  ssize_t rc = TEMP_FAILURE_RETRY(read(fd_, &header_, sizeof(header_))); // read the ELF header
  if (rc < 0) {
    DL_ERR("can't read file \"%s\": %s", name_, strerror(errno));
    return false;
  }
  if (rc != sizeof(header_)) {
    DL_ERR("\"%s\" is too small to be an ELF executable", name_);
    return false;
  }
  return true;
}

This one is simple: it calls read to read exactly one ELF header's worth of bytes.

 

VerifyElfHeader

bool ElfReader::VerifyElfHeader() {
  if (header_.e_ident[EI_MAG0] != ELFMAG0 ||
      header_.e_ident[EI_MAG1] != ELFMAG1 ||
      header_.e_ident[EI_MAG2] != ELFMAG2 ||
      header_.e_ident[EI_MAG3] != ELFMAG3) {
    DL_ERR("\"%s\" has bad ELF magic", name_);
    return false;
  } // check the ELF magic

  if (header_.e_ident[EI_CLASS] != ELFCLASS32) { // check the class: must be ELFCLASS32
    DL_ERR("\"%s\" not 32-bit: %d", name_, header_.e_ident[EI_CLASS]);
    return false;
  }
  if (header_.e_ident[EI_DATA] != ELFDATA2LSB) { // check the byte order: must be ELFDATA2LSB
    DL_ERR("\"%s\" not little-endian: %d", name_, header_.e_ident[EI_DATA]);
    return false;
  }

  if (header_.e_type != ET_DYN) { // check the file type: must be ET_DYN
    DL_ERR("\"%s\" has unexpected e_type: %d", name_, header_.e_type);
    return false;
  }

  if (header_.e_version != EV_CURRENT) { // check the version: must be EV_CURRENT
    DL_ERR("\"%s\" has unexpected e_version: %d", name_, header_.e_version);
    return false;
  }

  if (header_.e_machine !=
#ifdef ANDROID_ARM_LINKER
      EM_ARM
#elif defined(ANDROID_MIPS_LINKER)
      EM_MIPS
#elif defined(ANDROID_X86_LINKER)
      EM_386
#endif
  ) {
    DL_ERR("\"%s\" has unexpected e_machine: %d", name_, header_.e_machine);
    return false;
  }

  return true;
}

Checks performed: ELF magic, class, byte order, type, version, and machine.

 

ReadProgramHeader

bool ElfReader::ReadProgramHeader() {
  phdr_num_ = header_.e_phnum;

  // Like the kernel, we only accept program header tables that
  // are smaller than 64KiB.
  if (phdr_num_ < 1 || phdr_num_ > 65536/sizeof(Elf32_Phdr)) {
    DL_ERR("\"%s\" has invalid e_phnum: %d", name_, phdr_num_);
    return false;
  }

  Elf32_Addr page_min = PAGE_START(header_.e_phoff);
  Elf32_Addr page_max = PAGE_END(header_.e_phoff + (phdr_num_ * sizeof(Elf32_Phdr)));
  Elf32_Addr page_offset = PAGE_OFFSET(header_.e_phoff);

  phdr_size_ = page_max - page_min;

  void* mmap_result = mmap(NULL, phdr_size_, PROT_READ, MAP_PRIVATE, fd_, page_min);
  if (mmap_result == MAP_FAILED) {
    DL_ERR("\"%s\" phdr mmap failed: %s", name_, strerror(errno));
    return false;
  }

  phdr_mmap_ = mmap_result;
  phdr_table_ = reinterpret_cast<Elf32_Phdr*>(reinterpret_cast<char*>(mmap_result) + page_offset);
  return true;
}

First, the definitions of the macros used here:

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT) // 0x1000 = 4096
/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */
#define PAGE_MASK (~(PAGE_SIZE-1)) // 0xfffff000

// Returns the address of the page containing address 'x'.
#define PAGE_START(x)  ((x) & PAGE_MASK) // clears the low 12 bits

// Returns the offset of address 'x' in its page.
#define PAGE_OFFSET(x) ((x) & ~PAGE_MASK) // clears the high 20 bits, keeping the low 12

// Returns the address of the next page after address 'x', unless 'x' is
// itself at the start of a page.
#define PAGE_END(x)    PAGE_START((x) + (PAGE_SIZE-1)) // rounds up to a page boundary, e.g. 0x1002 gives 0x2000
  • phdr_table_ holds the program header table.
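The page arithmetic is easier to follow with concrete numbers. The sketch below restates the three macros as functions for a 4 KiB page and reproduces the window computation from ReadProgramHeader. The names page_start, page_offset, page_end, PhdrWindow, and phdr_window are hypothetical, chosen to avoid clashing with system macros.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kPageSize = 1u << 12;           // 0x1000
constexpr uint32_t kPageMask = ~(kPageSize - 1);   // 0xfffff000

constexpr uint32_t page_start(uint32_t x)  { return x & kPageMask; }
constexpr uint32_t page_offset(uint32_t x) { return x & ~kPageMask; }
constexpr uint32_t page_end(uint32_t x)    { return page_start(x + (kPageSize - 1)); }

// The file window that must be mapped to reach the program header table.
struct PhdrWindow {
    uint32_t map_offset;           // page-aligned file offset passed to mmap
    uint32_t map_size;             // number of bytes mapped
    uint32_t table_offset_in_map;  // where the table starts inside the mapping
};

inline PhdrWindow phdr_window(uint32_t e_phoff, uint32_t phnum, uint32_t phentsize) {
    PhdrWindow w;
    w.map_offset = page_start(e_phoff);
    w.map_size = page_end(e_phoff + phnum * phentsize) - w.map_offset;
    w.table_offset_in_map = page_offset(e_phoff);
    return w;
}
```

mmap requires a page-aligned file offset, which is why the code maps from the start of the page containing e_phoff and then adds the in-page offset back to obtain the table pointer.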

ReserveAddressSpace

size_t phdr_table_get_load_size(const Elf32_Phdr* phdr_table,
                                size_t phdr_count,
                                Elf32_Addr* out_min_vaddr,
                                Elf32_Addr* out_max_vaddr)
{
    Elf32_Addr min_vaddr = 0xFFFFFFFFU; // smallest p_vaddr among the PT_LOAD segments
    Elf32_Addr max_vaddr = 0x00000000U; // largest end address (p_vaddr + p_memsz) among the PT_LOAD segments

    bool found_pt_load = false;
    for (size_t i = 0; i < phdr_count; ++i) {
        const Elf32_Phdr* phdr = &phdr_table[i];

        if (phdr->p_type != PT_LOAD) {
            continue;
        }
        found_pt_load = true;

        if (phdr->p_vaddr < min_vaddr) {
            min_vaddr = phdr->p_vaddr;
        }

        if (phdr->p_vaddr + phdr->p_memsz > max_vaddr) {
            max_vaddr = phdr->p_vaddr + phdr->p_memsz;
        }
    }
    if (!found_pt_load) {
        min_vaddr = 0x00000000U;
    }

    min_vaddr = PAGE_START(min_vaddr); // round the lowest address down to its page start
    max_vaddr = PAGE_END(max_vaddr); // round the highest address up to a page boundary

    if (out_min_vaddr != NULL) {
        *out_min_vaddr = min_vaddr;
    }
    if (out_max_vaddr != NULL) {
        *out_max_vaddr = max_vaddr;
    }
    return max_vaddr - min_vaddr; // highest minus lowest address is the load size
}

// Reserve a virtual address range big enough to hold all loadable
// segments of a program header table. This is done by creating a
// private anonymous mmap() with PROT_NONE.
bool ElfReader::ReserveAddressSpace() {
  Elf32_Addr min_vaddr;
  load_size_ = phdr_table_get_load_size(phdr_table_, phdr_num_, &min_vaddr); // compute how much address space to reserve
  if (load_size_ == 0) {
    DL_ERR("\"%s\" has no loadable segments", name_);
    return false;
  }

  uint8_t* addr = reinterpret_cast<uint8_t*>(min_vaddr);
  int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS;
  void* start = mmap(addr, load_size_, PROT_NONE, mmap_flags, -1, 0); // reserve the address range
  if (start == MAP_FAILED) {
    DL_ERR("couldn't reserve %d bytes of address space for \"%s\"", load_size_, name_);
    return false;
  }

  load_start_ = start;
  load_bias_ = reinterpret_cast<uint8_t*>(start) - addr;
  return true;
}
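  • Compute how much address space is needed.
  • Reserve it. load_start_ holds the start address of the reservation, and load_bias_ holds that start address minus the smallest (page-aligned) load address.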
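To see the size computation on real numbers, here is a compact restatement of phdr_table_get_load_size that returns all three values at once. LoadRange and compute_load_range are hypothetical names, and a 4 KiB page is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <elf.h>

struct LoadRange {
    uint32_t min_vaddr;  // page-aligned lowest address
    uint32_t max_vaddr;  // page-aligned highest end address
    uint32_t size;       // bytes of address space to reserve
};

// Hypothetical restatement of the load-size computation (4 KiB pages assumed).
inline LoadRange compute_load_range(const Elf32_Phdr* table, size_t count) {
    const uint32_t page_mask = ~0xfffu;
    uint32_t lo = 0xffffffffu;
    uint32_t hi = 0;
    bool found = false;
    for (size_t i = 0; i < count; ++i) {
        if (table[i].p_type != PT_LOAD) {
            continue;  // only loadable segments occupy address space
        }
        found = true;
        if (table[i].p_vaddr < lo) lo = table[i].p_vaddr;
        const uint32_t end = table[i].p_vaddr + table[i].p_memsz;
        if (end > hi) hi = end;
    }
    if (!found) lo = 0;
    LoadRange range;
    range.min_vaddr = lo & page_mask;
    range.max_vaddr = (hi + 0xfffu) & page_mask;
    range.size = range.max_vaddr - range.min_vaddr;
    return range;
}
```

Note that the upper bound uses p_memsz rather than p_filesz, so the reservation also covers zero-initialized data that takes no space in the file.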

LoadSegments

bool ElfReader::LoadSegments() {
  for (size_t i = 0; i < phdr_num_; ++i) {
    const Elf32_Phdr* phdr = &phdr_table_[i];

    if (phdr->p_type != PT_LOAD) { // skip anything that is not a PT_LOAD segment
      continue;
    }

    // Segment addresses in memory.
    Elf32_Addr seg_start = phdr->p_vaddr + load_bias_;
    Elf32_Addr seg_end   = seg_start + phdr->p_memsz;

    Elf32_Addr seg_page_start = PAGE_START(seg_start);
    Elf32_Addr seg_page_end   = PAGE_END(seg_end);

    Elf32_Addr seg_file_end   = seg_start + phdr->p_filesz;

    // File offsets.
    Elf32_Addr file_start = phdr->p_offset;
    Elf32_Addr file_end   = file_start + phdr->p_filesz;

    Elf32_Addr file_page_start = PAGE_START(file_start);
    Elf32_Addr file_length = file_end - file_page_start;

    if (file_length != 0) {
      void* seg_addr = mmap((void*)seg_page_start,
                            file_length,
                            PFLAGS_TO_PROT(phdr->p_flags),
                            MAP_FIXED|MAP_PRIVATE,
                            fd_,
                            file_page_start); // map the file contents at the fixed target address
      if (seg_addr == MAP_FAILED) {
        DL_ERR("couldn't map \"%s\" segment %d: %s", name_, i, strerror(errno));
        return false;
      }
    }

    // if the segment is writable, and does not end on a page boundary,
    // zero-fill it until the page limit.
    if ((phdr->p_flags & PF_W) != 0 && PAGE_OFFSET(seg_file_end) > 0) {
      memset((void*)seg_file_end, 0, PAGE_SIZE - PAGE_OFFSET(seg_file_end));
    }

    seg_file_end = PAGE_END(seg_file_end);

    // seg_file_end is now the first page address after the file
    // content. If seg_end is larger, we need to zero anything
    // between them. This is done by using a private anonymous
    // map for all extra pages.
    if (seg_page_end > seg_file_end) {
      void* zeromap = mmap((void*)seg_file_end,
                           seg_page_end - seg_file_end,
                           PFLAGS_TO_PROT(phdr->p_flags),
                           MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
                           -1,
                           0); // anonymous zero-filled pages for the part of the segment beyond the file contents
      if (zeromap == MAP_FAILED) {
        DL_ERR("couldn't zero fill \"%s\" gap: %s", name_, strerror(errno));
        return false;
      }
    }
  }
  return true;
}
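  • Map the file contents of each PT_LOAD segment at its target address.
  • For a writable segment, zero the remainder of the last file-backed page, then map anonymous zero-filled pages for any part of p_memsz that extends beyond the file contents.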
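The address arithmetic in LoadSegments can be separated from the mmap calls. The sketch below computes, for one PT_LOAD header, the three regions the loop touches: the file-backed mapping, the tail of the last file page that is zeroed for writable segments, and the anonymous pages. SegmentPlan, plan_segment, and the align helpers are hypothetical names, and a 4 KiB page is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <elf.h>

const uint32_t kPage = 0x1000;
inline uint32_t align_down(uint32_t x) { return x & ~(kPage - 1); }
inline uint32_t align_up(uint32_t x)   { return align_down(x + (kPage - 1)); }
inline uint32_t in_page(uint32_t x)    { return x & (kPage - 1); }

struct SegmentPlan {
    uint32_t map_addr;        // page-aligned address of the file-backed mapping
    uint32_t map_length;      // bytes mapped from the file
    uint32_t map_file_off;    // page-aligned file offset
    uint32_t tail_zero_addr;  // first byte after the file contents
    uint32_t tail_zero_len;   // bytes zeroed up to the page end (writable only)
    uint32_t anon_addr;       // start of the anonymous zero pages
    uint32_t anon_length;     // 0 when p_memsz adds no extra pages
};

inline SegmentPlan plan_segment(const Elf32_Phdr& ph, uint32_t load_bias) {
    const uint32_t seg_start = ph.p_vaddr + load_bias;
    const uint32_t seg_page_end = align_up(seg_start + ph.p_memsz);
    const uint32_t seg_file_end = seg_start + ph.p_filesz;
    SegmentPlan plan;
    plan.map_addr = align_down(seg_start);
    plan.map_file_off = align_down(ph.p_offset);
    plan.map_length = ph.p_offset + ph.p_filesz - plan.map_file_off;
    plan.tail_zero_addr = seg_file_end;
    const bool writable = (ph.p_flags & PF_W) != 0;
    plan.tail_zero_len = (writable && in_page(seg_file_end) > 0)
                             ? kPage - in_page(seg_file_end) : 0;
    plan.anon_addr = align_up(seg_file_end);
    plan.anon_length = seg_page_end > plan.anon_addr
                           ? seg_page_end - plan.anon_addr : 0;
    return plan;
}
```

For a data segment with p_offset 0x1234, p_vaddr 0x2234, p_filesz 0x500, p_memsz 0x3000 and a load bias of 0x70000000, the file mapping covers 0x734 bytes at 0x70002000, the 0x8cc bytes after 0x70002734 are zeroed, and three anonymous pages start at 0x70003000.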

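One concept matters here when loading a .so file. Unlike an executable, a shared library has no fixed load address, so its p_vaddr values are offsets relative to the load address. An executable's load address can be fixed, so its p_vaddr is generally the final address.

Suppose a .so file has two loadable segments:

1. p_vaddr 0x2001  p_filesz:0x100
2. p_vaddr 0x3335  p_filesz:0x100

Suppose the base address of the mapping is 0x77777000.

load_bias = base address - PAGE_START(smallest p_vaddr) = 0x77777000 - 0x2000 = 0x77775000

Once loaded, the pages that hold the start of the two segments are at:

0x77775000 + PAGE_START(0x2001) = 0x77777000
0x77775000 + PAGE_START(0x3335) = 0x77778000

This is why load_bias is the base address minus the smallest load address: adding it to any p_vaddr gives the corresponding address in the process.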
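The worked example can be checked mechanically. In this sketch compute_load_bias and runtime_address are hypothetical helpers, a 4 KiB page is assumed, and all arithmetic is done in uint32_t so that wrap-around behaves as it does in the 32-bit linker.

```cpp
#include <cassert>
#include <cstdint>

// load_bias = start of the reserved range - page start of the smallest p_vaddr.
inline uint32_t compute_load_bias(uint32_t mapped_start, uint32_t min_vaddr) {
    return mapped_start - (min_vaddr & ~0xfffu);
}

// Any virtual address read from the ELF file becomes a process address
// by adding the load bias.
inline uint32_t runtime_address(uint32_t load_bias, uint32_t vaddr) {
    return load_bias + vaddr;
}
```

With the numbers above, the bias is 0x77775000, the first segment starts at 0x77777001 (inside the page at 0x77777000), and the second at 0x77778335 (inside the page at 0x77778000). Because the subtraction is unsigned, the bias may wrap around UINT32_MAX when the reservation lands below the smallest p_vaddr, and the addition still yields the right address.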

 

FindPhdr
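Returns the address of the program header table as it appears in memory. This differs from phdr_table_,
which is a temporary mapping that is released before the .so is relocated: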

bool ElfReader::FindPhdr() {
  const Elf32_Phdr* phdr_limit = phdr_table_ + phdr_num_;

  // If there is a PT_PHDR, use it directly.
  for (const Elf32_Phdr* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
    if (phdr->p_type == PT_PHDR) {
      return CheckPhdr(load_bias_ + phdr->p_vaddr);
    }
  }

  // Otherwise, check the first loadable segment. If its file offset
  // is 0, it starts with the ELF header, and we can trivially find the
  // loaded program header from it.
  for (const Elf32_Phdr* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
    if (phdr->p_type == PT_LOAD) {
      if (phdr->p_offset == 0) {
        Elf32_Addr  elf_addr = load_bias_ + phdr->p_vaddr;
        const Elf32_Ehdr* ehdr = (const Elf32_Ehdr*)(void*)elf_addr;
        Elf32_Addr  offset = ehdr->e_phoff;
        return CheckPhdr((Elf32_Addr)ehdr + offset);
      }
      break;
    }
  }

  DL_ERR("can't find loaded phdr for \"%s\"", name_);
  return false;
}

// Ensures that our program header is actually within a loadable
// segment. This should help catch badly-formed ELF files that
// would cause the linker to crash later when trying to access it.
bool ElfReader::CheckPhdr(Elf32_Addr loaded) {
  const Elf32_Phdr* phdr_limit = phdr_table_ + phdr_num_;
  Elf32_Addr loaded_end = loaded + (phdr_num_ * sizeof(Elf32_Phdr));
  for (Elf32_Phdr* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
    if (phdr->p_type != PT_LOAD) {
      continue;
    }
    Elf32_Addr seg_start = phdr->p_vaddr + load_bias_;
    Elf32_Addr seg_end = phdr->p_filesz + seg_start;
    if (seg_start <= loaded && loaded_end <= seg_end) {
      loaded_phdr_ = reinterpret_cast<const Elf32_Phdr*>(loaded);
      return true;
    }
  }
  DL_ERR("\"%s\" loaded phdr %x not in loadable segment", name_, loaded);
  return false;
}

链接so文件

soinfo_link_image

static bool soinfo_link_image(soinfo* si) {
    /* "base" might wrap around UINT32_MAX. */
    Elf32_Addr base = si->load_bias;
    const Elf32_Phdr *phdr = si->phdr;
    int phnum = si->phnum;
    bool relocating_linker = (si->flags & FLAG_LINKER) != 0;

    /* We can't debug anything until the linker is relocated */
    if (!relocating_linker) {
        INFO("[ linking %s ]", si->name);
        DEBUG("si->base = 0x%08x si->flags = 0x%08x", si->base, si->flags);
    }

    /* Extract dynamic section */
    size_t dynamic_count;
    Elf32_Word dynamic_flags;
    phdr_table_get_dynamic_section(phdr, phnum, base, &si->dynamic,
                                   &dynamic_count, &dynamic_flags);
    if (si->dynamic == NULL) {
        if (!relocating_linker) {
            DL_ERR("missing PT_DYNAMIC in \"%s\"", si->name);
        }
        return false;
    } else {
        if (!relocating_linker) {
            DEBUG("dynamic = %p", si->dynamic);
        }
    }

#ifdef ANDROID_ARM_LINKER
    (void) phdr_table_get_arm_exidx(phdr, phnum, base,
                                    &si->ARM_exidx, &si->ARM_exidx_count);
#endif

    // Extract useful information from dynamic section.
    uint32_t needed_count = 0;
    for (Elf32_Dyn* d = si->dynamic; d->d_tag != DT_NULL; ++d) {
        DEBUG("d = %p, d[0](tag) = 0x%08x d[1](val) = 0x%08x", d, d->d_tag, d->d_un.d_val);
        switch(d->d_tag){
        case DT_HASH:
            si->nbucket = ((unsigned *) (base + d->d_un.d_ptr))[0];
            si->nchain = ((unsigned *) (base + d->d_un.d_ptr))[1];
            si->bucket = (unsigned *) (base + d->d_un.d_ptr + 8);
            si->chain = (unsigned *) (base + d->d_un.d_ptr + 8 + si->nbucket * 4);
            break;
        case DT_STRTAB:
            si->strtab = (const char *) (base + d->d_un.d_ptr);
            break;
        case DT_SYMTAB:
            si->symtab = (Elf32_Sym *) (base + d->d_un.d_ptr);
            break;
        case DT_PLTREL:
            if (d->d_un.d_val != DT_REL) {
                DL_ERR("unsupported DT_RELA in \"%s\"", si->name);
                return false;
            }
            break;
        case DT_JMPREL:
            si->plt_rel = (Elf32_Rel*) (base + d->d_un.d_ptr);
            break;
        case DT_PLTRELSZ:
            si->plt_rel_count = d->d_un.d_val / sizeof(Elf32_Rel);
            break;
        case DT_REL:
            si->rel = (Elf32_Rel*) (base + d->d_un.d_ptr);
            break;
        case DT_RELSZ:
            si->rel_count = d->d_un.d_val / sizeof(Elf32_Rel);
            break;
        case DT_PLTGOT:
            /* Save this in case we decide to do lazy binding. We don't yet. */
            si->plt_got = (unsigned *)(base + d->d_un.d_ptr);
            break;
        case DT_DEBUG:
            // Set the DT_DEBUG entry to the address of _r_debug for GDB
            // if the dynamic table is writable
            if ((dynamic_flags & PF_W) != 0) {
                d->d_un.d_val = (int) &_r_debug;
            }
            break;
        case DT_RELA:
            DL_ERR("unsupported DT_RELA in \"%s\"", si->name);
            return false;
        case DT_INIT:
            si->init_func = reinterpret_cast<linker_function_t>(base + d->d_un.d_ptr);
            DEBUG("%s constructors (DT_INIT) found at %p", si->name, si->init_func);
            break;
        case DT_FINI:
            si->fini_func = reinterpret_cast<linker_function_t>(base + d->d_un.d_ptr);
            DEBUG("%s destructors (DT_FINI) found at %p", si->name, si->fini_func);
            break;
        case DT_INIT_ARRAY:
            si->init_array = reinterpret_cast<linker_function_t*>(base + d->d_un.d_ptr);
            DEBUG("%s constructors (DT_INIT_ARRAY) found at %p", si->name, si->init_array);
            break;
        case DT_INIT_ARRAYSZ:
            si->init_array_count = ((unsigned)d->d_un.d_val) / sizeof(Elf32_Addr);
            break;
        case DT_FINI_ARRAY:
            si->fini_array = reinterpret_cast<linker_function_t*>(base + d->d_un.d_ptr);
            DEBUG("%s destructors (DT_FINI_ARRAY) found at %p", si->name, si->fini_array);
            break;
        case DT_FINI_ARRAYSZ:
            si->fini_array_count = ((unsigned)d->d_un.d_val) / sizeof(Elf32_Addr);
            break;
        case DT_PREINIT_ARRAY:
            si->preinit_array = reinterpret_cast<linker_function_t*>(base + d->d_un.d_ptr);
            DEBUG("%s constructors (DT_PREINIT_ARRAY) found at %p", si->name, si->preinit_array);
            break;
        case DT_PREINIT_ARRAYSZ:
            si->preinit_array_count = ((unsigned)d->d_un.d_val) / sizeof(Elf32_Addr);
            break;
        case DT_TEXTREL:
            si->has_text_relocations = true;
            break;
        case DT_SYMBOLIC:
            si->has_DT_SYMBOLIC = true;
            break;
        case DT_NEEDED:
            ++needed_count;
            break;
#if defined DT_FLAGS
        // TODO: why is DT_FLAGS not defined?
        case DT_FLAGS:
            if (d->d_un.d_val & DF_TEXTREL) {
                si->has_text_relocations = true;
            }
            if (d->d_un.d_val & DF_SYMBOLIC) {
                si->has_DT_SYMBOLIC = true;
            }
            break;
#endif
#if defined(ANDROID_MIPS_LINKER)
        case DT_STRSZ:
        case DT_SYMENT:
        case DT_RELENT:
             break;
        case DT_MIPS_RLD_MAP:
            // Set the DT_MIPS_RLD_MAP entry to the address of _r_debug for GDB.
            {
              r_debug** dp = (r_debug**) d->d_un.d_ptr;
              *dp = &_r_debug;
            }
            break;
        case DT_MIPS_RLD_VERSION:
        case DT_MIPS_FLAGS:
        case DT_MIPS_BASE_ADDRESS:
        case DT_MIPS_UNREFEXTNO:
            break;

        case DT_MIPS_SYMTABNO:
            si->mips_symtabno = d->d_un.d_val;
            break;

        case DT_MIPS_LOCAL_GOTNO:
            si->mips_local_gotno = d->d_un.d_val;
            break;

        case DT_MIPS_GOTSYM:
            si->mips_gotsym = d->d_un.d_val;
            break;

        default:
            DEBUG("Unused DT entry: type 0x%08x arg 0x%08x", d->d_tag, d->d_un.d_val);
            break;
#endif
        }
    }

    DEBUG("si->base = 0x%08x, si->strtab = %p, si->symtab = %p",
          si->base, si->strtab, si->symtab);

    // Sanity checks.
    if (relocating_linker && needed_count != 0) {
        DL_ERR("linker cannot have DT_NEEDED dependencies on other libraries");
        return false;
    }
    if (si->nbucket == 0) { // DT_HASH is required.
        DL_ERR("empty/missing DT_HASH in \"%s\" (built with --hash-style=gnu?)", si->name);
        return false;
    }
    if (si->strtab == 0) { // DT_STRTAB is required.
        DL_ERR("empty/missing DT_STRTAB in \"%s\"", si->name);
        return false;
    }
    if (si->symtab == 0) { // DT_SYMTAB is required.
        DL_ERR("empty/missing DT_SYMTAB in \"%s\"", si->name);
        return false;
    }

    // If this is the main executable, then load all of the libraries from LD_PRELOAD now.
    if (si->flags & FLAG_EXE) {
        memset(gLdPreloads, 0, sizeof(gLdPreloads));
        size_t preload_count = 0;
        for (size_t i = 0; gLdPreloadNames[i] != NULL; i++) {
            soinfo* lsi = find_library(gLdPreloadNames[i]);
            if (lsi != NULL) {
                gLdPreloads[preload_count++] = lsi;
            } else {
                // As with glibc, failure to load an LD_PRELOAD library is just a warning.
                DL_WARN("could not load library \"%s\" from LD_PRELOAD for \"%s\"; caused by %s",
                        gLdPreloadNames[i], si->name, linker_get_error_buffer());
            }
        }
    }

    soinfo** needed = (soinfo**) alloca((1 + needed_count) * sizeof(soinfo*));
    soinfo** pneeded = needed;

    for (Elf32_Dyn* d = si->dynamic; d->d_tag != DT_NULL; ++d) { // load every so file named by a DT_NEEDED entry
        if (d->d_tag == DT_NEEDED) {
            const char* library_name = si->strtab + d->d_un.d_val;
            DEBUG("%s needs %s", si->name, library_name);
            soinfo* lsi = find_library(library_name);
            if (lsi == NULL) {
                strlcpy(tmp_err_buf, linker_get_error_buffer(), sizeof(tmp_err_buf));
                DL_ERR("could not load library \"%s\" needed by \"%s\"; caused by %s",
                       library_name, si->name, tmp_err_buf);
                return false;
            }
            *pneeded++ = lsi;
        }
    }
    *pneeded = NULL;

    if (si->has_text_relocations) {
        /* Unprotect the segments, i.e. make them writable, to allow
         * text relocations to work properly. We will later call
         * phdr_table_protect_segments() after all of them are applied
         * and all constructors are run.
         */
        DL_WARN("%s has text relocations. This is wasting memory and is "
                "a security risk. Please fix.", si->name);
        if (phdr_table_unprotect_segments(si->phdr, si->phnum, si->load_bias) < 0) {
            DL_ERR("can't unprotect loadable segments for \"%s\": %s",
                   si->name, strerror(errno));
            return false;
        }
    }

    if (si->plt_rel != NULL) {
        DEBUG("[ relocating %s plt ]", si->name );
        if (soinfo_relocate(si, si->plt_rel, si->plt_rel_count, needed)) { // apply the relocations
            return false;
        }
    }
    if (si->rel != NULL) {
        DEBUG("[ relocating %s ]", si->name );
        if (soinfo_relocate(si, si->rel, si->rel_count, needed)) {
            return false;
        }
    }

#ifdef ANDROID_MIPS_LINKER
    if (!mips_relocate_got(si, needed)) {
        return false;
    }
#endif

    si->flags |= FLAG_LINKED;
    DEBUG("[ finished linking %s ]", si->name);

    if (si->has_text_relocations) {
        /* All relocations are done, we can protect our segments back to
         * read-only. */
        if (phdr_table_protect_segments(si->phdr, si->phnum, si->load_bias) < 0) {
            DL_ERR("can't protect segments for \"%s\": %s",
                   si->name, strerror(errno));
            return false;
        }
    }

    /* We can also turn on GNU RELRO protection */
    if (phdr_table_protect_gnu_relro(si->phdr, si->phnum, si->load_bias) < 0) {
        DL_ERR("can't enable GNU RELRO protection for \"%s\": %s",
               si->name, strerror(errno));
        return false;
    }

    notify_gdb_of_load(si);
    return true;
}

First, look at Elf32_Dyn, the entry type of the array that the PT_DYNAMIC segment points to.

typedef struct dynamic {
  Elf32_Sword d_tag;
  union {
    Elf32_Sword d_val;
    Elf32_Addr d_ptr;
  } d_un;
} Elf32_Dyn;

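How the value in the d_un union is interpreted depends on d_tag: for example, DT_STRTAB uses d_ptr (an address that still needs the load bias added), while DT_RELSZ uses d_val (a size in bytes).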
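As a rough sketch of that rule, the snippet below walks a hand-built dynamic array and reads d_ptr or d_val depending on the tag. Dyn32, DynSummary and summarize_dynamic are names made up for this illustration (Dyn32 is a local stand-in for Elf32_Dyn so the snippet compiles without <elf.h>); the tag numbers are the ones from the ELF specification.

```cpp
#include <cstddef>
#include <cstdint>

// Local stand-in for Elf32_Dyn (hypothetical name, same layout idea).
struct Dyn32 {
    int32_t d_tag;
    union {
        uint32_t d_val;  // plain integer: a size, a count, a string-table offset
        uint32_t d_ptr;  // virtual address: must be rebased with the load bias
    } d_un;
};

// Tag values as defined by the ELF specification.
constexpr int32_t kDtNull = 0;
constexpr int32_t kDtNeeded = 1;
constexpr int32_t kDtStrtab = 5;
constexpr int32_t kDtSymtab = 6;
constexpr int32_t kDtRel = 17;
constexpr int32_t kDtRelsz = 18;

// What this sketch extracts from the dynamic array (hypothetical struct).
struct DynSummary {
    uint32_t strtab_addr = 0;
    uint32_t symtab_addr = 0;
    uint32_t rel_addr = 0;
    size_t rel_count = 0;
    size_t needed_count = 0;
};

DynSummary summarize_dynamic(const Dyn32* d, uint32_t load_bias) {
    DynSummary out;
    for (; d->d_tag != kDtNull; ++d) {
        switch (d->d_tag) {
        case kDtStrtab: out.strtab_addr = load_bias + d->d_un.d_ptr; break;
        case kDtSymtab: out.symtab_addr = load_bias + d->d_un.d_ptr; break;
        case kDtRel:    out.rel_addr = load_bias + d->d_un.d_ptr; break;
        case kDtRelsz:  out.rel_count = d->d_un.d_val / 8; break;  // sizeof(Elf32_Rel) is 8
        case kDtNeeded: ++out.needed_count; break;  // d_val is a strtab offset here
        default: break;  // tags this sketch does not care about
        }
    }
    return out;
}
```

The real linker stores pointers instead of integers, but the split between "address" tags and "value" tags is the same.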

 

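This array of Elf32_Dyn entries is the dynamic member of the soinfo structure. In the load_library function covered earlier, si->dynamic is set to NULL, which shows that the load stage does not need it; only the link stage does.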

 

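From the code above we can summarize:

  • This version of the Android linker does not implement lazy binding; PLT relocations are resolved at link time like all the others.
  • DT_HASH, DT_SYMTAB and DT_STRTAB are mandatory.
  • Every so file named by a DT_NEEDED entry is loaded.
  • DT_JMPREL/DT_PLTRELSZ and DT_REL/DT_RELSZ give the location and the size in bytes of the two relocation tables.
  • soinfo_relocate applies the relocations.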
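To make the DT_NEEDED point concrete: each DT_NEEDED entry stores an offset into the string table, not a pointer, which is why the linker computes si->strtab + d->d_un.d_val. The sketch below collects those names from a hand-built table. Dyn32 and needed_libraries are hypothetical names for this illustration, and the bounds check is my own addition rather than something the linker code above does.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Local stand-in for Elf32_Dyn (hypothetical name).
struct Dyn32 {
    int32_t d_tag;
    union {
        uint32_t d_val;
        uint32_t d_ptr;
    } d_un;
};

constexpr int32_t kDtNull = 0;    // end of the dynamic array
constexpr int32_t kDtNeeded = 1;  // d_val = offset of a library name in strtab

// Returns the library names referenced by DT_NEEDED entries, in table order.
std::vector<std::string> needed_libraries(const Dyn32* dyn, const char* strtab,
                                          size_t strtab_size) {
    std::vector<std::string> names;
    for (const Dyn32* d = dyn; d->d_tag != kDtNull; ++d) {
        if (d->d_tag != kDtNeeded) continue;
        if (d->d_un.d_val >= strtab_size) continue;  // skip out-of-range offsets
        names.emplace_back(strtab + d->d_un.d_val);
    }
    return names;
}
```

In the linker each of these names is then passed to find_library, and the resulting soinfo pointers form the needed[] array used during symbol lookup.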

Next, let's see how soinfo_relocate works.

Applying relocations: soinfo_relocate

/* TODO: don't use unsigned for addrs below. It works, but is not
 * ideal. They should probably be either uint32_t, Elf32_Addr, or unsigned
 * long.
 */
static int soinfo_relocate(soinfo* si, Elf32_Rel* rel, unsigned count,
                           soinfo* needed[])
{
    Elf32_Sym* symtab = si->symtab;
    const char* strtab = si->strtab;
    Elf32_Sym* s;
    Elf32_Rel* start = rel;
    soinfo* lsi;

    for (size_t idx = 0; idx < count; ++idx, ++rel) {
        unsigned type = ELF32_R_TYPE(rel->r_info); // relocation type
        unsigned sym = ELF32_R_SYM(rel->r_info); // index of the referenced symbol
        Elf32_Addr reloc = static_cast<Elf32_Addr>(rel->r_offset + si->load_bias);
        Elf32_Addr sym_addr = 0;
        char* sym_name = NULL;

        DEBUG("Processing '%s' relocation at index %d", si->name, idx);
        if (type == 0) { // R_*_NONE
            continue;
        }
        if (sym != 0) {
            sym_name = (char *)(strtab + symtab[sym].st_name);
            s = soinfo_do_lookup(si, sym_name, &lsi, needed); // look up the symbol
            if (s == NULL) {
                /* We only allow an undefined symbol if this is a weak
                   reference..   */
                s = &symtab[sym];
                if (ELF32_ST_BIND(s->st_info) != STB_WEAK) {
                    DL_ERR("cannot locate symbol \"%s\" referenced by \"%s\"...", sym_name, si->name);
                    return -1;
                }

                /* IHI0044C AAELF 4.5.1.1:

                   Libraries are not searched to resolve weak references.
                   It is not an error for a weak reference to remain
                   unsatisfied.

                   During linking, the value of an undefined weak reference is:
                   - Zero if the relocation type is absolute
                   - The address of the place if the relocation is pc-relative
                   - The address of nominal base address if the relocation
                     type is base-relative.
                  */

                switch (type) {
#if defined(ANDROID_ARM_LINKER)
                case R_ARM_JUMP_SLOT:
                case R_ARM_GLOB_DAT:
                case R_ARM_ABS32:
                case R_ARM_RELATIVE:    /* Don't care. */
#elif defined(ANDROID_X86_LINKER)
                case R_386_JMP_SLOT:
                case R_386_GLOB_DAT:
                case R_386_32:
                case R_386_RELATIVE:    /* Don't care. */
#endif /* ANDROID_*_LINKER */
                    /* sym_addr was initialized to be zero above or relocation
                       code below does not care about value of sym_addr.
                       No need to do anything.  */
                    break;

#if defined(ANDROID_X86_LINKER)
                case R_386_PC32:
                    sym_addr = reloc;
                    break;
#endif /* ANDROID_X86_LINKER */

#if defined(ANDROID_ARM_LINKER)
                case R_ARM_COPY:
                    /* Fall through.  Can't really copy if weak symbol is
                       not found in run-time.  */
#endif /* ANDROID_ARM_LINKER */
                default:
                    DL_ERR("unknown weak reloc type %d @ %p (%d)",
                                 type, rel, (int) (rel - start));
                    return -1;
                }
            } else {
                /* We got a definition.  */
#if 0
                if ((base == 0) && (si->base != 0)) {
                        /* linking from libraries to main image is bad */
                    DL_ERR("cannot locate \"%s\"...",
                           strtab + symtab[sym].st_name);
                    return -1;
                }
#endif
                sym_addr = static_cast<Elf32_Addr>(s->st_value + lsi->load_bias);
            }
            count_relocation(kRelocSymbol);
        } else {
            s = NULL;
        }

/* TODO: This is ugly. Split up the relocations by arch into
 * different files.
 */
        switch(type){
#if defined(ANDROID_ARM_LINKER)
        case R_ARM_JUMP_SLOT:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO JMP_SLOT %08x <- %08x %s", reloc, sym_addr, sym_name);
            *reinterpret_cast<Elf32_Addr*>(reloc) = sym_addr;
            break;
        case R_ARM_GLOB_DAT:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO GLOB_DAT %08x <- %08x %s", reloc, sym_addr, sym_name);
            *reinterpret_cast<Elf32_Addr*>(reloc) = sym_addr;
            break;
        case R_ARM_ABS32:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO ABS %08x <- %08x %s", reloc, sym_addr, sym_name);
            *reinterpret_cast<Elf32_Addr*>(reloc) += sym_addr;
            break;
        case R_ARM_REL32:
            count_relocation(kRelocRelative);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO REL32 %08x <- %08x - %08x %s",
                       reloc, sym_addr, rel->r_offset, sym_name);
            *reinterpret_cast<Elf32_Addr*>(reloc) += sym_addr - rel->r_offset;
            break;
#elif defined(ANDROID_X86_LINKER)
        case R_386_JMP_SLOT:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO JMP_SLOT %08x <- %08x %s", reloc, sym_addr, sym_name);
            *reinterpret_cast<Elf32_Addr*>(reloc) = sym_addr;
            break;
        case R_386_GLOB_DAT:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO GLOB_DAT %08x <- %08x %s", reloc, sym_addr, sym_name);
            *reinterpret_cast<Elf32_Addr*>(reloc) = sym_addr;
            break;
#elif defined(ANDROID_MIPS_LINKER)
        case R_MIPS_REL32:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO REL32 %08x <- %08x %s",
                       reloc, sym_addr, (sym_name) ? sym_name : "*SECTIONHDR*");
            if (s) {
                *reinterpret_cast<Elf32_Addr*>(reloc) += sym_addr;
            } else {
                *reinterpret_cast<Elf32_Addr*>(reloc) += si->base;
            }
            break;
#endif /* ANDROID_*_LINKER */

#if defined(ANDROID_ARM_LINKER)
        case R_ARM_RELATIVE:
#elif defined(ANDROID_X86_LINKER)
        case R_386_RELATIVE:
#endif /* ANDROID_*_LINKER */
            count_relocation(kRelocRelative);
            MARK(rel->r_offset);
            if (sym) {
                DL_ERR("odd RELATIVE form...");
                return -1;
            }
            TRACE_TYPE(RELO, "RELO RELATIVE %08x <- +%08x", reloc, si->base);
            *reinterpret_cast<Elf32_Addr*>(reloc) += si->base;
            break;

#if defined(ANDROID_X86_LINKER)
        case R_386_32:
            count_relocation(kRelocRelative);
            MARK(rel->r_offset);

            TRACE_TYPE(RELO, "RELO R_386_32 %08x <- +%08x %s", reloc, sym_addr, sym_name);
            *reinterpret_cast<Elf32_Addr*>(reloc) += sym_addr;
            break;

        case R_386_PC32:
            count_relocation(kRelocRelative);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO R_386_PC32 %08x <- +%08x (%08x - %08x) %s",
                       reloc, (sym_addr - reloc), sym_addr, reloc, sym_name);
            *reinterpret_cast<Elf32_Addr*>(reloc) += (sym_addr - reloc);
            break;
#endif /* ANDROID_X86_LINKER */

#ifdef ANDROID_ARM_LINKER
        case R_ARM_COPY:
            if ((si->flags & FLAG_EXE) == 0) {
                /*
                 * http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
                 *
                 * Section 4.7.1.10 "Dynamic relocations"
                 * R_ARM_COPY may only appear in executable objects where e_type is
                 * set to ET_EXEC.
                 *
                 * TODO: FLAG_EXE is set for both ET_DYN and ET_EXEC executables.
                 * We should explicitly disallow ET_DYN executables from having
                 * R_ARM_COPY relocations.
                 */
                DL_ERR("%s R_ARM_COPY relocations only supported for ET_EXEC", si->name);
                return -1;
            }
            count_relocation(kRelocCopy);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO %08x <- %d @ %08x %s", reloc, s->st_size, sym_addr, sym_name);
            if (reloc == sym_addr) {
                Elf32_Sym *src = soinfo_do_lookup(NULL, sym_name, &lsi, needed);

                if (src == NULL) {
                    DL_ERR("%s R_ARM_COPY relocation source cannot be resolved", si->name);
                    return -1;
                }
                if (lsi->has_DT_SYMBOLIC) {
                    DL_ERR("%s invalid R_ARM_COPY relocation against DT_SYMBOLIC shared "
                           "library %s (built with -Bsymbolic?)", si->name, lsi->name);
                    return -1;
                }
                if (s->st_size < src->st_size) {
                    DL_ERR("%s R_ARM_COPY relocation size mismatch (%d < %d)",
                           si->name, s->st_size, src->st_size);
                    return -1;
                }
                memcpy((void*)reloc, (void*)(src->st_value + lsi->load_bias), src->st_size);
            } else {
                DL_ERR("%s R_ARM_COPY relocation target cannot be resolved", si->name);
                return -1;
            }
            break;
#endif /* ANDROID_ARM_LINKER */

        default:
            DL_ERR("unknown reloc type %d @ %p (%d)",
                   type, rel, (int) (rel - start));
            return -1;
        }
    }
    return 0;
}

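As the code shows, the r_info field of each relocation entry encodes both the relocation type and a symbol-table index. When the index is nonzero, the entry refers to an Elf32_Sym that is resolved through soinfo_do_lookup; an index of 0 (as with R_ARM_RELATIVE) means no symbol is involved.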
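For reference, here is a small sketch of how r_info is split on 32-bit ELF: the low 8 bits are the type and the remaining 24 bits are the symbol index, which is what ELF32_R_TYPE and ELF32_R_SYM compute. The function names rel_sym, rel_type and rel_info are stand-ins I picked so the snippet does not depend on <elf.h>.

```cpp
#include <cstdint>

// Equivalent of ELF32_R_SYM: upper 24 bits hold the symbol-table index.
constexpr uint32_t rel_sym(uint32_t info) { return info >> 8; }

// Equivalent of ELF32_R_TYPE: low 8 bits hold the relocation type.
constexpr uint32_t rel_type(uint32_t info) { return info & 0xffu; }

// Equivalent of ELF32_R_INFO: packs a symbol index and a type back together.
constexpr uint32_t rel_info(uint32_t sym, uint32_t type) {
    return (sym << 8) | (type & 0xffu);
}

// Two ARM relocation type numbers from the ARM ELF ABI, used as examples.
constexpr uint32_t kRelArmJumpSlot = 22;
constexpr uint32_t kRelArmRelative = 23;
```

For example, an r_info of 0x00000316 decodes to symbol index 3 with type 22 (R_ARM_JUMP_SLOT), while 0x00000017 decodes to symbol index 0 with type 23 (R_ARM_RELATIVE).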

 

If you are interested, you can read through how each relocation type is applied.
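As a simplified illustration of the four common ARM cases in the switch above, the sketch below patches 32-bit words inside a byte buffer that stands in for the loaded image. apply_reloc and read_word are hypothetical helpers; the buffer index is used directly as r_offset, which assumes the first loadable segment starts at virtual address 0 so that base and load_bias coincide.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// ARM relocation type numbers from the ARM ELF ABI.
constexpr uint32_t kRelArmAbs32 = 2;
constexpr uint32_t kRelArmGlobDat = 21;
constexpr uint32_t kRelArmJumpSlot = 22;
constexpr uint32_t kRelArmRelative = 23;

// Reads the 32-bit word stored at the given offset of the fake image.
uint32_t read_word(const std::vector<uint8_t>& image, uint32_t offset) {
    uint32_t word = 0;
    std::memcpy(&word, image.data() + offset, sizeof(word));
    return word;
}

// Applies one relocation to the fake image. Returns false for an
// out-of-range offset or a type this sketch does not handle.
bool apply_reloc(std::vector<uint8_t>& image, uint32_t base, uint32_t r_offset,
                 uint32_t type, uint32_t sym_addr) {
    if (image.size() < 4 || r_offset > image.size() - 4) return false;
    uint32_t word = read_word(image, r_offset);
    switch (type) {
    case kRelArmJumpSlot:
    case kRelArmGlobDat:
        word = sym_addr;   // slot is overwritten with the symbol address
        break;
    case kRelArmAbs32:
        word += sym_addr;  // existing addend plus the symbol address
        break;
    case kRelArmRelative:
        word += base;      // no symbol involved, only rebasing
        break;
    default:
        return false;
    }
    std::memcpy(image.data() + r_offset, &word, sizeof(word));
    return true;
}
```

With base 0x40000000, a word holding 0x10 becomes 0x40000010 after an R_ARM_RELATIVE relocation, and a GOT slot simply receives the resolved address after R_ARM_JUMP_SLOT.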

 

Next, let's look at soinfo_do_lookup, the function that looks up a symbol.

Symbol lookup: soinfo_do_lookup

static Elf32_Sym* soinfo_do_lookup(soinfo* si, const char* name, soinfo** lsi, soinfo* needed[]) {
    unsigned elf_hash = elfhash(name);
    Elf32_Sym* s = NULL;

    if (si != NULL && somain != NULL) {

        /*
         * Local scope is executable scope. Just start looking into it right away
         * for the shortcut.
         */

        if (si == somain) {
            s = soinfo_elf_lookup(si, elf_hash, name);
            if (s != NULL) {
                *lsi = si;
                goto done;
            }
        } else {
            /* Order of symbol lookup is controlled by DT_SYMBOLIC flag */

            /*
             * If this object was built with symbolic relocations disabled, the
             * first place to look to resolve external references is the main
             * executable.
             */

            if (!si->has_DT_SYMBOLIC) {
                DEBUG("%s: looking up %s in executable %s",
                      si->name, name, somain->name);
                s = soinfo_elf_lookup(somain, elf_hash, name);
                if (s != NULL) {
                    *lsi = somain;
                    goto done;
                }
            }

            /* Look for symbols in the local scope (the object who is
             * searching). This happens with C++ templates on i386 for some
             * reason.
             *
             * Notes on weak symbols:
             * The ELF specs are ambiguous about treatment of weak definitions in
             * dynamic linking.  Some systems return the first definition found
             * and some the first non-weak definition.   This is system dependent.
             * Here we return the first definition found for simplicity.  */

            s = soinfo_elf_lookup(si, elf_hash, name); // search the requesting module itself
            if (s != NULL) {
                *lsi = si;
                goto done;
            }

            /*
             * If this object was built with -Bsymbolic and symbol is not found
             * in the local scope, try to find the symbol in the main executable.
             */

            if (si->has_DT_SYMBOLIC) {
                DEBUG("%s: looking up %s in executable %s after local scope",
                      si->name, name, somain->name);
                s = soinfo_elf_lookup(somain, elf_hash, name); // search somain
                if (s != NULL) {
                    *lsi = somain;
                    goto done;
                }
            }
        }
    }

    /* Next, look for it in the preloads list */
    for (int i = 0; gLdPreloads[i] != NULL; i++) { // search the preloaded modules
        s = soinfo_elf_lookup(gLdPreloads[i], elf_hash, name);
        if (s != NULL) {
            *lsi = gLdPreloads[i];
            goto done;
        }
    }

    for (int i = 0; needed[i] != NULL; i++) { // search the modules loaded via DT_NEEDED
        DEBUG("%s: looking up %s in %s",
              si->name, name, needed[i]->name);
        s = soinfo_elf_lookup(needed[i], elf_hash, name);
        if (s != NULL) {
            *lsi = needed[i];
            goto done;
        }
    }

done:
    if (s != NULL) {
        TRACE_TYPE(LOOKUP, "si %s sym %s s->st_value = 0x%08x, "
                   "found in %s, base = 0x%08x, load bias = 0x%08x",
                   si->name, name, s->st_value,
                   (*lsi)->name, (*lsi)->base, (*lsi)->load_bias);
        return s;
    }

    return NULL;
}

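The symbol name is first hashed by elfhash into an integer. soinfo_elf_lookup then uses that hash to find the matching Elf32_Sym, and the caller derives the address from it as st_value + load_bias.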

 

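The lookup order is as follows (when the requesting module was not built with DT_SYMBOLIC, steps 1 and 2 are swapped, so somain is searched first):

  1. The requesting module itself
  2. The somain module (the main executable)
  3. The modules in the gLdPreloads array
  4. The modules loaded via DT_NEEDED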
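The order can be sketched as a function that only lists which modules would be consulted, without doing any real lookup. Module and lookup_order are hypothetical names for this illustration; the real soinfo_do_lookup stops at the first module that defines the symbol.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Minimal stand-in for soinfo: just a name and the DT_SYMBOLIC flag.
struct Module {
    std::string name;
    bool has_dt_symbolic;
};

// Lists the modules in the order soinfo_do_lookup would search them.
std::vector<std::string> lookup_order(const Module* si, const Module* somain,
                                      const std::vector<const Module*>& preloads,
                                      const std::vector<const Module*>& needed) {
    std::vector<std::string> order;
    if (si != nullptr && somain != nullptr) {
        if (si == somain) {
            order.push_back(si->name);      // executable looks at itself once
        } else if (si->has_dt_symbolic) {
            order.push_back(si->name);      // -Bsymbolic: own definitions win
            order.push_back(somain->name);
        } else {
            order.push_back(somain->name);  // default: main executable first
            order.push_back(si->name);
        }
    }
    for (size_t i = 0; i < preloads.size(); ++i) order.push_back(preloads[i]->name);
    for (size_t i = 0; i < needed.size(); ++i) order.push_back(needed[i]->name);
    return order;
}
```

For a library without DT_SYMBOLIC this yields the executable, the library, the LD_PRELOAD modules, then the DT_NEEDED modules; with DT_SYMBOLIC the first two entries trade places.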

The lookup inside a single module is done by soinfo_elf_lookup, shown here together with elfhash:

unsigned elfhash(const char* _name) {
    const unsigned char* name = (const unsigned char*) _name;
    unsigned h = 0, g;

    while(*name) {
        h = (h << 4) + *name++;
        g = h & 0xf0000000;
        h ^= g;
        h ^= g >> 24;
    }
    return h;
}

static Elf32_Sym* soinfo_elf_lookup(soinfo* si, unsigned hash, const char* name) {
    Elf32_Sym* symtab = si->symtab;
    const char* strtab = si->strtab;

    TRACE_TYPE(LOOKUP, "SEARCH %s in %s@0x%08x %08x %d",
               name, si->name, si->base, hash, hash % si->nbucket);

    for (unsigned n = si->bucket[hash % si->nbucket]; n != 0; n = si->chain[n]) {
        Elf32_Sym* s = symtab + n;
        if (strcmp(strtab + s->st_name, name)) continue;

            /* only concern ourselves with global and weak symbol definitions */
        switch(ELF32_ST_BIND(s->st_info)){
        case STB_GLOBAL:
        case STB_WEAK:
            if (s->st_shndx == SHN_UNDEF) {
                continue;
            }

            TRACE_TYPE(LOOKUP, "FOUND %s in %s (%08x) %d",
                       name, si->name, s->st_value, s->st_size);
            return s;
        }
    }

    return NULL;
}

This is where the table referenced by DT_HASH is used to find the requested symbol.
The layout of that table is as follows:

{
    unsigned nbucket;
    unsigned nchain;
    unsigned bucket[nbucket];
    unsigned chain[nchain];
}

dlsym ultimately resolves symbol addresses through this same function, soinfo_elf_lookup.
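To see how bucket and chain cooperate, here is a small sketch that builds a DT_HASH-style table over a list of symbol names and then searches it the same way soinfo_elf_lookup does. HashTable, build_hash and lookup are hypothetical helpers, symbols are reduced to their names, and index 0 plays the role of the reserved null symbol that terminates every chain.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Same algorithm as elfhash above, written with fixed-width types.
uint32_t elf_hash(const char* s) {
    uint32_t h = 0;
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(s); *p; ++p) {
        h = (h << 4) + *p;
        uint32_t g = h & 0xf0000000u;
        h ^= g;
        h ^= g >> 24;
    }
    return h;
}

struct HashTable {
    std::vector<uint32_t> bucket;  // bucket[hash % nbucket] = first symbol index
    std::vector<uint32_t> chain;   // chain[i] = next symbol index, 0 ends the chain
};

// names[0] must be the empty name of the reserved null symbol.
HashTable build_hash(const std::vector<std::string>& names, uint32_t nbucket) {
    HashTable t;
    if (nbucket == 0) return t;
    t.bucket.assign(nbucket, 0);
    t.chain.assign(names.size(), 0);  // nchain equals the number of symbols
    for (uint32_t i = 1; i < names.size(); ++i) {
        uint32_t b = elf_hash(names[i].c_str()) % nbucket;
        t.chain[i] = t.bucket[b];     // link in front of the current chain head
        t.bucket[b] = i;
    }
    return t;
}

// Returns the symbol index of name, or 0 if it is not in the table.
uint32_t lookup(const HashTable& t, const std::vector<std::string>& names,
                const std::string& name) {
    if (t.bucket.empty()) return 0;
    uint32_t nbucket = static_cast<uint32_t>(t.bucket.size());
    for (uint32_t n = t.bucket[elf_hash(name.c_str()) % nbucket]; n != 0; n = t.chain[n]) {
        if (names[n] == name) return n;
    }
    return 0;
}
```

As a check on the hash itself, elf_hash("printf") evaluates to 0x077905a6. The real lookup additionally skips entries that are undefined (SHN_UNDEF) or not STB_GLOBAL/STB_WEAK, which this sketch leaves out.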



Last edited by chpeagle on 2019-2-22 17:54. Reason: changed the title
Latest replies (13)
2
Thanks for sharing!
2019-2-22 18:03
3
Studying and reviewing.
2019-2-22 18:10
4
My so packer is already compatible with every system version; I am currently working through minor bugs that show up on specific device models.
2019-2-22 18:26
5
Great post.
2019-2-23 10:57
6
Nice, very detailed. Only with a deep understanding of the linker's loading mechanism and the ELF file format can you handle so files with ease.
2019-2-23 12:53
7
Very clearly explained. Learned a lot.
2019-2-27 09:32
8
王正飞: My so packer is already compatible with every system version; I am currently working through minor bugs that show up on specific device models.
Hello, could you share it so I can take a look? I am running into big problems with compatibility across Android versions.
2019-6-6 09:30
9
Nice one.
2019-6-8 17:07
10
Nice one.
2019-6-17 11:23
11
mark
2019-6-19 18:08
12
2020-3-27 14:04
13
mark
2020-3-28 15:23
14
2020-4-26 13:24