[Original] A Brief Analysis of the ELF File Structure: Implementing a Parser and a Loader
Posted: 7 hours ago 438


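Preface

I recently came across an expert's blog post on the ELF file format and decided, on a whim, to dig into it myself.

There are plenty of articles online about the ELF file structure, but most of them only cover the theory; concrete code implementations are rare (perhaps because open-source code already exists).

Reading open-source code is not my strong suit, though (it makes my head spin), so I studied the ELF format the same way I once studied the PE file structure.

I wrote a parser modeled on the output of readelf, and finally a simple ELF loader.

The code supports both x86 and x64 ELF files:

  • The parser has two implementations, one for x86 and one for x64, so it can parse ELF files for either platform

  • The loader depends on the build environment and can only load ELF files for the matching platform, so the x86 and x64 loaders must be compiled separately

  • The walkthrough and demos mainly use x86

Environment & tools: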

  • VMware pro 17.6.1
  • Kali Linux 2023.4 vmware amd64
  • gcc (Debian 14.2.0-8) 14.2.0
  • CLion 2024.2.3
  • 010 Editor 13.0.1
  • IDA Pro 7.7

Attachments:

  • Sources.zip
  • CompiledTools.zip
  • TestFiles.zip

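My knowledge is limited, so please bear with any mistakes and feel free to point them out.

ELF File Structure Overview

ELF was developed and published by UNIX System Laboratories (USL) as part of the Application Binary Interface (ABI), and it is the main executable file format on Linux. The full name is Executable and Linking Format, and the name matters: it covers the two things ELF has to support, execution and linking.

An ELF file has three major parts: the ELF header, sections, and segments:

  • The section header table points to the sections; like the PE section table, it describes each section

  • The program header table describes segments; one segment can contain several sections, and the table tells the loader how to map the ELF file into memory

  • In object (relocatable) files the program header table is optional, and in executables the section header table is optional; ELF files built with the NDK contain both

The ELF headers define a set of wrapper data types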

#include <stdint.h>
 
typedef uint16_t Elf32_Half;
typedef uint16_t Elf64_Half;
 
/* Types for signed and unsigned 32-bit quantities.  */
typedef uint32_t Elf32_Word;
typedef int32_t  Elf32_Sword;
typedef uint32_t Elf64_Word;
typedef int32_t  Elf64_Sword;
 
/* Types for signed and unsigned 64-bit quantities.  */
typedef uint64_t Elf32_Xword;
typedef int64_t  Elf32_Sxword;
typedef uint64_t Elf64_Xword;
typedef int64_t  Elf64_Sxword;
 
/* Type of addresses.  */
typedef uint32_t Elf32_Addr;
typedef uint64_t Elf64_Addr;
 
/* Type of file offsets.  */
typedef uint32_t Elf32_Off;
typedef uint64_t Elf64_Off;
 
/* Type for section indices, which are 16-bit quantities.  */
typedef uint16_t Elf32_Section;
typedef uint16_t Elf64_Section;
 
/* Type for version symbol information.  */
typedef Elf32_Half Elf32_Versym;
typedef Elf64_Half Elf64_Versym;

As you can see, the only 32/64-bit types whose widths differ are Addr and Off, so we can define corresponding generic types

ELF type       Underlying type   Notes
Elfn_Half      uint16_t
Elfn_Word      uint32_t
Elfn_Sword     int32_t
Elfn_Xword     uint64_t
Elfn_Sxword    int64_t
Elf32_Addr     uint32_t          address
Elf64_Addr     uint64_t
Elf32_Off      uint32_t          file offset
Elf64_Off      uint64_t
Elfn_Section   uint16_t          section index
Elfn_Versym    uint16_t
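
One way to get such generic types is to choose the width at compile time. The sketch below is only an assumption about how aliases like the Elf_Half and Elf_Word used by the printing code later in this post might be defined; Elf_Addr and Elf_Off are hypothetical names, and the sketch assumes a 32-bit build handles ELF32 while a 64-bit build handles ELF64.

```c
#include <stdint.h>

/* Width-independent aliases: identical for ELF32 and ELF64 */
typedef uint16_t Elf_Half;
typedef uint32_t Elf_Word;
typedef int32_t  Elf_Sword;

/* Width-dependent aliases follow the pointer size of the build target
   (assumption: a 32-bit build handles ELF32, a 64-bit build handles ELF64) */
#if UINTPTR_MAX > 0xffffffffu
typedef uint64_t Elf_Addr;
typedef uint64_t Elf_Off;
#else
typedef uint32_t Elf_Addr;
typedef uint32_t Elf_Off;
#endif
```

glibc's <link.h> provides a similar ElfW(type) macro that expands to the Elf32_ or Elf64_ type matching the native word size.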

Use gcc to build a 32-bit and a 64-bit ELF executable for testing

#include <stdio.h>
 
int main(int argc, char* argv[]){
    printf("Hello ELF!\n");
    return 0;
}
gcc -m32 -O0 main.c -o HelloELF32
gcc -m64 -O0 main.c -o HelloELF64

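Before writing the ELF parser/loader, define a file-reading function

It reads the file at the given path and returns a byte pointer along with the number of bytes read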

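// Read a file; returns the buffer and the number of bytes read (the caller frees the buffer)
// Requires <stdio.h>, <stdlib.h> and <stdint.h>
uint8_t* readFileToBytes(const char *fileName,size_t* readSize) {
    FILE *file = fopen(fileName, "rb");
    if (file == NULL) {
        // Nothing to close here: calling fclose(NULL) is undefined behavior
        printf("Error opening file\n");
        return NULL;
    }
    fseek(file, 0,SEEK_END);
    long fileLen = ftell(file);
    if (fileLen < 0) {
        // ftell reports failure with -1, which must not be used as a size
        printf("Error getting file size\n");
        fclose(file);
        return NULL;
    }
    size_t fileSize = (size_t) fileLen;
    fseek(file, 0,SEEK_SET);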
    uint8_t *buffer = (uint8_t *) malloc(fileSize);
    if (buffer == NULL) {
        printf("Error allocating memory\n");
        fclose(file);
        return NULL;
    }
    size_t bytesRead = fread(buffer, 1, fileSize, file);
    if(bytesRead!=fileSize) {
        printf("Read bytes not equal file size!\n");
        free(buffer);
        fclose(file);
        return NULL;
    }
    fclose(file);
    if(readSize)
        *readSize=bytesRead;
    return buffer;
}

ELF Header

Defined in elf.h

#define EI_NIDENT (16)
typedef struct
{
  unsigned char    e_ident[EI_NIDENT];    /* Magic number and other info */
  Elf32_Half    e_type;            /* Object file type */
  Elf32_Half    e_machine;        /* Architecture */
  Elf32_Word    e_version;        /* Object file version */
  Elf32_Addr    e_entry;        /* Entry point virtual address */
  Elf32_Off     e_phoff;        /* Program header table file offset */
  Elf32_Off     e_shoff;        /* Section header table file offset */
  Elf32_Word    e_flags;        /* Processor-specific flags */
  Elf32_Half    e_ehsize;        /* ELF header size in bytes */
  Elf32_Half    e_phentsize;        /* Program header table entry size */
  Elf32_Half    e_phnum;        /* Program header table entry count */
  Elf32_Half    e_shentsize;        /* Section header table entry size */
  Elf32_Half    e_shnum;        /* Section header table entry count */
  Elf32_Half    e_shstrndx;        /* Section header string table index */
} Elf32_Ehdr;
 
// 64-bit
typedef struct
{
  unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
  Elf64_Half    e_type;         /* Object file type */
  Elf64_Half    e_machine;      /* Architecture */
  Elf64_Word    e_version;      /* Object file version */
  Elf64_Addr    e_entry;        /* Entry point virtual address */
  Elf64_Off     e_phoff;        /* Program header table file offset */
  Elf64_Off     e_shoff;        /* Section header table file offset */
  Elf64_Word    e_flags;        /* Processor-specific flags */
  Elf64_Half    e_ehsize;       /* ELF header size in bytes */
  Elf64_Half    e_phentsize;        /* Program header table entry size */
  Elf64_Half    e_phnum;        /* Program header table entry count */
  Elf64_Half    e_shentsize;        /* Section header table entry size */
  Elf64_Half    e_shnum;        /* Section header table entry count */
  Elf64_Half    e_shstrndx;     /* Section header string table index */
} Elf64_Ehdr;

You can inspect it with readelf (readelf -h)

e_ident

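A 16-byte ELF identification array. The first 4 bytes are the ELF magic "\x7fELF" and must not be modified

[Figure: e_ident parsed in 010 Editor]

  1. e_ident[EI_CLASS]

    This byte gives the file class, i.e. whether the file is 32-bit (ELFCLASS32) or 64-bit (ELFCLASS64)

    Android does not check this byte; it decides between 32-bit and 64-bit from the instruction set (v7a/v8a)

    IDA does check it: if the byte is modified, IDA can no longer disassemble the file

  2. e_ident[EI_DATA]

    This byte gives the data encoding of the object file (little- or big-endian)

    Android does not check this byte and assumes little-endian; IDA does check it, and if it is modified IDA cannot disassemble the file correctly

  3. e_ident[EI_VERSION]

    The ELF header version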
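
Before trusting any other header field, a parser can validate e_ident first. The following is a minimal sketch, not the parser from the attachment: checkElfIdent is a hypothetical name, and it only uses the ELFMAG, SELFMAG, EI_* and ELFCLASS*/ELFDATA* constants from elf.h.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns ELFCLASS32 or ELFCLASS64 when the buffer starts with a plausible
   e_ident, or ELFCLASSNONE when it does not.
   (checkElfIdent is a hypothetical helper, not part of elf.h.) */
int checkElfIdent(const uint8_t *buffer, size_t size) {
    if (buffer == NULL || size < EI_NIDENT) {
        return ELFCLASSNONE;                 /* too short to hold e_ident */
    }
    if (memcmp(buffer, ELFMAG, SELFMAG) != 0) {
        return ELFCLASSNONE;                 /* magic is not "\x7fELF" */
    }
    uint8_t elfClass = buffer[EI_CLASS];
    uint8_t elfData  = buffer[EI_DATA];
    if (elfClass != ELFCLASS32 && elfClass != ELFCLASS64) {
        return ELFCLASSNONE;                 /* neither 32-bit nor 64-bit */
    }
    if (elfData != ELFDATA2LSB && elfData != ELFDATA2MSB) {
        return ELFCLASSNONE;                 /* unknown byte order */
    }
    return elfClass;
}
```

The return value can then decide whether the buffer is handed to the 32-bit or the 64-bit parsing routines.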

e_type

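2 bytes; indicates which type of object file this is

Since Android 5.0, executables must be position-independent (built like shared objects), so this field has to be 03 (ET_DYN) and cannot be changed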

/* Legal values for e_type (object file type).  */
 
#define ET_NONE     0       /* No file type */
#define ET_REL      1       /* Relocatable file */
#define ET_EXEC     2       /* Executable file */
#define ET_DYN      3       /* Shared object file */
#define ET_CORE     4       /* Core file */
#define ET_NUM      5       /* Number of defined types */
#define ET_LOOS     0xfe00      /* OS-specific range start */
#define ET_HIOS     0xfeff      /* OS-specific range end */
#define ET_LOPROC   0xff00      /* Processor-specific range start */
#define ET_HIPROC   0xffff      /* Processor-specific range end */
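
Because e_type also has OS-specific and processor-specific ranges, indexing a small string array with it is only safe for the first few values. A range-aware lookup could look like the sketch below; elfTypeName is a hypothetical helper, not something elf.h provides.

```c
#include <elf.h>
#include <stdint.h>

/* Map e_type to a readable name, including the reserved ranges.
   (elfTypeName is a hypothetical helper, not part of elf.h.) */
const char *elfTypeName(uint16_t type) {
    switch (type) {
        case ET_NONE: return "NONE";
        case ET_REL:  return "REL";
        case ET_EXEC: return "EXEC";
        case ET_DYN:  return "DYN";
        case ET_CORE: return "CORE";
        default:      break;
    }
    if (type >= ET_LOOS && type <= ET_HIOS) return "OS-specific";
    if (type >= ET_LOPROC)                  return "Processor-specific"; /* up to ET_HIPROC (0xffff) */
    return "UNKNOWN";
}
```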

e_machine

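2 bytes; specifies the processor architecture the ELF file targets. Some of the definitions are listed below. For 32-bit x86 the value is EM_386; x86-64 files use EM_X86_64 (62)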

#define EM_NONE      0  /* No machine */
#define EM_M32       1  /* AT&T WE 32100 */
#define EM_SPARC     2  /* SUN SPARC */
#define EM_386       3  /* Intel 80386 */
#define EM_68K       4  /* Motorola m68k family */
#define EM_88K       5  /* Motorola m88k family */
#define EM_IAMCU     6  /* Intel MCU */
#define EM_860       7  /* Intel 80860 */
#define EM_MIPS      8  /* MIPS R3000 big-endian */
#define EM_S370      9  /* IBM System/370 */
#define EM_MIPS_RS3_LE  10  /* MIPS R3000 little-endian */
                /* reserved 11-14 */
#define EM_PARISC   15  /* HPPA */
                /* reserved 16 */

e_version

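4 bytes; the object file version

Android does not check this field; IDA does, but it has no effect on disassembly

e_entry

4 or 8 bytes; the virtual address of the program entry point (OEP). For an executable (e_type=2) it is an absolute VA; for a position-independent file (e_type=3) it is relative to the load base, and for a shared library without an entry point it is 0

e_phoff

4 or 8 bytes; the file offset (FOA) of the program header table, or 0 if there is no program header table

e_shoff

4 or 8 bytes; the file offset (FOA) of the section header table, or 0 if there is no section header table

Android protection schemes often strip the section header table

e_flags

4 bytes of processor-specific flags; unused (0) on x86, although other architectures such as ARM do use it

e_ehsize

2 bytes; the size of the ELF header

Android does not check it and assumes the ELF header is 52 bytes (the ELF32 size; the ELF64 header is 64 bytes); IDA does check it, but modifying the field only produces a warning and does not affect disassembly

e_phentsize

2 bytes; the size of each entry in the program header table

e_phnum

2 bytes; the number of entries in the program header table

e_shentsize

2 bytes; the size of each entry in the section header table

e_shnum

2 bytes; the number of entries in the section header table

e_shstrndx

2 bytes; the index of the section header entry that describes the section name table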
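
Since all of these offsets and counts come straight from the file, it helps to check them against the file size before indexing either table. The sketch below is one hedged way to do that for ELF32; headerTablesInBounds32 and tableInBounds are hypothetical names, and the sketch deliberately ignores the extended-numbering cases (SHN_XINDEX, PN_XNUM).

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Check that a table of `count` entries of `entSize` bytes starting at file
   offset `offset` lies completely inside a file of `fileSize` bytes. */
static int tableInBounds(uint64_t offset, uint64_t entSize, uint64_t count, uint64_t fileSize) {
    uint64_t bytes = entSize * count;   /* both factors come from 16-bit fields, so no overflow */
    return offset <= fileSize && bytes <= fileSize - offset;
}

/* Returns 1 when the program and section header tables described by an ELF32
   header fit in the file, 0 otherwise.
   (headerTablesInBounds32 is a hypothetical helper; extended numbering is not handled.) */
int headerTablesInBounds32(const Elf32_Ehdr *header, size_t fileSize) {
    if (fileSize < sizeof(Elf32_Ehdr)) return 0;
    if (header->e_phnum != 0 &&
        (header->e_phentsize != sizeof(Elf32_Phdr) ||
         !tableInBounds(header->e_phoff, header->e_phentsize, header->e_phnum, fileSize))) {
        return 0;
    }
    if (header->e_shnum != 0 &&
        (header->e_shentsize != sizeof(Elf32_Shdr) ||
         !tableInBounds(header->e_shoff, header->e_shentsize, header->e_shnum, fileSize))) {
        return 0;
    }
    /* the section name table index must refer to an existing entry */
    if (header->e_shnum != 0 && header->e_shstrndx >= header->e_shnum) return 0;
    return 1;
}
```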

Printing the file header

String arrays are defined for the enum values so that the corresponding information can be printed

// Print ELF Header
char ELF_Class[3][6] = {"NONE", "ELF32", "ELF64"};
char ELF_Data[3][14] = {"NONE", "Little Endian", "Big Endian"};
char objectFileType[7][7] = {"NONE", "REL", "EXEC", "DYN", "CORE", "LOPROC", "HIPROC"};
void printELFHeader32(const Elf32_Ehdr* pElfHeader) {
    printf("ELF Header:\n");
    printf("\tMagic:\t");
    for (int i = 0; i < EI_NIDENT; i++) {
        printf("%02x ", pElfHeader->e_ident[i]);
    }
    printf("\n");
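    // Guard every table lookup: the index comes from the file and may be out of range
    printf("\t%-36s%s\n", "Class:", pElfHeader->e_ident[EI_CLASS] < 3 ? ELF_Class[pElfHeader->e_ident[EI_CLASS]] : "UNKNOWN");
    printf("\t%-36s%s\n", "Data:", pElfHeader->e_ident[EI_DATA] < 3 ? ELF_Data[pElfHeader->e_ident[EI_DATA]] : "UNKNOWN");
    printf("\t%-36s%#x\n", "Version:", pElfHeader->e_version);
    printf("\t%-36s%#x\n", "Machine:", pElfHeader->e_machine);
    printf("\t%-36s%s\n", "Type:", pElfHeader->e_type < ET_NUM ? objectFileType[pElfHeader->e_type] : "UNKNOWN");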
    printf("\t%-36s%#x\n", "Size Of ELF Header:", pElfHeader->e_ehsize);
    printf("\t%-36s%#x\n", "Entry point:", pElfHeader->e_entry);
    printf("\t%-36s%#x\n", "Start Of Program Headers:", pElfHeader->e_phoff);
    printf("\t%-36s%#x\n", "Start Of Section Headers:", pElfHeader->e_shoff);
    printf("\t%-36s%#x\n", "Size Of Program Headers:", pElfHeader->e_phentsize);
    printf("\t%-36s%#x\n", "Number Of Program Headers:", pElfHeader->e_phnum);
    printf("\t%-36s%#x\n", "Size Of Section Headers:", pElfHeader->e_shentsize);
    printf("\t%-36s%#x\n", "Number Of Sections:", pElfHeader->e_shnum);
    printf("\t%-36s%d\n", "Section Header String Table Index:", pElfHeader->e_shstrndx);
    printf("ELF Header End\n");
}

The output looks like this: [Figure: printed ELF header]

Section Header

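It resembles the PE section table (IMAGE_SECTION_HEADER)

The section header table stores the basic attributes of every section and is the most important structure in an ELF file after the file header; compilers, linkers and loaders all rely on it to locate sections and read their attributes

Entry 0 of the section header table (index SHN_UNDEF) is reserved and always has type SHT_NULL. Each entry is defined as follows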

typedef struct
{
  Elf32_Word    sh_name;        /* Section name (string tbl index) */
  Elf32_Word    sh_type;        /* Section type */
  Elf32_Word    sh_flags;       /* Section flags */
  Elf32_Addr    sh_addr;        /* Section virtual addr at execution */
  Elf32_Off sh_offset;      /* Section file offset */
  Elf32_Word    sh_size;        /* Section size in bytes */
  Elf32_Word    sh_link;        /* Link to another section */
  Elf32_Word    sh_info;        /* Additional section information */
  Elf32_Word    sh_addralign;       /* Section alignment */
  Elf32_Word    sh_entsize;     /* Entry size if section holds table */
} Elf32_Shdr;
 
typedef struct
{
  Elf64_Word    sh_name;        /* Section name (string tbl index) */
  Elf64_Word    sh_type;        /* Section type */
  Elf64_Xword   sh_flags;       /* Section flags */
  Elf64_Addr    sh_addr;        /* Section virtual addr at execution */
  Elf64_Off     sh_offset;      /* Section file offset */
  Elf64_Xword   sh_size;        /* Section size in bytes */
  Elf64_Word    sh_link;        /* Link to another section */
  Elf64_Word    sh_info;        /* Additional section information */
  Elf64_Xword   sh_addralign;       /* Section alignment */
  Elf64_Xword   sh_entsize;     /* Entry size if section holds table */
} Elf64_Shdr;

View the section header table with readelf (readelf -S)

sh_name

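4 bytes; an offset into the section name string table. ELF File Header.e_shstrndx gives the index of the section header entry that describes that string table

Look up that entry in the section header table and take its sh_offset: sh_name + sh_offset is then the FOA of this section's name string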
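
The lookup just described can be sketched as follows. The printing code later in this post calls a getSectionName helper whose body is not shown, so this is not that function: sectionNameAt is a hypothetical name, and it adds bounds checks so that a corrupted sh_name cannot read outside the file buffer.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the name of section `index`, or NULL when any offset is out of range.
   fileBuffer/fileSize describe the whole file in memory, shstrndx is e_shstrndx.
   (sectionNameAt is a hypothetical helper.) */
const char *sectionNameAt(const uint8_t *fileBuffer, size_t fileSize,
                          const Elf32_Shdr *sectionHeaders, uint16_t sectionNum,
                          uint16_t shstrndx, uint16_t index) {
    if (index >= sectionNum || shstrndx >= sectionNum) return NULL;
    const Elf32_Shdr *nameTable = &sectionHeaders[shstrndx];
    Elf32_Word nameOffset = sectionHeaders[index].sh_name;
    /* sh_name must fall inside the string table, and the table inside the file */
    if (nameOffset >= nameTable->sh_size) return NULL;
    if (nameTable->sh_offset > fileSize ||
        nameTable->sh_size > fileSize - nameTable->sh_offset) return NULL;
    /* FOA of the name = sh_offset of the name table + sh_name */
    const char *name = (const char *) fileBuffer + nameTable->sh_offset + nameOffset;
    /* the string must be NUL-terminated before the table ends */
    if (memchr(name, '\0', nameTable->sh_size - nameOffset) == NULL) return NULL;
    return name;
}
```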

sh_type

4 bytes; the section type, defined as follows

/* Legal values for sh_type (section type).  */
 
#define SHT_NULL      0     /* Section header table entry unused */
#define SHT_PROGBITS      1     /* Program data */
#define SHT_SYMTAB    2     /* Symbol table */
#define SHT_STRTAB    3     /* String table */
#define SHT_RELA      4     /* Relocation entries with addends */
#define SHT_HASH      5     /* Symbol hash table */
#define SHT_DYNAMIC   6     /* Dynamic linking information */
#define SHT_NOTE      7     /* Notes */
#define SHT_NOBITS    8     /* Program space with no data (bss) */
#define SHT_REL       9     /* Relocation entries, no addends */
#define SHT_SHLIB     10        /* Reserved */
#define SHT_DYNSYM    11        /* Dynamic linker symbol table */
#define SHT_INIT_ARRAY    14        /* Array of constructors */
#define SHT_FINI_ARRAY    15        /* Array of destructors */
#define SHT_PREINIT_ARRAY 16        /* Array of pre-constructors */
#define SHT_GROUP     17        /* Section group */
#define SHT_SYMTAB_SHNDX  18        /* Extended section indices */
#define SHT_RELR      19            /* RELR relative relocations */
#define SHT_NUM       20        /* Number of defined types.  */
#define SHT_LOOS      0x60000000    /* Start OS-specific.  */
#define SHT_GNU_ATTRIBUTES 0x6ffffff5   /* Object attributes.  */
#define SHT_GNU_HASH      0x6ffffff6    /* GNU-style hash table.  */
#define SHT_GNU_LIBLIST   0x6ffffff7    /* Prelink library list */
#define SHT_CHECKSUM      0x6ffffff8    /* Checksum for DSO content.  */
#define SHT_LOSUNW    0x6ffffffa    /* Sun-specific low bound.  */
#define SHT_SUNW_move     0x6ffffffa
#define SHT_SUNW_COMDAT   0x6ffffffb
#define SHT_SUNW_syminfo  0x6ffffffc
#define SHT_GNU_verdef    0x6ffffffd    /* Version definition section.  */
#define SHT_GNU_verneed   0x6ffffffe    /* Version needs section.  */
#define SHT_GNU_versym    0x6fffffff    /* Version symbol table.  */
#define SHT_HISUNW    0x6fffffff    /* Sun-specific high bound.  */
#define SHT_HIOS      0x6fffffff    /* End OS-specific type */
#define SHT_LOPROC    0x70000000    /* Start of processor-specific */
#define SHT_HIPROC    0x7fffffff    /* End of processor-specific */
#define SHT_LOUSER    0x80000000    /* Start of application-specific */
#define SHT_HIUSER    0x8fffffff    /* End of application-specific */

The more common section types are

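SHT_NULL    // unused (inactive) section header entry
SHT_STRTAB  // string table; an ELF file may contain several string table sections
SHT_RELA    // relocation entries with explicit addends
SHT_HASH    // symbol hash table; currently an ELF file may contain at most one
SHT_DYNAMIC // dynamic linking information; currently an object file may contain at most one dynamic section
SHT_NOBITS  // occupies no space in the file (e.g. .bss), but is still given memory at run time
SHT_REL     // relocation entries without explicit addends
SHT_DYNSYM  // symbol table used for dynamic linking, same entry format as SHT_SYMTAB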

sh_flags

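4 bytes (8 bytes in ELF64); a set of flag bits

  1. SHF_WRITE: the section is writable in the process

  2. SHF_ALLOC: the section occupies memory at run time

    Not every section occupies memory; some sections that only carry control information are not mapped when the file is loaded

  3. SHF_EXECINSTR: the section contains executable instructions

  4. SHF_MASKPROC: all bits covered by this mask are reserved for processor-specific extensions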

sh_addr

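4 bytes (8 in ELF64); the section's virtual address in memory

sh_offset

4 bytes (8 in ELF64); the section's FOA

sh_size

4 bytes (8 in ELF64); the size of the section

sh_link

4 bytes; an index value

sh_info

4 bytes; additional information about the section

The meaning of sh_info and sh_link depends on the section type

sh_addralign

4 bytes (8 in ELF64); the section's address alignment. 0 or 1 means the section has no alignment requirement; any other value is the alignment itself and must be a power of two, e.g. 8 means 8-byte alignment

The section's sh_addr must be divisible by sh_addralign, i.e. sh_addr%sh_addralign=0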
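
The alignment rule can be checked in a few lines. This is an illustrative sketch only; isSectionAligned is a hypothetical name.

```c
#include <stdint.h>

/* Returns 1 when `addr` satisfies the alignment `align` as ELF defines it:
   0 and 1 mean "no constraint", any other value must be a power of two.
   (isSectionAligned is a hypothetical helper.) */
int isSectionAligned(uint64_t addr, uint64_t align) {
    if (align == 0 || align == 1) return 1;
    if ((align & (align - 1)) != 0) return 0;   /* not a power of two: invalid value */
    return (addr & (align - 1)) == 0;           /* same as addr % align == 0 */
}
```

For example, an address of 0x1008 satisfies an alignment of 8, while 0x1004 does not, and an alignment value of 3 is rejected as invalid.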

sh_entsize

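4 bytes (8 in ELF64); some sections hold a table of fixed-size entries (the symbol table, for example), and this field gives the size of each entry

0 means the section does not hold such a table

Printing the section headers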

// Print ELF Section Headers
const char *getSectionTypeString(Elf_Word sectionType) {
    switch (sectionType) {
        case SHT_NULL:           return "NULL";
        case SHT_PROGBITS:       return "PROGBITS";
        case SHT_SYMTAB:         return "SYMTAB";
        case SHT_STRTAB:         return "STRTAB";
        case SHT_RELA:           return "RELA";
        case SHT_HASH:           return "HASH";
        case SHT_DYNAMIC:        return "DYNAMIC";
        case SHT_NOTE:           return "NOTE";
        case SHT_NOBITS:         return "NOBITS";
        case SHT_REL:            return "REL";
        case SHT_SHLIB:          return "SHLIB";
        case SHT_DYNSYM:         return "DYNSYM";
        case SHT_INIT_ARRAY:     return "INIT_ARRAY";
        case SHT_FINI_ARRAY:     return "FINI_ARRAY";
        case SHT_PREINIT_ARRAY:  return "PREINIT_ARRAY";
        case SHT_GROUP:          return "GROUP";
        case SHT_SYMTAB_SHNDX:   return "SYMTAB_SHNDX";
        case SHT_RELR:           return "RELR";
        case SHT_NUM:            return "NUM";
        case SHT_LOOS:           return "LOOS";
        case SHT_GNU_ATTRIBUTES: return "GNU_ATTRIBUTES";
        case SHT_GNU_HASH:       return "GNU_HASH";
        case SHT_GNU_LIBLIST:    return "GNU_LIBLIST";
        case SHT_CHECKSUM:       return "CHECKSUM";
        case SHT_LOSUNW:         return "LOSUNW";
        case SHT_SUNW_COMDAT:    return "SUNW_COMDAT";
        case SHT_SUNW_syminfo:   return "SUNW_syminfo";
        case SHT_GNU_verdef:     return "GNU_verdef";
        case SHT_GNU_verneed:    return "GNU_verneed";
        case SHT_GNU_versym:     return "GNU_versym";
        case SHT_LOPROC:         return "LOPROC";
        case SHT_HIPROC:         return "HIPROC";
        case SHT_LOUSER:         return "LOUSER";
        case SHT_HIUSER:         return "HIUSER";
        default:                 return "UNKNOWN";
    }
}
const char* getSectionFlagStr(Elf_Word flags) {
    // Test each bit separately so that every combination (e.g. WAX) is covered,
    // including sections that also carry flags this function does not print
    static char flagStr[4];
    int count = 0;
    if (flags & SHF_WRITE)     flagStr[count++] = 'W';
    if (flags & SHF_ALLOC)     flagStr[count++] = 'A';
    if (flags & SHF_EXECINSTR) flagStr[count++] = 'X';
    flagStr[count] = '\0';
    return flagStr;
}
void printElfSectionHeader32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,const char* pStringTable) {
    printf("ELF Section Headers:\n");
    printf("\t[Nr] Name\t\t\tType\t\t\tAddr\t\tOffset\t\tSize\t\tEntSize\tFlag\tLink\tInfo\tAlign\n");
    for (int i = 0; i < sectionNum; i++) {
        printf("\t[%2d] %-20s", i, (char *) &pStringTable[pSectionHeader[i].sh_name]);
        printf("\t%-16s", getSectionTypeString(pSectionHeader[i].sh_type));
        printf("\t%08x", pSectionHeader[i].sh_addr);
        printf("\t%08x", pSectionHeader[i].sh_offset);
        printf("\t%08x", pSectionHeader[i].sh_size);
        printf("\t%x", pSectionHeader[i].sh_entsize);
        printf("\t%s", getSectionFlagStr(pSectionHeader[i].sh_flags));
        printf("\t%x", pSectionHeader[i].sh_link);
        printf("\t%x", pSectionHeader[i].sh_info);
        printf("\t%x\n", pSectionHeader[i].sh_addralign);
    }
    printf("ELF Section Headers End\n");
}

The output looks like this: [Figure: printed section headers]

Program Header

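The program header table describes how the ELF file is mapped into memory, in units called segments

It is defined as follows (note that p_flags sits at a different position in the 32-bit and 64-bit structures)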

typedef struct
{
  Elf32_Word    p_type;         /* Segment type */
  Elf32_Off     p_offset;       /* Segment file offset */
  Elf32_Addr    p_vaddr;        /* Segment virtual address */
  Elf32_Addr    p_paddr;        /* Segment physical address */
  Elf32_Word    p_filesz;       /* Segment size in file */
  Elf32_Word    p_memsz;        /* Segment size in memory */
  Elf32_Word    p_flags;        /* Segment flags */
  Elf32_Word    p_align;        /* Segment alignment */
} Elf32_Phdr;
 
typedef struct
{
  Elf64_Word    p_type;         /* Segment type */
  Elf64_Word    p_flags;        /* Segment flags */
  Elf64_Off     p_offset;       /* Segment file offset */
  Elf64_Addr    p_vaddr;        /* Segment virtual address */
  Elf64_Addr    p_paddr;        /* Segment physical address */
  Elf64_Xword   p_filesz;       /* Segment size in file */
  Elf64_Xword   p_memsz;        /* Segment size in memory */
  Elf64_Xword   p_align;        /* Segment alignment */
} Elf64_Phdr;

p_type

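Specifies the type of segment this program header describes (or how the information in this program header should be interpreted)

The segment types are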

/* Legal values for p_type (segment type).  */
 
#define PT_NULL     0       /* Program header table entry unused */
#define PT_LOAD     1       /* Loadable program segment */
#define PT_DYNAMIC  2       /* Dynamic linking information */
#define PT_INTERP   3       /* Program interpreter */
#define PT_NOTE     4       /* Auxiliary information */
#define PT_SHLIB    5       /* Reserved */
#define PT_PHDR     6       /* Entry for header table itself */
#define PT_TLS      7       /* Thread-local storage segment */
#define PT_NUM      8       /* Number of defined types */
#define PT_LOOS         0x60000000  /* Start of OS-specific */
#define PT_GNU_EH_FRAME 0x6474e550  /* GCC .eh_frame_hdr segment */
#define PT_GNU_STACK    0x6474e551  /* Indicates stack executability */
#define PT_GNU_RELRO    0x6474e552  /* Read-only after relocation */
#define PT_GNU_PROPERTY 0x6474e553  /* GNU property */
#define PT_GNU_SFRAME   0x6474e554  /* SFrame segment.  */
#define PT_LOSUNW       0x6ffffffa
#define PT_SUNWBSS      0x6ffffffa  /* Sun Specific segment */
#define PT_SUNWSTACK    0x6ffffffb  /* Stack segment */
#define PT_HISUNW       0x6fffffff
#define PT_HIOS         0x6fffffff  /* End of OS-specific */
#define PT_LOPROC       0x70000000  /* Start of processor-specific */
#define PT_HIPROC       0x7fffffff  /* End of processor-specific */

p_offset

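The segment's offset in the file

p_vaddr

The segment's virtual address in memory

p_paddr

The segment's physical address. Most modern operating systems are designed so that a segment's physical address cannot be known in advance, so this field is usually reserved

p_filesz

The segment's size in the file

p_memsz

The segment's size in memory; it can be larger than p_filesz (for example when the segment contains .bss), in which case the extra bytes are zero-filled

p_flags

The segment's permissions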

/* Legal values for p_flags (segment flags).  */
 
#define PF_X        (1 << 0)  /* Segment is executable */
#define PF_W        (1 << 1)  /* Segment is writable */
#define PF_R        (1 << 2)  /* Segment is readable */
#define PF_MASKOS   0x0ff00000  /* OS-specific */
#define PF_MASKPROC 0xf0000000  /* Processor-specific */
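
When a loader maps a segment, these bits usually have to be translated into the protection flags that mmap or mprotect expects. A minimal sketch of that translation is shown below; segmentFlagsToProt is a hypothetical name, and the loader in the attachment may handle permissions differently.

```c
#include <elf.h>
#include <stdint.h>
#include <sys/mman.h>

/* Translate ELF segment flags (PF_*) into mmap/mprotect protection bits (PROT_*).
   (segmentFlagsToProt is a hypothetical helper.) */
int segmentFlagsToProt(uint32_t flags) {
    int prot = PROT_NONE;
    if (flags & PF_R) prot |= PROT_READ;
    if (flags & PF_W) prot |= PROT_WRITE;
    if (flags & PF_X) prot |= PROT_EXEC;
    return prot;
}
```

Note that the numeric values differ between the two sets (PF_X is bit 0, while PROT_READ is typically bit 0), so the flags cannot simply be passed through.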

p_align

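The segment's alignment in memory and in the file; for loadable segments p_vaddr and p_offset must be congruent modulo p_align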
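
A loader typically needs to know how much address space all PT_LOAD segments cover before mapping any of them. The sketch below computes that page-aligned span from p_vaddr and p_memsz. It is an illustration under stated assumptions, not the loader from the attachment: loadSpan32 and its parameters are hypothetical, pageSize must be a power of two, and address overflow is not checked.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the page-aligned address range covered by all PT_LOAD segments.
   Returns the number of bytes to reserve, or 0 when there is nothing to load.
   (loadSpan32 is a hypothetical helper; pageSize must be a power of two.) */
uint32_t loadSpan32(const Elf32_Phdr *segments, uint16_t segmentNum,
                    uint32_t pageSize, uint32_t *lowestAddr) {
    uint32_t low = UINT32_MAX, high = 0;
    for (uint16_t i = 0; i < segmentNum; i++) {
        if (segments[i].p_type != PT_LOAD) continue;      /* only loadable segments count */
        uint32_t start = segments[i].p_vaddr;
        uint32_t end = start + segments[i].p_memsz;       /* p_memsz, not p_filesz: .bss counts */
        if (start < low) low = start;
        if (end > high) high = end;
    }
    if (high <= low) return 0;                            /* no PT_LOAD segment found */
    low &= ~(pageSize - 1);                               /* round the start down to a page */
    high = (high + pageSize - 1) & ~(pageSize - 1);       /* round the end up to a page */
    if (lowestAddr) *lowestAddr = low;
    return high - low;
}
```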

Printing the program headers

// Print ELF Program Headers
const char *getSegmentTypeStr(Elf32_Word segmentType) {
    switch (segmentType) {
        case PT_NULL:return "NULL";
        case PT_LOAD: return "LOAD";
        case PT_DYNAMIC: return "DYNAMIC";
        case PT_INTERP:return "INTERP";
        case PT_NOTE: return "NOTE";
        case PT_SHLIB:return "SHLIB";
        case PT_PHDR: return "PHDR";
        case PT_TLS:return "TLS";
        case PT_NUM: return "PT_NUM";
        case PT_LOOS:return "LOOS";
        case PT_GNU_EH_FRAME: return "GNU_EH_FRAME";
        case PT_GNU_STACK:return "GNU_STACK";
        case PT_GNU_RELRO: return "GNU_RELRO";
        case PT_GNU_PROPERTY: return "GNU_PROPERTY";
        case PT_GNU_SFRAME: return "GNU_SFRAME";
        case PT_SUNWBSS: return "SUNWBSS";
        case PT_SUNWSTACK: return "SUNWSTACK";
        case PT_HIOS: return "HIOS";
        case PT_LOPROC: return "LOPROC";
        case PT_HIPROC: return "HIPROC";
        default: return "UNKNOWN";
    }
}
const char* getSegmentFlagStr(Elf_Word segmentFlags) {
    // Fixed-width "RWX" string; every position is rewritten on each call,
    // so flags from a previous call can never leak into the result
    static char segmentFlagStr[4];
    segmentFlagStr[0] = (segmentFlags & PF_R) ? 'R' : ' ';
    segmentFlagStr[1] = (segmentFlags & PF_W) ? 'W' : ' ';
    segmentFlagStr[2] = (segmentFlags & PF_X) ? 'X' : ' ';
    segmentFlagStr[3] = '\0';
    return segmentFlagStr;
}
void printElfProgramHeader32(const Elf32_Phdr *pProgramHeader,Elf_Half segmentNum,const uint8_t* pFileBuffer) {
    printf("ELF ProgramHeader:\n");
    printf("\t[Nr] Type\t\tFileOff\t\tVirAddr\t\tPhyAddr\t\tFileSize\tMemSize\t\tFlag\tAlign\n");
    for (int i = 0; i <  segmentNum; i++) {
        printf("\t[%02d] %-16s", i, getSegmentTypeStr(pProgramHeader[i].p_type));
        printf("\t%08x", pProgramHeader[i].p_offset);
        printf("\t%08x", pProgramHeader[i].p_vaddr);
        printf("\t%08x", pProgramHeader[i].p_paddr);
        printf("\t%08x", pProgramHeader[i].p_filesz);
        printf("\t%08x", pProgramHeader[i].p_memsz);
        printf("\t%4s", getSegmentFlagStr(pProgramHeader[i].p_flags));
        printf("\t%#x\n", pProgramHeader[i].p_align);
        if (pProgramHeader[i].p_type == PT_INTERP) {
            printf("\t\t [Request Program Interpreter Path: %s]\n",(char *) (pFileBuffer + pProgramHeader[i].p_offset));
        }
    }
    printf("ELF ProgramHeader End\n");
}
// print segment mapping
void printSectionToSegmentMapping32(const Elf32_Phdr* pProgramHeader,const Elf32_Shdr* pSectionHeader,Elf_Half segmentNum,Elf_Half sectionNum,const char* pSectionHeaderStringTable) {
    printf("Section to Segment Mapping:\n");
    printf("\tSegment\tSections\n");
    //Traverse program headers
    for (int i = 0; i < segmentNum; i++) {
        Elf32_Addr segmentStartAddr = pProgramHeader[i].p_vaddr;
        Elf32_Addr segmentEndAddr = segmentStartAddr + pProgramHeader[i].p_memsz;
        printf("\t%02d\t\t", i);
        //Traverse section headers
        for (int j = 0; j < sectionNum; j++) {
            Elf32_Addr sectionStartAddr = pSectionHeader[j].sh_addr;
            //Check whether the start addr of a section is in the segment addr
            if (sectionStartAddr >= segmentStartAddr && sectionStartAddr < segmentEndAddr) {
                //SHF_ALLOC means need alloc memory, some control sections don't need mapping to memory
                if (pSectionHeader[j].sh_flags & SHF_ALLOC) {
                    printf("%s ",(char *) pSectionHeaderStringTable + pSectionHeader[j].sh_name);
                }
            }
        }
        printf("\n");
    }
}

The output looks like this: [Figure: printed program headers and section-to-segment mapping]

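Special Sections

An ELF file contains a number of predefined sections whose contents are either instruction code or control information

These sections are reserved for use by the operating system, and their types and attributes differ from one operating system to another

Section     Purpose
.text       code
.data       initialized global variables and local static variables
.bss        uninitialized global variables and local static variables
.rodata     read-only data, e.g. string constants
.comment    compiler version information
.debug      debugging information
.dynamic    dynamic linking information; the linker parses this section to load the ELF file
.hash       symbol hash table (can look up both imported and exported symbols)
.gnu.hash   GNU hash table (can only look up exported symbols, i.e. the export table)
.line       debug line number table, mapping source line numbers to compiled instructions
.note       extra compiler information, e.g. vendor name and version number
.rel.dyn    dynamic linking relocation table; holds relocation entries for global variables
.rel.plt    relocation table for dynamically linked function jumps; holds the PLT relocation entries
.symtab     symbol table
.dynsym     dynamic linking symbol table
.strtab     string table
.shstrtab   section name table
.dynstr     dynamic linking string table
.plt        dynamic linking jump table
.got        dynamic linking global offset table
.init       program initialization code section
.fini       program termination code section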

String Table

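An ELF file contains many strings, such as section names and variable names. Because string lengths vary, they are awkward to describe with a fixed-size structure

The usual approach is to gather the strings into a string table and refer to each one by its offset into the table

Common ones are:

  1. .strtab (string table, holds ordinary strings)

    Walk the section headers; every entry with type==SHT_STRTAB is a string table (this includes the section header string table)

  2. .shstrtab (section header string table, holds the strings used by the section header table)

    It can be found by using the e_shstrndx member of the ELF header as an index into the ELF section header table

    i.e. p_shstrtab=ELFSectionHeaderTable[ELFHeader.e_shstrndx]

The printing code is as follows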

// Print String Table
void printStringTable32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,const char* pSectionHeaderStringTable,const uint8_t* pFileBuffer) {
    //Traverse the section header table then find string table
    printf("ELF String Table:\n");
    for (int i = 0; i < sectionNum; i++) {
        //not only just one string table such as .dynstr .strtab
        if (pSectionHeader[i].sh_type == SHT_STRTAB) {
            printf("\t==========String Table %s==========\n",getSectionName(pSectionHeaderStringTable,pSectionHeader[i].sh_name));
            char *pStringTable = (char *) (pFileBuffer + pSectionHeader[i].sh_offset);
            Elf32_Word stringTableSize = pSectionHeader[i].sh_size, pos = 0;
 
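            // Walk the table: each NUL ends one string, and the next string (if any) starts right after it
            while (pos < stringTableSize) {
                if (pStringTable[pos] == 0) {
                    pos += 1;
                    // do not print past the end of the table after the final NUL
                    if (pos < stringTableSize) {
                        printf("\t%s\n", pStringTable + pos);
                    }
                } else {
                    // skip to the terminating NUL without running past the table
                    while (pos < stringTableSize && pStringTable[pos] != 0) {
                        pos++;
                    }
                }
            }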
        }
    }
    printf("ELF String Table End\n");
}

Symbol Table

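The symbol table describes imported and exported symbols; a symbol can be a global variable, a function, an external reference, and so on

From the symbol table and its associated string table you can obtain each symbol's name, size, address, and other information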

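.dynsym // dynamic linking symbol table
.symtab // symbol table
 
.dynstr // string table for the dynamic linking symbol table
.strtab // string table for the symbol table

Symbol table entry structure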

typedef struct
{
  Elf32_Word    st_name;        /* Symbol name (string tbl index) */
  Elf32_Addr    st_value;       /* Symbol value */
  Elf32_Word    st_size;        /* Symbol size */
  unsigned char st_info;        /* Symbol type and binding */
  unsigned char st_other;       /* Symbol visibility */
  Elf32_Section st_shndx;       /* Section index */
} Elf32_Sym;
 
typedef struct
{
  Elf64_Word    st_name;        /* Symbol name (string tbl index) */
  unsigned char st_info;        /* Symbol type and binding */
  unsigned char st_other;       /* Symbol visibility */
  Elf64_Section st_shndx;       /* Section index */
  Elf64_Addr    st_value;       /* Symbol value */
  Elf64_Xword   st_size;        /* Symbol size */
} Elf64_Sym;

st_name

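The symbol name, given as an offset into a string table; the sh_link field of the symbol table's section header says which string table is used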
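
Putting st_name and the linked string table together, a symbol can be looked up by name with a simple linear search. The sketch below is illustrative only: findSymbol32 is a hypothetical helper, and real dynamic linkers use the .hash or .gnu.hash tables instead of scanning every entry.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Linear search of a symbol table for `name`.
   stringTable/stringTableSize describe the string table named by sh_link.
   Returns the matching entry or NULL. (findSymbol32 is a hypothetical helper.) */
const Elf32_Sym *findSymbol32(const Elf32_Sym *symbols, size_t symbolNum,
                              const char *stringTable, size_t stringTableSize,
                              const char *name) {
    size_t nameLen = strlen(name);
    for (size_t i = 0; i < symbolNum; i++) {
        size_t offset = symbols[i].st_name;
        /* the name plus its terminating NUL must fit inside the string table */
        if (offset >= stringTableSize || nameLen >= stringTableSize - offset) continue;
        if (memcmp(stringTable + offset, name, nameLen + 1) == 0) {
            return &symbols[i];
        }
    }
    return NULL;
}
```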

st_value

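The value associated with the symbol. Depending on the symbol it may be an absolute value or an address; the meaning differs between symbols

st_size

The symbol's size; for a symbol that holds data, it is the size of that data type

For example, a symbol of type double occupies 8 bytes; 0 means the symbol has no size or the size is unknown

st_info

The symbol's type and binding: the high 4 bits hold the symbol binding and the low 4 bits hold the symbol type, which together make up the symbol information

Three macros are provided: one extracts the binding, one extracts the type, and one combines the two into an st_info value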

/* How to extract and insert information held in the st_info field.  */
 
#define ELF32_ST_BIND(val)      (((unsigned char) (val)) >> 4)
#define ELF32_ST_TYPE(val)      ((val) & 0xf)
#define ELF32_ST_INFO(bind, type)   (((bind) << 4) + ((type) & 0xf))
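
As a worked example, a global function has binding STB_GLOBAL (1) and type STT_FUNC (2), so its st_info is (1 << 4) + 2 = 0x12. The helpers below are hypothetical names that only wrap the macros from elf.h.

```c
#include <elf.h>

/* Pack a binding and a type into one st_info byte.
   (makeSymbolInfo is a hypothetical helper around ELF32_ST_INFO.) */
unsigned char makeSymbolInfo(unsigned char bind, unsigned char type) {
    return (unsigned char) ELF32_ST_INFO(bind, type);
}

/* Returns 1 when the symbol is a globally bound function, 0 otherwise.
   (isGlobalFunction is a hypothetical helper.) */
int isGlobalFunction(const Elf32_Sym *symbol) {
    return ELF32_ST_BIND(symbol->st_info) == STB_GLOBAL &&
           ELF32_ST_TYPE(symbol->st_info) == STT_FUNC;
}
```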

Symbol Binding

The legal symbol binding values are

/* Legal values for ST_BIND subfield of st_info (symbol binding).  */
 
#define STB_LOCAL   0       /* Local symbol */
#define STB_GLOBAL  1       /* Global symbol */
#define STB_WEAK    2       /* Weak symbol */
#define STB_NUM     3       /* Number of defined types.  */
#define STB_LOOS    10      /* Start of OS-specific */
#define STB_GNU_UNIQUE  10  /* Unique symbol.  */
#define STB_HIOS    12      /* End of OS-specific */
#define STB_LOPROC  13      /* Start of processor-specific */
#define STB_HIPROC  15      /* End of processor-specific */

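Some of the important values are explained below:

  1. STB_LOCAL

    A local symbol; it exists only in its own file and is not visible in other files

    Different files can therefore define symbols with the same name without affecting each other

  2. STB_GLOBAL

    A global symbol; when several files are linked together, the symbol is visible in all of them

    A global symbol defined in one file is therefore meant to be referenced from other files; otherwise there is no need to make it global

  3. STB_WEAK

    A weak symbol; similar to a global symbol, but with lower precedence than a global one

  4. STB_LOPROC~STB_HIPROC

    Reserved for processor-specific use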

Symbol Type

/* Legal values for ST_TYPE subfield of st_info (symbol type).  */
 
#define STT_NOTYPE  0       /* Symbol type is unspecified */
#define STT_OBJECT  1       /* Symbol is a data object */
#define STT_FUNC    2       /* Symbol is a code object */
#define STT_SECTION 3       /* Symbol associated with a section */
#define STT_FILE    4       /* Symbol's name is file name */
#define STT_COMMON  5       /* Symbol is a common data object */
#define STT_TLS     6       /* Symbol is thread-local data object*/
#define STT_NUM     7       /* Number of defined types.  */
#define STT_LOOS    10      /* Start of OS-specific */
#define STT_GNU_IFUNC   10  /* Symbol is indirect code object */
#define STT_HIOS    12      /* End of OS-specific */
#define STT_LOPROC  13      /* Start of processor-specific */
#define STT_HIPROC  15      /* End of processor-specific */

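Some of the important symbol types are explained below

  1. STT_NOTYPE

    The symbol's type is not specified

  2. STT_OBJECT

    The symbol is a data object such as a variable or an array

  3. STT_FUNC

    The symbol is a function or other executable code

  4. STT_SECTION

    The symbol is associated with a section and is used for relocation; it normally has STB_LOCAL binding

  5. STT_FILE

    The symbol is a file symbol and has STB_LOCAL binding

  6. STT_LOPROC~STT_HIPROC

    Reserved for processor-specific use

st_other

The low 2 bits hold the symbol's visibility

st_shndx

The index of the section the symbol belongs to

Printing the symbol table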

// Print Symbol Table
const char *getSymbolBindingString(uint8_t symbolBinding) {
    switch (symbolBinding) {
        case STB_LOCAL:        return "LOCAL";
        case STB_GLOBAL:       return "GLOBAL";
        case STB_WEAK:         return "WEAK";
        case STB_NUM:          return "STB_NUM";
        case STB_GNU_UNIQUE:   return "GNU_UNIQUE";
        case STB_HIOS:         return "STB_HIOS";
        case STB_LOPROC:       return "STB_LOPROC";
        case STB_HIPROC:       return "STB_HIPROC";
        default:               return "UNKNOWN";
    }
}
const char *getSymbolTypeString(uint8_t symbolType) {
    switch (symbolType) {
        case STT_NOTYPE:    return "NOTYPE";
        case STT_OBJECT:    return "OBJECT";
        case STT_FUNC:      return "FUNC";
        case STT_SECTION:   return "SECTION";
        case STT_FILE:      return "FILE";
        case STT_COMMON:    return "COMMON";
        case STT_TLS:       return "TLS";
        case STT_NUM:       return "STT_NUM";
        case STT_GNU_IFUNC: return "GNU_IFUNC";
        case STT_HIOS:      return "HIOS";
        case STT_LOPROC:    return "LOPROC";
        case STT_HIPROC:    return "HIPROC";
        default:            return "UNKNOWN";
    }
}
const char *getSymbolVisibility(uint8_t st_other) {
    unsigned char visibility = st_other & 0x03;
    switch (visibility) {
        case 0:            return "DEFAULT";
        case 1:            return "INTERNAL";
        case 2:            return "HIDDEN";
        case 3:            return "PROTECTED";
        default:           return "UNKNOWN";
    }
}
 
void printSymbolTable32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,const char* pSectionHeaderStringTable,const uint8_t* pFileBuffer) {
    printf("ELF Symbol Tables:\n");
    for (int i = 0; i < sectionNum; i++) {
        // static symbol table (.symtab) and dynamic symbol table (.dynsym)
        if (pSectionHeader[i].sh_type == SHT_SYMTAB || pSectionHeader[i].sh_type == SHT_DYNSYM) {
            Elf32_Word symbolNum = pSectionHeader[i].sh_size / pSectionHeader[i].sh_entsize;
            // Get the string table for this symbol table; .symtab and .dynsym may use different string tables. sh_link is the section index of the string table; pFileBuffer + sh_offset is the string table itself
            char* pSymbolNameTable =(char*) pFileBuffer + pSectionHeader[pSectionHeader[i].sh_link].sh_offset;
            printf("\tSymbol Table '%s' contains %#x entries:\n",(char*)getSectionName(pSectionHeaderStringTable,pSectionHeader[i].sh_name), symbolNum);
            printf("\tNum \tValue\t\tSize\t\tType\t\tBind\t\tVisible\t\tIndex\t\tName\n");
            Elf32_Sym *pSymbolTable = (Elf32_Sym *) (pFileBuffer + pSectionHeader[i].sh_offset);
            for (int j = 0; j < symbolNum; j++) {
                printf("\t%04d", j);
                printf("\t%08x", pSymbolTable[j].st_value);
                printf("\t%08x", pSymbolTable[j].st_size);
                //symbol type and binding
                printf("\t%s\t", getSymbolTypeString(ELF32_ST_TYPE(pSymbolTable[j].st_info)));
                printf("\t%s\t", getSymbolBindingString(ELF32_ST_BIND(pSymbolTable[j].st_info)));
                printf("\t%-10s", getSymbolVisibility(pSymbolTable[j].st_other));
                if (pSymbolTable[j].st_shndx == SHN_UNDEF) {
                    printf("\t%4s\t", "UDEF");
                } else if (pSymbolTable[j].st_shndx == SHN_ABS) {
                    printf("\t%4s\t", "ABS");
                } else {
                    printf("\t%04x\t", pSymbolTable[j].st_shndx);
                }
                printf("\t%s\n", pSymbolNameTable + pSymbolTable[j].st_name);
            }
            printf("\n");
        }
    }
}

Relocation Table

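There are usually two relocation tables:

  1. .rel.plt: fixes up the addresses of external functions

  2. .rel.dyn: fixes up the addresses of global variables

Relocation sections come in three types, SHT_REL, SHT_RELA and SHT_RELR; the corresponding entry definitions are as follows

Note: the Intel x86 architecture uses only REL entries, and x64 appears to use only RELA entries, as the later part on applying relocations shows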

/* Relocation table entry without addend (in section of type SHT_REL).  */
 
typedef struct
{
  Elf32_Addr    r_offset;       /* Address */
  Elf32_Word    r_info;         /* Relocation type and symbol index */
} Elf32_Rel;
 
typedef struct
{
  Elf64_Addr    r_offset;       /* Address */
  Elf64_Xword   r_info;         /* Relocation type and symbol index */
} Elf64_Rel;
 
/* Relocation table entry with addend (in section of type SHT_RELA).  */
 
typedef struct
{
  Elf32_Addr    r_offset;       /* Address */
  Elf32_Word    r_info;         /* Relocation type and symbol index */
  Elf32_Sword   r_addend;       /* Addend */
} Elf32_Rela;
 
typedef struct
{
  Elf64_Addr    r_offset;       /* Address */
  Elf64_Xword   r_info;         /* Relocation type and symbol index */
  Elf64_Sxword  r_addend;       /* Addend */
} Elf64_Rela;
 
/* RELR relocation table entry */
 
typedef Elf32_Word  Elf32_Relr;
typedef Elf64_Xword Elf64_Relr;

r_offset

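The location to be relocated

For a relocatable file, this is the offset of the relocation unit within its section

For an executable or shared library, this is the virtual address of the relocation unit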

r_info

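Gives the symbol table index and the relocation type of the unit to be relocated

Macros for extracting the information:

SYM takes the high 24 bits (ELF32) or the high 32 bits (ELF64): the symbol table index that identifies the symbol

TYPE takes the low 8 bits (ELF32) or the low 32 bits (ELF64): the relocation type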

/* How to extract and insert information held in the r_info field.  */
 
#define ELF32_R_SYM(val)        ((val) >> 8)
#define ELF32_R_TYPE(val)       ((val) & 0xff)
#define ELF32_R_INFO(sym, type)     (((sym) << 8) + ((type) & 0xff))
 
#define ELF64_R_SYM(i)          ((i) >> 32)
#define ELF64_R_TYPE(i)         ((i) & 0xffffffff)
#define ELF64_R_INFO(sym,type)      ((((Elf64_Xword) (sym)) << 32) + (type))
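
As a quick worked example of these macros, the sketch below spells the same shifts and masks out as functions. The demo_* names are made up for illustration and are not part of elf.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative function equivalents of the ELF32_R_* / ELF64_R_* macros. */
static inline uint32_t demo_r_sym32(uint32_t r_info)  { return r_info >> 8; }
static inline uint32_t demo_r_type32(uint32_t r_info) { return r_info & 0xff; }
static inline uint32_t demo_r_sym64(uint64_t r_info)  { return (uint32_t)(r_info >> 32); }
static inline uint32_t demo_r_type64(uint64_t r_info) { return (uint32_t)(r_info & 0xffffffffu); }

/* Pack a symbol index and a type back into a 32-bit r_info. */
static inline uint32_t demo_r_info32(uint32_t sym, uint32_t type) {
    assert(sym <= 0x00ffffffu && type <= 0xffu);
    return (sym << 8) | type;
}
```

With these, a 32-bit r_info of 0x00000206 splits into symbol index 2 and type 6 (R_386_GLOB_DAT), and 0x00000407 into symbol index 4 and type 7 (R_386_JMP_SLOT).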

r_addend

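A constant addend used to compute the value stored in the relocated field

Rela entries state the addend explicitly in this field; for Rel entries the addend is implicit, stored at the location being modified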
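
The difference between the implicit and the explicit addend can be sketched as follows. This is only an illustration under simplifying assumptions: both variants patch a 32-bit field so they can be compared directly (a real x64 RELA entry patches 64 bits), `image` is a buffer in which image[r_offset] is the relocation target, and `load_base` is a hypothetical load address. All demo_* names are invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* REL style: the addend is whatever is already stored at the target. */
static void demo_relative_rel(uint8_t *image, uint32_t load_base, uint32_t r_offset) {
    uint32_t addend;
    assert(image != NULL);
    memcpy(&addend, image + r_offset, sizeof addend);   /* implicit addend */
    uint32_t value = load_base + addend;
    memcpy(image + r_offset, &value, sizeof value);
}

/* RELA style: the addend comes from the entry; the old contents are ignored. */
static void demo_relative_rela(uint8_t *image, uint32_t load_base,
                               uint32_t r_offset, int32_t r_addend) {
    assert(image != NULL);
    uint32_t value = load_base + (uint32_t)r_addend;    /* explicit addend */
    memcpy(image + r_offset, &value, sizeof value);
}
```

Both calls end up writing load_base + addend; the only difference is where the addend is read from, which is why a REL relocation cannot be applied twice to the same location without restoring the original contents first.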

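A relocation section references two other sections: a symbol table and the section to be patched

In the relocation section's header, sh_link holds the section index of the symbol table and sh_info holds the index of the section the relocations apply to

The meaning of r_offset differs slightly between object file types:

  1. Relocatable file

    r_offset is the offset of the relocation unit within the section being modified

  2. Executable / shared object file

    r_offset is the virtual address of the unit being modified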

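Relocation types

A relocation entry describes how to modify an instruction or data field (the relocated field)

The descriptions use the usual psABI notation (for example A for the addend, B for the load base address and S for the symbol value)

Common relocation types:

R_386_GLOB_DAT

Sets a GOT entry to the address of the specified symbol

Fix-up: after the ELF is loaded, write the symbol's real address into the entry

R_386_JMP_SLOT

Used for PLT entries in dynamic linking

Fix-up: after the ELF is loaded, set the jump target to the symbol's address

R_386_RELATIVE

Relocation relative to the load base

Fix-up: read the value at the location given by r_offset and add the base address at which the ELF is loaded

The full list of Intel x86 relocation types is: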

/* Intel 80386 specific definitions.  */
 
/* i386 relocs.  */
 
#define R_386_NONE     0        /* No reloc */
#define R_386_32       1        /* Direct 32 bit  */
#define R_386_PC32     2        /* PC relative 32 bit */
#define R_386_GOT32    3        /* 32 bit GOT entry */
#define R_386_PLT32    4        /* 32 bit PLT address */
#define R_386_COPY     5        /* Copy symbol at runtime */
#define R_386_GLOB_DAT     6        /* Create GOT entry */
#define R_386_JMP_SLOT     7        /* Create PLT entry */
#define R_386_RELATIVE     8        /* Adjust by program base */
#define R_386_GOTOFF       9        /* 32 bit offset to GOT */
#define R_386_GOTPC    10       /* 32 bit PC relative offset to GOT */
#define R_386_32PLT    11
#define R_386_TLS_TPOFF    14       /* Offset in static TLS block */
#define R_386_TLS_IE       15       /* Address of GOT entry for static TLS
                       block offset */
#define R_386_TLS_GOTIE    16       /* GOT entry for static TLS block
                       offset */
#define R_386_TLS_LE       17       /* Offset relative to static TLS
                       block */
#define R_386_TLS_GD       18       /* Direct 32 bit for GNU version of
                       general dynamic thread local data */
#define R_386_TLS_LDM      19       /* Direct 32 bit for GNU version of
                       local dynamic thread local data
                       in LE code */
#define R_386_16       20
#define R_386_PC16     21
#define R_386_8        22
#define R_386_PC8      23
#define R_386_TLS_GD_32    24       /* Direct 32 bit for general dynamic
                       thread local data */
#define R_386_TLS_GD_PUSH  25       /* Tag for pushl in GD TLS code */
#define R_386_TLS_GD_CALL  26       /* Relocation for call to
                       __tls_get_addr() */
#define R_386_TLS_GD_POP   27       /* Tag for popl in GD TLS code */
#define R_386_TLS_LDM_32   28       /* Direct 32 bit for local dynamic
                       thread local data in LE code */
#define R_386_TLS_LDM_PUSH 29       /* Tag for pushl in LDM TLS code */
#define R_386_TLS_LDM_CALL 30       /* Relocation for call to
                       __tls_get_addr() in LDM code */
#define R_386_TLS_LDM_POP  31       /* Tag for popl in LDM TLS code */
#define R_386_TLS_LDO_32   32       /* Offset relative to TLS block */
#define R_386_TLS_IE_32    33       /* GOT entry for negated static TLS
                       block offset */
#define R_386_TLS_LE_32    34       /* Negated offset relative to static
                       TLS block */
#define R_386_TLS_DTPMOD32 35       /* ID of module containing symbol */
#define R_386_TLS_DTPOFF32 36       /* Offset in TLS block */
#define R_386_TLS_TPOFF32  37       /* Negated offset in static TLS block */
#define R_386_SIZE32       38       /* 32-bit symbol size */
#define R_386_TLS_GOTDESC  39       /* GOT offset for TLS descriptor.  */
#define R_386_TLS_DESC_CALL 40      /* Marker of call through TLS
                       descriptor for
                       relaxation.  */
#define R_386_TLS_DESC     41       /* TLS descriptor containing
                       pointer to code and to
                       argument, returning the TLS
                       offset for the symbol.  */
#define R_386_IRELATIVE    42       /* Adjust indirectly by program base */
#define R_386_GOT32X       43       /* Load from 32 bit GOT entry,
                       relaxable. */
/* Keep this the last entry.  */
#define R_386_NUM      44

The x64 relocation types are defined as follows

/* AMD x86-64 relocations.  */
#define R_X86_64_NONE       0   /* No reloc */
#define R_X86_64_64     1   /* Direct 64 bit  */
#define R_X86_64_PC32       2   /* PC relative 32 bit signed */
#define R_X86_64_GOT32      3   /* 32 bit GOT entry */
#define R_X86_64_PLT32      4   /* 32 bit PLT address */
#define R_X86_64_COPY       5   /* Copy symbol at runtime */
#define R_X86_64_GLOB_DAT   6   /* Create GOT entry */
#define R_X86_64_JUMP_SLOT  7   /* Create PLT entry */
#define R_X86_64_RELATIVE   8   /* Adjust by program base */
#define R_X86_64_GOTPCREL   9   /* 32 bit signed PC relative
                       offset to GOT */
#define R_X86_64_32     10  /* Direct 32 bit zero extended */
#define R_X86_64_32S        11  /* Direct 32 bit sign extended */
#define R_X86_64_16     12  /* Direct 16 bit zero extended */
#define R_X86_64_PC16       13  /* 16 bit sign extended pc relative */
#define R_X86_64_8      14  /* Direct 8 bit sign extended  */
#define R_X86_64_PC8        15  /* 8 bit sign extended pc relative */
#define R_X86_64_DTPMOD64   16  /* ID of module containing symbol */
#define R_X86_64_DTPOFF64   17  /* Offset in module's TLS block */
#define R_X86_64_TPOFF64    18  /* Offset in initial TLS block */
#define R_X86_64_TLSGD      19  /* 32 bit signed PC relative offset
                       to two GOT entries for GD symbol */
#define R_X86_64_TLSLD      20  /* 32 bit signed PC relative offset
                       to two GOT entries for LD symbol */
#define R_X86_64_DTPOFF32   21  /* Offset in TLS block */
#define R_X86_64_GOTTPOFF   22  /* 32 bit signed PC relative offset
                       to GOT entry for IE symbol */
#define R_X86_64_TPOFF32    23  /* Offset in initial TLS block */
#define R_X86_64_PC64       24  /* PC relative 64 bit */
#define R_X86_64_GOTOFF64   25  /* 64 bit offset to GOT */
#define R_X86_64_GOTPC32    26  /* 32 bit signed pc relative
                       offset to GOT */
#define R_X86_64_GOT64      27  /* 64-bit GOT entry offset */
#define R_X86_64_GOTPCREL64 28  /* 64-bit PC relative offset
                       to GOT entry */
#define R_X86_64_GOTPC64    29  /* 64-bit PC relative offset to GOT */
#define R_X86_64_GOTPLT64   30  /* like GOT64, says PLT entry needed */
#define R_X86_64_PLTOFF64   31  /* 64-bit GOT relative offset
                       to PLT entry */
#define R_X86_64_SIZE32     32  /* Size of symbol plus 32-bit addend */
#define R_X86_64_SIZE64     33  /* Size of symbol plus 64-bit addend */
#define R_X86_64_GOTPC32_TLSDESC 34 /* GOT offset for TLS descriptor.  */
#define R_X86_64_TLSDESC_CALL   35  /* Marker for call through TLS
                       descriptor.  */
#define R_X86_64_TLSDESC        36  /* TLS descriptor.  */
#define R_X86_64_IRELATIVE  37  /* Adjust indirectly by program base */
#define R_X86_64_RELATIVE64 38  /* 64-bit adjust by program base */
                    /* 39 Reserved was R_X86_64_PC32_BND */
                    /* 40 Reserved was R_X86_64_PLT32_BND */
#define R_X86_64_GOTPCRELX  41  /* Load from 32 bit signed pc relative
                       offset to GOT entry without REX
                       prefix, relaxable.  */
#define R_X86_64_REX_GOTPCRELX  42  /* Load from 32 bit signed pc relative
                       offset to GOT entry with REX prefix,
                       relaxable.  */
#define R_X86_64_NUM        43

Printing the relocation table

// Print Relocation Table
const char *getRelocationTypeString32(Elf_Word value) {
    switch (value) {
        case R_386_NONE: return "R_386_NONE";
        case 1: return "R_386_32";
        case 2: return "R_386_PC32";
        case 3: return "R_386_GOT32";
        case 4: return "R_386_PLT32";
        case 5: return "R_386_COPY";
        case 6: return "R_386_GLOB_DAT";
        case 7: return "R_386_JMP_SLOT";
        case 8: return "R_386_RELATIVE";
        case 9: return "R_386_GOTOFF";
        case 10: return "R_386_GOTPC";
        case 11: return "R_386_32PLT";
        case 14: return "R_386_TLS_TPOFF";
        case 15: return "R_386_TLS_IE";
        case 16: return "R_386_TLS_GOTIE";
        case 17: return "R_386_TLS_LE";
        case 18: return "R_386_TLS_GD";
        case 19: return "R_386_TLS_LDM";
        case 20: return "R_386_16";
        case 21: return "R_386_PC16";
        case 22: return "R_386_8";
        case 23: return "R_386_PC8";
        case 24: return "R_386_TLS_GD_32";
        case 25: return "R_386_TLS_GD_PUSH";
        case 26: return "R_386_TLS_GD_CALL";
        case 27: return "R_386_TLS_GD_POP";
        case 28: return "R_386_TLS_LDM_32";
        case 29: return "R_386_TLS_LDM_PUSH";
        case 30: return "R_386_TLS_LDM_CALL";
        case 31: return "R_386_TLS_LDM_POP";
        case 32: return "R_386_TLS_LDO_32";
        case 33: return "R_386_TLS_IE_32";
        case 34: return "R_386_TLS_LE_32";
        case 35: return "R_386_TLS_DTPMOD32";
        case 36: return "R_386_TLS_DTPOFF32";
        case 37: return "R_386_TLS_TPOFF32";
        case 38: return "R_386_SIZE32";
        case 39: return "R_386_TLS_GOTDESC";
        case 40: return "R_386_TLS_DESC_CALL";
        case 41: return "R_386_TLS_DESC";
        case 42: return "R_386_IRELATIVE";
        case 43: return "R_386_GOT32X";
        default: return "Unknown relocation type";
    }
}
void printRelocationTable32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,uint8_t* pFileBuffer,const char* pSectionHeaderStringTable) {
    printf("Relocation Tables:\n");
    for (int i = 0; i < sectionNum; i++) {
        if (pSectionHeader[i].sh_type == SHT_REL) {
            const Elf32_Shdr *pRelocationTableHeader = &pSectionHeader[i];
            Elf32_Rel *pRelocationTable = (Elf32_Rel *) (pFileBuffer + pRelocationTableHeader->sh_offset);
            Elf32_Word relocItemNum = pRelocationTableHeader->sh_size / pRelocationTableHeader->sh_entsize;
            // relocation table sh_link is index of symbol table header
            const Elf32_Shdr *pSymbolTableHeader = &pSectionHeader[pSectionHeader[i].sh_link];
            //real symbol table
            Elf32_Sym *pSymbolTable = (Elf32_Sym *) (pFileBuffer + pSymbolTableHeader->sh_offset);
            //string table for symbol name
            char *pSymbolTableStringTable = (char *) pFileBuffer + pSectionHeader[pSymbolTableHeader->sh_link].sh_offset;
 
            printf("Relocation Section '%s' at offset %#x contains %u entries\n", pSectionHeaderStringTable + pSectionHeader[i].sh_name, pRelocationTableHeader->sh_offset, relocItemNum);
            printf("\tOffset\t\tInfo\t\tType\t\t\t\tSym.value\t\tSym.name\n");
            for (int j = 0; j < relocItemNum; j++) {
                printf("\t%08x", pRelocationTable[j].r_offset);
                printf("\t%08x", pRelocationTable[j].r_info);
                printf("\t%s\t", getRelocationTypeString32(ELF32_R_TYPE(pRelocationTable[j].r_info)));
                printf("\t%08x\t", pSymbolTable[ELF32_R_SYM(pRelocationTable[j].r_info)].st_value);
                //R_SYM get the index of symbol in symbol table, st_name is index of symbol name in string table
                printf("\t%s", &pSymbolTableStringTable[pSymbolTable[ELF32_R_SYM(pRelocationTable[j].r_info)].st_name]);
                printf("\n");
            }
        }
    }
}

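Applying relocations

r_offset gives the address to be patched. It is an RVA (an address relative to the load base), so the actual location is the load base plus r_offset. For RELATIVE entries, the value stored at that location must additionally have the ELF load base added to it

For example, readelf reports the following relocation tables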

Relocation section '.rel.dyn' at offset 0x384 contains 8 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00003ee8  00000008 R_386_RELATIVE  
00003eec  00000008 R_386_RELATIVE  
00003fec  00000008 R_386_RELATIVE  
0000400c  00000008 R_386_RELATIVE  
00003fe0  00000206 R_386_GLOB_DAT    00000000   _ITM_deregisterTM[...]
00003fe4  00000306 R_386_GLOB_DAT    00000000   __cxa_finalize@GLIBC_2.1.3
00003fe8  00000506 R_386_GLOB_DAT    00000000   __gmon_start__
00003ff0  00000606 R_386_GLOB_DAT    00000000   _ITM_registerTMCl[...]
 
Relocation section '.rel.plt' at offset 0x3c4 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00004000  00000107 R_386_JUMP_SLOT   00000000   __libc_start_main@GLIBC_2.34
00004004  00000407 R_386_JUMP_SLOT   00000000   puts@GLIBC_2.0
No processor specific unwind information to decode

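3ee8 and 3eec lie in the .init_array and .fini_array sections respectively; both are RELATIVE relocations

3fec, 3fe0, 3fe4, 3fe8 and 3ff0 are GOT entries; 3fec (main_ptr) is of type RELATIVE and the others are GLOB_DAT

In IDA, these entries point at functions in the virtual extern segment, which does not actually exist in memory

4000 and 4004 are .got.plt entries (PLT slots), both of type JUMP_SLOT; 400c is dso_handle, of type RELATIVE

The .got.plt entries are likewise filled with external function addresses, which IDA places in the virtual extern segment

IDA automatically appends the extern segment at the end of the ELF file (it does not exist in memory and is only there for analysis)

To sum up, relocations fall into the following cases:

  1. Read the value at the relocated address and add the ELF load base

    This fixes references to absolute addresses of objects inside the ELF file itself

    Example: the RELATIVE type

  2. Load the shared library and write in the address of the external function

    This fixes references to external addresses

    Example: the GLOB_DAT and JMP_SLOT types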
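
The two cases above could be dispatched roughly as in the sketch below. It is not the loader from this article: the demo_* names, the resolver callback and the fake address it returns are all assumptions made for illustration, and addresses are modelled as plain uint32_t values rather than real pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values taken from the i386 relocation list; prefixed to avoid clashing with <elf.h>. */
enum { DEMO_R_386_GLOB_DAT = 6, DEMO_R_386_JMP_SLOT = 7, DEMO_R_386_RELATIVE = 8 };

/* Hypothetical callback: runtime address of dynamic symbol `sym_index`, 0 if unknown. */
typedef uint32_t (*demo_resolver)(uint32_t sym_index);

/* Fake resolver for the example: pretends symbol 4 lives at 0xf7c70000. */
static uint32_t demo_fake_resolver(uint32_t sym_index) {
    return sym_index == 4 ? 0xf7c70000u : 0;
}

/* Apply one Elf32_Rel-style entry to `image`; returns 0 on success, -1 otherwise. */
static int demo_apply_rel32(uint8_t *image, uint32_t load_base,
                            uint32_t r_offset, uint32_t r_info,
                            demo_resolver resolve) {
    assert(image != NULL && resolve != NULL);
    uint32_t type = r_info & 0xff;      /* ELF32_R_TYPE */
    uint32_t sym  = r_info >> 8;        /* ELF32_R_SYM  */
    uint32_t value;

    switch (type) {
    case DEMO_R_386_RELATIVE:           /* case 1: stored value + load base */
        memcpy(&value, image + r_offset, sizeof value);
        value += load_base;
        break;
    case DEMO_R_386_GLOB_DAT:           /* case 2: address of an external symbol */
    case DEMO_R_386_JMP_SLOT:
        value = resolve(sym);
        if (value == 0) return -1;      /* unresolved symbol */
        break;
    default:
        return -1;                      /* type not handled by this sketch */
    }
    memcpy(image + r_offset, &value, sizeof value);
    return 0;
}
```

One simplification to be aware of: treating a resolved address of 0 as a failure is too strict for a real loader, since weak undefined symbols such as __gmon_start__ are allowed to stay at 0.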

Dynamic Segment

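If an object file takes part in dynamic linking, it must contain a program header entry of type PT_DYNAMIC; the corresponding section is named .dynamic (type=SHT_DYNAMIC)

The dynamic segment supplies the information the dynamic linker needs, such as which shared libraries the file depends on, where the dynamic symbol table is, and where the dynamic relocation tables are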

/* Dynamic section entry.  */
typedef struct
{
  Elf32_Sword   d_tag;          /* Dynamic entry type */
  union
    {
      Elf32_Word d_val;         /* Integer value */
      Elf32_Addr d_ptr;         /* Address value */
    } d_un;
} Elf32_Dyn;
 
typedef struct
{
  Elf64_Sxword  d_tag;          /* Dynamic entry type */
  union
    {
      Elf64_Xword d_val;        /* Integer value */
      Elf64_Addr d_ptr;         /* Address value */
    } d_un;
} Elf64_Dyn;

d_tag

d_tag determines how d_un is interpreted

The legal d_tag values are defined as follows

/* Legal values for d_tag (dynamic entry type).  */
 
#define DT_NULL     0       /* Marks end of dynamic section */
#define DT_NEEDED   1       /* Name of needed library */
#define DT_PLTRELSZ 2       /* Size in bytes of PLT relocs */
#define DT_PLTGOT   3       /* Processor defined value */
#define DT_HASH     4       /* Address of symbol hash table */
#define DT_STRTAB   5       /* Address of string table */
#define DT_SYMTAB   6       /* Address of symbol table */
#define DT_RELA     7       /* Address of Rela relocs */
#define DT_RELASZ   8       /* Total size of Rela relocs */
#define DT_RELAENT  9       /* Size of one Rela reloc */
#define DT_STRSZ    10      /* Size of string table */
#define DT_SYMENT   11      /* Size of one symbol table entry */
#define DT_INIT     12      /* Address of init function */
#define DT_FINI     13      /* Address of termination function */
#define DT_SONAME   14      /* Name of shared object */
#define DT_RPATH    15      /* Library search path (deprecated) */
#define DT_SYMBOLIC 16      /* Start symbol search here */
#define DT_REL      17      /* Address of Rel relocs */
#define DT_RELSZ    18      /* Total size of Rel relocs */
#define DT_RELENT   19      /* Size of one Rel reloc */
#define DT_PLTREL   20      /* Type of reloc in PLT */
#define DT_DEBUG    21      /* For debugging; unspecified */
#define DT_TEXTREL  22      /* Reloc might modify .text */
#define DT_JMPREL   23      /* Address of PLT relocs */
#define DT_BIND_NOW 24      /* Process relocations of object */
#define DT_INIT_ARRAY   25      /* Array with addresses of init fct */
#define DT_FINI_ARRAY   26      /* Array with addresses of fini fct */
#define DT_INIT_ARRAYSZ 27      /* Size in bytes of DT_INIT_ARRAY */
#define DT_FINI_ARRAYSZ 28      /* Size in bytes of DT_FINI_ARRAY */
#define DT_RUNPATH  29      /* Library search path */
#define DT_FLAGS    30      /* Flags for the object being loaded */
#define DT_ENCODING 32      /* Start of encoded range */
#define DT_PREINIT_ARRAY 32     /* Array with addresses of preinit fct*/
#define DT_PREINIT_ARRAYSZ 33       /* size in bytes of DT_PREINIT_ARRAY */
#define DT_SYMTAB_SHNDX 34      /* Address of SYMTAB_SHNDX section */
#define DT_RELRSZ   35      /* Total size of RELR relative relocations */
#define DT_RELR     36      /* Address of RELR relative relocations */
#define DT_RELRENT  37      /* Size of one RELR relative relocaction */
#define DT_NUM      38      /* Number used */
#define DT_LOOS     0x6000000d  /* Start of OS-specific */
#define DT_HIOS     0x6ffff000  /* End of OS-specific */
#define DT_LOPROC   0x70000000  /* Start of processor-specific */
#define DT_HIPROC   0x7fffffff  /* End of processor-specific */
#define DT_PROCNUM  DT_MIPS_NUM /* Most used by any processor */
 
/* DT_* entries which fall between DT_VALRNGHI & DT_VALRNGLO use the
   Dyn.d_un.d_val field of the Elf*_Dyn structure.  This follows Sun's
   approach.  */
#define DT_VALRNGLO 0x6ffffd00
#define DT_GNU_PRELINKED 0x6ffffdf5 /* Prelinking timestamp */
#define DT_GNU_CONFLICTSZ 0x6ffffdf6    /* Size of conflict section */
#define DT_GNU_LIBLISTSZ 0x6ffffdf7 /* Size of library list */
#define DT_CHECKSUM 0x6ffffdf8
#define DT_PLTPADSZ 0x6ffffdf9
#define DT_MOVEENT  0x6ffffdfa
#define DT_MOVESZ   0x6ffffdfb
#define DT_FEATURE_1    0x6ffffdfc  /* Feature selection (DTF_*).  */
#define DT_POSFLAG_1    0x6ffffdfd  /* Flags for DT_* entries, effecting
                       the following DT_* entry.  */
#define DT_SYMINSZ  0x6ffffdfe  /* Size of syminfo table (in bytes) */
#define DT_SYMINENT 0x6ffffdff  /* Entry size of syminfo */
#define DT_VALRNGHI 0x6ffffdff
#define DT_VALTAGIDX(tag)   (DT_VALRNGHI - (tag))   /* Reverse order! */
#define DT_VALNUM 12
 
/* DT_* entries which fall between DT_ADDRRNGHI & DT_ADDRRNGLO use the
   Dyn.d_un.d_ptr field of the Elf*_Dyn structure.
 
   If any adjustment is made to the ELF object after it has been
   built these entries will need to be adjusted.  */
#define DT_ADDRRNGLO    0x6ffffe00
#define DT_GNU_HASH 0x6ffffef5  /* GNU-style hash table.  */
#define DT_TLSDESC_PLT  0x6ffffef6
#define DT_TLSDESC_GOT  0x6ffffef7
#define DT_GNU_CONFLICT 0x6ffffef8  /* Start of conflict section */
#define DT_GNU_LIBLIST  0x6ffffef9  /* Library list */
#define DT_CONFIG   0x6ffffefa  /* Configuration information.  */
#define DT_DEPAUDIT 0x6ffffefb  /* Dependency auditing.  */
#define DT_AUDIT    0x6ffffefc  /* Object auditing.  */
#define DT_PLTPAD   0x6ffffefd  /* PLT padding.  */
#define DT_MOVETAB  0x6ffffefe  /* Move table.  */
#define DT_SYMINFO  0x6ffffeff  /* Syminfo table.  */
#define DT_ADDRRNGHI    0x6ffffeff
#define DT_ADDRTAGIDX(tag)  (DT_ADDRRNGHI - (tag))  /* Reverse order! */
#define DT_ADDRNUM 11
 
/* The versioning entry types.  The next are defined as part of the GNU extension.  */
#define DT_VERSYM   0x6ffffff0
 
#define DT_RELACOUNT    0x6ffffff9
#define DT_RELCOUNT 0x6ffffffa
 
/* These were chosen by Sun.  */
#define DT_FLAGS_1  0x6ffffffb  /* State flags, see DF_1_* below.  */
#define DT_VERDEF   0x6ffffffc  /* Address of version definition table */
#define DT_VERDEFNUM    0x6ffffffd  /* Number of version definitions */
#define DT_VERNEED  0x6ffffffe  /* Address of table with needed versions */
#define DT_VERNEEDNUM   0x6fffffff  /* Number of needed versions */
#define DT_VERSIONTAGIDX(tag)   (DT_VERNEEDNUM - (tag)) /* Reverse order! */
#define DT_VERSIONTAGNUM 16
 
/* Sun added these machine-independent extensions in the "processor-specific"
   range.  Be compatible.  */
#define DT_AUXILIARY    0x7ffffffd      /* Shared object to load before self */
#define DT_FILTER       0x7fffffff      /* Shared object to get values from */
#define DT_EXTRATAGIDX(tag) ((Elf32_Word)-((Elf32_Sword) (tag) <<1>>1)-1)
#define DT_EXTRANUM 3

DT_NEEDED

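Each entry with this tag names a shared library that the ELF file depends on; its d_val is an offset (index) into the dynamic string table

Looking up that offset in .dynstr yields the library name

The sh_link field of the .dynamic section header is the section index of the dynamic string table

Alternatively, the entry with d_tag==DT_STRTAB gives the location of .dynstr in its d_ptr; this is a virtual address, which coincides with the file offset only when the containing segment is mapped with p_vaddr equal to p_offset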
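
As a small illustration of the DT_NEEDED lookup, the sketch below walks a dynamic array until DT_NULL and collects the library names. DemoDyn32 is a local stand-in with the same layout as Elf32_Dyn (the union collapsed into one field), and every demo_* / DEMO_* name is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for Elf32_Dyn; d_val doubles as d_ptr. */
typedef struct {
    int32_t  d_tag;
    uint32_t d_val;
} DemoDyn32;

enum { DEMO_DT_NULL = 0, DEMO_DT_NEEDED = 1 };

/* Collect up to `max` DT_NEEDED names. `dyn` points at the dynamic array,
   `dynstr` at the dynamic string table. Returns how many names were stored. */
static size_t demo_collect_needed(const DemoDyn32 *dyn, const char *dynstr,
                                  const char **out, size_t max) {
    assert(dyn != NULL && dynstr != NULL && out != NULL);
    size_t n = 0;
    for (; dyn->d_tag != DEMO_DT_NULL; dyn++) {      /* DT_NULL terminates the array */
        if (dyn->d_tag == DEMO_DT_NEEDED && n < max)
            out[n++] = dynstr + dyn->d_val;          /* d_val = offset into .dynstr */
    }
    return n;
}
```

Stopping at DT_NULL rather than relying only on sh_size / sh_entsize also works when no section headers are available, for example when walking the PT_DYNAMIC segment of a stripped file.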

d_un

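d_val holds an integer value

d_ptr holds a virtual address in the process address space

The interpretation rules are as follows

Name         Value        d_un         Executable   Shared object
DT_NULL      0            ignored      mandatory    mandatory
DT_NEEDED    1            d_val        optional     optional
DT_PLTRELSZ  2            d_val        optional     optional
DT_PLTGOT    3            d_ptr        optional     optional
DT_HASH      4            d_ptr        mandatory    mandatory
DT_STRTAB    5            d_ptr        mandatory    mandatory
DT_SYMTAB    6            d_ptr        mandatory    mandatory
DT_RELA      7            d_ptr        mandatory    optional
DT_RELASZ    8            d_val        mandatory    optional
DT_RELAENT   9            d_val        mandatory    optional
DT_STRSZ     10           d_val        mandatory    mandatory
DT_SYMENT    11           d_val        mandatory    mandatory
DT_INIT      12           d_ptr        optional     optional
DT_FINI      13           d_ptr        optional     optional
DT_SONAME    14           d_val        ignored      optional
DT_RPATH     15           d_val        optional     ignored
DT_SYMBOLIC  16           ignored      ignored      optional
DT_REL       17           d_ptr        mandatory    optional
DT_RELSZ     18           d_val        mandatory    optional
DT_RELENT    19           d_val        mandatory    optional
DT_PLTREL    20           d_val        optional     optional
DT_DEBUG     21           d_ptr        optional     ignored
DT_TEXTREL   22           ignored      optional     optional
DT_JMPREL    23           d_ptr        optional     optional
DT_BIND_NOW  24           ignored      optional     optional
DT_LOPROC    0x70000000   unspecified  unspecified  unspecified
DT_HIPROC    0x7fffffff   unspecified  unspecified  unspecified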
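
Entries read through d_ptr hold virtual addresses, so a parser that works on the file on disk (rather than on a mapped image) has to translate them to file offsets through the PT_LOAD program headers. A minimal sketch of that translation follows; DemoLoad32 is a local stand-in for the three Elf32_Phdr fields involved, and the names are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the Elf32_Phdr fields needed here (PT_LOAD entries only). */
typedef struct {
    uint32_t p_offset;  /* file offset of the segment */
    uint32_t p_vaddr;   /* virtual address of the segment */
    uint32_t p_filesz;  /* bytes of the segment present in the file */
} DemoLoad32;

/* Translate a d_ptr-style virtual address into a file offset.
   Returns 1 and stores the offset on success, 0 if no segment covers it. */
static int demo_vaddr_to_offset(const DemoLoad32 *loads, size_t count,
                                uint32_t vaddr, uint32_t *offset) {
    assert(loads != NULL && offset != NULL);
    for (size_t i = 0; i < count; i++) {
        if (vaddr >= loads[i].p_vaddr &&
            vaddr - loads[i].p_vaddr < loads[i].p_filesz) {
            *offset = loads[i].p_offset + (vaddr - loads[i].p_vaddr);
            return 1;
        }
    }
    return 0;
}
```

When the first PT_LOAD segment maps file offset 0 to virtual address 0, addresses inside it translate to themselves, which is why a d_ptr value can look like a file offset in small test binaries.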

Printing the dynamic segment

// Print Dynamic Segment
#define DT_VAL 0
#define DT_PTR 1
const char *getDynamicType(Elf_Xword value) {
    if (value >= DT_LOOS && value <= DT_HIOS)
        return "OS-Specific";
    if (value >= DT_LOPROC && value <= DT_HIPROC)
        return "Processor-Specific";
    switch (value) {
        case DT_NULL: return "NULL";
        case DT_NEEDED: return "NEEDED";
        case DT_PLTRELSZ: return "PLTRELSZ";
        case DT_PLTGOT: return "PLTGOT";
        case DT_HASH: return "HASH";
        case DT_STRTAB: return "STRTAB";
        case DT_SYMTAB: return "SYMTAB";
        case DT_RELA: return "RELA";
        case DT_RELASZ: return "RELASZ";
        case DT_RELAENT: return "RELAENT";
        case DT_STRSZ: return "STRSZ";
        case DT_SYMENT: return "SYMENT";
        case DT_INIT: return "INIT";
        case DT_FINI: return "FINI";
        case DT_SONAME: return "SONAME";
        case DT_RPATH: return "RPATH";
        case DT_SYMBOLIC: return "SYMBOLIC";
        case DT_REL: return "REL";
        case DT_RELSZ: return "RELSZ";
        case DT_RELENT: return "RELENT";
        case DT_PLTREL: return "PLTREL";
        case DT_DEBUG: return "DEBUG";
        case DT_TEXTREL: return "TEXTREL";
        case DT_JMPREL: return "JMPREL";
        case DT_BIND_NOW: return "BIND_NOW";
        case DT_INIT_ARRAY: return "INIT_ARRAY";
        case DT_FINI_ARRAY: return "FINI_ARRAY";
        case DT_INIT_ARRAYSZ: return "INIT_ARRAYSZ";
        case DT_FINI_ARRAYSZ: return "FINI_ARRAYSZ";
        case DT_RUNPATH: return "RUNPATH";
        case DT_FLAGS: return "FLAGS";
        case DT_PREINIT_ARRAY: return "PREINIT_ARRAY"; // same value (32) as DT_ENCODING
        case DT_PREINIT_ARRAYSZ: return "PREINIT_ARRAYSZ";
        case DT_SYMTAB_SHNDX: return "SYMTAB_SHNDX";
        case DT_RELRSZ: return "RELRSZ";
        case DT_RELR: return "RELR";
        case DT_RELRENT: return "RELRENT";
        case DT_NUM: return "NUM";
        case DT_VALRNGLO: return "VALRNGLO";
        case DT_GNU_PRELINKED: return "GNU_PRELINKED";
        case DT_GNU_CONFLICTSZ: return "GNU_CONFLICTSZ";
        case DT_GNU_LIBLISTSZ: return "GNU_LIBLISTSZ";
        case DT_CHECKSUM: return "CHECKSUM";
        case DT_PLTPADSZ: return "PLTPADSZ";
        case DT_MOVEENT: return "MOVEENT";
        case DT_MOVESZ: return "MOVESZ";
        case DT_FEATURE_1: return "FEATURE_1";
        case DT_POSFLAG_1: return "POSFLAG_1";
        case DT_SYMINSZ: return "SYMINSZ";
        case DT_SYMINENT: return "SYMINENT";
        case DT_ADDRRNGLO: return "ADDRRNGLO";
        case DT_GNU_HASH: return "GNU_HASH";
        case DT_TLSDESC_PLT: return "TLSDESC_PLT";
        case DT_TLSDESC_GOT: return "TLSDESC_GOT";
        case DT_GNU_CONFLICT: return "GNU_CONFLICT";
        case DT_GNU_LIBLIST: return "GNU_LIBLIST";
        case DT_CONFIG: return "CONFIG";
        case DT_DEPAUDIT: return "DEPAUDIT";
        case DT_AUDIT: return "AUDIT";
        case DT_PLTPAD: return "PLTPAD";
        case DT_MOVETAB: return "MOVETAB";
        case DT_SYMINFO: return "SYMINFO";
        case DT_VERSYM: return "VERSYM";
        case DT_RELACOUNT: return "RELACOUNT";
        case DT_RELCOUNT: return "RELCOUNT";
        case DT_FLAGS_1: return "FLAGS_1";
        case DT_VERDEF: return "VERDEF";
        case DT_VERDEFNUM: return "VERDEFNUM";
        case DT_VERNEED: return "VERNEED";
        case DT_VERNEEDNUM: return "VERNEEDNUM";
        case DT_AUXILIARY: return "AUXILIARY";
        case DT_FILTER: return "FILTER";
        default: return "Unknown Type";
    }
}
uint32_t getDynamicDunType(Elf_Xword value) {
    switch (value) {
        case DT_NULL:
        case DT_NEEDED:
        case DT_PLTRELSZ:
        case DT_RELASZ:
        case DT_RELAENT:
        case DT_STRSZ:
        case DT_SYMENT:
        case DT_SONAME:
        case DT_RPATH:
        case DT_SYMBOLIC:
        case DT_RELSZ:
        case DT_RELENT:
        case DT_PLTREL:
        case DT_TEXTREL:
        case DT_BIND_NOW:
        case DT_LOPROC:
        case DT_HIPROC:
            return DT_VAL;
        case DT_PLTGOT:
        case DT_HASH:
        case DT_STRTAB:
        case DT_SYMTAB:
        case DT_RELA:
        case DT_INIT:
        case DT_FINI:
        case DT_JMPREL:
        case DT_DEBUG:
        case DT_REL:
            return DT_PTR;
        default:
            return DT_VAL;
    }
}
void printDynamicSegment32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,uint8_t* pFileBuffer) {
    for (int i = 0; i < sectionNum; i++) {
        if (pSectionHeader[i].sh_type == SHT_DYNAMIC) {
            const Elf32_Shdr *pDynamicSection = &pSectionHeader[i];
            Elf32_Word dynamicItemNum = pDynamicSection->sh_size / pDynamicSection->sh_entsize;
            printf("Dynamic Section At File Offset %#x Contains %d Entries:\n", pDynamicSection->sh_offset,dynamicItemNum);
            printf("\tTag \t\tType\t\t\t\tName/Value\n");
            Elf32_Dyn *pDynamicTable = (Elf32_Dyn *) (pFileBuffer + pDynamicSection->sh_offset);
            const Elf32_Shdr *pDynamicStringTableHeader = &pSectionHeader[pDynamicSection->sh_link];
            // dynamic string table
            char *pDynamicStringTable = (char *) pFileBuffer + pDynamicStringTableHeader->sh_offset;
            for (int j = 0; j < dynamicItemNum; j++) {
                printf("\t%08x", pDynamicTable[j].d_tag);
                printf("\t%-16s", getDynamicType(pDynamicTable[j].d_tag));
                printf("\t%08x\t", pDynamicTable[j].d_un.d_val);
                if (getDynamicDunType(pDynamicTable[j].d_tag) == DT_PTR) //Some special item is ptr
                    printf("(PTR)");
                //Index of shared library path in dynamic string table
                switch (pDynamicTable[j].d_tag) {
                    case DT_NEEDED: printf("[%s]", pDynamicStringTable + pDynamicTable[j].d_un.d_val);
                        break;
                    case DT_SONAME: printf("[%s]", pDynamicStringTable + pDynamicTable[j].d_un.d_val);
                        break;
                    default: ;
                }
                printf("\n");
            }
        }
    }
}

Hash Table (Export Table)

A hash table is used to look up exported functions. There are two kinds; current ELF files mainly use the GNU hash table as the export table

.hash       // legacy; can look up both imported and exported functions (DT_HASH)
.gnu.hash   // newer; can look up exported functions only (DT_GNU_HASH)

ELF Hash

The hash table is defined as follows

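struct ELFHash { 
    uint32_t  nbucket;    // number of buckets
    uint32_t  nchain;     // number of chain entries; equals the number of dynamic symbols
    uint32_t  buckets[];  // array of nbucket entries
    // uint32_t chains[]; // array of nchain entries, stored right after buckets[] (C permits only one flexible array member)
};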

The original ELF hash algorithm used on Linux is:

uint32_t elf_hash(const unsigned char* name)
{
    uint32_t h = 0, g;
    while (*name)
    {
        h = (h << 4) + *name++;
        // Fold the top 4 bits back into bits 4-7, then clear them
        if ((g = h & 0xf0000000) != 0)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

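Looking up a symbol by name through the ELF hash table works as follows:

  1. Compute the hash of the symbol name with elf_hash.

  2. index = buckets[hash % nbucket]

    index is the symbol's index in the dynamic symbol table.

  3. If index == STN_UNDEF (0), the symbol was not found and the search ends.

    Otherwise, compare the name of the symbol at index with the target name.

  4. If the names differ, take the next candidate index from the chains table and repeat step 3.

    index = chains[index] (chains[index] == 0 means the symbol does not exist)

In code: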

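uint32_t findSymbolIndexByElfHash(const char* symbolName,
                                  const uint32_t* pHashTable,
                                  const Elf32_Sym* pSymbolTable,
                                  const char* pSymbolStringTable)
{
    // Layout: nbucket, nchain, buckets[nbucket], chains[nchain]
    uint32_t nbucket = pHashTable[0];
    const uint32_t* buckets = &pHashTable[2];
    const uint32_t* chains = &pHashTable[2 + nbucket];
    if (nbucket == 0)
        return 0; // empty table: avoid a modulo by zero
    uint32_t hash = elf_hash((const unsigned char*)symbolName);
    // Walk the chain of the selected bucket until index 0 (STN_UNDEF)
    for (uint32_t index = buckets[hash % nbucket]; index; index = chains[index]) {
        if (strcmp(symbolName, &pSymbolStringTable[pSymbolTable[index].st_name]) == 0) {
            return index;
        }
    }
    return 0; // not found
}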
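
To try the lookup without a real ELF file, the sketch below builds a tiny .hash-style table in memory and searches it. Everything prefixed with demo or DEMO_ (the string table, the offsets, the bucket count) is made up for illustration and is not part of the article's parser.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_NBUCKET = 3, DEMO_NSYM = 5 };

/* String table: offset 0 is the empty name of the reserved symbol 0. */
static const char demoStrtab[] = "\0open\0close\0read\0write";
/* st_name offsets into demoStrtab, one per symbol index. */
static const uint32_t demoNameOff[DEMO_NSYM] = {0, 1, 6, 12, 17};

/* Same algorithm as elf_hash in the text, repeated so the sketch stands alone. */
static uint32_t demoElfHash(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    uint32_t h = 0, g;
    while (*p) {
        h = (h << 4) + *p++;
        if ((g = h & 0xf0000000u) != 0)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Fill table as: nbucket, nchain, buckets[], chains[].
 * Each symbol is pushed at the head of its bucket's chain. */
static void demoBuildHashTable(uint32_t *table) {
    table[0] = DEMO_NBUCKET;
    table[1] = DEMO_NSYM;
    uint32_t *buckets = &table[2];
    uint32_t *chains = &table[2 + DEMO_NBUCKET];
    memset(buckets, 0, sizeof(uint32_t) * (DEMO_NBUCKET + DEMO_NSYM));
    for (uint32_t i = 1; i < DEMO_NSYM; i++) {
        uint32_t b = demoElfHash(&demoStrtab[demoNameOff[i]]) % DEMO_NBUCKET;
        chains[i] = buckets[b]; /* old head becomes our successor */
        buckets[b] = i;
    }
}

/* Returns the symbol index, or 0 (STN_UNDEF) when the name is absent. */
static uint32_t demoLookup(const uint32_t *table, const char *name) {
    uint32_t nbucket = table[0];
    assert(nbucket > 0);
    const uint32_t *buckets = &table[2];
    const uint32_t *chains = &table[2 + nbucket];
    for (uint32_t i = buckets[demoElfHash(name) % nbucket]; i != 0; i = chains[i]) {
        if (strcmp(name, &demoStrtab[demoNameOff[i]]) == 0)
            return i;
    }
    return 0;
}
```

A caller would declare `uint32_t table[2 + DEMO_NBUCKET + DEMO_NSYM]`, run `demoBuildHashTable(table)`, and then `demoLookup(table, "read")` yields index 3, while an unknown name yields 0.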

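A manual lookup, step by step:

Because gcc on x86_64 emits only .gnu.hash by default, the example uses a 64-bit .so built with the Android NDK.

Locate the .hash section; here nbucket = nchain = 0x36.

[Figure: HashTable-libfindflagso]

Compute the bucket index with elf_hash: hash % nbucket = 48.

[Figure: HashTable, computing the bucket index]

Each bucket entry is 4 bytes and the buckets array starts at 0x960, so the entry sits at 0x960 + 48*4 = 0xA20.

The value stored there is dynamic symbol index 0xE (14), which is exactly the dlopen entry in the symbol table.

[Figure: HashTable, dlopen found]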
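
The numbers in this walk-through can be re-derived in a few lines of C. The helper names below (demoElfHashOf, demoBucketFileOffset, demoCheckDlopenWalkthrough) are illustrative only; 0x36 and 0x960 are the nbucket value and the buckets offset quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Same algorithm as elf_hash in the text, repeated so the sketch stands alone. */
static uint32_t demoElfHashOf(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    uint32_t h = 0, g;
    while (*p) {
        h = (h << 4) + *p++;
        if ((g = h & 0xf0000000u) != 0)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* File offset of buckets[bucketIndex], given the file offset of buckets[0]. */
static uint32_t demoBucketFileOffset(uint32_t bucketsOffset, uint32_t bucketIndex) {
    return bucketsOffset + bucketIndex * (uint32_t)sizeof(uint32_t);
}

/* Re-derive each number of the dlopen walk-through. */
static void demoCheckDlopenWalkthrough(void) {
    uint32_t hash = demoElfHashOf("dlopen");
    assert(hash == 0x06b366beu);                          /* elf_hash("dlopen") */
    assert(hash % 0x36u == 48u);                          /* bucket index */
    assert(demoBucketFileOffset(0x960u, 48u) == 0xA20u);  /* where to read in the file */
}
```

The 4-byte value read at 0xA20 is the symbol index (0xE in the example file); that last step depends on the file contents and is therefore not reproduced here.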

Android Elf Hash

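Android's elfhash is written slightly differently but is equivalent to the original elf_hash: since every bit of g is set in h, h ^= g clears the top 4 bits exactly as h &= ~g does.

See https://cs.android.com/android/platform/superproject/+/android-4.1.2_r2.1:bionic/linker/linker.c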

static unsigned elfhash(const char *_name)
{
    const unsigned char *name = (const unsigned char *) _name;
    unsigned h = 0, g;
 
    while(*name) {
        h = (h << 4) + *name++;
        g = h & 0xf0000000;
        h ^= g;
        h ^= g >> 24;
    }
    return h;
}
 
static Elf32_Sym *_elf_lookup(soinfo *si, unsigned hash, const char *name)
{
    Elf32_Sym *s;
    Elf32_Sym *symtab = si->symtab;
    const char *strtab = si->strtab;
    unsigned n;
 
    TRACE_TYPE(LOOKUP, "%5d SEARCH %s in %s@0x%08x %08x %d\n", pid,
               name, si->name, si->base, hash, hash % si->nbucket);
    n = hash % si->nbucket;
 
    for(n = si->bucket[hash % si->nbucket]; n != 0; n = si->chain[n]){
        s = symtab + n;
        if(strcmp(strtab + s->st_name, name)) continue;
 
            /* only concern ourselves with global and weak symbol definitions */
        switch(ELF32_ST_BIND(s->st_info)){
        case STB_GLOBAL:
        case STB_WEAK:
                /* no section == undefined */
            if(s->st_shndx == 0) continue;
 
            TRACE_TYPE(LOOKUP, "%5d FOUND %s in %s (%08x) %d\n", pid,
                       name, si->name, s->st_value, s->st_size);
            return s;
        }
    }
 
    return NULL;
}

Sysv Hash

In the Android source tree the ELF hash also appears under the name SysV hash, in the bundled musl. See https://cs.android.com/android/platform/superproject/+/android14-qpr3-release:external/musl/ldso/dynlink.c

static uint32_t sysv_hash(const char *s0)
{
    const unsigned char *s = (void *)s0;
    uint_fast32_t h = 0;
    while (*s) {
        h = 16*h + *s++;
        h ^= h>>24 & 0xf0;
    }
    return h & 0xfffffff;
}
static Sym *sysv_lookup(const char *s, uint32_t h, struct dso *dso)
{
    size_t i;
    Sym *syms = dso->syms;
    Elf_Symndx *hashtab = dso->hashtab;
    char *strings = dso->strings;
    for (i=hashtab[2+h%hashtab[0]]; i; i=hashtab[2+hashtab[0]+i]) {
        if ((!dso->versym || dso->versym[i] >= 0)
            && (!strcmp(s, strings+syms[i].st_name)))
            return syms+i;
    }
    return 0;
}
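
The three variants shown so far (the original elf_hash, Android's elfhash and musl's sysv_hash) are claimed to be equivalent. A quick way to gain confidence is to run them side by side; the sketch below re-types the three bodies under hypothetical names (demoHashOriginal, demoHashAndroid, demoHashMusl) and compares them on a few sample names, including one long enough to exercise the top-nibble folding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original formulation: fold the top nibble into bits 4-7, then clear it. */
static uint32_t demoHashOriginal(const char *s0) {
    const unsigned char *s = (const unsigned char *)s0;
    uint32_t h = 0, g;
    while (*s) {
        h = (h << 4) + *s++;
        if ((g = h & 0xf0000000u) != 0)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Android (bionic) formulation: clears the top nibble with an XOR. */
static uint32_t demoHashAndroid(const char *s0) {
    const unsigned char *s = (const unsigned char *)s0;
    uint32_t h = 0, g;
    while (*s) {
        h = (h << 4) + *s++;
        g = h & 0xf0000000u;
        h ^= g;
        h ^= g >> 24;
    }
    return h;
}

/* musl formulation: leaves the top bits in place and masks once at the end. */
static uint32_t demoHashMusl(const char *s0) {
    const unsigned char *s = (const unsigned char *)s0;
    uint_fast32_t h = 0;
    while (*s) {
        h = 16 * h + *s++;
        h ^= h >> 24 & 0xf0;
    }
    return (uint32_t)(h & 0xfffffff);
}

/* Assert that all three agree on every sample name. */
static void demoCheckHashVariantsAgree(void) {
    static const char *const samples[] = {
        "", "printf", "dlopen", "__libc_start_main",
        "a_rather_long_symbol_name_that_overflows_28_bits"
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t expected = demoHashOriginal(samples[i]);
        assert(demoHashAndroid(samples[i]) == expected);
        assert(demoHashMusl(samples[i]) == expected);
    }
}
```

Agreement on samples is not a proof, but it matches the reasoning: the low 28 bits evolve identically in all three, and the top 4 bits are either cleared each round or masked off at the end.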

GNU Hash

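The GNU hash table is laid out as follows:

struct GnuHash {
    uint32_t nbucket;
    uint32_t symndx;        // first symbol index covered by the table; symbols with index < symndx cannot be found through it
    uint32_t bloomSize;     // number of Bloom filter words
    uint32_t bloomShift;    // shift used to derive the second Bloom filter bit
    // The arrays below follow the header back to back in the file.
    // They are shown as comments because C permits only one flexible array member.
    // ElfW(Addr) blooms[bloomSize];  // Bloom filter for quickly rejecting absent symbols; uint32_t/uint64_t words on 32/64-bit
    // uint32_t   buckets[nbucket];
    // uint32_t   chains[];           // one entry per exported symbol, matching symbol indices >= symndx
};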

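GNU Hash has no nchain field, so how is the chain count obtained?

  • The chains array is preceded by the contiguous header, blooms and buckets, so subtract their sizes from the size of the hash section
  • 32-bit: nchain = GNUHashTable.sh_size/sizeof(uint32_t) - (4 + bloomSize + nbucket)
  • 64-bit: nchain = GNUHashTable.sh_size/sizeof(uint32_t) - (4 + bloomSize*2 + nbucket)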
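
The two formulas differ only in how many 32-bit words one Bloom filter entry occupies, so they fold into a single helper. This is a minimal sketch; the function name demoGnuHashChainCount and its is64 flag are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of chain entries in a .gnu.hash section.
 * sectionSize: sh_size of the section in bytes
 * is64:        non-zero for ELFCLASS64, where each Bloom word takes two uint32_t slots */
static uint32_t demoGnuHashChainCount(size_t sectionSize, uint32_t bloomSize,
                                      uint32_t nbucket, int is64) {
    size_t totalWords = sectionSize / sizeof(uint32_t);
    size_t wordsBeforeChains = 4 + (size_t)bloomSize * (is64 ? 2 : 1) + nbucket;
    assert(totalWords >= wordsBeforeChains); /* otherwise the section is malformed */
    return (uint32_t)(totalWords - wordsBeforeChains);
}
```

For instance, a 64-bit section of 44 bytes with bloomSize = 1 and nbucket = 2 holds 11 words, 8 of which precede the chains, leaving 3 chain entries.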

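Two points about the GNU hash lookup diagram:

  1. The dashed part of the chain table does not exist.

    Chain entries are only needed for exported symbols, yet chain indices must match symbol table indices one to one.

    The notional start of the chain table is therefore buckets + nbucket - symndx.

    In the file the arrays are still contiguous: the real chain entries come right after buckets.

  2. Each chain entry stores the hash of its symbol.

    A low bit of 0 means more symbols follow in the same bucket (further collision candidates).

    A low bit of 1 marks the last symbol of the bucket.

For details, see "ELF 通过 Sysv Hash & Gnu Hash 查找符号的实现及对比" and "ELF解析07_哈希表, 导出表" in the References.

See https://cs.android.com/android/platform/superproject/+/android14-qpr3-release:external/musl/ldso/dynlink.c

The implementation in the musl dynamic linker bundled with Android is: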

uint32_t gnu_hash(const unsigned char* str)
{
    uint32_t h = 5381; // 0x1505
    while (*str != 0)
    {
        h += (h << 5) + *str++; // h = h*33 + c, written as h + (h << 5) + c
    }
    return h;
}
 
static Sym *gnu_lookup(uint32_t h1, uint32_t *hashtab, struct dso *dso, const char *s)
{
    uint32_t nbuckets = hashtab[0];
    uint32_t *buckets = hashtab + 4 + hashtab[2]*(sizeof(size_t)/4);
    uint32_t i = buckets[h1 % nbuckets];
 
    if (!i) return 0;
 
    uint32_t *hashval = buckets + nbuckets + (i - hashtab[1]);
 
    for (h1 |= 1; ; i++) {
        uint32_t h2 = *hashval++;
        if ((h1 == (h2|1)) && (!dso->versym || dso->versym[i] >= 0)
            && !strcmp(s, dso->strings + dso->syms[i].st_name))
            return dso->syms+i;
        if (h2 & 1) break;
    }
 
    return 0;
}
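
The gnu_lookup above skips the Bloom filter check (the musl source performs it in a separate wrapper). To see the Bloom filter, buckets and chains working together, here is a minimal sketch that builds a toy table in memory and searches it. The DemoGnuHash struct, the DEMO_ constants and the use of 32-bit Bloom words are all assumptions made for illustration; a real .gnu.hash section is a flat array as described earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_NSYM = 5, DEMO_SYMNDX = 1, DEMO_NBUCKET = 2,
       DEMO_BLOOM_SIZE = 1, DEMO_BLOOM_SHIFT = 5, DEMO_BLOOM_BITS = 32 };

typedef struct {
    const char *names[DEMO_NSYM];               /* index 0 is the reserved empty symbol */
    uint32_t bloom[DEMO_BLOOM_SIZE];
    uint32_t buckets[DEMO_NBUCKET];
    uint32_t chains[DEMO_NSYM - DEMO_SYMNDX];   /* entry for symbol i is chains[i - DEMO_SYMNDX] */
} DemoGnuHash;

static uint32_t demoGnuHashOf(const char *s) {
    uint32_t h = 5381;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        h = h * 33 + *p;
    return h;
}

/* The two Bloom filter bits derived from one hash value. */
static uint32_t demoBloomMask(uint32_t h) {
    return (1u << (h % DEMO_BLOOM_BITS)) |
           (1u << ((h >> DEMO_BLOOM_SHIFT) % DEMO_BLOOM_BITS));
}

static void demoBuildGnuHash(DemoGnuHash *t) {
    /* Symbols of one bucket must be adjacent: insertion-sort by hash % nbucket. */
    for (int i = DEMO_SYMNDX + 1; i < DEMO_NSYM; i++) {
        const char *key = t->names[i];
        int j = i - 1;
        while (j >= DEMO_SYMNDX &&
               demoGnuHashOf(t->names[j]) % DEMO_NBUCKET > demoGnuHashOf(key) % DEMO_NBUCKET) {
            t->names[j + 1] = t->names[j];
            j--;
        }
        t->names[j + 1] = key;
    }
    memset(t->bloom, 0, sizeof t->bloom);
    memset(t->buckets, 0, sizeof t->buckets);
    for (uint32_t i = DEMO_SYMNDX; i < DEMO_NSYM; i++) {
        uint32_t h = demoGnuHashOf(t->names[i]);
        uint32_t b = h % DEMO_NBUCKET;
        t->bloom[(h / DEMO_BLOOM_BITS) % DEMO_BLOOM_SIZE] |= demoBloomMask(h);
        if (t->buckets[b] == 0)
            t->buckets[b] = i;                  /* first symbol of this bucket */
        int last = (i + 1 == DEMO_NSYM) ||
                   (demoGnuHashOf(t->names[i + 1]) % DEMO_NBUCKET != b);
        t->chains[i - DEMO_SYMNDX] = (h & ~1u) | (uint32_t)last; /* low bit 1 ends the bucket */
    }
}

/* Returns the symbol index, or 0 when the name is not in the table. */
static uint32_t demoLookupGnuHash(const DemoGnuHash *t, const char *name) {
    uint32_t h = demoGnuHashOf(name);
    uint32_t mask = demoBloomMask(h);
    if ((t->bloom[(h / DEMO_BLOOM_BITS) % DEMO_BLOOM_SIZE] & mask) != mask)
        return 0;                               /* Bloom filter says: definitely absent */
    uint32_t i = t->buckets[h % DEMO_NBUCKET];
    if (i == 0)
        return 0;                               /* empty bucket */
    assert(i >= DEMO_SYMNDX);
    for (;; i++) {
        uint32_t stored = t->chains[i - DEMO_SYMNDX];
        if ((stored | 1u) == (h | 1u) && strcmp(name, t->names[i]) == 0)
            return i;
        if (stored & 1u)
            return 0;                           /* reached the end of this bucket */
    }
}
```

After `demoBuildGnuHash`, the names are reordered by bucket, so a lookup result should be checked against `names[index]` rather than against the original position.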

Printing the hash tables

unsigned int elf_hash(const char* _name)
{
    const unsigned char* name=(const unsigned char*)_name;
    unsigned int h = 0, g;
    while (*name)
    {
        h = (h << 4) + *name++;
        if (g = h & 0xf0000000)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}
void printHashTable32(Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,uint8_t* pFileBuffer,const char* pSectionHeaderStringTable) {
    printf("ELF Hash Tables:\n");
    for(int i=0;i<sectionNum;i++) {
        if(pSectionHeader[i].sh_type==SHT_HASH) {
            //SHT_HASH covers both imported and exported functions; Linux toolchains drop it by default, Android keeps it
            //For SHT_HASH, index=buckets[elfhash(symbolName)%nbucket] is a symbol table index
            //index==0 means no such symbol; on a name mismatch continue with index=chains[index]
            Elf32_Shdr* pDynamicSymbolTableHeader=&pSectionHeader[pSectionHeader[i].sh_link];
            Elf32_Sym* pDynamicSymbolTable=(Elf32_Sym*)(pDynamicSymbolTableHeader->sh_offset+pFileBuffer);
            const char* pDynamicSymbolStringTable=(const char*)(pSectionHeader[pDynamicSymbolTableHeader->sh_link].sh_offset+pFileBuffer);
            uint32_t* pHashTable=(uint32_t*)(pSectionHeader[i].sh_offset+pFileBuffer);
            uint32_t nbucket=pHashTable[0],nchain=pHashTable[1];
            uint32_t* buckets=&pHashTable[2];
            uint32_t* chains=&pHashTable[2+nbucket];
            printf("\tHash Table '%s' contains %u entries\n",&pSectionHeaderStringTable[pSectionHeader[i].sh_name],nchain);
            //A literal percent sign in a printf format is written as %%
            printf("\t\tNum\t\tHash %% Nbucket\t\tIndex\t\t\tValue\t\t\tName\n");
            for(uint32_t j=0,count=0;j<nbucket;j++) {
                //Walk bucket j: the first symbol, then every symbol chained behind it with the same hash%nbucket
                for(uint32_t index=buckets[j];index;index=chains[index]) {
                    printf("\t\t%u\t\t%08x\t\t%08x\t\t%08x\t\t%s\n",++count,elf_hash(&pDynamicSymbolStringTable[pDynamicSymbolTable[index].st_name])%nbucket,index,pDynamicSymbolTable[index].st_value,&pDynamicSymbolStringTable[pDynamicSymbolTable[index].st_name]);
                }
            }
        }
        if(pSectionHeader[i].sh_type==SHT_GNU_HASH) {
            //SHT_GNU_HASH covers exported functions only and serves as the ELF export table
            Elf32_Shdr* pDynamicSymbolTableHeader=&pSectionHeader[pSectionHeader[i].sh_link];
            Elf32_Sym* pDynamicSymbolTable=(Elf32_Sym*)(pDynamicSymbolTableHeader->sh_offset+pFileBuffer);
            const char* pDynamicSymbolStringTable=(const char*)(pSectionHeader[pDynamicSymbolTableHeader->sh_link].sh_offset+pFileBuffer);
            uint32_t* pGNUHashTable=(uint32_t*)(pSectionHeader[i].sh_offset+pFileBuffer);
            uint32_t nbucket=pGNUHashTable[0];
            uint32_t symndx=pGNUHashTable[1];
            uint32_t bloomSize=pGNUHashTable[2];
            uint32_t bloomShift=pGNUHashTable[3];
            Elf32_Addr* blooms=(Elf32_Addr*)&pGNUHashTable[4];
            uint32_t* buckets=pGNUHashTable+4+bloomSize; //32-bit layout: each bloom word is one uint32_t
            uint32_t* chains=buckets+nbucket-symndx; //notional start, so chains[] can be indexed by symbol index
            //One chain entry per exported symbol; GNU hash has no nchain field, so it is computed
            uint32_t nchain=pSectionHeader[i].sh_size/sizeof(uint32_t)-(4+bloomSize+nbucket);
            printf("\tHash Table '%s' contains %u entries, nbucket: %u, symndx: %#x \n",&pSectionHeaderStringTable[pSectionHeader[i].sh_name],nchain,nbucket,symndx);
            printf("\t\tNum\t\tIndex\t\t\tValue\t\t\tName\n");
            for(uint32_t j=0,count=0;j<nbucket;j++) {
                uint32_t index=buckets[j];
                if(index==0) continue; //empty bucket: chains[0] would lie outside the real chain entries
                printf("\t\t%u\t\t%08x\t\t%08x\t\t%s\n",++count,index,pDynamicSymbolTable[index].st_value,&pDynamicSymbolStringTable[pDynamicSymbolTable[index].st_name]);
                //A low bit of 0 means the next symbol index belongs to the same bucket; 1 ends the bucket
                while((chains[index]&1)==0) {
                    index++;
                    printf("\t\t%u\t\t%08x\t\t%08x\t\t%s\n",++count,index,pDynamicSymbolTable[index].st_value,&pDynamicSymbolStringTable[pDynamicSymbolTable[index].st_name]);
                }
            }
        }
    }
}

ELF Loader

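The ELF program headers describe which segments of the file must be mapped into memory. Loading an ELF program goes through these steps:

  1. Read the ELF file into memory as the file buffer.

  2. Map the file buffer into the image buffer according to the program headers.

    Each segment must be given the correct permissions in this step.

  3. Relocate: fix up global variable addresses and external references.

    Global variable addresses are adjusted by the base address at which the ELF is loaded.

    External references require loading the needed shared libraries, walking them, and resolving each symbol to its real address.

  4. Jump to the entry point.

Build loadelf32 and loadelf64 separately to load x86 and x64 ELF files: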
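
Step 2 involves two small computations that are easy to get wrong: translating p_flags into mprotect protection bits, and widening a segment's address range to whole pages, since mprotect only accepts page-aligned start addresses. The sketch below isolates both; the names demoSegmentFlagsToProt, DemoPageRange and demoPageRangeFor are hypothetical helpers, not part of the loader's source.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Translate ELF segment flags (PF_R/PF_W/PF_X) into mprotect protection bits. */
static int demoSegmentFlagsToProt(uint32_t pFlags) {
    int prot = PROT_NONE;
    if (pFlags & PF_R) prot |= PROT_READ;
    if (pFlags & PF_W) prot |= PROT_WRITE;
    if (pFlags & PF_X) prot |= PROT_EXEC;
    return prot;
}

typedef struct {
    uintptr_t start;  /* page-aligned start address */
    size_t length;    /* multiple of the page size */
} DemoPageRange;

/* Smallest page-aligned range covering [addr, addr + size). */
static DemoPageRange demoPageRangeFor(uintptr_t addr, size_t size, size_t pageSize) {
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0); /* power of two */
    uintptr_t mask = ~(uintptr_t)(pageSize - 1);
    uintptr_t start = addr & mask;                        /* round down */
    uintptr_t end = (addr + size + pageSize - 1) & mask;  /* round up */
    DemoPageRange range = { start, (size_t)(end - start) };
    return range;
}
```

For a data segment at 0x3ee8 with a memory size of 0x130 and 4 KiB pages, the covering range starts at 0x3000 and spans 0x2000 bytes, which is what would be passed to mprotect together with the translated flags.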

gcc -m32 main.c LoadELF.h LoadELF.c -o loadelf32
gcc -m64 main.c LoadELF.h LoadELF.c -o loadelf64

main.c

// LoadELF
 #include "LoadELF.h"
 #include <stdio.h>
 int main(int argc, char *argv[]) {
     if (argc!= 2) {
         printf("Usage: %s <filepath>\n", argv[0]);
         return 1;
     }
     LoadAndExecElf(argv[1]);
     return 0;
 }

LoadELF.h

#ifndef LOADELF_H
#define LOADELF_H
#include <stddef.h>
#include <stdint.h>
uint8_t* readFileToBytes(const char *fileName,size_t* readSize);
void LoadAndExecElf(const char* filePath);
#endif //LOADELF_H

LoadELF.c

Macros are selected according to the target, x86 or x64:

#include "LoadELF.h"
#include <stdio.h>
#include <elf.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <string.h>
#include <sys/mman.h>
#include <link.h>
// Elf_Half and Elf_Word are used below but not provided by <elf.h>, so map them per class as well.
// x86_64 uses RELA relocations (explicit addend), x86 uses REL.
#ifdef __x86_64__
#define Elf_Half Elf64_Half
#define Elf_Word Elf64_Word
#define Elf_Ehdr Elf64_Ehdr
#define Elf_Phdr Elf64_Phdr
#define Elf_Shdr Elf64_Shdr
#define Elf_Addr Elf64_Addr
#define Elf_Dyn Elf64_Dyn
#define Elf_Rel Elf64_Rela
#define Elf_Sym Elf64_Sym
#define ELF_R_TYPE ELF64_R_TYPE
#define ELF_R_SYM ELF64_R_SYM
#define DT_REL_ITEM DT_RELA
#define DT_REL_SZ DT_RELASZ
#else
#define Elf_Half Elf32_Half
#define Elf_Word Elf32_Word
#define Elf_Ehdr Elf32_Ehdr
#define Elf_Phdr Elf32_Phdr
#define Elf_Shdr Elf32_Shdr
#define Elf_Addr Elf32_Addr
#define Elf_Dyn  Elf32_Dyn
#define Elf_Rel Elf32_Rel
#define Elf_Sym Elf32_Sym
#define ELF_R_TYPE ELF32_R_TYPE
#define ELF_R_SYM ELF32_R_SYM
#define DT_REL_ITEM DT_REL
#define DT_REL_SZ DT_RELSZ
#endif
uint8_t* readFileToBytes(const char *fileName,size_t* readSize) {
    FILE *file = fopen(fileName, "rb");
    if (file == NULL) {
        printf("Error opening file\n");
        return NULL; // fopen failed, so there is no stream to close
    }
    fseek(file, 0,SEEK_END);
    size_t fileSize = ftell(file);
    fseek(file, 0,SEEK_SET);
    uint8_t *buffer = (uint8_t *) malloc(fileSize);
    if (buffer == NULL) {
        printf("Error allocating memory\n");
        fclose(file);
        return NULL;
    }
    size_t bytesRead = fread(buffer, 1, fileSize, file);
    if(bytesRead!=fileSize) {
        printf("Read bytes not equal file size!\n");
        free(buffer);
        fclose(file);
        return NULL;
    }
    fclose(file);
    if(readSize)
        *readSize=bytesRead;
    return buffer;
}
// Round value up to the next multiple of alignment
uint64_t alignValue(uint64_t value, uint64_t alignment) {
    return value % alignment ? (value / alignment + 1) * alignment : value;
}
size_t getElfMemorySize(Elf_Phdr* pProgramHeader,Elf_Half segmentNum) {
    size_t size = 0;
    // Walk the program headers backwards: PT_LOAD entries are sorted by p_vaddr,
    // so the end address of the last one, rounded up to a page, is the image size
    for (int i = segmentNum - 1; i >= 0; i--) {
        if (pProgramHeader[i].p_type == PT_LOAD) {
            size = pProgramHeader[i].p_vaddr + pProgramHeader[i].p_memsz;
            break;
        }
    }
    return alignValue(size, 0x1000);
}
Elf_Word getDynamicTableValueByType(Elf_Dyn *dynamicTable, size_t dynamicTableSize, int type) {
    for (int i = 0; i < dynamicTableSize; i++) {
        if (dynamicTable[i].d_tag == type) {
            return dynamicTable[i].d_un.d_val;
        }
    }
    return 0;
}
 
const char** getNeededLibraryPath(uint8_t* pElfBuffer,Elf_Dyn *pDynamicTable, size_t dynamicTableSize,size_t* neededLibraryNum) {
    //Traverse dynamic segment find needed library
    char** buffer = NULL;
    int num=0;
    char* pImageStringTable=(char*)pElfBuffer+getDynamicTableValueByType(pDynamicTable,dynamicTableSize,DT_STRTAB);
    for (int i = 0; i < dynamicTableSize; i++) {
        if (pDynamicTable[i].d_tag == DT_NEEDED) {
            num++;
            buffer=(char**)realloc(buffer,num*sizeof(char*));
            if(buffer==NULL) {
                printf("Error reallocating memory\n");
                exit(-1);
            }
            buffer[num-1]=pImageStringTable+ pDynamicTable[i].d_un.d_val;
        }
    }
    *neededLibraryNum=num;
    return (const char**)buffer;
}
 
Elf_Addr getSymbolAddress(const char** neededLibrary, size_t neededLibraryNum, const char *symbolName) {
    //Load needed dynamic libraries,and traverse libraries, get symbol address
    for (int i = 0; i < neededLibraryNum; i++) {
        void *handle = dlopen(neededLibrary[i],RTLD_NOW);
        if (handle == NULL) {
            printf("Error opening library %s\n", dlerror());
            exit(1);
        }
        void *address = dlsym(handle, symbolName);
        if (address == NULL) {
            continue;
        }
        return (Elf_Addr)address;
    }
    printf("Can't find address of symbol: %s\n",symbolName);
    return 0;
}
 
void mapSegmentToMemory(uint8_t* pImageBuffer,uint8_t* pFileBuffer,Elf_Phdr* pProgramHeader,Elf_Half segmentNum) {
    for (int i = 0; i < segmentNum; i++) {
        if (pProgramHeader[i].p_type == PT_LOAD) {
            uint8_t *pImageAddr = pImageBuffer + pProgramHeader[i].p_vaddr;// map by in-memory address and size
            size_t memorySize = pProgramHeader[i].p_memsz;
            Elf_Word segmentFlags = pProgramHeader[i].p_flags;
            int protection = 0;
            memcpy(pImageAddr, pFileBuffer + pProgramHeader[i].p_offset, pProgramHeader[i].p_filesz);
            if (segmentFlags & PF_R) {
                protection |= PROT_READ;
            }
            if (segmentFlags & PF_W) {
                protection |= PROT_WRITE;
            }
            if (segmentFlags & PF_X) {
                protection |= PROT_EXEC;
            }
            // mprotect requires a page-aligned address: round the start down and extend the length to match
            size_t pageOffset = (size_t)((uintptr_t)pImageAddr & 0xfff);
            if (mprotect(pImageAddr - pageOffset, alignValue(memorySize + pageOffset, 0x1000), protection) != 0) {
                perror("mprotect");
            }
        }
    }
}
void fixRelocationItem(Elf_Rel* pRelocationTable,Elf_Word relocationItemNum,uint8_t* pImageBuffer,const char* pDynamicStringTable,Elf_Sym* pDynamicSymbolTable,const char** neededLibrary,size_t neededLibraryNum) {
    Elf_Addr* fixItem=NULL;// the patched slot is 4 or 8 bytes wide depending on the ELF class
    Elf_Addr baseAddr=(Elf_Addr)pImageBuffer;
    for(Elf_Word i=0;i<relocationItemNum;i++) {
        // The R_386_* constants are also used on x64: the matching R_X86_64_* types have the same values (8, 6, 7)
        switch (ELF_R_TYPE(pRelocationTable[i].r_info)) {
            //Relocate base address
            case R_386_RELATIVE:
                fixItem=(Elf_Addr*)(pImageBuffer+pRelocationTable[i].r_offset);
#ifdef __x86_64__
                *fixItem=baseAddr+pRelocationTable[i].r_addend;// RELA: the addend lives in the relocation entry
#else
                *fixItem+=baseAddr;// REL: the addend is already stored in the slot
#endif
                break;
            //Fix GOT and PLT
            case R_386_GLOB_DAT:
            case R_386_JMP_SLOT: {
                // Get symbol name and real address; st_name is an offset into the string table
                const char* symbolName=&pDynamicStringTable[ pDynamicSymbolTable[ELF_R_SYM(pRelocationTable[i].r_info)].st_name ];
                fixItem=(Elf_Addr*)(pImageBuffer+pRelocationTable[i].r_offset);
                Elf_Addr symbolAddr=getSymbolAddress(neededLibrary,neededLibraryNum,symbolName);
                *fixItem=symbolAddr;
                break;
            }
            default:
                break;
        }
    }
}
void LoadAndExecElf(const char* filePath) {
    //1. Read file to memory buffer
    size_t readFileSize=0;
    uint8_t* pFileBuffer=readFileToBytes(filePath,&readFileSize);
    if(pFileBuffer==NULL) {
        printf("Error reading file\n");
        return;
    }
    Elf_Ehdr* pElfHeader=(Elf_Ehdr*)pFileBuffer;
    Elf_Phdr *pProgramHeader=(Elf_Phdr*)(pFileBuffer+pElfHeader->e_phoff);
    Elf_Half segmentNum=pElfHeader->e_phnum;
    uint8_t* pImageBuffer=NULL;
 
    //2. Mapping file buffer to image buffer
    size_t elfMemorySize = getElfMemorySize(pProgramHeader,segmentNum);
    if (elfMemorySize == 0) {
        printf("ELF memory size is 0!\n");
        return;
    }
    // Allocate page-aligned memory for the image; posix_memalign reports failure through its return value
    if (posix_memalign((void**)&pImageBuffer, 0x1000, elfMemorySize) != 0 || pImageBuffer == NULL) {
        printf("Error allocating memory\n");
        return;
    }
    memset(pImageBuffer,0  ,elfMemorySize);
    // Mapping segments to memory and set protection
    mapSegmentToMemory(pImageBuffer,pFileBuffer,pProgramHeader,segmentNum);
 
    //3. Relocate
    Elf_Phdr *pDynamicTableHeader=NULL;
    Elf_Dyn *pDynamicTable=NULL;
    for (int i = 0; i < segmentNum; i++) {
        if (pProgramHeader[i].p_type == PT_DYNAMIC) {
            pDynamicTableHeader = &pProgramHeader[i];
            break;
        }
    }
    if (pDynamicTableHeader == NULL) {
        // Statically linked or malformed file: nothing to relocate with this loader
        printf("No PT_DYNAMIC segment found\n");
        return;
    }
    pDynamicTable = (Elf_Dyn *) (pImageBuffer + pDynamicTableHeader->p_vaddr);
    size_t dynamicItemNum = pDynamicTableHeader->p_filesz / sizeof(Elf_Dyn);
    Elf_Rel *pRelocationTable =NULL;
    size_t relocationItemNum=0;
    Elf_Rel *pJmpRelocationTable = (Elf_Rel *) (pImageBuffer + getDynamicTableValueByType(pDynamicTable, dynamicItemNum,DT_JMPREL));
    size_t jmpRelocationItemNum=0;
    Elf_Sym *pDynamicSymbolTable = NULL;
    char *pDynamicStringTable = NULL;
    for (int i = 0; i <dynamicItemNum; i++) {
        switch (pDynamicTable[i].d_tag) {
            case DT_REL_ITEM: pRelocationTable=(Elf_Rel*)(pImageBuffer+pDynamicTable[i].d_un.d_val); break;
            case DT_JMPREL: pJmpRelocationTable=(Elf_Rel*)(pImageBuffer+pDynamicTable[i].d_un.d_val); break;
            case DT_REL_SZ: relocationItemNum=pDynamicTable[i].d_un.d_val/sizeof(Elf_Rel); break;
            case DT_PLTRELSZ: jmpRelocationItemNum=pDynamicTable[i].d_un.d_val/sizeof(Elf_Rel); break;
            case DT_SYMTAB:pDynamicSymbolTable=(Elf_Sym*)(pImageBuffer+pDynamicTable[i].d_un.d_val);break;
            case DT_STRTAB:pDynamicStringTable=(char*)(pImageBuffer+pDynamicTable[i].d_un.d_val);break;
        }
    }
 
    size_t neededLibraryNum=0;
    const char** neededLibrary=getNeededLibraryPath(pImageBuffer,pDynamicTable,dynamicItemNum,&neededLibraryNum);
    fixRelocationItem(pRelocationTable,relocationItemNum,pImageBuffer,pDynamicStringTable,pDynamicSymbolTable,neededLibrary,neededLibraryNum);
    fixRelocationItem(pJmpRelocationTable,jmpRelocationItemNum,pImageBuffer,pDynamicStringTable,pDynamicSymbolTable,neededLibrary,neededLibraryNum);
 
    //4. Jump to entry point
    typedef void (*VoidFunctionPtr)();
    VoidFunctionPtr entry=(VoidFunctionPtr)(pImageBuffer+pElfHeader->e_entry);
    printf("Load ELF success! Jump to entry point: %p\n",(void*)(pImageBuffer+pElfHeader->e_entry));
    entry();
    printf("Come back\n");
 
}
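
The relocation step (step 3 of the loading flow) can also be exercised on its own with a toy image, without loading any file. The sketch below is a simplified 32-bit REL model: DemoRel, the DEMO_ constants and applyDemoRelocs are hypothetical, the load base is a plain number rather than a real pointer, and symbol resolution is replaced by a lookup in a caller-supplied array instead of dlopen/dlsym.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified 32-bit REL entry: low byte of r_info is the type, the rest the symbol index. */
typedef struct {
    uint32_t r_offset;
    uint32_t r_info;
} DemoRel;

enum { DEMO_R_GLOB_DAT = 6, DEMO_R_JMP_SLOT = 7, DEMO_R_RELATIVE = 8 }; /* R_386_* values */
#define DEMO_R_TYPE(info) ((info) & 0xffu)
#define DEMO_R_SYM(info)  ((info) >> 8)

/* Apply relocations to image as if it were loaded at loadBase.
 * symbolAddrs[n] stands in for the resolved address of dynamic symbol n. */
static void applyDemoRelocs(uint8_t *image, size_t imageSize, uint32_t loadBase,
                            const DemoRel *rels, size_t count,
                            const uint32_t *symbolAddrs) {
    for (size_t i = 0; i < count; i++) {
        assert((size_t)rels[i].r_offset + sizeof(uint32_t) <= imageSize);
        uint32_t slot;
        memcpy(&slot, image + rels[i].r_offset, sizeof slot); /* unaligned-safe read */
        switch (DEMO_R_TYPE(rels[i].r_info)) {
        case DEMO_R_RELATIVE:   /* slot holds a link-time address: add the base */
            slot += loadBase;
            break;
        case DEMO_R_GLOB_DAT:   /* GOT/PLT slot: store the symbol's real address */
        case DEMO_R_JMP_SLOT:
            slot = symbolAddrs[DEMO_R_SYM(rels[i].r_info)];
            break;
        default:                /* other types are ignored in this sketch */
            continue;
        }
        memcpy(image + rels[i].r_offset, &slot, sizeof slot);
    }
}
```

With a slot that initially holds 0x2000 and a base of 0x56550000, a RELATIVE entry turns it into 0x56552000, while a GLOB_DAT entry for symbol 1 simply overwrites its slot with symbolAddrs[1].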

The result looks like this:

References

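ELF文件格式 (The ELF File Format)

ELF文件格式解析 (ELF File Format Analysis)

《程序员的自我修养》 (The Self-Cultivation of Programmers)

ELF加载器的原理与实现 (Principles and Implementation of an ELF Loader)

【内核】ELF 文件执行流程 ([Kernel] How an ELF File Is Executed)

说一下Linux可执行文件的格式,ELF格式 (On the Linux Executable Format, ELF)

ELF解析07_哈希表, 导出表 (ELF Parsing 07: Hash Table, Export Table)

ELF 通过 Sysv Hash & Gnu Hash 查找符号的实现及对比 (Symbol Lookup via SysV Hash and GNU Hash in ELF: Implementation and Comparison)

[Translation] GNU Hash ELF Sections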



Last edited 7 hours ago by 东方玻璃. Reason: attachments uploaded