[Original] Android Linker Explained (Part 2)
Posted: 2021-10-21 14:15

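Picking up from the previous post, Linker源码详解(一) (Linker Source Code Explained, Part 1), this post continues with the Linker's linking phase. To really understand how Unidbg works, we need to get a lot of details straight. A framework that emulates binary execution has plenty of drawbacks, but it is still a promising direction for binary analysis.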

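In the previous post we covered the Linker's loading phase, where the So file is mapped into memory as directed by its PT_LOAD segments. In this post we look at what happens once loading is done.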

http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp#702

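In the previous post we stepped into elf_reader.Load() and read through the loading code. Once loading finishes, the soinfo structure is filled in (the So file's header information and the results of loading) and inserted into the linked list. Now we go back up to the calling function and keep reading.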

http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp#751

As the function above shows, once load_library has been called, soinfo_link_image is called next. That function is the main entry point for today's topic: linking.

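The function below is long. I have stripped out the unrelated code, so start by reading through it once with the help of the comments to see what it does.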

http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp#1303

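Long as it is, what the function does is simple. Let's recap: it located the PT_DYNAMIC segment, walked the Dynamic entries and stored what they describe in soinfo, loaded every DT_NEEDED dependency, ran relocation over the PLT relocation table and the ordinary relocation table, and finally set FLAG_LINKED.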
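The first step of that recap, walking the Dynamic entries until DT_NULL, can be sketched on its own. This is a minimal sketch, not the linker's code: `Dyn`, `DynSummary`, and `summarize_dynamic` are hypothetical names made up for illustration, and only a handful of tags are handled. The tag numbers themselves come from the ELF specification.

```cpp
#include <cassert>
#include <cstdint>

using std::int32_t;
using std::uint32_t;

// Tag values as defined by the ELF specification.
constexpr int32_t kDtNull = 0;
constexpr int32_t kDtNeeded = 1;
constexpr int32_t kDtInit = 12;
constexpr int32_t kDtInitArray = 25;
constexpr int32_t kDtInitArraySz = 27;

// Hypothetical stand-in for Elf32_Dyn: a tag plus a value/pointer word.
struct Dyn {
    int32_t tag;
    uint32_t val;
};

// Hypothetical stand-in for the few soinfo fields this sketch fills in.
struct DynSummary {
    uint32_t needed_count = 0;
    uint32_t init_func = 0;         // absolute address of DT_INIT
    uint32_t init_array = 0;        // absolute address of the init array
    uint32_t init_array_count = 0;  // number of 4-byte entries
};

// Walks the table until DT_NULL and records what each known tag describes.
// Addresses stored in the file are relative, so load_bias is added to them.
DynSummary summarize_dynamic(const Dyn* dynamic, uint32_t load_bias) {
    assert(dynamic != nullptr);
    DynSummary out;
    for (const Dyn* d = dynamic; d->tag != kDtNull; ++d) {
        switch (d->tag) {
        case kDtNeeded:
            ++out.needed_count;  // only counted here; loaded in a second pass
            break;
        case kDtInit:
            out.init_func = load_bias + d->val;
            break;
        case kDtInitArray:
            out.init_array = load_bias + d->val;
            break;
        case kDtInitArraySz:
            // The size is in bytes; convert it to an entry count.
            out.init_array_count =
                static_cast<uint32_t>(d->val / sizeof(uint32_t));
            break;
        default:
            break;  // every other tag is ignored in this sketch
        }
    }
    return out;
}
```

Feeding it a small hand-built table terminated by a `{kDtNull, 0}` entry is enough to try it out; the two-pass handling of DT_NEEDED (count first, load later) is the same shape the real function uses so it can size its `needed` array up front.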

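Next we look at its soinfo_relocate function. It is called twice, only with different arguments: once with the ordinary relocation table and once with the PLT relocation table.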

http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp#848

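The function above is what handles relocation. The relocation tables pulled out of the Dynamic segment are passed through it, and it fixes up the So's own address references so that the code can run correctly. For a 32-bit So there are actually not many relocation types to deal with: only four need handling, and two of those are handled in exactly the same way.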
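To make the four cases concrete, here is a minimal sketch of applying REL-style relocations to an in-memory image. It is not the linker's soinfo_relocate: `Rel`, `apply_rel`, and the `resolve` callback are hypothetical, the image is a plain vector of words rather than mapped memory, and my assumption is that the four types the text means are R_ARM_JUMP_SLOT, R_ARM_GLOB_DAT, R_ARM_ABS32, and R_ARM_RELATIVE (the first two being the pair handled identically). The type numbers come from the ARM ELF ABI.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using std::uint32_t;

// Relocation type numbers from the ARM ELF ABI.
constexpr uint32_t kRArmAbs32 = 2;
constexpr uint32_t kRArmGlobDat = 21;
constexpr uint32_t kRArmJumpSlot = 22;
constexpr uint32_t kRArmRelative = 23;

// Hypothetical stand-in for Elf32_Rel.
struct Rel {
    uint32_t r_offset;  // byte offset of the word to patch
    uint32_t r_info;    // (symbol index << 8) | relocation type
};

// image:     the loaded segment, viewed as 32-bit words (simulated memory)
// load_bias: the address the image is pretended to be loaded at
// resolve:   hypothetical symbol lookup, returns 0 when the symbol is missing
// Returns true on success. Note the real call sites shown in this post treat
// a non-zero return from soinfo_relocate as failure instead.
bool apply_rel(std::vector<uint32_t>& image, uint32_t load_bias,
               const std::vector<Rel>& rels,
               const std::function<uint32_t(uint32_t)>& resolve) {
    for (const Rel& rel : rels) {
        const uint32_t type = rel.r_info & 0xffu;
        const uint32_t sym = rel.r_info >> 8;
        assert(rel.r_offset % 4 == 0 && rel.r_offset / 4 < image.size());
        uint32_t& word = image[rel.r_offset / 4];

        uint32_t sym_addr = 0;
        if (sym != 0) {
            sym_addr = resolve(sym);
            if (sym_addr == 0) return false;  // weak symbols are not modelled
        }

        switch (type) {
        case kRArmJumpSlot:
        case kRArmGlobDat:
            word = sym_addr;  // overwrite the GOT slot with the target
            break;
        case kRArmAbs32:
            word += sym_addr;  // symbol address plus the addend already stored
            break;
        case kRArmRelative:
            if (sym != 0) return false;  // RELATIVE must not name a symbol
            word += load_bias;           // rebase an internal address
            break;
        default:
            return false;  // anything else is unsupported in this sketch
        }
    }
    return true;
}
```

With REL (as opposed to RELA) the addend lives in the word being patched, which is why ABS32 and RELATIVE use `+=` while the two GOT-style types simply overwrite.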

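The So is now fully relocated and ready to run. Next, let's see what happens to the various initialization functions we pulled out of the Dynamic segment. Remember those?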

We go back to the do_dlopen function.

http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp#823

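By this point find_library has finished: the So has been loaded and linked. As its last step, do_dlopen calls the soinfo's CallConstructors function, so let's see what that function does.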

http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp#1192

http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp#1172
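The ordering is the part worth remembering, so here is a minimal sketch of it. `Module` and `call_constructors` are hypothetical stand-ins rather than the linker's soinfo and CallConstructors, and running the dependencies' constructors first reflects my reading of the 4.4.4 source; what this post itself states is that DT_INIT runs before DT_INIT_ARRAY.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using InitFn = void (*)();

// Hypothetical, trimmed-down stand-in for soinfo.
struct Module {
    std::vector<Module*> needed;         // DT_NEEDED dependencies
    InitFn init_func = nullptr;          // DT_INIT
    InitFn* init_array = nullptr;        // DT_INIT_ARRAY
    std::size_t init_array_count = 0;    // DT_INIT_ARRAYSZ / pointer size
    bool constructors_called = false;
};

void call_constructors(Module& m) {
    if (m.constructors_called) return;
    // Mark first so that circular dependencies cannot recurse forever.
    m.constructors_called = true;

    // Dependencies are initialized before the library that needs them.
    for (Module* dep : m.needed) {
        assert(dep != nullptr);
        call_constructors(*dep);
    }

    // DT_INIT runs first...
    if (m.init_func != nullptr) m.init_func();

    // ...and only then is the init array read, one slot at a time.
    for (std::size_t i = 0; i < m.init_array_count; ++i) {
        if (m.init_array[i] != nullptr) m.init_array[i]();
    }
}
```

Note that the array slots are read inside the loop, after DT_INIT has already returned. That detail matters for libraries whose DT_INIT rewrites the array.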

That wraps up our analysis of the Linker.

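To finish, here is a bug in one of Unidbg's details. It has since been fixed, so treat this as an aside. Take a look at the following piece of Unidbg code that loads a So.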

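If we read the Linker source carefully, we find that Unidbg handled this incorrectly. As we saw at the end of this post, when the initialization functions are called, the DT_INIT function runs first and DT_INIT_ARRAY is processed afterwards, whereas Unidbg added them all to one List and then called them together. That causes a problem: in some packed So files, DT_INIT_ARRAY only gets its values (is repaired) after the DT_INIT function has run, so with Unidbg's approach INIT_ARRAY is not executed at all, or only part of it is. The fix is simple and is noted in the code comments: just make sure DT_INIT runs first.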
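The difference between the two strategies is easy to reproduce in a toy model. The sketch below is a C++ simulation, not Unidbg's Java code: `unpack_stub`, `g_init_array`, and both `run_*` functions are hypothetical. The "packed library" has an init array that is empty until its DT_INIT stub fills it in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using InitFn = void (*)();

// --- Simulated packed library (all names hypothetical) ---
int g_array_calls = 0;
void real_constructor() { ++g_array_calls; }

// On disk the init array is empty; the packer's DT_INIT stub repairs it.
InitFn g_init_array[2] = {nullptr, nullptr};
void unpack_stub() {
    g_init_array[0] = real_constructor;
    g_init_array[1] = real_constructor;
}

void reset_packed_lib() {
    g_init_array[0] = nullptr;
    g_init_array[1] = nullptr;
    g_array_calls = 0;
}

// Strategy A (the buggy one described above): snapshot every initializer
// into one list up front, then call them. Returns how many were called.
std::size_t run_collect_then_call(InitFn init, InitFn* array,
                                  std::size_t count) {
    assert(array != nullptr || count == 0);
    std::vector<InitFn> pending;
    if (init != nullptr) pending.push_back(init);
    for (std::size_t i = 0; i < count; ++i) {
        if (array[i] != nullptr) pending.push_back(array[i]);
    }
    for (InitFn fn : pending) fn();
    return pending.size();
}

// Strategy B (what the Linker does): run DT_INIT, then read the array.
std::size_t run_init_first(InitFn init, InitFn* array, std::size_t count) {
    assert(array != nullptr || count == 0);
    std::size_t called = 0;
    if (init != nullptr) {
        init();
        ++called;
    }
    for (std::size_t i = 0; i < count; ++i) {
        if (array[i] != nullptr) {
            array[i]();
            ++called;
        }
    }
    return called;
}
```

With strategy A the snapshot is taken while the array is still empty, so only the stub runs and the real constructors are silently skipped. With strategy B the stub runs first and both repaired entries are then picked up.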

That is all for this walkthrough of the Linker. If you found it useful, feel free to add me on WeChat (VX) so we can learn together: roy5ue

 
static soinfo* load_library(const char* name) {
    //...
    ElfReader elf_reader(name, fd);
    if (!elf_reader.Load()) {
        return NULL;
    }
 
    const char* bname = strrchr(name, '/');
    soinfo* si = soinfo_alloc(bname ? bname + 1 : name);
    if (si == NULL) {
        return NULL;
    }
    si->base = elf_reader.load_start();
    si->size = elf_reader.load_size();
    si->load_bias = elf_reader.load_bias();
    si->flags = 0;
    si->entry = 0;
    si->dynamic = NULL;
    si->phnum = elf_reader.phdr_count();
    si->phdr = elf_reader.loaded_phdr();
    return si;
}
static soinfo* find_library_internal(const char* name) {
  //...
  si = load_library(name);
  if (si == NULL) {
    return NULL;
  }
 
  // At this point we know that whatever is loaded @ base is a valid ELF
  // shared library whose segments are properly mapped in.
  TRACE("[ init_library base=0x%08x sz=0x%08x name='%s' ]",
        si->base, si->size, si->name);
 
  if (!soinfo_link_image(si)) {
    munmap(reinterpret_cast<void*>(si->base), si->size);
    soinfo_free(si);
    return NULL;
  }
 
  return si;
}
 
static bool soinfo_link_image(soinfo* si) {
    // Grab the load bias, the program header table pointer, and the program header count
    Elf32_Addr base = si->load_bias;
    const Elf32_Phdr *phdr = si->phdr;
    int phnum = si->phnum;
 
    //...
 
    size_t dynamic_count;
    Elf32_Word dynamic_flags;
    // This helper is simple: it walks the program header table looking for the PT_DYNAMIC segment
    phdr_table_get_dynamic_section(phdr, phnum, base, &si->dynamic,
                                   &dynamic_count, &dynamic_flags);
    if (si->dynamic == NULL) {
        if (!relocating_linker) {
            DL_ERR("missing PT_DYNAMIC in \"%s\"", si->name);
        }
        return false;
    }
 
#ifdef ANDROID_ARM_LINKER
    // Exception-handling (exidx) related; have a look if you are interested
    (void) phdr_table_get_arm_exidx(phdr, phnum, base,
                                    &si->ARM_exidx, &si->ARM_exidx_count);
#endif
    // Above we got the Dynamic segment's address and entry count; now we walk its entries
    uint32_t needed_count = 0;
    // DT_NULL marks the end of the table
    for (Elf32_Dyn* d = si->dynamic; d->d_tag != DT_NULL; ++d) {
        DEBUG("d = %p, d[0](tag) = 0x%08x d[1](val) = 0x%08x", d, d->d_tag, d->d_un.d_val);
        switch(d->d_tag){
        case DT_HASH:
            // Hash table
            si->nbucket = ((unsigned *) (base + d->d_un.d_ptr))[0];
            si->nchain = ((unsigned *) (base + d->d_un.d_ptr))[1];
            si->bucket = (unsigned *) (base + d->d_un.d_ptr + 8);
            si->chain = (unsigned *) (base + d->d_un.d_ptr + 8 + si->nbucket * 4);
            break;
        case DT_STRTAB:
            // String table
            si->strtab = (const char *) (base + d->d_un.d_ptr);
            break;
        case DT_SYMTAB:
            // Symbol table
            si->symtab = (Elf32_Sym *) (base + d->d_un.d_ptr);
            break;
        case DT_PLTREL:
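            // Nothing is stored for this tag; it only checks that the PLT relocations use the DT_REL format and rejects anything else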
            if (d->d_un.d_val != DT_REL) {
                DL_ERR("unsupported DT_RELA in \"%s\"", si->name);
                return false;
            }
            break;
        case DT_JMPREL:
            // PLT relocation table
            si->plt_rel = (Elf32_Rel*) (base + d->d_un.d_ptr);
            break;
        case DT_PLTRELSZ:
            // PLT relocation table size, converted to an entry count
            si->plt_rel_count = d->d_un.d_val / sizeof(Elf32_Rel);
            break;
        case DT_REL:
            // Relocation table
            si->rel = (Elf32_Rel*) (base + d->d_un.d_ptr);
            break;
        case DT_RELSZ:
            // Relocation table size, converted to an entry count
            si->rel_count = d->d_un.d_val / sizeof(Elf32_Rel);
            break;
        case DT_PLTGOT:
            // GOT (global offset table), tied to PLT lazy binding; only its address is recorded here, and Unidbg does not handle this entry either
            si->plt_got = (unsigned *)(base + d->d_un.d_ptr);
            break;
        case DT_DEBUG:
            // Debugging related; Unidbg does not handle it, so it can be ignored
            if ((dynamic_flags & PF_W) != 0) {
                d->d_un.d_val = (int) &_r_debug;
            }
            break;
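        case DT_RELA:
            // Unidbg handles RELA and REL tables the same way and uses whichever one is present; a RELA entry just carries an extra addend constant. This linker rejects RELA outright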
            DL_ERR("unsupported DT_RELA in \"%s\"", si->name);
            return false;
        case DT_INIT:
            // Initialization function
            si->init_func = reinterpret_cast<linker_function_t>(base + d->d_un.d_ptr);
            DEBUG("%s constructors (DT_INIT) found at %p", si->name, si->init_func);
            break;
        case DT_FINI:
            // Finalization (destructor) function
            si->fini_func = reinterpret_cast<linker_function_t>(base + d->d_un.d_ptr);
            DEBUG("%s destructors (DT_FINI) found at %p", si->name, si->fini_func);
            break;
        case DT_INIT_ARRAY:
            // init.array: the list of initialization functions; we will see the order they are called in later
            si->init_array = reinterpret_cast<linker_function_t*>(base + d->d_un.d_ptr);
            DEBUG("%s constructors (DT_INIT_ARRAY) found at %p", si->name, si->init_array);
            break;
        case DT_INIT_ARRAYSZ:
            // init.array size, converted to an entry count
            si->init_array_count = ((unsigned)d->d_un.d_val) / sizeof(Elf32_Addr);
            break;
        case DT_FINI_ARRAY:
            // List of finalization (destructor) functions
            si->fini_array = reinterpret_cast<linker_function_t*>(base + d->d_un.d_ptr);
            DEBUG("%s destructors (DT_FINI_ARRAY) found at %p", si->name, si->fini_array);
            break;
        case DT_FINI_ARRAYSZ:
            // fini.array size, converted to an entry count
            si->fini_array_count = ((unsigned)d->d_un.d_val) / sizeof(Elf32_Addr);
            break;
        case DT_PREINIT_ARRAY:
            // Also initialization functions, but unlike init.array this mostly appears only in executables; for a So I chose to ignore it
            si->preinit_array = reinterpret_cast<linker_function_t*>(base + d->d_un.d_ptr);
            DEBUG("%s constructors (DT_PREINIT_ARRAY) found at %p", si->name, si->preinit_array);
            break;
        case DT_PREINIT_ARRAYSZ:
            // preinit list size, converted to an entry count
            si->preinit_array_count = ((unsigned)d->d_un.d_val) / sizeof(Elf32_Addr);
            break;
        case DT_TEXTREL:
            si->has_text_relocations = true;
            break;
        case DT_SYMBOLIC:
            si->has_DT_SYMBOLIC = true;
            break;
        case DT_NEEDED:
            // A dependency of the current So
            ++needed_count;
            break;
#if defined DT_FLAGS
        // TODO: why is DT_FLAGS not defined?
        case DT_FLAGS:
            if (d->d_un.d_val & DF_TEXTREL) {
                si->has_text_relocations = true;
            }
            if (d->d_un.d_val & DF_SYMBOLIC) {
                si->has_DT_SYMBOLIC = true;
            }
            break;
#endif
        }
    }
 
    //... Sanity checks.
 
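    // The Dynamic segment is now fully parsed, and what it describes has been stored in soinfo, ready to use later
    // Allocate space for the dependencies' soinfo pointers and get ready to process them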
    soinfo** needed = (soinfo**) alloca((1 + needed_count) * sizeof(soinfo*));
    soinfo** pneeded = needed;
    // Walk the Dynamic segment a second time
    for (Elf32_Dyn* d = si->dynamic; d->d_tag != DT_NULL; ++d) {
        if (d->d_tag == DT_NEEDED) {
            // Look for DT_NEEDED entries
            const char* library_name = si->strtab + d->d_un.d_val;
            DEBUG("%s needs %s", si->name, library_name);
            // Resolve the dependency along the same path used to load any So: return it if already loaded, otherwise find and load it
            soinfo* lsi = find_library(library_name);
            if (lsi == NULL) {
                strlcpy(tmp_err_buf, linker_get_error_buffer(), sizeof(tmp_err_buf));
                DL_ERR("could not load library \"%s\" needed by \"%s\"; caused by %s",
                       library_name, si->name, tmp_err_buf);
                return false;
            }
            *pneeded++ = lsi;
        }
    }
    *pneeded = NULL;
    // All dependencies are now loaded
 
    // Process relocations
    if (si->plt_rel != NULL) {
        DEBUG("[ relocating %s plt ]", si->name );
        if (soinfo_relocate(si, si->plt_rel, si->plt_rel_count, needed)) {
            return false;
        }
    }
    if (si->rel != NULL) {
        DEBUG("[ relocating %s ]", si->name );
        if (soinfo_relocate(si, si->rel, si->rel_count, needed)) {
            return false;
        }
    }
    // Set the FLAG_LINKED flag on soinfo to mark it as linked
    si->flags |= FLAG_LINKED;
    DEBUG("[ finished linking %s ]", si->name);
 
    //...
    return true;
}
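
The DT_HASH case in the listing above stores four values without showing how they are used. As a rough illustration, here is a minimal sketch of a symbol lookup over that layout (two header words, then the buckets, then the chains). `HashView`, `parse_hash`, and `lookup` are hypothetical names, and the `names` array stands in for `strtab + symtab[i].st_name`; the hash function is the classic SysV ELF hash.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using std::uint32_t;

// Classic SysV ELF hash of a symbol name.
uint32_t elf_hash(const char* name) {
    assert(name != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(name);
    uint32_t h = 0;
    while (*p != 0) {
        h = (h << 4) + *p++;
        const uint32_t g = h & 0xf0000000u;
        h ^= g;        // clear the top four bits...
        h ^= g >> 24;  // ...and fold them back into the low byte
    }
    return h;
}

// Hypothetical view of the fields the DT_HASH case fills in.
struct HashView {
    uint32_t nbucket;
    uint32_t nchain;
    const uint32_t* bucket;
    const uint32_t* chain;
};

// Same arithmetic as the DT_HASH case, expressed in words instead of bytes:
// bucket starts 8 bytes in, chain starts nbucket * 4 bytes after that.
HashView parse_hash(const uint32_t* table) {
    assert(table != nullptr);
    HashView v;
    v.nbucket = table[0];
    v.nchain = table[1];
    v.bucket = table + 2;
    v.chain = table + 2 + v.nbucket;
    return v;
}

// Returns the symbol index whose name matches, or 0 (the undefined symbol).
uint32_t lookup(const HashView& v, const char* const* names,
                const char* wanted) {
    assert(v.nbucket != 0);
    for (uint32_t n = v.bucket[elf_hash(wanted) % v.nbucket]; n != 0;
         n = v.chain[n]) {
        if (std::strcmp(names[n], wanted) == 0) return n;
    }
    return 0;
}
```

Index 0 doubles as the end-of-chain marker, which is why the first symbol table entry is always the reserved undefined symbol.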
