首页
社区
课程
招聘
[原创] 细说So动态库的加载流程
发表于: 2019-11-17 16:52 20265

[原创] 细说So动态库的加载流程

2019-11-17 16:52
20265

博客:www.wireghost.cn

dlopen用来打开一个动态链接库,并将其装入内存。它的定义在Android源码中的路径为/bionic/linker/dlfcn.cpp,执行流程如下:

其核心代码在do_dlopen中实现,根据传入的路径或文件名去查找一个动态库,并执行该动态链接库的初始化代码。

再来看find_library这个方法,它会先在solist(已经加载的动态链接库链表)里进行查找,如果找到了就返回对应的soinfo结构体指针。否则,就调用load_library进行加载。然后,调用soinfo_link_image方法,根据soinfo结构体解析相应的Section。

load_library调用open_library打开一个动态链接库,返回一个句柄,将其与共享库所在的路径作为参数,对ElfReader进行初始化。


ElfReader作用域中的Load函数,会执行以下操作:

bool ElfReader::ReadElfHeader() {
ssize_t rc = TEMP_FAILURERETRY(read(fd, &header, sizeof(header)));
if (rc < 0) {
DLERR("can't read file \"%s\": %s", name, strerror(errno));
return false;
}
if (rc != sizeof(header_)) {
DLERR("\"%s\" is too small to be an ELF executable", name);
return false;
}
return true;
}

<center><font color=#006666 size=3 face="黑体">读ELF程序头,并映射到内存</font></center>

<center><font color=#006666 size=3 face="黑体">调用mmap申请一块足够大的内存空间,为后面进行映射Load段的映射做准备</font></center>

<center><font color=#006666 size=3 face="黑体">将类型为Load的Segment映射到内存</font></center>
接下来,soinfo_alloc方法会为该库在共享库链表中分配一个soinfo节点,并初始化其数据结构。

再回过头来看下soinfo_link_image这个方法,它主要实现了动态链接库中section信息的解析:

完成rel节的重定位;

最后,CallConstructors函数会根据动态节区中的信息,获取该共享库所依赖的所有so文件名,并在已加载的动态链接库链表中进行查找、递归调用它们的初始化函数。当运行所需的依赖库都初始化完成后,再执行init_func、init_array方法初始化该动态库。。

Java层通过System.load或System.loadLibrary来加载一个so文件,它的定义在Android源码中的路径为/libcore/luni/src/main/java/java/lang/System.java,执行流程如下:

接下来,让我们具体看下System.loadLibrary这个方法的实现。可以发现它实际是先通过VMStack.getCallingClassLoader()获取到ClassLoader,然后调用运行时的loadLibrary。

以上代码块的主要功能为:

然而,这里其实又牵扯出了几个问题:首先,可用的library path都是哪些?这实际上也决定了我们的so文件放在哪些目录下,才可以真正的被load起来。其次,在native层的nativeLoad又是如何实现加载的?下面会对这两个问题,逐一分析介绍。。

先来看看当传入的ClassLoader为空的情况(执行System.loadLibrary时并不会发生),那么就需要关注下mLibPaths的赋值,相应代码如下:

这里library path list实际上读取自一个system property,直接到System.java下查看初始化代码,它其实是LD_LIBRARY_PATH环境变量的值,具体内容可以查看注释,为"/vendor/lib:/system/lib"


然后再来看下传入的ClassLoader非空的情况,也就是ClassLoader的findLibrary的执行过程。

结果发现竟然是一个空函数,而ClassLoader本身也只是个抽象类,那系统中实际运行的ClassLoader是哪个呢?这里可以写个小程序,将实际运行的ClassLoader输出:


于是,得知android系统中ClassLoader真正的实现在dalvik.system.PathClassLoader。此外,在这条日志中,还顺带将PathClassLoader初始化的参数一同打印了出来。其中,libraryPath为"/data/app-lib/elf.xuexi-1"..
不过PathClassLoader只是继承 BaseDexClassLoader,并没有实际内容。

继续到BaseDexClassLoader下看findLibrary的实现。


可以看到,这里又是在调用DexPathList类下的findLibrary。关注splitLibraryPath方法,它返回了需要加载的动态库所在目录。


这里简单说下splitLibraryPath方法的作用,它是根据传进来的libraryPath和system property中"java.library.path"的属性值即“/vendor/lib:/system/lib”来构造出要加载的动态库所在目录列表。

现在可以对动态链接库的加载路径做个总结了,系统默认的目录为"/vendor/lib"和"/system/lib"。当使用System.loadLibrary或System.load来加载一个共享库的时候,会将VM栈中的ClassLoader传入。之后调用findLibrary方法,在两个目录中去寻找指定的so文件:一个是构造ClassLoader时,传进来的那个libraryPath;另一个则是system property中"java.library.path"的属性值。也就是说,实际上是会在如下的3个目录中进行查找:

对于"/data/app-lib/包名-n"这个路径,大家可能会比较陌生,但应该都知道"/data/data/包名/lib"目录,这里就简单讲解下apk安装过程中的一点细节,以说明二者之间的关系(在Android源码中的路径为"/frameworks/native/cmds/installd/commands.c")

以上代码会先构造几个目录名:pkgdir为"/data/data/包名",libsymlink为"/data/data/包名/lib",applibdir为"/data/app-lib/包名"。然后创建相应目录,并赋权限。之后,建立"/data/data/包名/lib"指向"/data/app-lib/包名"的符号链接。
现在再回过头来说明下"/data/app-lib/包名-n"、"/data/data/包名/lib"这二者之间的关系。在"/data/data/包名/"目录下执行ls –l命令,就会发现lib是一个链接,So其实是放在"/data/app-lib/包名-n"路径下的。。

doLoad实际上是调用本地的nativeLoad方法,nativeLoad会先更新LD_LIBRARY_PATH,然后执行dvmLoadNativeCode函数,真正实现so文件的加载。。

dvmLoadNativeCode定义在Android源码中的路径为/dalvik/vm/Native.cpp,它的主要功能如下:

在了解So在内存中的加载原理后,可以得知以下几点:

 
 
void* dlopen(const char* filename, int flags) {
  ScopedPthreadMutexLocker locker(&gDlMutex);
  soinfo* result = do_dlopen(filename, flags);
  if (result == NULL) {
    __bionic_format_dlerror("dlopen failed", linker_get_error_buffer());
    return NULL;
  }
  return result;
}

soinfo* do_dlopen(const char* name, int flags) {
  if ((flags & ~(RTLD_NOW|RTLD_LAZY|RTLD_LOCAL|RTLD_GLOBAL)) != 0) {
    DL_ERR("invalid flags to dlopen: %x", flags);
    return NULL;
  }
  set_soinfo_pool_protection(PROT_READ | PROT_WRITE);
  soinfo* si = find_library(name);
  if (si != NULL) {
    si->CallConstructors();
  }
  set_soinfo_pool_protection(PROT_READ);
  return si;
}
static soinfo *find_loaded_library(const char *name)
{
    soinfo *si;
    const char *bname;

    // TODO: don't use basename only for determining libraries
    // http://code.google.com/p/android/issues/detail?id=6670

    bname = strrchr(name, '/');
    bname = bname ? bname + 1 : name;

    for (si = solist; si != NULL; si = si->next) {
        if (!strcmp(bname, si->name)) {
            return si;
        }
    }
    return NULL;
}

static soinfo* find_library_internal(const char* name) {
  if (name == NULL) {
    return somain;
  }

  soinfo* si = find_loaded_library(name);
  if (si != NULL) {
    if (si->flags & FLAG_LINKED) {
      return si;
    }
    DL_ERR("OOPS: recursive link to \"%s\"", si->name);
    return NULL;
  }

  TRACE("[ '%s' has not been loaded yet.  Locating...]", name);
  si = load_library(name);
  if (si == NULL) {
    return NULL;
  }

  // At this point we know that whatever is loaded @ base is a valid ELF
  // shared library whose segments are properly mapped in.
  TRACE("[ init_library base=0x%08x sz=0x%08x name='%s' ]",
        si->base, si->size, si->name);

  if (!soinfo_link_image(si)) {
    munmap(reinterpret_cast<void*>(si->base), si->size);
    soinfo_free(si);
    return NULL;
  }

  return si;
}

static soinfo* find_library(const char* name) {
  soinfo* si = find_library_internal(name);
  if (si != NULL) {
    si->ref_count++;
  }
  return si;
}
  VerifyElfHeader() &&
  ReadProgramHeader() &&
  ReserveAddressSpace() &&
  LoadSegments() &&
  FindPhdr();
<center><font color=#006666 size=3 face="黑体">**读ELF文件头**</font></center>
``` C++
// Loads the program header table from an ELF file into a read-only private
// anonymous mmap-ed block.
bool ElfReader::ReadProgramHeader() {
  phdr_num_ = header_.e_phnum;

  // Like the kernel, we only accept program header tables that
  // are smaller than 64KiB.
  if (phdr_num_ < 1 || phdr_num_ > 65536/sizeof(Elf32_Phdr)) {
    DL_ERR("\"%s\" has invalid e_phnum: %d", name_, phdr_num_);
    return false;
  }

  Elf32_Addr page_min = PAGE_START(header_.e_phoff);  //页的起始地址
  Elf32_Addr page_max = PAGE_END(header_.e_phoff + (phdr_num_ * sizeof(Elf32_Phdr)));  //页的结束地址
  Elf32_Addr page_offset = PAGE_OFFSET(header_.e_phoff);  //程序头部在页中的偏移

  phdr_size_ = page_max - page_min;

  void* mmap_result = mmap(NULL, phdr_size_, PROT_READ, MAP_PRIVATE, fd_, page_min);  //将程序头映射到内存
  if (mmap_result == MAP_FAILED) {
    DL_ERR("\"%s\" phdr mmap failed: %s", name_, strerror(errno));
    return false;
  }

  phdr_mmap_ = mmap_result;
  phdr_table_ = reinterpret_cast<Elf32_Phdr*>(reinterpret_cast<char*>(mmap_result) + page_offset);  //程序头表在内存中的地址
  return true;
}
// Reserve a virtual address range big enough to hold all loadable
// segments of a program header table. This is done by creating a
// private anonymous mmap() with PROT_NONE.
bool ElfReader::ReserveAddressSpace() {
  Elf32_Addr min_vaddr;
  load_size_ = phdr_table_get_load_size(phdr_table_, phdr_num_, &min_vaddr);  //根据页对齐来计算Load段所占用的大小
  if (load_size_ == 0) {
    DL_ERR("\"%s\" has no loadable segments", name_);
    return false;
  }

  uint8_t* addr = reinterpret_cast<uint8_t*>(min_vaddr);
  int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS;  //匿名私有
  void* start = mmap(addr, load_size_, PROT_NONE, mmap_flags, -1, 0);  //调用mmap为动态库分配一块内存空间
  if (start == MAP_FAILED) {
    DL_ERR("couldn't reserve %d bytes of address space for \"%s\"", load_size_, name_);
    return false;
  }

  load_start_ = start;
  load_bias_ = reinterpret_cast<uint8_t*>(start) - addr;  //真实的加载地址与计算出来的(读ELF程序头中的p_vaddr)加载地址之差
  return true;
}
// Map all loadable segments in process' address space.
// This assumes you already called phdr_table_reserve_memory to
// reserve the address space range for the library.
// TODO: assert assumption.
bool ElfReader::LoadSegments() {
  for (size_t i = 0; i < phdr_num_; ++i) {
    const Elf32_Phdr* phdr = &phdr_table_[i];

    if (phdr->p_type != PT_LOAD) {
      continue;
    }

    // Segment addresses in memory.
    Elf32_Addr seg_start = phdr->p_vaddr + load_bias_;
    Elf32_Addr seg_end   = seg_start + phdr->p_memsz;

    Elf32_Addr seg_page_start = PAGE_START(seg_start);
    Elf32_Addr seg_page_end   = PAGE_END(seg_end);

    Elf32_Addr seg_file_end   = seg_start + phdr->p_filesz;

    // File offsets.
    Elf32_Addr file_start = phdr->p_offset;
    Elf32_Addr file_end   = file_start + phdr->p_filesz;

    Elf32_Addr file_page_start = PAGE_START(file_start);
    Elf32_Addr file_length = file_end - file_page_start;

    if (file_length != 0) {
      void* seg_addr = mmap((void*)seg_page_start,          //将Load Segment映射到内存,大小为在ELF文件中所占用的长度
                            file_length,
                            PFLAGS_TO_PROT(phdr->p_flags),
                            MAP_FIXED|MAP_PRIVATE,
                            fd_,
                            file_page_start);
      if (seg_addr == MAP_FAILED) {
        DL_ERR("couldn't map \"%s\" segment %d: %s", name_, i, strerror(errno));
        return false;
      }
    }

    // if the segment is writable, and does not end on a page boundary,
    // zero-fill it until the page limit.
    if ((phdr->p_flags & PF_W) != 0 && PAGE_OFFSET(seg_file_end) > 0) {
      memset((void*)seg_file_end, 0, PAGE_SIZE - PAGE_OFFSET(seg_file_end));  //如果这块Segment是可写的,且在内存中的结束地址不在页的边界上,则将后面的数据都填充0
    }

    seg_file_end = PAGE_END(seg_file_end);

    // seg_file_end is now the first page address after the file
    // content. If seg_end is larger, we need to zero anything
    // between them. This is done by using a private anonymous
    // map for all extra pages.
    if (seg_page_end > seg_file_end) {
      void* zeromap = mmap((void*)seg_file_end,                 //如果seg_end大于它在文件中的长度,则继续为多出的那部分申请内存空间,并填充0。这里应该是主要针对bss段
                           seg_page_end - seg_file_end,
                           PFLAGS_TO_PROT(phdr->p_flags),
                           MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
                           -1,
                           0);
      if (zeromap == MAP_FAILED) {
        DL_ERR("couldn't zero fill \"%s\" gap: %s", name_, strerror(errno));
        return false;
      }
    }
  }
  return true;
}
static soinfo* load_library(const char* name) {
    // Open the file.
    int fd = open_library(name);
    if (fd == -1) {
        DL_ERR("library \"%s\" not found", name);
        return NULL;
    }

    // Read the ELF header and load the segments.
    ElfReader elf_reader(name, fd);
    if (!elf_reader.Load()) {
        return NULL;
    }

    const char* bname = strrchr(name, '/');
    soinfo* si = soinfo_alloc(bname ? bname + 1 : name);
    if (si == NULL) {
        return NULL;
    }
    si->base = elf_reader.load_start();
    si->size = elf_reader.load_size();
    si->load_bias = elf_reader.load_bias();
    si->flags = 0;
    si->entry = 0;
    si->dynamic = NULL;
    si->phnum = elf_reader.phdr_count();
    si->phdr = elf_reader.loaded_phdr();
    return si;
}

static soinfo* soinfo_alloc(const char* name) {
  if (strlen(name) >= SOINFO_NAME_LEN) {
    DL_ERR("library name \"%s\" too long", name);
    return NULL;
  }

  if (!ensure_free_list_non_empty()) {
    DL_ERR("out of memory when loading \"%s\"", name);
    return NULL;
  }

  // Take the head element off the free list.
  soinfo* si = gSoInfoFreeList;
  gSoInfoFreeList = gSoInfoFreeList->next;

  // Initialize the new element.
  memset(si, 0, sizeof(soinfo));
  strlcpy(si->name, name, sizeof(si->name));
  sonext->next = si;
  sonext = si;

  TRACE("name %s: allocated soinfo @ %p", name, si);
  return si;
}
void soinfo::CallConstructors() {
if (constructors_called) {
 return;
}

// We set constructors_called before actually calling the constructors, otherwise it doesn't
// protect against recursive constructor calls. One simple example of constructor recursion
// is the libc debug malloc, which is implemented in libc_malloc_debug_leak.so:
// 1. The program depends on libc, so libc's constructor is called here.
// 2. The libc constructor calls dlopen() to load libc_malloc_debug_leak.so.
// 3. dlopen() calls the constructors on the newly created
//    soinfo for libc_malloc_debug_leak.so.
// 4. The debug .so depends on libc, so CallConstructors is
//    called again with the libc soinfo. If it doesn't trigger the early-
//    out above, the libc constructor will be called again (recursively!).
constructors_called = true;

if ((flags & FLAG_EXE) == 0 && preinit_array != NULL) {
 // The GNU dynamic linker silently ignores these, but we warn the developer.
 PRINT("\"%s\": ignoring %d-entry DT_PREINIT_ARRAY in shared library!",
       name, preinit_array_count);
}

if (dynamic != NULL) {
 for (Elf32_Dyn* d = dynamic; d->d_tag != DT_NULL; ++d) {
   if (d->d_tag == DT_NEEDED) {
     const char* library_name = strtab + d->d_un.d_val;
     TRACE("\"%s\": calling constructors in DT_NEEDED \"%s\"", name, library_name);
     find_loaded_library(library_name)->CallConstructors();
   }
 }
}

TRACE("\"%s\": calling constructors", name);

// DT_INIT should be called before DT_INIT_ARRAY if both are present.
CallFunction("DT_INIT", init_func);
CallArray("DT_INIT_ARRAY", init_array, init_array_count, false);
}
/**
 * Loads and links the library with the specified name. The mapping of the
 * specified library name to the full path for loading the library is
 * implementation-dependent.
 *
 * @param libName
 *            the name of the library to load.
 * @throws UnsatisfiedLinkError
 *             if the library can not be loaded.
 */
public void loadLibrary(String libName) {
    loadLibrary(libName, VMStack.getCallingClassLoader());
}

/*
 * Searches for a library, then loads and links it without security checks.
 */
void loadLibrary(String libraryName, ClassLoader loader) {
    if (loader != null) {
        String filename = loader.findLibrary(libraryName);
        if (filename == null) {
            throw new UnsatisfiedLinkError("Couldn't load " + libraryName +
                                           " from loader " + loader +
                                           ": findLibrary returned null");
        }
        String error = doLoad(filename, loader);
        if (error != null) {
            throw new UnsatisfiedLinkError(error);
        }
        return;
    }

    String filename = System.mapLibraryName(libraryName);
    List<String> candidates = new ArrayList<String>();
    String lastError = null;
    for (String directory : mLibPaths) {
        String candidate = directory + filename;
        candidates.add(candidate);

        if (IoUtils.canOpenReadOnly(candidate)) {
            String error = doLoad(candidate, loader);
            if (error == null) {
                return; // We successfully loaded the library. Job done.
            }
            lastError = error;
        }
    }

    if (lastError != null) {
        throw new UnsatisfiedLinkError(lastError);
    }
    throw new UnsatisfiedLinkError("Library " + libraryName + " not found; tried " + candidates);
}
private static final Runtime mRuntime = new Runtime();

/**
 * Holds the library paths, used for native library lookup.
 */
private final String[] mLibPaths = initLibPaths();

private static String[] initLibPaths() {
    String javaLibraryPath = System.getProperty("java.library.path");
    if (javaLibraryPath == null) {
        return EmptyArray.STRING;
    }
    String[] paths = javaLibraryPath.split(":");
    // Add a '/' to the end of each directory so we don't have to do it every time.
    for (int i = 0; i < paths.length; ++i) {
        if (!paths[i].endsWith("/")) {
            paths[i] += "/";
        }
    }
    return paths;
}
/**
 * Returns the absolute path of the native library with the specified name,
 * or {@code null}. If this method returns {@code null} then the virtual
 * machine searches the directories specified by the system property
 * "java.library.path".
 * <p>
 * This implementation always returns {@code null}.
 * </p>
 *
 * @param libName
 *            the name of the library to find.
 * @return the absolute path of the library.
 */
protected String findLibrary(String libName) {
    return null;
}

[招生]科锐逆向工程师培训(2024年11月15日实地,远程教学同时开班, 第51期)

收藏
免费 12
支持
分享
最新回复 (9)
雪    币: 268
活跃值: (625)
能力值: ( LV3,RANK:35 )
在线值:
发帖
回帖
粉丝
2
棒棒哒
2019-11-18 11:07
0
雪    币: 276
活跃值: (167)
能力值: ( LV2,RANK:10 )
在线值:
发帖
回帖
粉丝
3
能够实现和windwos一样的loader吗?
2019-11-18 17:15
0
雪    币: 180
活跃值: (1313)
能力值: ( LV4,RANK:40 )
在线值:
发帖
回帖
粉丝
4
大佬怎么突然发飙了
2019-11-19 09:16
0
雪    币: 1380
活跃值: (1626)
能力值: ( LV3,RANK:30 )
在线值:
发帖
回帖
粉丝
5
摩拜一下
2019-11-27 21:40
0
雪    币: 154
活跃值: (3786)
能力值: ( LV3,RANK:30 )
在线值:
发帖
回帖
粉丝
6
很棒,通过源码来分析整个加载流程很厉害。
2021-4-7 15:40
0
雪    币: 116
活跃值: (1012)
能力值: ( LV2,RANK:10 )
在线值:
发帖
回帖
粉丝
7
666
2021-4-11 11:43
0
雪    币: 737
能力值: ( LV1,RANK:0 )
在线值:
发帖
回帖
粉丝
8
很强啊
2023-6-18 16:42
0
雪    币: 61
活跃值: (1175)
能力值: ( LV2,RANK:10 )
在线值:
发帖
回帖
粉丝
9
666
2023-8-16 09:45
0
雪    币: 15
能力值: ( LV1,RANK:0 )
在线值:
发帖
回帖
粉丝
10
学习一下
2023-8-17 10:04
0
游客
登录 | 注册 方可回帖
返回
//