[Original] Dumping the Memory Layout of a Class or Struct with an LLVM Pass
Posted: 2024-12-2 08:56 (2620 views)

Author: scz

☆ Background

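I once needed to understand the memory layout of std::string, tinkered with it a little, and shared a write-up (the GDB article linked after the article body).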

bluerust promptly pointed me to this one (the Kanxue thread "STL容器逆向与实战", also linked after the article body).

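In his words, the part to read is "llvm pass dump data type". For me that article falls into the "I recognize every word" category: I roughly grasp the underlying principle, but I know nothing about the "LLVM Pass" technique it relies on, so after reading it I was like a tiger trying to bite the sky, with nowhere to sink my claws. I do not program in C++, have almost never needed to reverse C++ STL containers, and do not care about the implementation details of the specific containers discussed there. What interests me is how to dump the memory layout of a class or struct, which is the first part of that article. One remark by the original author, that he just casually wrote a simple pass to dump it, really stung: something others knock together casually, with the code handed over, and I still had no idea how to put it into practice. Others may be stuck in the same awkward spot, so this article gives "LLVM Pass" beginners a complete, reproducible example focused on dumping memory layouts. It is a simplified and unworthy sequel to that article, adding feet to a snake.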

☆ dumpclass.cpp

See the two llvm.org "Writing an LLVM Pass" documents linked after the article body.

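The Kanxue article uses the legacy pass manager format; dumpclass.cpp here is rewritten for the new pass manager. It supports two command-line options: passmode selects whether member names embed relative or absolute offsets, and substr filters by class or struct name.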
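To make the relative/absolute distinction concrete, here is a minimal standalone sketch that does not depend on LLVM. The `Inner` and `Outer` structs are hypothetical and exist only for illustration; the arithmetic mirrors what `dumpType` in dumpclass.cpp does when `passmode` is greater than 0, namely adding the enclosing member's offset (`base`) to each nested member's offset.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // fixed-width integer types

// Hypothetical types, used only for illustration.
struct Inner
{
    std::uint32_t a;
    std::uint32_t b;
};

struct Outer
{
    std::uint64_t head;
    Inner         in;
};

// Relative offset: position of Inner::b inside Inner itself
// (what passmode=0 would embed in the member name).
constexpr std::size_t relative_b = offsetof( Inner, b );

// Absolute offset: position of the same member counted from the start of
// Outer (passmode>0), i.e. offset of the enclosing member plus the
// relative offset.
constexpr std::size_t absolute_b = offsetof( Outer, in ) + offsetof( Inner, b );

static_assert( relative_b == 4,  "b follows one 32-bit member" );
static_assert( absolute_b == 12, "8 bytes of head + 4 bytes of a" );
```

The snippet compiles as a standalone translation unit; the `static_assert` lines are the check, so there is nothing to run.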

Build dumpclass.so from dumpclass.cpp

Later sections show how to use dumpclass.so as an "LLVM Pass" to dump the memory layout of a class or struct.

☆ dumptarget.cpp

dumptarget.cpp is a hypothetical target program; the classes and structs it contains are what gets dumped later.

☆ Processing dumptarget.cpp with dumpclass.so

There are several ways to load dumpclass.so; this demonstrates one of them. Run these two commands in order

The first generates dumptarget.ll from dumptarget.cpp; the second processes dumptarget.ll with dumpclass.so. Normally the output is

Try running opt without the passmode and substr options and compare the output to deepen your understanding.
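What happens without the two options can be read off the defaults in dumpclass.cpp: `passmode` defaults to 0 (relative offsets) and `substr` defaults to the sentinel `"<default>"`, which disables filtering. The sketch below restates only the filter condition as a standalone function. `keepStruct` and `kDefaultSubstr` are hypothetical names introduced here, and the snippet assumes a C++17 compiler for `constexpr` `std::string_view`.

```cpp
#include <string_view>

// Sentinel copied from dumpclass.cpp: it means "no filter was given".
constexpr std::string_view kDefaultSubstr = "<default>";

// Hypothetical helper: true when a struct with this name would be dumped.
// It is the negation of the "continue" condition in visitor():
//   substr != DEFAULTSUBSTR && struct_name.find( substr ) == npos
constexpr bool keepStruct ( std::string_view name, std::string_view substr )
{
    return substr == kDefaultSubstr
        || name.find( substr ) != std::string_view::npos;
}

static_assert(  keepStruct( "struct.Record", kDefaultSubstr ), "no filter keeps everything" );
static_assert(  keepStruct( "struct.Record", "Rec" ),          "substring match is kept" );
static_assert( !keepStruct( "struct.Record", "Point" ),        "non-matching name is skipped" );
```

The struct names used in the assertions are made up; real names depend on how the front end names the types in the IR.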

☆ pahole

pahole can also dump the memory layout of a class or struct. It is not as good as dumpclass.cpp and is included here for completeness.
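pahole works from debug information and, as its name suggests, points out the padding holes in a layout. As a small illustration of the quantities it reports, and not a substitute for the tool, the holes in a hypothetical `Sample` struct can be computed by hand with `offsetof` and `sizeof`:

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>

// Hypothetical struct with a hole in the middle and padding at the end.
struct Sample
{
    char          c;
    std::uint32_t n;
    char          d;
};

// Bytes of padding between the end of c and the start of n.
constexpr std::size_t hole_after_c =
    offsetof( Sample, n ) - ( offsetof( Sample, c ) + sizeof( char ) );

// Bytes of padding between the end of d and the end of the struct.
constexpr std::size_t tail_padding =
    sizeof( Sample ) - ( offsetof( Sample, d ) + sizeof( char ) );

// Both gaps exist only to satisfy the alignment of n.
static_assert( hole_after_c == alignof( std::uint32_t ) - 1, "hole before n" );
static_assert( tail_padding == alignof( std::uint32_t ) - 1, "padding after d" );
```

On a typical target where `std::uint32_t` is 4-byte aligned, both values are 3.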

Normally the output is

☆ clang -Xclang -fdump-record-layouts

Normally the output is

☆ VC has a hidden option

Suppose VirtualBaseClass.cpp is as follows

VC has a hidden compile-time option for viewing the memory layout of C++ classes

It draws the memory layout as an ASCII diagram on stdout and does not affect other compiler options.
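For readers who want something to try this on, here is a hypothetical stand-in for VirtualBaseClass.cpp; it is not the author's file, and all class names are invented. The undocumented switches usually cited for this purpose are `/d1reportSingleClassLayout` followed directly by a class name, and `/d1reportAllClassLayout`; that these are the option the author means is an assumption.

```cpp
// VirtualBaseClass.cpp (hypothetical stand-in, not the author's original file)
struct Base
{
    int     base_value;
    virtual ~Base () {}
};

struct Left : virtual Base
{
    int     left_value;
};

struct Right : virtual Base
{
    int     right_value;
};

// Diamond inheritance: Left and Right share a single Base subobject.
struct Derived : Left, Right
{
    int     derived_value;
};

// A shared virtual base requires hidden bookkeeping pointers in each object,
// so Derived is larger than its four int members alone.
static_assert( sizeof( Derived ) > 4 * sizeof( int ), "layout carries hidden pointers" );
```

The file has no `main` on purpose: a layout report only needs the class definitions to be compiled, not linked.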

Created: 2024-12-01 19:55

Contents:

    ☆ Background
    ☆ dumpclass.cpp
    ☆ dumptarget.cpp
    ☆ Processing dumptarget.cpp with dumpclass.so
    ☆ pahole
    ☆ clang -Xclang -fdump-record-layouts
    ☆ VC has a hidden option
《GDB查看结构或类的内存布局及分离终端》 (Viewing the memory layout of a struct or class in GDB, and detaching the terminal)
https://scz.617.cn/unix/202411151604.txt
STL容器逆向与实战 (Reversing STL containers in practice) - [2023-02-07]
https://bbs.kanxue.com/thread-275133.htm
Writing an LLVM Pass (legacy PM version)
https://llvm.org/docs/WritingAnLLVMPass.html
 
Writing an LLVM Pass
https://llvm.org/docs/WritingAnLLVMNewPMPass.html
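// dumpclass.cpp
#include <set>
#include <string>

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"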
 
#define DEFAULTSUBSTR   "<default>"
 
using namespace llvm;
 
namespace {
 
static cl::opt<int> passmode
(
"passmode",
cl::desc("absolute offset or not"),
cl::value_desc("int"),
cl::init(0)
);
 
static cl::opt<std::string> substr
(
"substr",
cl::desc("part of struct name"),
cl::value_desc("std::string"),
cl::init(DEFAULTSUBSTR)
);
 
struct DumpClass : PassInfoMixin<DumpClass>
{
 
    // Map an LLVM type to a C-like type name for display.
    std::string getTypeName ( Type *type, const DataLayout &data )
    {
        if ( type->isIntegerTy() )
        {
            IntegerType    *i   = cast<IntegerType>( type );
 
            return "uint" + std::to_string( i->getBitWidth() ) + "_t";
        }
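        else if ( type->isPointerTy() )
        {
            // Needs typed pointers: LLVM releases that only have opaque
            // pointers no longer provide getPointerElementType().
            PointerType    *ptr = cast<PointerType>( type );
 
            return getTypeName( ptr->getPointerElementType(), data ) + "*";
        }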
        else if ( type->isArrayTy() )
        {
            ArrayType      *arr = cast<ArrayType>( type );
 
            return getTypeName( arr->getArrayElementType(), data ) + "[" + std::to_string( arr->getArrayNumElements() ) + "]";
        }
        else if ( type->isFloatTy() )
        {
            return "float";
        }
        else if ( type->isStructTy() )
        {
            StructType     *stc = cast<StructType>( type );
 
            return std::string( stc->getStructName() );
        }
        else
        {
            return "unknown_" + std::to_string( data.getTypeAllocSizeInBits( type ) );
        }
    }
 
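    // Recursively print a type. Struct members are named field_<offset>_<size>.
    // With mode > 0 the offsets accumulate from the outermost struct (absolute);
    // otherwise they are relative to the enclosing struct.
    void dumpType ( int depth, Type *type, const std::string &suffix, const DataLayout *data, unsigned base, int mode )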
    {
        std::string blank( depth * 4, ' ' );
 
        if ( type->isStructTy() )
        {
            StructType         *stc = cast<StructType>( type );
            const StructLayout *sl  = data->getStructLayout( stc );
 
            errs() << blank + stc->getStructName() + "\n" + blank + "{\n";
            for ( size_t i = 0; i < stc->getStructNumElements(); i++ )
            {
                Type       *subType = stc->getStructElementType( i );
                unsigned    offset  = sl->getElementOffset( i );
                unsigned    size    = data->getTypeAllocSize( subType );
 
                if ( mode > 0 )
                {
                    offset += base;
                    dumpType( depth+1, subType, std::to_string(offset)+"_"+std::to_string(size), data, offset, mode );
                }
                else
                {
                    dumpType( depth+1, subType, std::to_string(offset)+"_"+std::to_string(size), data, 0, mode );
                }
            }
            errs() << blank + "} field_" + suffix + ";\n";
        }
        else
        {
            errs() << blank + getTypeName( type, *data ) + " field_" + suffix + ";\n";
        }
    }
 
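    // Collect the non-opaque struct types allocated on the stack of main,
    // optionally filtered by substr, then dump each of them.
    void visitor ( Function &F )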
    {
        if ( F.getName() != "main" )
        {
            return;
        }
 
        std::set<StructType*>   types;
        const DataLayout       &data    = F.getParent()->getDataLayout();
 
        for ( auto &B : F )
        {
            for ( auto &I : B )
            {
                if ( auto *A = dyn_cast<AllocaInst>( &I ) )
                {
                    Type   *type    = A->getAllocatedType();
                    if ( type->isStructTy() )
                    {
                        StructType *stc = cast<StructType>( type );
 
                        if ( stc->isOpaque() )
                        {
                            continue;
                        }
                        std::string struct_name
                                        = std::string( stc->getStructName() );
                        if ( substr != DEFAULTSUBSTR && struct_name.find( substr ) == std::string::npos )
                        {
                            continue;
                        }
                        types.insert( stc );
                    }
                }
            }
        }
 
        int                     index = 0;
 
        for ( StructType *type : types )
        {
            dumpType( 0, type, std::to_string( index++ ), &data, 0, passmode );
        }
    }
 
    PreservedAnalyses run ( Function &F, FunctionAnalysisManager &FAM )
    {
        visitor( F );
        return PreservedAnalyses::all();
    }
 
};
 
}
 
PassPluginLibraryInfo getDumpClassPluginInfo ()
{
    const auto  callback = []( PassBuilder &PB )
    {
        PB.registerPipelineParsingCallback
        (
            [](
                StringRef               Name,
                FunctionPassManager    &FPM,
                ArrayRef<PassBuilder::PipelineElement>
            )
            {
                if ( Name == "DumpClass" )
                {
                    FPM.addPass( DumpClass() );
                    return true;
                }
                return false;
            }
        );
        PB.registerPipelineStartEPCallback
        (
            [&]( ModulePassManager &MPM, auto )
            {
                FunctionPassManager FPM;
 
                FPM.addPass( DumpClass() );
                MPM.addPass( createModuleToFunctionPassAdaptor( std::move( FPM ) ) );
                return true;
            }
        );
    };
 
    return { LLVM_PLUGIN_API_VERSION, "DumpClass", LLVM_VERSION_STRING, callback };
}
 
extern "C" LLVM_ATTRIBUTE_WEAK ::llvm::PassPluginLibraryInfo llvmGetPassPluginInfo ()
{
    return getDumpClassPluginInfo();
}
Latest replies (3)
#2

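These days a great many targets are C/C++, especially std containers and templates, including Boost templates. rolf built an IDA recovery plugin that recognizes std templates for specific versions, but he wanted to commercialize it and could not find a suitable buyer.

Besides, the core features of C++ are not that different from C structs plus function pointers.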

Last edited by IamHuskar on 2024-12-2 09:25
2024-12-2 09:22
#3
There are quite a few struct-recovery plugins for x86, but none for ARM.
2024-12-2 13:04
#4
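Quoting mb_ldbucrik: "There are quite a few struct-recovery plugins for x86, but none for ARM."
Really? Apart from HexRaysCodeXplorer and HexRaysPyTools, are there any other great ones you would recommend? I have not come across any.
2024-12-2 19:56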