[Original] Generating local IDAPython docs with the official tooling

Posted: 2024-11-25 11:11, 3182 views

8_5_0

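IDAPython is one of IDA's most important tools: it lets users drive IDA from Python scripts to carry out all kinds of tasks. However, the API differs a lot between versions. Each new IDA release deprecates a batch of old interfaces and introduces new functions, which makes IDAPython heavily dependent on its documentation. The online documentation provided by Hex-Rays is slow to access, and for work reasons I often need it while offline, so I decided to generate an offline copy. After an evening of tinkering I got it working; this post records the process for anyone with the same need.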
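My first idea was simply to mirror the official docs with HTTrack, but the links in the mirrored site were all broken and I had no desire to fix them by hand, so I gave up on that. After some searching I found the official repo, and in its tools/docs directory there is a file named hrdoc.py, which appears to be what the developers themselves use to generate the docs.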
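The script takes 5 arguments: -o/--output, -m/--modules, -s/--include-source-for-modules, -x/--exclude-modules-from-searchable-index and -v/--verbose (the first four are required).
The docs target of the Makefile in the repo root shows how the script is invoked.
In short, -o sets the output directory, -m takes the IDAPython modules to document (separated by commas ","), and the -s and -x arguments can be copied verbatim from the command in the Makefile. Running the script as-is, however, hits several pitfalls.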
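To make the argument handling concrete, here is a minimal sketch that mirrors the option set declared in hrdoc.py and parses a sample argument list. The function name parse_hrdoc_args and the sample values are illustrative assumptions; inside IDA the list comes from idc.ARGV[1:] rather than from a hard-coded list.

```python
from argparse import ArgumentParser


def parse_hrdoc_args(argv):
    """Parse the same five options that hrdoc.py declares (hypothetical helper)."""
    parser = ArgumentParser()
    parser.add_argument("-o", "--output", required=True)
    parser.add_argument("-m", "--modules", required=True)
    parser.add_argument("-s", "--include-source-for-modules", required=True)
    parser.add_argument("-x", "--exclude-modules-from-searchable-index", required=True)
    parser.add_argument("-v", "--verbose", default=False, action="store_true")
    args = parser.parse_args(argv)
    # The three list-valued options arrive as comma-separated strings.
    args.modules = args.modules.split(",")
    args.include_source_for_modules = args.include_source_for_modules.split(",")
    args.exclude_modules_from_searchable_index = (
        args.exclude_modules_from_searchable_index.split(","))
    return args


# Sample values modeled on the commented-out "testing" line of the Makefile.
sample = ["-o", "docs/hr-html",
          "-m", "ida_pro,ida_kernwin",
          "-s", "idc,idautils",
          "-x", "ida_allins"]
args = parse_hrdoc_args(sample)
print(args.output, args.modules)
```

Leaving out any of -o, -m, -s or -x makes argparse exit with an error, which is why the script cannot be started without arguments.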
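The official doc-generation script depends on a third-party library, pdoc, but it cannot simply be installed with pip; it has to be cloned. Create a third-party folder in the repo root and clone pdoc into it.
The script also needs to be modified to fix the import path.
The modified script is listed in full below.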
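The import-path fix boils down to putting the cloned checkout on sys.path before importing pdoc. The following is a minimal sketch of that idea, not the script's actual code: the helper name add_vendored_repo is hypothetical, and the demo runs against a throwaway directory tree rather than a real clone.

```python
import os
import sys
import tempfile


def add_vendored_repo(repo_root, folder="third-party", name="pdoc"):
    """Append <repo_root>/<folder>/<name> to sys.path; return it, or None if missing."""
    checkout = os.path.join(repo_root, folder, name)
    if not os.path.isdir(checkout):
        return None
    if checkout not in sys.path:
        sys.path.append(checkout)
    return checkout


# Demo: the lookup only succeeds when the folder name matches what was created.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "third-party", "pdoc"))
    found = add_vendored_repo(root)
    missing = add_vendored_repo(root, folder="third_party")
    print(found is not None, missing)
```

The demo also shows why the folder name matters: a hyphen/underscore mismatch between the directory created on disk and the path built in the script makes the import fail.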
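Because the script needs arguments, it cannot be run through Script file in the IDA GUI; it has to be run from the command line. My environment is macOS; other environments may need minor adjustments.
First, change into the directory that contains the IDA executable.
Then run the generation command, using the Makefile docs target as the model.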
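Since the exact command is not shown here, the sketch below only illustrates how such a command line could be assembled from the pattern in the Makefile docs target. Everything concrete in it is an assumption: the ./idat path, the choice of -A (IDA's autonomous-mode switch) as a stand-in for $(BATCH_SWITCH), whose value the Makefile excerpt does not show, and the helper name build_idat_command. The -S and -t switches are taken from the Makefile line.

```python
import shlex


def build_idat_command(idat, script, out_dir, modules, source_modules, excluded):
    """Assemble an argv list modeled on the Makefile 'docs' recipe (hypothetical helper)."""
    script_args = "{} -o {} -m {} -s {} -x {}".format(
        script, out_dir,
        ",".join(modules), ",".join(source_modules), ",".join(excluded))
    # -A is an assumed stand-in for $(BATCH_SWITCH); -S passes the script plus
    # its arguments as one string; -t is copied from the Makefile recipe.
    return [idat, "-A", "-S" + script_args, "-t"]


cmd = build_idat_command(
    "./idat",                       # assumed location of the IDA text-mode binary
    "tools/docs/hrdoc.py",          # adjust to where the repo actually lives
    "docs/hr-html",
    ["ida_pro", "ida_kernwin", "idc", "idautils"],
    ["idc", "idautils"],
    ["ida_allins"])

# Print a shell-quoted form for copy/paste; nothing is executed here.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be handed to subprocess.run(cmd) once the paths have been checked against the local installation.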
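So far I have not found a tool that converts the files generated by pdoc into docset format, so for now the only option is to import them with html2dash.
After that, import the result into Dash manually.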
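For readers curious what such a conversion involves, the sketch below turns the index.js written by gen_lunr_search into the SQLite search table that Dash docsets use. It is only an illustration under stated assumptions, not a replacement for html2dash: the searchIndex schema follows Dash's docset documentation as I understand it, the entry types "Function" and "Entry" and the "#<refname>" anchors are guesses about what fits pdoc's output, and the helper names and sample data are made up.

```python
import json
import sqlite3


def load_index_js(text):
    """Split the 'URLS=[...];\\nINDEX=[...]' text written by gen_lunr_search."""
    urls_part, index_part = text.split(";\nINDEX=", 1)
    urls = json.loads(urls_part[len("URLS="):])
    index = json.loads(index_part)
    return urls, index


def build_search_index(conn, urls, index):
    """Fill a Dash-style searchIndex table (schema assumed from Dash's docs)."""
    conn.execute("CREATE TABLE searchIndex("
                 "id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT)")
    conn.execute("CREATE UNIQUE INDEX anchor ON searchIndex (name, type, path)")
    for info in index:
        kind = "Function" if info.get("func") else "Entry"
        # Assumption: the generated HTML uses the refname as the element id.
        path = "{}#{}".format(urls[info["url"]], info["ref"])
        conn.execute(
            "INSERT OR IGNORE INTO searchIndex(name, type, path) VALUES (?, ?, ?)",
            (info["ref"], kind, path))
    conn.commit()


# Made-up sample in the same shape as the generated index.js.
sample_text = ("URLS=" + json.dumps(["idc.html"]) + ";\nINDEX=" +
               json.dumps([{"ref": "idc", "url": 0},
                           {"ref": "idc.get_bytes", "url": 0, "func": 1}]))
urls, index = load_index_js(sample_text)
conn = sqlite3.connect(":memory:")
build_search_index(conn, urls, index)
rows = conn.execute("SELECT name, type, path FROM searchIndex ORDER BY id").fetchall()
print(rows)
```

A real docset additionally needs the Contents/Resources/Documents layout and an Info.plist, which this sketch does not cover.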
Finally, a copy of the generated offline documentation is attached.
Makefile, docs target:

DOCS_MODULES=$(foreach mod,$(MODULES_NAMES),ida_$(mod))
SORTED_DOCS_MODULES=$(sort $(DOCS_MODULES))

docs:   tools/docs/hrdoc.py tools/docs/hrdoc.css
ifndef __NT__
	$(IDAT_CMD) $(BATCH_SWITCH) -S"tools/docs/hrdoc.py -o docs/hr-html -m $(subst $(space),$(comma),$(SORTED_DOCS_MODULES)),idc,idautils -s idc,idautils -x ida_allins" -t > /dev/null
#   $(IDAT_CMD) $(BATCH_SWITCH) -S"tools/docs/hrdoc.py -o docs/hr-html -m ida_pro,ida_kernwin -s idc,idautils -x ida_allins" -t > /dev/null  # use this one for testing (faster)
else
	$(R)ida -Stools/docs/hrdoc.py -t
endif
Cloning pdoc (run from the repo root):

mkdir third-party
cd third-party
git clone https://github.com/pdoc3/pdoc

Modified tools/docs/hrdoc.py:

from __future__ import print_function

import os
import re  # used by trim_docstring() in gen_lunr_search()
import sys
import shutil
import json
from glob import glob
from typing import Dict, List
from functools import lru_cache

tools_docs_path = os.path.abspath(os.path.dirname(__file__))
idapython_path = os.path.abspath(os.path.join(tools_docs_path, "..", ".."))
# idasrc_path = os.path.abspath(os.path.join(idapython_path, "..", "..", ".."))
# pdoc is cloned under the IDAPython repo root, so that root is the base path
idasrc_path = idapython_path

import idc
 
from argparse import ArgumentParser
parser = ArgumentParser()
parser.add_argument("-o", "--output", required=True)
parser.add_argument("-m", "--modules", required=True)
parser.add_argument("-s", "--include-source-for-modules", required=True)
parser.add_argument("-x", "--exclude-modules-from-searchable-index", required=True)
parser.add_argument("-v", "--verbose", default=False, action="store_true")

args = parser.parse_args(idc.ARGV[1:])

args.modules = args.modules.split(",")
args.include_source_for_modules = args.include_source_for_modules.split(",")
args.exclude_modules_from_searchable_index = args.exclude_modules_from_searchable_index.split(",")
 
try:
    # pdoc location; the folder name must match the one created for the
    # clone ("third-party" in the commands above)
    pdoc_path = os.path.join(idasrc_path, "third-party", "pdoc")
    sys.path.append(pdoc_path)
    sys.path.append(tools_docs_path)  # for the custom epytext
    import pdoc
except ImportError as e:
    import traceback
    idc.msg("Couldn't import module %s\n" % traceback.format_exc())
    idc.qexit(-1)
 
# --------------------------------------------------------------------------

def gen_docs():
    sys.path.insert(0, os.path.join(idapython_path, "tools"))

    # trash existing doc
    if os.path.isdir(args.output):
        shutil.rmtree(args.output)

    # generate new doc
    build_documentation()
 
# --------------------------------------------------------------------------
# This is a ripoff of pdoc's cli.py, w/ minor adjustments

def gen_lunr_search(modules: List[pdoc.Module],
                    index_docstrings: bool,
                    template_config: dict):
    """Generate index.js for search"""

    def trim_docstring(docstring):
        return re.sub(r'''
            \s+|                   # whitespace sequences
            \s+[-=~]{3,}\s+|       # title underlines
            ^[ \t]*[`~]{3,}\w*$|   # code blocks
            \s*[`#*]+\s*|          # common markdown chars
            \s*([^\w\d_>])\1\s*|   # sequences of punct of the same kind
            \s*</?\w*[^>]*>\s*     # simple HTML tags
        ''', ' ', docstring, flags=re.VERBOSE | re.MULTILINE)

    def recursive_add_to_index(dobj):
        info = {
            'ref': dobj.refname,
            'url': to_url_id(dobj.module),
        }
        if index_docstrings:
            info['doc'] = trim_docstring(dobj.docstring)
        if isinstance(dobj, pdoc.Function):
            info['func'] = 1
        index.append(info)
        for member_dobj in getattr(dobj, 'doc', {}).values():
            recursive_add_to_index(member_dobj)

    @lru_cache()
    def to_url_id(module):
        url = module.url()
        if url not in url_cache:
            url_cache[url] = len(url_cache)
        return url_cache[url]

    index: List[Dict] = []
    url_cache: Dict[str, int] = {}
    for top_module in modules:
        recursive_add_to_index(top_module)
    urls = sorted(url_cache.keys(), key=url_cache.__getitem__)

    main_path = args.output
    with open(os.path.join(main_path, 'index.js'), "w", encoding="utf-8") as f:
        f.write("URLS=")
        json.dump(urls, f, indent=0, separators=(',', ':'))
        f.write(";\nINDEX=")
        json.dump(index, f, indent=0, separators=(',', ':'))

    # Generate search.html
    with open(os.path.join(main_path, 'doc-search.html'), "w", encoding="utf-8") as f:
        rendered_template = pdoc._render_template('/search.mako', **template_config)
        f.write(rendered_template)
 
# --------------------------------------------------------------------------

def build_documentation():

    # import all modules
    def docfilter(obj):
        # print("OBJ: %s" % str(obj))
        if obj.name in [
                "thisown",
                "SWIG_PYTHON_LEGACY_BOOL",
        ]:
            return False
        return True

    modules = []
    for module in args.modules:
        print("Loading: %s" % module)
        modules.append(pdoc.Module(module, docfilter=docfilter))

    print("  {} module{} in the list.".format(
          len(modules), "" if len(modules) == 1 else "s"))

    pdoc.link_inheritance()

    #
    # ida_*.html
    #
    pdoc.tpl_lookup.directories.insert(0, os.path.join(tools_docs_path, "templates"))
    show_source_code = set(args.include_source_for_modules)

    def all_modules(module_collection):
        for module in module_collection:
            yield module

            yield from all_modules(module.submodules())

    for module in all_modules(modules):
        module.obj.__docformat__ = "hr_epy"

        print("Processing: %s" % module.name)
        html = module.html(
            show_source_code=module.name in show_source_code,
            search_prefix=module.name)

        path = os.path.join(args.output, module.url())
        dirname = os.path.dirname(path)
        os.makedirs(dirname, exist_ok=True)

        print("Writing: %s" % path)
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)

    #
    # doc-search.html, index.js
    #
    template_config = {}
    gen_lunr_search(
        [mod for mod in modules if mod.name not in args.exclude_modules_from_searchable_index],
        index_docstrings=True,
        template_config=pdoc._get_config(**template_config).get('lunr_search'))

    #
    # index.html
    #
    path = os.path.join(args.output, "index.html")
    class fake_module_t(object):
        def __init__(self, name, url):
            self.name = name
            self._url = url
        def url(self):
            return self._url

    index_module = fake_module_t("index", "index.html")
    html = pdoc._render_template('/index.mako', module=index_module, modules=modules)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
 
# --------------------------------------------------------------------------

def main():
    print("Generating documentation.....")
    gen_docs()
    print("Documentation generated!")

# --------------------------------------------------------------------------
if __name__ == "__main__":
    main()
    idc.qexit(0)  # qexit lives in the idc module; a bare qexit() is undefined here

Tag: #工具脚本 (tool scripts)

Attachment: IDAPython_doc.zip (1.37MB, 37 downloads)

Latest replies (4)
VirtualCC (floor 2, 2024-12-3 09:23): Useful.
wandering (floor 3, 2024-12-3 10:09): Downloaded and using it, thanks!
墨穹呢 (floor 4, 2024-12-3 10:14): Thanks for sharing.
猫子 (floor 5, 2024-12-5 09:41, last edited 2024-12-5 09:42): https://github.com/HexRaysSA/IDAPython/tree/release/9.0 looks like this is the official address now.