[Original] Generating local IDAPython docs with the official tooling
Posted: 2024-11-25 11:11 | Views: 3182

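IDAPython is one of the most important tools in IDA: it lets users drive IDA from Python scripts to automate all kinds of tasks. However, the API differs considerably between IDAPython versions. Every new IDA release deprecates a batch of old interfaces and introduces new functions, which makes working with IDAPython heavily dependent on the documentation. The online documentation provided by Hex-Rays is slow to access, and for work reasons I often need it while offline, so I decided to generate an offline copy. After an evening of tinkering I finally got it working, and I am recording the process here for anyone with the same need.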

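My first idea was simply to mirror the official docs with HTTrack, but the links in the mirrored site were all broken and I had no desire to fix them by hand, so I gave up on that. After some searching I found the official repo, and in its tools/docs directory there is a file named hrdoc.py, which appears to be what the developers themselves use to generate the docs.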

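The script takes five arguments, as declared by its ArgumentParser: -o/--output, -m/--modules, -s/--include-source-for-modules, -x/--exclude-modules-from-searchable-index, and -v/--verbose. The first four are required.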
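
To make the five options concrete, here is a minimal sketch of how they parse. The option names are taken from the script's ArgumentParser; the sample values (output path, module names) are illustrative assumptions only, and inside IDA the argument list would come from idc.ARGV[1:] rather than a hard-coded list.

```python
from argparse import ArgumentParser


def build_parser() -> ArgumentParser:
    """Mirror the five options declared in hrdoc.py."""
    parser = ArgumentParser(prog="hrdoc.py")
    parser.add_argument("-o", "--output", required=True)
    parser.add_argument("-m", "--modules", required=True)
    parser.add_argument("-s", "--include-source-for-modules", required=True)
    parser.add_argument("-x", "--exclude-modules-from-searchable-index", required=True)
    parser.add_argument("-v", "--verbose", default=False, action="store_true")
    return parser


def parse_hrdoc_args(argv):
    """Parse argv and split the comma-separated options into lists."""
    args = build_parser().parse_args(argv)
    args.modules = args.modules.split(",")
    args.include_source_for_modules = args.include_source_for_modules.split(",")
    args.exclude_modules_from_searchable_index = (
        args.exclude_modules_from_searchable_index.split(","))
    return args


# Illustrative values only; inside IDA the list would be idc.ARGV[1:].
sample = parse_hrdoc_args([
    "-o", "docs/hr-html",
    "-m", "ida_pro,ida_kernwin,idc,idautils",
    "-s", "idc,idautils",
    "-x", "ida_allins",
])
print(sample.output, sample.modules)
```

Leaving out any of -o, -m, -s or -x makes argparse exit with an error, which is why the script cannot be started without arguments.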

The docs target of the makefile in the repo root shows how the script is meant to be invoked (excerpt below).

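In short: -o specifies the output directory, -m takes the IDAPython modules to document (separated by commas), and the values for -s and -x can be copied verbatim from the command in the makefile. Running the script as-is, however, runs into a number of pitfalls.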
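
As a rough illustration of what the makefile's docs target assembles, the sketch below rebuilds the script arguments in Python. The helper names are hypothetical, and the short module names ("pro", "kernwin") are assumptions chosen to match the makefile's commented-out test command; the real MODULES_NAMES list is defined elsewhere in the repo.

```python
def modules_argument(module_names, extra=("idc", "idautils")):
    """Build the value for -m the way the makefile's docs target does."""
    # $(foreach mod,...,ida_$(mod)): prefix each short name with "ida_"
    prefixed = ["ida_" + name for name in module_names]
    # $(sort ...): GNU make sorts the words and drops duplicates
    ordered = sorted(set(prefixed))
    # $(subst $(space),$(comma),...) plus the literal ",idc,idautils" tail
    return ",".join(ordered + list(extra))


def hrdoc_script_args(output_dir, module_names):
    """Return the script path plus arguments that follow IDA's -S switch."""
    return "tools/docs/hrdoc.py -o {out} -m {mods} -s idc,idautils -x ida_allins".format(
        out=output_dir, mods=modules_argument(module_names))


script_args = hrdoc_script_args("docs/hr-html", ["pro", "kernwin"])
print(script_args)
```

Starting with two modules, as the makefile's own comment suggests, keeps test runs fast before generating the full set.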

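The official doc-generation script depends on a third-party library, pdoc, which cannot simply be installed with pip here and has to be cloned instead. Create a third_party folder in the repository root (the directory name that pdoc_path in the script expects) and clone pdoc into it.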

The script also has to be modified to fix the import path problem.
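
The idea behind the path fix can be sketched as follows: resolve the repository root relative to the script, then append the pdoc checkout to sys.path. The function name and the existence check are my own additions for illustration, not part of hrdoc.py; the third_party_dir default follows the pdoc_path used in the script and must match the folder actually created when cloning.

```python
import os
import sys
import tempfile


def add_pdoc_to_path(script_file, third_party_dir="third_party"):
    """Append the cloned pdoc checkout to sys.path, relative to the script."""
    tools_docs_path = os.path.abspath(os.path.dirname(script_file))
    # tools/docs/hrdoc.py sits two levels below the repository root
    repo_root = os.path.abspath(os.path.join(tools_docs_path, "..", ".."))
    pdoc_path = os.path.join(repo_root, third_party_dir, "pdoc")
    if not os.path.isdir(pdoc_path):
        raise FileNotFoundError("pdoc checkout not found at %s" % pdoc_path)
    for path in (pdoc_path, tools_docs_path):
        if path not in sys.path:
            sys.path.append(path)
    return pdoc_path


# Demonstration against a throwaway directory tree that imitates the repo layout.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "tools", "docs"))
    os.makedirs(os.path.join(root, "third_party", "pdoc"))
    fake_script = os.path.join(root, "tools", "docs", "hrdoc.py")
    resolved = add_pdoc_to_path(fake_script)
    print(resolved)
```

Failing early with a clear message is easier to diagnose than an ImportError raised inside IDA's batch mode, where output is often redirected.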

The modified script is listed below.

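Because the script requires arguments, it cannot be run through Script file in the IDA GUI and has to be executed from the command line. My environment is macOS; other environments may need minor adjustments.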

First, change into the directory that contains the IDA executable.

Then run the command.
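
The exact command is not reproduced here, so the following is only a sketch of what such an invocation might look like, modelled on the makefile's docs target. The executable name "./idat", the script path under /path/to/idapython, and the use of -A for $(BATCH_SWITCH) are all assumptions; adjust them to your installation.

```python
import shlex


def idat_argv(idat_path, script_and_args, batch_switch="-A"):
    """Return an argv list for a headless run, following the makefile's docs target."""
    return [
        idat_path,
        batch_switch,            # assumed value of $(BATCH_SWITCH): autonomous mode, no dialogs
        "-S" + script_and_args,  # the shell form -S"script args" collapses into one argument
        "-t",                    # as in the makefile: start from an empty database
    ]


argv = idat_argv(
    "./idat",  # hypothetical executable name; it varies by IDA version and platform
    "/path/to/idapython/tools/docs/hrdoc.py -o docs/hr-html"
    " -m ida_pro,ida_kernwin -s idc,idautils -x ida_allins",
)
print(shlex.join(argv))
# To actually launch IDA: subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
```

Passing the script and its arguments as a single argv element mirrors what the shell does with the quoted -S"..." form in the makefile.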

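So far I have not found a tool that converts the files generated by pdoc directly into the docset format, so for now the only option is to import them with the html2dash tool.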

After that, import the result into Dash manually.
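
For readers curious what a docset actually contains, here is a minimal sketch that wraps a directory of HTML files in the standard Dash layout (Info.plist, a Documents folder, and a SQLite searchIndex table). This is not how html2dash works internally as far as I know; the function name and the choice to index each page as a "Module" entry are assumptions, and it indexes pages only, not individual functions.

```python
import os
import plistlib
import shutil
import sqlite3
import tempfile


def build_docset(html_dir, docset_path, name="IDAPython"):
    """Wrap a directory of HTML files in a minimal Dash docset; return pages indexed."""
    contents = os.path.join(docset_path, "Contents")
    resources = os.path.join(contents, "Resources")
    documents = os.path.join(resources, "Documents")
    if os.path.isdir(docset_path):
        shutil.rmtree(docset_path)
    shutil.copytree(html_dir, documents)  # also creates the parent directories

    info = {
        "CFBundleIdentifier": name.lower(),
        "CFBundleName": name,
        "DocSetPlatformFamily": name.lower(),
        "isDashDocset": True,
    }
    with open(os.path.join(contents, "Info.plist"), "wb") as f:
        plistlib.dump(info, f)

    db = sqlite3.connect(os.path.join(resources, "docSet.dsidx"))
    db.execute("CREATE TABLE searchIndex("
               "id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT)")
    db.execute("CREATE UNIQUE INDEX anchor ON searchIndex (name, type, path)")
    count = 0
    for root, _dirs, files in os.walk(documents):
        for fname in sorted(files):
            if not fname.endswith(".html"):
                continue
            rel = os.path.relpath(os.path.join(root, fname), documents)
            rel = rel.replace(os.sep, "/")
            entry = os.path.splitext(rel)[0].replace("/", ".")
            db.execute("INSERT OR IGNORE INTO searchIndex(name, type, path) "
                       "VALUES (?, ?, ?)", (entry, "Module", rel))
            count += 1
    db.commit()
    db.close()
    return count


# Demonstration on two placeholder pages in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    html_dir = os.path.join(tmp, "hr-html")
    os.makedirs(html_dir)
    for page in ("ida_pro.html", "idc.html"):
        with open(os.path.join(html_dir, page), "w", encoding="utf-8") as f:
            f.write("<html></html>")
    docset = os.path.join(tmp, "IDAPython.docset")
    indexed = build_docset(html_dir, docset)
    index_exists = os.path.isfile(
        os.path.join(docset, "Contents", "Resources", "docSet.dsidx"))
    print(indexed, index_exists)
```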

Finally, a pre-generated copy of the offline documentation is attached.

DOCS_MODULES=$(foreach mod,$(MODULES_NAMES),ida_$(mod))
SORTED_DOCS_MODULES=$(sort $(DOCS_MODULES))
docs:   tools/docs/hrdoc.py tools/docs/hrdoc.css
ifndef __NT__
    $(IDAT_CMD) $(BATCH_SWITCH) -S"tools/docs/hrdoc.py -o docs/hr-html -m $(subst $(space),$(comma),$(SORTED_DOCS_MODULES)),idc,idautils -s idc,idautils -x ida_allins" -t > /dev/null
#   $(IDAT_CMD) $(BATCH_SWITCH) -S"tools/docs/hrdoc.py -o docs/hr-html -m ida_pro,ida_kernwin -s idc,idautils -x ida_allins" -t > /dev/null  # use this one for testing (faster)
else
    $(R)ida -Stools/docs/hrdoc.py -t
endif
# run from the repository root; the folder name must match pdoc_path in the script
mkdir third_party
cd third_party
git clone https://github.com/pdoc3/pdoc
from __future__ import print_function
import os
import re  # needed by trim_docstring() in gen_lunr_search()
import sys
import shutil
import json
from glob import glob
from typing import Dict, List
from functools import lru_cache
 
tools_docs_path = os.path.abspath(os.path.dirname(__file__))
idapython_path = os.path.abspath(os.path.join(tools_docs_path, "..", ".."))
# idasrc_path = os.path.abspath(os.path.join(idapython_path, "..", "..", ".."))
idasrc_path = idapython_path
 
import idc
 
from argparse import ArgumentParser
parser = ArgumentParser()
parser.add_argument("-o", "--output", required=True)
parser.add_argument("-m", "--modules", required=True)
parser.add_argument("-s", "--include-source-for-modules", required=True)
parser.add_argument("-x", "--exclude-modules-from-searchable-index", required=True)
parser.add_argument("-v", "--verbose", default=False, action="store_true")
 
args = parser.parse_args(idc.ARGV[1:])
 
args.modules = args.modules.split(",")
args.include_source_for_modules = args.include_source_for_modules.split(",")
args.exclude_modules_from_searchable_index = args.exclude_modules_from_searchable_index.split(",")
 
try:
    # pdoc location
    pdoc_path = os.path.join(idasrc_path, "third_party", "pdoc")
    sys.path.append(pdoc_path)
    sys.path.append(tools_docs_path)  # for the custom epytext
    import pdoc
except ImportError as e:
    import traceback
    idc.msg("Couldn't import module %s\n" % traceback.format_exc())
    idc.qexit(-1)
 
# --------------------------------------------------------------------------
def gen_docs():
    sys.path.insert(0, os.path.join(idapython_path, "tools"))
    # trash existing doc
    if os.path.isdir(args.output):
        shutil.rmtree(args.output)
 
    # generate new doc
    build_documentation()
 
 
# --------------------------------------------------------------------------
# This is a ripoff of pdoc's cli.py, w/ minor adjustments
def gen_lunr_search(modules: List[pdoc.Module],
                          index_docstrings: bool,
                          template_config: dict):
    """Generate index.js for search"""
 
    def trim_docstring(docstring):
        return re.sub(r'''
            \s+|                   # whitespace sequences
            \s+[-=~]{3,}\s+|       # title underlines
            ^[ \t]*[`~]{3,}\w*$|   # code blocks
            \s*[`#*]+\s*|          # common markdown chars
            \s*([^\w\d_>])\1\s*|   # sequences of punct of the same kind
            \s*</?\w*[^>]*>\s*     # simple HTML tags
        ''', ' ', docstring, flags=re.VERBOSE | re.MULTILINE)
 
    def recursive_add_to_index(dobj):
        info = {
            'ref': dobj.refname,
            'url': to_url_id(dobj.module),
        }
        if index_docstrings:
            info['doc'] = trim_docstring(dobj.docstring)
        if isinstance(dobj, pdoc.Function):
            info['func'] = 1
        index.append(info)
        for member_dobj in getattr(dobj, 'doc', {}).values():
            recursive_add_to_index(member_dobj)
 
    @lru_cache()
    def to_url_id(module):
        url = module.url()
        if url not in url_cache:
            url_cache[url] = len(url_cache)
        return url_cache[url]
 
    index: List[Dict] = []
    url_cache: Dict[str, int] = {}
    for top_module in modules:
        recursive_add_to_index(top_module)
    urls = sorted(url_cache.keys(), key=url_cache.__getitem__)
 
    main_path = args.output
    with open(os.path.join(main_path, 'index.js'), "w", encoding="utf-8") as f:
        f.write("URLS=")
        json.dump(urls, f, indent=0, separators=(',', ':'))
        f.write(";\nINDEX=")
        json.dump(index, f, indent=0, separators=(',', ':'))
 
    # Generate search.html
    with open(os.path.join(main_path, 'doc-search.html'), "w", encoding="utf-8") as f:
        rendered_template = pdoc._render_template('/search.mako', **template_config)
        f.write(rendered_template)
 
 
# --------------------------------------------------------------------------
def build_documentation():
 
    # import all modules
    def docfilter(obj):
        # print("OBJ: %s" % str(obj))
        if obj.name in [
                "thisown",
                "SWIG_PYTHON_LEGACY_BOOL",
        ]:
            return False
        return True
 
    modules = []
    for module in args.modules:
        print("Loading: %s" % module)
        modules.append(pdoc.Module(module, docfilter=docfilter))
 
    print("  {} module{} in the list.".format(
          len(modules), "" if len(modules) == 1 else "s"))
 
    pdoc.link_inheritance()
 
    #
    # ida_*.html
    #
    pdoc.tpl_lookup.directories.insert(0, os.path.join(tools_docs_path, "templates"))
    show_source_code = set(args.include_source_for_modules)
 
    def all_modules(module_collection):
        for module in module_collection:
            yield module
 
            yield from all_modules(module.submodules())
 
    for module in all_modules(modules):
        module.obj.__docformat__ = "hr_epy"
 
        print("Processing: %s" % module.name)
        html = module.html(
            show_source_code=module.name in show_source_code,
            search_prefix=module.name)
 
        path = os.path.join(args.output, module.url())
        dirname = os.path.dirname(path)
        os.makedirs(dirname, exist_ok=True)
 
        print("Writing: %s" % path)
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
 
    #
    # doc-search.html, index.js
    #
    template_config = {}
    gen_lunr_search(
        [mod for mod in modules if mod.name not in args.exclude_modules_from_searchable_index],
        index_docstrings=True,
        template_config=pdoc._get_config(**template_config).get('lunr_search'))
 
    #
    # index.html
    #
    path = os.path.join(args.output, "index.html")
    class fake_module_t(object):
        def __init__(self, name, url):
            self.name = name
            self._url = url
        def url(self):
            return self._url
 
    index_module = fake_module_t("index", "index.html")
    html = pdoc._render_template('/index.mako', module=index_module, modules=modules)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
 
# --------------------------------------------------------------------------
def main():
    print("Generating documentation.....")
    gen_docs()
    print("Documentation generated!")
 
# --------------------------------------------------------------------------
if __name__ == "__main__":
    main()
    idc.qexit(0)

Latest replies (4)

#2 (2024-12-3 09:23)
Useful.

#3 (2024-12-3 10:09)
Downloaded and using it, thanks!

#4 (2024-12-3 10:14)
Thanks for sharing.

#5 (2024-12-5 09:41, last edited by 猫子 on 2024-12-5 09:42)
https://github.com/HexRaysSA/IDAPython/tree/release/9.0 This now looks like it should be the official address.