[Original] c-decompiler: an x86-based C/C++ decompiler
Posted: 2009-10-2 04:42 (135417 views)

Latest replies (220)

#76
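Thanks, hume, for the suggestions.
When I said no asm->c tool could be found, I meant apart from the four open-source decompilers I mentioned earlier: dcc, boomerang, exetoc and disc_dos. The sourcer that rockinuk mentioned is also just a disassembler.

For boomerang, I used the boomerang-win32-alpha-0.3.1 release.
Boomerang is a general-purpose decompiler that aims to support many processors and all compilers. That goal is extremely hard, but its algorithms are general and worth learning from.
Personally, I think a decompiler today should target specific, known compilers. After all, there are only a handful of mainstream compilers on the market, so this approach is far more feasible. Programs written in assembly don't need to be decompiled into C at all; they can be emitted directly as inline assembly. The detection criterion is that the code's characteristics don't match any known compiler: anything that doesn't match is handled as inline assembly. C++ behavior is even more closely tied to the compiler, so it also has to be handled per compiler.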
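A minimal sketch of the "anything that doesn't match a known compiler becomes inline assembly" rule, in C. It assumes that a function's first bytes are enough to recognise compiler output, which is a strong simplification (optimised code often omits the frame pointer, and hand-written assembly can use the same prologue). `Strategy`, `KNOWN_PROLOGUES` and `choose_strategy` are hypothetical names, and only two well-known x86 prologues are listed.

```c
#include <stddef.h>
#include <string.h>

/* What the decompiler does with a function. */
typedef enum { EMIT_C, EMIT_INLINE_ASM } Strategy;

/* Hypothetical signature table: byte sequences a known compiler emits at the
 * start of a function. A real table would be per compiler and per version. */
typedef struct {
    const char *what;
    const unsigned char *bytes;
    size_t len;
} Prologue;

static const unsigned char FRAME_PROLOGUE[]    = {0x55, 0x8B, 0xEC};             /* push ebp; mov ebp, esp */
static const unsigned char HOTPATCH_PROLOGUE[] = {0x8B, 0xFF, 0x55, 0x8B, 0xEC}; /* mov edi, edi; push ebp; mov ebp, esp */

static const Prologue KNOWN_PROLOGUES[] = {
    {"frame-pointer prologue", FRAME_PROLOGUE, sizeof FRAME_PROLOGUE},
    {"MSVC hot-patchable prologue", HOTPATCH_PROLOGUE, sizeof HOTPATCH_PROLOGUE},
};

/* Emit C only when the function starts like known compiler output;
 * everything else is kept as inline assembly. */
Strategy choose_strategy(const unsigned char *code, size_t len)
{
    for (size_t i = 0; i < sizeof KNOWN_PROLOGUES / sizeof KNOWN_PROLOGUES[0]; i++) {
        const Prologue *p = &KNOWN_PROLOGUES[i];
        if (len >= p->len && memcmp(code, p->bytes, p->len) == 0)
            return EMIT_C;
    }
    return EMIT_INLINE_ASM;
}
```

In practice the signature check would look at much more than the prologue (calling-convention patterns, runtime library calls, stack-probe helpers), but the fallback policy stays the same.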
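As you said, C++ still has many unsolved problems. As engineers, we can break them down into smaller pieces and tackle them one at a time. We don't need a general solution; targeting one compiler and solving part of the problem is fine.
For example, a function template can't be restored to the original template, but representing it with several functions doesn't get in the way of reading and understanding the code, or of recompiling and relinking it.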
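For instance, suppose the original source had a template like `template <class T> T max_of(T a, T b)` (a hypothetical example) and the program used it with `int` and `double`. The binary then contains two separate functions, and the decompiler could simply emit them as two ordinary C functions; the names below are made up for illustration:

```c
/* Hypothetical decompiler output for a template such as
 *     template <class T> T max_of(T a, T b) { return a > b ? a : b; }
 * instantiated for int and double. The template itself is gone;
 * each instantiation becomes its own ordinary function. */
int max_of_int(int a, int b)
{
    return a > b ? a : b;
}

double max_of_double(double a, double b)
{
    return a > b ? a : b;
}
```

The result recompiles and links exactly like the original instantiations did; only the generic form is lost.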

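For problems like these that theory hasn't solved well yet, the return will certainly not be proportional to the investment. But doing the engineering also builds a practical platform for theoretical research, and the problems can always be solved eventually.
The investment will be large, but once it succeeds, it's a good tool you can use for at least a decade! It would push software technology forward as a whole and break down technical barriers.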

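R&D is risky. I'm not sure I can build such a product; I'm just giving it a try. But with so many dedicated researchers here on Kanxue, someone is bound to build it.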
2009-10-9 21:26

#77
A fascinating topic.
2009-10-9 21:51

#78
Welcome to this fascinating field.
2009-10-9 23:07

#79
Hats off to the OP for such painstaking dedication!
2009-10-9 23:46

#80
Could someone upload a PDF of the paper "Static Single Assignment for Decompilation"? The original link no longer works. Thanks!
2009-10-10 01:27

#81
It's about 2.x MB compressed. Please download it and take a look.
Attachments:
2009-10-10 01:54

#82
Impressive, really impressive.
2009-10-10 02:59

#83
RelogixTM Assembler-to-C translator
Attachments:
2009-10-10 07:37

#84
http://www.textmaestro.com/InfoEx_17_Convert_Assembly.htm

Example 17: Convert Assembly Code to C

Introduction: If you were to ask us, "Why did you create TextMaestro?", we would simply reply: to convert Assembly code. All other features here evolved out of this. No other tool out there can handle the hugely repetitive task of rewriting Assembly code into corresponding C code like TextMaestro does. TextMaestro can handle any type of Assembly language, including Alpha, Intel-8086, Motorola, and MIPS. All you need is a library for each kind.

"Why would somebody convert Assembly code?" you might ask. The answer is simple. Vast amounts of critical code have been written in Assembly. There comes a time when porting this code to new hardware or a new operating system becomes both a necessity and a nightmare.

However, developers embarking on such a project will soon find themselves entangled in two formidable problems: 1. They do not know the nuances of the Assembly language well enough, 2. They are not intimately acquainted with the algorithms behind the code. This leads to a state where reverse engineering becomes almost impossible.

Remember, learning Assembly instructions by picking up a book is not difficult. Being able to juggle those instructions and express one's thoughts elegantly takes a lifetime.

With TextMaestro, you have the capability to transliterate legacy Assembly code to C without knowing Assembly in depth and without knowing the algorithms crafted in the code. After you have the complete code in C, you can readily port it to a new platform or environment (just because it's in C). Then you can comfortably begin the process of reverse engineering from a position of strength.

With that said, a cautionary note is needed. TextMaestro has no magic formula to perform this feat. You, the user, prepare a library by studying the original code. You will need to put forth a good amount of effort to bring the library up to a working stage.

Developing the library is an iterative process. We provide various libraries in our Repository section (which is under development). When it comes to converting Assembly code, the provided libraries do not guarantee completeness. It is almost certain that you will need to enhance the provided library. That's where we can provide additional assistance with our customized service. Below we provide a simple step-by-step example.
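To illustrate what such a per-language "library" might contain, here is a minimal table-driven sketch in C that maps two-operand x86 instructions to C statements. The rule format and the names `Rule`, `RULES` and `transliterate` are hypothetical; this is not TextMaestro's actual library format. Lines no rule matches are reported back, which is where a user would extend the table.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical rule: a mnemonic and a C template taking (dst, src). */
typedef struct {
    const char *mnemonic;
    const char *c_template;
} Rule;

static const Rule RULES[] = {
    {"mov", "%s = %s;"},
    {"add", "%s += %s;"},
    {"sub", "%s -= %s;"},
    {"and", "%s &= %s;"},
    {"or",  "%s |= %s;"},
    {"xor", "%s ^= %s;"},
};

/* Transliterate one line of the form "op dst, src" into out.
 * Returns 1 on success, 0 if the line has another shape or no rule
 * matches (the user would then add a rule to the library). */
int transliterate(const char *line, char *out, size_t outsz)
{
    char op[16], dst[32], src[32];
    if (sscanf(line, " %15s %31[^,], %31s", op, dst, src) != 3)
        return 0;
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++) {
        if (strcmp(op, RULES[i].mnemonic) == 0) {
            snprintf(out, outsz, RULES[i].c_template, dst, src);
            return 1;
        }
    }
    return 0;
}
```

For example, `add eax, ebx` becomes `eax += ebx;`, while `push eax` is left unmatched. A purely textual approach like this never tracks data flow, which is why the library has to be tuned to the code being converted.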
Attachments:
2009-10-10 08:00

#85
RelogixTM Assembler-to-C translator

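I just took a look, and the generated code is very clean. However, that is largely thanks to the rich information in the asm source,
such as variable names, structures, function pointers and so on.
That lets it skip many key steps, such as variable identification, tracking how variables are passed, data type recovery and recovery of complex data types.
All of these belong to data flow analysis, the most critical part of decompilation.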
2009-10-10 09:37

#86
Open-source IDA decompiler plugin:

http://desquirr.sourceforge.net/desquirr/

Introduction

November 13, 2003 Binary for IDA Pro 4.6 kindly contributed by Joe Stewart. No code improvements.

May 7, 2003 Binary for IDA Pro 4.5. No code improvements.

October 21, 2002 Early support for ARM machine code and a binary for IDA Pro 4.3

June 20, 2002 Desquirr is now available for download!

Desquirr is a decompiler plugin for Interactive Disassembler Pro. It is currently capable of simple data flow analysis of binaries with Intel x86 machine code.

This program is currently under development. Suggestions, bug reports and patches are welcome.

See Downloads for documentation, binary and source code.
2009-10-10 09:39

#87
There's not only asm2c but also bin2c.
If you spend a bit more time searching, you should be able to find them.
2009-10-10 09:54

#88
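None of the latest doctoral or master's theses even mention it...
It's possible that some organizations have such tools but haven't made them public.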
2009-10-10 10:03

#89
/****************************************/
/*      ASM to C Hex - Converter        */
/*      ''''''''''''''''''''''''        */
/*             22.05.2005               */
/*    CopyLeft 2005 by Jonas Gehring    */
/****************************************/

Usage: asmtochex [flag] <type> <infile> <outfile>

         flags: -quiet    -  Don't output standard messages
                -noarr    -  Don't use standard array format

         types: -uchar    -  Convert to unsigned char array
                -ushort   -  Convert to unsigned short array
                -ulong    -  Convert to unsigned long array
      
                infile    -  Name of input file
                outfile   -  Name of output file

                Converts 68k ASM data to C Hex arrays
                of given type.

ASM to C Hex is a comfortable data converter to convert 68k ASM data
into C Hex arrays of choosable format.

The output file can be directly copied (or included) into a TIGCC project
(or any other C project). The only thing to change might be the array
name ('data' is default).

If you want to convert the hex data directly into a binary (e.g. with
the TIGCC Tools Suite by TI-Chess Team), you may use the flag -noarr.
By doing so, the output file contains only the real data and the
comments of the ASM file.

I wrote this program as a tool for my RPG project "Shadow Falls".
When working on it, I had to convert LARGE tile arrays and maps,
because CalcGS by Rusty Wagner only gives output in 68k ASM format.
This converting always took some time and was very boring, so this
tool was very helpful for me.

Contact me:
        e-mail: saubue@mobifiles.de
        WWW:    saubue68k.de.vu
        
        

History:
        22.05.2005             v1.2        - Added quiet-flag
                                           - Added noarr-flag

        13.05.2005             v1.1        - Fixed counting of tiles (before, the ","s of the
                                               ASM data were counted, now the "$"s are counted
                                               and then divided with the type)
                                           - Changed time output to X.XXX seconds

        12.05.2005             v1.0        - First version
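The core conversion can be sketched in a few lines of C. This is not the tool's source: it assumes each value in a 68k data line is written as `$` followed by hex digits (as in `dc.w $0001,$00FF`) and, like the v1.1 note above, counts values by their `$` prefix. `asm_line_to_c_hex` is a hypothetical name, and a `$` inside a comment would be picked up too.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Convert the $-prefixed hex values of one 68k data line into C hex
 * literals, e.g. "dc.w $0001,$00FF" -> "0x0001, 0x00FF, ".
 * Writes the result to out (always NUL-terminated when outsz > 0) and
 * returns the number of values converted. */
int asm_line_to_c_hex(const char *line, char *out, size_t outsz)
{
    int count = 0;
    size_t used = 0;
    if (outsz == 0)
        return 0;
    out[0] = '\0';
    for (const char *p = strchr(line, '$'); p != NULL; p = strchr(p, '$')) {
        p++;                                   /* skip the '$' prefix */
        size_t ndig = 0;
        while (isxdigit((unsigned char)p[ndig]))
            ndig++;
        if (ndig == 0)
            continue;                          /* lone '$', not a value */
        int w = snprintf(out + used, outsz - used, "0x%.*s, ", (int)ndig, p);
        if (w < 0 || (size_t)w >= outsz - used) {
            out[used] = '\0';                  /* drop the truncated literal */
            break;                             /* output buffer is full */
        }
        used += (size_t)w;
        count++;
        p += ndig;
    }
    return count;
}
```

A trailing comma after the last element is valid in a C initializer list, so the output can be pasted straight between the braces of an array such as `unsigned short data[] = { ... };`.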
Attachments:
2009-10-10 11:57

#90
Thanks to rockinuk for providing so much material.
2009-10-10 15:18

#91
Nice work, rockinuk. Thanks!
2009-10-10 15:41

#92
This can come in handy sometimes.
2009-10-10 15:46

#93
Is this the legendary expert?
2009-10-10 17:01

#94
Uploading 逆向C++.pdf ("Reversing C++") for everyone to download.
Attachments:
2009-10-10 19:57

#95
What does everyone think about reversing C++? Let's discuss.
2009-10-10 20:03

#96
This thing is the real deal!

I'll download it once you've released a few more versions.
2009-10-10 20:45

#97
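To get the ball rolling:
For the _cdecl calling convention, how do we determine the number of parameters a function takes?
Because the caller is responsible for pushing the arguments and restoring the stack, the parameter count can be inferred from that behavior.
For example: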
push eax ; push parameter
call function_a ; call function A
pop ecx ; restore stack pointer
push eax ; push parameter
call function_b ; call function B
pop ecx ; restore stack pointer
Functions a and b each take 1 parameter (pop ecx releases 4 bytes).

push eax ; push second parameter
push ebx ; push first parameter
call function_c ; call function C
add esp, 8 ; restore stack for call C
push eax ; push second parameter
push ebx ; push first parameter
call function_d ; call function D
add esp, 8 ; restore stack for call D

Functions c and d each take 2 parameters (add esp, 8 releases 8 bytes).
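The rule above can be sketched in C: read the cleanup instruction that follows a call and divide the bytes it releases by 4. This assumes 32-bit code where every argument takes exactly one 4-byte stack slot (a `double` or a struct passed by value would break that) and instruction text already normalised to lowercase Intel syntax; `cdecl_arg_count` and its helpers are hypothetical names.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse an immediate written as "8", "0x10" or "10h".
 * Returns -1 if the text is not a number. */
static long parse_imm(const char *s)
{
    char *end;
    while (isspace((unsigned char)*s))
        s++;
    size_t n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        n--;
    if (n == 0)
        return -1;
    /* MASM-style hex such as 0Ch or 10h: must start with a digit */
    if (n >= 2 && isdigit((unsigned char)s[0]) && (s[n - 1] == 'h' || s[n - 1] == 'H')) {
        long v = strtol(s, &end, 16);
        return end == s + n - 1 ? v : -1;
    }
    int base = (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) ? 16 : 10;
    long v = strtol(s, &end, base);
    return end == s + n ? v : -1;
}

/* Bytes released by a caller-cleanup instruction ("pop ecx", "add esp, 8").
 * Returns -1 for any other instruction. */
static long cleanup_bytes(const char *insn)
{
    while (isspace((unsigned char)*insn))
        insn++;
    if (strncmp(insn, "pop ", 4) == 0)
        return 4;                              /* pops one 32-bit slot */
    if (strncmp(insn, "add esp,", 8) == 0)
        return parse_imm(insn + 8);
    return -1;
}

/* Number of 4-byte stack arguments implied by the cleanup after a call,
 * or -1 if the instruction is not a recognisable cleanup. */
int cdecl_arg_count(const char *cleanup_insn)
{
    long bytes = cleanup_bytes(cleanup_insn);
    if (bytes <= 0 || bytes % 4 != 0)
        return -1;
    return (int)(bytes / 4);
}
```

A cleanup spread over several instructions (for example two `pop ecx` in a row) would have to be summed before dividing.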

But the following case is harder:
push eax ; push second parameter
push ebx ; push first parameter
call function_c ; call function C
push eax ; push second parameter
push ebx ; push first parameter
call function_d ; call function D
add esp, 10h ; restore stack for both function call C and D

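A simple, brute-force approach:
Record the push depth before each call, then scan forward through the instructions until the stack is balanced. Compare the push depth with the number of bytes restored and take the smaller of the two as the size of the parameter area.
For function_d, 8 bytes are pushed and 16 are restored, so its parameter area is 8 bytes: 2 parameters.
For function_c, the parameter area is 16 - 8 = 8 bytes: also 2 parameters.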
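One way to generalise this rule to any number of calls sharing a single cleanup, sketched in C: walk the pending calls from the most recent one backwards, give each the smaller of its pushed bytes and the bytes still unaccounted for, and subtract. For two calls this reproduces the function_c/function_d result. The `PendingCall` struct and `split_cleanup` are hypothetical, and the sketch assumes `pushed` counts only the bytes pushed since the previous call.

```c
#include <stddef.h>

/* One call in a straight-line run that ends with a shared stack cleanup. */
typedef struct {
    const char *name;
    int pushed;      /* bytes pushed since the previous call */
    int arg_bytes;   /* result: bytes attributed to this call's arguments */
} PendingCall;

/* Distribute a cleanup of `cleanup` bytes over the pending calls, most
 * recent first: each call gets min(pushed, bytes left) ("take the smaller").
 * Returns the bytes that could not be attributed (0 when the stack balances). */
int split_cleanup(PendingCall *calls, size_t n, int cleanup)
{
    int left = cleanup;
    for (size_t i = n; i-- > 0;) {
        int take = calls[i].pushed < left ? calls[i].pushed : left;
        calls[i].arg_bytes = take;
        left -= take;
    }
    return left;
}
```

This also hints at one failure case: if an argument of a later call is pushed before an earlier call is made (for instance an outer call's right-hand argument pushed before an inner call is evaluated), those bytes are counted against the wrong call.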

The following example can be analyzed the same way:
    401027 56                  PUSH                esi
    401028 E893000000          CALL                proc_1
    40102D 8B442438            MOV                 eax, dword ptr [esp + 0x38]
    401031 50                  PUSH                eax
    401032 56                  PUSH                esi
    401033 E818010000          CALL                proc_2
    401038 83C40C              ADD                 esp, 0Ch
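Working the rule through this listing, assuming nothing pushed before 401027 belongs to these calls: 4 bytes are pushed before proc_1 (push esi), 8 bytes before proc_2 (push eax, push esi), and the shared cleanup add esp, 0Ch releases 12 bytes, again with one 4-byte slot per parameter.

```latex
\begin{aligned}
\texttt{proc\_2}:&\quad \min(8,\,12) = 8 \text{ bytes} \;\Rightarrow\; 8/4 = 2 \text{ parameters}\\
\texttt{proc\_1}:&\quad 12 - 8 = 4 \text{ bytes} \;\Rightarrow\; 4/4 = 1 \text{ parameter}
\end{aligned}
```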

Does anyone have a better approach? Let's discuss. Also, in which situations would this method fail?
2009-10-10 22:28

#98
http://www.totalembedded.com/open_source/jtag/mips32_ejtag/daemon/asmtoc.perl
#
# ASMTOC.PL
#
# Takes a file that is an actual listing from the cross assembler and
# makes a C file with a structure containing all the machine code from
# the assembly file.
#
# Copyright (c) 2002, Jason Riffel - TotalEmbedded LLC.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without 
# modification, are permitted provided that the following conditions 
# are met:
#
# Redistributions of source code must retain the above copyright 
# notice, this list of conditions and the following disclaimer. 
#
# Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in 
# the documentation and/or other materials provided with the
# distribution. 
#
# Neither the name of TotalEmbedded nor the names of its 
# contributors may be used to endorse or promote products derived 
# from this software without specific prior written permission.
# 
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 
# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
# COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 
# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 
# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 
# POSSIBILITY OF SUCH DAMAGE.
#

$filename = $ARGV[0]; 

# Print out the C header file
print "//\n";
print "// $filename.c\n";
print "//\n";
print "// DO NOT EDIT THIS FILE.  This file was generated automatically\n";
print "// by a script that converts an assembler listing into a C structure\n";
print "// containing the hex values of the instructions in the listing.\n";
print "// You must edit the assembler source directly and execute the build\n";
print "// again.\n";
print "//\n\n";
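# Derive a valid C identifier from the file name (e.g. "boot.lst" -> "boot_lst")
($ident = $filename) =~ s/\W/_/g;
print "unsigned int aui_$ident\_code\[\] = {\n";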

while(<STDIN>)
{
  $line = $_;
  $line =~ s/\t/ /g;
  $line =~ s/^..........//;
  if ($line =~ m/^[0-9A-Fa-f]{8,8}/)
  {
    print "  0x";
    print $&;
    print ",  // ";
    $line = $';
    $line =~ s/^ +//;
    print $line;
  }
  else
  {
    $line =~ s/^ +//;
    print "               // $line";
  }
}

print "  0x00000000}; // <- Inserted by script to terminate array.\n\n";



2009-10-11 01:39

#99
Great stuff, bookmarked!
2009-10-11 20:56

#100
Very powerful. Looking forward to seeing the full version soon!
2009-10-12 10:17