Reverse Engineering: Classes Restoration
Posted: 2004-11-17 14:59

Classes Restoration   
  

Here is the first article on our site. It’s written by our reverse engineer and is, in fact, a kind of lesson for reversers.

This piece of our company’s experience may be useful to you, which is why this text is placed here.

If you would like to use our services in the Reverse Engineering area, please write to info@apriorit.com to discuss your task.

There are some misconceptions regarding reverse engineering (its legality, ethics and so on); you can clear them up for yourself here, and find much more here.

Class restoration is a complicated procedure that requires knowledge of OOP and of the way OOP is implemented by the specific compiler.

  

Our task is to recover a class, its methods and its members. Let’s begin with Delphi, because it’s relatively easy to find a class there.

  

Class restoration begins with finding the constructor, because that is where the memory for the object is allocated, and it also gives us some insight into what the object consists of.

  

It’s easy to find a constructor in Delphi: we just need to look for a string in which the class name occurs. For example, for TList the following structure can be found:

  

CODE:0040D598 TList           dd offset TList_VTBL

CODE:0040D59C                 dd 7 dup(0)

CODE:0040D5B8                 dd offset aTlist        ; "TList"

CODE:0040D5BC SizeOfObject    dd 10h

CODE:0040D5C0                 dd offset off_4010C8

CODE:0040D5C4                 dd offset TObject::SafeCallException

CODE:0040D5C8                 dd offset nullsub_8

CODE:0040D5CC                 dd offset TObject::NewInstance

CODE:0040D5D0                 dd offset TObject::FreeInstance

CODE:0040D5D4                 dd offset sub_40EA08

CODE:0040D5D8 TList_VTBL           dd offset TList::Grow

CODE:0040D5DC                 dd offset unknown_libname_107

CODE:0040D5E0 aTlist          db 5,'TList'

  

This is, so to speak, an ‘object descriptor’. A pointer to it is passed to the constructor, which takes from it the data required to create the object. Using the XREFs to 40D598 we can find all the places where the constructor is called. Here is an example of one such call:

  

CODE:0040E72E                 mov     eax, ds:TList

CODE:0040E733                 call    CreateClass

CODE:0040E738                 mov     ds:dword_4A45F8, eax

  

We named the constructor function ourselves. We can check whether it really is CreateClass by looking at the contents of the function:

  

CODE:00402F48 CreateClass     proc near               ; CODE XREF: @BeginGlobalLoading+17p

CODE:00402F48                                         ; @CollectionsEqual+48p ...

CODE:00402F48                 test    dl, dl

CODE:00402F4A                 jz      short loc_402F54

CODE:00402F4C                 add     esp, 0FFFFFFF0h

CODE:00402F4F                 call    __linkproc__ ClassCreate

CODE:00402F54

CODE:00402F54 loc_402F54:                             ; CODE XREF: CreateClass+2j

CODE:00402F54                 test    dl, dl

CODE:00402F56                 jz      short locret_402F62

CODE:00402F58                 pop     large dword ptr fs:0

CODE:00402F5F                 add     esp, 0Ch

CODE:00402F62

CODE:00402F62 locret_402F62:                          ; CODE XREF: CreateClass+Ej

CODE:00402F62                 retn

CODE:00402F62 CreateClass     endp

That is, if the function contains a call to __linkproc__ ClassCreate, it’s a constructor. Now we can look at exactly how the class is created:

  

CODE:00403200 __linkproc__ ClassCreate proc near      ; CODE XREF: CreateClass+7p

CODE:00403200                                         ; sub_40AA58+Ap ...

CODE:00403200

CODE:00403200 arg_0           = dword ptr  10h

CODE:00403200

CODE:00403200                 push    edx

CODE:00403201                 push    ecx

CODE:00403202                 push    ebx

CODE:00403203                 call    dword ptr [eax-0Ch]

CODE:00403206                 xor     edx, edx

CODE:00403208                 lea     ecx, [esp+arg_0]

CODE:0040320C                 mov     ebx, fs:[edx]

CODE:0040320F                 mov     [ecx], ebx

CODE:00403211                 mov     [ecx+8], ebp

CODE:00403214                 mov     dword ptr [ecx+4], offset loc_403225

CODE:0040321B                 mov     [ecx+0Ch], eax

CODE:0040321E                 mov     fs:[edx], ecx

CODE:00403221                 pop     ebx

CODE:00403222                 pop     ecx

CODE:00403223                 pop     edx

CODE:00403224                 retn

CODE:00403224 __linkproc__ ClassCreate endp

  

So, the instruction

  

CODE:0040E72E mov eax, ds:TList

  

loads into EAX the contents stored at the address TList, i.e. the pointer TList_VTBL. Since this is Delphi, Borland’s __fastcall convention is used (parameters are passed in the following order: EAX, EDX, ECX, stack...). This means that the pointer to the virtual method table is passed to CreateClass as its first parameter. After that EAX is not modified and reaches __linkproc__ ClassCreate, where we see:

CODE:00403203                 call    dword ptr [eax-0Ch]

  

Where does this call go? The pointer TList_VTBL = 0x40D5D8 is still in EAX. 0x40D5D8 - 0xC = 0x40D5CC, and that address holds

  

CODE:0040D5CC                 dd offset TObject::NewInstance

  

This is the ancestor’s constructor, so TList inherits from TObject. Let’s look deeper:

  

CODE:00402F0C TObject::NewInstance proc near          ; DATA XREF: CODE:004010FCo

CODE:00402F0C                                         ; CODE:004011DCo ...

CODE:00402F0C                 push    eax

CODE:00402F0D                 mov     eax, [eax-1Ch]

CODE:00402F10                 call    __linkproc__ GetMem

CODE:00402F15                 mov     edx, eax

CODE:00402F17                 pop     eax

CODE:00402F18                 jmp     TObject::InitInstance

CODE:00402F18 TObject::NewInstance endp

  

The value of EAX is still the same, so 0x40D5D8 - 0x1C = 0x40D5BC.
Thus the object size, which is stored at 0x40D5BC, is passed to GetMem:

  

CODE:0040D5BC SizeOfObject    dd 10h

  

So the total size of an object instance is 0x10 bytes.
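The address arithmetic above can be replayed with a small sketch. It is only an illustration: the MEMORY dictionary is a toy map filled by hand from the descriptor listing, function pointers are kept as symbolic names, and the helper names read_dword and vmt_field are made up for this example.

```python
# Toy memory map copied by hand from the TList descriptor listing.
# Function pointers are kept as symbolic names (an assumption made for
# readability; the real cells hold 32-bit addresses).
MEMORY = {
    0x40D598: 0x40D5D8,                 # TList -> TList_VTBL
    0x40D5BC: 0x10,                     # SizeOfObject
    0x40D5CC: "TObject::NewInstance",
    0x40D5D0: "TObject::FreeInstance",
    0x40D5D8: "TList::Grow",            # first virtual method slot
}

def read_dword(address):
    """Return the value stored at 'address' in the toy memory map."""
    return MEMORY[address]

def vmt_field(class_ref, offset):
    """Load the VTBL pointer from the class reference, then read VTBL + offset."""
    vtbl = read_dword(class_ref)        # mov eax, ds:TList
    return read_dword(vtbl + offset)    # [eax + offset]

new_instance = vmt_field(0x40D598, -0x0C)    # call dword ptr [eax-0Ch]
instance_size = vmt_field(0x40D598, -0x1C)   # mov eax, [eax-1Ch]
print(new_instance, hex(instance_size))
```

Both lookups start from the same VTBL pointer; only the negative offset differs.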

  

The function TObject::InitInstance doesn’t do anything special: it just fills the object members with zeros and stores the pointer to the VTBL in the newly created instance. Then CreateClass returns, with the pointer to the object instance in EAX. That’s why a constructor call looks like this:

  

CODE:0040E72E                 mov     eax, ds:TList

CODE:0040E733                 call    CreateClass

CODE:0040E738                 mov     ds:dword_4A45F8, eax
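As a rough model of what the caller receives in EAX, the sketch below imitates the allocation and initialisation described above. It is an assumption-laden simplification: the function name new_instance is invented here, and a Python bytearray stands in for the GetMem block (it also happens to provide the zero fill in one step).

```python
import struct

def new_instance(vtbl_address, instance_size):
    """Model of NewInstance + InitInstance as described in the text."""
    obj = bytearray(instance_size)                 # allocate and zero-fill the block
    struct.pack_into("<I", obj, 0, vtbl_address)   # store the VTBL pointer at offset 0
    return obj

# Values taken from the listings: TList_VTBL = 0x40D5D8, SizeOfObject = 0x10.
obj = new_instance(0x40D5D8, 0x10)
vtbl, member1, member2, member3 = struct.unpack("<4I", obj)
print(hex(vtbl), member1, member2, member3)
```

The three zero dwords after the VTBL pointer are the members that still have to be identified.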

  

Restoration of the object structure

  

We already know the object size: 0x10, of which 0x4 bytes are taken by the pointer to the VTBL. The remaining 0xC bytes contain the object members, so we need to find them. Some intuition is required here. Objects are not created for no reason, and members can be filled either in the constructor (fully or partly) or after creation by Set-methods. Our TList is filled with zeros in the constructor through rep stosd (in TObject::InitInstance), so the constructor gives no information about the class members. Let’s therefore trace the object’s life cycle after creation.

  

In our example the pointer to the class instance is stored in the global variable dword_4A45F8 (it appears as pTList in the listings below). So we can simply set a breakpoint on reads from dword_4A45F8 and watch how the object methods are called. The first hit:

  

CODE:0041319D mov     eax, [ebp+var_4]

CODE:004131A0 mov     edx, ds:pTList

CODE:004131A6 mov     [eax+30h], edx  ; copied a pointer to the instance of an object
CODE:004131A9 jmp     short loc_4131BD

.............

CODE:004131BD

CODE:004131BD loc_4131BD:                             ; CODE XREF: sub_4130BC+EDj

CODE:004131BD xor     eax, eax

CODE:004131BF push    ebp

CODE:004131C0 push    offset loc_413276

CODE:004131C5 push    dword ptr fs:[eax]

CODE:004131C8 mov     fs:[eax], esp

CODE:004131CB mov     eax, [ebp+var_4]

CODE:004131CE mov     edx, [eax+18h]

CODE:004131D1 mov     eax, [ebp+var_4]

CODE:004131D4 mov     eax, [eax+30h] ; implicit passing of the pointer to the object itself

CODE:004131D7 call    Classes::TList::Add(void *)

  

Now look into Classes::TList::Add:

  

CODE:0040EA28 __fastcall Classes::TList::Add(void *) proc near

CODE:0040EA28                                         ; CODE XREF: @RegisterClass+9Bp

CODE:0040EA28                                         ; @RegisterIntegerConsts+20p ...

CODE:0040EA28 push    ebx

CODE:0040EA29 push    esi

CODE:0040EA2A push    edi

CODE:0040EA2B mov     edi, edx

CODE:0040EA2D mov     ebx, eax ; the This pointer

CODE:0040EA2F mov     esi, [ebx+8] ; reads object member 2 (offset 8)

CODE:0040EA32 cmp     esi, [ebx+0Ch] ; reads object member 3 (offset 0Ch)

CODE:0040EA35 jnz     short loc_40EA3D

CODE:0040EA37 mov     eax, ebx

CODE:0040EA39 mov     edx, [eax] ; reads TList->pVTBL

CODE:0040EA3B call    dword ptr [edx]

CODE:0040EA3D

CODE:0040EA3D loc_40EA3D:                             ; CODE XREF: Classes::TList::Add(void *)+Dj

CODE:0040EA3D mov     eax, [ebx+4] ; reads object member 1 (offset 4)

CODE:0040EA40 mov     [eax+esi*4], edi

CODE:0040EA43 inc     dword ptr [ebx+8]

CODE:0040EA46 mov     eax, esi

CODE:0040EA48 pop     edi

CODE:0040EA49 pop     esi

CODE:0040EA4A pop     ebx

CODE:0040EA4B retn

CODE:0040EA4B __fastcall Classes::TList::Add(void *) endp

  

So the last 3 members have been found. Each of them is 4 bytes in size. To simplify work with classes in IDA Pro we use structures; classes are just structures, after all.

After applying the following structure:

00000000 TList_obj struc ; (sizeof=0X10)

00000000 pVTBL dd ?

00000004 Property1 dd ?

00000008 Property2 dd ?

0000000C Property3 dd ?

00000010 TList_obj ends

  

things become clearer:

CODE:0040EA28 __fastcall Classes::TList::Add(void *) proc near

CODE:0040EA28                                         ; CODE XREF: @RegisterClass+9Bp

CODE:0040EA28                                         ; @RegisterIntegerConsts+20p ...

CODE:0040EA28 push    ebx

CODE:0040EA29 push    esi

CODE:0040EA2A push    edi

CODE:0040EA2B mov     edi, edx

CODE:0040EA2D mov     ebx, eax

CODE:0040EA2F mov     esi, [ebx+TList_obj.Property2]

CODE:0040EA32 cmp     esi, [ebx+TList_obj.Property3]

CODE:0040EA35 jnz     short loc_40EA3D

CODE:0040EA37 mov     eax, ebx

CODE:0040EA39 mov     edx, [eax+TList_obj.pVTBL]

CODE:0040EA3B call    dword ptr [edx] ;TList::Grow

CODE:0040EA3D

CODE:0040EA3D loc_40EA3D:                             ; CODE XREF: Classes::TList::Add(void *)+Dj

CODE:0040EA3D mov     eax, [ebx+TList_obj.Property1]

CODE:0040EA40 mov     [eax+esi*4], edi

CODE:0040EA43 inc     [ebx+TList_obj.Property2]

CODE:0040EA46 mov     eax, esi

CODE:0040EA48 pop     edi

CODE:0040EA49 pop     esi

CODE:0040EA4A pop     ebx

CODE:0040EA4B retn

CODE:0040EA4B __fastcall Classes::TList::Add(void *) endp

  

Recall what the VTBL looks like and it is easy to see that:

  

CODE:0040EA3B call    dword ptr [edx]

  

is TList::Grow, because

CODE:0040D5D8 TList_VTBL      dd offset TList::Grow
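The same lookup can be written out as a short sketch. The two VTBL slots come from the descriptor listing; the instance address 0x00A00000 and the function name resolve_virtual_call are hypothetical and exist only for this illustration.

```python
# VTBL slots as shown in the descriptor listing (symbolic targets).
VTBL_SLOTS = {
    0x40D5D8: "TList::Grow",            # slot at VTBL + 0
    0x40D5DC: "unknown_libname_107",    # slot at VTBL + 4
}

# Hypothetical instance address; its first dword is the pointer to the VTBL.
OBJECT_MEMORY = {0x00A00000: 0x40D5D8}

def resolve_virtual_call(this_ptr, slot_offset=0):
    """Resolve 'call dword ptr [edx+slot_offset]' after 'mov edx, [eax]'."""
    vtbl = OBJECT_MEMORY[this_ptr]          # mov edx, [eax]
    return VTBL_SLOTS[vtbl + slot_offset]   # call dword ptr [edx + slot_offset]

print(resolve_virtual_call(0x00A00000))
```

With slot_offset left at 0 this reproduces the call in TList::Add; other offsets select the later slots of the table.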

  

Now we can analyze the class members more deeply. For example, if we look at the following code:

CODE:0040EA3D mov     eax, [ebx+TList_obj.Property1]

CODE:0040EA40 mov     [eax+esi*4], edi

CODE:0040EA43 inc     [ebx+TList_obj.Property2]

  

we can say that Property2 is a counter for the list elements, because it increases when an element is added.

  

Property1 is the pointer to the array of list elements, and Property2 serves as the index into this array. Property3 is the maximum number of elements in the list, since TList::Grow is called only when Property2 == Property3. We worked this out by logic alone. Now that everything is clear, we can look in the Help and give the members their names:

  

CODE:0040EA28 __fastcall Classes::TList::Add(void *) proc near

CODE:0040EA28                                         ; CODE XREF: @RegisterClass+9Bp

CODE:0040EA28                                         ; @RegisterIntegerConsts+20p ...

CODE:0040EA28                 push    ebx

CODE:0040EA29                 push    esi

CODE:0040EA2A                 push    edi

CODE:0040EA2B                 mov     edi, edx

CODE:0040EA2D                 mov     ebx, eax

CODE:0040EA2F                 mov     esi, [ebx+TList_obj.Count]

CODE:0040EA32                 cmp     esi, [ebx+TList_obj.Capacity]

CODE:0040EA35                 jnz     short loc_40EA3D

CODE:0040EA37                 mov     eax, ebx

CODE:0040EA39                 mov     edx, [eax+TList_obj.pVTBL]

CODE:0040EA3B                 call    dword ptr [edx]

CODE:0040EA3D

CODE:0040EA3D loc_40EA3D:                             ; CODE XREF: Classes::TList::Add(void *)+Dj

CODE:0040EA3D                 mov     eax, [ebx+TList_obj.Items]

CODE:0040EA40                 mov     [eax+esi*4], edi

CODE:0040EA43                 inc     [ebx+TList_obj.Count]

CODE:0040EA46                 mov     eax, esi

CODE:0040EA48                 pop     edi

CODE:0040EA49                 pop     esi

CODE:0040EA4A                 pop     ebx

CODE:0040EA4B                 retn

CODE:0040EA4B __fastcall Classes::TList::Add(void *) endp
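For readers who prefer to check the layout outside IDA, here is a minimal sketch of the recovered structure using Python’s ctypes. It assumes a 32-bit target, so pointers are modelled as c_uint32 to keep the offsets identical on any host; the class name TListObj and the sample dump values are made up for the example.

```python
import ctypes
import struct

class TListObj(ctypes.Structure):
    """32-bit layout of the recovered TList instance (sizeof = 0x10)."""
    _fields_ = [
        ("pVTBL",    ctypes.c_uint32),   # +0x00 pointer to the VTBL
        ("Items",    ctypes.c_uint32),   # +0x04 pointer to the element array
        ("Count",    ctypes.c_uint32),   # +0x08 number of stored elements
        ("Capacity", ctypes.c_uint32),   # +0x0C number of allocated slots
    ]

# Hypothetical raw dump of an instance: 3 elements stored, room for 4.
raw = struct.pack("<4I", 0x40D5D8, 0x00B00000, 3, 4)
tlist = TListObj.from_buffer_copy(raw)

print(ctypes.sizeof(TListObj), TListObj.Count.offset, tlist.Count, tlist.Capacity)
```

Overlaying the structure on a memory dump gives the same named view of the fields that the IDA structure gives for the disassembly.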

  

We have restored the structure; now let’s look at the class methods.

  

Looking for the class methods

  

Methods can be: public/private (protected), virtual/non-virtual and static.

Static methods can’t be found, because after compilation they look like ordinary procedures, and it is impossible to determine that such a function belongs to a specific class. But is there any point in such a search? If the function is called somewhere in the class methods, it will be examined anyway while the code is being recovered. Otherwise it is a waste of time.

Virtual functions are easy to find too: they are all in the VTBL.

  

But how should we look for non-virtual ones? Recall how OOP works: when an object method is called, the pointer to the object itself is passed to it implicitly. In effect, each method accepts the pointer to the object as its first parameter. So if the method was declared as __fastcall, the pointer to the object is placed in EAX, while for __cdecl or __stdcall methods it is the first parameter on the stack. Now, where is the pointer to our object stored? In dword_4A45F8. Following the XREFs to 4A45F8 we can find many non-virtual methods. We can also set a breakpoint on 4A45F8 and trace where the pointer to the instance is copied, to find other places where methods are called. Everything is easy in our example because a global variable is used. But what should we do if a local variable is used, or if the code can’t be executed (for example, we are examining driver code, or we are not allowed to run the code)? Then we need a more systematic method.

  

Step-by-step:

1)  find all the places where the constructor is called

  

For each call:

2)  trace where the pointer to the object instance is written (for example, a local variable)

3)  look through the function that called the constructor for all calls of the object’s methods

4)  if there are no such calls, move on to the next constructor call; otherwise look at all XREFs to each method that has been found. This way we can find calls that are not located next to the constructor. Since we know that the first parameter is the pointer to the object, we can go to each XREF and see where else that pointer is used. In this way we move up through the levels of the code until we reach a dead end or a method that has already been found.

5)  repeat the review for the next method that has been found
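The steps above can be sketched as a small fixed-point search. This is only a toy model, not an IDA script: the CALL_SITES table is filled in by hand, the labels "[TThreadList+4]", "var_8" and sub_unrelated are hypothetical, and "this" marks a call site where the caller forwards its own first argument.

```python
THIS = "this"   # marker: the caller passes on its own first argument

# (caller, callee, what is passed as the first argument); hand-made toy data.
CALL_SITES = [
    ("sub_4130BC",       "TList::Add",     "pTList"),
    ("sub_4130BC",       "TList::Get",     "pTList"),
    ("TThreadList::Add", "TList::IndexOf", "[TThreadList+4]"),
    ("TThreadList::Add", "TList::Add",     "[TThreadList+4]"),
    ("TList::Remove",    "TList::IndexOf", THIS),
    ("TList::Remove",    "TList::Delete",  THIS),
    ("sub_4130BC",       "sub_unrelated",  "var_8"),
]

def find_methods(call_sites, seed_pointers):
    """Grow the sets of known instance pointers and methods until stable."""
    pointers = set(seed_pointers)   # variables known to hold an instance
    methods = set()
    changed = True
    while changed:
        changed = False
        for caller, callee, first_arg in call_sites:
            if first_arg == THIS:
                # This is forwarded: if either side is a known method,
                # the other side is a method of the same class too.
                if caller in methods or callee in methods:
                    new = {caller, callee} - methods
                    if new:
                        methods |= new
                        changed = True
            elif first_arg in pointers:
                # A known instance pointer is passed: the callee is a method.
                if callee not in methods:
                    methods.add(callee)
                    changed = True
            elif callee in methods:
                # A known method is called: its first argument is an instance.
                pointers.add(first_arg)
                changed = True
    return sorted(methods)

print(find_methods(CALL_SITES, {"pTList"}))
```

Starting from the single global pointer, the search reaches TList::IndexOf through the TThreadList member and then TList::Remove and TList::Delete through the forwarded This, while the unrelated call is left out.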

  

For example, we have found the Classes::TList::Add method. Following one of its XREFs, we find a call to Classes::TList::Add here:

  

CODE:0040F020 TThreadList::Add proc near              ; CODE XREF: TCanvas::`...'+9Ep

CODE:0040F020                                         ; Graphics::_16725+C4p

CODE:0040F020

CODE:0040F020 var_4           = dword ptr -4

CODE:0040F020

CODE:0040F020                 push    ebp

CODE:0040F021                 mov     ebp, esp

CODE:0040F023                 push    ecx

CODE:0040F024                 push    ebx

CODE:0040F025                 mov     ebx, edx

CODE:0040F027                 mov     [ebp+var_4], eax

CODE:0040F02A                 mov     eax, [ebp+var_4]

CODE:0040F02D                 call    TThreadList::LockList

CODE:0040F032                 xor     eax, eax

CODE:0040F034                 push    ebp

CODE:0040F035                 push    offset loc_40F073

CODE:0040F03A                 push    dword ptr fs:[eax]

CODE:0040F03D                 mov     fs:[eax], esp

CODE:0040F040                 mov     eax, [ebp+var_4]

CODE:0040F043                 mov     eax, [eax+4]

CODE:0040F046                 mov     edx, ebx

CODE:0040F048                 call    TList::IndexOf

CODE:0040F04D                 inc     eax

CODE:0040F04E                 jnz     short loc_40F05D

CODE:0040F050                 mov     eax, [ebp+var_4]

CODE:0040F053                 mov     eax, [eax+4]

CODE:0040F056                 mov     edx, ebx

CODE:0040F058                 call    Classes::TList::Add(void *)

  

That is, we have found the TList::IndexOf method.

  

We also see that we are inside a method of the TThreadList object, and that TList is its member. There is nothing more to look at here. Let’s assume there are no more XREFs to Classes::TList::Add. Go to the TList::IndexOf method and look at its XREFs. One of them leads us here:

CODE:0040EE38 TList::Remove   proc near               ; CODE XREF: TThreadList::Remove+28p

CODE:0040EE38                                         ; TCollection::RemoveItem+Bp ...

CODE:0040EE38                 push    ebx

CODE:0040EE39                 push    esi

CODE:0040EE3A                 mov     ebx, eax

CODE:0040EE3C                 mov     eax, ebx

CODE:0040EE3E                 call    TList::IndexOf

CODE:0040EE43                 mov     esi, eax

CODE:0040EE45                 cmp     esi, 0FFFFFFFFh

CODE:0040EE48                 jz      short loc_40EE53

CODE:0040EE4A                 mov     edx, esi

CODE:0040EE4C                 mov     eax, ebx

CODE:0040EE4E                 call    TList::Delete

CODE:0040EE53

CODE:0040EE53 loc_40EE53:                             ; CODE XREF: TList::Remove+10j

CODE:0040EE53                 mov     eax, esi

CODE:0040EE55                 pop     esi

CODE:0040EE56                 pop     ebx

CODE:0040EE57                 retn

CODE:0040EE57 TList::Remove   endp

  

So, TList::Delete and TList::Remove are found.

  

And so on for all XREFs and for all variables that contain a pointer to the class instance.

Here is an example of tracing such a variable:

CODE:0041319D mov     eax, [ebp+var_4]

CODE:004131A0 mov     edx, ds:pTList

CODE:004131A6 mov     [eax+30h], edx  ;a pointer to the instance of an object is being copied

CODE:004131A9 jmp     short loc_4131BD

  

Further down we see:

CODE:00413236 mov     eax, [eax+30h]

CODE:00413239 mov     edx, [ebp+var_10]

CODE:0041323C call    TList::Get

  

How can we tell public methods from private ones? We can attempt this only once the whole set of methods has been found. Private methods are called only from inside other methods of the object, so we should look at the XREFs.
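The XREF rule can be expressed as a short sketch. The method names follow the Object1::MethodN numbering scheme; the XREF table and the function name classify_methods are hypothetical, and the result is only a candidate label, since a method that happens to be called only internally may still have been declared public.

```python
def classify_methods(methods, xrefs):
    """Label a method as a private candidate when every caller is a method."""
    labels = {}
    for method in sorted(methods):
        callers = xrefs.get(method, set()) - {method}   # ignore self-recursion
        if not callers:
            labels[method] = "unknown"                   # no XREFs to judge by
        elif callers <= methods:
            labels[method] = "private candidate"         # called only by own methods
        else:
            labels[method] = "public"                    # called from outside code
    return labels

# Hypothetical data: the set of recovered methods and their callers.
METHODS = {"Object1::Method1", "Object1::Method2", "Object1::Method3"}
XREFS = {
    "Object1::Method1": {"sub_401000", "Object1::Method2"},
    "Object1::Method2": {"sub_401000"},
    "Object1::Method3": {"Object1::Method1", "Object1::Method2"},
}

for name, label in classify_methods(METHODS, XREFS).items():
    print(name, "->", label)
```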

  

While looking for methods we advise numbering them first. That is, as you find each method you name it Object1::Method1, Object1::Method2 and so on, and once all the methods are found you can begin restoring the type and number of their arguments.

  

Determination of the number of method arguments

  

For __cdecl and __stdcall there is little to say: just take the number of arguments IDA has found and subtract 1 (that one is the pointer to the object instance; the rest are method arguments). __fastcall is more complicated. First we need to remember the order of arguments: EAX, EDX, ECX, stack.

  

The analysis begins with how many stack arguments IDA has counted. If there is at least one, all three registers are in use, so we add 3 to that number (3 register arguments plus the stack ones). Since the first argument is reserved for This, we subtract 1. The result is the net number of arguments.

  

If there are no stack arguments, we look at the beginning of the function. Delphi tries to preserve the argument values, so a __fastcall function typically begins by copying them out of EAX, EDX and ECX, like this:

  

mov esi, edx ; first parameter
mov ebx, eax ; pThis
mov edi, ecx ; second parameter

From the number of registers being copied, one can deduce the number of arguments. For example:

  

CODE:0040EBE0 TList::Get      proc near               ; CODE XREF: @GetClass+1Dp

CODE:0040EBE0                                         ; @UnRegisterModuleClasses+24p ...

CODE:0040EBE0

CODE:0040EBE0 var_4           = dword ptr -4

CODE:0040EBE0

CODE:0040EBE0                 push    ebp

CODE:0040EBE1                 mov     ebp, esp

CODE:0040EBE3                 push    0

CODE:0040EBE5                 push    ebx

CODE:0040EBE6                 push    esi

CODE:0040EBE7                 mov     esi, edx

CODE:0040EBE9                 mov     ebx, eax

CODE:0040EBEB                 xor     eax, eax

  

Two registers are saved, one of which holds pThis, so TList::Get has 1 argument.

CODE:004198CC                 push    ebp

CODE:004198CD                 mov     ebp, esp

CODE:004198CF                 add     esp, 0FFFFFF8Ch

CODE:004198D2                 push    ebx

CODE:004198D3                 push    esi

CODE:004198D4                 push    edi

CODE:004198D5                 mov     [ebp+var_C], ecx

CODE:004198D8                 mov     [ebp+var_8], edx

CODE:004198DB                 mov     [ebp+var_4], eax

  

Three registers are saved, one of which holds pThis, so the method has 2 arguments.
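The counting rules of this section fit into one small function. It is a sketch of the heuristic only: the name net_argument_count and its parameters are invented here, stack_args is the number of stack arguments IDA reports, and saved_registers is the number of argument registers copied in the function prologue.

```python
def net_argument_count(convention, stack_args=0, saved_registers=0):
    """Number of declared method arguments, not counting This."""
    if convention in ("cdecl", "stdcall"):
        # Everything is on the stack; one of the arguments is This.
        return stack_args - 1
    if convention == "fastcall":
        if stack_args > 0:
            # EAX, EDX and ECX are all in use, plus the stack arguments.
            return stack_args + 3 - 1
        # No stack arguments: count the registers saved in the prologue.
        return saved_registers - 1
    raise ValueError("unknown calling convention: " + convention)

print(net_argument_count("fastcall", saved_registers=2))   # TList::Get prologue
print(net_argument_count("fastcall", saved_registers=3))   # prologue at 004198CC
print(net_argument_count("fastcall", stack_args=2))        # hypothetical case
print(net_argument_count("stdcall", stack_args=3))         # hypothetical case
```

The first two calls correspond to the two prologues shown above; the last two use made-up counts to exercise the other branches.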

  

Remember that we are restoring the number of arguments of the original method as it is declared in Delphi. In IDA, naturally, when declaring the function type we must write all the arguments, including This.

  

Try to determine the argument types on your own.

Latest replies (2)

Reply #2:
Next time, remember to mark it as ZT.

Also, someone has already posted this piece. Still, could you tell me the original source?
2004-11-17 15:09

Reply #3:
Originally posted by firstrose:
Next time, remember to mark it as ZT.

Also, someone has already posted this piece. Still, could you tell me the original source?


Sorry, I’m a newbie. Does marking it ZT mean it is a repost?

http://www.apriorit.com/index.php?option=content&task=blogcategory&id=78&Itemid=53
2004-11-17 15:29