Here is the first article on our site. It’s written by our reverse engineer and, in fact, is a kind of a lesson for reversers.
This peace of our company experience may be useful for you, that is why this text is placed here.
If you would like to use our services in Reverse Engineering area, please write [email]info@apriorit.com[/email] to discuss your task.
There are some misperceptions regarding reverse engineering (its legacy, ethic and so on) and you can clear all for yourself here and much more - here
Classes restoration is a complicated procedure which requires knowledge of OOP and the way this OOP is organized in specific compiler.
Our task is to get class, its methods and members. Let’s begin with Delphi, because it’s relatively easy to find a class here.
Class restoration begins with looking for constructor, because here is the memory for object is being allocated and also we can gain some insight into constructor’s components.
It’s easy to find a constructor in Delphi ? we just need to look for a string in which the class name occurs. For example, for TList the next structure can be found:
This is, if we can say so, an ‘object descriptor’. Pointer to it is being passed to the constructor. The constructor takes from it the data required for object creation. Using XREF on 40D598 we can find all the places where the constructor is being called. Here is an example of one of such calls:
CODE:0040E72E mov eax, ds:TList
CODE:0040E733 call CreateClass
CODE:0040E738 mov ds:dword_4A45F8, eax
The constructor function we named by ourselves. We can determine whether it is really a CreateClass by the contents of the function:
CODE:00402F48 CreateClass proc near ; CODE XREF: @BeginGlobalLoading+17p
loads contents into EAX to the address of TList, i.e. it’s TList_VTBL. Since we use Delphi, here is the Borland’s convention of __fastcall is being used (parameters are being passed in the next order: EAX, EDX, ECX, stack...). It means that the pointer to the virtual methods table is being passed to the function CreateClass as a first parameter. Further EAX is not changing and gets into __linkproc__ClassCreate, and here we see:
CODE:00403203 call dword ptr [eax-0Ch]
Where is it going? The pointer to TList_VTBL=0х40D5D8 is still lies in EAX. 0x40D5D8-0xC=40D5CC, and this is
CODE:0040D5CC dd offset TObject::NewInstance
This is the ancestor’s constructor. So, TList is inherited by TObject. Let’s look what is in the depth:
CODE:00402F0C TObject::NewInstance proc near ; DATA XREF: CODE:004010FCo
CODE:00402F0C ; CODE:004011DCo ...
CODE:00402F0C push eax
CODE:00402F0D mov eax, [eax-1Ch]
CODE:00402F10 call __linkproc__ GetMem
CODE:00402F15 mov edx, eax
CODE:00402F17 pop eax
CODE:00402F18 jmp TObject::InitInstance
CODE:00402F18 TObject::NewInstance endp
The value of EAX is the same, so 0х40D5D8-0x1C=0x40D5BC.
Thus, the object size which is stored in 0x40D5BC, is being passed into GetMem
CODE:0040D5BC SizeOfObject dd 10h
So, the total size of object members =0x10.
The function TObject::InitInstance doesn’t do anything special, it’s just stuffs object members with zero and sets the value of pointer to VTBL in the just created instance of the object. Then the exit from CreateClass will happen and the pointer to the instance of the object will be returned into EAX. That’s why the call of constructors looks like:
CODE:0040E72E mov eax, ds:TList
CODE:0040E733 call CreateClass
CODE:0040E738 mov ds:dword_4A45F8, eax
Restoration of the object structure
We have known the object size already. It’s 0x10, where 0x4 bytes were taken by the pointer to VTBL. But there are 0xC bytes left and they contain object members, so we need to find them. Here an intuition is required. First of all, objects can’t be created for no particular reason and members can be filled either in constructor (fully or partly) or after creating by Set-methods. Our TList in the constructor is being stuffed with zero through rep stosd (in TObject::InitInstance). So there is no info about class members in the constructor. Thus let’s trace life cycle after the creation.
In our example the pointer to the instance of the class is being driven into global variable dword_4A45F8. So we can just set breakpoint on reading from dword_4A45F8 and look at how the object methods will be called. First event:
CODE:0041319D mov eax, [ebp+var_4]
CODE:004131A0 mov edx, ds:pTList
CODE:004131A6 mov [eax+30h], edx ; copied a pointer to the instance of an object
CODE:004131A9 jmp short loc_4131BD
That is… 3 last members have been found. All of them are of 4 bytes size. To simplify the work with classes in IDA Pro we use structures. Classes are the same structures, anyway:)))
After using the next structure:
00000000 TList_obj struc ; (sizeof=0X10)
00000000 pVTBL dd ?
00000004 Property1 dd ?
00000008 Property2 dd ?
0000000C Property3 dd ?
00000010 TList_obj ends
things become more clear:
CODE:0040EA28 __fastcall Classes::TList::Add(void *) proc near
Think of VBTL look and it will be easy to guess that:
CODE:0040EA3B call dword ptr [edx]
is TList::Grow,
because
CODE:0040D5D8 pVTBL dd offset TList::Grow
Now we can make a deeper analyze of the class members. For example, if we have a look at the next code:
CODE:0040EA3D mov eax, [ebx+TList_obj.Property1]
CODE:0040EA40 mov [eax+esi*4], edi
CODE:0040EA43 inc [ebx+TList_obj.Property2]
we can say that Property2 is a counter for the list elements, because it increases when an element is added.
And Property1 is the pointer to the array of list elements. Property 2 in this array is an index. Property 3 is the maximum number of the elements in a list, as method TList::Grow is being called just when Property2==Property3. We found out this by using logic. Now, when all is clear, we may look in Help and give names to the members:
CODE:0040EA28 __fastcall Classes::TList::Add(void *) proc near
We have restored the structure, let’s look into the class methods.
Looking for the class methods
Methods can be: public/private (protected), virtual/non-virtual and static.
Static methods can’t be found because after the compilation was made they look like common procedures. Affiliation of such function with a specific class is also impossible to determine. But is there a sense in such search? If the function is called somewhere in the class methods, it, anyway, will be viewed while the code is being extracted. Otherwise, it is wasting of time.
Virtual functions are easy to find to ? they all are in VTBL.
But how we should look for non-virtual ones? Let’s think of OOP: when the object methods are called, the pointer to the object itself is implicitly passed to them. In fact, it means that each method accepts the pointer to the object as its first parameter. I.e., if the method was declared as __fastcall, the pointer to the object will be pushed into EAX. But for __cdecl or __stdcall methods it’s the first parameter in the stack. Let’s look on where is the pointer to the object is stored…absolutely right! In dword_4A45F8. On XREF to 4A45F8 we can find lots of non-virtual methods. Further we can set a breakpoint on 4A45F8 and trace the copying of a pointer to the instance to find where else the call of methods can take place. All is easy in our example, because global variable is used. But what we should do, if the local variable is used or if the code can’t be executed (for example, we research driver’s code or the code is not allowed for execution)? Here we need a specific method.
Step-by-step:
1) we have to find all the points of constructor’s calls
For each call
2) trace where the pointer to the instance of an object is being written (local variable)
3) looking through the function which has called the constructor for all the calls of the object methods
4) if there are no such calls, look at the next call of the constructor, otherwise look for all XREF to the method that had been found. In such way we can find calls that are not beside the constructor. And, as we know that the first parameter is the pointer to an object, we can go to each XREF and look where else the pointer to an object was used. And in such way we are going up the levels of the code, till we reach a deadlock or the method that had been found.
5) reviewing the next method that had been found
For example, we have found Classes::TList::Add method. On one of the XREF we find Classes::TList::Add method here:
CODE:0040F020 TThreadList::Add proc near ; CODE XREF: TCanvas::`...'+9Ep
CODE:0040F020 ; Graphics::_16725+C4p
CODE:0040F020
CODE:0040F020 var_4 = dword ptr -4
CODE:0040F020
CODE:0040F020 push ebp
CODE:0040F021 mov ebp, esp
CODE:0040F023 push ecx
CODE:0040F024 push ebx
CODE:0040F025 mov ebx, edx
CODE:0040F027 mov [ebp+var_4], eax
CODE:0040F02A mov eax, [ebp+var_4]
CODE:0040F02D call TThreadList::LockList
CODE:0040F032 xor eax, eax
CODE:0040F034 push ebp
CODE:0040F035 push offset loc_40F073
CODE:0040F03A push dword ptr fs:[eax]
CODE:0040F03D mov fs:[eax], esp
CODE:0040F040 mov eax, [ebp+var_4]
CODE:0040F043 mov eax, [eax+4]
CODE:0040F046 mov edx, ebx
CODE:0040F048 call TList::IndexOf
CODE:0040F04D inc eax
CODE:0040F04E jnz short loc_40F05D
CODE:0040F050 mov eax, [ebp+var_4]
CODE:0040F053 mov eax, [eax+4]
CODE:0040F056 mov edx, ebx
CODE:0040F058 call Classes::TList::Add(void *)
I.e. we have found TList::IndexOf method.
Further we see that we are in the method of TthreadList object and TList is its member. Here we have nothing to look at. Let’s assume that there are no more XREF to Classes::TList::Add. Go in TList::IndexOf method and look at its XREF. One of them directs us here:
CODE:0040EE38 TList::Remove proc near ; CODE XREF: TThreadList::Remove+28p
And so forth for all XREF and variables that contain a pointer to the instance of a class.
Here is an example of looking through the variable:
CODE:0041319D mov eax, [ebp+var_4]
CODE:004131A0 mov edx, ds:pTList
CODE:004131A6 mov [eax+30h], edx ;a pointer to the instance of an object is being copied
CODE:004131A9 jmp short loc_4131BD
We see below:
CODE:00413236 mov eax, [eax+30h]
CODE:00413239 mov edx, [ebp+var_10]
CODE:0041323C call TList::Get
How we can identify public or private methods? We can try to do that only when all the set of methods is found. Private methods are called only inside the other object methods. I.e. we should look at XREF.
While looking for methods we advise to number them first. It means as you find the method, you name it Object1::Method1, Object1::Method2 and so on, and when all the methods are found you may begin restoration of type and number of elements.
Determination of the number of method arguments
For __cdecl и __stdcall there are few things to tell about, you just need to look on how much of them have IDA found and subtract the 1 (i.e. the 1 is a pointer to the instance of an object, and others are method arguments). There are more complications for __fastcall. First we need to remember the sequence order of arguments: EAX,EDX,ECX,stack.
The analysis begins with how much arguments that had been transmitted via stack does IDA have counted. If there are at least one, we add to it 3 (3 register’s plus the ones for stack). As first argument is allocated for This, we need to subtract the 1 from the number. The summary value is the net number of arguments.
If there are no stack arguments, we look at the beginning of the function. Delphi tries not to spoil arguments values, so each __fastcall function begins with copying from registers EAX, EDX and ECX in such way:
mov esi, edx ; first parameter
mov ebx, eax ; pThis
mov edi, ecx ; second parameter
Depending on the number of registers that are being copied, one can conclude what is the number of arguments. For example:
CODE:0040EBE0 TList::Get proc near ; CODE XREF: @GetClass+1Dp
CODE:0040EBE0 ; @UnRegisterModuleClasses+24p ...
CODE:0040EBE0
CODE:0040EBE0 var_4 = dword ptr -4
CODE:0040EBE0
CODE:0040EBE0 push ebp
CODE:0040EBE1 mov ebp, esp
CODE:0040EBE3 push 0
CODE:0040EBE5 push ebx
CODE:0040EBE6 push esi
CODE:0040EBE7 mov esi, edx
CODE:0040EBE9 mov ebx, eax
CODE:0040EBEB xor eax, eax
There are 2 arguments, 1 of them is pThis, thus TList::Get has 1 argument.
CODE:004198CC push ebp
CODE:004198CD mov ebp, esp
CODE:004198CF add esp, 0FFFFFF8Ch
CODE:004198D2 push ebx
CODE:004198D3 push esi
CODE:004198D4 push edi
CODE:004198D5 mov [ebp+var_C], ecx
CODE:004198D8 mov [ebp+var_8], edx
CODE:004198DB mov [ebp+var_4], eax
There are 3 arguments, 1 of them is for pThis, so total is 2 arguments.
We should remind you that we restore the number of arguments in initial method which is described in Delphi, and in IDA, naturally, while declaring the function type we should write all the arguments in consideration with This.