The Target:
none
The Tools:
Ollydbg 1.10
The Protection:
none
Other Information:
This tutorial will cover some basic concepts about attacking applications in general and anti-tampering techniques. Explicit aim of this tutorial is to help making a clear understanding and classification of concepts seen in the beginners tutorials on this series.
This tutorial is meant to cover a topic that is usually just slightly pointed in common tutorials. The argument is the anti-tampering techniques used by programs to counteract the modifications of their code. Told this way, it could seem something you never seen around, but this is the name of techniques such as CRC self-checking of an application, MD5, SHA1, and so on. Did you realize it?
Self-checking (also called self-validation or integrity checking) is a technique in which a program, while running, checks itself to verify that it has not been modified. Usually it's useful to distinguish between static self-checking, in which the program checks its integrity only once, during startup, and dynamic self-checking, in which the program repeatedly verifies its integrity, as it is running. While self-checking alone is not sufficient to robustly protect software, is nowadays one of the most easier protections to implement and widely used either in commercial or custom protections.
To correctly discuss this issue I'll resemble some things coming from different sources and some contributions on my own. As usual the distinction between my original work and what is coming from another source is left intentionally hidden.
This tutorial presents the anti-tampering problem and some details of the CPU memory management which will help understanding the whole thing. An example in C is then built and debugged as a proof of concepts.
"The security of a cypher scheme shouldn't rely on keeping secret the algorithm.
The security depends only on keeping the used key secreted"
Auguste Kerckhoffs, La Cryptographie Militaire,
Journal des Sciences Militaires, vol.9, February 1883.
Before starting to discuss the argument it is better to introduce a classification of attacks types, which helps keeping ideas clear.
Broadly speaking the attacks to a system (software or not) can be classified into three categories based on the nature of malicious agent.
2.攻击的分类
“加密方案的安全性不应该依赖对算法的保密,而只需依靠对密钥的的保密。”
――Auguste Kerckhoffs, La Cryptographie Militaire,
Journal des Sciences Militaires, vol.9, February 1883.
讨论开始前不如先介绍一下攻击类型的分类,使你保持清晰的概念。
概括来讲,对一个系统(软件或其他)的攻击可基于恶意主体的性质而分为三类。
The first one, the simplest to detect and handle is when the perpetrator breaches communications access controls to attack the system. The malicious agent is still under the restrictions of the communication protocol. A good example of this genre would be a standard hacker type attack. Robust access control mechanism that isolates the system hardware and software totally form the user is enough to counter this attack.
The second, comparatively more serious attack would be like a computer virus. Such attacks originate as software running on the platform. Though the attackers have breached the communications, they must still depend on the BIOS and OS interfaces. Such attacks are generally the aftermath of a class I attack, and forebode class 3 attacks. Detection of these types of attacks is facilitated by the fact that the perpetrators attack classes of files.
Class 3 attacks are the most difficult to resolve and detect as the perpetrator is an insider. The perpetrator may modify software or hardware for the system at will. The only form of prevention for this type of attacks is to raise the technological bar to such an extent that the perpetrator would end up wasting so much time and resources that it would be a poor investment. The technical bar could be varied from no-specialized-analysis-tools to specialized-hardware-analysis tools.
Under the heading of software protection, there is another little classification to discuss. The techniques for protecting programs can be distinguished roughly into
techniques for protecting for protecting software from reverse engineering (by obfuscation),
modification (by software tamper resistance),
program-based attacks (by software diversity),
and BORE ? Break-Once Run Everywhere ? attacks (by architectural design).
在软件保护的范围下,有其他小类去讨论。程序保护技术可初步分为:
防止软件被逆向的保护技术(混淆代码)
代码修改(软件抗篡改)
基于程序的攻击(软件相异性)
BORE(Break-Once Run Everywhere)――泄露一次各处运行的攻击(架构设计)
3. Tamper Resistance Techniques and Checksumming
Introduction.
Software tamper resistance is the art of crafting a program such that it cannot be modified by a potentially malicious attacker without the attack being detected. In some respects, it is similar to fault-tolerant computing, in that potentially dangerous changes in program state are detected at runtime. For tamper resistance, however, it is assumed that intelligent, malicious attackers (rather than hardware flaws or software errors) may be responsible for such changes. Self-checking tamper resistance is distinguished in its ability to run on current unmodified commodity hardware without requiring third parties, because all the tests are performed by the application on its own code/data.
There are of course different techniques for self-checking software (e.g. program or result checking and generation of the correct executable or part of code on the fly, decryption according to a calculated digest) partially derived from asymmetric signature algorithms, but we'll focus more on checksumming.
The standard threat model for software tamper resistance is the hostile host model. The challenge is to protect an application running on a system controlled by a malicious, intelligent user. Because such a user can in theory change any code on the computer, other software on such a system, including the operating system, is untrusted; in the case of particularly determined adversaries, even the hardware is untrusted. This situation is in contrast with the hostile client problem in which we assume a trusted host and untrusted application. The hostile client problem appears to be an easier problem to solve; numerous solutions have been developed, e.g. sandboxing and is typical of mobile agents such as smartphones or PDAs.
Since a single checksum is relatively easy for an attacker to disable, stronger proposals rely on a network of inter-connected checksums, all of which must be disabled to defeat tamper resistance. A tester reads the area of memory occupied by code and readonly data, building up a checksum result based on the data read. A subsequent section of the code may operate on the checksum result, affecting program stability or correctness in a negative way if a checksum result is not the same as a known good value pre-computed at compile time. The sections of code which perform the checksumming operations may be further hidden using code obfuscation techniques to prevent static analysis. Ideally the effects of a bad checksum result in the program are subtle (e.g. causing mysterious failures much later in execution) thus making it much more difficult for an attacker to locate the checksum code.
Figure 1 - A good distribution of checksum blocks within a code segment.
Figure 1 gives a simplified view of a typical distribution of checksum code within an application. In practise, there may be hundreds of checksum blocks hidden within the main application code. Each allows verification of the integrity of a predetermined section of the code segment. The read-only data segment may also be similarly checked.
The checksumming code is inserted at compile time and integrated with regular execution code. The application also requires a correct checksum result for each block in order to work properly.
The resulting scheme is an Interlocking Trust Model, for which each segment not only on itself but also on other segments to effectively perform its task. Thus each segment of code would be made responsible for maintaining and verifying the integrity of other segments.
High level view of how the protection works, Testers and Guards
The overall protection works like this: at runtime, a large number of embedded code fragments called testers test small segments of code for integrity (using a linear hash function and an expected has value); if integrity fails, an appropriate response is pursued. The peices of code called on check failures are called guards and can be programmed to carry out arbitrary tasks (for example common guards are peices of code repairing the code, erasing the breakpoints or downloading a fresh copy of the application).
Two of the most common guards are checksumming and repairing.
Checksum Guards: The checksum guards are programmed to check a code fragment within the software. If this code is tampered with in any way the guard takes action. The type of action the guard takes can vary in degree from logging the infraction halting the program.
Repair Guards: The repair guards can fix a tampered section of code. Thus, the changes an attacker made are wiped out and the program remains secure. The repair can be made by storing a clean copy of the segment of code the guard is protection somewhere, and then replacing the dirty copy with the clean one.
The security of the guards actually comes from guards watching each other's backs. Each guard protects fragments of code, and that includes protecting other guards. Figure 1.1 shows an example of a possible network of guards.
Figure 1.1 - A possible distribution of cross-checking guards of an application's integrity.
A group of guards working together implement a sophisticated protection scheme that is more resilient against attacks than a single guard. These schemes are used by the most modern protectors, such as Armadillo also: the Nanomities trick is just a way to implements a Guard scheme and an interlocking scheme, made of two processes, controlling one each other.
In order to perform their duties, a network of guards is placed into the program and hooked to its execution flows in proper ways, before of the controlled code goes into execution: a repairing guard has to be inserted into a point in the control flow that is to be reached before guarded code (in execution order). In other words the guard has do dominate the guarded code. Guards can also be controlled by other guards too. Figure 1.2 reports another real more sophisticated example where a network of Guards (G) and Guarded Code (C) is presented beside it's distribution into the real application's memory.
Figure 1.2 - A Guards graph and the corresponding memory layout of the guarded program.
Defeating complex Guard Networks can be, generally speaking, a nightmare.
图1.2――防护码网络图示及受保护程序在相应内存中的布局
一般来说,破解复杂的防护码网络就如恶梦一般。
Anti-Tampering details
There are several aspects of such checksumming which an attacker must keep in mind:
1. Because of the overlapping network of testers, almost every checksumming block must be disabled at the same time in order for a tampering attack to be successful.
2.The resulting value from a checksum block must remain the same as the original value determined during compilation (or all uses of the checksum value must be determined and adjusted accordingly), if the results of a checksum are used during standard program execution.
3.The checksum values are only computed for static (i.e. runtime invariant) sections of the program.
4.Checksumming code is obfuscated, hard to find, and the use of checksum results is also hidden.
Note that a critical (implicit) assumption of checksumming algorithms is that D(x) = I(x), where D(x) is the bit-string result of a “data read” from memory address x, and I(x) is the bit-string result of an “instruction fetch” of corresponding length from x. In the practical everyday cases this is the situation, because D(x) and I(x) are the same portion of memory "read as data and as instructions".
If I(x) were different from D(x), then the checksumming code would always check using D(x) while the processor would always execute I(x).
Checksumming aims to verify that the code the processor executes is the original code, and thus assumes that the code it reads is the code the processor executes.
While current checksumming proposal critically rely on this assumptions there are opportunities to violate this assumption on several modern processors..
This section provides background information for those less familiar with the virtual memory subsystems of modern processors, including translation look-aside buffers (TLBs). Thos of you already familiar with processor architecture can jump directly to the next section.
Modern processors do much more than execute a sequence of instructions. Advances in processor speed and flexibility have resulted in a very complex architecture. A significant part of this complexity comes from mechanisms designed to efficiently support virtual memory. Virtual memory, first introduced in the late 1950’s, involves splitting main memory into an array of frames (pages) which can be subsequently manipulated. Virtual addresses used by an application program are mapped into physical addresses by the virtual memory system.
Figure 2 - Translation of Virtual Address into a Physical Address
Even though the page table translation algorithm may vary slightly between processors and may sometimes be implemented in software, modern processors all use roughly the same method for translating a virtual page number to a physical frame number. Specifically, this translation is performed through the use of page tables, which are arrays that associate a selected number of virtual page numbers with physical frame numbers.
Because the virtual address spaces of most processes are both large and sparse, page table entries are only allocated for the portions of the address space that are actually used. To determine the physical address corresponding to a given virtual address, the appropriate page table, and the correct entry within that page table must be located.
For systems that uses 3-level page tables, a virtual address is divided into four fields, x1 through x4. The x1 bits (the directory offset) specify an entry in a perprocess page directory. The entry contains the address of a page map table. The x2 bits (the map offset) are used as an offset within the specified page map table, giving the address of a page table. The x3 bits (the table offset) index into the chosen page table, returning the number of a physical page frame. x4, then, specifies the offset within a physical frame that contains the data referred to by the original virtual address.
This resolution process is illustrated in Figure 3. Note that if memory segments are used, segment translation typically occurs before operations involving the page table.
Figure 3 - Translation of a Virtual to Physical Address through Page Tables
TLB Translation look Aside Buffer
Because multiple memory locations must be accessed to resolve each virtual memory address, virtual address translation using page tables is a expensive operation. To speed up these mappings, a specialized high-speed associative memory store called a translation look-aside buffer (TLB) is used.
A TLB caches recently used mappings of virtual page numbers to physical page frames. On every virtual memory access, all entries in a TLB are checked to see whether any of them contain the correct virtual page number. If an entry is found for the virtual page number, a TLB hit has occurred, and the corresponding physical page frame is immediately accessed. Otherwise, we have a TLB miss, and the appropriate page tables are consulted in the fashion discussed previously. The mapping so determined is then added to the TLB by replacing the mapping that was least recently used. Figure 4 illustrates what happens on a TLB hit.
Because of the principal of locality, TLB translation works very well in practise. System designers have noticed, however, that code and data exhibit different patterns of locality. To prevent interference between these patterns, caches of code and data are often separated; for similar reasons, most modern CPUs have separate code and data TLBs.
CPU caches mark referenced memory as code or data depending upon whether it is sent to an instruction decoder. Whenever an instruction is fetched from memory, the instruction pointer is translated via the instruction TLB into a physical address. When data is fetched or stored, the processor uses a separate data TLB for the translation. Using different TLB units for code and data allows the processor to maintain a more accurate representation of recently used memory. Separate TLB’s also protect against frequent random accesses of code (data) overwhelming both TLB’s. Because most code and data references exhibit high degrees of locality, a combination of small amounts of fast storage (e.g. onchip memory caches) and more plentiful slower storage (DRAM memory) can together approximate the performance of a larger amount of fast storage.
Because the memory management unit presents a virtual address space to the application running, the application need not be aware of the physical sections of memory which it actively uses. Thus even though the virtual address space of a program is contiguous, the physical regions of memory it uses may not be. This presents a great opportunity for the operating system. Not only does it allow multiple applications to be run on the system (each with its own unique virtual address space, mapping to different physical pages), but it allows the operating system to only keep in physical memory those parts of each application required at the current time.
Since not all pages of virtual memory may map to a physical page, there must be some way for the processor to inform the OS when a virtual address does not have a physical mapping. The processor does this through the use of a page fault interrupt. The processor will store the virtual address which caused the page fault in a register, and then signal the operating system through an interrupt handler. The operating system updates the mapping of virtual to physical addresses, so that the requested virtual address can be mapped to a physical address.
This may mean bringing the section of the program into physical memory from disk or some other external storage. The OS then signals the processor to retry the instruction by returning from the interrupt. The OS also has the choice of aborting execution of the application if it determines that the virtual address is invalid, e.g. if the virtual address refers to memory that has not been located.
Along with the translation of a virtual to physical address, the processor may implement access protection on memory regions. Since the virtual memory subsystem already splits physical memory into small areas (frames), it makes sense that the same memory management unit would also implement access control on a per-frame basis. The most important protection is that only pages that an application is allowed to access are mapped into its page table. To prevent an application from manually mapping a page into its address space, the page directory base pointer is stored in a read-only register, and the frames containing a process’s page table are themselves not accessible by the process.
In addition, there are protection mechanisms for pages which are in a process’s address space. Each mapped page is restricted in the types of operations that may be performed on its contents: read, write, and instruction fetch (also called execute). Permitted operations are specified using control bits associated with each page table entry. Read and write are common operations on data pages, while executing code is commonly associated with a page containing executable code.
Modern operating systems take advantage of the protection mechanisms implemented by the processor to distinguish various types of memory usage.
The ability to set no-execute permission on a per-page basis produces the restriction that many programs are confined to executing code from their code segment, unless they take specific action to make their data executable.
Although such changes can interfere with systems that generate machine code at runtime (e.g. modern Java Virtual Machines), many types of code injection attacks can be defeated by nonexecutable data pages
Table 1 - Separation of access control privileges for different page types
Table 1 shows the ideal separation of privileges for different sections of an application. This separation of privileges is currently assumed in executable file formats. All processors implementing page level access controls must check for disallowed operations and signal the operating system appropriately. Most often, the operating system is signalled through the page fault interrupt, which indicates the memory reference that caused the invalid operation.
5. The memory model of the x86 architecture, the segments
In addition to supporting memory pages, the x86 can also manage memory in variable sized chunks known as segments.
x86体系的内存模式及分段
x86体系通过管理可变大小内存块(也称为段)来维持内存页面。
Associated with each segment is a base address, size, permissions, and other meta-data. Together this information forms a segment descriptor. To use a given segment descriptor, its value is loaded into one of the segment registers. Other than segment descriptor numbers, the contents of these registers are inaccessible to all software. In order to update a segment register, the corresponding segment descriptor must be modified in kernel memory and then reloaded into the segment register.
A logical address consists of a segment register specifier and offset. To derive a linear address, a segment register’s segment base (named by the segment specifier) is added to the segment offset. An illustration of the complete translation mechanism for the x86 architecture is shown in Figure 6. Code reads are always relative to the code segment (CS) register, while normally, if no segment register is specified data reads use the data segment (DS) register. Through segment overrides a data read can use any segment register including CS. After obtaining a linear address, normal page table translation is done as shown in Figure 6 and Figure 7.
Figure 6 - Translation from virtual to physical addresses on the x86
图6――x86体系中虚拟地址到物理地址的转换
Figure 7 - Translation of a get using segment overrides
图7――使用段超越的转换
Unlike pages on the x86, segments can be set to only allow instruction reads (execute-only). Data reads and writes to an execute-only segment will generate an exception. This execute-only permission can be used to detect when an application attempts to read memory relative to CS. As soon as the exception is delivered to an OS modified for our attack, the OS can automatically modify the memory map (see Figure 7) to make it appear as if the unmodified data was present at that memory page.
Most operating systems for x86, however, now implement a flat memory model. This means that the base value for the CS and DS registers are equal; an application need not use the CS register to read its code. A flat memory model will ensure that both linear addresses are the same, resulting in the same physical address (see the dash-dot-dot line in Figure 7).
6. Writing and debugging a selfchecking program in C
Resuming what we said till now it can be stated that: "detecting whether portions of a binary have been modified is essentially an error-detection problem; therefore, a checksum algorithm such as CRC32, MD5, or SHA1 can be used to generate a signature for an arbitrary block of code or data. This signature can then be checked at runtime to determine whether any modification has taken place."
I have chosen the CRC32 algorithm both for its ease of implementation and for its speed. It is ideal for detecting changes to short sequences of bytes; however, because there are only 232 possible checksum values, and because it is not cryptographically secure, the likelihood of a collision is high, giving the attacker a realistic chance to replace code without changing the checksum.
Here, first of all, I included a classical implementation of a CRC32 API, which I found to be very compact and useful. This is an implementation of CRC32, which consists of macros for marking the start and end of the block to be checked, as well as a function to calculate the checksum of the block. The function crc32_calc( ) is used to compute the checksum of a buffer.
static int crc32(unsigned long a, unsigned long b) {
int idx, prev;
prev = (a >> 8) & 0x00FFFFFF;
idx = (a ^ b) & 0xFF;
return (prev ^ crc32_table[idx] ^ 0xFFFFFFFF);
}
static unsigned long crc32_table_init(void) {
int i, j;
unsigned long crc;
for (i = 0; i < CRC_TABLE_LEN; i++) {
crc = i;
for (j = 8; j > 0; j--) {
if (crc & 1) crc = (crc >> 1) ^ CRC_POLY;
else crc >>= 1;
}
crc32_table[i] = crc;
}
return 1;
}
unsigned long crc32_calc(unsigned char *buf, int buf_len) {
int x;
unsigned long crc = 0xFFFFFFFF;
if (!crc32_table[0]) crc32_table_init( );
for (x = 0; x < buf_len; x++) crc = crc32(crc, buf[x]);
return crc;
}
The following program demonstrates the use of the checksum implementation. Note that the program is first compiled with a printf( ) in main( ) that will print the checksum to stdout. As long as main( ) is linked into the program after the buffer being checked, this printf( ) can be removed and the program recompiled without the value of the checksum changing. Once the checksum is known, a hex editor can be used to patch the checksum value into the location crc32_stored. In this example, the four bytes of the checksum are stored between two 0xBAADF00D markers that should be overwritten with random bytes before the binary is distributed. Note that the markers will be stored in little-endian order in the binary, hence the reversed ordering of the bytes in the C source.
/* warning: replace "crc32_stored" with the real checksum! 警告:用校验和真值替换crc32_stored*/
asm(".long 0x0DF0ADBA \n" /* look for 0xBAADF00D markers 寻找0xBAADF00D标志符*/
"crc32_stored: \n"
".long 0xFFFFFFFF \n" /* change this in the binary! 用二进制码更改这里*/
".long 0x0DF0ADBA \n" /* end marker 结束标志符 */
);
CRC_START_BLOCK(test)
int test_routine(int a) {
while (a < 12) a = (a - (a * 3)) + 1;
return a;
}
CRC_END_BLOCK( test )
int main(int argc, char *argv[ ]) {
unsigned long crc;
crc = crc32_calc(CRC_BLOCK_ADDR(test), CRC_BLOCK_LEN(test));
#ifdef TEST_BUILD
/* This printf( ) displays the CRC value that needs to be stored in the program.
* The printf( ) must be removed, and the program recompiled before distribution.
* 此printf( )函数程序需要存储的CRC值。
* 此printf( ) 函数必需移除,分配前程序会重新编译。*/
printf("CRC is %08X\n", crc);
#else
if (crc != crc32_stored) {
printf("CRC32 %#08X does not match %#08X\n", crc, crc32_stored);
return 1;
}
printf("CRC32 %#08X is OK\n", crc);
#endif
return 0;
}
You should compile this program with TEST_BUILD defined, then execute it to obtain the CRC value that needs to be replaced for crc32_stored in the binary. Then rebuild the program with TEST_BUILD undefined, and modify the binary with the proper CRC value from the first run. To calculate the CRC32 of a program you can use the CRC_calculator given with this tutorial, just drag & drop the executable over to ge the CRC value.
It is tempting to generate a checksum of the entire program and use this to determine whether any bytes have been changed; however, this causes a loss of granularity for the checksum and can degrade performance. Instead, as explained in previous sections is always better to generate multiple checksums for vital sections of code. These checksums can be encrypted, and they can even be supplemented with checksums of trivial blocks of code to disguise which portions of code are significant.
Notes about compiling the example1 with VisualC++ 6.0
The code relative to this example is included into the example1 folder. If you open it you might notice some differencies with what I just written above. These differencies are because of not complete compliance to standard C of Visual C++ 6.0. With respect to the code explained above, there are some differencies, just because the Visual C++ 6.0 I used doesn't support the standard asm C operator. I had to use instead the Microsoft specific __asm operator which has some clearly documented limitations. First of all is not possible to directly define variables using the DB directive. I have to use instead the pseudoinstruction _emit which force the compiler to emit the specified bytes, simulating so the DB directive. The ASM part is also modified to copy the address of the crc value to a local variable usable by the C code.
To make things shorter I'll not include this code here too, look the file for further comments and clarifications.
关于用 VisualC++ 6.0编译例1(example1)的注意事项
example1文件夹中有这个例子的源代码。如果你打开它,你也许会注意到它跟上面我所写的有些不同。这是由于Visual C++ 6.0与标准C并不完全一致。两者的一些差别仅仅因为Visual C++ 6.0 不支持标准C的操作符asm。我只能用 __asm关键字来代替 ,关于__asm的一些限制,微软文档有清楚的说明。首先,不能用DB伪指令直接定义变量。我改用伪指令_emit强制编译器放出指定的字节来模拟DB。经修改ASM部分也使C代码可用,其作用是复制CRC值的地址到一个局部变量。
限于篇幅,这里就不引用代码了。请直接查看该文件以获得更多注释和说明。
To compile use "cl crc1.c /DTEST_BUILD" or "cl crc1.c"
Note that the programs are console based, they will just close and exit when double clicking them. You need to run them from within a DOS box, to see the printed messages..
编译使用"cl crc1.c /DTEST_BUILD" 或 "cl crc1.c"
注意程序是基于控制台的,双击它就会关闭退出。你须在DOS窗口运行它以看到显示的信息。
Looking the example1 into OllyDbg
Open with ollydbg the crc1.exe file inside example1 folder of the present tutorial's archive. Hit CTRL-B to open the Binary Search dialog and search for 0xBAADF00D hex value. You should land here:
I properly placed the comments to help you find the correspondence with the written code (refers to the modified code used for the example1 and not to the code above).
Place a Breakpoint with F2 at the JMP at 0x0040114C and press F9 to reach the breakpoint. First of all if you step with F8 the program at 0x00401180 you can see that the real CRC just-in-time calculated is 0x6C8673E6 (it's in ECX).
Well you have the opportunity then to run again the application from the beginning and to stop again the application at 0x0040114C. Now go to location 0x401152 and select "Follow in dump" option to let you view in Ollydbg the memory at that location.
好了,你有机会重头开始运行程序,然后再次停在0x0040114C处。来到0x401152处并右键选择"Follow in dump"(数据窗口中跟随,译者注),在Ollydbg中会看到此处的内存内容。
Now insert the real CRC value you've just seen before (0x6C8673E6), remember to insert it as 0xE673866C.
Now save the modifications to a new executable and run from outside Ollydbg, the application runs perfectly. I saved this second executable into crc1_ok.exe
Weel, obviously the markers are not mandatory and are useful only to easily find the proper point where to insert the real CRC value, the whole routine can be more efficiently hidden and the check can be made less evident.
Analizying a more more complex example, example2
Generally speaking to fool the protection you only have to change the JE jump (opcode 0x74) at offset 0x401183 to a JMP instruction (opcode 0xEB). The generated checksum should never be checked against a valid one; instead, the generated checksum should be used as a source of information that the program requires to execute properly. A byte within the checksum could be used as an index into a jump table, for example, or the checksum itself could be used as a key to decrypt critical code or data.
The next program demonstrates how to use a table of function pointers to test the value of a checksum. Each nibble or half-byte in the checksum is used as an index into a 16-entry table of function pointers; only the correct table entry calls the function to check the next nibble. This method requires 8 tables of 16 function pointers so that one table is used for each nibble in the checksum.
When this program is compiled with TEST_BUILD defined, the resulting binary will print the CRC32 computed for the function test_routine( ). If the computed CRC32 is 0xFFF7FB7C, the following table indices will represent valid function pointers: b1[12], b2[7], b3[11], b4[15], b5[7], b6[15], b7[15], b8[15]. Each of these contains a pointer to the function that will process the next nibble in the checksum, except for b8[15], which contains a pointer to the function that is called when the checksum has proven valid. The tables in the source can now be rewritten to reflect these correct values:
Obviously, the NULL bytes will have to be replaced with other values to disguise the fact that they are invalid entries. They can be replaced with pointers to functions that handle incorrect checksums, or they can be filled with garbage values to make the program unstable. For example:
In this table, the use of incrementally increasing values causes the table to appear to be valid data, as opposed to addresses in the code segment.
表中,与代码段地址相对,递增的数值看起来像是有效数据。
Notes about compiling the example2 with VisualC++ 6.0
This time the example doesn't have any specific modification to be compiled on VC++ 6.0. Follow the same commands as before to compile it.
Into the example2 folder there are two sources, the first crc2_step1.c must compiled using the TEST_BUILD define, because has an unitialized crc_check_fn matrix. The crc2_step2.c instead has this matrix correctly initialized.
If you compile the program crc2_step1.c using TEST_BUILD you should get this value 0X6C8673E6. According to what I just said before the crc_check_fn matrix should becomes:
If you try to open the crc2_step2 into Olly you can see that it is not a so trivial task to modify the code and still obtain a valid running program. Looking for the references to the string "CRC is valid" you should land at 0x004012C4. If you place a breakpoint with F2 and press run to the program with F9 you land to the correct place. Now if you press CTRL-K to see the call stack you should easily see that this subroutine is called from the
如果尝试用OllyDbg打开crc2_step2,你会发觉修改代码并保持程序运行稳定并不是件很容易的事情。寻找字串参考“CRC is valid”,你会来到0x004012C4处。若F2下断后F9重新运行程序,你会到达正确地址。现在按CTRL-K查看调用堆栈,你应该很容易看到此子程序的调用来自
This local variable is a pointer to the selected address into the function pointer addresses contained into the b8[16] vector.
此局部变量是个指针,指向函数指针地址中选中的地址,b8[16]向量包含了这个函数指针地址。
Well, indeed there's still space for a more intelligent reversing of this application. We will see now how crypto algorithms can be identified using the constants they use. CRC32 is a very easy algorithm but also all the other more complex are not so complex..we'll see it in the next, last, section! Have a little more patient, we arriving at the end ^_°
7. Identifying self-checking algorithms into programs
Breaking the example1 program
To introduce you to the problem let approach to the program crc1_ok.exe inside the example1 folder in the classical way. First of all be sure to have PEiD with the KryptoAnalizer (aka KANAL) installed. Run PEiD and select the crc1_ok.exe program, then select plugins and run the plugin just mentioned.
You should obtain (I assume you are alredy expert on PEiD, otherwise return to Beginner tut #1) something like below:
The plugin reports clearly an address at 0x004010DF. Well, fire up Olly and go there to see what's happening there..
插件清楚显示了一个地址:0x004010DF。打开OllyDbg跟进去看看究竟是怎么回事。
You landed here, to get the meaning of the place where you are move a little the cursors, to realign to the actual real code (you landed in the middle of an instruction, why it will be clear in a moment). Your Olly then should show something like below:
Well, we were at the address of a constant, 0xEDB88320 which was part of an instruction XOR ECX,EDB88320
好,这是常量的地址,0xEDB88320 就是 XOR ECX,EDB88320指令的一部分。
Now look the CRC32 implementation I did in Section 6, you can find this line:
现在看看我在第6节所讲的CRC32的实现,会找到这行:
#define CRC_POLY 0xEDB88320L
This is the same constant KANAL plugin found. If you have the opportunity/time to see the sources of KANAL (freely available on PEiD site) you will realize that KANAL is doing nothing more than searching for specific constants in memory, any crypto algorithm (better hashing algorithm) uses some specific constants somewhere in the algorithm.
At this stage we then understood we landed in the middle of CRC32 algorithm implementation. What we should do is to go at the entrypoint of the CRC32 routine that is, in the example1 sources, the crc32_calc() function.
Now you arrived at destination. Look the code where you are now (at 0x00401172) and compare it with the code shown before in the debugging section of the example1, it's the same place.
Breaking the example2 program
We are now using the method learnt for the easyer example1 with the more complex example2. Run KANAL on the file crc2_step2.exe into the example2 folder. You should obtain something like this:
Well, place a breakpoint at this specific address and run the program using F9. Execute the call at 0x004012EA with F8, and you should have in EAX the real CRC32 value just calculated.
If you step with F8 and follow this call, you will see that the called function is another one, almost identical . These functions are the functions responsible to extract from the b1[]; b8[] vectors the correct function pointers, which finally will lead you to the end of the program.
Think of this example mixed with an obfuscator or compressor like AsProtect or Armadillo with a more complex hashing algorithm than CRC32 and you will get how it would be easy to write less easily crackable programs.
What I shown here is a long journey on self-checking programs and the anti-tampering techniques programs uses. I also shown the theoretical assumptions that are behind these solutions and two practical examples. You should have realized how easy is to build custom protections for programs, that combined with known commercial protectors give extra barrires to crackers. Fortunately on the other hand much programmers do not ever consider them doing our cracking-lifes much of the time easy.
I suggest the following further readings from now on to complete this argument..
O'Reilly - Secure Programming Cookbook for C and C++
There a lot of tutorials on http://tutorials.accessroot.com which fool anti-tampering protections involving hashing.
Intel - IA-32 Intel Architecture Software Developer's Manual, volume 3: System Programming Guide
..and essentially all the tutorials seens around (also others on our tutorials page) which often target checksummed applications..
参考资料
我建议从现在起进一步阅读下面的资料,完善这方面的知识:
O'Reilly - Secure Programming Cookbook for C and C++
http://tutorials.accessroot.com中有许多关于破解反篡改保护与散列密码的教程。
Intel - IA-32 Intel Architecture Software Developer's Manual, volume 3: System Programming Guide
基本上你能找到的以校验和程序为目标的全部教程(包括我们教程版面的其他文章)。
Thanks to all the people who take time to write tutorials.
Thanks to all the people who continue to develop better tools.
Thanks to Exetools, Woodmann, SND, TSRH, MP2K and all the others for being a great place of learning.
Thanks also to The Codebreakers Journal, and the Anticrack forum.
If you have any suggestions, comments or corrections contact me in usual places..