Obsidian is a non-intrusive debugger, which means that it doesn’t change the target’s process as a normal debugger would. As it is still in beta, there may be some minor issues, but it should be mostly stable.
Why?
Just for the fun of it and to learn new things. If you have questions, encounter problems or would like to suggest improvements or new features, please drop me a line. My mail address can be found at the bottom of the page.
What is it good for?
The main advantage is that you no longer have to care about anti-debugger tricks like:
- IsDebuggerPresent(), which boils down to checking the debugger flag in the PEB
- self-debugging: creating another thread or process which attaches itself to the target in order to keep other debuggers from doing so, and possibly making some code ‘corrections’ at runtime
- timing checks to recognize delays caused by an attached debugger
How does it work?
Basic information
The basics for this project, or more precisely the knowledge of how a debugger works, came mostly from MSDN, with additional insight into the course of debugging from Iczelion’s tutorials.
Windows API
The debugging functions are implemented using standard Win32 API calls like:
- CreateProcess
- OpenProcess
- OpenThread
- CreateToolhelp32Snapshot
- SuspendThread / ResumeThread
- ReadProcessMemory / WriteProcessMemory
- GetThreadContext / SetThreadContext
Breakpoints
To implement breakpoints I used a trick I learned from a very interesting paper in Codebreakers Journal. Its name is “Guide on How to Play with Processes Memory, Write Loaders and Oraculums” and it was written by Shub Nigurrath. Shub Nigurrath credits the trick itself to yates and his paper “Creating Loaders & Dumpers - Crackers Guide to Program Flow Control”, so kudos to him too. The trick is to place the opcode EB FE at the address where you want to stop. This code stands for “jmp -2”, which is the shortest way I know of to code a while(1); loop: the jump lands on itself, so the thread spins at that address.
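As a rough illustration of the trick, the following Python sketch patches EB FE into a simulated address space and restores the original bytes afterwards. FakeProcess and the helper names are hypothetical stand-ins for ReadProcessMemory / WriteProcessMemory; this is not Obsidian’s actual code, only a sketch of the idea under those assumptions.

```python
# Sketch of the EB FE breakpoint trick on a simulated address space.
# FakeProcess is a hypothetical stand-in for ReadProcessMemory /
# WriteProcessMemory on a real target.

JMP_SELF = bytes([0xEB, 0xFE])  # jmp rel8 with a displacement of -2


class FakeProcess:
    """Flat byte buffer that pretends to be the target's memory."""

    def __init__(self, base, image):
        self.base = base
        self.mem = bytearray(image)

    def read(self, address, size):
        offset = address - self.base
        return bytes(self.mem[offset:offset + size])

    def write(self, address, data):
        offset = address - self.base
        self.mem[offset:offset + len(data)] = data


def set_breakpoint(process, address):
    """Overwrite two bytes with EB FE and return the saved originals."""
    original = process.read(address, len(JMP_SELF))
    process.write(address, JMP_SELF)
    return original


def remove_breakpoint(process, address, original):
    """Put the saved bytes back so execution can continue normally."""
    process.write(address, original)


def short_jump_target(address, displacement_byte):
    """Target of a 2-byte short jump: next instruction plus signed rel8."""
    rel8 = displacement_byte - 0x100 if displacement_byte >= 0x80 else displacement_byte
    return address + 2 + rel8
```

short_jump_target(address, 0xFE) evaluates to address again, which is why the thread stays put until the saved bytes are written back.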
Dis-/Assembling
To dis-/assemble the opcodes, I used the awesome code from the disasm zip file that Oleh Yuschuk, creator of OllyDbg, has put on his site. OllyDbg has rightfully gained a reputation for being intuitive and a real alternative to SoftICE when it comes to ring 3 applications.
File-information
To extract information about code and data segments and other details about the process, I used what I learned from the paper “Portable Executable File Format – A Reverse Engineer View” written by Goppit. This paper can also be found at Codebreakers Journal.
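To give an idea of what this extraction involves, here is a minimal Python sketch that walks from the DOS header to the section table of a 32-bit PE image held in a byte buffer. The function name parse_sections and the returned tuple layout are assumptions made for this example; only the header offsets come from the PE format itself.

```python
import struct

# IMAGE_SECTION_HEADER: Name, VirtualSize, VirtualAddress, SizeOfRawData,
# PointerToRawData, PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations, NumberOfLinenumbers, Characteristics (40 bytes).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def parse_sections(image):
    """Return (name, virtual_address, virtual_size, raw_pointer, raw_size) tuples."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER starts right after the 4-byte signature.
    (number_of_sections,) = struct.unpack_from("<H", image, e_lfanew + 6)
    (size_of_optional_header,) = struct.unpack_from("<H", image, e_lfanew + 20)
    # The section table follows the optional header.
    table = e_lfanew + 24 + size_of_optional_header
    sections = []
    for index in range(number_of_sections):
        fields = SECTION_HEADER.unpack_from(image, table + index * SECTION_HEADER.size)
        name = fields[0].rstrip(b"\0").decode("ascii", "replace")
        virtual_size, virtual_address, raw_size, raw_pointer = fields[1:5]
        sections.append((name, virtual_address, virtual_size, raw_pointer, raw_size))
    return sections
```

The same buffer could come from a file on disk or from the memory of a running process read at the module’s image base, since the headers are mapped unchanged.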
Singlestep and stepping into calls
Since I couldn’t use debug events, I chose the simple way out and “just” set a breakpoint on the instruction that would be executed next. This involved checking for jumps, calls and returns to make sure to get the right instruction. Checking for conditional jumps was easy, since the disasm files (mentioned above) could already do this for me with the Checkcondition function. The same applies to calls, with the exception of calls that get their destination from a register. After searching for a while I found that the lower nibble of the call opcode gives away the register that is used. Last time I wrote about the StackWalk function, and I have to admit that I was wrong about using it for returns, since the Intel documentation states that ret always uses the first element from the stack. So there’s nothing to be done except reading the DWORD pointed to by ESP.
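The decision described above can be sketched roughly as follows. The function step_into_target, its parameters and the register table are hypothetical names for this example. It assumes 32-bit code and only covers a handful of encodings (E8, E9, EB, FF D0 to FF D7, C3, C2); conditional jumps, which additionally need the flags, are left out.

```python
import struct

# Register order used by the x86 encoding for register numbers 0..7.
REGISTERS = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def step_into_target(code, address, registers, read_dword):
    """Return the address where the next breakpoint should go, or None
    when the instruction is not one of the handled control transfers.

    code       -- bytes of the instruction located at `address`
    registers  -- dict of 32-bit register values, e.g. {"esp": 0x12FF00}
    read_dword -- callable(address) -> int that reads the target's memory
    """
    opcode = code[0]
    if opcode in (0xE8, 0xE9):  # call rel32 / jmp rel32
        (rel32,) = struct.unpack_from("<i", code, 1)
        return (address + 5 + rel32) & 0xFFFFFFFF
    if opcode == 0xEB:  # jmp rel8
        (rel8,) = struct.unpack_from("<b", code, 1)
        return (address + 2 + rel8) & 0xFFFFFFFF
    if opcode == 0xFF and 0xD0 <= code[1] <= 0xD7:  # call <register>
        # The lower nibble of the second byte selects the register.
        return registers[REGISTERS[code[1] & 0x0F]]
    if opcode in (0xC3, 0xC2):  # ret / ret imm16
        # The return address is the DWORD that ESP points to.
        return read_dword(registers["esp"])
    return None
```

In a real debugger read_dword would wrap ReadProcessMemory and the register values would come from GetThreadContext; here both are passed in so the logic can be tried without a target process.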
Thread Local Storage (TLS)
The first piece of code that will be executed when a new process is started isn’t necessarily at the address pointed to by AddressOfEntryPoint. DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS] in the optional header points to an IMAGE_TLS_DIRECTORY32 structure, which contains a pointer to a list of functions that are executed before control reaches AddressOfEntryPoint.
Attaching to a running process
Two handles are needed to provide full functionality. The first one is the process handle and the second one is the handle to the main thread (as Obsidian only supports single-threaded debugging). In order to allow selection of a process, the set of process id, executable path and name for each process can be obtained by calling CreateToolhelp32Snapshot with TH32CS_SNAPMODULE as parameter. The process id from the selection can be passed to the OpenProcess function in order to retrieve the process handle. Getting the thread handle required a bit more work. The OpenThread function will return the handle if the thread id is known. Since OpenThread is only available from Windows 2000 Professional onwards, the attach method will only work on systems which provide this function. The thread id can be found by using the result of CreateToolhelp32Snapshot with TH32CS_SNAPTHREAD. Allegedly the first entry belonging to the process is always its main thread, so this is the thread that’s going to be opened. The member th32ThreadID of the returned structure provides the thread id required by the OpenThread function. With both handles and the process and thread ids available, it is possible to provide complete functionality.
Process dumping
When I started writing the code, I was wondering why there didn’t seem to be any tutorial about dumping a running process with your own program. Most tutorials I found used existing tools for it. There are some really good papers about rebuilding the IAT, by the way, which I will keep in mind for one of the next releases. As I began to reread the PE documentation, it occurred to me that this is about all you need to dump an unprotected process. You can get the headers directly from the image base of the module, and from them you can gather all the other parts. So the job is reassembling the parts scattered through process space by the loader and writing them into a file. Just keep boundaries and offsets in mind.
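The reassembly step could look roughly like the sketch below. It assumes the module has already been read from its image base into one buffer and that the section table has been parsed into a list of tuples; rebuild_file and its parameters are names invented for this example, and import rebuilding is not covered.

```python
def rebuild_file(memory, header_size, sections):
    """Reassemble an on-disk layout from a module image read out of memory.

    memory      -- bytes read from the image base to the end of the module
    header_size -- number of header bytes to copy (SizeOfHeaders)
    sections    -- list of (virtual_address, virtual_size, raw_pointer, raw_size)
    """
    # The file ends where the last raw section area ends.
    end = max([header_size] + [ptr + size for _, _, ptr, size in sections])
    output = bytearray(end)

    # Headers sit at the same offset in memory and on disk.
    header = memory[:header_size]
    output[:len(header)] = header

    for virtual_address, virtual_size, raw_pointer, raw_size in sections:
        # Copy no more than the file has room for; the rest of the raw
        # area stays zero-filled, like padding up to the file alignment.
        length = min(virtual_size, raw_size)
        chunk = memory[virtual_address:virtual_address + length]
        output[raw_pointer:raw_pointer + len(chunk)] = chunk
    return bytes(output)
```

The “boundaries and offsets” part is the translation from each section’s virtual address in memory to its raw pointer in the file, which is all the loop does.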
Stack view
After a long time, the makeshift memory display for the stack has been replaced by an improved version. To give a better impression of the stack frames, the “special” values on the stack (stored EBP, stored EIP and the address pointed to by ESP) are highlighted in different colors. In addition, anything which is not part of the active stack is displayed in gray. Another little feature, long existent but not mentioned yet, is a simple guard to make you aware of a change in the saved EIP. For this to work you need to step into the call and single-step through it until you reach the RET. If the saved EIP has been changed, a message box will pop up and warn you about the change.
Abstractions
Symbols
Working with symbols is much easier than I first thought. Most of the work is done by the Sym* functions provided by the imagehlp library (for example, use SymGetSymFromAddr to get a symbol’s name by its address). So the only part which requires a bit of work is determining the levels of indirection, so that calls via a jump table can be resolved correctly. The same goes for applying the IDA map file. Once it is parsed, it’s back to analysing references again. By the way, IDA is a very impressive disassembler by Ilfak Guilfanov (formerly DataRescue, now at Hex-Rays). It provides a deeper analysis and a different view of an executable than most debuggers do. Plus, as the name implies, you don’t need to actually execute the target, which is pretty cool, especially for malware analysis.
Basic block analysis
After building the (more or less) needed basics, I decided to take a shot at improving the code analysis. A short research yielded the magical words ‘basic block’, a term that originated from the optimization concepts of compilers. But perhaps it’s better to first explain what basic blocks are. A basic block is, generally speaking, a sequence of commands that doesn’t contain any jumps and isn’t jumped into, where jump doesn’t mean just the jmp instruction but everything that explicitly moves EIP somewhere else. The commands I used to determine the end of a basic block are:
- all jumps, conditional and unconditional (e.g. jmp, je…)
- call
- ret
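To make the definition concrete, here is a small Python sketch that cuts an already decoded, contiguous instruction listing into basic blocks. It uses the textbook “leader” approach on a simplified instruction tuple; the tuple layout and the name split_into_blocks are assumptions for this example, and it is not the analyser used in Obsidian.

```python
# Instruction kinds that end a basic block, following the list above.
TERMINATORS = {"jmp", "jcc", "call", "ret"}


def split_into_blocks(instructions):
    """Partition a linear instruction listing into basic blocks.

    instructions -- list of (address, size, kind, target) tuples sorted by
                    address; `target` is None when unknown or not applicable
    Returns a list of (start, end) pairs with `end` exclusive.
    """
    addresses = {address for address, _, _, _ in instructions}

    # A leader is an address at which a new block has to start.
    leaders = {instructions[0][0]}
    for address, size, kind, target in instructions:
        if kind in TERMINATORS:
            leaders.add(address + size)   # whatever follows starts a block
            if target in addresses:
                leaders.add(target)       # nothing may jump into a block

    blocks = []
    start = None
    for address, _, _, _ in instructions:
        if address in leaders:
            if start is not None:
                blocks.append((start, address))
            start = address
    last_address, last_size, _, _ = instructions[-1]
    blocks.append((start, last_address + last_size))
    return blocks
```

A backward jmp into the middle of a straight run of instructions forces a cut at its target, which is the “isn’t jumped into” half of the definition; targets outside the listing are simply ignored here.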
How are blocks and addresses handled?
The analyser contains two lists: one holds all addresses not analysed yet and the other contains the generated blocks. This gives a clean separation between unknown and known blocks. To avoid an infinite loop, e.g. when dealing with backward jumps, the analyser only processes addresses that do not lie at the beginning of an already processed code block. Addresses outside the module’s scope are not processed either, which keeps the processing time at an acceptable level. As this approach relies on splitting unknown blocks or dividable known blocks into several smaller ones, the code needs to search through the known and unknown blocks and check whether the current address implies a split or the creation of a separate block. When looking at the code I found that it is much more efficient to walk through the address list backwards (i.e. the latest blocks first), since most matches occur in the vicinity of the last processed block.
The analysis of the code starts at the entry point and moves onward from there. Calls and conditional jumps both yield at best two addresses where the analysis of a new block can be started. The ‘at best’ results from the fact that indirect addressing via a register can’t be resolved at the time of analysis, so this is a path that can’t be analysed. When an address points into a known block, the block needs to be split: an address can only come from a jump to this location, which means the former block ends there and a new one begins.
At the moment the analyser doesn’t make any assumptions about what could be meant but only cares for definite information. Thus there are blocks of code which haven’t been recognized and are therefore treated as filling. This affects the readability of the disassembled code, since any opcode not flagged as code will be disassembled in byte steps.
For example, an opcode like 74 53 at address 00403F52 will result in the following output:
00403F52 74 53 JE 403FA7
00403F53 53 PUSH