See, I felt there was something wrong going on back when I first managed to divert the running dump process to the 'correct' branch at that ntdll routine...
So I cracked open the latest UPX source in order to analyze the entire UPX packing process (would also make good practice, I thought...). UnUPXing and reUPXing the target with the latest version of UPX produced the following UPX stub; analyzing it piece by piece against the source code, I came up with this (comments after semicolons):
CODE:
12E43EC0 > 60 PUSHAD ; PEMAIN01
12E43EC1 BE 00B01412 MOV ESI, flt-bio-.1214B000
12E43EC6 8DBE 00607BFE LEA EDI, DWORD PTR DS:[ESI+FE7B6000]
12E43ECC 57 PUSH EDI ; PEMAIN02
12E43ECD EB 0B JMP SHORT flt-bio-.12E43EDA ; start long NRV2B decompression...
12E43ECF 90 NOP
12E43ED0 8A06 MOV AL, BYTE PTR DS:[ESI]
12E43ED2 46 INC ESI
12E43ED3 8807 MOV BYTE PTR DS:[EDI], AL
12E43ED5 47 INC EDI
...
12E43F94 8B02 MOV EAX, DWORD PTR DS:[EDX]
12E43F96 83C2 04 ADD EDX, 4
12E43F99 8907 MOV DWORD PTR DS:[EDI], EAX
12E43F9B 83C7 04 ADD EDI, 4
12E43F9E 83E9 04 SUB ECX, 4
12E43FA1 ^ 77 F1 JA SHORT flt-bio-.12E43F94
12E43FA3 01CF ADD EDI, ECX
12E43FA5 ^ E9 2CFFFFFF JMP flt-bio-.12E43ED6 ; end NRV2B decompression
12E43FAA 5E POP ESI ; PEMAIN10
12E43FAB 89F7 MOV EDI, ESI ; PECTTNUL
12E43FAD B9 BB9B6600 MOV ECX, 669BBB ; start AddFilters32() CALLTR10 cjt
12E43FB2 B0 E8 MOV AL, 0E8 ; CALLTRE8
12E43FB4 F2:AE REPNE SCAS BYTE PTR ES:[EDI] ; CALLTR11
12E43FB6 75 17 JNZ SHORT flt-bio-.12E43FCF
12E43FB8 803F 6D CMP BYTE PTR DS:[EDI], 6D ; CTCLEVE2
12E43FBB ^ 75 F7 JNZ SHORT flt-bio-.12E43FB4
12E43FBD 8B07 MOV EAX, DWORD PTR DS:[EDI] ; CALLTR12
12E43FBF 66:C1E8 08 SHR AX, 8 ; CTBSHR11
12E43FC3 C1C0 10 ROL EAX, 10 ; CTBSWA11
12E43FC6 86C4 XCHG AH, AL
12E43FC8 29F8 SUB EAX, EDI ; CALLTR13
12E43FCA 01F0 ADD EAX, ESI
12E43FCC AB STOS DWORD PTR ES:[EDI]
12E43FCD ^ EB E3 JMP SHORT flt-bio-.12E43FB2 ; end AddFilters32()
12E43FCF 8DBE 00F05302 LEA EDI, DWORD PTR DS:[ESI+253F000] ; PEIMPORT // lea edi, [esi + compressed_imports]
12E43FD5 8B07 MOV EAX, DWORD PTR DS:[EDI] ; // next_dll:
12E43FD7 09C0 OR EAX, EAX
12E43FD9 74 45 JE SHORT flt-bio-.12E44020
12E43FDB 8B5F 04 MOV EBX, DWORD PTR DS:[EDI+4] ; // iat
12E43FDE 8D8430 04025502 LEA EAX, DWORD PTR DS:[EAX+ESI+2550204] ; // lea eax, [eax + esi + start_of_imports]
12E43FE5 01F3 ADD EBX, ESI
12E43FE7 50 PUSH EAX
12E43FE8 83C7 08 ADD EDI, 8
12E43FEB FF96 F8035502 CALL DWORD PTR DS:[ESI+25503F8] ; // call [esi + LoadLibraryA]
12E43FF1 95 XCHG EAX, EBP
12E43FF2 8A07 MOV AL, BYTE PTR DS:[EDI] ; // next_func:
12E43FF4 47 INC EDI
12E43FF5 08C0 OR AL, AL
12E43FF7 ^ 74 DC JE SHORT flt-bio-.12E43FD5
12E43FF9 89F9 MOV ECX, EDI
12E43FFB 79 07 JNS SHORT flt-bio-.12E44004 ; PEIBYORD
12E43FFD 0FB707 MOVZX EAX, WORD PTR DS:[EDI] ; PEIMORD1 // not_kernel32:
12E44000 47 INC EDI
12E44001 50 PUSH EAX
12E44002 47 INC EDI
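12E44003 B9 5748F2AE MOV ECX, AEF24857 ; overlapping code: the JNS above lands at 12E44004, which decodes as 57 PUSH EDI / 48 DEC EAX / F2:AE REPNE SCAS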
12E44008 55 PUSH EBP
12E44009 FF96 FC035502 CALL DWORD PTR DS:[ESI+25503FC] ; // call [esi + GetProcAddress]
12E4400F 09C0 OR EAX, EAX
12E44011 74 07 JE SHORT flt-bio-.12E4401A
12E44013 8903 MOV DWORD PTR DS:[EBX], EAX ; // next_imp:
12E44015 83C3 04 ADD EBX, 4
12E44018 ^ EB D8 JMP SHORT flt-bio-.12E43FF2
12E4401A FF96 0C045502 CALL DWORD PTR DS:[ESI+255040C] ; PEIEREXE // imp_failed: call [esi + ExitProcess]
12E44020 8BAE 00045502 MOV EBP, DWORD PTR DS:[ESI+2550400] ; PEIMDONE, PEDEPHAK // mov ebp, [esi + VirtualProtect]
12E44026 8DBE 00F0FFFF LEA EDI, DWORD PTR DS:[ESI-1000]
12E4402C BB 00100000 MOV EBX, 1000 ; // mov ebx, offset vp_size // 0x1000 or 0x2000
12E44031 50 PUSH EAX ; // provide 4 bytes stack
12E44032 54 PUSH ESP ; // &lpflOldProtect on stack
12E44033 6A 04 PUSH 4 ; // PAGE_READWRITE
12E44035 53 PUSH EBX
12E44036 57 PUSH EDI
12E44037 FFD5 CALL EBP ; // call VirtualProtect
12E44039 8D87 B7020000 LEA EAX, DWORD PTR DS:[EDI+2B7] ; // lea eax, [edi + swri] // in this case -- lea eax, [10901000-1000+2B7==109002B7]
12E4403F 8020 7F AND BYTE PTR DS:[EAX], 7F ; // marks UPX0 non writeable
12E44042 8060 28 7F AND BYTE PTR DS:[EAX+28], 7F ; // marks UPX1 non writeable
12E44046 58 POP EAX
12E44047 50 PUSH EAX
12E44048 54 PUSH ESP
12E44049 50 PUSH EAX ; // restore protection
12E4404A 53 PUSH EBX
12E4404B 57 PUSH EDI
12E4404C FFD5 CALL EBP ; // call VirtualProtect
12E4404E 58 POP EAX ; // pedep9: // restore stack
12E4404F 61 POPAD ; PEMAIN20
12E44050 8D4424 80 LEA EAX, DWORD PTR SS:[ESP-80] ; CLEARSTACK
12E44054 6A 00 PUSH 0
12E44056 39C4 CMP ESP, EAX
12E44058 ^ 75 FA JNZ SHORT flt-bio-.12E44054
12E4405A 83EC 80 SUB ESP, -80
12E4405D - E9 D3ACF5FD JMP flt-bio-.10D9ED35 ; PEMAIN21 // reloc_end_jmp: // PEDOJUMP // jmp original_entry
12E44062 0000
12E44064 0000
12E44066 0000
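For anyone who finds the CALLTR10..CALLTR13 part of the stub (12E43FAD to 12E43FCD) hard to follow, here is a rough Python paraphrase of what I read it as doing. This is my own sketch, not UPX code: the name unfilter_calls is made up, and it bounds the scan by the buffer length instead of emulating the ECX counter exactly. The idea is: find each E8 opcode whose next byte is the marker 6D, read the following three bytes as a big-endian value, subtract the operand's offset from the start of the buffer, and store the result back as a little-endian dword.

```python
def unfilter_calls(buf: bytearray, cto: int = 0x6D) -> int:
    """Undo the E8 call filter in place; return the number of calls rewritten.

    Hypothetical helper paraphrasing the stub: buf plays the role of the
    decompressed image starting at ESI, cto is the marker byte (6D here).
    """
    fixed = 0
    i = 0
    while True:
        i = buf.find(b"\xE8", i)            # REPNE SCASB looking for E8
        if i < 0 or i + 5 > len(buf):
            break
        op = i + 1                          # operand offset (EDI - ESI after the scan)
        if buf[op] != cto:                  # CMP BYTE PTR [EDI], 6D
            i = op                          # not a filtered call, keep scanning
            continue
        # SHR AX,8 / ROL EAX,10 / XCHG AH,AL: bytes 1..3 read as big-endian
        value = int.from_bytes(buf[op + 1:op + 4], "big")
        rel = (value - op) & 0xFFFFFFFF     # SUB EAX,EDI / ADD EAX,ESI
        buf[op:op + 4] = rel.to_bytes(4, "little")  # STOS DWORD
        fixed += 1
        i = op + 4
    return fixed
```

Feeding it a tiny buffer such as 90 E8 6D 00 00 10 90 rewrites the four operand bytes and leaves everything else alone; an E8 that is not followed by 6D is skipped.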
Well, the problem I had in the beginning, in redefined form, is this:
PROBLEM: Manually dumping as regular UPX (jump to OEP + dump + fix IAT) creates a runtime problem (program quits). (Post factum, this isn't occurring in manually dumped MSVC<8 exes, only MSVC=8 ones. Explanation will follow...)
OK, seeing as unUPXing+reUPXing works, I'm led to believe that the problem lies within either the UPX decompression/DEPHACK/IAT-rebuilding OR a later SecuROM check on sections/header/IAT.
First, a little bit on the DEP (Data Execution Prevention/Protection) HACK (from the pack() function in the UPX source, p_w32pe.cpp):
CODE:
...
if (use_dep_hack)
{
    // This works around a "protection" introduced in MSVCRT80, which
    // works like this:
    // When the compiler detects that it would link in some code from its
    // C runtime library which references some data in a read only
    // section then it compiles in a runtime check whether that data is
    // still in a read only section by looking at the pe header of the
    // file. If this check fails the runtime does "interesting" things
    // like not running the floating point initialization code - the result
    // is an R6002 runtime error.
    // These supposed to be read only addresses are covered by the sections
    // UPX0 & UPX1 in the compressed files, so we have to patch the PE header
    // in the memory. And the page on which the PE header is stored is read
    // only so we must make it rw, fix the flags (i.e. clear
    // PEFL_WRITE of osection[x].flags), and make it ro again.
    // rva of the most significant byte of member "flags" in section "UPX0"
    const unsigned swri = pe_offset + sizeof(oh) + sizeof(pe_section_t) - 1;
    // make sure we only touch the minimum number of pages
    const unsigned addr = 0u - rvamin + swri;
    linker->defineSymbol("swri", addr & 0xfff); // page offset
    // check whether osection[0].flags and osection[1].flags
    // are on the same page
    linker->defineSymbol("vp_size", ((addr & 0xfff) + 0x28 >= 0x1000) ?
                         0x2000 : 0x1000); // 2 pages or 1 page
    linker->defineSymbol("vp_base", addr &~ 0xfff); // page mask
    linker->defineSymbol("VirtualProtect", myimport + get_le32(oimpdlls + 16) + 8);
}
...
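To see where the 2B7 and the 1000 in the stub come from, the symbol arithmetic from that snippet can be redone by hand. The sketch below is only my reading of it, under assumptions: I take sizeof(oh) to be the standard PE32 header block (4-byte signature + 20-byte COFF header + 224-byte optional header) and sizeof(pe_section_t) to be the standard 40-byte section header. The function name dep_hack_symbols is made up, and pe_offset=0x198 is a value I back-computed so that swri comes out as 2B7; it is not read from the target.

```python
PE_HEADERS_SIZE = 4 + 20 + 224   # assumed: "PE\0\0" + COFF header + PE32 optional header
SECTION_HEADER_SIZE = 40         # assumed: standard section header; flags are its last dword


def dep_hack_symbols(pe_offset: int, rvamin: int) -> dict:
    """Recompute swri / vp_size / vp_base the way the quoted snippet does (32-bit math)."""
    # offset of the most significant byte of the flags of the first section
    swri = pe_offset + PE_HEADERS_SIZE + SECTION_HEADER_SIZE - 1
    # same offset relative to rvamin, wrapped like an unsigned 32-bit value
    addr = (0 - rvamin + swri) & 0xFFFFFFFF
    page_off = addr & 0xFFF
    # the second section's flags byte sits 0x28 bytes further on
    vp_size = 0x2000 if page_off + 0x28 >= 0x1000 else 0x1000
    vp_base = addr & ~0xFFF & 0xFFFFFFFF
    return {"swri": page_off, "vp_size": vp_size, "vp_base": vp_base}


# hypothetical pe_offset, chosen to reproduce the values seen in the stub
syms = dep_hack_symbols(pe_offset=0x198, rvamin=0x1000)
```

With those inputs vp_base is FFFFF000, i.e. -1000, which lines up with the LEA EDI, [ESI-1000] in the stub, and swri and vp_size match the 2B7 and 1000 constants.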
Ah, the runtime does indeed fail somewhere around MSVCR80 calls in the dump...
Let's see if this works: mark UPX0 & UPX1 as read only access rights at both the header and before the OEP via VirtualProtect... but first let's test a bit...
-- Making progress: Tested DUMP in DEBUGGING ('DUMP'=run to OEP, dump, fix IAT... 'DEBUGGING'=open in Olly, change section header access rights at 109002B7 and 109002DF to non-write (&=7f), and run.) AND FOUND TO BE WORKING!
Conclusion >>> CULPRIT IS THE MSVC8 DEP HACK!
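What the manual edit at 109002B7 and 109002DF amounts to, as a small sketch: the flags (Characteristics) dword of a section header has its write bit at 0x80000000, so AND-ing the top byte with 7F clears exactly that bit. The helper names and the sample flag values below are made up for illustration (E0000080 / E0000040 are just plausible-looking values, not read from the target), and the buffer stands in for a copy of the header page.

```python
import struct

IMAGE_SCN_MEM_WRITE = 0x80000000  # write bit of a section's flags dword


def clear_write_flags(header: bytearray, swri: int) -> None:
    """Mirror of the two AND BYTE PTR [...], 7F instructions."""
    header[swri] &= 0x7F          # first section (UPX0), top byte of flags
    header[swri + 0x28] &= 0x7F   # second section (UPX1), one header further


def section_flags(header: bytearray, top_byte_offset: int) -> int:
    """Read the full little-endian flags dword that ends at top_byte_offset."""
    return struct.unpack_from("<I", header, top_byte_offset - 3)[0]


# sample header page with two hypothetical section flag values
header = bytearray(0x300)
struct.pack_into("<I", header, 0x2B4, 0xE0000080)
struct.pack_into("<I", header, 0x2DC, 0xE0000040)
clear_write_flags(header, 0x2B7)
```

Only the header bytes change here; the real page protection of the sections is a separate matter, which is why the stub only touches the PE header in memory.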
Now trying to inline patch this in the dump (after the now unused UPX stub) and changing the entry point to it (0254406C raw address):
CODE:
12E4406C > 60 PUSHAD ; patch based on UPX's DEPHACK:
12E4406D BE 00109010 MOV ESI, newdump_.10901000
12E44072 8BAE 00045502 MOV EBP, DWORD PTR DS:[ESI+2550400] ; VirtualProtect
12E44078 8DBE 00F0FFFF LEA EDI, DWORD PTR DS:[ESI-1000]
12E4407E BB 00100000 MOV EBX, 1000 ; // mov ebx, offset vp_size // 0x1000 or 0x2000
12E44083 50 PUSH EAX ; // provide 4 bytes stack
12E44084 54 PUSH ESP ; // &lpflOldProtect on stack
12E44085 6A 04 PUSH 4 ; // PAGE_READWRITE
12E44087 53 PUSH EBX
12E44088 57 PUSH EDI
12E44089 FFD5 CALL EBP ; // call VirtualProtect
12E4408B 8D87 B7020000 LEA EAX, DWORD PTR DS:[EDI+2B7] ; // lea eax, [edi + swri] // in this case -- lea eax, [10901000-1000+2B7==109002B7]
12E44091 8020 7F AND BYTE PTR DS:[EAX], 7F ; // marks UPX0 non writeable
12E44094 8060 28 7F AND BYTE PTR DS:[EAX+28], 7F ; // marks UPX1 non writeable
12E44098 58 POP EAX
12E44099 50 PUSH EAX
12E4409A 54 PUSH ESP
12E4409B 50 PUSH EAX ; // restore protection
12E4409C 53 PUSH EBX
12E4409D 57 PUSH EDI
12E4409E FFD5 CALL EBP ; // call VirtualProtect
12E440A0 58 POP EAX ; // restore stack
12E440A1 61 POPAD
12E440A2 - E9 8EACF5FD JMP newdump_.10D9ED35 ; jump to OEP
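One thing to watch when moving the stub like this: the final JMP is relative, so its bytes change with its address (E9 D3ACF5FD in the original stub, E9 8EACF5FD in the patch, same target). If you assemble the patch by hand rather than in Olly, the displacement is target minus the address of the next instruction. A quick sketch to double-check the bytes (the function name is made up):

```python
def encode_jmp_rel32(src: int, dst: int) -> bytes:
    """Encode a 5-byte near JMP (opcode E9) placed at src that lands on dst."""
    rel = (dst - (src + 5)) & 0xFFFFFFFF   # displacement counts from the next instruction
    return b"\xE9" + rel.to_bytes(4, "little")


# the patched jump and the original stub's jump, both to the same OEP
patched = encode_jmp_rel32(0x12E440A2, 0x10D9ED35)
original = encode_jmp_rel32(0x12E4405D, 0x10D9ED35)
```

Both results agree with the byte columns in the two listings above.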
Granted, this patch may be just a tad bit idiotic, but hey, ...
Saving... Running... >>> THE PATCHED DUMP WORKS!
Yay.
So, this appears to be an MSVC8 (M*cro$oft Visual C++/Visual Studio 2005) related issue for binaries that use MSVCRT80, and this is why it didn't occur in the earlier release I was talking about (the one that unpacked regularly), as it was MSVC7 based. Also, this is but one possible way to solve it... Who knows, it might come in useful when unpacking irregular UPXed MSVC8 targets...
Right, so... what about those 2 MOV inline-patching instructions before the jump to OEP that I had noticed back in the original UPXed exe? Well, about those 2 -- they aren't restored if you take this on using "upx -d" as Human suggested... leaving you with a missing piece in this puzzle... (meaning -- they aren't "already applied to the code" in a "upx -d" unpack (as opposed to the manual unpack), and neither do they remain as patches before the OEP -- they are completely gone.)
I don't know about you, but I definitely learned something from all this.
Thanks again for reading, and thank you for all the help.