[Original] Debugging a Deadlock Caused by TerminateThread
Posted: 2019-11-11 12:39
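Preface
An updater program in one of our projects deadlocked occasionally. The dump showed that it was stuck inside ShellExecuteExW. Lacking experience, I could not work out why, so I posted a question on the Advanced Debugging forum: http://advdbg.org/forums/6520/ShowPost.aspx
According to Zhang Yinkui's reply, the likely cause was that a thread owning a critical section had ended unexpectedly. I went through the project code carefully and found that it used TerminateThread() to forcibly kill a thread. That looked suspicious, so I wrote a test program, which reproduced the problem.
I ran into this in a project a few years ago. This post is a cleaned-up version of my notes from that time.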
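Reproducing the problem
Steps to reproduce
The main program loads a DLL and calls one of its exported functions to create a thread. It then calls TerminateThread() to forcibly kill that thread, and afterwards calls RunProcess() (a wrapper around ShellExecuteEx()) to start a new process. That call hangs inside ShellExecuteEx(). To make the problem easier to reproduce, DllMain() deliberately sleeps for 5 seconds when ul_reason_for_call is DLL_THREAD_DETACH.
Code excerpts
Main project: testTerminateThread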
//testTerminateThread.cpp
#include "stdafx.h"
#include "windows.h"
#include "process.h"

typedef HANDLE (*pfnGenerateThread)();

HANDLE RunProcess(const TCHAR* app_name, const TCHAR* cmd)
{
    SHELLEXECUTEINFO shex = {sizeof(SHELLEXECUTEINFO)};
    shex.fMask = SEE_MASK_NOCLOSEPROCESS;
    shex.lpVerb = _T("open");
    shex.lpFile = app_name;
    shex.lpParameters = cmd;
    shex.lpDirectory = NULL;
    shex.nShow = SW_NORMAL;
    if (!::ShellExecuteEx(&shex))
    {
        return INVALID_HANDLE_VALUE;
    }
    return shex.hProcess;
}

int _tmain(int argc, _TCHAR* argv[])
{
    while ( 1 )
    {
        HMODULE hModule = LoadLibrary(_T("testDll.dll"));
        if ( NULL == hModule )
            return 0;

        pfnGenerateThread pfn = (pfnGenerateThread)GetProcAddress(hModule, "GenerateThread");
        if ( NULL == pfn )
            return 0;

        HANDLE hThread = pfn();

        // give thread time to start up
        Sleep(1000);

        // terminate thread.
        BOOL bOk = TerminateThread(hThread, 0);

        // dead lock in this function...
        RunProcess(argv[0], NULL);

        FreeLibrary(hModule);
    }
    return 0;
}
DLL project: testDll
// DllMain.cpp
#include "stdafx.h"
#include "windows.h"

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD ul_reason_for_call,
                       LPVOID lpReserved )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        OutputDebugString(L"====> DLL_PROCESS_ATTACH called.\n");
        break;
    case DLL_THREAD_ATTACH:
        OutputDebugString(L"----> DLL_THREAD_ATTACH called.\n");
        break;
    case DLL_THREAD_DETACH:
        OutputDebugString(L"<---- DLL_THREAD_DETACH called.\n");
        // with LdrpLoaderLock held! sleep 5 seconds.
        Sleep(5000);
        break;
    case DLL_PROCESS_DETACH:
        OutputDebugString(L"<==== DLL_PROCESS_DETACH called.\n");
        break;
    }
    return TRUE;
}
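// testDll.cpp
#include "stdafx.h"
#include "stdio.h"
#include "process.h"
#include "windows.h"

// Print the current thread id to the debug output (visible in DbgView).
void OutputCurrentThreadId()
{
    TCHAR szBuffer[1024];
    swprintf_s(szBuffer, L"thread [0x%x], running & exiting...\n", GetCurrentThreadId());
    OutputDebugString(szBuffer);
    return;
}

// Thread entry point: logs its id and returns immediately, so the thread
// goes straight on to thread exit, where DllMain(DLL_THREAD_DETACH) runs.
unsigned __stdcall testProc(void *)
{
    OutputCurrentThreadId();
    return 0;
}

// Looked up by the main program via GetProcAddress(hModule, "GenerateThread"),
// so it must be exported under that undecorated name (the export mechanism,
// e.g. a .def file, is not shown in this excerpt).
HANDLE GenerateThread()
{
    HANDLE hThread = (HANDLE)_beginthreadex(NULL, 0, &testProc, NULL, 0, NULL);
    return hThread;
}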
Click here to download the test project.
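Analysis
Start DbgView to capture debug output before running the test program, then run the test program.
The log shows that the test thread we started has thread id 0x1400.
Once the program hangs, attach WinDbg to it. After attaching, run ~*kvn to list the threads and the call stack of each one. Only thread 0 belongs to the program (thread 1 is created by WinDbg when it attaches to the process).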
0:001> ~*kvn

   0  Id: 18c0.1008 Suspend: 1 Teb: 7ffdf000 Unfrozen
 # ChildEBP RetAddr  Args to Child
00 002bf614 775a6a64 77592278 00000064 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
01 002bf618 77592278 00000064 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
02 002bf67c 7759215c 00000000 00000000 00000001 ntdll!RtlpWaitOnCriticalSection+0x13e (FPO: [Non-Fpo])
03 002bf6a4 775c00e1 77637340 77bf1b77 00000000 ntdll!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
04 002bf6dc 75587bc3 00000001 00000000 002bf704 ntdll!LdrLockLoaderLock+0xe4 (FPO: [Non-Fpo])
05 002bf728 7679215d 00000000 002bf73c 00000104 KERNELBASE!GetModuleFileNameW+0x75 (FPO: [Non-Fpo])
06 002bf948 76792112 002bfbb0 002bf968 7ffdb000 SHELL32!InRunDllProcess+0x39 (FPO: [Non-Fpo])
*** WARNING: Unable to verify checksum for C:\Users\BianChengNan\Documents\Visual Studio 2012\Projects\testTerminateThread\Debug\testTerminateThread.exe
07 002bf95c 013714db 002bfa44 002bfcbc 002bfbc0 SHELL32!ShellExecuteExW+0x51 (FPO: [Non-Fpo])
08 002bfbb0 01371685 000ac518 00000000 00000000 testTerminateThread!RunProcess+0xdb (FPO: [Non-Fpo]) (CONV: cdecl) [c:\users\bianchengnan\documents\visual studio 2012\projects\testterminatethread\testterminatethread\testterminatethread.cpp @ 28]
09 002bfcbc 01371c69 00000001 000ac510 000ae660 testTerminateThread!wmain+0xc5 (FPO: [Non-Fpo]) (CONV: cdecl) [c:\users\bianchengnan\documents\visual studio 2012\projects\testterminatethread\testterminatethread\testterminatethread.cpp @ 59]
0a 002bfd0c 01371e5d 002bfd20 758ced6c 7ffdb000 testTerminateThread!__tmainCRTStartup+0x199 (FPO: [Non-Fpo]) (CONV: cdecl) [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 533]
0b 002bfd14 758ced6c 7ffdb000 002bfd60 775c37eb testTerminateThread!wmainCRTStartup+0xd (FPO: [Non-Fpo]) (CONV: cdecl) [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 377]
0c 002bfd20 775c37eb 7ffdb000 77bf10cb 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
0d 002bfd60 775c37be 01371082 7ffdb000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
0e 002bfd78 00000000 01371082 7ffdb000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])

#  1  Id: 18c0.193c Suspend: 1 Teb: 7ffde000 Unfrozen
 # ChildEBP RetAddr  Args to Child
00 0133fbac 775ff20f 76a71677 00000000 00000000 ntdll!DbgBreakPoint (FPO: [0,0,0])
01 0133fbdc 758ced6c 00000000 0133fc28 775c37eb ntdll!DbgUiRemoteBreakin+0x3c (FPO: [Non-Fpo])
02 0133fbe8 775c37eb 00000000 76a71183 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
03 0133fc28 775c37be 775ff1d3 00000000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
04 0133fc40 00000000 775ff1d3 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
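The call stack shows that thread 0 is stuck inside ShellExecuteExW: it calls GetModuleFileNameW, which in turn waits in LdrLockLoaderLock to enter a critical section.
Run !cs -l to list the locked critical sections: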
0:001> !cs -l
-----------------------------------------
DebugInfo          = 0x77637540
Critical section   = 0x77637340 (ntdll!LdrpLoaderLock+0x0)
LOCKED
LockCount          = 0x1
WaiterWoken        = No
OwningThread       = 0x00001400
RecursionCount     = 0x1
LockSemaphore      = 0x64
SpinCount          = 0x00000000
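Note that OwningThread is 0x00001400, which is exactly the test thread we created and matches the thread id seen in DbgView. That thread has already been killed, but before it died it had acquired the process loader lock 0x77637340 (ntdll!LdrpLoaderLock+0x0). It was terminated while sleeping in the DLL_THREAD_DETACH branch of DllMain(), which runs with the loader lock held, so the lock was never released.
With that, the cause is clear.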
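To make the reasoning above concrete, here is a small portable C++ model of an orphaned lock. It is only a sketch: ToyLock and main_thread_can_enter_after_worker_dies are hypothetical names invented for illustration, and the real RTL_CRITICAL_SECTION is considerably more involved. The model just records an owner id, the way OwningThread does in the !cs output, and shows that if the owner disappears without releasing, every later attempt to enter fails.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Toy stand-in for a critical section: it only records the owner's id.
// Hypothetical type for illustration; NOT the real RTL_CRITICAL_SECTION.
struct ToyLock {
    std::atomic<int> owner{0};  // 0 means "unowned"

    // Take the lock if nobody owns it.
    bool try_enter(int tid) {
        int expected = 0;
        return owner.compare_exchange_strong(expected, tid);
    }

    // Release the lock; only the current owner is allowed to do this.
    void leave(int tid) {
        int expected = tid;
        bool released = owner.compare_exchange_strong(expected, 0);
        assert(released);
        (void)released;
    }

    // Poll until the lock becomes free or the timeout expires.
    bool enter_with_timeout(int tid, std::chrono::milliseconds timeout) {
        auto deadline = std::chrono::steady_clock::now() + timeout;
        while (!try_enter(tid)) {
            if (std::chrono::steady_clock::now() >= deadline) return false;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return true;
    }
};

// The worker takes the lock. If worker_releases is false it returns without
// calling leave(), which models a thread that was killed while holding it.
bool main_thread_can_enter_after_worker_dies(bool worker_releases) {
    ToyLock lock;
    std::thread worker([&] {
        lock.try_enter(0x1400);
        if (worker_releases) lock.leave(0x1400);
    });
    worker.join();
    // The worker is gone; can the "main thread" (id 1) still get the lock?
    return lock.enter_with_timeout(1, std::chrono::milliseconds(50));
}
```

Calling main_thread_can_enter_after_worker_dies(true) succeeds, while the false case times out. The real program has no timeout, which is why ShellExecuteEx() waits forever.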
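Summary
Do not casually use TerminateThread to kill a thread by force! A thread killed this way never releases the locks it holds.
WinDbg is a superb debugging tool on Windows.
!cs -l quickly finds the critical section involved in a deadlock.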
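As an alternative to the forced termination warned about in the first point, the usual approach is to ask the thread to stop and then wait for it to finish. The sketch below uses portable C++ (std::thread and std::atomic) rather than the Win32 calls from the test project. The function name and the flags are hypothetical, and on Win32 the same idea is typically built from an event object plus WaitForSingleObject on the thread handle.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Starts a worker, lets it run for run_for, then asks it to stop and waits.
// Returns true if the worker reached its own cleanup code before exiting.
// Hypothetical helper for illustration only.
bool stop_worker_cooperatively(std::chrono::milliseconds run_for) {
    std::atomic<bool> stop_requested{false};
    std::atomic<bool> cleanup_done{false};

    std::thread worker([&] {
        // Do work in small steps, checking the stop flag between steps.
        while (!stop_requested.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        // The thread leaves on its own, so it can release locks and
        // run any other cleanup before it exits.
        cleanup_done.store(true);
    });

    std::this_thread::sleep_for(run_for);
    stop_requested.store(true);  // request the stop instead of killing
    worker.join();               // wait until the worker has really exited
    return cleanup_done.load();
}
```

Because the worker exits through its normal return path, thread-exit processing such as the DLL_THREAD_DETACH notification can complete and release whatever it acquired.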
References
《软件调试》 (Software Debugging)
《格蠹汇编》
Windows via C/C++, 5th edition (《Windows核心编程(第5版)》), especially Chapter 20
Dynamic-Link Library Best Practices
Author: BianChengNan
Link: https://bianchengnan.gitee.io/2019/10/26/debugging-deadlock-caused-by-TerminateThread/
License: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 3.0. Please credit the source when reposting!