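Background
A server program built on Windows IOCP had been running stably for a long time. After an asynchronous HTTP handler (WinHttp) was added recently, it began to leak a small amount of memory.
Leak detection
First, enable PageHeap for the server program and restart the service. Then take full memory dumps of the process at intervals to collect several samples.
PageHeap is configured with gflags:
[Figure: PageHeap configuration in gflags]
I collected three dumps here. The heap usage in each is as follows: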
//Process Uptime: 0 days 0:57:09.000
0:000> !heap -s
************************************************************************************************************************
NT HEAP STATS BELOW
************************************************************************************************************************
LFH Key : 0x94d993b2
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
036c0000 08000002 2148 1884 2040 248 242 2 0 3 LFH
External fragmentation 13 % (242 free blocks)
048c0000 08001002 168 64 60 13 1 1 0 0 LFH
04a50000 08001002 60 16 60 14 1 1 0 0
04c50000 08001002 60 16 60 14 1 1 0 0
-----------------------------------------------------------------------------
//Process Uptime: 0 days 1:26:42.000
0:000> !heap -s
************************************************************************************************************************
NT HEAP STATS BELOW
************************************************************************************************************************
LFH Key : 0x94d993b2
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
036c0000 08000002 2148 1996 2040 327 267 2 0 6 LFH
External fragmentation 16 % (267 free blocks)
048c0000 08001002 168 64 60 13 1 1 0 0 LFH
04a50000 08001002 60 16 60 14 1 1 0 0
04c50000 08001002 60 16 60 14 1 1 0 0
-----------------------------------------------------------------------------
//Process Uptime: 0 days 2:16:30.000
0:000> !heap -s
************************************************************************************************************************
NT HEAP STATS BELOW
************************************************************************************************************************
LFH Key : 0x94d993b2
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
036c0000 08000002 4192 2072 4084 333 270 3 0 a LFH
External fragmentation 16 % (270 free blocks)
048c0000 08001002 168 64 60 13 1 1 0 0 LFH
04a50000 08001002 60 16 60 14 1 1 0 0
04c50000 08001002 60 16 60 14 1 1 0 0
-----------------------------------------------------------------------------
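The committed size of heap 036c0000 grew from 1884 KB to 2072 KB between the first and last dumps (roughly 80 minutes apart), while the committed size of the other heaps did not change noticeably. The leak is therefore likely in heap 036c0000. Next, look at its allocation statistics: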
//Process Uptime: 0 days 0:57:09.000
0:000> !heap -stat -h 036c0000 -grpS
heap @ 036c0000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
4024 f - 3c21c (22.62)
30 8a9 - 19fb0 (9.77)
6ec 1a - b3f8 (4.23)
1ac 57 - 9174 (3.42) //==>57
a0 a9 - 69a0 (2.48)
2c 266 - 6988 (2.48)
38 1bd - 6158 (2.29)
34 1d9 - 6014 (2.26)
2000 3 - 6000 (2.26)
84 ad - 5934 (2.10)
1000 5 - 5000 (1.88)
64 a7 - 413c (1.53)
4c da - 40b8 (1.52)
4000 1 - 4000 (1.50)
b8 57 - 3e88 (1.47)
3c 107 - 3da4 (1.45)
26c 19 - 3c8c (1.42)
22e 1b - 3ada (1.38)
208 1a - 34d0 (1.24)
1034 3 - 309c (1.14)
//Process Uptime: 0 days 1:26:42.000
0:000> !heap -stat -h 036c0000 -grpS
heap @ 036c0000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
4024 f - 3c21c (22.01)
30 8a9 - 19fb0 (9.51)
1ac 78 - c8a0 (4.59) //==>78
6ec 1a - b3f8 (4.12)
2c 26a - 6a38 (2.43)
a0 a9 - 69a0 (2.42)
34 1ef - 648c (2.30)
38 1c9 - 63f8 (2.29)
2000 3 - 6000 (2.20)
84 ad - 5934 (2.04)
26c 24 - 5730 (1.99)
b8 78 - 5640 (1.97)
1000 5 - 5000 (1.83)
64 a7 - 413c (1.49)
4c da - 40b8 (1.48)
3c 112 - 4038 (1.47)
4000 1 - 4000 (1.46)
22e 1b - 3ada (1.35)
208 1a - 34d0 (1.21)
3008 1 - 3008 (1.10)
//Process Uptime: 0 days 2:16:30.000
0:000> !heap -stat -h 036c0000 -grpS
heap @ 036c0000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
4024 f - 3c21c (21.12)
30 8aa - 19fe0 (9.13)
1ac ab - 11de4 (6.28) //==>ab
6ec 1a - b3f8 (3.95)
26c 34 - 7df0 (2.76)
b8 ab - 7ae8 (2.70)
34 210 - 6b40 (2.35)
a0 ab - 6ae0 (2.35)
2c 26c - 6a90 (2.34)
38 1dd - 6858 (2.29)
84 ad - 5934 (1.96)
3c 123 - 4434 (1.50)
64 a7 - 413c (1.43)
4c db - 4104 (1.43)
4000 1 - 4000 (1.40)
2000 2 - 4000 (1.40)
1000 4 - 4000 (1.40)
22e 1b - 3ada (1.29)
208 1a - 34d0 (1.16)
3008 1 - 3008 (1.05)
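Blocks of size 0x1ac and 0x26c look suspicious: their counts keep increasing from one dump to the next. Taking the 0x1ac blocks, drill further into the individual allocations: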
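As a rough cross-check of the statistics above, the hex block counts can be converted and compared against the process uptimes. The following is only an illustrative sketch: the struct and function names are made up for this example, and the values are copied from the three `!heap -stat` outputs for the 0x1ac size class.

```cpp
#include <cstdint>
#include <vector>

// One sample of the 0x1ac size class, taken from one dump.
struct Sample {
    int uptime_seconds;    // process uptime when the dump was taken
    std::uint32_t blocks;  // "#blocks" column (hex in the debugger output)
};

// The three samples shown in the !heap -stat outputs.
inline std::vector<Sample> samples_0x1ac() {
    return {
        {0 * 3600 + 57 * 60 + 9, 0x57},
        {1 * 3600 + 26 * 60 + 42, 0x78},
        {2 * 3600 + 16 * 60 + 30, 0xab},
    };
}

// Bytes held by a size class: block size times block count.
inline std::uint32_t total_bytes(std::uint32_t block_size, std::uint32_t blocks) {
    return block_size * blocks;
}

// Average number of new blocks per minute between two samples.
inline double blocks_per_minute(const Sample& a, const Sample& b) {
    return (b.blocks - a.blocks) * 60.0 / (b.uptime_seconds - a.uptime_seconds);
}
```

For these samples the rate comes out at roughly one new 0x1ac block per minute in both intervals, which fits a steady leak better than a one-off burst.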
0:000> !heap -flt s1ac
_HEAP @ 36c0000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
036f5f98 0037 0000 [00] 036f5fa0 001ac - (busy)
036f6150 0037 0037 [00] 036f6158 001ac - (busy)
036f8e90 0037 0037 [00] 036f8e98 001ac - (busy)
036f94a0 0037 0037 [00] 036f94a8 001ac - (busy)
036f9b30 0037 0037 [00] 036f9b38 001ac - (busy)
036fc4d8 0037 0037 [00] 036fc4e0 001ac - (busy)
036fe9a8 0038 0037 [00] 036fe9b0 001ac - (busy)
036ff128 0038 0038 [00] 036ff130 001ac - (busy)
036ff2e8 0038 0038 [00] 036ff2f0 001ac - (busy)
036ff570 0038 0038 [00] 036ff578 001ac - (busy)
037014d8 0038 0038 [00] 037014e0 001ac - (busy)
03701a20 0038 0038 [00] 03701a28 001ac - (busy)
03702120 0038 0038 [00] 03702128 001ac - (busy)
03703070 0038 0038 [00] 03703078 001ac - (busy)
03703e58 0038 0038 [00] 03703e60 001ac - (busy)
03704018 0038 0038 [00] 03704020 001ac - (busy)
...
...
...
0:000> !heap -p -a 03703e60
address 03703e60 found in
_HEAP @ 36c0000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
03703e58 0038 0000 [00] 03703e60 001ac - (busy)
Trace: 0a3f
77cb0d64 ntdll!RtlAllocateHeap+0x0000014d
73c0bf35 ucrtbased!heap_alloc_dbg_internal+0x00000195
73c0bd46 ucrtbased!heap_alloc_dbg+0x00000036
73c0e4ba ucrtbased!_malloc_dbg+0x0000001a
73c0edd4 ucrtbased!malloc+0x00000014
*** WARNING: Unable to verify checksum for core.exe
642cfd core!CRYPTO_malloc+0x0000004d
64302e core!CRYPTO_zalloc+0x0000001e
7be437 core!blake2s256_init+0x00000be7
6959b0 core!EVP_CipherInit_ex+0x00000230
7d87f1 core!chacha20_initctx+0x00012841
7d7509 core!chacha20_initctx+0x00011559
76ad23 core!EVP_RAND_verify_zeroization+0x00000c13
769ede core!EVP_RAND_set_ctx_params+0x0000002e
6ddeea core!rand_cleanup_int+0x0000050a
6dd064 core!RAND_get0_private+0x000000b4
6dd51b core!RAND_priv_bytes_ex+0x0000007b
67d3c1 core!BN_rand_range_ex+0x00000151
67d839 core!BN_rand_range_ex+0x000005c9
67d186 core!BN_priv_rand_range_ex+0x00000016
63f878 core!BN_BLINDING_create_param+0x000000f8
62fa1d core!RSA_setup_blinding+0x0000017d
646232 core!RSA_set_default_method+0x00000112
644bc6 core!_vsnwprintf_l+0x000012e6
62f802 core!RSA_private_decrypt+0x00000022
5b8061 core!__empty_global_delete+0x00000181
5bbb7e core!rsa_private_decrypt+0x0000009e
...
...
...
0:000> !heap -p -a 03704020
address 03704020 found in
_HEAP @ 36c0000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
03704018 0038 0000 [00] 03704020 001ac - (busy)
Trace: 0a3f
77cb0d64 ntdll!RtlAllocateHeap+0x0000014d
73c0bf35 ucrtbased!heap_alloc_dbg_internal+0x00000195
73c0bd46 ucrtbased!heap_alloc_dbg+0x00000036
73c0e4ba ucrtbased!_malloc_dbg+0x0000001a
73c0edd4 ucrtbased!malloc+0x00000014
642cfd core!CRYPTO_malloc+0x0000004d
64302e core!CRYPTO_zalloc+0x0000001e
7be437 core!blake2s256_init+0x00000be7
6959b0 core!EVP_CipherInit_ex+0x00000230
7d87f1 core!chacha20_initctx+0x00012841
7d7509 core!chacha20_initctx+0x00011559
76ad23 core!EVP_RAND_verify_zeroization+0x00000c13
769ede core!EVP_RAND_set_ctx_params+0x0000002e
6ddeea core!rand_cleanup_int+0x0000050a
6dd064 core!RAND_get0_private+0x000000b4
6dd51b core!RAND_priv_bytes_ex+0x0000007b
67d3c1 core!BN_rand_range_ex+0x00000151
67d839 core!BN_rand_range_ex+0x000005c9
67d186 core!BN_priv_rand_range_ex+0x00000016
63f878 core!BN_BLINDING_create_param+0x000000f8
62fa1d core!RSA_setup_blinding+0x0000017d
646232 core!RSA_set_default_method+0x00000112
644bc6 core!_vsnwprintf_l+0x000012e6
62f802 core!RSA_private_decrypt+0x00000022
5b8061 core!__empty_global_delete+0x00000181
5bbb7e core!rsa_private_decrypt+0x0000009e
...
...
...
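The memory appears to be allocated during RSA private-key decryption. After checking a few more addresses, similar call stacks also showed up for RSA public-key encryption. The RSA implementation comes from the OpenSSL 3.0 library, which had been running stably in earlier releases of the server, and this upgrade did not seem to touch any SSL-related code. Still, the rough location is now known, so the next step is to write test code and debug it in detail.
Tracing the leak
(The OpenSSL library used for debugging was rebuilt from a freshly downloaded latest version. The leak was still present, and the size of the leaked block changed to 0x1a4.)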
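// Test code
// `bytes`, rsa_public_encrypt and rsa_private_decrypt are helpers defined elsewhere in the test project.
#include <windows.h>
#include <iostream>
using namespace std;

// Thread entry point with the signature CreateThread expects.
DWORD WINAPI thread_test(LPVOID)
{
    bytes source = { 1, 2, 3, 4 };
    bytes cipher = rsa_public_encrypt(source, "pubkey.pem");
    bytes plain = rsa_private_decrypt(cipher, "prikey.pem");
    return 0;
}

int main()
{
    int nthread = 1;
    do
    {
        cout << "Press Enter to start " << nthread << " test thread(s)..." << flush;
        cin.get();
        for (int i = 0; i < nthread; i++)
        {
            HANDLE h = CreateThread(NULL, 0, thread_test, NULL, 0, NULL);
            if (h != NULL)
                CloseHandle(h);
        }
    } while (true);
    return 0;
}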
// Reproduce the leak, then inspect the allocation stack trace of a leaked block
0:004> !heap -p -a 09ab6f68
......
7a16a8b0 verifier!AVrfDebugPageHeapAllocate+0x00000240
7736ef3e ntdll!RtlDebugAllocateHeap+0x00000039
772d7080 ntdll!RtlpAllocateHeap+0x000000f0
772d6ddc ntdll!RtlpAllocateHeapInternal+0x0000104c
772d5d7e ntdll!RtlAllocateHeap+0x0000003e
7a33bf35 ucrtbased!heap_alloc_dbg_internal+0x00000195 [minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp @ 359]
7a33bd46 ucrtbased!heap_alloc_dbg+0x00000036 [minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp @ 450]
7a33e4ba ucrtbased!_malloc_dbg+0x0000001a [minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp @ 495]
7a33edd4 ucrtbased!malloc+0x00000014 [minkernel\crts\ucrt\src\appcrt\heap\malloc.cpp @ 23]
00ed467c Test!CRYPTO_zalloc+0x0000003c [X:\dofun\clibs\openssl\openssl-3.0.0-alpha16\crypto\mem.c @ 191]
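The stack trace is incomplete, so set a breakpoint near 00ed467c and trace from there. This function is called very frequently, so the work can be split into two passes:
First, set a breakpoint at the function's return address and log the return value, for example: bp 00ed4691 "r eax; g;" (the process is 32-bit, so the return value is in eax). Then pick the address of a newly leaked block and find which call returned it. On my machine the block from the 11th allocation was never freed. (Note that the UserPtr shown by !heap includes the block header, so it differs from the logged return value by the header size, which is 0x20 on my machine.)
Second, set bp 00ed4691 0n11 "r eax;" and create a new test thread. When the breakpoint hits, the call stack is: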
0:004> k
# ChildEBP RetAddr
00 0873f4bc 01028da1 Test!drbg_ctr_init+0x2 [X:\dofun\clibs\openssl\openssl-3.0.0-alpha16\providers\implementations\rands\drbg_ctr.c @ 536]
01 0873f4e4 01028ad6 Test!drbg_ctr_set_ctx_params+0x201 [X:\dofun\clibs\openssl\openssl-3.0.0-alpha16\providers\implementations\rands\drbg_ctr.c @ 720]
02 0873f4f0 00fc945b Test!drbg_ctr_instantiate_wrapper+0x16 [X:\dofun\clibs\openssl\openssl-3.0.0-alpha16\providers\implementations\rands\drbg_ctr.c @ 335]
03 0873f514 00f55d37 Test!EVP_RAND_instantiate+0x3b [X:\dofun\clibs\openssl\openssl-3.0.0-alpha16\crypto\evp\evp_rand.c @ 524]
04 0873f60c 00f54cc7 Test!rand_new_drbg+0x217 [X:\dofun\clibs\openssl\openssl-3.0.0-alpha16\crypto\rand\rand_lib.c @ 588]
05 0873f634 00f0246b Test!RAND_bytes_ex+0xd7 [X:\dofun\clibs\openssl\openssl-3.0.0-alpha16\crypto\rand\rand_lib.c @ 358]
06 0873f654 00ed485d Test!ossl_rsa_padding_add_PKCS1_type_2_ex+0x5b [X:\dofun\clibs\openssl\openssl-3.0.0-alpha16\crypto\rsa\rsa_pk1.c @ 141]
07 0873f684 00ec58ba Test!rsa_ossl_public_encrypt+0x1ad [X:\dofun\clibs\openssl\openssl-3.0.0-alpha16\crypto\rsa\rsa_ossl.c @ 114]
08 0873f850 00ec78b3 Test!__rsa_encrypt+0x12a [X:\dofun\clibs\openssl\Test\main.cpp @ 72]
09 0873f970 00ec7b0d Test!rsa_public_encrypt+0xf3 [X:\dofun\clibs\openssl\Test\main.cpp @ 115]
0a 0873fb1c 7680fa29 Test!thread_test+0xdd [X:\dofun\clibs\openssl\Test\main.cpp @ 183]
0b 0873fb2c 772f7a4e KERNEL32!BaseThreadInitThunk+0x19
0c 0873fb88 772f7a1e ntdll!__RtlUserThreadStart+0x2f
0d 0873fb98 00000000 ntdll!_RtlUserThreadStart+0x1b
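The matching step in the two-pass breakpoint procedure above can be sketched as follows. This is a hedged illustration, not the author's tooling: `find_hit_number` is a hypothetical helper, and it assumes the pointer returned by the allocator equals the heap's UserPtr plus the header size (which is how the debug CRT lays out its block header in front of the user data). If the layout differs on another build, the offset must be adjusted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed distance between the heap's UserPtr and the pointer malloc returns
// (0x20 on the author's 32-bit debug build).
constexpr std::uint32_t kHeaderSize = 0x20;

// Returns the 1-based hit number whose logged return value corresponds to the
// leaked block, or 0 if none matches. `logged` holds the return values printed
// by the first breakpoint, in the order they were hit.
inline std::size_t find_hit_number(const std::vector<std::uint32_t>& logged,
                                   std::uint32_t heap_user_ptr,
                                   std::uint32_t header_size = kHeaderSize) {
    // Assumption: returned pointer = UserPtr + header size.
    const std::uint32_t wanted = heap_user_ptr + header_size;
    for (std::size_t i = 0; i < logged.size(); ++i) {
        if (logged[i] == wanted) return i + 1;
    }
    return 0;
}
```

The resulting hit number is what goes into the pass count of the second breakpoint (0n11 in the author's run).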
Looking further into drbg_ctr_init shows three pointers that are never freed, occupying (3 * 0x1a4) bytes in total:
static int drbg_ctr_init(PROV_DRBG *drbg)
{
    PROV_DRBG_CTR *ctr = (PROV_DRBG_CTR *)drbg->data;
    size_t keylen;

    if (ctr->cipher_ctr == NULL) {
        ERR_raise(ERR_LIB_PROV, PROV_R_MISSING_CIPHER);
        return 0;
    }
    ctr->keylen = keylen = EVP_CIPHER_key_length(ctr->cipher_ctr);
    if (ctr->ctx_ecb == NULL)
        ctr->ctx_ecb = EVP_CIPHER_CTX_new(); // leak site 1
    if (ctr->ctx_ctr == NULL)
        ctr->ctx_ctr = EVP_CIPHER_CTX_new(); // leak site 2
    if (ctr->ctx_ecb == NULL || ctr->ctx_ctr == NULL) {
        ERR_raise(ERR_LIB_PROV, ERR_R_MALLOC_FAILURE);
        goto err;
    }

    if (!EVP_CipherInit_ex(ctr->ctx_ecb,
                           ctr->cipher_ecb, NULL, NULL, NULL, 1)
        || !EVP_CipherInit_ex(ctr->ctx_ctr,
                              ctr->cipher_ctr, NULL, NULL, NULL, 1)) {
        ERR_raise(ERR_LIB_PROV, PROV_R_UNABLE_TO_INITIALISE_CIPHERS);
        goto err;
    }

    drbg->strength = keylen * 8;
    drbg->seedlen = keylen + 16;

    if (ctr->use_df) {
        /* df initialisation */
        static const unsigned char df_key[32] = {
            0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
            0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
            0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
            0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
        };

        if (ctr->ctx_df == NULL)
            ctr->ctx_df = EVP_CIPHER_CTX_new(); // leak site 3
        if (ctr->ctx_df == NULL) {
            ERR_raise(ERR_LIB_PROV, ERR_R_MALLOC_FAILURE);
            goto err;
        }
        /* Set key schedule for df_key */
        if (!EVP_CipherInit_ex(ctr->ctx_df,
                               ctr->cipher_ecb, NULL, df_key, NULL, 1)) {
            ERR_raise(ERR_LIB_PROV, PROV_R_DERIVATION_FUNCTION_INIT_FAILED);
            goto err;
        }
    }
    return drbg_ctr_init_lengths(drbg);

err:
    EVP_CIPHER_CTX_free(ctr->ctx_ecb);
    EVP_CIPHER_CTX_free(ctr->ctx_ctr);
    ctr->ctx_ecb = ctr->ctx_ctr = NULL;
    return 0;
}
Walking further up the stack reveals a suspicious operation:
int RAND_bytes_ex(OSSL_LIB_CTX *ctx, unsigned char *buf, int num)
{
    EVP_RAND_CTX *rand;
#ifndef OPENSSL_NO_DEPRECATED_3_0
    const RAND_METHOD *meth = RAND_get_rand_method();

    if (meth != NULL && meth != RAND_OpenSSL()) {
        if (meth->bytes != NULL)
            return meth->bytes(buf, num);
        ERR_raise(ERR_LIB_RAND, RAND_R_FUNC_NOT_IMPLEMENTED);
        return -1;
    }
#endif

    // When rand is not NULL it is used as the context to generate the random bytes,
    // and the function then returns without releasing rand.
    // rand is a local variable that is never released here. Does that leak,
    // or is it handled specially somewhere else?
    rand = RAND_get0_public(ctx);
    if (rand != NULL)
        return EVP_RAND_generate(rand, buf, num, 0, 0, NULL, 0);

    return 0;
}
Step into RAND_get0_public to find out:
EVP_RAND_CTX *RAND_get0_public(OSSL_LIB_CTX *ctx)
{
    RAND_GLOBAL *dgbl = rand_get_global(ctx);
    EVP_RAND_CTX *rand, *primary;

    if (dgbl == NULL)
        return NULL;

    // Thread-local storage avoids repeated initialization and teardown, which improves speed.
    rand = CRYPTO_THREAD_get_local(&dgbl->public);
    if (rand == NULL) {
        primary = RAND_get0_primary(ctx);
        if (primary == NULL)
            return NULL;

        ctx = ossl_lib_ctx_get_concrete(ctx);
        /*
         * If the private is also NULL then this is the first time we've
         * used this thread.
         */
        if (CRYPTO_THREAD_get_local(&dgbl->private) == NULL
                && !ossl_init_thread_start(NULL, ctx, rand_delete_thread_state))
            return NULL;
        rand = rand_new_drbg(ctx, primary, SECONDARY_RESEED_INTERVAL,
                             SECONDARY_RESEED_TIME_INTERVAL);
        CRYPTO_THREAD_set_local(&dgbl->public, rand);
    }
    return rand;
}
RAND_get0_private uses the same optimization as RAND_get0_public:
EVP_RAND_CTX *RAND_get0_private(OSSL_LIB_CTX *ctx)
{
    RAND_GLOBAL *dgbl = rand_get_global(ctx);
    EVP_RAND_CTX *rand, *primary;

    if (dgbl == NULL)
        return NULL;

    // Thread-local storage avoids repeated initialization and teardown, which improves speed.
    rand = CRYPTO_THREAD_get_local(&dgbl->private);
    if (rand == NULL) {
        primary = RAND_get0_primary(ctx);
        if (primary == NULL)
            return NULL;

        ctx = ossl_lib_ctx_get_concrete(ctx);
        /*
         * If the public is also NULL then this is the first time we've
         * used this thread.
         */
        if (CRYPTO_THREAD_get_local(&dgbl->public) == NULL
                && !ossl_init_thread_start(NULL, ctx, rand_delete_thread_state))
            return NULL;
        rand = rand_new_drbg(ctx, primary, SECONDARY_RESEED_INTERVAL,
                             SECONDARY_RESEED_TIME_INTERVAL);
        CRYPTO_THREAD_set_local(&dgbl->private, rand);
    }
    return rand;
}
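The caching pattern in the two functions above can be modelled without OpenSSL. The sketch below is a simplified stand-in, not OpenSSL code: `FakeRandCtx`, `get0_public`, `thread_cleanup` and `run_worker` are hypothetical names, and a plain `thread_local` pointer plays the role of the `CRYPTO_THREAD_get_local` slot. It shows that repeated lookups on one thread create only one context, and that the context outlives the thread unless a per-thread cleanup runs before the thread exits.

```cpp
#include <atomic>
#include <thread>

// Stand-in for EVP_RAND_CTX; counts how many instances are alive.
struct FakeRandCtx {
    static std::atomic<int> live;
    FakeRandCtx() { ++live; }
    ~FakeRandCtx() { --live; }
};
std::atomic<int> FakeRandCtx::live{0};

// Per-thread slot. A raw pointer is used on purpose: nothing frees it
// automatically when the thread exits.
thread_local FakeRandCtx* t_public = nullptr;

// Same shape as RAND_get0_public: create on first use, then reuse.
FakeRandCtx* get0_public() {
    if (t_public == nullptr) t_public = new FakeRandCtx();
    return t_public;
}

// Stand-in for the per-thread cleanup handler.
void thread_cleanup() {
    delete t_public;
    t_public = nullptr;
}

// Runs `calls` lookups on a fresh thread and returns the number of contexts
// still alive after that thread has finished.
int run_worker(int calls, bool cleanup) {
    std::thread t([=] {
        for (int i = 0; i < calls; ++i) get0_public();
        if (cleanup) thread_cleanup();
    });
    t.join();
    return FakeRandCtx::live.load();
}
```

In this model each worker that skips the cleanup leaves exactly one context behind regardless of how many lookups it made, which mirrors a leak that scales with the number of threads rather than the number of calls.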
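Summary
The cause of the "memory leak" is now clear. To improve speed, OpenSSL caches each EVP_RAND_CTX it creates in thread-local storage. When RSA is used in a newly created thread and the thread's resources are not cleaned up afterwards, the first call to RSA_public_encrypt or RSA_private_decrypt in that thread leaves (3*0x1a4) bytes unreleased. Each test thread calls RSA_public_encrypt and RSA_private_decrypt once, for a total of (2*3*0x1a4) bytes per thread, which matches the observation.
Other RSA functions such as RSA_public_decrypt were not investigated, but they presumably have the same problem.
Because threads are not created frequently under the IOCP model, the problem was neatly "hidden". Once the asynchronous WinHttp operations were added, RSA functions ran frequently inside WinHttp callbacks (on new threads), and the problem inevitably surfaced. Given this application's usage, moving the RSA operations into the IOCP callbacks is enough to fix it.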
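Moving the RSA work onto long-lived threads is the fix chosen here. One possible alternative, when the work has to stay on short-lived threads, is to run OpenSSL's per-thread cleanup before the thread exits. OpenSSL provides OPENSSL_thread_stop() for this, and its documentation notes that statically linked applications should call it on each thread. The sketch below only demonstrates the scope-guard pattern with a generic callable so that it compiles without OpenSSL; `ThreadExitGuard` and `run_on_short_lived_thread` are hypothetical names, and whether this approach suits WinHttp callback threads has not been verified here.

```cpp
#include <functional>
#include <thread>
#include <utility>

// Runs a cleanup callable when the enclosing scope (here: the thread body)
// ends, including early returns.
class ThreadExitGuard {
public:
    explicit ThreadExitGuard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {}
    ~ThreadExitGuard() { if (cleanup_) cleanup_(); }
    ThreadExitGuard(const ThreadExitGuard&) = delete;
    ThreadExitGuard& operator=(const ThreadExitGuard&) = delete;
private:
    std::function<void()> cleanup_;
};

// Hypothetical helper: runs `work` on a short-lived thread and guarantees that
// `cleanup` runs on that same thread before it exits. In a program that links
// OpenSSL statically, `cleanup` would be a call to OPENSSL_thread_stop().
inline void run_on_short_lived_thread(std::function<void()> work,
                                      std::function<void()> cleanup) {
    std::thread t([&] {
        ThreadExitGuard guard(cleanup);  // destructor fires after work()
        work();
    });
    t.join();
}
```

The guard is declared before the work so that the cleanup always runs last on the worker thread, after every OpenSSL call made by the work has returned.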
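PS:
One more note: when analyzing a 32-bit application, it is best to configure it with the 32-bit gflags.exe, generate the dmp file with the 32-bit Task Manager, and analyze it with the 32-bit WinDbg. Otherwise compatibility problems may interfere with the analysis.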
Last edited by Anakin Stone on 2021-5-10 15:47