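[Original] SSE2-Accelerated Sig (Signature) Scanning for Brute-Force Memory Search (Supports Fuzzy Matching) - Software Reverse Engineering - Kanxue - Security Community | Security Jobs | kanxue.com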

Posted by liuzewei · 2016-5-2 08:58 · 18543 views

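A must-have utility function for brute-force memory search fans: SSE2-accelerated, with fuzzy (wildcard) matching, and usable in kernel mode and user mode, on 32-bit and 64-bit alike.
That covers most needs: searching for strings and signatures, and wildcard matching of dynamically generated code, function addresses, data addresses, and so on.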
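The post does not show its pattern syntax. A common convention, assumed here and not confirmed by the source, is IDA-style text such as `48 8B ?? 05`, where `??` (or `?`) marks a wildcard byte. A minimal C sketch that turns such text into a byte array plus a match mask; `parse_sig` and the mask convention (1 = must match, 0 = wildcard) are hypothetical:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hex digit -> value, or -1 if c is not a hex digit. */
static int hex_val(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parses text like "48 8B ?? 05" (hypothetical format).
   pat[k] receives the byte; mask[k] is 1 for a fixed byte, 0 for a wildcard.
   Returns the number of bytes parsed, or -1 on a syntax error or overflow of cap. */
int parse_sig(const char *text, uint8_t *pat, uint8_t *mask, size_t cap) {
    size_t n = 0;
    const char *p = text;
    while (*p) {
        if (isspace((unsigned char)*p)) { p++; continue; }
        if (n == cap) return -1;
        if (*p == '?') {                      /* wildcard: "?" or "??" */
            pat[n] = 0; mask[n] = 0; n++;
            p++;
            if (*p == '?') p++;
        } else {                              /* fixed byte: two hex digits */
            int hi = hex_val((unsigned char)p[0]);
            int lo = hi < 0 ? -1 : hex_val((unsigned char)p[1]);
            if (lo < 0) return -1;
            pat[n] = (uint8_t)(hi << 4 | lo); mask[n] = 1; n++;
            p += 2;
        }
    }
    return (int)n;
}
```

Keeping the mask as a separate array, rather than reserving one byte value to mean "any", keeps every byte value searchable.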

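For certain reasons I needed to brute-force search all of physical memory (my machine has 8 GB). In the worst case (no hit anywhere in the 8 GB, with a signature longer than 30 bytes), the speed was not good enough: one full pass took 9025 ms.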
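For comparison, a plain byte-by-byte wildcard matcher, the usual starting point for full-memory scans, looks roughly like the sketch below. The author's baseline is not shown, so this is only an illustration; `sig_scan_scalar` is a hypothetical name, and the pattern/mask convention (mask 0 = wildcard) is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first match of the n-byte pattern in buf, or -1.
   mask[k] == 0 means pattern byte k is a wildcard. */
ptrdiff_t sig_scan_scalar(const uint8_t *buf, size_t len,
                          const uint8_t *pat, const uint8_t *mask, size_t n) {
    if (n == 0 || n > len) return -1;
    for (size_t i = 0; i + n <= len; i++) {           /* every start offset */
        size_t k = 0;
        while (k < n && (!mask[k] || buf[i + k] == pat[k])) k++;
        if (k == n) return (ptrdiff_t)i;              /* all bytes matched */
    }
    return -1;
}
```

Every start offset pays at least one load, compare and branch; an SSE2 version attacks exactly that cost by testing 16 candidate offsets per compare instruction.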

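Also, for other reasons, a few years ago I implemented a set of SSE2-accelerated versions of stristr, strista and similar functions. At the time I searched Google and Baidu for nearly everything on the topic (quite a few people turned out to be working on it, since the standard library happens to lack these functions), and I profiled many implementations over and over, from code-level to algorithm-level changes. I believe I ended up with the fastest version I know of.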
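As background to the stristr remark: a case-insensitive search typically folds haystack and needle to one case before comparing, and SSE2 can fold 16 bytes per step. A sketch assuming plain ASCII (bytes 0x80 and above are left untouched); `ascii_lower16` and `ascii_lower` are hypothetical helpers, not the author's code:

```c
#include <emmintrin.h>
#include <stddef.h>

/* Lowercases the ASCII letters among 16 bytes; all other bytes pass through. */
static __m128i ascii_lower16(__m128i x) {
    const __m128i lo = _mm_set1_epi8('A' - 1);
    const __m128i hi = _mm_set1_epi8('Z' + 1);
    /* 0xFF in lanes where 'A' <= x <= 'Z' (signed byte compares). */
    __m128i is_upper = _mm_and_si128(_mm_cmpgt_epi8(x, lo), _mm_cmplt_epi8(x, hi));
    /* Add 0x20 only in those lanes. */
    return _mm_add_epi8(x, _mm_and_si128(is_upper, _mm_set1_epi8(0x20)));
}

/* In-place ASCII lowercase: SSE2 for full 16-byte blocks, scalar for the tail. */
void ascii_lower(char *s, size_t len) {
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(s + i), ascii_lower16(v));
    }
    for (; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z') s[i] = (char)(s[i] + 0x20);
}
```

Because `_mm_cmpgt_epi8` and `_mm_cmplt_epi8` compare signed bytes, values 0x80 to 0xFF read as negative and never fall in the 'A' to 'Z' range, so they are left unchanged.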

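So, just for the fun of it, I optimized a standard SIG signature matching/search function: the same worst-case pass now takes 655 ms, roughly a 13.8x speedup (9025 ms → 655 ms).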
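The function itself is not reproduced in this thread, but replies #5 and #13 show its inner loop: load 16 bytes with `_mm_loadu_si128`, compare them against a broadcast byte with `_mm_cmpeq_epi8`, collapse the result with `_mm_movemask_epi8`, then walk the set bits with `_BitScanForward`. Below is a minimal sketch of that first-byte-filter idea under our own assumptions (hypothetical `sig_scan_sse2`, mask array with 0 = wildcard). It anchors on the first non-wildcard byte, verifies each candidate with a scalar compare (which is where wildcards are honored), and never reads past `len`:

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Index of the lowest set bit; m must be nonzero. */
static unsigned lowest_bit(unsigned m) {
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward(&idx, m);
    return (unsigned)idx;
#else
    return (unsigned)__builtin_ctz(m);
#endif
}

/* Full pattern check at p; mask[k] == 0 means "any byte". */
static int match_at(const uint8_t *p, const uint8_t *pat, const uint8_t *mask, size_t n) {
    for (size_t k = 0; k < n; k++)
        if (mask[k] && p[k] != pat[k]) return 0;
    return 1;
}

/* Returns the offset of the first match in buf, or -1. */
ptrdiff_t sig_scan_sse2(const uint8_t *buf, size_t len,
                        const uint8_t *pat, const uint8_t *mask, size_t n) {
    if (n == 0 || n > len) return -1;
    size_t a = 0;                                   /* anchor: first fixed byte */
    while (a < n && !mask[a]) a++;
    if (a == n) return 0;                           /* all wildcards */
    const __m128i needle = _mm_set1_epi8((char)pat[a]);
    const size_t last = len - n;                    /* last valid start offset */
    size_t i = 0;
    /* Each step tests starts i..i+15 by loading buf[i+a .. i+a+15]. */
    while (i + a + 16 <= len && i <= last) {
        __m128i blk = _mm_loadu_si128((const __m128i *)(buf + i + a));
        unsigned m = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(blk, needle));
        while (m) {                                 /* candidates in ascending order */
            size_t s = i + lowest_bit(m);
            if (s > last) return -1;                /* later candidates are out of range too */
            if (match_at(buf + s, pat, mask, n)) return (ptrdiff_t)s;
            m &= m - 1;                             /* clear lowest set bit */
        }
        i += 16;
    }
    for (; i <= last; i++)                          /* scalar tail (also covers len < 16) */
        if (match_at(buf + i, pat, mask, n)) return (ptrdiff_t)i;
    return -1;
}
```

Anchoring on a rarer fixed byte than the first one (for example, avoiding 0x00, 0x90 or 0xCC) would reduce how many candidates reach the scalar check; that is a possible refinement, not something the post describes.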

I imagine few people on this forum don't already have a similar utility in their toolbox. I'm no expert; I'm sharing this to get the ball rolling.

Replies (28)
#2 lhglhg (2016-5-2 09:41): Which header files need to be added?

#3 liuzewei (2016-5-2 09:44):

    #ifdef WIN32                          /* user mode (Windows SDK) */
    #  ifndef WIN32_LEAN_AND_MEAN
    #    define WIN32_LEAN_AND_MEAN
    #  endif
    #  include <windows.h>
    #  ifndef PAGE_SIZE
    #    define PAGE_SIZE 0x1000
    #  endif
    #else                                 /* kernel mode (WDK) */
    #  include <ntifs.h>
    #  ifndef MAX_PATH
    #    define MAX_PATH 260
    #  endif
    #endif
    #include <emmintrin.h>                /* SSE2 integer intrinsics, see #6 */

#4 靴子 (2016-5-2 10:12): Could you give an example of how the parameters are passed?

#5 值得怀疑 (2016-5-2 11:09):

    CurHead = _mm_loadu_si128((__m128i*)(VirtualAddress + i));
    CurComp = _mm_cmpeq_epi8(SigHead, CurHead);
    MskComp = _mm_movemask_epi8(CurComp);

What kind of functions are these???
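To answer #5: these are SSE2 integer intrinsics declared in `emmintrin.h`, and each maps to a single instruction (compare the disassembly in #13). A tiny self-contained sketch of what the three calls compute together; `eq_mask16` is a hypothetical name:

```c
#include <emmintrin.h>
#include <stdint.h>

/* Bit k of the result is set iff p[k] == b, for k = 0..15. Reads exactly 16 bytes. */
unsigned eq_mask16(const uint8_t *p, uint8_t b) {
    __m128i needle = _mm_set1_epi8((char)b);               /* b copied into all 16 lanes */
    __m128i block  = _mm_loadu_si128((const __m128i *)p);  /* movdqu: unaligned 16-byte load */
    __m128i eq     = _mm_cmpeq_epi8(block, needle);        /* pcmpeqb: 0xFF where equal, else 0x00 */
    return (unsigned)_mm_movemask_epi8(eq);                /* pmovmskb: top bit of each lane -> 16-bit mask */
}
```

A nonzero mask means at least one of the 16 positions holds the byte, and the index of each set bit is the offset of a hit; that is what the `_BitScanForward` loop in #13 walks through.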
#6 lhglhg (2016-5-2 11:36): It needs emmintrin.h.

#7 lhglhg (2016-5-2 11:45):

    xmmintrin.h(34) : fatal error C1189: #error :  "SSE instruction set not enabled"

#8 lhglhg (2016-5-2 11:52): Does it only build with GCC? Won't VS compile it?

#9 sxpp (2016-5-2 15:45): VS compiles it fine~~~
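Regarding #7 and #8: an `#error` like that is a guard inside the intrinsics header that fires when the compiler is not generating SSE code, so it most likely calls for a target switch rather than a different compiler. A hedged sketch of an explicit up-front check, assuming the usual predefined macros (`__SSE2__` on GCC/Clang, `_M_X64`/`_M_AMD64` and `_M_IX86_FP` on MSVC); `HAVE_SSE2` and `sse2_compiled_in` are hypothetical names:

```c
/* Fail early with a clear message if the target lacks SSE2. */
#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  define HAVE_SSE2 1
#else
#  error "SSE2 code generation is not enabled (GCC/Clang: -msse2; MSVC x86: /arch:SSE2)"
#endif

#include <emmintrin.h>

/* Returns 1 when the SSE2 path was compiled in and behaves as expected. */
int sse2_compiled_in(void) {
    __m128i v = _mm_set1_epi8(1);
    return HAVE_SSE2 && _mm_movemask_epi8(_mm_cmpeq_epi8(v, v)) == 0xFFFF;
}
```

On x64, SSE2 is part of the baseline and GCC/Clang define `__SSE2__` by default; 32-bit builds need `-msse2`, or `/arch:SSE2` on MSVC (the default in recent versions).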
#10 VCKFC (2016-5-2 16:07): Nice, a good chance to learn some SSE.

#11 liuzewei (2016-5-2 22:46): The parameters are documented in the function's comments.

#12 liuzewei (2016-5-2 22:50): I've added the header files to the first post. I didn't recompile when I extracted the code, my fault. I've now extracted it and built it standalone:

    1>------ Build started: Project: ConsoleApplication1, Configuration: Debug x64 ------
    1>  Source.cpp
    ========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

#13 liuzewei (2016-5-2 22:56): The compiler translates these intrinsics directly into the corresponding SSE2 instructions.

DEBUG build:

    CurHead = _mm_loadu_si128((__m128i*)(VirtualAddress + i));
    000000013F7B1237  movdqu    xmm0,xmmword ptr [rax]
    000000013F7B123B  movdqa    xmmword ptr [rsp+10F0h],xmm0
    000000013F7B1244  movdqa    xmm0,xmmword ptr [rsp+10F0h]
    000000013F7B124D  movdqa    xmmword ptr [CurHead],xmm0
    CurComp = _mm_cmpeq_epi8(SigHead, CurHead);
    000000013F7B1256  movdqa    xmm0,xmmword ptr [SigHead]
    000000013F7B125F  pcmpeqb   xmm0,xmmword ptr [CurHead]
    000000013F7B1268  movdqa    xmmword ptr [rsp+1100h],xmm0
    000000013F7B1271  movdqa    xmm0,xmmword ptr [rsp+1100h]
    000000013F7B127A  movdqa    xmmword ptr [CurComp],xmm0
    MskComp = _mm_movemask_epi8(CurComp);
    000000013F7B1283  movdqa    xmm0,xmmword ptr [CurComp]
    000000013F7B128C  pmovmskb  eax,xmm0
    000000013F7B1290  mov       dword ptr [MskComp],eax

RELEASE build:

    000000013FCB1080  movdqu    xmm0,xmmword ptr [rdi]
    000000013FCB1087  pcmpeqb   xmm0,xmm1
    000000013FCB108B  pmovmskb  ebx,xmm0
    while (_BitScanForward(&IdxComp, MskComp))
    000000013FCB108F  bsf       r9d,ebx

#14 lhglhg (2016-5-2 23:45): Please upload the header file too. Thanks..
#15 VCKFC (2016-5-3 01:09): An overview of the functions in xmmintrin.h:

    /*---------- Floating-point intrinsics using Streaming SIMD Extensions ----------*/
    /* Notation: _A = (_A0,_A1,_A2,_A3), _B = (_B0,_B1,_B2,_B3), result r = (r0,r1,r2,r3);
       "ri = ..." means the operation is applied to every lane i = 0..3.
       Suffix _ps: operates on all four packed single-precision floats.
       Suffix _ss: operates on the lowest float only; r1..r3 are copied from _A. */

    /* Arithmetic operations: add, sub, mul, div, sqrt, rcp, rsqrt, min, max */
    extern __m128 _mm_add_ss(__m128 _A, __m128 _B);   /* r = (_A0+_B0, _A1, _A2, _A3) */
    extern __m128 _mm_add_ps(__m128 _A, __m128 _B);   /* ri = _Ai + _Bi */
    extern __m128 _mm_sub_ss(__m128 _A, __m128 _B);   /* r = (_A0-_B0, _A1, _A2, _A3) */
    extern __m128 _mm_sub_ps(__m128 _A, __m128 _B);   /* ri = _Ai - _Bi */
    extern __m128 _mm_mul_ss(__m128 _A, __m128 _B);   /* r = (_A0*_B0, _A1, _A2, _A3) */
    extern __m128 _mm_mul_ps(__m128 _A, __m128 _B);   /* ri = _Ai * _Bi */
    extern __m128 _mm_div_ss(__m128 _A, __m128 _B);   /* r = (_A0/_B0, _A1, _A2, _A3) */
    extern __m128 _mm_div_ps(__m128 _A, __m128 _B);   /* ri = _Ai / _Bi */
    extern __m128 _mm_sqrt_ss(__m128 _A);             /* r = (sqrt(_A0), _A1, _A2, _A3) */
    extern __m128 _mm_sqrt_ps(__m128 _A);             /* ri = sqrt(_Ai) */
    extern __m128 _mm_rcp_ss(__m128 _A);              /* r = (recip(_A0), _A1, _A2, _A3), approximate */
    extern __m128 _mm_rcp_ps(__m128 _A);              /* ri = recip(_Ai), approximate */
    extern __m128 _mm_rsqrt_ss(__m128 _A);            /* r = (recip(sqrt(_A0)), _A1, _A2, _A3), approximate */
    extern __m128 _mm_rsqrt_ps(__m128 _A);            /* ri = recip(sqrt(_Ai)), approximate */
    extern __m128 _mm_min_ss(__m128 _A, __m128 _B);   /* r = (min(_A0,_B0), _A1, _A2, _A3) */
    extern __m128 _mm_min_ps(__m128 _A, __m128 _B);   /* ri = min(_Ai,_Bi) */
    extern __m128 _mm_max_ss(__m128 _A, __m128 _B);   /* r = (max(_A0,_B0), _A1, _A2, _A3) */
    extern __m128 _mm_max_ps(__m128 _A, __m128 _B);   /* ri = max(_Ai,_Bi) */

    /* Logical operations (bitwise, per 32-bit lane): and, andnot, or, xor */
    extern __m128 _mm_and_ps(__m128 _A, __m128 _B);     /* ri = _Ai & _Bi */
    extern __m128 _mm_andnot_ps(__m128 _A, __m128 _B);  /* ri = ~_Ai & _Bi */
    extern __m128 _mm_or_ps(__m128 _A, __m128 _B);      /* ri = _Ai | _Bi */
    extern __m128 _mm_xor_ps(__m128 _A, __m128 _B);     /* ri = _Ai ^ _Bi */

    /* Comparisons: ==, <, <=, >, >=, !=, not <, not <=, not >, not >=, ordered, unordered.
       A true lane becomes 0xffffffff, a false lane 0x0.
       The _ss forms compare lane 0 only and copy r1..r3 from _A. */
    extern __m128 _mm_cmpeq_ss(__m128 _A, __m128 _B);     /* r0 = (_A0 == _B0) */
    extern __m128 _mm_cmpeq_ps(__m128 _A, __m128 _B);     /* ri = (_Ai == _Bi) */
    extern __m128 _mm_cmplt_ss(__m128 _A, __m128 _B);     /* r0 = (_A0 < _B0) */
    extern __m128 _mm_cmplt_ps(__m128 _A, __m128 _B);     /* ri = (_Ai < _Bi) */
    extern __m128 _mm_cmple_ss(__m128 _A, __m128 _B);     /* r0 = (_A0 <= _B0) */
    extern __m128 _mm_cmple_ps(__m128 _A, __m128 _B);     /* ri = (_Ai <= _Bi) */
    extern __m128 _mm_cmpgt_ss(__m128 _A, __m128 _B);     /* r0 = (_A0 > _B0) */
    extern __m128 _mm_cmpgt_ps(__m128 _A, __m128 _B);     /* ri = (_Ai > _Bi) */
    extern __m128 _mm_cmpge_ss(__m128 _A, __m128 _B);     /* r0 = (_A0 >= _B0) */
    extern __m128 _mm_cmpge_ps(__m128 _A, __m128 _B);     /* ri = (_Ai >= _Bi) */
    extern __m128 _mm_cmpneq_ss(__m128 _A, __m128 _B);    /* r0 = (_A0 != _B0) */
    extern __m128 _mm_cmpneq_ps(__m128 _A, __m128 _B);    /* ri = (_Ai != _Bi) */
    extern __m128 _mm_cmpnlt_ss(__m128 _A, __m128 _B);    /* r0 = !(_A0 < _B0) */
    extern __m128 _mm_cmpnlt_ps(__m128 _A, __m128 _B);    /* ri = !(_Ai < _Bi) */
    extern __m128 _mm_cmpnle_ss(__m128 _A, __m128 _B);    /* r0 = !(_A0 <= _B0) */
    extern __m128 _mm_cmpnle_ps(__m128 _A, __m128 _B);    /* ri = !(_Ai <= _Bi) */
    extern __m128 _mm_cmpngt_ss(__m128 _A, __m128 _B);    /* r0 = !(_A0 > _B0) */
    extern __m128 _mm_cmpngt_ps(__m128 _A, __m128 _B);    /* ri = !(_Ai > _Bi) */
    extern __m128 _mm_cmpnge_ss(__m128 _A, __m128 _B);    /* r0 = !(_A0 >= _B0) */
    extern __m128 _mm_cmpnge_ps(__m128 _A, __m128 _B);    /* ri = !(_Ai >= _Bi) */
    extern __m128 _mm_cmpord_ss(__m128 _A, __m128 _B);    /* r0 = neither _A0 nor _B0 is NaN */
    extern __m128 _mm_cmpord_ps(__m128 _A, __m128 _B);    /* ri = neither _Ai nor _Bi is NaN */
    extern __m128 _mm_cmpunord_ss(__m128 _A, __m128 _B);  /* r0 = _A0 or _B0 is NaN */
    extern __m128 _mm_cmpunord_ps(__m128 _A, __m128 _B);  /* ri = _Ai or _Bi is NaN */

    /* Scalar compares of the lowest lane, returning an int 0 or 1. comi and ucomi
       differ in whether a quiet NaN raises the invalid-operation exception;
       the value returned for NaN operands varies by predicate and compiler. */
    extern int _mm_comieq_ss(__m128 _A, __m128 _B);    /* r = (_A0 == _B0) */
    extern int _mm_comilt_ss(__m128 _A, __m128 _B);    /* r = (_A0 < _B0) */
    extern int _mm_comile_ss(__m128 _A, __m128 _B);    /* r = (_A0 <= _B0) */
    extern int _mm_comigt_ss(__m128 _A, __m128 _B);    /* r = (_A0 > _B0) */
    extern int _mm_comige_ss(__m128 _A, __m128 _B);    /* r = (_A0 >= _B0) */
    extern int _mm_comineq_ss(__m128 _A, __m128 _B);   /* r = (_A0 != _B0) */
    extern int _mm_ucomieq_ss(__m128 _A, __m128 _B);   /* r = (_A0 == _B0) */
    extern int _mm_ucomilt_ss(__m128 _A, __m128 _B);   /* r = (_A0 < _B0) */
    extern int _mm_ucomile_ss(__m128 _A, __m128 _B);   /* r = (_A0 <= _B0) */
    extern int _mm_ucomigt_ss(__m128 _A, __m128 _B);   /* r = (_A0 > _B0) */
    extern int _mm_ucomige_ss(__m128 _A, __m128 _B);   /* r = (_A0 >= _B0) */
    extern int _mm_ucomineq_ss(__m128 _A, __m128 _B);  /* r = (_A0 != _B0) */

    /* Conversion operations */
    extern int    _mm_cvt_ss2si(__m128 _A);             /* = _mm_cvtss_si32; r = (int)_A0, current rounding mode */
    extern __m64  _mm_cvt_ps2pi(__m128 _A);             /* = _mm_cvtps_pi32; r0 = (int)_A0, r1 = (int)_A1, current rounding mode */
    extern int    _mm_cvtt_ss2si(__m128 _A);            /* = _mm_cvttss_si32; r = (int)_A0, truncated */
    extern __m64  _mm_cvtt_ps2pi(__m128 _A);            /* = _mm_cvttps_pi32; r0 = (int)_A0, r1 = (int)_A1, truncated */
    extern __m128 _mm_cvt_si2ss(__m128 _A, int _B);     /* = _mm_cvtsi32_ss; r = ((float)_B, _A1, _A2, _A3) */
    extern __m128 _mm_cvt_pi2ps(__m128 _A, __m64 _B);   /* = _mm_cvtpi32_ps; r = ((float)_B0, (float)_B1, _A2, _A3) */
    __inline __m128 _mm_cvtpi16_ps(__m64 _A);           /* four signed 16-bit ints -> four floats */
    __inline __m128 _mm_cvtpu16_ps(__m64 _A);           /* four unsigned 16-bit ints -> four floats */
    __inline __m64  _mm_cvtps_pi16(__m128 _A);          /* four floats -> four signed 16-bit ints */
    __inline __m128 _mm_cvtpi8_ps(__m64 _A);            /* lower four signed 8-bit ints -> four floats */
    __inline __m128 _mm_cvtpu8_ps(__m64 _A);            /* lower four unsigned 8-bit ints -> four floats */
    __inline __m64  _mm_cvtps_pi8(__m128 _A);           /* four floats -> lower four signed 8-bit ints of the result */
    __inline __m128 _mm_cvtpi32x2_ps(__m64 _A, __m64 _B); /* r = ((float)_A0, (float)_A1, (float)_B0, (float)_B1) */
    extern float  _mm_cvtss_f32(__m128 _A);             /* r = _A0 */

    /* Miscellaneous */
    extern __m128 _mm_shuffle_ps(__m128 _A, __m128 _B, unsigned int _Imm8); /* r0, r1 chosen from _A and r2, r3 from _B, per _Imm8 */
    extern __m128 _mm_unpackhi_ps(__m128 _A, __m128 _B);     /* r = (_A2, _B2, _A3, _B3) */
    extern __m128 _mm_unpacklo_ps(__m128 _A, __m128 _B);     /* r = (_A0, _B0, _A1, _B1) */
    extern __m128 _mm_loadh_pi(__m128 _A, __m64 const *_P);  /* r = (_A0, _A1, _P0, _P1) */
    extern __m128 _mm_movehl_ps(__m128 _A, __m128 _B);       /* r = (_B2, _B3, _A2, _A3) */
    extern __m128 _mm_movelh_ps(__m128 _A, __m128 _B);       /* r = (_A0, _A1, _B0, _B1) */
    extern void   _mm_storeh_pi(__m64 *_P, __m128 _A);       /* _P0 = _A2, _P1 = _A3 */
    extern __m128 _mm_loadl_pi(__m128 _A, __m64 const *_P);  /* r = (_P0, _P1, _A2, _A3) */
    extern void   _mm_storel_pi(__m64 *_P, __m128 _A);       /* _P0 = _A0, _P1 = _A1 */
    extern int    _mm_movemask_ps(__m128 _A);                /* r = sign(_A3)<<3 | sign(_A2)<<2 | sign(_A1)<<1 | sign(_A0) */
    extern unsigned int _mm_getcsr(void);                    /* returns the MXCSR control/status register */
    extern void   _mm_setcsr(unsigned int);                  /* sets MXCSR */

    /* Load operations */
    extern __m128 _mm_load_ss(float const *_P);   /* r = (*_P, 0.0, 0.0, 0.0) */
    extern __m128 _mm_load_ps1(float const *_P);  /* = _mm_load1_ps; r0 = r1 = r2 = r3 = *_P */
    extern __m128 _mm_load_ps(float const *_P);   /* r = (_P[0], _P[1], _P[2], _P[3]); _P 16-byte aligned */
    extern __m128 _mm_loadr_ps(float const *_P);  /* r = (_P[3], _P[2], _P[1], _P[0]); _P 16-byte aligned */
    extern __m128 _mm_loadu_ps(float const *_P);  /* r = (_P[0], _P[1], _P[2], _P[3]); no alignment needed */

    /* Set operations */
    extern __m128 _mm_set_ss(float _W);                                 /* r = (_W, 0.0, 0.0, 0.0) */
    extern __m128 _mm_set_ps1(float _W);                                /* = _mm_set1_ps; r0 = r1 = r2 = r3 = _W */
    extern __m128 _mm_set_ps(float _A, float _B, float _C, float _D);   /* r = (_D, _C, _B, _A) */
    extern __m128 _mm_setr_ps(float _A, float _B, float _C, float _D);  /* r = (_A, _B, _C, _D) */
    extern __m128 _mm_setzero_ps(void);                                 /* r = (0.0, 0.0, 0.0, 0.0) */

    /* Store operations */
    extern void   _mm_store_ss(float *_V, __m128 _A);    /* *_V = _A0 */
    extern void   _mm_store_ps1(float *_V, __m128 _A);   /* = _mm_store1_ps; _V[0..3] = _A0; _V 16-byte aligned */
    extern void   _mm_store_ps(float *_V, __m128 _A);    /* _V[i] = _Ai; _V 16-byte aligned */
    extern void   _mm_storer_ps(float *_V, __m128 _A);   /* _V = (_A3, _A2, _A1, _A0); _V 16-byte aligned */
    extern void   _mm_storeu_ps(float *_V, __m128 _A);   /* _V[i] = _Ai; no alignment needed */
    extern __m128 _mm_move_ss(__m128 _A, __m128 _B);     /* r = (_B0, _A1, _A2, _A3) */

    /* Integer intrinsics on __m64 (MMX registers) */
    extern int   _m_pextrw(__m64 _A, int _Imm);             /* = _mm_extract_pi16; r = word _Imm (0..3) of _A; _Imm must be an immediate */
    extern __m64 _m_pinsrw(__m64 _A, int _D, int _Imm);     /* = _mm_insert_pi16; r = _A with word _Imm replaced by _D */
    extern __m64 _m_pmaxsw(__m64 _A, __m64 _B);             /* = _mm_max_pi16; ri = max(_Ai, _Bi), signed words */
    extern __m64 _m_pmaxub(__m64 _A, __m64 _B);             /* = _mm_max_pu8; ri = max(_Ai, _Bi), unsigned bytes, i = 0..7 */
    extern __m64 _m_pminsw(__m64 _A, __m64 _B);             /* = _mm_min_pi16; ri = min(_Ai, _Bi), signed words */
    extern __m64 _m_pminub(__m64 _A, __m64 _B);             /* = _mm_min_pu8; ri = min(_Ai, _Bi), unsigned bytes, i = 0..7 */
    extern int   _m_pmovmskb(__m64 _A);                     /* = _mm_movemask_pi8; r = sign(_A7)<<7 | ... | sign(_A0) */
    extern __m64 _m_pmulhuw(__m64 _A, __m64 _B);            /* = _mm_mulhi_pu16; ri = high 16 bits of _Ai * _Bi (unsigned) */
    extern void  _m_maskmovq(__m64 _A, __m64 _B, char *_P); /* = _mm_maskmove_si64; if (sign(_Bi)) _P[i] = _Ai, i = 0..7 */
    extern __m64 _m_pavgb(__m64 _A, __m64 _B);              /* = _mm_avg_pu8; ri = (_Ai + _Bi + 1) >> 1, unsigned bytes, i = 0..7 */
    extern __m64 _m_pavgw(__m64 _A, __m64 _B);              /* = _mm_avg_pu16; ri = (_Ai + _Bi + 1) >> 1, unsigned words, i = 0..3 */
    extern __m64 _m_psadbw(__m64 _A, __m64 _B);             /* = _mm_sad_pu8; r0 = |_A0-_B0| + ... + |_A7-_B7|, r1 = r2 = r3 = 0 */
    extern __m64 _m_pshufw(__m64 _A, int _Imm);             /* = _mm_shuffle_pi16; rj = word((_Imm >> 2j) & 3) of _A, j = 0..3 */

    /* Cache support */
    extern void _mm_prefetch(char const *_A, int _Sel);  /* pull the cache line at _A closer to the CPU; _Sel is the hint (e.g. _MM_HINT_T0) */
    extern void _mm_stream_pi(__m64 *_P, __m64 _A);      /* store without polluting the caches; empty the MMX state afterwards */
    extern void _mm_stream_ps(float *_P, __m128 _A);     /* store without polluting the caches; _P 16-byte aligned */
    extern void _mm_sfence(void);                        /* every preceding store is globally visible before any later store */

    /* Alternate intrinsic names */
    #define _mm_cvtss_si32    _mm_cvt_ss2si
    #define _mm_cvtps_pi32    _mm_cvt_ps2pi
    #define _mm_cvttss_si32   _mm_cvtt_ss2si
    #define _mm_cvttps_pi32   _mm_cvtt_ps2pi
    #define _mm_cvtsi32_ss    _mm_cvt_si2ss
    #define _mm_cvtpi32_ps    _mm_cvt_pi2ps
    #define _mm_extract_pi16  _m_pextrw
    #define _mm_insert_pi16   _m_pinsrw
    #define _mm_max_pi16      _m_pmaxsw
    #define _mm_max_pu8       _m_pmaxub
    #define _mm_min_pi16      _m_pminsw
    #define _mm_min_pu8       _m_pminub
    #define _mm_movemask_pi8  _m_pmovmskb
    #define _mm_mulhi_pu16    _m_pmulhuw
    #define _mm_shuffle_pi16  _m_pshufw
    #define _mm_maskmove_si64 _m_maskmovq
    #define _mm_avg_pu8       _m_pavgb
    #define _mm_avg_pu16      _m_pavgw
    #define _mm_sad_pu8       _m_psadbw
    #define _mm_set1_ps       _mm_set_ps1
    #define _mm_load1_ps      _mm_load_ps1
    #define _mm_store1_ps     _mm_store_ps1
#16 kuboys (2016-5-3 08:04): Great material, learned something. Thanks for sharing.

#17 木志本柯 (2018-11-17 21:19): Hahaha, this is seriously cool.

#18 jgs (2018-11-17 21:32): Thanks for sharing, great material, learned a lot.

#19 jgs (2019-12-11 22:46): A year on and nobody is following this anymore? I've been using this code, and the signature search really is fast. Thumbs up.

#20 ookkaa (2020-4-25 17:14): Nice.

#21 ZwCopyAll (2020-4-26 10:12): Does anyone have code for brute-force searching physical memory?

#22 iamasbcx (2020-4-26 18:49): Just what I need while learning kernel inline hooking; I'll give it a try.

#23 明天去要饭 (2020-5-7 18:48): Great post! This forum is being overrun by game-cheat makers and tutorial sellers.

#24 院士 (2021-7-31 21:17): The post says fuzzy search is supported, but I can't find any code that handles the ? wildcard. Could someone point me to it? Thanks.
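On #24's question: since the code is not visible here, this cannot show where the author handles `?`. One common way to fold wildcards into an SSE2 comparison is to OR the byte-equality mask with a "don't care" mask, so that wildcard lanes always count as matching. A sketch assuming a mask array where 0 marks a wildcard (`match_masked` is hypothetical); it compares full 16-byte chunks with SSE2 and the remainder byte by byte, so it never reads more than `n` bytes:

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the n-byte pattern matches at p; mask[k] == 0 marks a wildcard. */
int match_masked(const uint8_t *p, const uint8_t *pat, const uint8_t *mask, size_t n) {
    const __m128i zero = _mm_setzero_si128();
    size_t k = 0;
    for (; k + 16 <= n; k += 16) {
        __m128i d  = _mm_loadu_si128((const __m128i *)(p + k));
        __m128i s  = _mm_loadu_si128((const __m128i *)(pat + k));
        __m128i m  = _mm_loadu_si128((const __m128i *)(mask + k));
        __m128i eq = _mm_cmpeq_epi8(d, s);      /* 0xFF where the bytes are equal */
        __m128i wc = _mm_cmpeq_epi8(m, zero);   /* 0xFF where the pattern byte is "any" */
        if (_mm_movemask_epi8(_mm_or_si128(eq, wc)) != 0xFFFF) return 0;
    }
    for (; k < n; k++)                          /* remaining n % 16 bytes */
        if (mask[k] && p[k] != pat[k]) return 0;
    return 1;
}
```

In a real scanner the wildcard vector (`wc` here) depends only on the pattern, so it would normally be computed once outside the search loop.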
#25 yaoguen (2022-2-28 22:36), quoting liuzewei's reply #3: Found a bug: if VirtualLength is less than 16 and nothing is found, the program crashes.
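The code is not visible here, so the exact cause cannot be confirmed. One classic way to get exactly this symptom is an unsigned loop limit such as `VirtualLength - 16`, which wraps around to a huge value when `VirtualLength < 16`, so the SSE2 loop reads far past the buffer. A sketch of the safe loop shape on a simpler task, a memchr-style byte search (`find_byte_sse2` is hypothetical):

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Index of the lowest set bit; m must be nonzero. */
static unsigned first_set_bit(unsigned m) {
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward(&idx, m);
    return (unsigned)idx;
#else
    return (unsigned)__builtin_ctz(m);
#endif
}

/* Returns the offset of the first byte equal to b, or -1. Never reads past buf[len-1]. */
ptrdiff_t find_byte_sse2(const uint8_t *buf, size_t len, uint8_t b) {
    const __m128i needle = _mm_set1_epi8((char)b);
    size_t i = 0;
    /* Bound written as i + 16 <= len, not i <= len - 16: with an unsigned
       len < 16, len - 16 wraps around and the loop runs off the buffer. */
    for (; i + 16 <= len; i += 16) {
        unsigned m = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)(buf + i)), needle));
        if (m) return (ptrdiff_t)(i + first_set_bit(m));
    }
    for (; i < len; i++)            /* scalar tail: the last len % 16 bytes */
        if (buf[i] == b) return (ptrdiff_t)i;
    return -1;
}
```

The same two-part shape (SIMD over whole 16-byte blocks, scalar for the rest) applies to a signature scanner; there the last valid start offset is `len - SigLength`, which also needs a `SigLength <= len` check before subtracting.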
	游客登录 \| 注册方可回帖回帖表情雪币赚取及消费高级回复

liuzewei

发帖

回帖

140

RANK

关注

私信

他的文章

关于我们

联系我们

企业服务

看雪公众号
