看编程区比较冷清,小弟不才想搞个比赛,有兴趣的进来试试吧!-编程技术-看雪-安全社区|安全招聘|kanxue.com

最新回复 (36) ◀ 1 2
雲飛揚雪币： 251 活跃值： (11) 能力值： ( LV2，RANK：10 ) 在线值：发帖 13 回帖 157 粉丝 0 关注私信	雲飛揚 26 楼我就不明白了,我发帖就是想提高自已,从别人那里学点东西,有什么错,为什么总是听到一些刺耳的声音呢,不能不针对人吗? 首先感谢一下编程区版主和回帖的各位! 再感谢一下"sojoo",让我学到了,不用"比较"的方式,我以前没有想到 ---------------------- 希望版主锁帖吧,再这样下去没有意思了,总是有一些无聊的声音,扫兴. 2009-4-21 15:09 0
forgot 雪币： 6075 活跃值： (2236) 能力值： (RANK：1060 ) 在线值：发帖 155 回帖 3771 粉丝 16 关注私信	forgot 26 27 楼我又没有说楼主的代码写的恶心，只是我不喜欢挤在一起的风格。又没有指名道姓，为什么有人就觉得是说自己呢，戳到短处了？结果我也不知道对不对，反正跟楼主贴的一样。牛逼的意思，大家都明白吧？ 2009-4-21 15:19 0
漂亮乞丐雪币： 200 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 0 回帖 22 粉丝 0 关注私信	漂亮乞丐 28 楼谁说不能直接定义的？只要有定义就行！ 2009-4-21 16:22 0
灰狐雪币： 210 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 2 回帖 68 粉丝 0 关注私信	灰狐 29 楼规范才是王道，你写这种代码时很爽，以后会被维护你代码的人骂死的 2009-4-21 16:40 0
小李他哥雪币： 200 活跃值： (10) 能力值： ( LV2，RANK：10 ) 在线值：发帖 0 回帖 3 粉丝 0 关注私信	小李他哥 30 楼这么简单的代码还有这么多学问 2009-4-21 17:07 0
option 雪币： 8201 活跃值： (2701) 能力值： ( LV2，RANK：10 ) 在线值：发帖 39 回帖 713 粉丝 1 关注私信	option 31 楼早期的C是不行的，现代的基本都可以的 2009-4-21 21:06 0
peaceclub 雪币： 255 活跃值： (207) 能力值： ( LV9，RANK：250 ) 在线值：发帖 59 回帖 967 粉丝 1 关注私信	peaceclub 6 32 楼让我们来做个比较吧,看东西还是看内在,不要被表象所迷惑。云飞扬: int ustrlen(unsigned char data[]) // 注:只精简此函数,只能用C语言. { int i = 0, count = 0; for(;data;(data[i++] & 0xFF) > 127 ? ++i:i, ++count); return count; } disasm: $ ==> >/$ 55 PUSH EBP $+1 >\|. 8BEC MOV EBP,ESP $+3 >\|. 83EC 0C SUB ESP,0C $+6 >\|. C745 F8 00000>MOV DWORD PTR SS:[EBP-8],0 $+D >\|. C745 FC 00000>MOV DWORD PTR SS:[EBP-4],0 $+14 >\|. EB 3E JMP SHORT 22.00401054 $+16 >\|> 8B45 08 /MOV EAX,DWORD PTR SS:[EBP+8] $+19 >\|. 0345 F8 \|ADD EAX,DWORD PTR SS:[EBP-8] $+1C >\|. 33C9 \|XOR ECX,ECX $+1E >\|. 8A08 \|MOV CL,BYTE PTR DS:[EAX] $+20 >\|. 81E1 FF000000 \|AND ECX,0FF $+26 >\|. 8B55 F8 \|MOV EDX,DWORD PTR SS:[EBP-8] $+29 >\|. 83C2 01 \|ADD EDX,1 $+2C >\|. 8955 F8 \|MOV DWORD PTR SS:[EBP-8],EDX $+2F >\|. 83F9 7F \|CMP ECX,7F $+32 >\|. 7E 11 \|JLE SHORT 22.00401045 $+34 >\|. 8B45 F8 \|MOV EAX,DWORD PTR SS:[EBP-8] $+37 >\|. 83C0 01 \|ADD EAX,1 $+3A >\|. 8945 F8 \|MOV DWORD PTR SS:[EBP-8],EAX $+3D >\|. 8B4D F8 \|MOV ECX,DWORD PTR SS:[EBP-8] $+40 >\|. 894D F4 \|MOV DWORD PTR SS:[EBP-C],ECX $+43 >\|. EB 06 \|JMP SHORT 22.0040104B $+45 >\|> 8B55 F8 \|MOV EDX,DWORD PTR SS:[EBP-8] $+48 >\|. 8955 F4 \|MOV DWORD PTR SS:[EBP-C],EDX $+4B >\|> 8B45 FC \|MOV EAX,DWORD PTR SS:[EBP-4] $+4E >\|. 83C0 01 \|ADD EAX,1 $+51 >\|. 8945 FC \|MOV DWORD PTR SS:[EBP-4],EAX $+54 >\|> 8B4D 08 MOV ECX,DWORD PTR SS:[EBP+8] $+57 >\|. 034D F8 \|ADD ECX,DWORD PTR SS:[EBP-8] $+5A >\|. 33D2 \|XOR EDX,EDX $+5C >\|. 8A11 \|MOV DL,BYTE PTR DS:[ECX] $+5E >\|. 85D2 \|TEST EDX,EDX $+60 >\|. 74 02 \|JE SHORT 22.00401064 $+62 >\|.^ EB B2 \JMP SHORT 22.00401016 $+64 >\|> 8B45 FC MOV EAX,DWORD PTR SS:[EBP-4] $+67 >\|. 8BE5 MOV ESP,EBP $+69 >\|. 5D POP EBP $+6A >\. C3 RETN 实际cpu代码序列总长度:0x6A. forgAt: int ustrlen(char s) { unsigned long i; unsigned char c; for (i = 0; c = s++; i++) s += c >> 7; return i; } $ ==> >/$ 55 PUSH EBP $+1 >\|. 8BEC MOV EBP,ESP $+3 >\|. 83EC 08 SUB ESP,8 $+6 >\|. C745 F8 00000>MOV DWORD PTR SS:[EBP-8],0 $+D >\|. EB 09 JMP SHORT 11.00401018 $+F >\|> 8B45 F8 /MOV EAX,DWORD PTR SS:[EBP-8] $+12 >\|. 83C0 01 \|ADD EAX,1 $+15 >\|. 8945 F8 \|MOV DWORD PTR SS:[EBP-8],EAX $+18 >\|> 8B4D 08 MOV ECX,DWORD PTR SS:[EBP+8] $+1B >\|. 8A11 \|MOV DL,BYTE PTR DS:[ECX] $+1D >\|. 8855 FC \|MOV BYTE PTR SS:[EBP-4],DL $+20 >\|. 8B45 FC \|MOV EAX,DWORD PTR SS:[EBP-4] $+23 >\|. 25 FF000000 \|AND EAX,0FF $+28 >\|. 8B4D 08 \|MOV ECX,DWORD PTR SS:[EBP+8] $+2B >\|. 83C1 01 \|ADD ECX,1 $+2E >\|. 894D 08 \|MOV DWORD PTR SS:[EBP+8],ECX $+31 >\|. 85C0 \|TEST EAX,EAX $+33 >\|. 74 16 \|JE SHORT 11.0040104B $+35 >\|. 8B55 FC \|MOV EDX,DWORD PTR SS:[EBP-4] $+38 >\|. 81E2 FF000000 \|AND EDX,0FF $+3E >\|. C1FA 07 \|SAR EDX,7 $+41 >\|. 8B45 08 \|MOV EAX,DWORD PTR SS:[EBP+8] $+44 >\|. 03C2 \|ADD EAX,EDX $+46 >\|. 8945 08 \|MOV DWORD PTR SS:[EBP+8],EAX $+49 >\|.^ EB C4 \JMP SHORT 11.0040100F $+4B >\|> 8B45 F8 MOV EAX,DWORD PTR SS:[EBP-8] $+4E >\|. 8BE5 MOV ESP,EBP $+50 >\|. 5D POP EBP $+51 >\. C3 RETN 实际cpu代码序列总长度:0x51. 虽然forgAt的行数比云飞扬的多,但最终形成的cpu码序列长度反而短。 cpu码序列短并结论正确的代码才是高效率优秀代码。 2009-4-21 22:02 0
peaceclub 雪币： 255 活跃值： (207) 能力值： ( LV9，RANK：250 ) 在线值：发帖 59 回帖 967 粉丝 1 关注私信	peaceclub 6 33 楼让我们来做个比较吧,看东西还是看内在,不要被表象所迷惑。云飞扬: int ustrlen(unsigned char data[]) // 注:只精简此函数,只能用C语言. { int i = 0, count = 0; for(;data;(data[i++] & 0xFF) > 127 ? ++i:i, ++count); return count; } disasm: $ ==> >/$ 55 PUSH EBP $+1 >\|. 8BEC MOV EBP,ESP $+3 >\|. 83EC 0C SUB ESP,0C $+6 >\|. C745 F8 00000>MOV DWORD PTR SS:[EBP-8],0 $+D >\|. C745 FC 00000>MOV DWORD PTR SS:[EBP-4],0 $+14 >\|. EB 3E JMP SHORT 22.00401054 $+16 >\|> 8B45 08 /MOV EAX,DWORD PTR SS:[EBP+8] $+19 >\|. 0345 F8 \|ADD EAX,DWORD PTR SS:[EBP-8] $+1C >\|. 33C9 \|XOR ECX,ECX $+1E >\|. 8A08 \|MOV CL,BYTE PTR DS:[EAX] $+20 >\|. 81E1 FF000000 \|AND ECX,0FF $+26 >\|. 8B55 F8 \|MOV EDX,DWORD PTR SS:[EBP-8] $+29 >\|. 83C2 01 \|ADD EDX,1 $+2C >\|. 8955 F8 \|MOV DWORD PTR SS:[EBP-8],EDX $+2F >\|. 83F9 7F \|CMP ECX,7F $+32 >\|. 7E 11 \|JLE SHORT 22.00401045 $+34 >\|. 8B45 F8 \|MOV EAX,DWORD PTR SS:[EBP-8] $+37 >\|. 83C0 01 \|ADD EAX,1 $+3A >\|. 8945 F8 \|MOV DWORD PTR SS:[EBP-8],EAX $+3D >\|. 8B4D F8 \|MOV ECX,DWORD PTR SS:[EBP-8] $+40 >\|. 894D F4 \|MOV DWORD PTR SS:[EBP-C],ECX $+43 >\|. EB 06 \|JMP SHORT 22.0040104B $+45 >\|> 8B55 F8 \|MOV EDX,DWORD PTR SS:[EBP-8] $+48 >\|. 8955 F4 \|MOV DWORD PTR SS:[EBP-C],EDX $+4B >\|> 8B45 FC \|MOV EAX,DWORD PTR SS:[EBP-4] $+4E >\|. 83C0 01 \|ADD EAX,1 $+51 >\|. 8945 FC \|MOV DWORD PTR SS:[EBP-4],EAX $+54 >\|> 8B4D 08 MOV ECX,DWORD PTR SS:[EBP+8] $+57 >\|. 034D F8 \|ADD ECX,DWORD PTR SS:[EBP-8] $+5A >\|. 33D2 \|XOR EDX,EDX $+5C >\|. 8A11 \|MOV DL,BYTE PTR DS:[ECX] $+5E >\|. 85D2 \|TEST EDX,EDX $+60 >\|. 74 02 \|JE SHORT 22.00401064 $+62 >\|.^ EB B2 \JMP SHORT 22.00401016 $+64 >\|> 8B45 FC MOV EAX,DWORD PTR SS:[EBP-4] $+67 >\|. 8BE5 MOV ESP,EBP $+69 >\|. 5D POP EBP $+6A >\. C3 RETN 实际cpu代码序列总长度:0x6A. forgAt: int ustrlen(char s) { unsigned long i; unsigned char c; for (i = 0; c = s++; i++) s += c >> 7; return i; } $ ==> >/$ 55 PUSH EBP $+1 >\|. 8BEC MOV EBP,ESP $+3 >\|. 83EC 08 SUB ESP,8 $+6 >\|. C745 F8 00000>MOV DWORD PTR SS:[EBP-8],0 $+D >\|. EB 09 JMP SHORT 11.00401018 $+F >\|> 8B45 F8 /MOV EAX,DWORD PTR SS:[EBP-8] $+12 >\|. 83C0 01 \|ADD EAX,1 $+15 >\|. 8945 F8 \|MOV DWORD PTR SS:[EBP-8],EAX $+18 >\|> 8B4D 08 MOV ECX,DWORD PTR SS:[EBP+8] $+1B >\|. 8A11 \|MOV DL,BYTE PTR DS:[ECX] $+1D >\|. 8855 FC \|MOV BYTE PTR SS:[EBP-4],DL $+20 >\|. 8B45 FC \|MOV EAX,DWORD PTR SS:[EBP-4] $+23 >\|. 25 FF000000 \|AND EAX,0FF $+28 >\|. 8B4D 08 \|MOV ECX,DWORD PTR SS:[EBP+8] $+2B >\|. 83C1 01 \|ADD ECX,1 $+2E >\|. 894D 08 \|MOV DWORD PTR SS:[EBP+8],ECX $+31 >\|. 85C0 \|TEST EAX,EAX $+33 >\|. 74 16 \|JE SHORT 11.0040104B $+35 >\|. 8B55 FC \|MOV EDX,DWORD PTR SS:[EBP-4] $+38 >\|. 81E2 FF000000 \|AND EDX,0FF $+3E >\|. C1FA 07 \|SAR EDX,7 $+41 >\|. 8B45 08 \|MOV EAX,DWORD PTR SS:[EBP+8] $+44 >\|. 03C2 \|ADD EAX,EDX $+46 >\|. 8945 08 \|MOV DWORD PTR SS:[EBP+8],EAX $+49 >\|.^ EB C4 \JMP SHORT 11.0040100F $+4B >\|> 8B45 F8 MOV EAX,DWORD PTR SS:[EBP-8] $+4E >\|. 8BE5 MOV ESP,EBP $+50 >\|. 5D POP EBP $+51 >\. C3 RETN 实际cpu代码序列总长度:0x51. 虽然forgAt的行数比云飞扬的多,但最终形成的cpu码序列长度反而短。 cpu码序列短并结论正确的代码才是高效率优秀代码。 2009-4-21 22:02 0
雲飛揚雪币： 251 活跃值： (11) 能力值： ( LV2，RANK：10 ) 在线值：发帖 13 回帖 157 粉丝 0 关注私信	雲飛揚 34 楼 to: peaceclub 谢谢你的回复. 我对汇编没怎么研究过,也没有试过,写完代码然后反汇编. 其实,我不认为反汇编后代码少就代码多的快. 因为这涉及到,CPU执行汇编指令的问题. 打个比方,同样是一条汇编指令,"加" 和 "乘"的速度肯定是不一样的. 用比时间的方法,我感是比较准确的. 如果您有时间的话可以试一下这个方法. 2009-4-22 08:20 0
supermilg 雪币： 22 活跃值： (30) 能力值： ( LV2，RANK：10 ) 在线值：发帖 4 回帖 92 粉丝 0 关注私信	supermilg 35 楼汇编长度不是关键，Cache效率在字符串长度计算中才是关键。另外，楼上的所有同志们难道都没有看出楼主的代码存在数组越界的潜在的安全漏洞？安全是王道！ 2009-4-23 08:26 0
supermilg 雪币： 22 活跃值： (30) 能力值： ( LV2，RANK：10 ) 在线值：发帖 4 回帖 92 粉丝 0 关注私信	supermilg 36 楼楼主要是真是为了效率看看这样的类似代码，充分利用cache才是性能的王道： int strlen(char const* s) { /* Search index. * The input pointer is still needed for length calculation. / char const p = s; /* Bit patterns (32 bit system), temporary. * The three used 32 bit patterns are 0x7efefeff (7efefeff, * 7efefeffh, m), 0x81010101 (81010101, 81010101h, -m) and 0x81010100 * (81010100, 81010100h, ~m). * On 64 bit systems, the pattern would be 0x7efefefefefefeff (its * complement 0x8101010101010101). / int m = 0x7efefeff, n = ~m, i; / Advance the search index until it is aligned to the data size * used in the following search loop; do a conventional search on * that unaligned part at the same time, since the string may be * short (shorter than sizeof(int)-1). * This loop is not entered in the more optimal case of an input * string being aligned (compilers may chose to e.g. align string * literals, dynamically allocated buffers are aligned, etc). * Alignment of the subsequent multi-byte data accesses is an * optimization, since misaligned data accesses include a performance * penalty and possibly use more than one cache line, or are even * disallowed and generate a bus error, e.g. in RISC architectures. * This step will take less than sizeof(int) loop iterations. * The &(sizeof(int)-1) operation is an optimized %sizeof(int) * (assuming sizeof(int) is a power-of-two), since bit operations are * more performant than arithmetic operations (in this case, with an * operand that is known at compile time, the compiler/optimizer * might generate the same code for the %-operation). * The !=0 part is left away here (and also at other places) in the * conditional expression , which might be a reflection of the fact * that the compiler may directly use the zero-conditional produced * by the &-operation. / for (; (int)p & (sizeof(int) - 1); p++) { / Check for end-of-string too (relevant with small, unaligned * strings, see earlier comment too). / if (!p) { return p - s; } } /* Aligned multi-byte data search loop, in pieces of sizeof(int) * (by C language definition, int is the most natural data size for * CPUs to operate on; maybe not with 64 bit systems in LP64 mode * [only longs and pointers native; int remains 32 bit, for software * compatibility and/or resource saving], in which long may be the * natural data size). * One multi-byte data-access is done per iteration instead of * sizeof(int) single-byte data-accesses (a single multi-byte * data-access may even be faster than one single-byte data-access). * Each iteration uses one subtraction/addition/comparison instead * of sizeof(int). / for (;;) { / Next sizeof(int) aligned string bytes. * We may read up to sizeof(int)-1 bytes beyond the string * (excluding its terminating nul byte), which is not a problem * with memory (page) maps, since the (mapped) pages are aligned * to a multiple of sizeof(int) and so an aligned int can't cross * two pages. * However, memory boundary check tools may trigger warnings * (such as uninitialized memory read, etc). / i = (int const)p; / Check if this chunk is a candidate for having embedded nul * bytes. * This algorithm assumes two's complement integer * representation. * i+0x7efefeff is the same as i-0x81010101. * Subtracting 1 from each (possible nul) byte will flip its * least significant bit, and possibly some more, depending on * the input (all bits up to the least significant set bit). * If the byte was zero (all bits cleared, a nul byte), all bits * are flipped and a carry bit is generated into the next * significant byte, which flips the next byte's least * significant bit a second time. * Since we cannot (and don't want) to check the * int-carry-out-bit (the most significant byte's carry) in C, * the most significant byte's most significant bit is used to do * that check, which leads to false candidates if only that bit * was set in the most significant byte (i.e. the most * significant byte was a \x80 byte). * These false candidates weaken the search algorithm's * performance a bit. * Comparing the subtraction-flip result with the logic-flip * result (^~i), while only considering the sizeof(int) bits of * interest (&0x81010100), will reveal if carries have occurred * (double-bit-flip vs. single-bit-flip). / if (!(((i + m) ^ ~i) & n)) { / This chunk is not a candidate, which means it does not * include nul bytes. * Advance sizeof(int) bytes. / p += sizeof(int); } else { / This chunk is a candidate, which means it may include nul * bytes or, at the most significant byte, a \x80. * Searches on strings containing lots of such bytes will be * less-than-optimized. * Conventionally search for a nul (and its position) in * sizeof(int) bytes and continue chunked search if none is * found (with a false candidate, see comments above), in * which case the search index is advanced too by sizeof(int) * bytes. / for (i = sizeof(int); i; p++, i--) { if (!p) { return p - s; } } } } } 2009-4-23 08:31 0
Loka 雪币： 993 活跃值： (442) 能力值： ( LV12，RANK：403 ) 在线值：发帖 23 回帖 131 粉丝 0 关注私信	Loka 2 37 楼精妙！用算术中的进位来判断是否为0，一次处理sizeof(int)字节。 ========================================= 此法还是有点小缺陷，以32位为例，if (!(((i + m) ^ ~i) & n))语句在高位字节为0x80的情况下也进入 for (i = sizeof(int); i; p++, i--) { if (!*p) { return p - s; } 最后判定4字节都为非零，p被正确+4，不影响函数的最后计算结果，但有违算法思想原来的本意，按算法的本意应该是直接p += sizeof(int); 2009-4-23 15:09 0
	游客登录 \| 注册方可回帖回帖表情雪币赚取及消费高级回复

雲飛揚

雪币： 251

活跃值： (11)

能力值：

( LV2，RANK：10 )

在线值：

发帖

13

回帖

157

粉丝

0

关注

私信

雲飛揚: 26 楼

我就不明白了,我发帖就是想提高自已,从别人那里学点东西,有什么错,为什么总是听到一些
刺耳的声音呢,不能不针对人吗?

首先感谢一下编程区版主和回帖的各位!
再感谢一下"sojoo",让我学到了,不用"比较"的方式,我以前没有想到
----------------------
希望版主锁帖吧,再这样下去没有意思了,总是有一些无聊的声音,扫兴.

2009-4-21 15:09

0

forgot

雪币： 6075

活跃值： (2236)

能力值： (RANK：1060 )

在线值：

发帖

155

回帖

3771

粉丝

16

关注

私信

forgot 26: 27 楼

我又没有说楼主的代码写的恶心，只是我不喜欢挤在一起的风格。

又没有指名道姓，为什么有人就觉得是说自己呢，戳到短处了？

结果我也不知道对不对，反正跟楼主贴的一样。

牛逼的意思，大家都明白吧？

2009-4-21 15:19

0

漂亮乞丐

雪币： 200

活跃值： (10)

能力值：

( LV2，RANK：10 )

在线值：

发帖

0

回帖

22

粉丝

0

关注

私信

漂亮乞丐: 28 楼

谁说不能直接定义的？只要有定义就行！

2009-4-21 16:22

0

灰狐

雪币： 210

活跃值： (10)

能力值：

( LV2，RANK：10 )

在线值：

发帖

2

回帖

68

粉丝

0

关注

私信

灰狐: 29 楼

规范才是王道，你写这种代码时很爽，以后会被维护你代码的人骂死的

2009-4-21 16:40

0

小李他哥

雪币： 200

活跃值： (10)

能力值：

( LV2，RANK：10 )

在线值：

发帖

0

回帖

3

粉丝

0

关注

私信

小李他哥: 30 楼

这么简单的代码还有这么多学问

2009-4-21 17:07

0

option

雪币： 8201

活跃值： (2701)

能力值：

( LV2，RANK：10 )

在线值：

发帖

39

回帖

713

粉丝

1

关注

私信

option: 31 楼

早期的C是不行的，现代的基本都可以的

2009-4-21 21:06

0

peaceclub

雪币： 255

活跃值： (207)

能力值：

( LV9，RANK：250 )

在线值：

发帖

59

回帖

967

粉丝

1

关注

私信

peaceclub 6: 32 楼

让我们来做个比较吧,看东西还是看内在,不要被表象所迷惑。

云飞扬:
int ustrlen(unsigned char data[]) // 注:只精简此函数,只能用C语言.
{
int i = 0, count = 0;

for(;data;(data[i++] & 0xFF) > 127 ? ++i:i, ++count);

return count;
}

disasm:

$ ==> >/$ 55 PUSH EBP
$+1 >|. 8BEC MOV EBP,ESP
$+3 >|. 83EC 0C SUB ESP,0C
$+6 >|. C745 F8 00000>MOV DWORD PTR SS:[EBP-8],0
$+D >|. C745 FC 00000>MOV DWORD PTR SS:[EBP-4],0
$+14 >|. EB 3E JMP SHORT 22.00401054
$+16 >|> 8B45 08 /MOV EAX,DWORD PTR SS:[EBP+8]
$+19 >|. 0345 F8 |ADD EAX,DWORD PTR SS:[EBP-8]
$+1C >|. 33C9 |XOR ECX,ECX
$+1E >|. 8A08 |MOV CL,BYTE PTR DS:[EAX]
$+20 >|. 81E1 FF000000 |AND ECX,0FF
$+26 >|. 8B55 F8 |MOV EDX,DWORD PTR SS:[EBP-8]
$+29 >|. 83C2 01 |ADD EDX,1
$+2C >|. 8955 F8 |MOV DWORD PTR SS:[EBP-8],EDX
$+2F >|. 83F9 7F |CMP ECX,7F
$+32 >|. 7E 11 |JLE SHORT 22.00401045
$+34 >|. 8B45 F8 |MOV EAX,DWORD PTR SS:[EBP-8]
$+37 >|. 83C0 01 |ADD EAX,1
$+3A >|. 8945 F8 |MOV DWORD PTR SS:[EBP-8],EAX
$+3D >|. 8B4D F8 |MOV ECX,DWORD PTR SS:[EBP-8]
$+40 >|. 894D F4 |MOV DWORD PTR SS:[EBP-C],ECX
$+43 >|. EB 06 |JMP SHORT 22.0040104B
$+45 >|> 8B55 F8 |MOV EDX,DWORD PTR SS:[EBP-8]
$+48 >|. 8955 F4 |MOV DWORD PTR SS:[EBP-C],EDX
$+4B >|> 8B45 FC |MOV EAX,DWORD PTR SS:[EBP-4]
$+4E >|. 83C0 01 |ADD EAX,1
$+51 >|. 8945 FC |MOV DWORD PTR SS:[EBP-4],EAX
$+54 >|> 8B4D 08 MOV ECX,DWORD PTR SS:[EBP+8]
$+57 >|. 034D F8 |ADD ECX,DWORD PTR SS:[EBP-8]
$+5A >|. 33D2 |XOR EDX,EDX
$+5C >|. 8A11 |MOV DL,BYTE PTR DS:[ECX]
$+5E >|. 85D2 |TEST EDX,EDX
$+60 >|. 74 02 |JE SHORT 22.00401064
$+62 >|.^ EB B2 \JMP SHORT 22.00401016
$+64 >|> 8B45 FC MOV EAX,DWORD PTR SS:[EBP-4]
$+67 >|. 8BE5 MOV ESP,EBP
$+69 >|. 5D POP EBP
$+6A >\. C3 RETN
实际cpu代码序列总长度:0x6A.

forgAt:
int
ustrlen(char *s)
{
unsigned long i;
unsigned char c;

for (i = 0; c = *s++; i++)
s += c >> 7;
return i;
}

$ ==> >/$ 55 PUSH EBP
$+1 >|. 8BEC MOV EBP,ESP
$+3 >|. 83EC 08 SUB ESP,8
$+6 >|. C745 F8 00000>MOV DWORD PTR SS:[EBP-8],0
$+D >|. EB 09 JMP SHORT 11.00401018
$+F >|> 8B45 F8 /MOV EAX,DWORD PTR SS:[EBP-8]
$+12 >|. 83C0 01 |ADD EAX,1
$+15 >|. 8945 F8 |MOV DWORD PTR SS:[EBP-8],EAX
$+18 >|> 8B4D 08 MOV ECX,DWORD PTR SS:[EBP+8]
$+1B >|. 8A11 |MOV DL,BYTE PTR DS:[ECX]
$+1D >|. 8855 FC |MOV BYTE PTR SS:[EBP-4],DL
$+20 >|. 8B45 FC |MOV EAX,DWORD PTR SS:[EBP-4]
$+23 >|. 25 FF000000 |AND EAX,0FF
$+28 >|. 8B4D 08 |MOV ECX,DWORD PTR SS:[EBP+8]
$+2B >|. 83C1 01 |ADD ECX,1
$+2E >|. 894D 08 |MOV DWORD PTR SS:[EBP+8],ECX
$+31 >|. 85C0 |TEST EAX,EAX
$+33 >|. 74 16 |JE SHORT 11.0040104B
$+35 >|. 8B55 FC |MOV EDX,DWORD PTR SS:[EBP-4]
$+38 >|. 81E2 FF000000 |AND EDX,0FF
$+3E >|. C1FA 07 |SAR EDX,7
$+41 >|. 8B45 08 |MOV EAX,DWORD PTR SS:[EBP+8]
$+44 >|. 03C2 |ADD EAX,EDX
$+46 >|. 8945 08 |MOV DWORD PTR SS:[EBP+8],EAX
$+49 >|.^ EB C4 \JMP SHORT 11.0040100F
$+4B >|> 8B45 F8 MOV EAX,DWORD PTR SS:[EBP-8]
$+4E >|. 8BE5 MOV ESP,EBP
$+50 >|. 5D POP EBP
$+51 >\. C3 RETN

实际cpu代码序列总长度:0x51.

虽然forgAt的行数比云飞扬的多,但最终形成的cpu码序列长度反而短。
cpu码序列短并结论正确的代码才是高效率优秀代码。

2009-4-21 22:02

0

peaceclub

雪币： 255

活跃值： (207)

能力值：

( LV9，RANK：250 )

在线值：

发帖

59

回帖

967

粉丝

1

关注

私信

peaceclub 6: 33 楼

让我们来做个比较吧,看东西还是看内在,不要被表象所迷惑。

云飞扬:
int ustrlen(unsigned char data[]) // 注:只精简此函数,只能用C语言.
{
int i = 0, count = 0;

for(;data;(data[i++] & 0xFF) > 127 ? ++i:i, ++count);

return count;
}

disasm:

$ ==> >/$ 55 PUSH EBP
$+1 >|. 8BEC MOV EBP,ESP
$+3 >|. 83EC 0C SUB ESP,0C
$+6 >|. C745 F8 00000>MOV DWORD PTR SS:[EBP-8],0
$+D >|. C745 FC 00000>MOV DWORD PTR SS:[EBP-4],0
$+14 >|. EB 3E JMP SHORT 22.00401054
$+16 >|> 8B45 08 /MOV EAX,DWORD PTR SS:[EBP+8]
$+19 >|. 0345 F8 |ADD EAX,DWORD PTR SS:[EBP-8]
$+1C >|. 33C9 |XOR ECX,ECX
$+1E >|. 8A08 |MOV CL,BYTE PTR DS:[EAX]
$+20 >|. 81E1 FF000000 |AND ECX,0FF
$+26 >|. 8B55 F8 |MOV EDX,DWORD PTR SS:[EBP-8]
$+29 >|. 83C2 01 |ADD EDX,1
$+2C >|. 8955 F8 |MOV DWORD PTR SS:[EBP-8],EDX
$+2F >|. 83F9 7F |CMP ECX,7F
$+32 >|. 7E 11 |JLE SHORT 22.00401045
$+34 >|. 8B45 F8 |MOV EAX,DWORD PTR SS:[EBP-8]
$+37 >|. 83C0 01 |ADD EAX,1
$+3A >|. 8945 F8 |MOV DWORD PTR SS:[EBP-8],EAX
$+3D >|. 8B4D F8 |MOV ECX,DWORD PTR SS:[EBP-8]
$+40 >|. 894D F4 |MOV DWORD PTR SS:[EBP-C],ECX
$+43 >|. EB 06 |JMP SHORT 22.0040104B
$+45 >|> 8B55 F8 |MOV EDX,DWORD PTR SS:[EBP-8]
$+48 >|. 8955 F4 |MOV DWORD PTR SS:[EBP-C],EDX
$+4B >|> 8B45 FC |MOV EAX,DWORD PTR SS:[EBP-4]
$+4E >|. 83C0 01 |ADD EAX,1
$+51 >|. 8945 FC |MOV DWORD PTR SS:[EBP-4],EAX
$+54 >|> 8B4D 08 MOV ECX,DWORD PTR SS:[EBP+8]
$+57 >|. 034D F8 |ADD ECX,DWORD PTR SS:[EBP-8]
$+5A >|. 33D2 |XOR EDX,EDX
$+5C >|. 8A11 |MOV DL,BYTE PTR DS:[ECX]
$+5E >|. 85D2 |TEST EDX,EDX
$+60 >|. 74 02 |JE SHORT 22.00401064
$+62 >|.^ EB B2 \JMP SHORT 22.00401016
$+64 >|> 8B45 FC MOV EAX,DWORD PTR SS:[EBP-4]
$+67 >|. 8BE5 MOV ESP,EBP
$+69 >|. 5D POP EBP
$+6A >\. C3 RETN
实际cpu代码序列总长度:0x6A.

forgAt:
int
ustrlen(char *s)
{
unsigned long i;
unsigned char c;

for (i = 0; c = *s++; i++)
s += c >> 7;
return i;
}

$ ==> >/$ 55 PUSH EBP
$+1 >|. 8BEC MOV EBP,ESP
$+3 >|. 83EC 08 SUB ESP,8
$+6 >|. C745 F8 00000>MOV DWORD PTR SS:[EBP-8],0
$+D >|. EB 09 JMP SHORT 11.00401018
$+F >|> 8B45 F8 /MOV EAX,DWORD PTR SS:[EBP-8]
$+12 >|. 83C0 01 |ADD EAX,1
$+15 >|. 8945 F8 |MOV DWORD PTR SS:[EBP-8],EAX
$+18 >|> 8B4D 08 MOV ECX,DWORD PTR SS:[EBP+8]
$+1B >|. 8A11 |MOV DL,BYTE PTR DS:[ECX]
$+1D >|. 8855 FC |MOV BYTE PTR SS:[EBP-4],DL
$+20 >|. 8B45 FC |MOV EAX,DWORD PTR SS:[EBP-4]
$+23 >|. 25 FF000000 |AND EAX,0FF
$+28 >|. 8B4D 08 |MOV ECX,DWORD PTR SS:[EBP+8]
$+2B >|. 83C1 01 |ADD ECX,1
$+2E >|. 894D 08 |MOV DWORD PTR SS:[EBP+8],ECX
$+31 >|. 85C0 |TEST EAX,EAX
$+33 >|. 74 16 |JE SHORT 11.0040104B
$+35 >|. 8B55 FC |MOV EDX,DWORD PTR SS:[EBP-4]
$+38 >|. 81E2 FF000000 |AND EDX,0FF
$+3E >|. C1FA 07 |SAR EDX,7
$+41 >|. 8B45 08 |MOV EAX,DWORD PTR SS:[EBP+8]
$+44 >|. 03C2 |ADD EAX,EDX
$+46 >|. 8945 08 |MOV DWORD PTR SS:[EBP+8],EAX
$+49 >|.^ EB C4 \JMP SHORT 11.0040100F
$+4B >|> 8B45 F8 MOV EAX,DWORD PTR SS:[EBP-8]
$+4E >|. 8BE5 MOV ESP,EBP
$+50 >|. 5D POP EBP
$+51 >\. C3 RETN

实际cpu代码序列总长度:0x51.

虽然forgAt的行数比云飞扬的多,但最终形成的cpu码序列长度反而短。
cpu码序列短并结论正确的代码才是高效率优秀代码。

2009-4-21 22:02

0

雲飛揚

雪币： 251

活跃值： (11)

能力值：

( LV2，RANK：10 )

在线值：

发帖

13

回帖

157

粉丝

0

关注

私信

雲飛揚: 34 楼

to: peaceclub
谢谢你的回复.
我对汇编没怎么研究过,也没有试过,写完代码然后反汇编.
其实,我不认为反汇编后代码少就代码多的快.
因为这涉及到,CPU执行汇编指令的问题.
打个比方,同样是一条汇编指令,"加" 和 "乘"的速度肯定是不一样的.
用比时间的方法,我感是比较准确的.
如果您有时间的话可以试一下这个方法.

2009-4-22 08:20

0

supermilg

雪币： 22

活跃值： (30)

能力值：

( LV2，RANK：10 )

在线值：

发帖

4

回帖

92

粉丝

0

关注

私信

supermilg: 35 楼

汇编长度不是关键，Cache效率在字符串长度计算中才是关键。

另外，楼上的所有同志们难道都没有看出楼主的代码存在数组越界的潜在的安全漏洞？

安全是王道！

2009-4-23 08:26

0

supermilg

雪币： 22

活跃值： (30)

能力值：

( LV2，RANK：10 )

在线值：

发帖

4

回帖

92

粉丝

0

关注

私信

supermilg: 36 楼

楼主要是真是为了效率看看这样的类似代码，充分利用cache才是性能的王道：

int strlen(char const* s)
{
/* Search index.
* The input pointer is still needed for length calculation.
*/
char const* p = s;

/* Bit patterns (32 bit system), temporary.
* The three used 32 bit patterns are 0x7efefeff (7efefeff,
* 7efefeffh, m), 0x81010101 (81010101, 81010101h, -m) and 0x81010100
* (81010100, 81010100h, ~m).
* On 64 bit systems, the pattern would be 0x7efefefefefefeff (its
* complement 0x8101010101010101).
*/
int m = 0x7efefeff, n = ~m, i;

/* Advance the search index until it is aligned to the data size
* used in the following search loop; do a conventional search on
* that unaligned part at the same time, since the string may be
* short (shorter than sizeof(int)-1).
* This loop is not entered in the more optimal case of an input
* string being aligned (compilers may chose to e.g. align string
* literals, dynamically allocated buffers are aligned, etc).
* Alignment of the subsequent multi-byte data accesses is an
* optimization, since misaligned data accesses include a performance
* penalty and possibly use more than one cache line, or are even
* disallowed and generate a bus error, e.g. in RISC architectures.
* This step will take less than sizeof(int) loop iterations.
* The &(sizeof(int)-1) operation is an optimized %sizeof(int)
* (assuming sizeof(int) is a power-of-two), since bit operations are
* more performant than arithmetic operations (in this case, with an
* operand that is known at compile time, the compiler/optimizer
* might generate the same code for the %-operation).
* The !=0 part is left away here (and also at other places) in the
* conditional expression , which might be a reflection of the fact
* that the compiler may directly use the zero-conditional produced
* by the &-operation.
*/
for (; (int)p & (sizeof(int) - 1); p++) {

      /* Check for end-of-string too (relevant with small, unaligned
      * strings, see earlier comment too).
      */
      if (!*p) {
         return p - s;
      }
}

/* Aligned multi-byte data search loop, in pieces of sizeof(int)
* (by C language definition, int is the most natural data size for
* CPUs to operate on; maybe not with 64 bit systems in LP64 mode
* [only longs and pointers native; int remains 32 bit, for software
* compatibility and/or resource saving], in which long may be the
* natural data size).
* One multi-byte data-access is done per iteration instead of
* sizeof(int) single-byte data-accesses (a single multi-byte
* data-access may even be faster than one single-byte data-access).
* Each iteration uses one subtraction/addition/comparison instead
* of sizeof(int).
*/
for (;;) {

      /* Next sizeof(int) aligned string bytes.
      * We may read up to sizeof(int)-1 bytes beyond the string
      * (excluding its terminating nul byte), which is not a problem
      * with memory (page) maps, since the (mapped) pages are aligned
      * to a multiple of sizeof(int) and so an aligned int can't cross
      * two pages.
      * However, memory boundary check tools may trigger warnings
      * (such as uninitialized memory read, etc).
      */
      i = *(int const*)p;

      /* Check if this chunk is a candidate for having embedded nul
      * bytes.
      * This algorithm assumes two's complement integer
      * representation.
      * i+0x7efefeff is the same as i-0x81010101.
      * Subtracting 1 from each (possible nul) byte will flip its
      * least significant bit, and possibly some more, depending on
      * the input (all bits up to the least significant set bit).
      * If the byte was zero (all bits cleared, a nul byte), all bits
      * are flipped and a carry bit is generated into the next
      * significant byte, which flips the next byte's least
      * significant bit a second time.
      * Since we cannot (and don't want) to check the
      * int-carry-out-bit (the most significant byte's carry) in C,
      * the most significant byte's most significant bit is used to do
      * that check, which leads to false candidates if only that bit
      * was set in the most significant byte (i.e. the most
      * significant byte was a \x80 byte).
      * These false candidates weaken the search algorithm's
      * performance a bit.
      * Comparing the subtraction-flip result with the logic-flip
      * result (^~i), while only considering the sizeof(int) bits of
      * interest (&0x81010100), will reveal if carries have occurred
      * (double-bit-flip vs. single-bit-flip).
      */
      if (!(((i + m) ^ ~i) & n)) {

         /* This chunk is not a candidate, which means it does not
         * include nul bytes.
         * Advance sizeof(int) bytes.
         */
         p += sizeof(int);

      } else {

         /* This chunk is a candidate, which means it may include nul
         * bytes or, at the most significant byte, a \x80.
         * Searches on strings containing lots of such bytes will be
         * less-than-optimized.
         * Conventionally search for a nul (and its position) in
         * sizeof(int) bytes and continue chunked search if none is
         * found (with a false candidate, see comments above), in
         * which case the search index is advanced too by sizeof(int)
         * bytes.
         */
         for (i = sizeof(int); i; p++, i--) {
            if (!*p) {
                  return p - s;
            }
         }
      }
}
}

2009-4-23 08:31

0

Loka

雪币： 993

活跃值： (442)

能力值：

( LV12，RANK：403 )

在线值：

发帖

23

回帖

131

粉丝

0

关注

私信

Loka 2: 37 楼

精妙！用算术中的进位来判断是否为0，一次处理sizeof(int)字节。

=========================================
此法还是有点小缺陷，以32位为例，if (!(((i + m) ^ ~i) & n))语句在高位字节为0x80的情况下也进入
for (i = sizeof(int); i; p++, i--) {
            if (!*p) {
                  return p - s;
            }
最后判定4字节都为非零，p被正确+4，不影响函数的最后计算结果，但有违算法思想原来的本意，
按算法的本意应该是直接p += sizeof(int);

2009-4-23 15:09

0