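This is a really interesting challenge, and it caught my interest right away. My solution is not necessarily correct or optimal, but I'd like to share it.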
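I. Reading the challenge
Download the attachment, unzip it and read the challenge.
1. The challenge
The challenge gives a piece of binary code as a sequence of decimal numbers, 200 in total.
A machine learning model classifies each of these 200 numbers and marks a position with 1 if it is a function entry point.
The problem is that the trained model currently outputs all 0s: the single 1 is missing.
2. The .h5 file
This is a Keras model file and can be loaded with load_model.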
II. Environment
1. Windows
2. x32dbg
3. Anaconda3
4. Keras (installing the Keras core with pip on Windows works fine, but the visualization part has pitfalls, so I dropped it)
5. VS Code
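III. Analyzing the data by hand
The challenge says the 200 sample points are "binary code", but the data look processed: every value lies in [1, 256].
In particular, the value 256 shows up several times, which is outside the normal byte range 0x00 to 0xff. Subtracting 1 maps [1, 256] exactly onto 0x00 to 0xff, so my guess was that every value should be decremented by 1 before analyzing it by hand.
Convert the data and print it in a form that can be copied and pasted: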
```python
# The 200 values from the challenge, copied verbatim
strstr = """
50 193 132 251 16 16 149 193 142 5 198 9 1 1 1 196 50 193 132 251 16 16 149 193
142 5 198 10 1 1 1 196 145 142 181 39 1 1 1 1 132 237 13 138 93 37 5 50 220
138 117 37 9 138 199 16 183 5 31 233 25 256 256 256 132 249 21 119 12 233 127 59
256 256 142 183 1 1 1 1 256 37 134 169 146 16 9 145 185 4 1 1 1 140 117 37 9
2 217 140 93 37 5 132 197 13 196 145 142 117 39 1 185 3 1 1 1 140 117 37 9 2
217 140 93 37 5 132 197 13 196 145 142 117 39 1 132 196 2 236 171 142 119 1 185 2
1 1 1 140 117 37 9 2 217 140 93 37 5 132 197 13 196 145 142 117 39 1 185 6 1
1 1 140 117 37 9 2 217 140 93 37 5 132 197 13 196 142 183 1 1 1 1 142 189 40 1
1 1 1
"""

# Subtract 1 from every value and print it as a two-digit hex byte
res = " ".join("{:02x}".format(int(i) - 1) for i in strstr.split())
print(res)
```
Result:
31 c0 83 fa 0f 0f 94 c0 8d 04 c5 08 00 00 00 c3 31 c0 83 fa 0f 0f 94 c0 8d 04 c5 09 00 00 00 c3 90 8d b4 26 00 00 00 00 83 ec 0c 89 5c 24 04 31 db 89 74
24 08 89 c6 0f b6 04 1e e8 18 ff ff ff 83 f8 14 76 0b e8 7e 3a ff ff 8d b6 00 00 00 00 ff 24 85 a8 91 0f 08 90 b8 03 00 00 00 8b 74 24 08 01 d8 8b 5c 24
04 83 c4 0c c3 90 8d 74 26 00 b8 02 00 00 00 8b 74 24 08 01 d8 8b 5c 24 04 83 c4 0c c3 90 8d 74 26 00 83 c3 01 eb aa 8d 76 00 b8 01 00 00 00 8b 74 24 08
01 d8 8b 5c 24 04 83 c4 0c c3 90 8d 74 26 00 b8 05 00 00 00 8b 74 24 08 01 d8 8b 5c 24 04 83 c4 0c c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00
Copy the bytes into x32dbg and look at the disassembly:
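The bytes now disassemble into readable code. The 1 in the result table sits at position 40 (counting from 0), which is exactly where `sub esp, 0xC` is, and that does fit the challenge's definition of a function entry point.
Compare this with the other fragments that end in ret: the short ones at the beginning never adjust esp with an 0x83 instruction, so they don't fall under the challenge's requirement; the later ones only contain an add esp with no matching setup, so they are clearly not ordinary functions either.
Hypothesis: the "function entry" the challenge is looking for must include the stack adjustment for local variables (the one that is balanced again on return), and its main feature is the 0x83 opcode.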
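To check the 0x83 hypothesis without a debugger, the decoded bytes can simply be searched for the two-byte sequences `83 EC` (sub esp, imm8) and `83 C4` (add esp, imm8). This is a minimal sketch: a naive byte search, not a disassembler, so on other inputs it could match bytes sitting in the middle of a different instruction. The names `HEX_DUMP`, `parse_hex` and `find_pattern` are illustrative, not part of the challenge.

```python
# Hex dump printed by the "subtract 1" step (200 bytes)
HEX_DUMP = """
31 c0 83 fa 0f 0f 94 c0 8d 04 c5 08 00 00 00 c3 31 c0 83 fa 0f 0f 94 c0 8d 04 c5 09 00 00 00 c3 90 8d b4 26 00 00 00 00 83 ec 0c 89 5c 24 04 31 db 89 74
24 08 89 c6 0f b6 04 1e e8 18 ff ff ff 83 f8 14 76 0b e8 7e 3a ff ff 8d b6 00 00 00 00 ff 24 85 a8 91 0f 08 90 b8 03 00 00 00 8b 74 24 08 01 d8 8b 5c 24
04 83 c4 0c c3 90 8d 74 26 00 b8 02 00 00 00 8b 74 24 08 01 d8 8b 5c 24 04 83 c4 0c c3 90 8d 74 26 00 83 c3 01 eb aa 8d 76 00 b8 01 00 00 00 8b 74 24 08
01 d8 8b 5c 24 04 83 c4 0c c3 90 8d 74 26 00 b8 05 00 00 00 8b 74 24 08 01 d8 8b 5c 24 04 83 c4 0c c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00
"""

# Opcode 0x83 + ModRM 0xEC = "sub esp, imm8" (reg field /5 = SUB, r/m = esp)
# Opcode 0x83 + ModRM 0xC4 = "add esp, imm8" (reg field /0 = ADD, r/m = esp)
SUB_ESP_IMM8 = (0x83, 0xEC)
ADD_ESP_IMM8 = (0x83, 0xC4)


def parse_hex(dump):
    """Turn a whitespace-separated hex dump into a list of ints."""
    return [int(tok, 16) for tok in dump.split()]


def find_pattern(data, pattern):
    """Return every offset where `pattern` occurs (naive byte search, no disassembly)."""
    n = len(pattern)
    return [i for i in range(len(data) - n + 1) if tuple(data[i:i + n]) == pattern]


data = parse_hex(HEX_DUMP)
print(len(data))                                             # 200
print("sub esp, imm8 at", find_pattern(data, SUB_ESP_IMM8))  # [40]
print("add esp, imm8 at", find_pattern(data, ADD_ESP_IMM8))
```

On this sample the only `sub esp, imm8` hit is at offset 40, the same position identified in x32dbg, while every `add esp, imm8` hit lies in the later ret fragments.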
IV. Model analysis
```python
from keras.models import load_model

# Load the model shipped with the challenge and print its architecture
model = load_model('gcc_O2.h5')
model.summary()
```
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 200, 16)           4112
_________________________________________________________________
bidirectional_1 (Bidirection (None, 200, 16)           400
_________________________________________________________________
time_distributed_1 (TimeDist (None, 200, 2)            34
=================================================================
Total params: 4,546
Trainable params: 4,546
Non-trainable params: 0
_________________________________________________________________
```
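The parameter counts in the summary can be worked backwards as a sanity check. The Embedding layer has 4112 = 257 × 16 parameters, so it accepts 257 distinct input ids (0 to 256), which fits the observation that the data run up to 256 and the "subtract 1" decoding. The summary does not say which recurrent cell sits inside `Bidirectional` or which layer `TimeDistributed` wraps; the sketch below assumes a `SimpleRNN` with 8 units per direction (concatenated to 16) and a `Dense(2)`. That is one configuration that reproduces the numbers exactly, not a confirmed reading of the file, and the helper names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per token id
    return vocab_size * dim


def simple_rnn_params(input_dim, units):
    # Assumed SimpleRNN cell: input kernel + recurrent kernel + bias
    return units * (input_dim + units + 1)


def dense_params(input_dim, units):
    # Weight matrix + bias, shared across all 200 time steps by TimeDistributed
    return input_dim * units + units


emb = embedding_params(257, 16)          # 257 ids x 16 dims
birnn = 2 * simple_rnn_params(16, 8)     # two directions, 8 units each (assumption)
td_dense = dense_params(16, 2)           # 16 features -> 2 outputs per position

print(emb, birnn, td_dense, emb + birnn + td_dense)  # 4112 400 34 4546
```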
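As the summary shows, the model has three layers and its final output is 200 × 2 (a pair of values for each of the 200 positions). Feed the challenge sample straight into it and look at the result: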
```python
from keras.models import load_model
import numpy as np

model = load_model('gcc_O2.h5')

# The same 200 values from the challenge
input_str = """
50 193 132 251 16 16 149 193 142 5 198 9 1 1 1 196 50 193 132 251 16 16 149 193
142 5 198 10 1 1 1 196 145 142 181 39 1 1 1 1 132 237 13 138 93 37 5 50 220
138 117 37 9 138 199 16 183 5 31 233 25 256 256 256 132 249 21 119 12 233 127 59
256 256 142 183 1 1 1 1 256 37 134 169 146 16 9 145 185 4 1 1 1 140 117 37 9
2 217 140 93 37 5 132 197 13 196 145 142 117 39 1 185 3 1 1 1 140 117 37 9 2
217 140 93 37 5 132 197 13 196 145 142 117 39 1 132 196 2 236 171 142 119 1 185 2
1 1 1 140 117 37 9 2 217 140 93 37 5 132 197 13 196 145 142 117 39 1 185 6 1
1 1 140 117 37 9 2 217 140 93 37 5 132 197 13 196 142 183 1 1 1 1 142 189 40 1
1 1 1
"""

# Parse into a (1, 200) integer batch for the model
input_n = np.array([int(s) for s in input_str.split()]).reshape(1, 200)
print(input_n)

out = model.predict(input_n)  # shape (1, 200, 2)
print(out)

# Print every position whose first output drops below the threshold,
# together with its decoded byte (value - 1)
for t, p in enumerate(out[0]):
    if p[0] < 0.999999:
        print(t, hex(input_n[0][t] - 1), p[0], p[1])
```
The threshold 0.999999 is one I picked after a quick look at the output, because most of the values are extremely close to 1.
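A hand-picked threshold works, but the same information can be read off without one. If the per-position label is taken as the argmax of the two outputs (an assumption; the challenge only says that 1 marks a function entry), an all-0 prediction simply means p[0] > p[1] at every position, and sorting positions by p[1] surfaces the most likely entry candidates. In this sketch `probs` stands for `out[0]`, the helper names are hypothetical, and the demo numbers are made up, not the model's real output.

```python
def predicted_labels(probs):
    """Per-position label: 1 if the second output is larger, else 0 (argmax over 2 classes)."""
    return [1 if p1 > p0 else 0 for p0, p1 in probs]


def top_candidates(probs, k=3):
    """Positions sorted by descending second output, best k first (ties keep input order)."""
    order = sorted(range(len(probs)), key=lambda t: probs[t][1], reverse=True)
    return order[:k]


# Made-up probabilities for 10 positions (NOT the real model output):
# every position favours class 0, but position 5 is the least confident.
demo = [(0.9999999, 1e-7)] * 5 + [(0.9991, 0.0009)] + [(0.99999, 1e-5)] * 4

print(predicted_labels(demo))     # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(top_candidates(demo, k=2))  # [5, 6]
```

With the real model, `out[0]` can be passed directly as `probs`, since each row unpacks into two values.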