Web CAPTCHA Design and Cracking, Viewed Through the 12306 Website's New CAPTCHA (Kanxue Security Community, kanxue.com)

Posted by 永哥呀 on 2015-3-22 15:05

2015-03-17 09:28, compiled from online sources by 陈庆翔 (Chen Qingxiang)
Reposted from 51CTO
Original link: http://mobile.51cto.com/hot-468559.htm

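On March 16, 2015, 12306, the official railway ticketing website, rolled out a new trick: a brand-new verification method on its login page. After entering their username and password, users must also correctly select the right pictures in an image CAPTCHA before they can log in. Reportedly, since the CAPTCHA change, none of the existing ticket-grabbing tools can log in.

What devastating news. I suspect the major internet companies are all hard at work on new ticket-grabbing assistants that can crack the new CAPTCHA scheme.

Below, let's look at how various kinds of CAPTCHA are designed and how they can be cracked.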

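First, plain-text CAPTCHAs, one of the more primitive kinds.

Strictly speaking, this kind does not qualify as a CAPTCHA: only automatically generated challenges can serve as CAPTCHAs, whereas these text questions are drawn from a question bank of limited size. Cracking them is easy: refresh the page enough times to build a bank of questions and their answers, extract the question from the page with a regular expression, look up the matching answer, and submit it. Some sites instead generate a random arithmetic problem of the form "random number [+-*/] random number = ?", which even a rank-beginner programmer can solve...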
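Both attacks described above fit in a few lines. The Java sketch below is illustrative only: the class name `TextCaptchaSolver`, its methods, and the assumption that the question appears in the page as plain text such as `12 + 7 = ?` are all hypothetical. Bank questions are looked up in a map, and arithmetic questions are extracted with a regular expression and evaluated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextCaptchaSolver {

    // Hypothetical question bank (question text -> answer), filled by refreshing the page repeatedly
    private final Map<String, String> questionBank = new HashMap<>();

    // Matches "<int> <op> <int> = ?" anywhere in the page, e.g. "12 + 7 = ?"
    private static final Pattern ARITHMETIC =
            Pattern.compile("(\\d+)\\s*([-+*/])\\s*(\\d+)\\s*=\\s*\\?");

    public void learn(String question, String answer) {
        questionBank.put(question, answer);
    }

    /** Returns the stored answer, or null if the question has not been seen yet. */
    public String lookup(String question) {
        return questionBank.get(question);
    }

    /** Extracts the first arithmetic question from the page and computes it; null if none is found. */
    public static Integer solveArithmetic(String html) {
        Matcher m = ARITHMETIC.matcher(html);
        if (!m.find()) {
            return null;
        }
        int a = Integer.parseInt(m.group(1));
        int b = Integer.parseInt(m.group(3));
        switch (m.group(2)) {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            default:  return a / b; // integer division; assumes b != 0
        }
    }

    public static void main(String[] args) {
        System.out.println(solveArithmetic("<label>Verify: 12 + 7 = ?</label>"));
        TextCaptchaSolver solver = new TextCaptchaSolver();
        solver.learn("What color is the sky?", "blue");
        System.out.println(solver.lookup("What color is the sky?"));
    }
}
```

Treating `/` as integer division is an assumption about how such a site would phrase its questions; a real page may need a different pattern.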

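That said, such CAPTCHAs are not entirely useless. Spam bots that fire at every form they find have no reason to invest that much effort in one particular site. But against someone determined to flood your site with junk posts, this kind of CAPTCHA is as good as none.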

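Second, the image CAPTCHA, currently the mainstream type:

These image CAPTCHAs work by making characters touch and overlap, which makes machine recognition harder; the simple style shown above is generally used by smaller websites.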

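How to handle this kind of CAPTCHA:

Image preprocessing

How do we remove the background noise? Notice that all pixels of a given digit or letter share one color, so split the CAPTCHA horizontally into 5 equal-width regions.

Then count the color distribution in each region: apart from white, the most frequent color is the character's color, so the background is easy to remove.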

Code:

```java
import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.imageio.ImageIO;

public class ImagePreProcess { // class name is illustrative

    // Assumed near-white test (R+G+B > 600); the threshold is a guess, tune it per CAPTCHA
    public static int isWhite(int colorInt) {
        Color color = new Color(colorInt);
        return color.getRed() + color.getGreen() + color.getBlue() > 600 ? 1 : 0;
    }

    public static BufferedImage removeBackgroud(String picFile)
            throws Exception {
        BufferedImage img = ImageIO.read(new File(picFile));
        // Crop off the 1-pixel border
        img = img.getSubimage(1, 1, img.getWidth() - 2, img.getHeight() - 2);
        int width = img.getWidth();
        int height = img.getHeight();
        // Split the image horizontally into 5 equal-width regions, one per character
        double subWidth = (double) width / 5.0;
        for (int i = 0; i < 5; i++) {
            // Count how often each non-white color occurs in this region
            Map<Integer, Integer> map = new HashMap<Integer, Integer>();
            for (int x = (int) (1 + i * subWidth); x < (i + 1) * subWidth
                    && x < width - 1; ++x) {
                for (int y = 0; y < height; ++y) {
                    int rgb = img.getRGB(x, y);
                    if (isWhite(rgb) == 1)
                        continue;
                    map.merge(rgb, 1, Integer::sum);
                }
            }
            // The most frequent non-white color is taken as the character color
            int max = 0;
            int colorMax = 0;
            for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
                if (max < entry.getValue()) {
                    max = entry.getValue();
                    colorMax = entry.getKey();
                }
            }
            // Binarize: character color -> black, everything else -> white
            for (int x = (int) (1 + i * subWidth); x < (i + 1) * subWidth
                    && x < width - 1; ++x) {
                for (int y = 0; y < height; ++y) {
                    if (img.getRGB(x, y) != colorMax) {
                        img.setRGB(x, y, Color.WHITE.getRGB());
                    } else {
                        img.setRGB(x, y, Color.BLACK.getRGB());
                    }
                }
            }
        }
        return img;
    }
}
```
This produces the following image:

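Next, scan the image vertically (column by column) to cut it into individual characters.

Then scan each piece horizontally (row by row) to trim the blank rows above and below it.

Then build the training set.

Finally, since the characters have a fixed size, recognition works just as in "验证码识别--1" (CAPTCHA Recognition, Part 1): simply compare pixels.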
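The four steps above can be illustrated without any image files. The following Java sketch is a simplified, hypothetical stand-in for the full source: it works on a binary grid (`true` = character pixel) instead of a `BufferedImage`, and the class and method names are made up for illustration. `columnRuns` is the vertical scan, `rowBounds` the horizontal scan, and `mismatches` the pixel comparison used for recognition.

```java
import java.util.ArrayList;
import java.util.List;

public class ProjectionDemo {

    /** Builds a binary grid from strings: '#' is a character pixel, anything else is background. */
    public static boolean[][] parse(String... rows) {
        boolean[][] g = new boolean[rows.length][rows[0].length()];
        for (int y = 0; y < rows.length; y++)
            for (int x = 0; x < rows[y].length(); x++)
                g[y][x] = rows[y].charAt(x) == '#';
        return g;
    }

    /** Vertical scan: [start, end) column ranges whose pixel count exceeds minCount. */
    public static List<int[]> columnRuns(boolean[][] g, int minCount) {
        List<int[]> runs = new ArrayList<>();
        int width = g[0].length, start = -1;
        for (int x = 0; x <= width; x++) {          // x == width acts as a closing background column
            int count = 0;
            for (int y = 0; x < width && y < g.length; y++)
                if (g[y][x]) count++;
            boolean ink = count > minCount;
            if (ink && start < 0) {
                start = x;                          // a character starts here
            } else if (!ink && start >= 0) {
                runs.add(new int[] {start, x});     // the character ended at column x - 1
                start = -1;
            }
        }
        return runs;
    }

    /** Horizontal scan: {first, last} rows with ink inside columns [x0, x1), or null if empty. */
    public static int[] rowBounds(boolean[][] g, int x0, int x1) {
        int top = -1, bottom = -1;
        for (int y = 0; y < g.length; y++)
            for (int x = x0; x < x1; x++)
                if (g[y][x]) {
                    if (top < 0) top = y;
                    bottom = y;
                }
        return top < 0 ? null : new int[] {top, bottom};
    }

    /** Recognition: number of differing pixels over the overlap of two glyphs (lower is closer). */
    public static int mismatches(boolean[][] a, boolean[][] b) {
        int h = Math.min(a.length, b.length), w = Math.min(a[0].length, b[0].length), diff = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (a[y][x] != b[y][x]) diff++;
        return diff;
    }
}
```

On the grid `##..#` / `##..#`, `columnRuns(g, 0)` finds the column ranges [0, 2) and [4, 5). Using a threshold of 1 instead, as `splitImage` in the full source does, tolerates one stray noise pixel per column.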

Full source:

```java
import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.imageio.ImageIO;

// Third-party jars: Apache Commons HttpClient 3.x and Commons IO
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.io.IOUtils;

public class ImagePreProcess2 {

    // Training samples: glyph image -> the character it represents
    private static Map<BufferedImage, String> trainMap = null;
    private static int index = 0;

    // A pixel counts as "black" if R+G+B <= 100
    public static int isBlack(int colorInt) {
        Color color = new Color(colorInt);
        if (color.getRed() + color.getGreen() + color.getBlue() <= 100) {
            return 1;
        }
        return 0;
    }

    // A pixel counts as "white" (the character color in this CAPTCHA) if R+G+B > 100
    public static int isWhite(int colorInt) {
        Color color = new Color(colorInt);
        if (color.getRed() + color.getGreen() + color.getBlue() > 100) {
            return 1;
        }
        return 0;
    }

    // This CAPTCHA needs no background removal, so the image is used as-is
    public static BufferedImage removeBackgroud(String picFile)
            throws Exception {
        return ImageIO.read(new File(picFile));
    }

    // Horizontal scan: trim the rows without character pixels above and below the glyph
    public static BufferedImage removeBlank(BufferedImage img) throws Exception {
        int width = img.getWidth();
        int height = img.getHeight();
        int start = 0;
        int end = 0;
        Label1: for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                if (isWhite(img.getRGB(x, y)) == 1) {
                    start = y;
                    break Label1;
                }
            }
        }
        Label2: for (int y = height - 1; y >= 0; --y) {
            for (int x = 0; x < width; ++x) {
                if (isWhite(img.getRGB(x, y)) == 1) {
                    end = y;
                    break Label2;
                }
            }
        }
        return img.getSubimage(0, start, width, end - start + 1);
    }

    // Vertical scan: cut at columns with at most 1 character pixel
    public static List<BufferedImage> splitImage(BufferedImage img)
            throws Exception {
        List<BufferedImage> subImgs = new ArrayList<BufferedImage>();
        int width = img.getWidth();
        int height = img.getHeight();
        // Number of character pixels in each column
        List<Integer> weightlist = new ArrayList<Integer>();
        for (int x = 0; x < width; ++x) {
            int count = 0;
            for (int y = 0; y < height; ++y) {
                if (isWhite(img.getRGB(x, y)) == 1) {
                    count++;
                }
            }
            weightlist.add(count);
        }
        int i = 0;
        while (i < weightlist.size()) {
            if (weightlist.get(i) <= 1) { // background column
                i++;
                continue;
            }
            int start = i;
            // Bounds check prevents running off the end when a glyph touches the right edge
            while (i < weightlist.size() && weightlist.get(i) > 1) {
                i++;
            }
            int length = i - start;
            if (length > 12) {
                // Too wide for one character: assume two touching characters and split in half
                subImgs.add(removeBlank(img.getSubimage(start, 0, length / 2, height)));
                subImgs.add(removeBlank(img.getSubimage(start + length - length / 2, 0,
                        length / 2, height)));
            } else if (length > 3) {
                // Runs of 3 columns or fewer are treated as noise and dropped
                subImgs.add(removeBlank(img.getSubimage(start, 0, length, height)));
            }
        }
        return subImgs;
    }

    // Loads train2/; the first character of each file name is the glyph's label
    public static Map<BufferedImage, String> loadTrainData() throws Exception {
        if (trainMap == null) {
            Map<BufferedImage, String> map = new HashMap<BufferedImage, String>();
            File[] files = new File("train2").listFiles();
            if (files == null) {
                throw new IllegalStateException("train2 directory not found");
            }
            for (File file : files) {
                map.put(ImageIO.read(file), file.getName().charAt(0) + "");
            }
            trainMap = map;
        }
        return trainMap;
    }

    // Pixel comparison: the training glyph with the fewest mismatching pixels wins
    public static String getSingleCharOcr(BufferedImage img,
            Map<BufferedImage, String> map) {
        String result = "";
        int width = img.getWidth();
        int height = img.getHeight();
        int min = width * height;
        for (BufferedImage bi : map.keySet()) {
            int count = 0;
            int widthmin = width < bi.getWidth() ? width : bi.getWidth();
            int heightmin = height < bi.getHeight() ? height : bi.getHeight();
            Label1: for (int x = 0; x < widthmin; ++x) {
                for (int y = 0; y < heightmin; ++y) {
                    if (isWhite(img.getRGB(x, y)) != isWhite(bi.getRGB(x, y))) {
                        count++;
                        if (count >= min)
                            break Label1; // already worse than the best match
                    }
                }
            }
            if (count < min) {
                min = count;
                result = map.get(bi);
            }
        }
        return result;
    }

    // Full pipeline: preprocess, split, recognize each glyph; saves a copy named after the result
    public static String getAllOcr(String file) throws Exception {
        BufferedImage img = removeBackgroud(file);
        List<BufferedImage> listImg = splitImage(img);
        Map<BufferedImage, String> map = loadTrainData();
        String result = "";
        for (BufferedImage bi : listImg) {
            result += getSingleCharOcr(bi, map);
        }
        ImageIO.write(img, "JPG", new File("result2//" + result + ".jpg"));
        return result;
    }

    // Downloads 30 sample CAPTCHAs into img2/
    public static void downloadImage() {
        HttpClient httpClient = new HttpClient();
        for (int i = 0; i < 30; i++) {
            GetMethod getMethod = new GetMethod("http://www.pkland.net/img.php?key="
                    + (2000 + i));
            try {
                // Execute the GET request
                int statusCode = httpClient.executeMethod(getMethod);
                if (statusCode != HttpStatus.SC_OK) {
                    System.err.println("Method failed: "
                            + getMethod.getStatusLine());
                    continue; // don't save an error page as a sample
                }
                // Save the response body
                String picName = "img2//" + i + ".jpg";
                try (InputStream inputStream = getMethod.getResponseBodyAsStream();
                        OutputStream outStream = new FileOutputStream(picName)) {
                    IOUtils.copy(inputStream, outStream);
                }
                System.out.println(i + "OK!");
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                // Release the connection
                getMethod.releaseConnection();
            }
        }
    }

    // Builds train2/ from temp/, whose file names are the 4-character answers
    public static void trainData() throws Exception {
        File dir = new File("temp");
        File[] files = dir.listFiles();
        for (File file : files) {
            BufferedImage img = removeBackgroud("temp//" + file.getName());
            List<BufferedImage> listImg = splitImage(img);
            // Keep only samples that split into exactly 4 glyphs
            if (listImg.size() == 4) {
                for (int j = 0; j < listImg.size(); ++j) {
                    ImageIO.write(listImg.get(j), "JPG", new File("train2//"
                            + file.getName().charAt(j) + "-" + (index++)
                            + ".jpg"));
                }
            }
        }
    }

    // Recognizes the 30 downloaded samples (run downloadImage() first to fetch them)
    public static void main(String[] args) throws Exception {
        // downloadImage();
        for (int i = 0; i < 30; ++i) {
            String text = getAllOcr("img2//" + i + ".jpg");
            System.out.println(i + ".jpg = " + text);
        }
    }
}
```
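Giants such as BAT (Baidu, Alibaba, Tencent) harden their CAPTCHAs with interference lines, a mix of bold and regular strokes, common Chinese characters (there are roughly 5,000 commonly used characters, with intricate strokes and many look-alikes, far harder than 26 letters), mixed typefaces such as Kaiti, Songti and Youyuan, pinyin, and distorted glyphs; requiring the exact recognition of 13 Chinese characters greatly increases the chance of failure.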

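Besides mainstream image CAPTCHAs, some websites also offer audio CAPTCHAs for users with poor eyesight. Normally the machine generates a clip of a voice reading out digits. But many programmers cut corners here: they record the ten digits once in advance and, at generation time, simply splice those recordings together in random order. The result is nothing more than the same ten clips stitched end to end.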
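Why is this shortcut weak? Every occurrence of a digit is the identical recording, so an attacker who has collected the ten clips once can read any challenge by exact template matching. The Java sketch below is a hypothetical illustration: audio is modeled as raw PCM `short[]` samples, and the class and method names are invented here. `splice` plays the lazy generator, `decode` the attacker.

```java
import java.util.Arrays;

public class SplicedAudioDemo {

    /** The lazy generator: concatenates the pre-recorded clip of each digit in code. */
    public static short[] splice(short[][] clips, String code) {
        int total = 0;
        for (char c : code.toCharArray()) total += clips[c - '0'].length;
        short[] out = new short[total];
        int pos = 0;
        for (char c : code.toCharArray()) {
            short[] clip = clips[c - '0'];
            System.arraycopy(clip, 0, out, pos, clip.length);
            pos += clip.length;
        }
        return out;
    }

    /**
     * The attack: greedy exact matching against the known clips.
     * Assumes no clip is a prefix of another; returns null on an unknown segment.
     */
    public static String decode(short[][] clips, short[] audio) {
        StringBuilder digits = new StringBuilder();
        int pos = 0;
        outer:
        while (pos < audio.length) {
            for (int d = 0; d < clips.length; d++) {
                short[] clip = clips[d];
                if (pos + clip.length <= audio.length
                        && Arrays.equals(Arrays.copyOfRange(audio, pos, pos + clip.length), clip)) {
                    digits.append(d);
                    pos += clip.length;
                    continue outer;
                }
            }
            return null;
        }
        return digits.toString();
    }
}
```

Adding per-challenge noise or pitch and speed variation defeats this exact matching, which is why splicing fixed recordings counts as cutting corners.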

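The CAPTCHA design principles are as follows:

Overall effect

• Number of characters random within a range
• Font size random within a range
• Wave distortion (angle and direction random within a range)

Anti-recognition

• Don't rely too heavily on anti-recognition techniques
• Don't use too many character sets: it hurts the user experience

Anti-segmentation

• Overlapping, touching characters work better than interference lines

Backup plan

• Keep a completely different set of CAPTCHAs of the same strength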
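Several of these principles translate directly into code. The Java sketch below is a hypothetical generator (class and method names, character set and numeric ranges are assumptions, not from the article) covering the "overall effect" items and anti-segmentation: random length, a random font size per character, glyphs packed so they overlap, and a sine-wave distortion. It randomizes the amplitude, period and phase of a vertical wave; the randomized angle and direction from the list are simplified away.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.util.Random;

public class CaptchaGenerator {

    // One small character set; look-alikes such as 0/O and 1/I are left out (assumed choice)
    public static final String CHARS = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

    /** Character count random within a range (4 to 6, an assumed range). */
    public static String randomText(Random rnd) {
        int n = 4 + rnd.nextInt(3);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(CHARS.charAt(rnd.nextInt(CHARS.length())));
        }
        return sb.toString();
    }

    /** Draws each character with a random font size, advancing less than the glyph width so glyphs touch. */
    public static BufferedImage render(String text, Random rnd) {
        int width = 40 * text.length() + 20, height = 60;
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.setColor(Color.BLACK);
        int x = 10;
        for (char c : text.toCharArray()) {
            g.setFont(new Font(Font.SANS_SERIF, Font.BOLD, 28 + rnd.nextInt(11))); // 28 to 38 pt
            g.drawString(String.valueOf(c), x, 45);
            // Overlap neighbours by about 5 px: touching glyphs resist segmentation
            x += Math.max(1, g.getFontMetrics().charWidth(c) - 5);
        }
        g.dispose();
        return img;
    }

    /** Wave distortion: shifts each column vertically along a sine with random amplitude, period and phase. */
    public static BufferedImage waveDistort(BufferedImage src, Random rnd) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        double amp = 3 + rnd.nextDouble() * 3;      // 3 to 6 px
        double period = 40 + rnd.nextDouble() * 40; // 40 to 80 px
        double phase = rnd.nextDouble() * 2 * Math.PI;
        int white = Color.WHITE.getRGB();
        for (int x = 0; x < w; x++) {
            int dy = (int) Math.round(amp * Math.sin(2 * Math.PI * x / period + phase));
            for (int y = 0; y < h; y++) {
                int sy = y - dy; // source row that lands on row y
                out.setRGB(x, y, (sy >= 0 && sy < h) ? src.getRGB(x, sy) : white);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Random rnd = new Random();
        String text = randomText(rnd);
        BufferedImage captcha = waveDistort(render(text, rnd), rnd);
        javax.imageio.ImageIO.write(captcha, "png", new java.io.File("captcha.png"));
        System.out.println("answer: " + text);
    }
}
```

`main` writes `captcha.png` and prints the expected answer. Keeping to one small character set follows the list's warning that too many character sets hurt usability.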

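Now that the principles are known, cracking becomes straightforward.

But here's the catch: this time 12306's CAPTCHA asks users to pick pictures, so none of the methods above apply. Does that mean it can't be cracked?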

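Some argue that 12306's image library can't be very large, so one could scrape all of it and crack the CAPTCHA that way. That is armchair theorizing, of course. There is, however, a method that is both very advanced and very primitive, known as "online CAPTCHA solving" or "human CAPTCHA solving" (打码).

Some skilled developers forward the CAPTCHAs produced by automatic registration bots to home-made "solving" software, where "CAPTCHA typists" enter the answers, which are passed back to the registration bot to complete the verification.

For now, this crude, brute-force approach is enough to cope with the current situation.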

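Conclusion:

This time 12306 has played a killer move, cutting down every ticket-grabbing tool in one stroke. When the scalpers are unhappy, the rest of us get tickets. It solves the scalper problem and sets programmers everywhere a hard puzzle at the same time.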
