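🚁Preface

When writing scrapers, most of us run into CAPTCHA recognition sooner or later. It is a common anti-scraping measure and comes in several forms: slider CAPTCHAs, image CAPTCHAs, arithmetic CAPTCHAs, and click CAPTCHAs. The image CAPTCHA covered here is the simpler kind, because someone has already built the wheel for us and we can use it directly.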

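🚁Testing

🚁Comparison with Pytesseract

This is a bit of an aside. Why compare at all? Because only a comparison shows each tool's strengths and weaknesses.

Install pytesseract

pip install pytesseract

Preparation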

import pytesseract
from PIL import Image


def get_captcha():
    image = Image.open('VerifyCode.png')
    image = image.convert('L')  # convert to grayscale
    threshold = 220  # binarization threshold
    # Lookup table: gray levels below the threshold map to 0, the rest to 1
    table = []
    for i in range(256):
        if i < threshold:
            table.append(0)
        else:
            table.append(1)
    image = image.point(table, '1')
    image.show()
    ans = pytesseract.image_to_string(image)
    print(ans)


get_captcha()
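
The thresholding step can also be looked at in isolation. The following is a minimal sketch in plain Python (no Pillow required) of the same 256-entry lookup table, applied to a list of grayscale values. The names `build_threshold_table` and `binarize` are illustrative only and are not part of Pillow or pytesseract.

```python
def build_threshold_table(threshold):
    """Map each 8-bit gray level to 0 (below threshold) or 1 (at or above it)."""
    return [0 if level < threshold else 1 for level in range(256)]


def binarize(pixels, table):
    """Apply the lookup table to an iterable of 0..255 gray values."""
    return [table[p] for p in pixels]


if __name__ == '__main__':
    table = build_threshold_table(220)
    # Dark and mid-gray pixels collapse to 0; only near-white pixels become 1.
    print(binarize([0, 128, 219, 220, 255], table))  # [0, 0, 0, 1, 1]
```

With a threshold of 220 only near-white pixels stay white, which suits dark text on a light background. Other CAPTCHA styles would likely need a different threshold.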

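Result
This is the processed image.

The recognized text differs quite a bit from the actual CAPTCHA. Without a trained model, this is not really good enough for production use.

🚁Using ddddocr

🚁Introduction

Hard requirement

python >= 3.8

Installation

pip install ddddocr

Test, using the same image as before.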

import ddddocr


def recognize():
    ocr = ddddocr.DdddOcr()
    with open('code_img/VerifyCode.png', 'rb') as f:
        img_bytes = f.read()
    res = ocr.classification(img_bytes)
    print(res)


recognize()

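The result is obvious at a glance, and the comparison is not kind to pytesseract.

In just five lines of code we got past the image CAPTCHA. Pretty satisfying, isn't it?

🚁Hands-on

As a real-world example, we solve the CAPTCHA on Amazon's robot-check page to get past the anti-scraping check and fetch the data we want.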

import time
from io import BytesIO

from PIL import Image
from selenium import webdriver
from selenium.webdriver import ChromeOptions
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

from ocr_code import recognize

options = ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-automation'])
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument(
    '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36')
url = 'https://www.amazon.com/errors/validateCaptcha'
# Selenium 4 API: the driver path is passed through a Service object
browser = webdriver.Chrome(service=Service('chromedriver.exe'), options=options)


def getCookie():
    browser.set_window_size(1920, 1080)
    browser.get(url)
    time.sleep(1)
    # Handle the CAPTCHA
    try:
        # Element to capture
        element = browser.find_element(By.XPATH, '//div[@class="a-row a-text-center"]')
        # Position (read by key rather than relying on dict ordering)
        x, y = element.location['x'], element.location['y']
        # Width and height
        w, h = element.size['width'], element.size['height']
        # Take a screenshot of the page as binary PNG data
        image_data = browser.get_screenshot_as_png()
        # Open the returned data as an image
        screenshot = Image.open(BytesIO(image_data))
        # Crop the screenshot down to the CAPTCHA element
        result = screenshot.crop((x, y, x + w, y + h))
        # Show the image
        # result.show()
        # Save the CAPTCHA image
        result.save('VerifyCode.png')
        # Call recognize() to read the CAPTCHA
        code = recognize('VerifyCode.png')
        print(code)
        # Type in the CAPTCHA
        browser.find_element(By.NAME, 'field-keywords').send_keys(code)
        # Click confirm
        browser.find_element(By.CLASS_NAME, 'a-button-text').click()
        time.sleep(1)
    except Exception as exc:
        # No loop to break out of here, so just report the failure
        print(f'CAPTCHA handling failed: {exc}')


if __name__ == '__main__':
    getCookie()
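
The crop step depends on turning the element's position and size into a `(left, top, right, bottom)` box. Below is a small sketch of that arithmetic as a standalone helper. The name `crop_box` and the `scale` parameter are assumptions added for illustration: on high-DPI displays the screenshot may contain more pixels than the page's CSS coordinates suggest, in which case the box would need to be scaled by the device pixel ratio.

```python
def crop_box(location, size, scale=1.0):
    """Return a (left, top, right, bottom) box suitable for PIL's Image.crop.

    location: dict with 'x' and 'y', as returned by WebElement.location
    size: dict with 'width' and 'height', as returned by WebElement.size
    scale: hypothetical device-pixel-ratio factor (1.0 means no scaling)
    """
    left = location['x'] * scale
    top = location['y'] * scale
    right = (location['x'] + size['width']) * scale
    bottom = (location['y'] + size['height']) * scale
    # Round to whole pixels so the box lines up with the screenshot grid
    return tuple(round(v) for v in (left, top, right, bottom))


if __name__ == '__main__':
    print(crop_box({'x': 10, 'y': 20}, {'width': 200, 'height': 70}))
```

In the script above this could be used as `screenshot.crop(crop_box(element.location, element.size))`.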

ocr_code.py

import ddddocr


def recognize(image):
    ocr = ddddocr.DdddOcr()
    with open(image, 'rb') as f:
        img_bytes = f.read()
    res = ocr.classification(img_bytes)
    return res
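
`recognize` builds a new `ddddocr.DdddOcr()` on every call. When many CAPTCHAs are processed, it is probably worth creating the recognizer once and reusing it, since construction likely involves loading the model. A minimal sketch of that idea follows. `make_recognizer` is a hypothetical helper, not part of ddddocr; it accepts any zero-argument factory whose result has a `classification(img_bytes)` method, so it can be tried without ddddocr installed.

```python
from functools import lru_cache


def make_recognizer(ocr_factory):
    """Wrap an OCR factory so the underlying object is built only once.

    ocr_factory: zero-argument callable returning an object with a
    classification(img_bytes) method (for example ddddocr.DdddOcr).
    """

    @lru_cache(maxsize=1)
    def get_ocr():
        # First call constructs the recognizer; later calls reuse it
        return ocr_factory()

    def recognize_bytes(img_bytes):
        return get_ocr().classification(img_bytes)

    return recognize_bytes


if __name__ == '__main__':
    class EchoOcr:
        """Stand-in recognizer used only to demonstrate the wrapper."""

        def classification(self, img_bytes):
            return img_bytes.decode('ascii')

    recognize_bytes = make_recognizer(EchoOcr)
    print(recognize_bytes(b'ABC123'))
```

With ddddocr installed, `recognize_bytes = make_recognizer(ddddocr.DdddOcr)` would give a function that takes the image bytes directly.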

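🚁Results

The captured CAPTCHA

Printed output

Follow so you don't lose track of me. If this article helped you, a like, comment, and bookmark would be much appreciated ❤️❤️❤️
Your support and recognition are my biggest motivation ❤️❤️❤️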
