Using mitmproxy to inject JS so Chrome/Chromium is not flagged as automated via window.navigator.webdriver

Author: 鲁智深 Category: python Published: 2020-01-21 16:37

Headed (non-headless) mode

Scraping with Selenium WebDriver and chromedriver is not invincible either: sites can still detect it.

In the automated browser, run the following JS in the developer tools console:

window.navigator.webdriver

It returns true.

In a normally launched browser the same expression returns undefined.

So anyone who knows a little JS can easily tell whether the browser was launched normally or by automation.
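The same console check can be run from Python. A minimal sketch, assuming a Selenium driver object: execute_script returns JS true as True and JS undefined as None. The helper names read_webdriver_flag and looks_automated are hypothetical, and real sites may combine this flag with other signals.

```python
def read_webdriver_flag(driver):
    """Run the console check through Selenium; JS undefined comes back as None."""
    return driver.execute_script("return window.navigator.webdriver")


def looks_automated(flag):
    """A simple page-side rule: an explicit true means an automated browser."""
    return flag is True
```

With a real driver, looks_automated(read_webdriver_flag(driver)) is True in the automated browser described above.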

Solution:

Before starting ChromeDriver, put Chrome into "developer mode", i.e. exclude the enable-automation switch:

option.add_experimental_option('excludeSwitches', ['enable-automation'])
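In the line above, option is a ChromeOptions instance created elsewhere in the author's script, which the post doesn't show. A minimal sketch of the surrounding setup, assuming Selenium's webdriver.ChromeOptions and webdriver.Chrome(options=...); the function names are hypothetical.

```python
def exclude_automation_switch(options):
    """Stop ChromeDriver from passing --enable-automation to Chrome."""
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    return options


def start_headed_chrome():
    # Imported lazily so the helper above works without selenium installed
    from selenium import webdriver

    options = exclude_automation_switch(webdriver.ChromeOptions())
    # Headed mode only: the post reports this switch has no effect in headless mode
    return webdriver.Chrome(options=options)
```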

Query window.navigator.webdriver again in the Console tab of the developer tools, and the value is now undefined.

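Note: after enabling developer mode, always check that the value really became undefined. Starting with Chrome/ChromeDriver 79 the trick stops working: navigator.webdriver is true again even with enable-automation excluded (issue 3133 in the release notes below).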

ChromeDriver 79.0.3945.36 (2019-11-18):
Supports Chrome version 79
Resolved issue 2117: Chromedriver locks when an alert()(js) is raised while taking a screenshot [Pri-2]
Resolved issue 2435: Chrome driver reports platform and platformName as XP on Win10 machine [Pri-2]
Resolved issue 2487: “Element is not clickable” when using headless [Pri-]
Resolved issue 3005: WPT test in element_clear “test_not_editable_inputs[hidden]” does not pass [Pri-3]
Resolved issue 3073: Alerts coming from backend response cause ChromeDriver in W3C mode disconnect from browser – unable to interact with Chrome anymore – java.net.ConnectException: Failed to connect to localhost/0:0:0:0:0:0:0:1:38699 [Pri-2]
Resolved issue 3133: window.navigator.webdriver is undefined when “enable-automation” is excluded in non-headless mode (should be true) [Pri-2]
Resolved issue 3148: ChromeDriver always ignores certificate errors [Pri-2]
Resolved issue 3205: Chrome driver 78 moveToElement action sometimes moves to wrong y coordinate [Pri-1]
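A rough rule of thumb that combines the version note above with the author's observation (next section) that the switch does nothing in headless mode. This hypothetical helper only parses a version string such as 79.0.3945.36; it is a heuristic, and checking the actual value in the browser, as the author advises, remains the real test.

```python
def exclude_switches_hides_webdriver(chromedriver_version, headless=False):
    """Guess whether excluding enable-automation still hides navigator.webdriver."""
    if headless:
        # The post reports the switch has no effect in headless mode
        return False
    major = int(chromedriver_version.split(".")[0])
    # ChromeDriver 79 resolved issue 3133, so the flag is true again from 79 on
    return major < 79
```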

Headless mode

In headless mode, however, this setting has no effect, so here I use mitmproxy to inject JS code instead.

Install mitmproxy (for example with pip install mitmproxy).

https://github.com/luzhisheng/mklearn/tree/master/spider/crawler_learn

modify_response.py

import mitmproxy.http

# Prepended to every JS response: make navigator.webdriver read false instead of true
t0 = 'Object.defineProperties(navigator,{webdriver:{get:() => false}});'


class Tb:
    def response(self, flow: mitmproxy.http.HTTPFlow):
        # Only patch JavaScript files; '.js' in the URL also covers um.js
        if '.js' in flow.request.url:
            flow.response.text = t0 + flow.response.text
            print('Injection succeeded:', flow.request.url)


addons = [
    Tb()
]
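A variant of the addon, offered as a sketch rather than the author's code. It matches on the URL path suffix or a JavaScript Content-Type instead of any URL containing ".js" (which also matches URLs like /page.json), and it skips bodies that already start with the snippet. should_inject, inject and InjectWebdriverPatch are hypothetical names; flow.request.url, flow.response.headers and flow.response.text are regular mitmproxy flow attributes.

```python
from urllib.parse import urlsplit

SNIPPET = "Object.defineProperties(navigator,{webdriver:{get:() => false}});"


def should_inject(url, content_type=""):
    """True for responses that look like JavaScript files."""
    path = urlsplit(url).path
    return path.endswith(".js") or "javascript" in content_type.lower()


def inject(body):
    """Prepend the snippet once; leave already-patched bodies unchanged."""
    if body.startswith(SNIPPET):
        return body
    return SNIPPET + body


class InjectWebdriverPatch:
    def response(self, flow):
        # flow is a mitmproxy.http.HTTPFlow; its headers are case-insensitive
        content_type = flow.response.headers.get("content-type", "")
        if should_inject(flow.request.url, content_type):
            flow.response.text = inject(flow.response.text)


addons = [InjectWebdriverPatch()]
```

Because the patch rides on external script files, it only takes effect on pages that load at least one .js file through the proxy.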

Start mitmdump:

mitmdump -p 7777 -s modify_response.py
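To start the proxy from the scraper itself instead of a separate terminal, a minimal sketch using the standard subprocess module. It assumes mitmdump is on PATH; the helper names are hypothetical.

```python
import shutil
import subprocess


def mitmdump_command(port=7777, script="modify_response.py"):
    """Argument list equivalent to: mitmdump -p 7777 -s modify_response.py"""
    return ["mitmdump", "-p", str(port), "-s", script]


def start_mitmdump(port=7777, script="modify_response.py"):
    """Run mitmdump in the background; call .terminate() on the result when done."""
    if shutil.which("mitmdump") is None:
        raise RuntimeError("mitmdump not found on PATH; install mitmproxy first")
    return subprocess.Popen(mitmdump_command(port, script))
```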

Point Selenium at the proxy:

self.chromeOptions.add_argument("--proxy-server=http://127.0.0.1:7777")
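self.chromeOptions belongs to the author's class, which the post doesn't show. A minimal sketch of a comparable headless setup, assuming Selenium's ChromeOptions.add_argument. Since mitmproxy re-signs HTTPS traffic with its own CA, and ChromeDriver 79 stopped ignoring certificate errors by default (issue 3148 above), the sketch adds Chrome's --ignore-certificate-errors switch; trusting mitmproxy's CA certificate is the alternative. Function names are hypothetical.

```python
PROXY = "http://127.0.0.1:7777"  # the port mitmdump was started with


def headless_proxy_arguments(proxy=PROXY):
    """Chrome command-line switches for headless mode behind mitmproxy."""
    return [
        "--headless",
        f"--proxy-server={proxy}",
        # Accept mitmproxy's re-signed certificates on HTTPS pages
        "--ignore-certificate-errors",
    ]


def apply_arguments(options, arguments):
    """Add each switch to a ChromeOptions-like object."""
    for argument in arguments:
        options.add_argument(argument)
    return options
```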

Run the script test.py.
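test.py itself isn't included in the post. A hypothetical stand-in that checks whether the injection worked: it loads a page through the proxy and reads navigator.webdriver. The target URL must be a page that loads external .js files, otherwise the addon has nothing to patch. False means the injected getter ran; True means the patch did not apply.

```python
PROXY = "http://127.0.0.1:7777"
CHECK = "return window.navigator.webdriver"


def check_page(driver, url):
    """Load url and return navigator.webdriver (None means undefined)."""
    driver.get(url)
    return driver.execute_script(CHECK)


def main(url):
    # Imported lazily: needs selenium, chromedriver and a running mitmdump
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument(f"--proxy-server={PROXY}")
    options.add_argument("--ignore-certificate-errors")
    driver = webdriver.Chrome(options=options)
    try:
        print("navigator.webdriver =", check_page(driver, url))
    finally:
        driver.quit()
```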
