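Scrapy Simulated Login Code - Industry News - 肥雀云 (南京肥雀信息技术有限公司)

Introduction

This article introduces code for simulating a login with Scrapy. It covers usage examples, practical tips, the key points to know, and caveats, and may be a useful reference for anyone who needs it.

Two approaches are covered:

(1) Sending cookies with the request

(2) Sending a POST request to obtain cookies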

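Sending cookies with the request

Some sites, against good practice, give their cookies a very long expiry time. If we can crawl all the data we want before the cookies expire, we can simply attach the cookie information to each request to simulate a logged-in user.

The following sample code simulates a GitHub login this way: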

    # -*- coding: utf-8 -*-
    import scrapy
    import re


    class TmallLoginSpider(scrapy.Spider):
        name = 'github_login3'
        allowed_domains = ['github.com']
        start_urls = ['https://github.com/']

        def start_requests(self):  # send cookies with the request
            # Cookie header copied from a logged-in browser session
            cookies = (
                '_ga=GA1.2.363045452.1554860671; tz=Asia%2FShanghai; '
                '_octo=GH1.1.1405577398.1554860677; '
                '_device_id=ee3ff12512668a1f9dc6fb33e388ea20; '
                'ignored_unsupported_browser_notice=false; has_recent_activity=1; '
                'user_session=5oxrsfszcor1ijfcgrxxyeaxd8hcmzeugh70-xhwljqkt62q; '
                '__Host-user_session_same_site=5oxrsfszcor1ijfcgrxxyeaxd8hcmzeugh70-xhwljqkt62q; '
                'logged_in=yes; dotcom_user=pengjunlee; _gat=1'
            )
            # Split each pair on the first '=' only, so values containing '=' stay intact
            cookies = {i.split('=', 1)[0]: i.split('=', 1)[1] for i in cookies.split('; ')}
            yield scrapy.Request(self.start_urls[0], cookies=cookies)

        def parse(self, response):  # verify whether the request succeeded
            print(re.findall('Learn Git and GitHub without any code!', response.body.decode()))
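The cookie string copied from the browser has to be turned into a dict before it can be passed to scrapy.Request. The following is a minimal standard-library sketch of that conversion, assuming the string uses the usual `name=value; name=value` layout of a Cookie header. The helper name `cookie_header_to_dict` and the `token` cookie in the sample are made up for illustration and do not come from the original article.

```python
def cookie_header_to_dict(header: str) -> dict:
    """Convert a browser 'Cookie' header string into a name -> value dict."""
    cookies = {}
    for pair in header.split(';'):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments such as a trailing ';'
        # partition() splits on the first '=' only, so a value that
        # itself contains '=' (e.g. base64 padding) is kept whole.
        name, _, value = pair.partition('=')
        cookies[name] = value
    return cookies


if __name__ == '__main__':
    # Hypothetical sample; 'token' is an invented cookie name.
    sample = 'logged_in=yes; dotcom_user=pengjunlee; token=abc=='
    print(cookie_header_to_dict(sample))
```

The resulting dict can be handed to the `cookies` argument of scrapy.Request in `start_requests`.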

After running the spider, part of the console log is shown in the screenshot below:

[Screenshot: Scrapy simulated login code]

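Simulating a login by sending a POST request

Scrapy also provides two ways to obtain cookies by sending a POST request.

scrapy.FormRequest()

To simulate a login by sending a POST request with scrapy.FormRequest(), you have to work out the URL of the login request yourself and construct the request data that the login requires.

Sample code that simulates a GitHub login with scrapy.FormRequest():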

    # -*- coding: utf-8 -*-
    import scrapy
    import re


    class GithubLoginSpider(scrapy.Spider):
        name = 'github_login'
        allowed_domains = ['github.com']
        start_urls = ['https://github.com/login']

        def parse(self, response):  # send a POST request to obtain cookies
            # Hidden fields that the login form submits along with the credentials
            authenticity_token = response.xpath('//input[@name="authenticity_token"]/@value').extract_first()
            utf8 = response.xpath('//input[@name="utf8"]/@value').extract_first()
            commit = response.xpath('//input[@name="commit"]/@value').extract_first()
            form_data = {
                'login': 'pengjunlee@163.com',
                'password': '123456',
                'webauthn-support': 'supported',
                'authenticity_token': authenticity_token,
                'utf8': utf8,
                'commit': commit}
            # Drop fields that were not found on the page (extract_first() returns None for them)
            form_data = {k: v for k, v in form_data.items() if v is not None}
            yield scrapy.FormRequest("https://github.com/session", formdata=form_data, callback=self.after_login)

        def after_login(self, response):  # verify whether the request succeeded
            print(re.findall('Learn Git and GitHub without any code!', response.body.decode()))
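Finding the login URL and building the request data by hand means reading the login form: its action attribute gives the URL to post to, and its input elements give the fields the server expects. The sketch below illustrates that idea with Python's standard library only, so it can be tried without Scrapy. It is not part of Scrapy's API, and `LoginFormParser`, `build_login_request`, and the sample HTML are hypothetical names and data chosen for illustration, not GitHub's real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the action URL and named <input> values of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and not self._done:
            self._in_form = True
            self.action = attrs.get('action', '')
        elif tag == 'input' and self._in_form and attrs.get('name'):
            # Inputs without a value attribute are stored as empty strings.
            self.fields[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form' and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms on the page


def build_login_request(page_url, html, login, password):
    """Return (absolute post URL, form data) for the first form in html."""
    parser = LoginFormParser()
    parser.feed(html)
    form_data = dict(parser.fields)
    # Overwrite the credential fields; the field names are assumptions.
    form_data['login'] = login
    form_data['password'] = password
    return urljoin(page_url, parser.action or ''), form_data


if __name__ == '__main__':
    sample_html = (
        '<form action="/session" method="post">'
        '<input type="hidden" name="authenticity_token" value="tok123">'
        '<input type="text" name="login">'
        '<input type="password" name="password">'
        '<input type="submit" name="commit" value="Sign in">'
        '</form>'
    )
    print(build_login_request('https://github.com/login', sample_html,
                              'user@example.com', 'secret'))
```

In a spider, the returned URL and dict correspond to the first argument and the `formdata` argument of scrapy.FormRequest().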

As the console log shows, after requesting https://github.com/session Scrapy automatically followed the redirect to the GitHub home page.

[Screenshot: Scrapy simulated login code]

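scrapy.FormRequest.from_response()

scrapy.FormRequest.from_response() is simpler and more convenient to use than scrapy.FormRequest(). We usually only need to supply the user's own information (account and password). scrapy.FormRequest.from_response() reads the form in the response, pre-fills the remaining form fields for us, and submits the form by simulating a click.

Sample code that simulates a GitHub login with scrapy.FormRequest.from_response():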

    # -*- coding: utf-8 -*-
    import scrapy
    import re


    class GithubLogin2Spider(scrapy.Spider):
        name = 'github_login2'
        allowed_domains = ['github.com']
        start_urls = ['https://github.com/login']

        def parse(self, response):  # send a POST request to obtain cookies
            form_data = {
                'login': 'pengjunlee@163.com',
                'password': '123456'
            }
            # The rest of this listing is missing from the source.