Scraping paginated text reviews of a Mafengwo attraction with Python

This article uses Chrome, Python 3.7, the requests library, and VSCode to scrape the text reviews of Yellow Crane Tower (黄鹤楼) on Mafengwo (http://www.mafengwo.cn/poi/5426285.html).

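First, copy a passage from one of the reviews, open the page source, and press Ctrl + F to search for it. The search finds nothing, which means the review content is not part of the http://www.mafengwo.cn/poi/5426285.html page itself and must be loaded by a separate request.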
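The manual Ctrl + F check above can also be done in code. The following is only a minimal sketch: `contains_text` and `fetch_html` are hypothetical helper names, not part of the original script, and the short User-Agent string and the 10-second timeout are assumptions.

```python
def contains_text(html: str, snippet: str) -> bool:
    """Return True if the snippet occurs verbatim in the HTML source."""
    return snippet in html


def fetch_html(url: str) -> str:
    """Download the static HTML of a page (hypothetical helper)."""
    # Imported here so contains_text can be used without the dependency.
    import requests

    headers = {"User-Agent": "Mozilla/5.0"}  # assumed minimal header
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text


# Example usage (needs network access):
# html = fetch_html("http://www.mafengwo.cn/poi/5426285.html")
# print(contains_text(html, "a sentence copied from a review"))
```

If the check returns False for text that is visible in the browser, the content is most likely loaded by a separate request, which can then be looked up in the Network panel of Chrome's developer tools.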

    import re
    import time
    import requests

    # URL that serves the review content; what follows "?" is the
    # query string required by the GET request
    comment_url = 'http://pagelet.mafengwo.cn/poi/pagelet/poiCommentListApi?'

    # Request headers
    requests_headers = {
        'Referer': 'http://www.mafengwo.cn/poi/5426285.html',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
    }

    for num in range(1, 6):
        # Testing showed that the single "params" parameter is enough to fetch the content
        requests_data = {
            'params': '{"poi_id":"5426285","page":"%d","just_comment":1}' % num
        }
        response = requests.get(url=comment_url, headers=requests_headers, params=requests_data)
        if 200 == response.status_code:
            # Fetch the page and decode it
            page = response.content.decode('unicode-escape', 'ignore').encode('utf-8', 'ignore').decode('utf-8')
            page = page.replace('\\/', '/')  # convert \/ to /
            # List of dates (the patterns below depend on the page's HTML)
            date_pattern = r'<a class="btn-comment _j_comment" title="添加评论">评论</a>.*?\n.*?<span class="time">(.*?)</span>'
            date_list = re.compile(date_pattern).findall(page)
            # List of star ratings
            star_pattern = r'<span class="s-star s-star(\d)"></span>'
            star_list = re.compile(star_pattern).findall(page)
            # List of review texts
            comment_pattern = r'<p class="rev-txt">([\s\S]*?)</p>'
            comment_list = re.compile(comment_pattern).findall(page)
            # Use a separate index so the page counter num is not overwritten
            for i in range(0, len(date_list)):
                # Date
                date = date_list[i]
                # Star rating
                star = star_list[i]
                # Review text, with some tags and entities stripped
                comment = comment_list[i]
                comment = str(comment).replace('&nbsp;', '')
                comment = comment.replace('<br>', '')
                comment = comment.replace('<br/>', '')
                comment = comment.replace('<br />', '')
                print(date + "\t" + star + "\t" + comment)
        else:
            print("Request failed")
        time.sleep(1)  # pause briefly between pages
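Two steps of the script can be tried offline: building the `params` value and extracting fields with regular expressions. The sketch below is an illustration under stated assumptions, not the author's code: `build_params`, `clean_comment`, `extract_reviews`, and `SAMPLE_HTML` are hypothetical names, and the sample HTML is made up to mimic the class names targeted by the script's patterns. Unlike the script, it builds the JSON string with `json.dumps` instead of `%` formatting and replaces `<br>` tags with a space rather than deleting them.

```python
import html
import json
import re

STAR_PATTERN = re.compile(r'<span class="s-star s-star(\d)"></span>')
COMMENT_PATTERN = re.compile(r'<p class="rev-txt">([\s\S]*?)</p>')

# Made-up sample that imitates the markup the script's patterns target.
SAMPLE_HTML = (
    '<span class="s-star s-star5"></span>\n'
    '<p class="rev-txt">Great view&nbsp;of the river<br />Worth a visit</p>\n'
    '<span class="s-star s-star3"></span>\n'
    '<p class="rev-txt">Crowded on weekends<br>Go early</p>\n'
)


def build_params(poi_id: str, page: int) -> dict:
    """Build the query parameters for one page of reviews."""
    payload = {"poi_id": poi_id, "page": str(page), "just_comment": 1}
    # Compact separators give the same string the script builds by hand.
    return {"params": json.dumps(payload, separators=(",", ":"))}


def clean_comment(raw: str) -> str:
    """Strip <br> tags, decode HTML entities, and collapse whitespace."""
    text = re.sub(r"<br\s*/?>", " ", raw)
    text = html.unescape(text)  # turns &nbsp; into a non-breaking space
    return " ".join(text.split())  # split() also treats that as whitespace


def extract_reviews(page: str) -> list:
    """Return (star, comment) pairs found in the HTML fragment."""
    stars = STAR_PATTERN.findall(page)
    comments = COMMENT_PATTERN.findall(page)
    # zip stops at the shorter list, so a missing field cannot raise IndexError
    return [(star, clean_comment(c)) for star, c in zip(stars, comments)]


for star, comment in extract_reviews(SAMPLE_HTML):
    print(star + "\t" + comment)
```

Pairing the lists with `zip` avoids indexing one list by the length of another, which raises an `IndexError` if a pattern matches fewer times than expected.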

Result

[Screenshot of the script's output]

That is all for this article. I hope it helps with your learning, and thank you for your support.