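How to scrape web page content with Python and read it aloud

Introduction

How do you scrape web page content with Python and have it read aloud? This article walks through a working answer to that question, in the hope that it gives anyone facing the same problem a simpler way to solve it.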

First, here is the code for the scraping module, BDWM.py:

    # -*- coding: utf-8 -*-
    # Python 2 (urllib2, HTMLParser, print statement)
    import urllib2
    import HTMLParser


    class MyParser(HTMLParser.HTMLParser):
        def __init__(self):
            HTMLParser.HTMLParser.__init__(self)
            self.nowtag = ''
            self.count = 0        # number of ranked entries seen so far
            self.flag = False     # True while inside one of the first ten entries
            self.isLink = False   # True once the most recent start tag was <a>
            self.count2 = 0       # text chunks seen since the entry started
            self.dict = {}        # board name -> thread title
            self.temp = ''        # board name of the current entry

        def handle_starttag(self, tag, attrs):
            if tag == 'span':
                for key, value in attrs:
                    if key == 'class' and ('Rank1AmongHisBoard' in value):
                        self.count += 1
                        if self.count < 11:
                            self.flag = True
            if tag == 'a':
                self.isLink = True
            else:
                self.isLink = False

        def handle_data(self, data):
            if self.flag and self.isLink:
                self.count2 += 1
                if self.count2 == 1:      # first chunk: the board name
                    self.temp = data
                if self.count2 == 3:      # third chunk: the thread title
                    self.flag = False
                    self.count2 = 0
                    self.dict[self.temp] = data


    res = urllib2.urlopen('https://www.bdwm.net/bbs/main0.php')
    my = MyParser()
    my.feed(res.read().decode("gbk"))   # the page is GBK-encoded
    result = ''
    sep = u'版'   # renamed from "str" so the built-in type is not shadowed
    for i in my.dict:
        result += i + sep + my.dict[i] + '\n'
    print result

Press F5 to run it. The scraped result looks like this:

>>> ======================= RESTART =======================
>>>
化学与分子工程学院版不喜欢做实验怎么办
三角地版烈士旅正在对对研究生会实施最高军事占领的
十六周年站庆版★★毕业季|未名BBS历年纪念品特卖会★★
遗迹保卫版母校两日游,想借个饭卡
别问我是谁版遇到性骚扰,打电话跟男朋友倾诉……
美食天地版请问北大附近哪里有好吃的饺子
男孩子版被戴绿帽,万念俱灰!
鹊桥版医生mm征GG(#征男友#代征)
谈情说爱版#感觉身边都是嘴上急着脱光但心里不急的人#
北京大学研究生会版农园一层和自称“常代会”的占座女吵起来了(转载)(转载)

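As you can see, we successfully scraped the board names and thread titles of the Top Ten list on 未名BBS (the Weiming forum). Each line is a board name, the character "版" ("board"), and then the thread title.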
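BDWM.py keeps its results in a dict, and a Python 2 dict does not preserve insertion order, so the printed lines are not guaranteed to follow the ranking. The sketch below is one possible Python 3 variant that collects (board, title) pairs in a list instead. It is not the author's code: the class name `TopTenParser`, the helper `format_entries`, and above all the `SAMPLE_HTML` markup are assumptions made for illustration, since the article does not show the real page layout.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real layout of the forum page is not shown
# in the article, so this sample only illustrates the parsing idea.
SAMPLE_HTML = """
<ul>
  <li><span class="Rank1AmongHisBoard">1</span>
      <a href="/board/a">BoardA</a> <a href="/post/1">First title</a></li>
  <li><span class="Rank1AmongHisBoard">2</span>
      <a href="/board/b">BoardB</a> <a href="/post/2">Second title</a></li>
  <li><span class="other">x</span> <a href="/misc">Ignored link</a></li>
</ul>
"""


class TopTenParser(HTMLParser):
    """Collect (board, title) pairs in page order (illustrative sketch)."""

    def __init__(self, limit=10):
        super().__init__()
        self.limit = limit
        self.entries = []     # (board, title) tuples, in the order seen
        self._pending = 0     # link texts still expected for this entry
        self._in_link = False
        self._texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = dict(attrs).get("class") or ""
            if "Rank1AmongHisBoard" in classes and len(self.entries) < self.limit:
                self._pending = 2      # assumed: board link, then title link
                self._texts = []
        elif tag == "a" and self._pending:
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_link and self._pending and text:
            self._texts.append(text)
            self._pending -= 1
            if self._pending == 0:
                self.entries.append(tuple(self._texts))


def format_entries(entries, sep="版"):
    """Join each pair as board + sep + title, one entry per line."""
    return "".join(board + sep + title + "\n" for board, title in entries)


parser = TopTenParser()
parser.feed(SAMPLE_HTML)
print(format_entries(parser.entries))
```

Unlike the original listing, this version resets the link flag on the closing `</a>` tag, so whitespace between two links is never counted as data. Against the live page the class name check would stay the same, but the assumption of exactly two links per entry would need to be verified.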

Next comes the text-to-speech module, which is also the entry point of the whole program:

    # -*- coding: utf-8 -*-
    '''
    Author        : Peizhong Ju
    Latest Update : 2016/4/21
    Function      : Use Baidu Voice API to speak
    '''
    import urllib, urllib2
    import json
    import ConfigParser
    import BDWM

    config = ConfigParser.ConfigParser()
    config.readfp(open('config.ini'))
    TOKEN = config.get('Baidu', 'token')
    local = config.get('dir', 'mp3')   # path where the audio file is saved
    words = ''


    def GetVoice():
        # Download the synthesized speech for "words" into the local file.
        text = urllib.quote(words)
        url = ('http://tsn.baidu.com/text2audio?tex=' + text +
               '&cuid=b888e32e868c&lan=zh&ctp=1&tok=' + TOKEN)
        rep = urllib.urlretrieve(url, local)
        CheckError()


    def GetAccessToken():
        # Exchange client_id / client_secret for a fresh access token.
        client_id = config.get('Baidu', 'client_id')
        client_secret = config.get('Baidu', 'client_secret')
        rep = urllib2.urlopen('https://openapi.baidu.com/oauth/2.0/token'
                              '?grant_type=client_credentials&client_id=' + client_id +
                              '&client_secret=' + client_secret)
        hjson = json.loads(rep.read())
        return hjson['access_token']


    def CheckError():
        # On failure the API returns JSON instead of audio, so inspect the file.
        global TOKEN
        # Binary mode: the file normally holds MP3 data. Read and close it
        # before retrying, because GetVoice() overwrites the same file.
        with open(local, 'rb') as file_object:
            all_the_text = file_object.read()
        if all_the_text[:1] == '{':
            hjson = json.loads(all_the_text)
            # print hjson['err_no']
            if hjson['err_no'] == 502:
                print 'Getting new access token...'
                TOKEN = GetAccessToken()
                config.set('Baidu', 'token', TOKEN)
                # Mode 'w' truncates the file; "r+" could leave stale bytes behind.
                with open('config.ini', 'w') as config_file:
                    config.write(config_file)
                GetVoice()
            else:
                print all_the_text
        else:
            print '[success] ' + words


    try:
        words = BDWM.result.encode('utf8')
        GetVoice()
        # use other software to play it
    except Exception as e:
        # The handler body is not preserved in the source; printing the
        # error is a placeholder.
        print e
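The two ideas the script relies on are building the text2audio request URL and telling a JSON error reply apart from audio data. The Python 3 sketch below isolates both so they can be tried without network access. The endpoint, the query parameters (`tex`, `cuid`, `lan`, `ctp`, `tok`) and the error code 502 come from the listing above; the function names `build_tts_url`, `parse_error` and `needs_new_token`, the placeholder token, and the sample payloads are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode, parse_qs, urlsplit

TTS_ENDPOINT = "http://tsn.baidu.com/text2audio"  # endpoint used in the article


def build_tts_url(words, token, cuid="b888e32e868c"):
    """Return the request URL with every parameter percent-encoded."""
    params = {"tex": words, "cuid": cuid, "lan": "zh", "ctp": 1, "tok": token}
    return TTS_ENDPOINT + "?" + urlencode(params)


def parse_error(payload):
    """Return the decoded JSON error dict, or None if payload is not JSON.

    The script treats a response starting with "{" as an error report
    and anything else as audio; this follows the same rule.
    """
    if not payload.startswith(b"{"):
        return None
    try:
        return json.loads(payload.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and malformed JSON
        return None


def needs_new_token(payload):
    """True when the reply is the error the script answers by re-authenticating."""
    err = parse_error(payload)
    return err is not None and err.get("err_no") == 502


# Hypothetical inputs: a placeholder token and made-up response bodies.
url = build_tts_url("你好 世界", "TOKEN-PLACEHOLDER")
query = parse_qs(urlsplit(url).query)
print(query["tex"], query["lan"])
print(needs_new_token(b'{"err_no": 502, "err_msg": "sample"}'))
print(needs_new_token(b"ID3\x03\x00"))
```

Using `urlencode` encodes the token and the other parameters as well as the text, whereas the script only quotes `tex`. Checking with `startswith` also handles an empty response, where indexing the first character would raise an error.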