How to Traverse the Document Tree and Work with Tags Using the Python Scraping Library BeautifulSoup

Introduction

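This article explains how to use the Python scraping library BeautifulSoup to traverse a document tree and work with its tags. The steps are described in detail and should serve as a useful reference.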

What is Python

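Python is a cross-platform, object-oriented scripting language. It is interpreted (source code is compiled to bytecode before it runs) and supports interactive use. It was originally designed for writing automation scripts; as new versions added features, it came to be widely used for both standalone programs and large projects.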

Example: traversing the document tree and working with tags using BeautifulSoup

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>

    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>

    <p class="story">...</p>
    """

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html_doc, 'lxml')

I. Child nodes

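A tag may contain multiple strings or other tags; all of these are the tag's child nodes. BeautifulSoup provides many attributes for navigating and working with child nodes.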

1. Getting a tag by its name

    print(soup.head)
    # <head><title>The Dormouse's story</title></head>
    print(soup.title)
    # <title>The Dormouse's story</title>
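Name-based access can also be chained to walk down the tree. The following is a minimal sketch under a few assumptions: the markup is a shortened, hypothetical version of html_doc, and the built-in html.parser is used instead of lxml only so the snippet has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, shortened from the article's html_doc.
html = """
<html><head><title>The Dormouse's story</title></head>
<body><p class="title"><b>The Dormouse's story</b></p></body></html>
"""

# Assumption: html.parser stands in for lxml to avoid an extra install.
soup = BeautifulSoup(html, "html.parser")

title_name = soup.title.name       # the tag's name, "title"
title_text = soup.title.string     # the single string inside <title>
bold_text = soup.body.p.b.string   # chained access: body -> p -> b
missing = soup.table               # no <table> in the document

print(title_name, title_text, bold_text, missing)
```

When no tag with the requested name exists, attribute access returns None rather than raising, so it is worth checking the result before using it.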

Accessing a tag by name returns only the first tag with that name. To get every tag of a given name, use the find_all method.

    soup.find_all('a')
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    #  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
    #  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
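To make the difference between the two lookups concrete, here is a small sketch that compares soup.a with soup.find_all('a'). The markup is a trimmed, hypothetical copy of the links paragraph, and html.parser is an assumption made to keep the snippet dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the three links from the story paragraph.
html = """
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
</p>
"""

# Assumption: html.parser stands in for lxml.
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a              # only the first <a> tag
all_links = soup.find_all("a")   # every <a> tag, in document order

# Attributes are read with dictionary-style indexing on each tag.
link_ids = [a["id"] for a in all_links]
link_texts = [a.get_text() for a in all_links]

print(first_link["id"])
print(link_ids)
print(link_texts)
```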

2. The .contents attribute: returns a tag's child nodes as a list

    head_tag = soup.head
    head_tag.contents
    # [<title>The Dormouse's story</title>]

    title_tag = head_tag.contents[0]
    title_tag
    # <title>The Dormouse's story</title>

    title_tag.contents
    # ["The Dormouse's story"]
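A tag's .contents list can mix strings and tags. The sketch below uses a hypothetical one-line paragraph (not part of html_doc) to show how the two kinds of child can be told apart; html.parser is again an assumption.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup mixing text and nested tags.
html = "<p>Hello, <b>bold</b> and <i>italic</i>!</p>"

# Assumption: html.parser stands in for lxml.
soup = BeautifulSoup(html, "html.parser")

children = soup.p.contents   # a plain Python list of direct children

# Each child is either a Tag or a NavigableString.
kinds = [type(child).__name__ for child in children]
tag_names = [child.name for child in children if isinstance(child, Tag)]

print(len(children))
print(kinds)
print(tag_names)
```

Because .contents is an ordinary list, indexing, slicing, and len() all work on it directly.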

3. The .children attribute: iterate over a tag's child nodes

    for child in title_tag.children:
        print(child)
    # The Dormouse's story
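Unlike .contents, .children yields the child nodes one at a time instead of returning a list. A minimal sketch, assuming a hypothetical list markup and html.parser in place of lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no whitespace between items,
# so every direct child of <ul> is an <li> tag.
html = "<ul><li>one</li><li>two</li><li>three</li></ul>"

# Assumption: html.parser stands in for lxml.
soup = BeautifulSoup(html, "html.parser")
ul = soup.ul

# Iterate over the direct children and collect their text.
items = [li.get_text() for li in ul.children]

# .children is an iterator, not a list; materialising it gives
# the same nodes as .contents.
is_list = isinstance(ul.children, list)
same_nodes = list(ul.children) == ul.contents

print(items)
print(is_list, same_nodes)
```

Both attributes cover direct children only; nested tags further down the tree are not visited.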