Methods for Searching the Document Tree in a Python Crawler

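Introduction

This article explains in detail how to search the document tree when writing a Python crawler. The editor finds these methods quite practical and shares them here for reference; hopefully you will get something out of reading it.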

Searching the document tree

1. find_all(name, attrs, recursive, text, **kwargs)

1) The name parameter

The name parameter finds every tag whose name equals name. String objects (text nodes) in the tree are ignored automatically.
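The point about strings being ignored can be checked with a small sketch. The one-line HTML snippet and the use of the built-in "html.parser" (chosen only so the sketch does not need lxml) are assumptions made for illustration and are not part of the original article.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the text node also contains the letter "b",
# but only the <b> tag should be returned.
html = "<p>b is a letter: <b>b</b></p>"
soup = BeautifulSoup(html, "html.parser")

# The name filter is compared against tag names, never against text nodes.
matches = soup.find_all("b")

print(matches)                              # [<b>b</b>]
print([type(m).__name__ for m in matches])  # ['Tag']
```

Only the tag is returned, even though the surrounding text also contains the letter "b".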

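A. Passing a string

The simplest filter is a string. Pass a string argument to a search method and Beautiful Soup finds everything that matches the string exactly, returning the results as a list.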

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.find_all("b"))
print(soup.find_all("a"))

Output

[<b>The Dormouse's story</b>]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

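B. Passing a regular expression

If you pass a compiled regular expression as the argument, Beautiful Soup filters tag names against it using the pattern's search() method (some older documentation describes this as match()). A pattern such as ^b therefore selects every tag whose name starts with b.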

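#!/usr/bin/python3
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import re

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# Print the name of every tag whose name starts with "b"
for tag in soup.find_all(re.compile("^b")):
    print(tag.name)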

Output

body
b
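To see why only body and b are selected, the same pattern can be applied to the bare tag names with the standard re module. This is a rough illustration rather than Beautiful Soup's internal code: tag_names is a hand-written list of the tag names in the sample document, assumed for this sketch. For a pattern anchored with ^, search() and match() select the same names.

```python
import re

# Hand-written list (an assumption for this sketch) of the distinct tag
# names in the sample document, in document order.
tag_names = ["html", "head", "title", "body", "p", "b", "a"]

pattern = re.compile("^b")

# Keep the names the pattern matches; "^" anchors the match at the start,
# so "body" and "b" qualify but "html" and "title" do not.
matched = [name for name in tag_names if pattern.search(name)]

print(matched)  # ['body', 'b']
```

Dropping the ^ anchor would change the selection, because any name containing a "b" would then qualify.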

C. Passing a list