BeautifulSoup: Parsing HTML

Last updated: 2022-04-02 02:16:43

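## BeautifulSoup: Parsing HTML

> Documentation: [Chinese documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html)

Code:

```
import re

import bs4

# Sample document, as used in the Beautiful Soup documentation
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

soup = bs4.BeautifulSoup(html_doc, "html.parser")

print(soup.a)            # the first <a> tag (the Elsie link)
print(soup.a.string)     # the text content of that <a> tag
print(soup.a['href'])    # the value of its href attribute

print(soup.find(id='link3'))               # first tag with id="link3"
print(soup.find('a', class_='sister'))     # class is a Python keyword, hence class_
print(soup.find_all('a', class_='sister')) # every <a> tag with class "sister"

# attrs must be a dict ({'class': 'story'}), not a set ({'class', 'story'})
print(soup.find('p', {'class': 'story'}).get_text())

# Attribute values can also be matched with a regular expression
print(soup.find_all("a", href=re.compile(r'^http://example.com')))
# This document has no <input> tags, so this prints an empty list
print(soup.find_all("input", type=re.compile('text')))
```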
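
The calls above print whole tags. In practice the usual goal is to pull plain values out of them. The following is a minimal sketch of that, not part of the original notes: `SAMPLE_HTML`, `extract_links`, and `find_text_by_id` are hypothetical names chosen for illustration, and the markup is a shortened version of the sample above. It also shows that `find()` returns `None` when nothing matches, so the result should be checked before use.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the snippet above.
SAMPLE_HTML = """
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
"""


def extract_links(html, href_pattern=r"^http://example\.com"):
    """Return (text, href) pairs for every <a> whose href matches href_pattern."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", href=re.compile(href_pattern))
    return [(a.get_text(strip=True), a["href"]) for a in links]


def find_text_by_id(html, element_id):
    """Return the text of the element with the given id, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=element_id)  # find() returns None when nothing matches
    return tag.get_text(strip=True) if tag is not None else None


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(text, href)
    print(find_text_by_id(SAMPLE_HTML, "link3"))
    print(find_text_by_id(SAMPLE_HTML, "link9"))
```

Returning plain tuples and strings rather than `Tag` objects keeps the parsing step separate from whatever consumes the data.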