Today I was reading the book "Python数据采集" and typed in some of its code as I went along. Here it is:

from urllib.request import urlopen
from bs4 import BeautifulSoup


html = urlopen("http://www.pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read())
print(bsObj.h1)

The book doesn't mention it, but running this prints a warning:

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/jiangxiaohan/Desktop/PythonDemo/beautifulSoupDemo/beautifulSoupTest1.py
/Users/jiangxiaohan/Library/Python/3.6/lib/python/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 6 of the file /Users/jiangxiaohan/Desktop/PythonDemo/beautifulSoupDemo/beautifulSoupTest1.py. To get rid of this warning, change code that looks like this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

  markup_type=markup_type))
<h1>An Interesting Title</h1>

Process finished with exit code 0

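The warning says that no parser was specified for BeautifulSoup, so the library fell back to the best HTML parser available on this system, which here is html.parser. That is usually not a problem, but on another system or in a different virtual environment it may pick a different parser, and the parsed result may differ. To get rid of the warning, name the parser explicitly: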

BeautifulSoup(html.read(), "html.parser")

With that change the warning goes away.
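For reference, here is a minimal sketch of the whole script with the fix applied. The helper name `first_heading` is my own and not from the book; it wraps the parsing step so it can also be tried on an HTML string without touching the network.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def first_heading(markup):
    """Return the text of the first <h1> in markup, or None if there is none.

    Hypothetical helper for illustration; not part of the book's code.
    """
    # Naming the parser explicitly avoids the "No parser was explicitly
    # specified" warning and keeps the choice of parser the same everywhere.
    soup = BeautifulSoup(markup, "html.parser")
    h1 = soup.h1
    return h1.get_text() if h1 is not None else None


if __name__ == "__main__":
    # Same page as in the original example; this part needs network access.
    with urlopen("http://www.pythonscraping.com/pages/page1.html") as response:
        print(first_heading(response.read()))
```

Passing the parser name as the second argument is the only change that matters here; everything else is just packaging.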

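Author: 蒋昉霖
Link: https://www.jianshu.com/p/e09403f4cd6a
Source: 简书 (Jianshu)
Copyright belongs to the author. For commercial reprints, contact the author for permission; for non-commercial reprints, please credit the source.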
