A simple crawler script that fetches the titles and links on the first page of Bing results for a given keyword.

[python]

import urllib.error
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup as BS

baseUrl = 'http://cn.bing.com/search?'
word = '鹿晗 吴亦凡 张艺兴'
print(word)

# urlencode percent-encodes the str as UTF-8 by default,
# so the keyword does not need a manual encode() first
data = urllib.parse.urlencode({'q': word})
url = baseUrl + data
print(url)

html = None
try:
    html = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print(e.code)
except urllib.error.URLError as e:
    # e.g. DNS failure or refused connection
    print(e.reason)

# Parse only if the request succeeded; otherwise html is still None
if html is not None:
    soup = BS(html, "html.parser")

    # Result-count text (elements with class "sb_count")
    for c in soup.find_all(class_="sb_count"):
        print(c.get_text())

    # Each <h2> holds a result title and wraps the result link
    for t in soup.find_all("h2"):
        print(t.get_text())
        # Read href from the parsed <a> tag instead of running a regex over str(t)
        a = t.find("a", href=True)
        if a:
            print(a["href"])

[/python]
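
The two steps of the script, building the query URL and pulling titles and links out of the `<h2>` elements, can be tried without network access. The sketch below is one possible way to do that, not the author's method: it swaps BeautifulSoup for the standard-library `html.parser`, and the names `build_search_url`, `H2LinkParser`, `extract_results` and the `SAMPLE` markup are hypothetical, made up for illustration. Real Bing markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE_URL = "http://cn.bing.com/search?"  # same base URL as the script above


def build_search_url(keyword):
    """Return the search URL; urlencode percent-encodes non-ASCII text as UTF-8."""
    return BASE_URL + urlencode({"q": keyword})


class H2LinkParser(HTMLParser):
    """Collect (title, href) for each <h2>, using the first link inside it."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_h2 = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            # Start a fresh record for this heading
            self._in_h2, self._href, self._text = True, None, []
        elif tag == "a" and self._in_h2 and self._href is None:
            # Keep only the first href found inside the heading
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._in_h2:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.results.append(("".join(self._text).strip(), self._href))
            self._in_h2 = False


def extract_results(html):
    """Parse an HTML string and return a list of (title, href) tuples."""
    parser = H2LinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup standing in for a results page
SAMPLE = (
    '<ol><li><h2><a href="https://example.com/a">First <strong>hit</strong></a></h2></li>'
    "<li><h2>No link</h2></li></ol>"
)
```

Calling `extract_results(SAMPLE)` returns one `(title, href)` tuple per heading, with `None` as the href when a heading contains no link. Keeping the URL building and the parsing in separate functions means each can be checked against a fixed input before any request is sent.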

————————————————
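Copyright notice: This is an original article by CSDN blogger 「机器喵喵喵喵」, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/zhaohe1995/java/article/details/52564474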
