A small web crawler demo

pf_miles, published 2019-07-31 10:25 / 807 views

Summary: This demo fetches a web page with requests, parses it with BeautifulSoup (bs4), and writes the scraped data to an Excel sheet with openpyxl. It also changes the default encoding of standard output so the scraped text prints correctly.

Saving the scraped data to an Excel spreadsheet

First, analyze the structure of the page you want to scrape:
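As a rough illustration of what that analysis is looking for, here is a minimal sketch that runs the same two class lookups used in demo.py against a small HTML string. The markup in SAMPLE_HTML and the helper name extract are hypothetical; only the class names "book-item-name" and "author" come from the script, and the real page may be structured differently. The sketch uses Python's built-in "html.parser" so that it does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page may differ. Only the two class names
# ("book-item-name" and "author") are taken from demo.py.
SAMPLE_HTML = """
<ul>
  <li class="book-item">
    <a class="book-item-name" href="/book/1">First Book</a>
    <a class="author" href="/user/1">Author One</a>
  </li>
  <li class="book-item">
    <a class="book-item-name" href="/book/2">Second Book</a>
    <a class="author" href="/user/2">Author Two</a>
  </li>
</ul>
"""


def extract(html):
    """Return (titles, authors) found in the given HTML string."""
    # "html.parser" ships with Python; demo.py uses "lxml" instead.
    soup = BeautifulSoup(html, "html.parser")
    titles = [a.get_text(strip=True)
              for a in soup.find_all("a", class_="book-item-name")]
    authors = [a.get_text(strip=True)
               for a in soup.find_all("a", class_="author")]
    return titles, authors


titles, authors = extract(SAMPLE_HTML)
print(titles)
print(authors)
```

If the lookups return empty lists on the real page, the class names have probably changed and should be checked again in the browser's developer tools.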

demo.py:

import requests    # requests is an HTTP library
import re
from openpyxl import workbook  # used to write Excel sheets
from openpyxl import load_workbook  # used to read Excel sheets
from bs4 import BeautifulSoup as bs   # bs: parses the document and exposes the data to scrape
import os
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding="utf8") # change the default encoding of standard output


# Fetch the page with requests.get() and parse it with bs4:
def getData(src):

    # requests.get(src) returns a Response object; .content gives the body as bytes.
    # As on the front end, requests come as get, post, etc.  http://www.cnblogs.com/ranxf/p/7808537.html
    html = requests.get(src).content
    soup = bs(html,"lxml")   # the lxml parser turns the bytes into a tree that mirrors the page's HTML structure
    print(soup)

    global ws
    Name = []
    Introductions = []
    introductions = soup.find_all("a",class_="book-item-name")
    nameList = soup.find_all("a",class_="author")
    print (nameList)
    for name in nameList:
        print (name.text)
        Name.append(name.text)
    for introduction in introductions:
        Introductions.append(introduction.text)
    # zip stops at the shorter list, so a length mismatch cannot raise IndexError
    for name, introduction in zip(Name, Introductions):
        ws.append([name, introduction])

if __name__ == "__main__":
    #   Test: read an existing Excel file
    #     wb = load_workbook("test.xlsx") # load an existing Excel file
    #     a_sheet = wb["Sheet1"] # get the sheet object by name
    #     for row in a_sheet.rows: # iterate over the rows
    #         for cell in row: # every cell in the row
    #             print(cell.value, end=" ")

    #  Create an Excel workbook and write the data
    wb = workbook.Workbook()  # create the workbook object
    ws = wb.active  # get the active worksheet
    # Write the header row to the sheet, as a list!
    ws.append(["Author", "Book title"])
    src = "http://www.lrts.me/book/category/3058"
    getData(src)
    wb.save("qinshi.xlsx")  # once all the data is stored, save as filename.xlsx
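getData pairs the two scraped lists by position, so the rows only line up if both lookups return the same number of elements. One possible way to keep every item when the counts differ is to pad the shorter list. The sketch below assumes that behaviour is wanted; build_rows and its fill parameter are hypothetical names, not part of the original script.

```python
from itertools import zip_longest


def build_rows(names, introductions, fill=""):
    """Pair two lists by position, padding the shorter one with `fill`."""
    return [list(pair)
            for pair in zip_longest(names, introductions, fillvalue=fill)]


# Two names but only one introduction: the second row gets an empty cell.
rows = build_rows(["Author One", "Author Two"], ["First Book"])
print(rows)
```

Each inner list can be passed to ws.append() as one spreadsheet row.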

Run: python demo.py

Result: a qinshi.xlsx file is generated.
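To check the result without opening Excel, the file can be read back with openpyxl. The following is a minimal sketch: it writes a small stand-in workbook to a temporary directory so that it runs on its own, and the sample rows and the helper name read_rows are assumptions for illustration. To inspect the real output, call read_rows("qinshi.xlsx") instead.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def read_rows(path):
    """Return every row of the active sheet as a list of cell values."""
    wb = load_workbook(path)
    ws = wb.active
    return [list(row) for row in ws.iter_rows(values_only=True)]


# Stand-in for qinshi.xlsx, with made-up sample data.
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
wb = Workbook()
wb.active.append(["Author", "Book title"])
wb.active.append(["Author One", "First Book"])
wb.save(path)

rows = read_rows(path)
print(rows)
```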
