Beginner's Scrapy Crawler, Part 4

cnio, published 2019-07-30 15:15 / 3100 reads

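Summary: No real data processing this time; the scraped items are saved straight to a file, which ends up as one very long, eye-straining chunk. The next post covers saving to a database.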

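The previous post did not cover how to start the crawler. Scrapy is launched from the command line,
so here we use Scrapy's cmdline module to start it from a script.
Name the file point.py: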

# Import execute from cmdline; it runs a scrapy command as if typed in the shell
from scrapy.cmdline import execute
# The arguments are ["scrapy", "crawl", <spider name>]
execute(["scrapy", "crawl", "AiquerSpider"])

Put this file in the project root directory.
As shown in the figure:

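If you have followed the steps in my previous posts, you can use this script to test your spider (uncomment the relevant parts of the code first). You will see a lot of mysterious blue links scroll by. Waaaah!!!!! My right hand is burning!!!!!!!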
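If you want to tweak a setting for a single test run, the launcher can build its argument list in a small helper. This is only a sketch: build_crawl_args and run are hypothetical names, not part of Scrapy. It relies on the `-s NAME=VALUE` option of the scrapy command line, and LOG_LEVEL is just used as an example setting.

```python
def build_crawl_args(spider_name, settings=None):
    """Build the argument list passed to scrapy.cmdline.execute (hypothetical helper)."""
    args = ["scrapy", "crawl", spider_name]
    # Each override becomes "-s NAME=VALUE", the same option the scrapy CLI accepts.
    for name, value in (settings or {}).items():
        args += ["-s", f"{name}={value}"]
    return args


def run(spider_name, settings=None):
    # Imported here so build_crawl_args can be tried out without Scrapy installed.
    from scrapy.cmdline import execute
    execute(build_crawl_args(spider_name, settings))
```

In point.py you would then end the file with something like run("AiquerSpider", {"LOG_LEVEL": "INFO"}).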

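Now let's save the data! I have been writing project requirements these past few days to the point of collapse, so I am skipping any real data processing and just pasting the code.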

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import json


class AiquerPipeline(object):
    def __init__(self):
        # Open the output file
        self.file = open("data.json", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Serialize the item's data as one JSON object per line
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        # Write it to the file
        self.file.write(line)
        # Return the item so any later pipeline still receives it
        return item

    # Called when the spider is opened.
    def open_spider(self, spider):
        pass

    # Called when the spider is closed.
    def close_spider(self, spider):
        # Close the file so buffered data is flushed to disk
        self.file.close()
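One thing worth knowing about the output: because the pipeline writes one JSON object per line, data.json is not a single JSON document, and json.load() on the whole file is likely to fail once there is more than one item. A minimal stdlib-only sketch of writing and reading that layout follows; write_items and read_items are hypothetical helper names used only for illustration.

```python
import json


def write_items(path, items):
    # Mirrors what the pipeline does: one JSON object per line,
    # with ensure_ascii=False so non-ASCII text stays readable in the file.
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")


def read_items(path):
    # Parse each non-empty line on its own instead of loading the whole file.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

After a crawl, read_items("data.json") should give back a list of dicts, one per scraped item.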

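Before this will run, the pipeline has to be registered. Go back to settings.py, find "Configure item pipelines", and uncomment the lines below it. We have no special requirements here, so the priority can stay as it is.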

# Configure item pipelines
# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    "AiQuer.pipelines.AiquerPipeline": 300,
}

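AiQuer.pipelines.AiquerPipeline is the class being registered. The 300 on the right is that pipeline's priority, conventionally a value in the 0-1000 range; pipelines with lower values run first.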
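To make the "lower runs first" rule concrete, here is a small sketch that sorts a settings dict by priority. It only imitates the ordering and is not Scrapy's own code; CleanupPipeline and DatabasePipeline are hypothetical classes that do not exist in this project.

```python
# Only AiquerPipeline comes from this post; the other two entries are hypothetical.
ITEM_PIPELINES = {
    "AiQuer.pipelines.AiquerPipeline": 300,
    "AiQuer.pipelines.CleanupPipeline": 100,   # hypothetical
    "AiQuer.pipelines.DatabasePipeline": 800,  # hypothetical
}


def run_order(pipelines):
    # Sort by the priority value: the lowest number processes each item first.
    return [path for path, _priority in sorted(pipelines.items(), key=lambda kv: kv[1])]


order = run_order(ITEM_PIPELINES)
```

With these numbers, an item would pass through the cleanup step, then the JSON writer, then the database step.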
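I did not do any real data processing here; everything is saved straight to a file as JSON, which comes out as one very long, eye-straining chunk.
The next post covers how to save the data to a database.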