Abstract: LogParser is an open-source tool for parsing Scrapy log files. Install it via pip or from source, make sure Scrapyd is installed and running on the current host, then start LogParser with the logparser command and visit the stats endpoints (assuming Scrapyd runs on port 6800) to get the log analysis details of a crawl job. Combined with ScrapydWeb it provides crawler progress visualization, and it can also be used directly in code.
Open source on GitHub
my8100/logparser
Install via pip:
pip install logparser
Or via git:
git clone https://github.com/my8100/logparser.git
cd logparser
python setup.py install

Usage
Run as a service
First make sure that Scrapyd has been installed and started on the current host.
Start LogParser with the logparser command.
Visit http://127.0.0.1:6800/logs/stats.json (assuming Scrapyd runs on port 6800).
Visit http://127.0.0.1:6800/logs/projectname/spidername/jobid.json to get the log analysis details of a particular crawl job, as in the sketch below.
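A minimal sketch of consuming these endpoints programmatically, assuming Scrapyd on 127.0.0.1:6800; the project, spider, and job names below are hypothetical placeholders:

import json
from urllib.request import urlopen

def fetch_job_stats(project, spider, job, host="127.0.0.1", port=6800):
    # Build the per-job stats URL served once LogParser is running alongside Scrapyd.
    url = "http://%s:%d/logs/%s/%s/%s.json" % (host, port, project, spider, job)
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Hypothetical names; substitute your own project, spider, and job id.
stats = fetch_job_stats("projectname", "spidername", "jobid")
print(stats.get("pages"), stats.get("items"), stats.get("finish_reason"))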
To visualize crawler progress, use it together with ScrapydWeb; see my8100/scrapydweb.

Use in code
In [1]: from logparser import parse

In [2]: log = """2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)
   ...: 2018-10-23 18:29:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
   ...: {"downloader/exception_count": 3,
   ...: "downloader/exception_type_count/twisted.internet.error.TCPTimedOutError": 3,
   ...: "downloader/request_bytes": 1336,
   ...: "downloader/request_count": 7,
   ...: "downloader/request_method_count/GET": 7,
   ...: "downloader/response_bytes": 1669,
   ...: "downloader/response_count": 4,
   ...: "downloader/response_status_count/200": 2,
   ...: "downloader/response_status_count/302": 1,
   ...: "downloader/response_status_count/404": 1,
   ...: "dupefilter/filtered": 1,
   ...: "finish_reason": "finished",
   ...: "finish_time": datetime.datetime(2018, 10, 23, 10, 29, 41, 174719),
   ...: "httperror/response_ignored_count": 1,
   ...: "httperror/response_ignored_status_count/404": 1,
   ...: "item_scraped_count": 2,
   ...: "log_count/CRITICAL": 5,
   ...: "log_count/DEBUG": 14,
   ...: "log_count/ERROR": 5,
   ...: "log_count/INFO": 75,
   ...: "log_count/WARNING": 3,
   ...: "offsite/domains": 1,
   ...: "offsite/filtered": 1,
   ...: "request_depth_max": 1,
   ...: "response_received_count": 3,
   ...: "retry/count": 2,
   ...: "retry/max_reached": 1,
   ...: "retry/reason_count/twisted.internet.error.TCPTimedOutError": 2,
   ...: "scheduler/dequeued": 7,
   ...: "scheduler/dequeued/memory": 7,
   ...: "scheduler/enqueued": 7,
   ...: "scheduler/enqueued/memory": 7,
   ...: "start_time": datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)}
   ...: 2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)"""

In [3]: d = parse(log, headlines=1, taillines=1)

In [4]: d
Out[4]:
OrderedDict([("head", "2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)"),
             ("tail", "2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)"),
             ("first_log_time", "2018-10-23 18:28:34"),
             ("latest_log_time", "2018-10-23 18:29:42"),
             ("elapsed", "0:01:08"),
             ("first_log_timestamp", 1540290514),
             ("latest_log_timestamp", 1540290582),
             ("datas", []),
             ("pages", 3),
             ("items", 2),
             ("latest_matches", {"resuming_crawl": "", "latest_offsite": "", "latest_duplicate": "",
                                 "latest_crawl": "", "latest_scrape": "", "latest_item": "", "latest_stat": ""}),
             ("latest_crawl_timestamp", 0),
             ("latest_scrape_timestamp", 0),
             ("log_categories", {"critical_logs": {"count": 5, "details": []},
                                 "error_logs": {"count": 5, "details": []},
                                 "warning_logs": {"count": 3, "details": []},
                                 "redirect_logs": {"count": 1, "details": []},
                                 "retry_logs": {"count": 2, "details": []},
                                 "ignore_logs": {"count": 1, "details": []}}),
             ("shutdown_reason", "N/A"),
             ("finish_reason", "finished"),
             ("last_update_timestamp", 1547559048),
             ("last_update_time", "2019-01-15 21:30:48")])

In [5]: d["elapsed"]
Out[5]: "0:01:08"

In [6]: d["pages"]
Out[6]: 3

In [7]: d["items"]
Out[7]: 2

In [8]: d["finish_reason"]
Out[8]: "finished"
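The same parse call also works on a log file read from disk. A minimal sketch, assuming a hypothetical log path; headlines and taillines control how many leading and trailing log lines are kept in the result:

from logparser import parse

# Hypothetical path to a Scrapyd job log; adjust to your own logs directory.
with open("logs/projectname/spidername/jobid.log", encoding="utf-8") as f:
    d = parse(f.read(), headlines=5, taillines=5)

print(d["elapsed"], d["pages"], d["items"], d["finish_reason"])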