Preparation
Start the local MySQL server:
mysql -u root -p
Install pymysql:
pip install pymysql

Create the database and table:
CREATE DATABASE crawls;
SHOW DATABASES;
USE crawls;
CREATE TABLE IF NOT EXISTS baiduNews(
    id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    ranking VARCHAR(30),
    title VARCHAR(60),
    datetime TIMESTAMP,
    hot VARCHAR(30));
SHOW TABLES;

Connect to the database with pymysql:
db = pymysql.connect(host="localhost", port=3306, user="root",
                     passwd="123456", db="crawls", charset="utf8")
cursor = db.cursor()
cursor.execute(sql_query)
db.commit()
Working with MySQL from Python is fairly straightforward; with a little database background you can pick it up right away. Just don't forget to call commit() at the end, otherwise the data only sits in the uncommitted transaction and never reaches the database.
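To see why the commit matters, here is a minimal sketch of the same pattern using the standard-library sqlite3 module instead of pymysql, only so it runs without a MySQL server; the table and file names are made up for the demo. Closing a connection without committing discards the pending insert:

```python
import os
import sqlite3
import tempfile

# A throwaway file-backed database so two connections can see the same data.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Create the table and commit the schema.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE news(title TEXT)")
conn.commit()

# Insert a row but "forget" to commit before closing.
conn.execute("INSERT INTO news VALUES ('lost headline')")
conn.close()  # uncommitted changes are rolled back here

# A fresh connection sees no rows.
conn = sqlite3.connect(db_path)
print(conn.execute("SELECT COUNT(*) FROM news").fetchone()[0])  # 0

# Insert again, this time committing before closing.
conn.execute("INSERT INTO news VALUES ('saved headline')")
conn.commit()
conn.close()

conn = sqlite3.connect(db_path)
print(conn.execute("SELECT COUNT(*) FROM news").fetchone()[0])  # 1
conn.close()
```

pymysql behaves the same way: every INSERT stays invisible to other connections until db.commit() runs.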
Complete example

Crawl the hottest news titles on Baidu and store them in the database (make sure the local MySQL server is running):
""" Get the hottest news title on baidu page, then save these data into mysql """ import datetime import pymysql from pyquery import PyQuery as pq import requests from requests.exceptions import ConnectionError URL = "https://www.baidu.com/s?wd=%E7%83%AD%E7%82%B9" headers = { "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36", "Upgrade-Insecure-Requests": "1" } def get_html(url): try: response = requests.get(url, headers=headers) if response.status_code == 200: return response.text return None except ConnectionError as e: print(e.args) return None def parse_html(html): doc = pq(html) trs = doc(".FYB_RD table.c-table tr").items() for tr in trs: index = tr("td:nth-child(1) span.c-index").text() title = tr("td:nth-child(1) span a").text() hot = tr("td:nth-child(2)").text().strip(""") yield { "index":index, "title":title, "hot":hot } def save_to_mysql(items): try: db = pymysql.connect(host="localhost", port=3306, user="root", passwd="123456", db="crawls", charset="utf8") cursor = db.cursor() cursor.execute("use crawls;") cursor.execute("CREATE TABLE IF NOT EXISTS baiduNews(" "id INT PRIMARY KEY NOT NULL AUTO_INCREMENT," "ranking VARCHAR(30)," "title VARCHAR(60)," "datetime TIMESTAMP," "hot VARCHAR(30));") try: for item in items: print(item) now = datetime.datetime.now() now = now.strftime("%Y-%m-%d %H:%M:%S") sql_query = "INSERT INTO baiduNews(ranking, title, datetime, hot) VALUES ("%s", "%s", "%s", "%s")" % ( item["index"], item["title"], now, item["hot"]) cursor.execute(sql_query) print("Save into mysql") db.commit() except pymysql.MySQLError as e: db.rollback() print(e.args) return except pymysql.MySQLError as e: print(e.args) return def check_mysql(): try: db = pymysql.connect(host="localhost", port=3306, user="root", passwd="123456", db="crawls", charset="utf8") cursor = db.cursor() cursor.execute("use crawls;") sql_query = "SELECT * FROM baiduNews" results = cursor.execute(sql_query) 
print(results) except pymysql.MySQLError as e: print(e.args) def main(): html = get_html(URL) items = parse_html(html) save_to_mysql(items) #check_mysql() if __name__ == "__main__": main()
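One detail worth flagging in crawler code like this: building an INSERT by string formatting breaks as soon as a scraped title contains a quote character, and it opens the door to SQL injection. Driver placeholders avoid both problems. A minimal sketch with the stdlib sqlite3 module (which uses ? placeholders; pymysql uses %s with the same execute(query, args) call); the sample item is invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baiduNews(ranking TEXT, title TEXT, hot TEXT)")

# A title with embedded quotes would corrupt a string-formatted statement,
# but placeholder parameters pass it through safely.
item = {"index": "1", "title": 'Breaking: "quotes" everywhere', "hot": "4590000"}
conn.execute(
    "INSERT INTO baiduNews(ranking, title, hot) VALUES (?, ?, ?)",
    (item["index"], item["title"], item["hot"]),
)
conn.commit()

row = conn.execute("SELECT title FROM baiduNews").fetchone()
print(row[0])  # Breaking: "quotes" everywhere
```

With pymysql the equivalent call is cursor.execute("INSERT ... VALUES (%s, %s, %s)", args); the driver quotes each value itself.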
Copyright belongs to the author; please do not repost without permission. Original article: http://specialneedsforspecialkids.com/yun/43127.html