Preparation
Start the local MySQL server:
mysql -u root -p
Install pymysql:
pip install pymysql

Create the database and table:
CREATE DATABASE crawls;
SHOW DATABASES;
USE crawls;
CREATE TABLE IF NOT EXISTS baiduNews(
    id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    ranking VARCHAR(30),
    title VARCHAR(60),
    datetime TIMESTAMP,
    hot VARCHAR(30));
SHOW TABLES;

Connect to the database with pymysql:
db = pymysql.connect(host="localhost", port=3306, user="root",
                     passwd="123456", db="crawls", charset="utf8")
cursor = db.cursor()
cursor.execute(sql_query)
db.commit()
Working with MySQL from Python is fairly straightforward; with a little database background you can pick it up right away. Just don't forget to call commit() at the end, otherwise the data only sits in the uncommitted transaction and never reaches the database.
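To see why the commit matters, here is a minimal sketch of the same pattern using the standard-library sqlite3 module instead of pymysql, only so it runs without a MySQL server; the table and file names are made up for the demo. Closing a connection without committing discards the pending insert:

```python
import os
import sqlite3
import tempfile

# A throwaway file-backed database so two connections can see the same data.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Create the table and commit the schema.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE news(title TEXT)")
conn.commit()

# Insert a row but "forget" to commit before closing.
conn.execute("INSERT INTO news VALUES ('lost headline')")
conn.close()  # uncommitted changes are rolled back here

# A fresh connection sees no rows.
conn = sqlite3.connect(db_path)
print(conn.execute("SELECT COUNT(*) FROM news").fetchone()[0])  # 0

# Insert again, this time committing before closing.
conn.execute("INSERT INTO news VALUES ('saved headline')")
conn.commit()
conn.close()

conn = sqlite3.connect(db_path)
print(conn.execute("SELECT COUNT(*) FROM news").fetchone()[0])  # 1
conn.close()
```

pymysql behaves the same way: every INSERT stays invisible to other connections until db.commit() runs.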
Complete example

Crawl the hottest news titles on Baidu and store them in the database (make sure the local MySQL server is running):
""" Get the hottest news title on baidu page, then save these data into mysql """ import datetime import pymysql from pyquery import PyQuery as pq import requests from requests.exceptions import ConnectionError URL = "https://www.baidu.com/s?wd=%E7%83%AD%E7%82%B9" headers = { "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36", "Upgrade-Insecure-Requests": "1" } def get_html(url): try: response = requests.get(url, headers=headers) if response.status_code == 200: return response.text return None except ConnectionError as e: print(e.args) return None def parse_html(html): doc = pq(html) trs = doc(".FYB_RD table.c-table tr").items() for tr in trs: index = tr("td:nth-child(1) span.c-index").text() title = tr("td:nth-child(1) span a").text() hot = tr("td:nth-child(2)").text().strip(""") yield { "index":index, "title":title, "hot":hot } def save_to_mysql(items): try: db = pymysql.connect(host="localhost", port=3306, user="root", passwd="123456", db="crawls", charset="utf8") cursor = db.cursor() cursor.execute("use crawls;") cursor.execute("CREATE TABLE IF NOT EXISTS baiduNews(" "id INT PRIMARY KEY NOT NULL AUTO_INCREMENT," "ranking VARCHAR(30)," "title VARCHAR(60)," "datetime TIMESTAMP," "hot VARCHAR(30));") try: for item in items: print(item) now = datetime.datetime.now() now = now.strftime("%Y-%m-%d %H:%M:%S") sql_query = "INSERT INTO baiduNews(ranking, title, datetime, hot) VALUES ("%s", "%s", "%s", "%s")" % ( item["index"], item["title"], now, item["hot"]) cursor.execute(sql_query) print("Save into mysql") db.commit() except pymysql.MySQLError as e: db.rollback() print(e.args) return except pymysql.MySQLError as e: print(e.args) return def check_mysql(): try: db = pymysql.connect(host="localhost", port=3306, user="root", passwd="123456", db="crawls", charset="utf8") cursor = db.cursor() cursor.execute("use crawls;") sql_query = "SELECT * FROM baiduNews" results = cursor.execute(sql_query) 
print(results) except pymysql.MySQLError as e: print(e.args) def main(): html = get_html(URL) items = parse_html(html) save_to_mysql(items) #check_mysql() if __name__ == "__main__": main()
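One detail worth flagging in crawler code like this: building an INSERT by string formatting breaks as soon as a scraped title contains a quote character, and it opens the door to SQL injection. Driver placeholders avoid both problems. A minimal sketch with the stdlib sqlite3 module (which uses ? placeholders; pymysql uses %s with the same execute(query, args) call); the sample item is invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baiduNews(ranking TEXT, title TEXT, hot TEXT)")

# A title with embedded quotes would corrupt a string-formatted statement,
# but placeholder parameters pass it through safely.
item = {"index": "1", "title": 'Breaking: "quotes" everywhere', "hot": "4590000"}
conn.execute(
    "INSERT INTO baiduNews(ranking, title, hot) VALUES (?, ?, ?)",
    (item["index"], item["title"], item["hot"]),
)
conn.commit()

row = conn.execute("SELECT title FROM baiduNews").fetchone()
print(row[0])  # Breaking: "quotes" everywhere
```

With pymysql the equivalent call is cursor.execute("INSERT ... VALUES (%s, %s, %s)", args); the driver quotes each value itself.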
Copyright belongs to the author; please do not repost without permission. Original article: http://specialneedsforspecialkids.com/yun/43127.html