Scraping the cnblogs homepage and sending it to WeChat on a schedule

Posted by aaron on 2019-07-30 16:17 / 835 views

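Summary: My girlfriend asked for a way to keep up with what's happening in tech, so I wrote this crawler. Every day at a set time it scrapes the cnblogs homepage and sends the post list to WeChat.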

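Environment:

Python 3.4

Third-party libraries:

Requests: sends HTTP requests to the server

BeautifulSoup4: parses the HTML

wxpy: WeChat interface (used here to send messages)

Schedule: job scheduler (runs the task at a fixed time each day)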
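Schedule's `every().day.at("06:00")` takes care of the timing. To show what a daily timer has to work out, here is a minimal standard-library sketch that computes the delay until the next 06:00. `seconds_until` is a hypothetical helper for illustration, not Schedule's internal code.

```python
from datetime import datetime, timedelta


def seconds_until(hour, minute, now):
    """Return the number of seconds from `now` until the next hour:minute.

    If today's target time has already passed (or is exactly now),
    the next occurrence is tomorrow.
    """
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)
    return (target - now).total_seconds()


if __name__ == "__main__":
    # At 05:30 the next 06:00 run is 30 minutes away.
    print(seconds_until(6, 0, datetime(2019, 7, 30, 5, 30)))   # 1800.0
    # At 16:17 today's 06:00 has passed, so the next run is tomorrow.
    print(seconds_until(6, 0, datetime(2019, 7, 30, 16, 17)))  # 49380.0
```

With a helper like this, a single daily job could be run by sleeping for `seconds_until(6, 0, datetime.now())` and then calling it. Schedule is more convenient once there are several jobs, because each `run_pending()` call checks all of them.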

Code:

# -*- coding: utf-8 -*-

import time

import requests
import schedule
from bs4 import BeautifulSoup as bs
from wxpy import Bot


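# Log in to WeChat by scanning the QR code; cache_path=True caches the
# session so you don't have to scan again after every restart
bot = Bot(cache_path=True)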

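# Fetch one page of the homepage post list
def getHtml(pageIndex):
    # Request headers that make the request look like it comes from a browser
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"}
    # PageIndex is the page number to fetch
    payload = {"CategoryType": "SiteHome", "ParentCategoryId": "0", "CategoryId": "808", "PageIndex": pageIndex, "TotalPostCount": "4000"}
    try:
        # timeout so a hung connection cannot block the daily job forever
        r = requests.post("https://www.cnblogs.com/mvc/AggSite/PostList.aspx", data=payload, headers=headers, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException as e:
        # Report the error and return an empty page so the caller can skip it
        # (e.strerror is usually None for HTTP errors, so it is not returned)
        print("Failed to fetch page %s: %s" % (pageIndex, e))
        return ""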
# Send a message to WeChat's File Transfer helper
def sendblogmsg(content):
    # To send to a friend instead, search for them by name:
    # my_friend = bot.friends().search("")[0]
    my_friend = bot.file_helper
    my_friend.send(content)

def job():
    contents = ""
    # i is the current page number (pages 1 and 2)
    for i in range(1, 3):
        html = getHtml(i)
        if not html:
            continue  # skip pages that failed to download
        soup = bs(html, "html.parser")
        blogs = soup.find_all("div", {"class": "post_item_body"})
        for blog in blogs:
            title = blog.find("h3").get_text().strip()
            link = blog.find("a", {"class": "titlelnk"})["href"]
            content = "Title: " + title + "\nLink: " + link + "\n-----------\n"
            contents += content
    # Send once, after all pages have been collected
    if contents:
        sendblogmsg(contents)
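# Run the job at 06:00 every day
schedule.every().day.at("06:00").do(job)
# Check once per second whether a scheduled job is due; this loop never
# exits, so it also keeps the process (and the WeChat bot) alive and a
# bot.join() after it would never be reached
while True:
    schedule.run_pending()
    time.sleep(1)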
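The parsing step in `job()` looks for `div.post_item_body` blocks, takes the `h3` text as the title and the `href` of `a.titlelnk` as the link. The sketch below reproduces that extraction with the standard library's `html.parser` instead of BeautifulSoup, so it runs without third-party packages. `PostListParser`, `format_message` and the sample markup are illustrative assumptions built around the class names the code expects; cnblogs' real markup may differ or may have changed since 2019.

```python
from html.parser import HTMLParser


class PostListParser(HTMLParser):
    """Collect (title, link) pairs from div.post_item_body blocks.

    Mirrors the BeautifulSoup lookups in job(): the title is the text of
    the <h3>, the link is the href of <a class="titlelnk">.
    """

    def __init__(self):
        super().__init__()
        self.posts = []       # finished (title, link) pairs
        self._div_depth = 0   # > 0 while inside a post_item_body div
        self._in_h3 = False
        self._title = ""
        self._link = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            if self._div_depth:
                self._div_depth += 1  # nested div inside a post
            elif "post_item_body" in classes:
                self._div_depth = 1
                self._title, self._link = "", None
        elif self._div_depth:
            if tag == "h3":
                self._in_h3 = True
            elif tag == "a" and "titlelnk" in classes:
                self._link = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "div" and self._div_depth:
            self._div_depth -= 1
            if self._div_depth == 0 and self._link:
                self.posts.append((self._title.strip(), self._link))

    def handle_data(self, data):
        if self._in_h3:
            self._title += data


def format_message(posts):
    """Build the same title / link / separator layout that job() sends."""
    return "".join(
        "Title: " + title + "\nLink: " + link + "\n-----------\n"
        for title, link in posts
    )


SAMPLE = """
<div class="post_item_body">
  <h3><a class="titlelnk" href="https://example.com/p/1">First post</a></h3>
  <p class="post_item_summary">Summary one</p>
</div>
<div class="post_item_body">
  <h3><a class="titlelnk" href="https://example.com/p/2">Second post</a></h3>
  <p class="post_item_summary">Summary two</p>
</div>
"""

if __name__ == "__main__":
    parser = PostListParser()
    parser.feed(SAMPLE)
    print(format_message(parser.posts))
```

If cnblogs renames these classes, both this sketch and the BeautifulSoup version simply find no posts, so it is worth checking for an empty result before sending anything.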

Notes:

Do not use this to attack the site.

Access the site during off-peak hours where possible, keep the request rate low, and don't waste the site's resources.