Automated Python Web Scraper for University Announcements with WeChat Push Notifications
Selecting a Push Notification Service
To receive real-time alerts, a lightweight push service is required. The Xiatusha (xtuis) WeChat Official Account API provides a straightforward method. By following the account and retrieving a personal authorization token, text-based alerts can be dispatched directly to a WeChat client.
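As a minimal sketch of such a push request, the snippet below builds the POST using only the standard library (`urllib`); the token is a placeholder you replace with your own, and the `text`/`desp` form fields carry the alert title and body:

```python
from urllib import parse, request

XTUIS_TOKEN = "YOUR_TOKEN"  # placeholder: replace with the token from your account
PUSH_API = f"https://wx.xtuis.cn/{XTUIS_TOKEN}.send"

def build_wechat_alert(title: str, body: str) -> request.Request:
    """Build a POST request carrying the alert; 'text' is the title, 'desp' the body."""
    data = parse.urlencode({"text": title, "desp": body}).encode("utf-8")
    return request.Request(PUSH_API, data=data, method="POST")

# Sending it is then a single call: request.urlopen(build_wechat_alert(...))
req = build_wechat_alert("Test Alert", "Hello from the scraper")
```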
Configuring the Scraping Interval
To avoid overwhelming the target institution's web server while maintaining reasonable timeliness, a one-hour interval between execution cycles is a sensible compromise: updates arrive promptly, and the scraper never generates disruptive traffic spikes.
Cloud Server Deployment
Continuous 24/7 execution necessitates a reliable hosting environment. Alibaba Cloud offers a free two-month trial for students who complete identity verification, making it an ideal choice for hosting the scraper. A lightweight ECS instance running CentOS is sufficient for this task.
Implementation Process
1. Acquiring the API Token
Follow the Xiatusha WeChat Official Account and copy the assigned token. This token is required to authenticate the push requests.
2. Developing the Scraping Logic
Analyze the HTML structure of the target university's notice board. Inspect the elements containing the announcements—in this case, paragraphs with the newscontent class, which encapsulate the anchor tags for links and span tags for publication dates.
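The structure described above can be exercised on a small fragment before touching the live site. The HTML below is a hypothetical sample mirroring the notice board's markup, not the actual page source:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring the described structure:
# a <p class="newscontent"> wrapping an <a> (link) and a <span> (date)
SAMPLE_HTML = """
<p class="newscontent">
  <a href="../info/1234.htm">Admissions Notice</a>
  <span>[2023-06-01]</span>
</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item = soup.find("p", class_="newscontent")
title = item.find("a").get_text(strip=True)   # -> "Admissions Notice"
date = item.find("span").get_text(strip=True) # -> "[2023-06-01]"
```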
```python
import requests
from bs4 import BeautifulSoup
from datetime import datetime

TARGET_URL = "https://zs.gpnu.edu.cn/bkzn/a2023sefdzsb.htm"
DOMAIN_PREFIX = "https://zs.gpnu.edu.cn/"
PUSH_API = "https://wx.xtuis.cn/YOUR_TOKEN.send"

def check_for_updates():
    http_response = requests.get(TARGET_URL)
    http_response.encoding = 'utf-8'

    if http_response.status_code != 200:
        raise ConnectionError("Failed to fetch the target page")

    parsed_dom = BeautifulSoup(http_response.text, "html.parser")
    news_items = parsed_dom.find_all("p", class_="newscontent")
    current_date_tag = datetime.now().strftime('[%Y-%m-%d]')

    for item in news_items:
        link_element = item.find("a")
        date_element = item.find("span")
        if not link_element or not date_element:
            continue

        # Strip the leading "../" from the relative href before joining
        relative_path = link_element["href"][3:]
        article_title = link_element.get_text(strip=True)
        publish_date = date_element.get_text(strip=True)
        absolute_url = DOMAIN_PREFIX + relative_path

        # Push only announcements published today
        if publish_date == current_date_tag:
            notify_new_announcement(article_title, absolute_url, publish_date)

def notify_new_announcement(title, url, date):
    payload = {
        'text': 'New Announcement Alert!',
        'desp': f'Title: {title}\nURL: {url}\nPublished: {date}'
    }
    requests.post(PUSH_API, data=payload)

if __name__ == "__main__":
    check_for_updates()
```
3. Server Configuration
After provisioning the Alibaba Cloud lightweight server, connect via SSH. To simplify environment management, install the BaoTa panel using the following CentOS command:
```bash
su
yum install -y wget && wget -O install.sh https://download.bt.cn/install/install_6.0.sh && sh install.sh ed8484bec
```
Configure a cron job within the BaoTa panel or the server's crontab to execute the Python script every hour.
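If configuring crontab directly, an hourly entry might look like the following; the script path and log location are illustrative and should match wherever you deployed the file:

```shell
# Run the scraper at the top of every hour; append output to a log
0 * * * * /usr/bin/python3 /root/scraper/check_for_updates.py >> /root/scraper/scraper.log 2>&1
```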