Fading Coder


Automated Python Web Scraper for University Announcements with WeChat Push Notifications


Selecting a Push Notification Service

To receive timely alerts, a lightweight push service is required. The Xiatusha (xtuis) WeChat Official Account API provides a straightforward method. After following the account and retrieving a personal authorization token, you can send text-based alerts directly to a WeChat client.

Configuring the Scraping Interval

To avoid overwhelming the target institution's web server while maintaining reasonable timeliness, a one-hour interval between execution cycles is a sensible default. It keeps updates reasonably prompt without causing disruptive traffic spikes.
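As a rough illustration of the one-hour interval, the sketch below runs a task in a loop with a pause between cycles. It is only an alternative for environments without a scheduler; if the script is triggered by cron, as in the server configuration step of this article, no such loop is needed. The names run_periodically, max_runs, and sleep are assumptions introduced for this example.

```python
import time

ONE_HOUR_SECONDS = 60 * 60  # the interval suggested in the text


def run_periodically(task, interval_seconds=ONE_HOUR_SECONDS, max_runs=None, sleep=time.sleep):
    """Call task() repeatedly, pausing interval_seconds between calls.

    max_runs and sleep are hypothetical parameters added for this sketch:
    max_runs bounds the loop (None means run forever), and sleep can be
    replaced so the loop can be exercised without real delays.
    """
    completed = 0
    while max_runs is None or completed < max_runs:
        try:
            task()
        except Exception as error:
            # Keep the loop alive if a single cycle fails
            print(f"Cycle failed: {error}")
        completed += 1
        # Pause only if another cycle will follow
        if max_runs is None or completed < max_runs:
            sleep(interval_seconds)
    return completed
```

Calling run_periodically(check_for_updates), with the function defined in the implementation section, would run one scraping cycle per hour until the process is stopped.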

Cloud Server Deployment

Continuous 24/7 execution necessitates a reliable hosting environment. Alibaba Cloud Compute offers a free two-month trial for authenticated students, making it an ideal choice for hosting the scraper. A lightweight ECS instance running CentOS is sufficient for this task.

Implementation Process

1. Acquiring the API Token

Register with the Xiatusha WeChat Official Account and copy the assigned token. This token is required to authenticate the push requests.
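Since the token works like a password, one option is to keep it out of the source file. The following is a minimal sketch under that assumption: it reads the token from an environment variable and builds the push URL in the form used in this article. The variable name XTUIS_TOKEN and the helper build_push_url are hypothetical and are not defined by the push service.

```python
import os

# URL pattern used in this article: https://wx.xtuis.cn/<token>.send
PUSH_URL_TEMPLATE = "https://wx.xtuis.cn/{token}.send"


def build_push_url(token=None):
    """Return the push endpoint for a token.

    XTUIS_TOKEN is a hypothetical environment variable name chosen for
    this sketch; it is not something the push service defines.
    """
    token = (token or os.environ.get("XTUIS_TOKEN", "")).strip()
    if not token:
        raise ValueError("No push token provided; set XTUIS_TOKEN or pass token=")
    return PUSH_URL_TEMPLATE.format(token=token)
```

The returned value can be used in place of a hard-coded push URL, so the script can be shared without exposing the token.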

2. Developing the Scraping Logic

Analyze the HTML structure of the target university's notice board and inspect the elements that contain the announcements. In this case they are paragraphs with the newscontent class, each wrapping an anchor tag for the link and a span tag for the publication date.

```python
import requests
from bs4 import BeautifulSoup
from datetime import datetime

TARGET_URL = "https://zs.gpnu.edu.cn/bkzn/a2023sefdzsb.htm"
DOMAIN_PREFIX = "https://zs.gpnu.edu.cn/"
PUSH_API = "https://wx.xtuis.cn/YOUR_TOKEN.send"  # replace YOUR_TOKEN with your own token


def check_for_updates():
    """Fetch the notice board and push every announcement dated today."""
    # A timeout keeps an unattended run from hanging on a stalled connection
    http_response = requests.get(TARGET_URL, timeout=10)
    http_response.encoding = 'utf-8'

    if http_response.status_code != 200:
        raise ConnectionError("Failed to fetch the target page")

    parsed_dom = BeautifulSoup(http_response.text, "html.parser")
    news_items = parsed_dom.find_all("p", class_="newscontent")

    # Today's date, formatted as [YYYY-MM-DD] to match the text in the date span
    current_date_tag = datetime.now().strftime('[%Y-%m-%d]')

    for item in news_items:
        link_element = item.find("a")
        date_element = item.find("span")

        # Skip entries that lack a link or a date
        if not link_element or not date_element:
            continue

        # Drop the first three characters of the href (assumed to be a leading "../")
        relative_path = link_element["href"][3:]
        article_title = link_element.get_text(strip=True)
        publish_date = date_element.get_text(strip=True)
        absolute_url = DOMAIN_PREFIX + relative_path

        if publish_date == current_date_tag:
            notify_new_announcement(article_title, absolute_url, publish_date)


def notify_new_announcement(title, url, date):
    """Send one announcement to WeChat through the push API."""
    payload = {
        'text': 'New Announcement Alert!',
        'desp': f'Title: {title}\nURL: {url}\nPublished: {date}',
    }
    requests.post(PUSH_API, data=payload, timeout=10)


if __name__ == "__main__":
    check_for_updates()
```
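Because the script matches on today's date, an hourly schedule would likely push the same announcement again on every run that day. One possible way to avoid this is to record the links that have already been pushed. The sketch below assumes a small JSON file is an acceptable store; the file name seen_announcements.json and the helpers load_seen and filter_unseen are hypothetical and not part of the original script.

```python
import json
from pathlib import Path

# Hypothetical location for the record of links that were already pushed
SEEN_FILE = Path("seen_announcements.json")


def load_seen(path=SEEN_FILE):
    """Return the set of URLs recorded by earlier runs (empty if none)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def filter_unseen(urls, path=SEEN_FILE):
    """Return the URLs not yet recorded, and record them for future runs."""
    seen = load_seen(path)
    # dict.fromkeys removes duplicates while keeping the original order
    fresh = [url for url in dict.fromkeys(urls) if url not in seen]
    if fresh:
        path.write_text(json.dumps(sorted(seen | set(fresh))), encoding="utf-8")
    return fresh
```

In check_for_updates, the absolute URL of each matching announcement could be passed through filter_unseen before notify_new_announcement is called, so each link is pushed at most once.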

3. Server Configuration

After provisioning the Alibaba Cloud lightweight server, connect via SSH. To simplify environment management, install the BaoTa panel using the following CentOS command:

```bash
su
yum install -y wget && wget -O install.sh https://download.bt.cn/install/install_6.0.sh && sh install.sh ed8484bec
```

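Configure a cron job within the BaoTa panel or the server's crontab to execute the Python script every hour. For example, a crontab entry of the form 0 * * * * python3 /path/to/scraper.py (the script path is a placeholder) runs it at the start of every hour.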
