Automated Python Web Scraper for University Announcements with WeChat Push Notifications
Selecting a Push Notification Service
To receive real-time alerts, a lightweight push service is required. The Xiatusha (xtuis) WeChat Official Account API provides a straightforward method. By following the account and retrieving a personal authorization token, text-based alerts can be dispatched directly to a WeChat client.
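As a minimal sketch of such a push request, the snippet below builds the POST using only the standard library (`urllib`); the token is a placeholder you replace with your own, and the `text`/`desp` form fields carry the alert title and body:

```python
from urllib import parse, request

XTUIS_TOKEN = "YOUR_TOKEN"  # placeholder: replace with the token from your account
PUSH_API = f"https://wx.xtuis.cn/{XTUIS_TOKEN}.send"

def build_wechat_alert(title: str, body: str) -> request.Request:
    """Build a POST request carrying the alert; 'text' is the title, 'desp' the body."""
    data = parse.urlencode({"text": title, "desp": body}).encode("utf-8")
    return request.Request(PUSH_API, data=data, method="POST")

# Sending it is then a single call: request.urlopen(build_wechat_alert(...))
req = build_wechat_alert("Test Alert", "Hello from the scraper")
```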
Configuring the Scraping Interval
To avoid overwhelming the target institution's web server while maintaining reasonable timeliness, a one-hour interval between execution cycles is a sensible compromise: updates arrive promptly, and the scraper never generates disruptive traffic spikes.
Cloud Server Deployment
Continuous 24/7 execution necessitates a reliable hosting environment. Alibaba Cloud offers a free two-month trial for students who complete identity verification, making it an ideal choice for hosting the scraper. A lightweight ECS instance running CentOS is sufficient for this task.
Implementation Process
1. Acquiring the API Token
Follow the Xiatusha WeChat Official Account and copy the assigned token. This token is required to authenticate the push requests.
2. Developing the Scraping Logic
Analyze the HTML structure of the target university's notice board. Inspect the elements containing the announcements—in this case, paragraphs with the newscontent class, which encapsulate the anchor tags for links and span tags for publication dates.
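The structure described above can be exercised on a small fragment before touching the live site. The HTML below is a hypothetical sample mirroring the notice board's markup, not the actual page source:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring the described structure:
# a <p class="newscontent"> wrapping an <a> (link) and a <span> (date)
SAMPLE_HTML = """
<p class="newscontent">
  <a href="../info/1234.htm">Admissions Notice</a>
  <span>[2023-06-01]</span>
</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item = soup.find("p", class_="newscontent")
title = item.find("a").get_text(strip=True)   # -> "Admissions Notice"
date = item.find("span").get_text(strip=True) # -> "[2023-06-01]"
```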
```python
import requests
from bs4 import BeautifulSoup
from datetime import datetime

TARGET_URL = "https://zs.gpnu.edu.cn/bkzn/a2023sefdzsb.htm"
DOMAIN_PREFIX = "https://zs.gpnu.edu.cn/"
PUSH_API = "https://wx.xtuis.cn/YOUR_TOKEN.send"

def check_for_updates():
    http_response = requests.get(TARGET_URL)
    http_response.encoding = 'utf-8'

    if http_response.status_code != 200:
        raise ConnectionError("Failed to fetch the target page")

    parsed_dom = BeautifulSoup(http_response.text, "html.parser")
    news_items = parsed_dom.find_all("p", class_="newscontent")
    current_date_tag = datetime.now().strftime('[%Y-%m-%d]')

    for item in news_items:
        link_element = item.find("a")
        date_element = item.find("span")
        if not link_element or not date_element:
            continue

        # Strip the leading "../" from the relative href before joining
        relative_path = link_element["href"][3:]
        article_title = link_element.get_text(strip=True)
        publish_date = date_element.get_text(strip=True)
        absolute_url = DOMAIN_PREFIX + relative_path

        # Push only announcements published today
        if publish_date == current_date_tag:
            notify_new_announcement(article_title, absolute_url, publish_date)

def notify_new_announcement(title, url, date):
    payload = {
        'text': 'New Announcement Alert!',
        'desp': f'Title: {title}\nURL: {url}\nPublished: {date}'
    }
    requests.post(PUSH_API, data=payload)

if __name__ == "__main__":
    check_for_updates()
```
3. Server Configuration
After provisioning the Alibaba Cloud lightweight server, connect via SSH. To simplify environment management, install the BaoTa panel using the following CentOS command:
```bash
su
yum install -y wget && wget -O install.sh https://download.bt.cn/install/install_6.0.sh && sh install.sh ed8484bec
```
Configure a cron job within the BaoTa panel or the server's crontab to execute the Python script every hour.
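If configuring crontab directly, an hourly entry might look like the following; the script path and log location are illustrative and should match wherever you deployed the file:

```shell
# Run the scraper at the top of every hour; append output to a log
0 * * * * /usr/bin/python3 /root/scraper/check_for_updates.py >> /root/scraper/scraper.log 2>&1
```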