Fading Coder

One Final Commit for the Last Sprint

Scraping NetEase Cloud Music Hot Comments to Generate Word Clouds

Data Collection Building a word cloud requires raw data first. For NetEase Cloud Music, this involves several steps: packet analysis to locate the API endpoint, handling encrypted request parameters, and extracting hot comment content. Packet Analysis Using Chrome DevTools, the comment API endpoint becomes...

Web Scraping Sina Weibo Hot Search List

Core Libraries import json import time from urllib.parse import quote import requests import warnings warnings.filterwarnings('ignore') Fetching Hot Search Data This function retrieves the raw JSON data from Weibo's hot search endpoint. def get_hot_search_feed(): target_url = 'https://weibo.com/ajax...

Troubleshooting Common Development and Data Collection Errors

1. SSL/TLS Connection Failures An SSLError(SSLEOFError(...)) often indicates a protocol violation during the SSL handshake. When scraping foreign websites, this can be caused by proxy settings interfering with the connection. Solution: Check and disable proxy environment variables. Test connectivity...
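The proxy check described in this excerpt can be sketched as follows. This is a minimal sketch, assuming the proxy is configured through the conventional HTTP_PROXY, HTTPS_PROXY, and ALL_PROXY environment variables (in either upper or lower case); the helper name strip_proxy_env is hypothetical and does not come from the original article.

```python
import os

# Conventional proxy variable names (an assumption; compared case-insensitively).
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY")


def strip_proxy_env(env):
    """Return a copy of env without proxy variables, plus the names removed."""
    cleaned = dict(env)
    removed = []
    for name in list(cleaned):
        if name.upper() in PROXY_VARS:
            removed.append(name)
            del cleaned[name]
    return cleaned, sorted(removed)


if __name__ == "__main__":
    # Report which proxy variables are set in the current process.
    _, found = strip_proxy_env(os.environ)
    print("Proxy variables currently set:", found or "none")
```

To disable the proxies for the running script, one option is to call os.environ.pop(name, None) for each reported name before making any requests.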

Five Python Approaches to Complete the First Heibanke Crawler Practice Level

The target practice site is http://www.heibanke.com/lesson/crawler_ex00/, which requires navigating through a sequence of 5-digit numeric values appended to the base URL path until reaching the final challenge page. Below are five Python-based automation methods to complete this level. Method 1: Usi...
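The navigation step common to these methods might look roughly like the sketch below. It assumes each page's text contains exactly one standalone 5-digit number that identifies the next page, and that the number is appended directly to the base URL; the helper name next_url and the sample text in the usage note are hypothetical.

```python
import re

BASE_URL = "http://www.heibanke.com/lesson/crawler_ex00/"


def next_url(page_text):
    """Return the URL for the next step, or None if no 5-digit number is found."""
    # Match exactly five digits that are not part of a longer digit run.
    match = re.search(r"(?<!\d)\d{5}(?!\d)", page_text)
    if match is None:
        return None
    return BASE_URL + match.group()
```

A crawler would fetch BASE_URL, call next_url on the response text, and repeat until it returns None, which under these assumptions signals the final page. For example, next_url("the next number is 41234") yields the base URL followed by 41234.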

Storing Web Scraped Data in Python: TXT, JSON, and CSV Formats

1. TXT File Storage Saving data to plain text files is straightforward, and TXT files are compatible with nearly all platforms. However, a significant drawback is their poor suitability for data retrieval and structured queries. If search functionality and complex data structures are not priorities,...
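As an illustration of the plain-text approach, here is a minimal sketch that writes one scraped record per line and reads the lines back. The function names save_lines and load_lines and the one-record-per-line layout are assumptions made for this example, not the article's own code.

```python
from pathlib import Path


def save_lines(path, records):
    """Write one record per line as UTF-8 text and return the number written."""
    lines = [str(record).strip() for record in records]
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


def load_lines(path):
    """Read the file back as a list of lines without trailing newlines."""
    return Path(path).read_text(encoding="utf-8").splitlines()
```

Reading the data back means scanning the whole file, which reflects the retrieval drawback noted above: there is no index or schema to query against.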

Automated Scraping of WeChat Official Account Articles Using Playwright with Auto-Scrolling

This article demonstrates how to use Playwright with automatic scrolling to scrape all historical article titles and links from a WeChat Official Account. The code is provided for educational purposes only. import re from playwright.sync_api import sync_playwright def scrape_wechat_articles(): with...

Building Your First Web Scraper with Node.js

Introduction Generally speaking, Python is often the preferred choice for web scraping due to its simplicity and ease of use. However, after recently writing several articles about scrapers, I found that using only Python becomes inefficient when dealing with large-scale data extr...

Web Scraping Image Galleries with Python: Overcoming Referer Headers and Anti-Scraping Measures

When implementing a web scraper to download images from a gallery site, the initial attempt to download pictures resulted in corrupted files. Directly accessing the image URLs in a browser worked for previously viewed images but failed for new ones, suggesting a server-side check. Analysis of networ...
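The article title points to the Referer header as the server-side check. A minimal sketch of sending that header with the standard library follows; the helper name build_image_request and the User-Agent value are assumptions for illustration, and the article itself may use a different HTTP client.

```python
import urllib.request


def build_image_request(image_url, page_url):
    """Build a request for image_url that names page_url as the referring page."""
    headers = {
        # The gallery page that links to the image (assumed to be what the server checks).
        "Referer": page_url,
        # A browser-like User-Agent; the exact value is a placeholder.
        "User-Agent": "Mozilla/5.0",
    }
    return urllib.request.Request(image_url, headers=headers)
```

The returned object can be passed to urllib.request.urlopen, and the response bytes written to a file opened in binary mode.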

Automating Blog Article Access and Statistics with Python

This tutorial demonstrates how to programmatically control web browsers using Python to automate the opening of blog articles, collect statistics, and manage browser processes efficiently. Basic Browser Automation The following approach automatically launches your default browser and opens specific we...

Python Web Scraping Basics: The requests Library

1. Installing requests The requests library provides features including URL retrieval, HTTP persistent connections and connection pooling, browser-style SSL verification, authentication, cookie sessions, chunked file uploads, streaming downloads, HTTP(S) proxy support, and connection timeout handlin...