Fading Coder

One Final Commit for the Last Sprint

Web Scraping Fundamentals: Understanding Crawlers and HTTP Protocol

Data Acquisition in the Digital Age: In today's data-driven landscape, information originates from multiple channels. Enterprise-generated user data: Baidu Index, Alibaba Index, Tencent Browsing Index, Weibo Index. Purchased datasets: data marketplaces and exchanges. Government/institutional open data:...

Python Web Scraping: Single-Threaded vs Multi-Threaded Approaches

Overview: Web scraping is a common technique for extracting data from websites. This article demonstrates how to build an image scraper in Python using two different approaches: a sequential single-threaded version and a concurrent multi-threaded version. The code examples illustrate key concepts li...

Building Web Scrapers with Python: Core Concepts and Practical Foundations

In today’s data-driven world, extracting structured information from websites has become a fundamental skill. Whether tracking price fluctuations across e-commerce platforms, monitoring stock trends, or aggregating public datasets, web scraping enables automation where manual effort is impractical....

Web Scraping with Python using BeautifulSoup

Introduction to Web Scraping: Web scraping involves extracting data from websites for analysis and storage. This technique is valuable when working with publicly available web data for research or learning purposes. Required Libraries: Install these packages if not already available: pip install reques...

Web Scraping Baidu Search Results with Python

Method 1: Automated Browser Interaction with Selenium. The first approach involves using Selenium WebDriver to automate browser interactions. This method navigates to the Baidu homepage, locates the search input field, submits a query, and extracts the results. import time from selenium import webdrive...

Building a Multi-threaded Novel Scraper with Python

A straightforward approach to extracting novel content from websites using Python's requests library and lxml for HTML parsing, with multi-threaded download capabilities. Core Configuration stop_flag = False worker_threads = 5 running_state = False thread_lock = threading.Lock() Data Model class Nov...

Automating Lazy-Loaded Image Scraping and Screenshots with Selenium WebDriver

Handling dynamic web pages often requires interacting with JavaScript elements that load content only when visible in the viewport. Standard HTTP requests fail here because the DOM is populated asynchronously. Selenium WebDriver provides a solution by controlling a real browser instance, allowing fo...

Scrapy Framework Fundamentals and Advanced Usage

Project Initialization and Execution To create a new Scrapy project: scrapy startproject project_name cd project_name scrapy genspider spider_name domain.com Run the spider with: scrapy crawl spider_name If dependency errors occur, install compatible versions: pip install Twisted==22.10.0 urllib3==1...

Adapting a Selenium Web Scraper for Windows Environments

To run the web scraping script on a Windows operating system, specific environment configurations are required. Begin by installing the Selenium bindings via the Python package manager. pip install selenium Verify the installation by attempting to import the module in a Python shell. No errors shoul...

Scraping Property Listings from Lianjia Using Python Scrapy

Define a Scrapy Item class to structure the extracted real estate attributes including community identifiers, geographic locations, and transaction URLs. import scrapy class HousingData(scrapy.Item): estate_name = scrapy.Field() listing_link = scrapy.Field() street_address = scrapy.Field() zone_name...