Fading Coder

One Final Commit for the Last Sprint

Automating Lazy-Loaded Image Scraping and Screenshots with Selenium WebDriver

Dynamic web pages often use JavaScript to load content only when it scrolls into the viewport. A plain HTTP request is not enough here: it returns the initial HTML and never runs the scripts that populate the DOM. Selenium WebDriver solves this by driving a real browser instance, so a script can scroll to trigger load events and capture the page state in screenshots.
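To make the limitation concrete, here is a minimal sketch using only the Python standard library. It assumes one common lazy-loading pattern, where the real image URL sits in a `data-src` attribute until a script copies it into `src`. The markup, the `example.com` URLs, and the `ImageAudit` class are hypothetical and do not come from the target gallery.

```python
from html.parser import HTMLParser

# Hypothetical markup: roughly what a plain HTTP fetch of a lazy-loading
# gallery might return before any JavaScript has run.
STATIC_HTML = """
<div class="gallery">
  <img src="https://example.com/photo-1.jpg">
  <img src="placeholder.gif" data-src="https://example.com/photo-2.jpg">
  <img data-src="https://example.com/photo-3.jpg">
</div>
"""


class ImageAudit(HTMLParser):
    """Separate images with a usable src from ones still waiting on a script."""

    def __init__(self):
        super().__init__()
        self.loaded = []    # images whose src already points at the real file
        self.deferred = []  # images whose real URL is parked in data-src

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if "data-src" in attributes:
            self.deferred.append(attributes["data-src"])
        elif attributes.get("src"):
            self.loaded.append(attributes["src"])


audit = ImageAudit()
audit.feed(STATIC_HTML)
print(f"usable now: {len(audit.loaded)}, deferred: {len(audit.deferred)}")
# prints "usable now: 1, deferred: 2"
```

In this sketch, two of the three images are unusable without script execution. Pages that inject the elements entirely from JavaScript would show even less in the static HTML.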

The following implementation navigates to a target URL, repeatedly scrolls to the bottom of the page to reveal hidden content, saves a screenshot after each scroll, and counts the image elements loaded so far. This approach is particularly useful for galleries or infinite-scroll pages, where content appears only in response to simulated user interaction.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

class LazyLoadHandler:
    def __init__(self):
        # Configure headless browser options
        options = Options()
        options.add_argument('--headless')
        options.add_argument('--disable-gpu')
        # Initialize the driver (ensure chromedriver is in PATH)
        self.driver = webdriver.Chrome(options=options)
        self.driver.set_window_size(1920, 1080)
    
    def process_page(self):
        target_url = "https://mm.taobao.com/self/album_photo.htm?spm=719.6642053.0.0.o5BDC0&user_id=687471686&album_id=183809402&album_flag=0"
        
        try:
            self.driver.get(target_url)
            
            # JavaScript snippet to scroll to the bottom
            scroll_script = "window.scrollTo(0, document.body.scrollHeight);"
            
            for iteration in range(50):
                # Execute scroll action
                self.driver.execute_script(scroll_script)
                
                # Allow time for dynamic content to render
                time.sleep(0.2)
                
                # Capture the current state of the viewport; number the file
                # from 1 so it matches the iteration printed below
                filename = f"screenshot_iteration_{iteration + 1}.png"
                self.driver.save_screenshot(filename)
                
                # Locate image containers; this XPath matches only when the class
                # attribute is exactly "mm-photoW-cell-middle"
                elements = self.driver.find_elements(By.XPATH, '//div[@class="mm-photoW-cell-middle"]')
                count = len(elements)
                
                print(f"Iteration {iteration + 1}: Detected {count} loaded images.")
                
        except Exception as e:
            print(f"An error occurred during processing: {e}")
        finally:
            self.driver.quit()

if __name__ == '__main__':
    handler = LazyLoadHandler()
    handler.process_page()
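
The fixed 0.2-second pause in the loop above may be too short on a slow connection and wasteful on a fast one. As a hedged alternative, the sketch below polls until the element count grows or a timeout expires. The name `wait_for_growth` and its parameters are hypothetical, not part of Selenium. Selenium's own `WebDriverWait(driver, timeout).until(...)` covers the same ground when a condition can be expressed against the driver.

```python
import time


def wait_for_growth(get_count, previous, timeout=5.0, poll=0.1):
    """Poll get_count() until it exceeds `previous` or `timeout` seconds pass.

    get_count is any zero-argument callable returning an integer, for example
    a lambda wrapping driver.find_elements(...). Returns the last observed
    count, which equals `previous` or less if nothing new appeared in time.
    """
    deadline = time.monotonic() + timeout
    count = get_count()
    while count <= previous and time.monotonic() < deadline:
        time.sleep(poll)
        count = get_count()
    return count
```

Inside `process_page`, this could stand in for `time.sleep(0.2)`: pass a lambda that returns `len(self.driver.find_elements(By.XPATH, '//div[@class="mm-photoW-cell-middle"]'))` together with the count from the previous iteration.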

When run, the LazyLoadHandler script starts a headless browser session and navigates to the specified gallery. Each pass through the loop scrolls to the bottom of the page, which triggers the lazy-loading mechanism. The console output should show the count of matched image elements rising as more content is added to the DOM. The screenshots are saved in sequence and give a visual record of the loading process at each step.
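
A fixed 50 iterations will keep scrolling after the page has run out of content, or stop early on a very long page. One possible refinement, sketched here under stated assumptions, is to stop once `document.body.scrollHeight` no longer changes between scrolls. The function name `scroll_until_stable` and its defaults are hypothetical. It only needs an object with an `execute_script` method, which a Selenium driver provides.

```python
import time

SCROLL_TO_BOTTOM = "window.scrollTo(0, document.body.scrollHeight);"
READ_HEIGHT = "return document.body.scrollHeight;"


def scroll_until_stable(driver, pause=0.5, max_rounds=50):
    """Scroll to the bottom until the page height stops changing.

    Returns the number of scrolls performed. max_rounds caps the loop so a
    truly endless feed cannot keep the script running forever.
    """
    last_height = driver.execute_script(READ_HEIGHT)
    for round_number in range(1, max_rounds + 1):
        driver.execute_script(SCROLL_TO_BOTTOM)
        time.sleep(pause)  # give the page a moment to append new content
        new_height = driver.execute_script(READ_HEIGHT)
        if new_height == last_height:
            return round_number  # nothing new was added by this scroll
        last_height = new_height
    return max_rounds
```

A stable height is only a heuristic: on a slow network the page may look finished before the next batch arrives, so a longer `pause` or a second confirmation scroll may be needed.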
