Fading Coder

Automating Lazy-Loaded Image Scraping and Screenshots with Selenium WebDriver

Tech Apr 27 8

Handling dynamic web pages often requires interacting with JavaScript that loads content only when it scrolls into the viewport. A plain HTTP request is not enough here: it returns only the initial HTML, while the remaining content is added to the DOM later by JavaScript running in the browser. Selenium WebDriver solves this by controlling a real browser instance, so a script can perform actions such as scrolling to trigger load events and then capture the resulting page state with screenshots.
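
To make the problem concrete, here is a minimal sketch using only Python's standard-library html.parser. It assumes a common lazy-loading convention, not confirmed for the gallery used below, in which the initial HTML holds a placeholder in src and defers the real URL to a data-src attribute. The HTML string and URLs are hypothetical. A static parse can see the deferred URLs as text, but nothing fetches them until browser-side JavaScript swaps them in, and pages that load their items over XHR may not include them in the initial HTML at all.

```python
from html.parser import HTMLParser

# Hypothetical initial HTML, as a plain HTTP GET might return it: the real
# image URLs sit in data-src (an assumed convention), src is a placeholder.
INITIAL_HTML = """
<div class="gallery">
  <img src="placeholder.gif" data-src="https://example.com/photo1.jpg">
  <img src="placeholder.gif" data-src="https://example.com/photo2.jpg">
  <img src="https://example.com/logo.png">
</div>
"""


class LazyImageFinder(HTMLParser):
    """Separates <img> tags whose real URL is deferred to data-src from normal ones."""

    def __init__(self):
        super().__init__()
        self.deferred = []   # URLs a browser loads only after JavaScript runs
        self.immediate = []  # URLs already present in src

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if "data-src" in attributes:
            self.deferred.append(attributes["data-src"])
        elif "src" in attributes:
            self.immediate.append(attributes["src"])


finder = LazyImageFinder()
finder.feed(INITIAL_HTML)
print("Deferred:", finder.deferred)
print("Immediate:", finder.immediate)
```

Only the images listed under Immediate would appear without JavaScript; everything under Deferred needs a real browser (or a reimplementation of the page's loading logic) to materialize.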

The following implementation navigates to a target URL, repeatedly scrolls down to reveal hidden content, captures a screenshot after each scroll, and counts the image containers that have been added to the DOM so far. This approach is particularly useful for galleries and infinite-scroll pages, where content is retrieved only in response to simulated user interaction.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

class LazyLoadHandler:
    def __init__(self):
        # Configure headless browser options
        options = Options()
        options.add_argument('--headless')
        options.add_argument('--disable-gpu')
        # Initialize the driver. Selenium 4.6+ locates a matching chromedriver
        # automatically (Selenium Manager); older versions need it on PATH.
        self.driver = webdriver.Chrome(options=options)
        self.driver.set_window_size(1920, 1080)

    def process_page(self):
        target_url = "https://mm.taobao.com/self/album_photo.htm?spm=719.6642053.0.0.o5BDC0&user_id=687471686&album_id=183809402&album_flag=0"

        try:
            self.driver.get(target_url)

            # JavaScript snippet to scroll to the bottom
            scroll_script = "window.scrollTo(0, document.body.scrollHeight);"

            # 50 scroll rounds, numbered from 1 so filenames match the log output
            for iteration in range(1, 51):
                # Execute scroll action
                self.driver.execute_script(scroll_script)

                # Fixed pause so lazy-loaded content can render;
                # increase it on slow connections
                time.sleep(0.2)

                # Capture the current state of the viewport
                filename = f"screenshot_iteration_{iteration}.png"
                self.driver.save_screenshot(filename)

                # Locate image containers; a CSS class selector still matches
                # when the element carries additional classes, unlike an exact
                # @class comparison in XPath
                elements = self.driver.find_elements(By.CSS_SELECTOR, 'div.mm-photoW-cell-middle')
                count = len(elements)

                print(f"Iteration {iteration}: Detected {count} image containers.")

        except Exception as e:
            print(f"An error occurred during processing: {e}")
        finally:
            self.driver.quit()

if __name__ == '__main__':
    handler = LazyLoadHandler()
    handler.process_page()
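
The loop above always performs 50 scrolls with a fixed 0.2-second pause, which wastes time on short galleries and can miss content on slow connections. One possible refinement, sketched below, is to wait after each scroll for document.body.scrollHeight to grow and to stop once it stays unchanged for a timeout. It assumes, as the original script does, that the page scrolls the document body rather than an inner container. The function scroll_until_stable and its parameters are hypothetical; it only needs an object with an execute_script method, such as self.driver. Selenium's own WebDriverWait (in selenium.webdriver.support.ui) provides similar polling if you prefer the built-in helper.

```python
import time

SCROLL_SCRIPT = "window.scrollTo(0, document.body.scrollHeight);"
HEIGHT_SCRIPT = "return document.body.scrollHeight;"


def scroll_until_stable(driver, max_rounds=50, timeout=3.0, poll=0.2):
    """Scroll to the bottom repeatedly; stop once the page stops growing.

    After each scroll, poll the page height for up to `timeout` seconds.
    If it never increases, assume no more content will load and stop.
    Returns the number of scroll rounds that produced new content.
    """
    height = driver.execute_script(HEIGHT_SCRIPT)
    grew = 0
    for _ in range(max_rounds):
        driver.execute_script(SCROLL_SCRIPT)
        deadline = time.monotonic() + timeout
        new_height = height
        # Poll instead of sleeping a fixed amount: return as soon as it grows
        while time.monotonic() < deadline:
            new_height = driver.execute_script(HEIGHT_SCRIPT)
            if new_height > height:
                break
            time.sleep(poll)
        if new_height <= height:
            break  # nothing new arrived within the timeout
        height = new_height
        grew += 1
    return grew
```

Inside process_page, calling scroll_until_stable(self.driver) right after self.driver.get(target_url) would replace the fixed loop; the trade-off is that it no longer takes a screenshot per round unless you add that call inside the loop.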

Upon execution, the script starts a headless browser session and navigates to the specified gallery. On each iteration it scrolls to the bottom of the page, which triggers the lazy-loading mechanism, and the console output should show the number of matched image containers growing as new batches are added to the DOM. A screenshot is saved after every scroll, giving a visual record of the loading process at each step. Note that in Chrome, save_screenshot() captures only the visible viewport, not the full page.
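
Counting containers shows that content is loading but does not retrieve anything. A minimal sketch of actually collecting image URLs could run a small JavaScript snippet after each scroll and keep the results in an insertion-ordered dict, so that images already seen in earlier iterations are ignored. The selector "div.mm-photoW-cell-middle img" assumes the images sit inside the containers the original script counts, and collect_new_urls is a hypothetical helper. If the page keeps a placeholder in src until an image loads, those placeholder URLs would be collected too and may need filtering.

```python
# JavaScript run in the page: return the current URL of every <img> matching
# the selector passed as arguments[0], skipping empty values.
COLLECT_SCRIPT = """
return Array.from(document.querySelectorAll(arguments[0]))
            .map(img => img.currentSrc || img.src)
            .filter(url => url);
"""


def collect_new_urls(driver, seen, selector="div.mm-photoW-cell-middle img"):
    """Add newly visible image URLs to `seen` (a dict used as an ordered set).

    Returns the list of URLs that were not in `seen` before this call.
    """
    new_urls = []
    # execute_script forwards extra arguments to the page as arguments[0..n]
    for url in driver.execute_script(COLLECT_SCRIPT, selector):
        if url not in seen:
            seen[url] = None
            new_urls.append(url)
    return new_urls
```

For example, create seen = {} before the loop and call collect_new_urls(self.driver, seen) after each screenshot; len(seen) then gives the number of distinct image URLs found so far, and list(seen) preserves the order in which they appeared.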
