Fading Coder



Obtaining Valid Cookies for Web Scraping: Browser Automation Approach for Anti-Scraping and Encrypted Cookie Scenarios


When building web scrapers or sending simulated HTTP requests, particularly to sites protected by captchas or other anti-scraping measures, you will often find that the site rotates or invalidates its cookies on a regular basis. Cookies copied manually from a browser therefore stop working after a short time. To get around this, we need to obtain a valid, trusted cookie the same way a real browser does: by actually visiting the site. Selenium is well suited to this workflow.
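
Because the cookie goes stale, a scraper usually has to regenerate it periodically rather than fetch it once. One way to organize that is a small time-based cache. The sketch below assumes a fixed time-to-live and a fetcher that, in practice, would run a browser session like the Selenium example in this article; `CookieCache`, `get`, and `invalidate` are hypothetical names, and a suitable TTL depends entirely on the target site.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/**
 * Hypothetical helper: caches a cookie string and regenerates it once it is
 * older than a fixed time-to-live. The fetcher is assumed to launch a real
 * browser session and return a fresh cookie string.
 */
public class CookieCache {
    private final Supplier<String> fetcher;
    private final Duration ttl;
    private final Clock clock;
    private String cookie;
    private Instant fetchedAt;

    public CookieCache(Supplier<String> fetcher, Duration ttl, Clock clock) {
        this.fetcher = fetcher;
        this.ttl = ttl;
        this.clock = clock;
    }

    /** Returns the cached cookie, fetching a new one if it is missing or expired. */
    public synchronized String get() {
        Instant now = clock.instant();
        if (cookie == null || !now.isBefore(fetchedAt.plus(ttl))) {
            cookie = fetcher.get();
            fetchedAt = now;
        }
        return cookie;
    }

    /** Forces a refresh on the next get(), e.g. after the server rejects the cookie. */
    public synchronized void invalidate() {
        cookie = null;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CookieCache cache = new CookieCache(
                () -> "session=demo" + (++calls[0]), // stand-in for a real browser fetch
                Duration.ofMinutes(10),
                Clock.systemUTC());
        System.out.println(cache.get()); // session=demo1 (first call fetches)
        System.out.println(cache.get()); // session=demo1 (still within the TTL)
        cache.invalidate();
        System.out.println(cache.get()); // session=demo2 (fetched again)
    }
}
```

Injecting a `Clock` keeps the expiry logic testable without waiting in real time.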

What is Selenium

Selenium was originally developed as a tool for automated web testing. It programmatically drives a real browser to run custom logic, closely simulating how a human user navigates and interacts with a target site. Because the browser executes the site's scripts and receives its cookies just as it would for a human visitor, Selenium is well suited to scraping workflows that need legitimate browser-generated cookies.

Prerequisites

  • Install the Google Chrome browser
  • Download a version of ChromeDriver that matches your installed Chrome version

Check your Chrome version

To view your current Chrome version, enter the following URL into Chrome's address bar:

chrome://version/

Download matching ChromeDriver

ChromeDriver's major version number must match the major version of your installed Chrome (for example, a 120.x ChromeDriver for Chrome 120.x). Note that a newly released Chrome version may not have a matching official ChromeDriver available right away.
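
Since only the major version has to agree, the comparison is easy to automate. The following sketch assumes version strings in the usual dotted `major.minor.build.patch` form, such as those printed by `chromedriver --version` or shown on `chrome://version/`. `VersionCheck`, `majorVersion`, and `majorsMatch` are hypothetical names, and the sample strings are illustrative rather than real output from any particular machine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for comparing the major versions of Chrome and ChromeDriver. */
public class VersionCheck {
    // First run of digits followed by a dot, e.g. "120" in "ChromeDriver 120.0.6099.71"
    private static final Pattern MAJOR = Pattern.compile("(\\d+)\\.");

    /** Extracts the major version number from a version string. */
    public static int majorVersion(String versionText) {
        Matcher m = MAJOR.matcher(versionText);
        if (!m.find()) {
            throw new IllegalArgumentException("No version number found in: " + versionText);
        }
        return Integer.parseInt(m.group(1));
    }

    /** True when both strings carry the same major version. */
    public static boolean majorsMatch(String chromeVersion, String driverVersion) {
        return majorVersion(chromeVersion) == majorVersion(driverVersion);
    }

    public static void main(String[] args) {
        // Illustrative strings only; substitute the versions from your own machine
        String chrome = "Google Chrome 120.0.6099.109";
        String driver = "ChromeDriver 120.0.6099.71";
        System.out.println(majorsMatch(chrome, driver)); // true: both majors are 120
    }
}
```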

Block automatic Chrome updates

Chrome often updates itself automatically in the background, which breaks compatibility with an existing ChromeDriver installation. To prevent this, add the following line to your system hosts file (by default `C:\Windows\System32\drivers\etc\hosts` on Windows, `/etc/hosts` on Linux and macOS) to block Chrome's update server:

127.0.0.1 update.googleapis.com
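
Before editing the hosts file, it can help to check whether the entry is already there. This is a minimal, read-only sketch: it assumes the default hosts file locations on Windows and on Linux/macOS, and it treats an entry pointing at `127.0.0.1`, `0.0.0.0`, or `::1` as "blocked". `HostsCheck` and `isBlocked` are hypothetical names. Writing to the hosts file normally requires administrator or root privileges, so the sketch does not attempt it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: reports whether a hosts file already blocks a host name. */
public class HostsCheck {
    // Loopback or unroutable addresses; mapping a host to one of these blocks it
    private static final List<String> BLOCKING_ADDRESSES = List.of("127.0.0.1", "0.0.0.0", "::1");

    /** Returns true if any non-comment line maps {@code host} to a blocking address. */
    public static boolean isBlocked(List<String> hostsLines, String host) {
        for (String raw : hostsLines) {
            // Drop a trailing "# comment", then split into "address name [aliases...]"
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            if (!BLOCKING_ADDRESSES.contains(fields[0])) {
                continue;
            }
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(host)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default hosts file locations for Windows and for Linux/macOS
        Path hosts = System.getProperty("os.name").toLowerCase().contains("win")
                ? Path.of(System.getenv().getOrDefault("SystemRoot", "C:\\Windows"),
                          "System32", "drivers", "etc", "hosts")
                : Path.of("/etc/hosts");
        // ISO-8859-1 decodes any byte sequence, so an unusual file encoding will not throw
        List<String> lines = Files.readAllLines(hosts, StandardCharsets.ISO_8859_1);
        System.out.println("update.googleapis.com blocked: "
                + isBlocked(lines, "update.googleapis.com"));
    }
}
```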

Code Implementation

Below is a working Java example to extract valid cookies from a browser session:

import java.util.Set;
import java.util.StringJoiner;

import org.openqa.selenium.Cookie;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class CookieGrabber {
    public static void main(String[] args) {
        // Target site URL, can also be a direct captcha endpoint
        String targetSite = "https://www.baidu.com";
        // Set the file path to your ChromeDriver executable
        System.setProperty("webdriver.chrome.driver", "./chromedriver.exe");

        // Run Chrome headless (no visible window); remove these two
        // arguments if you want to watch the browser while debugging
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");
        options.addArguments("--disable-gpu");

        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get(targetSite);

            // Collect all cookies from the current browser session
            Set<Cookie> rawCookies = driver.manage().getCookies();
            // Join as "name=value; name2=value2", the format of a Cookie request header
            StringJoiner cookieHeader = new StringJoiner("; ");
            for (Cookie cookie : rawCookies) {
                cookieHeader.add(cookie.getName() + "=" + cookie.getValue());
            }

            // Output the formatted cookie string ready for use in other requests
            System.out.println(cookieHeader);
        } finally {
            // Always clean up the driver, even on failure, to avoid leftover processes
            driver.quit();
        }
    }
}
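
The cookie string printed by the program above is meant to go into the `Cookie` header of later requests. The sketch below uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) as one possible HTTP client; `CookieRequest`, `toCookieHeader`, and `buildRequest` are hypothetical names. Whether a replayed cookie is accepted still depends on the site, which may also check other request properties.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper for replaying browser-generated cookies in plain HTTP requests. */
public class CookieRequest {
    /** Joins name/value pairs into a Cookie header value such as "a=1; b=2". */
    public static String toCookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    /** Builds a GET request that carries the given Cookie header. */
    public static HttpRequest buildRequest(String url, String cookieHeader) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Cookie", cookieHeader)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Pass the string printed by the Selenium program as the first argument;
        // the fallback is a placeholder, not a real cookie for the site
        String cookieHeader = args.length > 0
                ? args[0]
                : toCookieHeader(Map.of("example", "placeholder"));
        HttpRequest request = buildRequest("https://www.baidu.com", cookieHeader);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Status: " + response.statusCode());
    }
}
```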

Alternative Quick Cookie Extraction

If you work in another programming language or on another operating system, or would rather not configure a full Chrome + ChromeDriver environment, lighter-weight alternatives are described in the relevant project documentation.

