Obtaining Valid Cookies for Web Scraping: Browser Automation Approach for Anti-Scraping and Encrypted Cookie Scenarios
When building web scrapers or sending simulated HTTP requests, a common obstacle is that sites protected by captchas or other anti-scraping measures rotate or invalidate cookies on a regular basis. Cookies copied manually from a browser therefore stop working quickly. To get around this, we need to generate a valid, trusted cookie the same way a real browser does when it visits the site. Selenium is well suited to this workflow.
What is Selenium
Selenium was originally developed as an automated web testing tool. It can programmatically drive a real browser to execute custom logic, fully simulating the behavior of a human user navigating and interacting with a target site. This capability makes it extremely useful for scraping workflows that require legitimate browser-generated cookies.
Prerequisites
- Install the Google Chrome browser
- Download a version of ChromeDriver that matches your installed Chrome version
Check your Chrome version
To view your current Chrome version, enter the following line into Chrome's address bar:
chrome://version/
Download matching ChromeDriver
The major version number of ChromeDriver must match the major version number of your installed Chrome. Note that very new Chrome releases may not have an official matching ChromeDriver available immediately.
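The major-version rule above can be checked mechanically before launching the browser. The following is a minimal sketch, not part of Selenium: the class name `DriverVersionCheck`, its methods, and the sample version strings are all hypothetical and only illustrate comparing the first dotted component of two version strings.

```java
final class DriverVersionCheck {
    /** Returns the leading (major) component of a dotted version string. */
    static int majorVersion(String version) {
        String trimmed = version.trim();
        int dot = trimmed.indexOf('.');
        String major = dot < 0 ? trimmed : trimmed.substring(0, dot);
        return Integer.parseInt(major);
    }

    /** True when Chrome and ChromeDriver share the same major version. */
    static boolean isCompatible(String chromeVersion, String driverVersion) {
        return majorVersion(chromeVersion) == majorVersion(driverVersion);
    }

    public static void main(String[] args) {
        // Illustrative version strings only; substitute the values reported
        // by chrome://version/ and by your ChromeDriver download.
        System.out.println(isCompatible("120.0.6099.109", "120.0.6099.71")); // true
        System.out.println(isCompatible("121.0.6167.85", "120.0.6099.71"));  // false
    }
}
```

Only the major component is compared here, which mirrors the rule stated above; the remaining components are ignored.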
Block automatic Chrome updates
Chrome often updates automatically in the background, which breaks compatibility with an existing ChromeDriver installation. To prevent this, add the following line to your system hosts file to block Chrome's update server:
127.0.0.1 update.googleapis.com
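To confirm that the entry is actually active (and not commented out), the hosts file can be scanned line by line. The sketch below is an illustration under stated assumptions: the class `HostsEntryCheck` and its method are hypothetical names, and the sample lines stand in for the real file, which is typically `/etc/hosts` on Linux and macOS and `C:\Windows\System32\drivers\etc\hosts` on Windows.

```java
import java.util.List;

final class HostsEntryCheck {
    /**
     * Returns true if any non-comment line maps the given hostname to 127.0.0.1.
     * Text after '#' is treated as a comment and ignored.
     */
    static boolean isBlocked(List<String> hostsLines, String hostname) {
        for (String line : hostsLines) {
            int hash = line.indexOf('#');
            String active = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (active.isEmpty()) {
                continue; // blank line or pure comment
            }
            String[] fields = active.split("\\s+");
            if (!fields[0].equals("127.0.0.1")) {
                continue; // entry points somewhere else
            }
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostname)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample content; in practice the lines could be loaded with
        // java.nio.file.Files.readAllLines on the real hosts file.
        List<String> commentedOut = List.of(
                "127.0.0.1 localhost",
                "# 127.0.0.1 update.googleapis.com");
        List<String> active = List.of(
                "127.0.0.1 localhost",
                "127.0.0.1 update.googleapis.com  # block Chrome updates");
        System.out.println(isBlocked(commentedOut, "update.googleapis.com")); // false
        System.out.println(isBlocked(active, "update.googleapis.com"));       // true
    }
}
```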
Code Implementation
Below is a Java example that extracts valid cookies from a browser session. It assumes the Selenium Java client library is on the classpath:
import java.util.Set;

import org.openqa.selenium.Cookie;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class CookieFetcher {
    public static void main(String[] args) {
        // Target site URL, can also be a direct captcha endpoint
        String targetSite = "https://www.baidu.com";
        // Set the file path to your ChromeDriver executable
        System.setProperty("webdriver.chrome.driver", "./chromedriver.exe");
        // Optional: enable headless mode to run without a visible browser window
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");
        options.addArguments("--disable-gpu");
        ChromeDriver driver = new ChromeDriver(options);
        try {
            driver.get(targetSite);
            // Collect all cookies from the current browser session
            Set<Cookie> rawCookies = driver.manage().getCookies();
            StringBuilder cookieBuilder = new StringBuilder();
            for (Cookie cookie : rawCookies) {
                // Separate name=value pairs with "; " as in a Cookie request header
                if (cookieBuilder.length() > 0) {
                    cookieBuilder.append("; ");
                }
                cookieBuilder.append(cookie.getName())
                        .append("=")
                        .append(cookie.getValue());
            }
            // Output the formatted cookie string ready for use in other requests
            System.out.println(cookieBuilder);
        } finally {
            // Always clean up the driver instance to avoid leftover processes
            driver.quit();
        }
    }
}
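Once the cookie string has been printed, it can be attached to ordinary HTTP requests made without a browser. The following is a minimal sketch using the JDK's built-in `java.net.http` client (Java 11+). The class name `CookieReplay`, the `buildRequest` helper, and the placeholder cookie value are assumptions for illustration; whether the target site accepts the replayed cookie depends on its own checks.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class CookieReplay {
    /** Builds a GET request that carries the given cookie string in its Cookie header. */
    static HttpRequest buildRequest(String url, String cookieHeader) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Cookie", cookieHeader)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No cookie supplied: just show the header that would be sent.
            HttpRequest preview = buildRequest("https://www.baidu.com", "name=value");
            System.out.println(preview.headers().firstValue("Cookie").orElse(""));
            return;
        }
        // Pass the cookie string produced by the Selenium example as the first argument.
        HttpRequest request = buildRequest("https://www.baidu.com", args[0]);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Status: " + response.statusCode());
    }
}
```

Keeping request construction in its own method makes it possible to inspect the headers without touching the network, which helps when debugging why a site rejects a replayed cookie.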
Alternative Quick Cookie Extraction
For users working with other programming languages or operating systems, or developers who do not want to configure a full Chrome + ChromeDriver environment, lighter-weight alternatives are described in the relevant project documentation.