Configuring Chrome Browser Options with Selenium and Python
Background
When using Selenium for browser rendering to scrape websites, the default is a clean Chrome browser. However, we often use browser extensions, proxies, or other customizations during normal browsing. Correspondingly, when scraping with Chrome, we may need to apply specific configurations to optimize the scraper's behavior.
Common configurations include:
- Disabling image and video loading to speed up page loading.
- Adding a proxy to access certain pages or bypass IP-based anti-scraping measures.
- Using mobile user agents to access mobile sites, which often have weaker anti-scraping defenses.
- Adding extensions to replicate normal browser functionality.
- Setting encoding to prevent garbled text on Chinese sites.
- Disabling JavaScript execution.
- And more.
Environment
- Python 3.6.1
- OS: Windows 7
- IDE: PyCharm
- Chrome browser installed
- ChromeDriver configured
- Selenium 3.7.0
chromeOptions
ChromeOptions (the Options class in selenium.webdriver.chrome.options) configures Chrome's startup properties. According to the Selenium source code, it lets us set:
- The Chrome binary location (binary_location)
- Startup arguments (add_argument)
- Extensions (add_extension, add_encoded_extension)
- Experimental options (add_experimental_option)
- The debugger address (debugger_address)
Source code snippet:
# .\Lib\site-packages\selenium\webdriver\chrome\options.py
class Options(object):
    def __init__(self):
        self._binary_location = ''
        self._arguments = []
        self._extension_files = []
        self._extensions = []
        self._experimental_options = {}
        self._debugger_address = None
Usage example:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('lang=zh_CN.UTF-8')
driver = webdriver.Chrome(chrome_options=options)
Common Configurations
1. Set Encoding
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('lang=zh_CN.UTF-8')
driver = webdriver.Chrome(chrome_options=options)
2. Simulate Mobile Device
Mobile device user-agent list: http://www.fynas.com/ua
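from selenium import webdriver
options = webdriver.ChromeOptions()
# Simulate Android QQ browser (no quotes around the value: Chrome receives it verbatim)
options.add_argument('user-agent=MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; MB200 Build/GRJ22; CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1')
# Or simulate iPhone 6 (set only one user agent per browser)
options.add_argument('user-agent=Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1')
driver = webdriver.Chrome(chrome_options=options)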
3. Disable Image Loading
Disabling images can improve page load speed.
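from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

# Example values; the original read these from a project-specific configure module
WINDOW_WIDTH, WINDOW_HEIGHT, TIMEOUT = 1280, 800, 10

chrome_options = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}  # 2 = block images
chrome_options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.set_window_size(WINDOW_WIDTH, WINDOW_HEIGHT)  # argument order is (width, height)
wait = WebDriverWait(driver, timeout=TIMEOUT)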
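The same prefs mechanism can switch off other content types. The sketch below builds one prefs dict that blocks images and, optionally, JavaScript (one of the goals listed in the Background). It assumes the profile.managed_default_content_settings.javascript key is honored the same way as the images key; the helper names are illustrative.

```python
BLOCK = 2  # Chrome content-setting value: 1 = allow, 2 = block


def content_block_prefs(images=True, javascript=False):
    """Return a Chrome prefs dict that blocks the selected content types."""
    prefs = {}
    if images:
        prefs["profile.managed_default_content_settings.images"] = BLOCK
    if javascript:
        # Assumed to behave like the images key above
        prefs["profile.managed_default_content_settings.javascript"] = BLOCK
    return prefs


def lightweight_options(images=True, javascript=False):
    """Return ChromeOptions with images (and optionally JavaScript) disabled."""
    # Imported here so content_block_prefs() needs no Selenium
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", content_block_prefs(images, javascript))
    return options
```

Building the dict in one place matters because calling add_experimental_option("prefs", ...) a second time replaces the earlier dict rather than merging into it.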
4. Add Proxy
When adding a proxy, prefer static IPs for better stability. Dynamic proxies may have short lifetimes (1-3 minutes).
from selenium import webdriver
PROXY = "proxy_host:proxy_port"
options = webdriver.ChromeOptions()
desired_capabilities = options.to_capabilities()
desired_capabilities['proxy'] = {
    "httpProxy": PROXY,
    "ftpProxy": PROXY,
    "sslProxy": PROXY,
    "noProxy": None,
    "proxyType": "MANUAL",
    "class": "org.openqa.selenium.Proxy",
    "autodetect": False
}
driver = webdriver.Chrome(desired_capabilities=desired_capabilities)
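A shorter route, not the article's, is Chrome's own --proxy-server switch, which needs no capability dictionary. A minimal sketch; the host, port, and helper names are placeholders. This switch does not accept a username and password, so authenticated proxies need a different approach.

```python
def proxy_argument(host, port, scheme="http"):
    """Format Chrome's --proxy-server switch, e.g. --proxy-server=http://1.2.3.4:8080."""
    return f"--proxy-server={scheme}://{host}:{port}"


def proxied_options(host, port, scheme="http"):
    """Return ChromeOptions that route browser traffic through the given proxy."""
    # Imported here; proxy_argument() is plain string formatting
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(host, port, scheme))
    return options
```

Because the switch is read only at launch, a scraper using short-lived dynamic proxies would start a new driver with fresh options when the proxy changes, rather than reconfiguring a running browser.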
5. Browser Settings
Selenium launches a clean browser profile without extensions. To change settings such as Flash permissions, or to clear cookies, one approach is to navigate to chrome://settings/content and automate the configuration through that page.
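Clicking through chrome://settings tends to be fragile, since the page layout changes between Chrome versions. A different approach, swapped in here, is to set content settings as prefs at launch and to clear cookies through the WebDriver API. The notification pref key and the helper names are assumptions for illustration; delete_all_cookies() is a standard WebDriver method.

```python
def settings_prefs(block_notifications=True):
    """Return launch-time prefs that stand in for manual changes in chrome://settings/content."""
    prefs = {}
    if block_notifications:
        # Assumed key; 2 = block, as with the image setting above
        prefs["profile.default_content_setting_values.notifications"] = 2
    return prefs


def settings_options(block_notifications=True):
    """Return ChromeOptions carrying the prefs above."""
    # Imported here so settings_prefs() needs no Selenium
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", settings_prefs(block_notifications))
    return options


def reset_cookies(driver):
    """Delete the cookies associated with the currently loaded page."""
    driver.delete_all_cookies()
```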
6. Add Extensions
To load extensions, download the .crx file and use add_extension.
Example: Loading XPath Helper
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
# Set extension path
extension_path = 'D:/extension/XPath-Helper_v2.0.2.crx'
chrome_options.add_extension(extension_path)
driver = webdriver.Chrome(chrome_options=chrome_options)
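The other extension method, add_encoded_extension, takes the .crx contents as a Base64 string instead of a file path, which helps when the extension is stored in a database or fetched over the network. A sketch; the helper names are illustrative.

```python
import base64


def encode_extension_bytes(data):
    """Base64-encode raw .crx bytes into the text form add_encoded_extension expects."""
    return base64.b64encode(data).decode("ascii")


def encoded_extension_options(crx_path):
    """Read a .crx file and load it through add_encoded_extension."""
    # Imported here; the encoding step needs only the standard library
    from selenium import webdriver

    with open(crx_path, "rb") as f:
        encoded = encode_extension_bytes(f.read())
    options = webdriver.ChromeOptions()
    options.add_encoded_extension(encoded)
    return options
```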
Note:
- Minimize the number of extensions for better performance.
- Loading your entire everyday Chrome profile via user-data-dir may cause crashes and is not recommended.
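If a profile directory is still needed (for example, to keep a login between runs), a safer middle ground than your everyday profile is a directory that only the scraper uses. A minimal sketch; the temporary-directory default, the prefix, and the helper names are illustrative.

```python
import os
import tempfile


def make_profile_dir(prefix="selenium-profile-"):
    """Create an empty directory for Chrome to use as a private profile."""
    return tempfile.mkdtemp(prefix=prefix)


def profile_argument(path):
    """Format the --user-data-dir switch for the given directory."""
    return f"--user-data-dir={os.path.abspath(path)}"


def isolated_profile_options(path=None):
    """Return ChromeOptions using `path`, or a fresh temporary profile if none is given."""
    # Imported here so the helpers above need no Selenium
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(profile_argument(path or make_profile_dir()))
    return options
```

A fresh temporary directory gives a clean profile on every run; passing the same fixed path each time is what keeps cookies and logins between runs.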
Additional Parameters
Chrome URL Commands
Enter these in the address bar:
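- about:version - Version information
- about:memory - Memory usage
- about:plugins - Installed plugins
- about:histograms - Internal histograms (browser metrics)
- about:dns - DNS status
- about:cache - Cached pages
- about:gpu - GPU hardware acceleration status
- about:flags - Experimental features
- chrome://extensions/ - Installed extensions
Some of these pages have been removed in newer Chrome versions.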
Useful Command-Line Arguments
These can be passed via add_argument:
- --user-data-dir=[PATH] - Specify the user data directory
- --disk-cache-dir=[PATH] - Cache directory
- --disk-cache-size=N - Cache size in bytes
- --first-run - Show the first-run experience, as on a fresh install
- --incognito - Incognito mode
- --disable-javascript - Disable JavaScript
- --user-agent="..." - Custom user agent
- --disable-plugins - Disable all plugins
- --start-maximized - Start maximized
- --no-sandbox - Disable the sandbox (use with caution)
- --single-process - Run in a single process
- --disable-popup-blocking - Disable the popup blocker
- --disable-images - Disable images
- --lang=zh-CN - Set the language
- --proxy-pac-url=URL - Use a PAC proxy script
- --enable-sync - Enable bookmark sync
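When several of these switches are combined, keeping them in one dict makes the configuration easier to read and change. A sketch of one way to do this; the build_switches helper and its dict convention (True for a bare flag, False or None to skip) are illustrative.

```python
def build_switches(settings):
    """Turn {"incognito": True, "lang": "zh-CN"} into ["--incognito", "--lang=zh-CN"].

    True adds a bare flag, False or None skips the switch, anything else becomes name=value.
    """
    switches = []
    for name, value in settings.items():
        if value is True:
            switches.append(f"--{name}")
        elif value is False or value is None:
            continue
        else:
            switches.append(f"--{name}={value}")
    return switches


def options_from_switches(settings):
    """Return ChromeOptions with every switch from `settings` added via add_argument."""
    # Imported here so build_switches() needs no Selenium
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for switch in build_switches(settings):
        options.add_argument(switch)
    return options
```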
Source: Adapted from CSDN blog