How Selenium WebDriver Communicates with Browsers
Selenium is a widely-used web automation framework that drives browsers by mimicking real user interactions. Understanding its internal communication mechanism helps developers debug issues and build more efficient test frameworks.
Architecture Overview
Selenium's architecture follows a classic client-server model:
- Client (Test Script): The code written by the tester that issues automation commands
- Browser Driver: A standalone server process specific to each browser (chromedriver, geckodriver, etc.)
- Browser: The actual browser instance being controlled
The interaction flow works as follows:
- test script creates an HTTP request targeting the browser driver's endpoint
- The browser driver acts as an HTTP server, receiving and translating these requests
- The driver manipulates the browser through its native APIs
- Execution results travel back through the same chain in reverse order
Communication Protocols
HTTP Protocol
HTTP serves as the foundational transport layer. WebDriver uses a client-server architecture where test script acts as the client and the browser driver functions as the server. Every command is sent as an HTTP request, and responses return as JSON payloads.
JSON Wire Protocol
Built atop HTTP, JSON Wire Protocol standardizes the request and response body formats. Commands like findElement and click map to specific HTTP endpoints with well-defined request/response structures.
Common HTTP methods used:
- GET: Retrieves information from the browser (page title, current URL)
- POST: Sends commands to perform actions (element location, clicking)
- DELETE: Terminates sessions or closes windows
Status Codes
WebDriver uses its own set of status codes beyond standard HTTP codes:
7: Element not found11: Element not visible0: Success
Source Code Deep Dive
Driver Initialization
When instantiating a Chrome WebDriver, the process follows a specific sequence:
# selenium/webdriver/chrome/webdriver.py
class WebDriver(RemoteWebDriver):
def __init__(self, executable_path="chromedriver", port=0,
options=None, service_args=None,
desired_capabilities=None, service_log_path=None,
chrome_options=None, keep_alive=True):
# Merge capabilities from options and desired_capabilities
if options is None:
if desired_capabilities is None:
desired_capabilities = self.create_options().to_capabilities()
else:
if desired_capabilities is None:
desired_capabilities = options.to_capabilities()
else:
desired_capabilities.update(options.to_capabilities())
# Start the driver service
self.service = Service(
executable_path,
port=port,
service_args=service_args,
log_path=service_log_path)
self.service.start()
# Initialize the remote connection
RemoteWebDriver.__init__(
self,
command_executor=ChromeRemoteConnection(
remote_server_addr=self.service.service_url,
keep_alive=keep_alive),
desired_capabilities=desired_capabilities)
Service Startup
The Service class launches the browser driver executable as a subprocess:
# selenium/webdriver/common/service.py
def start(self):
try:
cmd = [self.path]
cmd.extend(self.command_line_args())
self.process = subprocess.Popen(
cmd,
env=self.env,
close_fds=platform.system() != 'Windows',
stdout=self.log_file,
stderr=self.log_file,
stdin=PIPE)
except OSError as err:
if err.errno == errno.ENOENT:
raise WebDriverException(
"'%s' executable needs to be in PATH." % os.path.basename(self.path))
# Wait for the service to become available
while True:
self.assert_process_still_running()
if self.is_connectable():
break
time.sleep(1)
The driver executable (chromedriver) runs as a separate process listening on a specific port (typically 9515 for ChromeDriver).
Session Establishment
The parent RemoteWebDriver class handles session creation:
# selenium/webdriver/remote/webdriver.py
def start_session(self, capabilities, browser_profile=None):
w3c_caps = _make_w3c_caps(capabilities)
parameters = {
"capabilities": w3c_caps,
"desiredCapabilities": capabilities
}
# POST to /session endpoint creates a new browser session
response = self.execute(Command.NEW_SESSION, parameters)
self.session_id = response['sessionId']
self.capabilities = response.get('value')
This sends a POST request to http://localhost:9515/session with JSON payload containing browser capabilities. The response includes a sessionId used for all subsequent requests.
Command Execution
All browser interactions flow through a unified execute method:
# selenium/webdriver/remote/remote_connection.py
def execute(self, command, params):
command_info = self._commands[command]
path = string.Template(command_info[1]).substitute(params)
data = utils.dump_json(params)
url = '%s%s' % (self._url, path)
return self._request(command_info[0], url, body=data)
def _request(self, method, url, body=None):
LOGGER.debug('%s %s %s' % (method, url, body))
parsed_url = parse.urlparse(url)
headers = self.get_remote_connection_headers(parsed_url, self.keep_alive)
if self.keep_alive:
resp = self._conn.request(method, url, body=body, headers=headers)
else:
http = urllib3.PoolManager(timeout=self._timeout)
resp = http.request(method, url, body=body, headers=headers)
data = resp.data.decode('UTF-8')
return utils.load_json(data.strip())
API Endpoint Mapping
The _commands dictionary maps high-level Selenium commands to HTTP endpoints:
| Command | Method | Endpoint |
|---|---|---|
| NEW_SESSION | POST | /session |
| GET | POST | /session/$sessionId/url |
| FIND_ELEMENT | POST | /session/$sessionId/element |
| CLICK_ELEMENT | POST | /session/$sessionId/element/$id/click |
| GET_TITLE | GET | /session/$sessionId/title |
| QUIT | DELETE | /session/$sessionId |
Manual HTTP Requests
Understanding the underlying protocol enables direct HTTP interaction with the driver.
Creating a Session
import requests
import json
endpoint = 'http://localhost:9515/session'
payload = {
"capabilities": {
"alwaysMatch": {
"browserName": "chrome"
},
"firstMatch": [
{}
]
},
"desiredCapabilities": {
"platform": "ANY",
"browserName": "chrome",
"version": "",
"chromeOptions": {
"args": [],
"extensions": []
}
}
}
response = requests.post(endpoint, json=payload).json()
session_id = response['sessionId']
print(f"Session created: {session_id}")
Response structure:
{
"sessionId": "44fdb7b1b048a76c0f625545b0d2567b",
"status": 0,
"value": {
"browserName": "chrome",
"platform": "Mac OS X",
"javascriptEnabled": true
}
}
Navigating to a URL
nav_endpoint = f'http://localhost:9515/session/{session_id}/url'
nav_payload = {
"url": "https://www.example.com"
}
requests.post(nav_endpoint, json=nav_payload)
Locating Elements
find_endpoint = f'http://localhost:9515/session/{session_id}/element'
find_payload = {
"using": "css selector",
"value": "#main-content"
}
element_response = requests.post(find_endpoint, json=find_payload).json()
element_id = element_response['value']['ELEMENT']
Clicking Elements
click_endpoint = f'http://localhost:9515/session/{session_id}/element/{element_id}/click'
requests.post(click_endpoint, json={"id": element_id})
Closing the Session
requests.delete(f'http://localhost:9515/session/{session_id}')
Complete Low-Level Example
import requests
import time
# Launch browser
config = {
"capabilities": {
"alwaysMatch": {"browserName": "chrome"},
"firstMatch": [{}]
},
"desiredCapabilities": {
"browserName": "chrome",
"platform": "ANY"
}
}
res = requests.post('http://127.0.0.1:9515/session', json=config).json()
sid = res['sessionId']
# Navigate
requests.post(
f'http://127.0.0.1:9515/session/{sid}/url',
json={"url": "https://www.google.com"}
)
time.sleep(2)
# Tear down
requests.delete(f'http://127.0.0.1:9515/session/{sid}')
This demonstrates that UI automation is fundamentally HTTP-based API interaction—Selenium simply provides a convenient high-level abstraction over these low-level protocols.
Key Takeaways
- WebDriver operates over HTTP, treating the browser driver as a web service
- Every action (navigation, element location, clicks) maps to a specific REST endpoint
- The
sessionIdmaintains state across all interactions within a single browser instance - Understanding this architecture enables debugging, proxying, and building custom automation tools