
How Selenium WebDriver Communicates with Browsers

Tech · May 8

Selenium is a widely used web automation framework that drives real browsers the way a user would: navigating, clicking, and typing. Understanding how its client code communicates with the browser helps developers debug failures and build more efficient test frameworks.

Architecture Overview

Selenium's architecture follows a classic client-server model:

Client (Test Script): The code written by the tester, together with the Selenium client library that turns its calls into HTTP requests
Browser Driver: A standalone server process specific to each browser (chromedriver for Chrome, geckodriver for Firefox, etc.)
Browser: The actual browser instance being controlled

The interaction flow works as follows:

The test script, through the Selenium client library, sends an HTTP request to the browser driver's endpoint
The browser driver acts as an HTTP server, receiving these requests and translating them into browser commands
The driver manipulates the browser through its native automation interface (for ChromeDriver, the Chrome DevTools Protocol)
Execution results travel back through the same chain in reverse order
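To make the round trip concrete, here is a toy model built only on Python's standard library. FakeDriverHandler is a hypothetical stand-in for chromedriver: it runs an HTTP server and answers a page-title request with a canned JSON payload instead of querying a real browser, while fetch_title plays the role of the client library. The session ID "demo-session" is made up.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for chromedriver: answers one endpoint with JSON."""

    def do_GET(self):
        if self.path.endswith("/title"):
            # A real driver would ask the browser; we return a fixed title
            body = json.dumps({"value": "Example Domain"}).encode("utf-8")
            self.send_response(200)
        else:
            body = json.dumps({"value": {"error": "unknown command"}}).encode("utf-8")
            self.send_response(404)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def fetch_title(base_url, session_id):
    """Client side: send the HTTP request and decode the JSON reply."""
    with urllib.request.urlopen(f"{base_url}/session/{session_id}/title") as resp:
        return json.loads(resp.read().decode("utf-8"))["value"]


# Port 0 lets the OS pick a free port, as Selenium's Service does
server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"
title = fetch_title(base, "demo-session")
server.shutdown()
server.server_close()
```

The point is the shape of the exchange: an ordinary HTTP request goes out, and a JSON document with a "value" field comes back.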

Communication Protocols

HTTP Protocol

HTTP serves as the foundational transport layer. The test script (via the Selenium client library) acts as the HTTP client and the browser driver acts as the HTTP server. Every command is sent as an HTTP request, and every response comes back as a JSON payload.

JSON Wire Protocol

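Built on top of HTTP, the JSON Wire Protocol standardizes the request and response body formats. Commands such as findElement and click map to specific HTTP endpoints with well-defined request and response structures. The JSON Wire Protocol has since been superseded by the W3C WebDriver specification, which keeps the same HTTP-plus-JSON design but changes some payload and error formats; current Selenium 4 releases use the W3C protocol.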

Common HTTP methods used:

GET: Retrieves information from the browser (page title, current URL)
POST: Sends commands to perform actions (element location, clicking)
DELETE: Terminates sessions or closes windows
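A sketch of how the three methods differ on the wire, using urllib.request from the standard library. build_request is a hypothetical helper: only POST requests get a JSON body, while GET and DELETE carry everything they need in the URL. The session ID abc123 is invented for illustration, and the requests are built but never sent.

```python
import json
import urllib.request


def build_request(method, url, payload=None):
    """Build (but do not send) a WebDriver-style HTTP request."""
    data = None
    headers = {}
    if method == "POST":
        # POST commands carry their arguments as a JSON body
        data = json.dumps(payload if payload is not None else {}).encode("utf-8")
        headers["Content-Type"] = "application/json;charset=UTF-8"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


base = "http://localhost:9515/session/abc123"
title_req = build_request("GET", base + "/title")            # read information
find_req = build_request("POST", base + "/element",          # perform an action
                         {"using": "css selector", "value": "#main-content"})
quit_req = build_request("DELETE", base)                     # end the session
```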

Status Codes

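In the JSON Wire Protocol, every response body also carries its own numeric status field, separate from the HTTP status code:

0: Success
7: No such element (element not found)
11: Element not visible

The W3C protocol drops this field; it signals errors through the HTTP status code plus a string error code in the response body, such as "no such element".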
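When reading raw traffic, a small helper that reports the outcome of a decoded response body in either dialect can be handy. describe_response and the JSONWIRE_STATUS table are hypothetical; the table covers only the three codes listed above, and for W3C bodies the helper simply returns the error string.

```python
# JSON Wire numeric status codes (only the ones discussed above)
JSONWIRE_STATUS = {0: "Success", 7: "NoSuchElement", 11: "ElementNotVisible"}


def describe_response(body):
    """Return a human-readable outcome for a decoded WebDriver response body."""
    if "status" in body:
        # JSON Wire style: numeric status at the top level
        code = body["status"]
        return JSONWIRE_STATUS.get(code, f"Unknown status {code}")
    value = body.get("value")
    if isinstance(value, dict) and "error" in value:
        # W3C style: errors are a string code inside "value"
        return value["error"]
    return "Success"
```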

Source Code Deep Dive

Driver Initialization

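Instantiating a Chrome WebDriver runs three steps in order: merge the requested capabilities, start the chromedriver process, and open a remote connection to it (which also creates the browser session):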

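```python
# selenium/webdriver/chrome/webdriver.py (Selenium 3.x, abridged)
import warnings

from selenium.webdriver.remote.webdriver import WebDriver as RemoteWebDriver
from .remote_connection import ChromeRemoteConnection
from .service import Service


class WebDriver(RemoteWebDriver):
    def __init__(self, executable_path="chromedriver", port=0,
                 options=None, service_args=None,
                 desired_capabilities=None, service_log_path=None,
                 chrome_options=None, keep_alive=True):

        # chrome_options is a deprecated alias for options
        if chrome_options:
            warnings.warn('use options instead of chrome_options',
                          DeprecationWarning, stacklevel=2)
            options = chrome_options

        # Merge capabilities from options and desired_capabilities
        if options is None:
            # No options given: fall back to the default ChromeOptions capabilities
            if desired_capabilities is None:
                desired_capabilities = self.create_options().to_capabilities()
        else:
            if desired_capabilities is None:
                desired_capabilities = options.to_capabilities()
            else:
                desired_capabilities.update(options.to_capabilities())

        # Start chromedriver as a child process
        self.service = Service(
            executable_path,
            port=port,
            service_args=service_args,
            log_path=service_log_path)
        self.service.start()

        # Connect to the running service; this also opens the browser session
        RemoteWebDriver.__init__(
            self,
            command_executor=ChromeRemoteConnection(
                remote_server_addr=self.service.service_url,
                keep_alive=keep_alive),
            desired_capabilities=desired_capabilities)
```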
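The capability-merging branch is easier to follow in isolation. This sketch replaces the options objects with plain dicts (options_caps stands in for options.to_capabilities(), default_caps for self.create_options().to_capabilities()). The function name is hypothetical, and unlike the original it always returns a new dict rather than updating desired_capabilities in place.

```python
def merge_capabilities(options_caps=None, desired_caps=None, default_caps=None):
    """Mirror the constructor's merge order using plain dicts."""
    if options_caps is None:
        if desired_caps is None:
            # Nothing supplied: use the browser's default capabilities
            return dict(default_caps or {})
        return dict(desired_caps)
    if desired_caps is None:
        return dict(options_caps)
    merged = dict(desired_caps)
    merged.update(options_caps)  # options applied last, so they win on conflicts
    return merged
```

Because update() runs last, a key set through options overrides the same key in desired_capabilities.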

Service Startup

The Service class launches the browser driver executable as a subprocess and then polls until it accepts connections:

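```python
# selenium/webdriver/common/service.py (Selenium 3.x, abridged)
import errno
import os
import platform
import subprocess
import time
from subprocess import PIPE

from selenium.common.exceptions import WebDriverException


def start(self):
    try:
        cmd = [self.path]
        cmd.extend(self.command_line_args())  # for Chrome: --port=<port> plus service_args
        self.process = subprocess.Popen(
            cmd,
            env=self.env,
            close_fds=platform.system() != 'Windows',
            stdout=self.log_file,
            stderr=self.log_file,
            stdin=PIPE)
    except OSError as err:
        if err.errno == errno.ENOENT:
            raise WebDriverException(
                "'%s' executable needs to be in PATH." % os.path.basename(self.path))
        raise  # do not silently swallow other OS errors

    # Poll until the service accepts connections, giving up after ~30 seconds
    count = 0
    while True:
        self.assert_process_still_running()
        if self.is_connectable():
            break
        count += 1
        time.sleep(1)
        if count == 30:
            raise WebDriverException(
                "Can not connect to the Service %s" % self.path)
```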

The driver executable (chromedriver) runs as a separate process listening on a local port. Run standalone, ChromeDriver defaults to port 9515; when Selenium starts it with port=0, the Service picks a free port instead.
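The readiness loop in start() only needs to know whether something is accepting TCP connections on the driver's port. Here is a standard-library sketch of that check, using a deadline instead of a fixed retry count; wait_until_connectable is a hypothetical name, not part of Selenium's API.

```python
import socket
import time


def wait_until_connectable(host, port, timeout=30.0, interval=0.5):
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the driver's HTTP server is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not up yet (e.g. connection refused); retry
    return False
```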

Session Establishment

The parent RemoteWebDriver class handles session creation:

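```python
# selenium/webdriver/remote/webdriver.py (Selenium 3.x, abridged)
def start_session(self, capabilities, browser_profile=None):
    # Send capabilities in both the W3C and the legacy JSON Wire format
    w3c_caps = _make_w3c_caps(capabilities)
    parameters = {
        "capabilities": w3c_caps,
        "desiredCapabilities": capabilities
    }

    # POST to the /session endpoint creates a new browser session
    response = self.execute(Command.NEW_SESSION, parameters)

    # W3C drivers nest sessionId under "value"; JSON Wire drivers put it at the top level
    if 'sessionId' not in response:
        response = response['value']
    self.session_id = response['sessionId']
    self.capabilities = response.get('value')
    if self.capabilities is None:
        # W3C response: capabilities live under "capabilities"
        self.capabilities = response.get('capabilities')
```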

This sends a POST request to the driver's /session endpoint (for example http://localhost:9515/session) with a JSON payload describing the requested browser capabilities. The response contains a sessionId that every subsequent request embeds in its URL.
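Since the sessionId sits in a different place depending on the protocol dialect, a client talking to the driver directly has to handle both shapes. A minimal sketch; parse_new_session is a hypothetical helper, and the sample IDs in the usage below are invented.

```python
def parse_new_session(response):
    """Return (session_id, capabilities) from a decoded New Session response."""
    if "sessionId" in response:
        # JSON Wire Protocol: sessionId at the top level, capabilities in "value"
        return response["sessionId"], response.get("value")
    # W3C WebDriver: both nested under "value"
    value = response["value"]
    return value["sessionId"], value.get("capabilities")
```

For example, parse_new_session({"value": {"sessionId": "9f2c", "capabilities": {"browserName": "chrome"}}}) returns ("9f2c", {"browserName": "chrome"}).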

Command Execution

All browser interactions flow through a unified execute method:

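```python
# selenium/webdriver/remote/remote_connection.py (Selenium 3.x, abridged)
import logging
import string
from urllib import parse

import urllib3

from selenium.webdriver.remote import utils

LOGGER = logging.getLogger(__name__)


def execute(self, command, params):
    # Look up (HTTP method, path template) for the command
    command_info = self._commands[command]
    # Fill placeholders such as $sessionId and $id from params
    path = string.Template(command_info[1]).substitute(params)
    data = utils.dump_json(params)
    url = '%s%s' % (self._url, path)
    return self._request(command_info[0], url, body=data)


def _request(self, method, url, body=None):
    LOGGER.debug('%s %s %s' % (method, url, body))

    parsed_url = parse.urlparse(url)
    headers = self.get_remote_connection_headers(parsed_url, self.keep_alive)

    # Only POST and PUT requests carry a body
    if body and method != 'POST' and method != 'PUT':
        body = None

    if self.keep_alive:
        # Reuse the pooled connection created when the executor was built
        resp = self._conn.request(method, url, body=body, headers=headers)
    else:
        http = urllib3.PoolManager(timeout=self._timeout)
        resp = http.request(method, url, body=body, headers=headers)

    data = resp.data.decode('UTF-8')
    return utils.load_json(data.strip())
```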
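The resolution step inside execute() can be reproduced without any networking. This sketch uses a hypothetical three-entry COMMANDS table and a hypothetical prepare function; it returns the (method, url, body) triple that would be sent. Unlike the excerpt above, it leaves sessionId out of the body, since that value already travels in the URL.

```python
import json
from string import Template

# Hypothetical subset of the command table: name -> (HTTP method, path template)
COMMANDS = {
    "GET_TITLE": ("GET", "/session/$sessionId/title"),
    "FIND_ELEMENT": ("POST", "/session/$sessionId/element"),
    "QUIT": ("DELETE", "/session/$sessionId"),
}


def prepare(base_url, command, params):
    """Resolve a command to (method, url, body) without sending anything."""
    method, path_template = COMMANDS[command]
    path = Template(path_template).substitute(params)  # fills $sessionId etc.
    body = None
    if method in ("POST", "PUT"):
        # Everything except the session ID goes into the JSON body
        body = json.dumps({k: v for k, v in params.items() if k != "sessionId"})
    return method, base_url + path, body
```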

API Endpoint Mapping

The _commands dictionary maps high-level Selenium commands to HTTP endpoints:

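| Command | Method | Endpoint |
|---|---|---|
| NEW_SESSION | POST | `/session` |
| GET | POST | `/session/$sessionId/url` |
| FIND_ELEMENT | POST | `/session/$sessionId/element` |
| CLICK_ELEMENT | POST | `/session/$sessionId/element/$id/click` |
| GET_TITLE | GET | `/session/$sessionId/title` |
| QUIT | DELETE | `/session/$sessionId` |

Despite its name, the GET command is Selenium's navigation command (driver.get(url)), and it is sent as an HTTP POST.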
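The same table can be read in reverse, which is useful when inspecting captured traffic or writing a logging proxy: given a method and path, find the command and its parameters. This is a hypothetical sketch (ENDPOINTS, identify) that turns each $placeholder into a regex group matching one path segment.

```python
import re

# The endpoint table above: name -> (HTTP method, path template)
ENDPOINTS = {
    "NEW_SESSION": ("POST", "/session"),
    "GET": ("POST", "/session/$sessionId/url"),
    "FIND_ELEMENT": ("POST", "/session/$sessionId/element"),
    "CLICK_ELEMENT": ("POST", "/session/$sessionId/element/$id/click"),
    "GET_TITLE": ("GET", "/session/$sessionId/title"),
    "QUIT": ("DELETE", "/session/$sessionId"),
}


def _to_regex(template):
    # "$name" becomes a named group that matches a single path segment
    return re.compile("^" + re.sub(r"\$(\w+)", r"(?P<\1>[^/]+)", template) + "$")


PATTERNS = [(name, method, _to_regex(path)) for name, (method, path) in ENDPOINTS.items()]


def identify(method, path):
    """Return (command name, path parameters), or (None, {}) if nothing matches."""
    for name, cmd_method, pattern in PATTERNS:
        match = pattern.match(path)
        if match and cmd_method == method:
            return name, match.groupdict()
    return None, {}
```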

Manual HTTP Requests

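Once you understand the protocol, you can talk to the driver directly over HTTP. The snippets below use the requests library and assume chromedriver is already running locally on its default port, 9515 (for example, started by running chromedriver in a terminal).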

Creating a Session

```python
import requests

endpoint = 'http://localhost:9515/session'
payload = {
    # W3C-style capabilities
    "capabilities": {
        "alwaysMatch": {
            "browserName": "chrome"
        },
        "firstMatch": [
            {}
        ]
    },
    # Legacy JSON Wire capabilities, kept for older drivers
    "desiredCapabilities": {
        "platform": "ANY",
        "browserName": "chrome",
        "version": "",
        "chromeOptions": {
            "args": [],
            "extensions": []
        }
    }
}

response = requests.post(endpoint, json=payload).json()
# W3C drivers nest sessionId under "value"; legacy drivers return it at the top level
session_id = response.get('sessionId') or response['value']['sessionId']
print(f"Session created: {session_id}")
```

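A legacy (JSON Wire) driver responds with a structure like this:

```json
{
  "sessionId": "44fdb7b1b048a76c0f625545b0d2567b",
  "status": 0,
  "value": {
    "browserName": "chrome",
    "platform": "Mac OS X",
    "javascriptEnabled": true
  }
}
```

A W3C-mode driver (the default since ChromeDriver 75) instead nests everything under value, as in {"value": {"sessionId": "...", "capabilities": {...}}}, with no top-level status field.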

Navigating to a URL

```python
nav_endpoint = f'http://localhost:9515/session/{session_id}/url'
nav_payload = {
    "url": "https://www.example.com"
}
requests.post(nav_endpoint, json=nav_payload)
```

Locating Elements

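```python
find_endpoint = f'http://localhost:9515/session/{session_id}/element'
find_payload = {
    "using": "css selector",
    "value": "#main-content"
}
element_response = requests.post(find_endpoint, json=find_payload).json()
# W3C drivers key the element reference with this fixed identifier; JSON Wire used "ELEMENT"
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"
element_ref = element_response['value']
element_id = element_ref.get(W3C_ELEMENT_KEY) or element_ref['ELEMENT']
```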

Clicking Elements

```python
click_endpoint = f'http://localhost:9515/session/{session_id}/element/{element_id}/click'
# The element ID travels in the URL; the W3C protocol expects an empty JSON object as the body
requests.post(click_endpoint, json={})
```

Closing the Session

```python
requests.delete(f'http://localhost:9515/session/{session_id}')
```

Complete Low-Level Example

```python
import time

import requests

BASE = 'http://127.0.0.1:9515'

# Launch browser
config = {
    "capabilities": {
        "alwaysMatch": {"browserName": "chrome"},
        "firstMatch": [{}]
    },
    "desiredCapabilities": {
        "browserName": "chrome",
        "platform": "ANY"
    }
}

res = requests.post(f'{BASE}/session', json=config).json()
# W3C drivers nest sessionId under "value"; legacy drivers return it at the top level
sid = res.get('sessionId') or res['value']['sessionId']

try:
    # Navigate
    requests.post(
        f'{BASE}/session/{sid}/url',
        json={"url": "https://www.google.com"}
    )
    time.sleep(2)  # pause so the page is visible before teardown
finally:
    # Tear down, even if navigation failed
    requests.delete(f'{BASE}/session/{sid}')
```

This demonstrates that UI automation is, at bottom, an HTTP API interaction: Selenium simply provides a convenient high-level abstraction over these low-level protocol calls.
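To see how thin that abstraction layer can be, here is a hypothetical MiniDriver class (not Selenium's API) that wraps the raw calls from this section in methods named after Selenium's. The transport parameter is an assumption added for testability: by default it sends real HTTP requests with urllib, but any function taking (method, url, payload) and returning decoded JSON can be injected.

```python
import json
import urllib.request


class MiniDriver:
    """Hypothetical thin client: each method is one WebDriver HTTP call."""

    def __init__(self, base_url, transport=None):
        self.base_url = base_url.rstrip("/")
        self.transport = transport or self._http
        self.session_id = None

    @staticmethod
    def _http(method, url, payload):
        # Default transport: real HTTP via urllib, JSON in and out
        data = None if payload is None else json.dumps(payload).encode("utf-8")
        req = urllib.request.Request(url, data=data, method=method,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def _call(self, method, path, payload=None):
        return self.transport(method, self.base_url + path, payload)

    def start(self, browser_name="chrome"):
        body = self._call("POST", "/session",
                          {"capabilities": {"alwaysMatch": {"browserName": browser_name}}})
        # W3C nests sessionId under "value"; JSON Wire puts it at the top level
        self.session_id = body.get("sessionId") or body["value"]["sessionId"]
        return self.session_id

    def get(self, url):
        self._call("POST", f"/session/{self.session_id}/url", {"url": url})

    def title(self):
        return self._call("GET", f"/session/{self.session_id}/title")["value"]

    def quit(self):
        self._call("DELETE", f"/session/{self.session_id}")
        self.session_id = None
```

With chromedriver running, MiniDriver("http://localhost:9515") followed by start(), get(...), title(), and quit() would replay the low-level example above.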

Key Takeaways

WebDriver operates over HTTP, treating the browser driver as a web service
Every action (navigation, element location, clicks) maps to a specific REST endpoint
The sessionId maintains state across all interactions within a single browser instance
Understanding this architecture enables debugging, proxying, and building custom automation tools

Copyright © fadingcoder.top
