Fading Coder

How Selenium WebDriver Communicates with Browsers

Tech May 8 4

Selenium is a widely used web automation framework that drives browsers by mimicking real user interactions. Understanding its internal communication mechanism helps developers debug issues and build more efficient test frameworks.

Architecture Overview

Selenium's architecture follows a classic client-server model:

  • Client (Test Script): The code written by the tester that issues automation commands
  • Browser Driver: A standalone server process specific to each browser (chromedriver, geckodriver, etc.)
  • Browser: The actual browser instance being controlled

The interaction flow works as follows:

  1. The test script sends an HTTP request to the browser driver's endpoint
  2. The browser driver acts as an HTTP server, receiving and translating these requests
  3. The driver manipulates the browser through its native APIs
  4. Execution results travel back through the same chain in reverse order
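
The four steps above can be sketched with nothing but the Python standard library. In this minimal, hypothetical example a tiny HTTP server stands in for the browser driver (it only echoes the command back instead of driving a browser), and a client plays the role of the test script. The names FakeDriverHandler and round_trip are illustrative and not part of Selenium.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Stands in for a browser driver: accepts a command, replies with JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        command = json.loads(self.rfile.read(length) or b"{}")
        # A real driver would translate the command into browser actions here.
        body = json.dumps({"value": {"received": command, "path": self.path}}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def round_trip(path, payload):
    """Start the fake driver, send one command as the client, return the reply."""
    server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        request = urllib.request.Request(
            "http://127.0.0.1:%d%s" % (server.server_port, path),
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # Bypass any configured proxy so the loopback call goes straight to the server.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            return json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()
```

Calling round_trip("/session/abc/url", {"url": "https://www.example.com"}) returns the echoed command, which is the same request/response shape a real driver exchange has, minus the browser.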

Communication Protocols

HTTP Protocol

HTTP serves as the foundational transport layer. WebDriver uses a client-server architecture in which the test script acts as the client and the browser driver functions as the server. Every command is sent as an HTTP request, and every response comes back as a JSON payload.

JSON Wire Protocol

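Built atop HTTP, the JSON Wire Protocol standardizes the request and response body formats. Commands like findElement and click map to specific HTTP endpoints with well-defined request/response structures. It is the legacy protocol: the W3C WebDriver specification has since replaced it while keeping the same HTTP-plus-JSON design, which is why the session payloads later in this article carry both a legacy and a W3C capabilities block.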

Common HTTP methods used:

  • GET: Retrieves information from the browser (page title, current URL)
  • POST: Sends commands to perform actions (element location, clicking)
  • DELETE: Terminates sessions or closes windows
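
As a rough illustration of how the three methods differ on the wire, the sketch below builds (but does not send) one request of each kind with urllib. The helper name build_request, the placeholder SESSION_ID, and the driver address are assumptions made for the example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9515"  # assumed driver address


def build_request(method, path, payload=None):
    """Build (but do not send) an HTTP request for a WebDriver command."""
    data = None
    headers = {"Accept": "application/json"}
    if method == "POST":
        # POST commands carry a JSON body, even when it is an empty object.
        data = json.dumps(payload or {}).encode("utf-8")
        headers["Content-Type"] = "application/json; charset=utf-8"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# GET reads state, POST performs an action, DELETE ends the session.
get_title = build_request("GET", "/session/SESSION_ID/title")
navigate = build_request("POST", "/session/SESSION_ID/url", {"url": "https://www.example.com"})
quit_session = build_request("DELETE", "/session/SESSION_ID")
```

Only the POST request has a body; GET and DELETE are fully described by their method and path.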

Status Codes

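Under the JSON Wire Protocol, every response body carries a numeric status field in addition to the standard HTTP status code:

  • 0: Success
  • 7: Element not found
  • 11: Element not visible

The W3C WebDriver specification replaces these numbers with string error codes such as "no such element".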
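
A client typically turns a non-zero status into an exception. The following is a minimal sketch of that idea for the three codes listed above. The exception classes and the check_response helper are hypothetical names for illustration, not Selenium's own.

```python
class NoSuchElementError(Exception):
    """Hypothetical error for legacy status 7."""


class ElementNotVisibleError(Exception):
    """Hypothetical error for legacy status 11."""


class UnknownDriverError(Exception):
    """Hypothetical fallback for any other non-zero status."""


STATUS_TO_ERROR = {
    7: NoSuchElementError,
    11: ElementNotVisibleError,
}


def check_response(response):
    """Return the response value on success, raise a mapped error otherwise."""
    status = response.get("status", 0)
    if status == 0:
        return response.get("value")
    value = response.get("value")
    # Error responses usually carry a dict with a "message" key, but not always.
    message = value.get("message", "") if isinstance(value, dict) else str(value)
    error_class = STATUS_TO_ERROR.get(status, UnknownDriverError)
    raise error_class("status %s: %s" % (status, message))
```

Keeping the mapping in a dictionary means supporting another status code is a one-line change.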

Source Code Deep Dive

Driver Initialization

When instantiating a Chrome WebDriver, the process follows a specific sequence:

# selenium/webdriver/chrome/webdriver.py
class WebDriver(RemoteWebDriver):
    def __init__(self, executable_path="chromedriver", port=0,
                 options=None, service_args=None,
                 desired_capabilities=None, service_log_path=None,
                 chrome_options=None, keep_alive=True):
        
        # Merge capabilities from options and desired_capabilities
        if options is None:
            if desired_capabilities is None:
                desired_capabilities = self.create_options().to_capabilities()
        else:
            if desired_capabilities is None:
                desired_capabilities = options.to_capabilities()
            else:
                desired_capabilities.update(options.to_capabilities())

        # Start the driver service
        self.service = Service(
            executable_path,
            port=port,
            service_args=service_args,
            log_path=service_log_path)
        self.service.start()

        # Initialize the remote connection
        RemoteWebDriver.__init__(
            self,
            command_executor=ChromeRemoteConnection(
                remote_server_addr=self.service.service_url,
                keep_alive=keep_alive),
            desired_capabilities=desired_capabilities)
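
The capability-merging branch at the top of the constructor is easier to reason about in isolation. Below is a small sketch of the same precedence rules as a standalone function. DEFAULT_CAPS is a simplified stand-in for whatever create_options().to_capabilities() returns, merge_capabilities is a hypothetical name, and unlike the excerpt it returns a new dictionary rather than updating the caller's.

```python
# Simplified stand-in for create_options().to_capabilities() (an assumption).
DEFAULT_CAPS = {"browserName": "chrome"}


def merge_capabilities(options_caps=None, desired_capabilities=None):
    """Combine capabilities with the same precedence as the constructor excerpt."""
    if options_caps is None:
        if desired_capabilities is None:
            return dict(DEFAULT_CAPS)      # nothing supplied: fall back to defaults
        return dict(desired_capabilities)  # only desired capabilities supplied
    if desired_capabilities is None:
        return dict(options_caps)          # only options supplied
    merged = dict(desired_capabilities)
    merged.update(options_caps)            # options win on conflicting keys
    return merged
```

For example, merging options {"browserName": "chrome", "pageLoadStrategy": "eager"} into desired capabilities {"browserName": "firefox", "platform": "ANY"} keeps platform from the latter but takes browserName from the options.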

Service Startup

The Service class launches the browser driver executable as a subprocess:

# selenium/webdriver/common/service.py
def start(self):
    try:
        cmd = [self.path]
        cmd.extend(self.command_line_args())
        self.process = subprocess.Popen(
            cmd,
            env=self.env,
            close_fds=platform.system() != 'Windows',
            stdout=self.log_file,
            stderr=self.log_file,
            stdin=PIPE)
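    except OSError as err:
        if err.errno == errno.ENOENT:
            raise WebDriverException(
                "'%s' executable needs to be in PATH." % os.path.basename(self.path))
        raise  # any other OSError propagates unchanged
    
    # Wait for the service to become available, giving up after 30 attempts
    count = 0
    while True:
        self.assert_process_still_running()
        if self.is_connectable():
            break
        count += 1
        time.sleep(1)
        if count == 30:
            raise WebDriverException("Can not connect to the Service %s" % self.path)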

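The driver executable (chromedriver) runs as a separate process listening on a specific port. Because the constructor above passes port=0 by default, Selenium picks a free port for it; a chromedriver started by hand listens on 9515 by default.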
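
Two ideas from this step, choosing a free port and polling until the driver accepts connections, can be sketched with plain sockets. The function names below mirror the roles described above, but this is an illustrative approximation under those assumptions, not Selenium's implementation.

```python
import socket
import time


def free_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def is_connectable(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False


def wait_until_connectable(port, timeout=30.0, interval=0.1):
    """Poll the port until it accepts connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_connectable(port):
            return True
        time.sleep(interval)
    return False
```

Using a deadline instead of an attempt counter keeps the total wait bounded even when a single connection attempt is slow.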

Session Establishment

The parent RemoteWebDriver class handles session creation:

# selenium/webdriver/remote/webdriver.py
def start_session(self, capabilities, browser_profile=None):
    w3c_caps = _make_w3c_caps(capabilities)
    parameters = {
        "capabilities": w3c_caps,
        "desiredCapabilities": capabilities
    }
    
    # POST to /session endpoint creates a new browser session
    response = self.execute(Command.NEW_SESSION, parameters)
    
    self.session_id = response['sessionId']
    self.capabilities = response.get('value')

This sends a POST request to the driver's /session endpoint (http://localhost:9515/session when chromedriver listens on its default port) with a JSON payload containing the browser capabilities. The response includes a sessionId used for all subsequent requests.
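
Depending on the protocol dialect the driver speaks, the New Session response comes in one of two shapes: the legacy shape puts sessionId at the top level with capabilities under value, while the W3C shape nests both sessionId and capabilities under value. A small sketch that handles both follows. parse_new_session is a hypothetical helper name.

```python
def parse_new_session(response):
    """Return (session_id, capabilities) from either response shape."""
    if "sessionId" not in response:
        # W3C shape: {"value": {"sessionId": ..., "capabilities": {...}}}
        response = response["value"]
    session_id = response["sessionId"]
    # Legacy shape keeps capabilities under "value"; W3C uses "capabilities".
    capabilities = response.get("value")
    if capabilities is None:
        capabilities = response.get("capabilities")
    return session_id, capabilities
```

Normalizing the response once, right after session creation, lets the rest of a client ignore which dialect the driver used.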

Command Execution

All browser interactions flow through a unified execute method:

# selenium/webdriver/remote/remote_connection.py
def execute(self, command, params):
    command_info = self._commands[command]
    path = string.Template(command_info[1]).substitute(params)
    data = utils.dump_json(params)
    url = '%s%s' % (self._url, path)
    return self._request(command_info[0], url, body=data)


def _request(self, method, url, body=None):
    LOGGER.debug('%s %s %s' % (method, url, body))
    
    parsed_url = parse.urlparse(url)
    headers = self.get_remote_connection_headers(parsed_url, self.keep_alive)
    
    if self.keep_alive:
        resp = self._conn.request(method, url, body=body, headers=headers)
    else:
        http = urllib3.PoolManager(timeout=self._timeout)
        resp = http.request(method, url, body=body, headers=headers)
    
    data = resp.data.decode('UTF-8')
    return utils.load_json(data.strip())

API Endpoint Mapping

The _commands dictionary maps high-level Selenium commands to HTTP endpoints:

Command         Method   Endpoint
NEW_SESSION     POST     /session
GET             POST     /session/$sessionId/url
FIND_ELEMENT    POST     /session/$sessionId/element
CLICK_ELEMENT   POST     /session/$sessionId/element/$id/click
GET_TITLE       GET      /session/$sessionId/title
QUIT            DELETE   /session/$sessionId
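
The $sessionId and $id placeholders are filled in with string.Template, as the execute method shown earlier does. The sketch below reproduces that lookup-and-substitute step for the endpoints in the table. The COMMANDS dictionary, its string keys, and the resolve helper are simplified stand-ins rather than Selenium's actual objects.

```python
import string

# Simplified stand-in for the _commands mapping: name -> (HTTP method, path template).
COMMANDS = {
    "NEW_SESSION": ("POST", "/session"),
    "GET": ("POST", "/session/$sessionId/url"),
    "FIND_ELEMENT": ("POST", "/session/$sessionId/element"),
    "CLICK_ELEMENT": ("POST", "/session/$sessionId/element/$id/click"),
    "GET_TITLE": ("GET", "/session/$sessionId/title"),
    "QUIT": ("DELETE", "/session/$sessionId"),
}


def resolve(command, params, base_url="http://localhost:9515"):
    """Look up a command and fill its path template from params."""
    method, template = COMMANDS[command]
    # substitute() raises KeyError if a placeholder has no matching parameter.
    path = string.Template(template).substitute(params)
    return method, base_url + path
```

For instance, resolve("CLICK_ELEMENT", {"sessionId": "abc", "id": "e1"}) yields a POST to http://localhost:9515/session/abc/element/e1/click. Extra keys in params are ignored by the template, which is why the same dictionary can also serve as the request body.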

Manual HTTP Requests

Understanding the underlying protocol enables direct HTTP interaction with the driver.

Creating a Session

import requests

endpoint = 'http://localhost:9515/session'
payload = {
    "capabilities": {
        "alwaysMatch": {
            "browserName": "chrome"
        },
        "firstMatch": [
            {}
        ]
    },
    "desiredCapabilities": {
        "platform": "ANY",
        "browserName": "chrome",
        "version": "",
        "chromeOptions": {
            "args": [],
            "extensions": []
        }
    }
}

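response = requests.post(endpoint, json=payload).json()
# Legacy drivers return sessionId at the top level; W3C-compliant drivers nest it under "value"
session_id = response.get('sessionId') or response['value']['sessionId']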
print(f"Session created: {session_id}")

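Response structure (legacy JSON Wire Protocol shape; W3C-compliant drivers instead nest sessionId and a capabilities object under value):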

{
  "sessionId": "44fdb7b1b048a76c0f625545b0d2567b",
  "status": 0,
  "value": {
    "browserName": "chrome",
    "platform": "Mac OS X",
    "javascriptEnabled": true
  }
}

Navigating to a URL

nav_endpoint = f'http://localhost:9515/session/{session_id}/url'
nav_payload = {
    "url": "https://www.example.com"
}
requests.post(nav_endpoint, json=nav_payload)

Locating Elements

find_endpoint = f'http://localhost:9515/session/{session_id}/element'
find_payload = {
    "using": "css selector",
    "value": "#main-content"
}
element_response = requests.post(find_endpoint, json=find_payload).json()
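# W3C-compliant drivers key the element reference by a fixed identifier; legacy drivers use "ELEMENT"
element_value = element_response['value']
element_id = element_value.get('element-6066-11e4-a52e-4f735466cecf') or element_value['ELEMENT']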

Clicking Elements

click_endpoint = f'http://localhost:9515/session/{session_id}/element/{element_id}/click'
requests.post(click_endpoint, json={"id": element_id})

Closing the Session

requests.delete(f'http://localhost:9515/session/{session_id}')

Complete Low-Level Example

import requests
import time

# Launch browser
config = {
    "capabilities": {
        "alwaysMatch": {"browserName": "chrome"},
        "firstMatch": [{}]
    },
    "desiredCapabilities": {
        "browserName": "chrome",
        "platform": "ANY"
    }
}

res = requests.post('http://127.0.0.1:9515/session', json=config).json()
# sessionId sits at the top level (legacy) or under "value" (W3C)
sid = res.get('sessionId') or res['value']['sessionId']

# Navigate
requests.post(
    f'http://127.0.0.1:9515/session/{sid}/url',
    json={"url": "https://www.google.com"}
)

time.sleep(2)

# Tear down
requests.delete(f'http://127.0.0.1:9515/session/{sid}')

This demonstrates that UI automation is fundamentally HTTP-based API interaction. Selenium simply provides a convenient high-level abstraction over these low-level protocols.
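
To make the "abstraction over HTTP" point concrete, here is a minimal sketch of what such a wrapper might look like. MiniDriver and http_transport are hypothetical names, the default address assumes a chromedriver already listening on port 9515, and only four commands are covered. The transport is injectable so the class can be exercised without a running driver.

```python
import json
import urllib.request


def http_transport(method, url, payload=None):
    """Send one request with urllib and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


class MiniDriver:
    """Tiny illustrative wrapper: each method is one HTTP call."""

    def __init__(self, base_url="http://127.0.0.1:9515", transport=http_transport):
        self.base_url = base_url
        self.transport = transport
        self.session_id = None

    def _url(self, suffix):
        return "%s/session/%s%s" % (self.base_url, self.session_id, suffix)

    def start(self, browser_name="chrome"):
        payload = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
        reply = self.transport("POST", self.base_url + "/session", payload)
        # sessionId is top-level in legacy replies, nested under "value" in W3C ones.
        body = reply if "sessionId" in reply else reply["value"]
        self.session_id = body["sessionId"]
        return self.session_id

    def get(self, url):
        self.transport("POST", self._url("/url"), {"url": url})

    def title(self):
        return self.transport("GET", self._url("/title"))["value"]

    def quit(self):
        self.transport("DELETE", self._url(""))
        self.session_id = None
```

With a driver running, usage would be along the lines of driver = MiniDriver(); driver.start(); driver.get("https://www.example.com"); driver.quit(). Passing a fake transport instead makes it possible to inspect exactly which requests each method produces.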

Key Takeaways

  • WebDriver operates over HTTP, treating the browser driver as a web service
  • Every action (navigation, element location, clicks) maps to a specific REST endpoint
  • The sessionId identifies one browser session, so the driver can tie every subsequent request to the same browser instance
  • Understanding this architecture enables debugging, proxying, and building custom automation tools
