Automating Captcha Entry in Selenium with Beginner-Friendly Tesseract OCR

Tech May 16 14

Tesseract OCR Initial Setup & Basic Script

from PIL import Image
import pytesseract

# Tesseract page segmentation modes (--psm), abbreviated reference
# 1 = Automatic page segmentation with orientation and script detection (OSD)
# 7 = Treat the image as a single text line
# 10 = Treat the image as a single character

def extract_simple_captcha(psm_mode=1, img_path="temp_captcha.png"):
    # Update path to match your Tesseract installation directory
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    # Convert input to 8-bit grayscale to reduce noise interference
    grayscale_img = Image.open(img_path).convert("L")
    # Extract text with specified configuration
    detected_text = pytesseract.image_to_string(grayscale_img, config=f"--psm {psm_mode}").strip()
    return detected_text


if __name__ == "__main__":
    result = extract_simple_captcha()
    print(f"Detected captcha value: {result}")

Tesseract OCR is an open-source optical character recognition engine that Python can call through the pytesseract library. Installation takes two steps: first, download and install the Tesseract executable from its official repository or a trusted software source; second, if the executable is not on your system’s PATH environment variable, specify the full file path to tesseract.exe in your code.
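
The PATH check described above can be automated. The following is a minimal sketch rather than part of the original script: the helper name `resolve_tesseract_cmd` and its `which` parameter are hypothetical, and the fallback is simply the Windows install location used in the script above.

```python
import shutil

# Install location used in the script above; adjust it for your machine.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(fallback=DEFAULT_WINDOWS_PATH, which=shutil.which):
    """Return the tesseract executable found on PATH, or `fallback` if none is found.

    `which` is injectable only so the lookup can be exercised without a real install.
    """
    found = which("tesseract")
    return found if found else fallback
```

The result could then be assigned once at startup, for example `pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()`, instead of being set inside every OCR call.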

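For straightforward captchas, mode 1 (automatic page segmentation with orientation and script detection) often works without specialized image preprocessing. If the captcha is a single row of characters and the results are unreliable, mode 7 (single text line) is worth trying.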
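
Because the best segmentation mode depends on the captcha, one option is to try several modes and keep the first result that has the expected shape. This is a sketch under stated assumptions: the captcha is assumed to be four alphanumeric characters, and the names `looks_like_captcha` and `first_valid_result` are hypothetical. `run_ocr` is any callable that takes a mode number and returns the recognized text.

```python
import re


def looks_like_captcha(text, length=4):
    """True if `text` is exactly `length` ASCII letters or digits (assumed captcha format)."""
    return re.fullmatch(rf"[A-Za-z0-9]{{{length}}}", text) is not None


def first_valid_result(run_ocr, modes=(1, 7), length=4):
    """Call `run_ocr(mode)` for each mode in order; return (mode, text) for the first plausible result.

    Returns (None, "") if no mode produces text in the expected format.
    """
    for mode in modes:
        text = run_ocr(mode).strip()
        if looks_like_captcha(text, length):
            return mode, text
    return None, ""
```

With the script above, `run_ocr` could be `lambda mode: extract_simple_captcha(psm_mode=mode)`.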

Selenium Integration for Dynamic Captcha Handling

import time
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

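# Assume `driver` is a pre-configured Chrome WebDriver instance, `attempt_login` is your custom login trigger,
# `check_verify_error` is your own helper that returns True when the page shows a captcha error,
# and `extract_simple_captcha` is the function defined in the setup script above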

def capture_and_submit_captcha():
    time.sleep(0.8)
    try:
        # Locate captcha image element
        captcha_elem = driver.find_element(By.XPATH, "//img[contains(@id, 'verifyImg')]")
        # Capture only the captcha element, avoid full-page screenshots that waste time/space
        captcha_elem.screenshot("temp_captcha.png")
        time.sleep(0.5)
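        # Get cleaned OCR result; an empty string means OCR failed, so reload and retry
        captcha_value = extract_simple_captcha()
        if not captcha_value:
            driver.refresh()
            time.sleep(1.5)
            attempt_login()
            return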
        # Locate input field and send text
        captcha_input = driver.find_element(By.XPATH, "//input[@name='captchaInput']")
        captcha_input.clear()
        captcha_input.send_keys(captcha_value)
        time.sleep(1.2)
        # Verify login success/failure
        if check_verify_error():
            driver.refresh()
            time.sleep(1.5)
            attempt_login()
    except NoSuchElementException:
        print("Captcha or input field not found; retrying page load...")
        driver.refresh()
        time.sleep(2)
        attempt_login()

Dynamic captchas (those that serve a new image each time the image source is requested) cannot be reliably fetched by downloading the image URL directly, because the separate request returns a different image from the one shown on the page. Instead, use Selenium’s element-level screenshot method to capture the exact captcha rendered in the automation session.
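
The element screenshot can also be kept in memory instead of being written to a temporary file. The sketch below relies on the `screenshot_as_png` property of a Selenium WebElement, which returns PNG bytes; the helper names `capture_captcha_bytes` and `to_grayscale_image` are hypothetical.

```python
import io


def capture_captcha_bytes(driver, by, value):
    """Return the located element's current rendering as PNG bytes."""
    element = driver.find_element(by, value)
    return element.screenshot_as_png


def to_grayscale_image(png_bytes):
    """Decode PNG bytes into an 8-bit grayscale Pillow image ready for OCR."""
    # Imported here so the capture helper above has no Pillow dependency.
    from PIL import Image

    return Image.open(io.BytesIO(png_bytes)).convert("L")
```

A call such as `capture_captcha_bytes(driver, By.XPATH, "//img[contains(@id, 'verifyImg')]")` would reuse the locator from the script above, and the resulting image can be passed straight to `pytesseract.image_to_string`.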

If OCR returns an empty or invalid string, or if a verification error appears after submission, trigger a page refresh or captcha refresh and restart the login attempt.
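
The retry rule above can be written as a bounded loop so that a captcha that never reads correctly does not retry forever. This is a minimal sketch, not the original author's code: `solve_with_retries` and its three callables are hypothetical stand-ins for your own OCR, submission, and refresh steps, and the limit of three attempts is an arbitrary assumption.

```python
def solve_with_retries(read_captcha, submit, refresh, max_attempts=3):
    """Try to pass the captcha up to `max_attempts` times.

    read_captcha() -> str  : OCR result for the captcha currently displayed
    submit(text)   -> bool : True if the site accepted the value
    refresh()              : reloads the page or requests a new captcha image
    Returns the accepted value, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        text = read_captcha().strip()
        if text and submit(text):
            return text
        # Empty OCR output or a rejected value: get a fresh captcha and try again.
        refresh()
    return None
```

Here `submit` would type the value, trigger the login, and return the negation of the verification-error check, while `refresh` could call `driver.refresh()`.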

Copyright © fadingcoder.top
