Fading Coder



Automating Captcha Entry in Selenium with Beginner-Friendly Tesseract OCR

Tech May 16 1

Tesseract OCR Initial Setup & Basic Script

from PIL import Image
import pytesseract

# Tesseract page segmentation modes (PSM), short reference for common captcha cases
# 1 = Automatic page segmentation with orientation and script detection (default used below)
# 7 = Treat the image as a single text line
# 10 = Treat the image as a single character

def extract_simple_captcha(psm_mode=1, img_path="temp_captcha.png"):
    # Update this path to match your Tesseract installation directory;
    # the assignment is unnecessary if tesseract is already on your PATH
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    # Convert to 8-bit grayscale so color variation does not interfere with recognition
    grayscale_img = Image.open(img_path).convert("L")
    # Extract text with specified configuration
    detected_text = pytesseract.image_to_string(grayscale_img, config=f"--psm {psm_mode}").strip()
    return detected_text


if __name__ == "__main__":
    result = extract_simple_captcha()
    print(f"Detected captcha value: {result}")

Tesseract OCR is an open-source optical character recognition engine that Python can drive through the pytesseract library. Setup involves two steps: first, install the Tesseract executable from its official repository or another trusted source; second, if the executable is not on your system’s PATH environment variable, set the full path to tesseract.exe in your code. The pytesseract and Pillow packages must also be installed in your Python environment.
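The PATH check described above can be automated rather than hard-coded. The following is a minimal sketch, not the article's own method: `resolve_tesseract_cmd` and `configure_pytesseract` are hypothetical helper names, and the fallback path is only the example location used in the script above.

```python
import shutil

# Example fallback location only; adjust to your own installation directory.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(fallback=DEFAULT_WINDOWS_PATH, lookup=shutil.which):
    """Return the Tesseract executable found on PATH, else the fallback path."""
    found = lookup("tesseract")
    return found if found else fallback


def configure_pytesseract():
    """Point pytesseract at the resolved executable and return the path used."""
    # Imported lazily so the lookup helper works even without pytesseract installed.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
    return pytesseract.pytesseract.tesseract_cmd
```

Calling `configure_pytesseract()` once at startup would replace the hard-coded assignment inside `extract_simple_captcha`, so the same script can run on machines where Tesseract is already on PATH.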

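For most straightforward captchas, mode 1 (automatic page segmentation with orientation and script detection) is usually sufficient without specialized image preprocessing. If it returns poor results on a captcha that is a single row of characters, mode 7 is worth trying.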
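One way to combine the modes is to try them in order and keep the first usable result. This is a hedged sketch under assumptions: `run_ocr` is a hypothetical callable standing in for something like `extract_simple_captcha`, and `looks_valid` is an assumed check that you would tailor to your captcha format.

```python
def first_usable_result(run_ocr, modes=(1, 7), looks_valid=bool):
    """Try each segmentation mode in order; return (mode, text) for the first usable result.

    run_ocr is a hypothetical callable that takes a PSM mode and returns a string.
    Returns (None, "") when no mode yields text that passes looks_valid.
    """
    for mode in modes:
        text = run_ocr(mode).strip()
        if looks_valid(text):
            return mode, text
    return None, ""
```

With the script above, usage could look like `first_usable_result(lambda mode: extract_simple_captcha(psm_mode=mode))`.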

Selenium Integration for Dynamic Captcha Handling

import time
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

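# Assumes `driver` is a pre-configured Chrome WebDriver instance, `attempt_login` is your custom
# login trigger, `check_verify_error` is your own check that returns True when the site reports
# a verification error, and `extract_simple_captcha` is the function from the script above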

def capture_and_submit_captcha():
    time.sleep(0.8)
    try:
        # Locate captcha image element
        captcha_elem = driver.find_element(By.XPATH, "//img[contains(@id, 'verifyImg')]")
        # Capture only the captcha element, avoid full-page screenshots that waste time/space
        captcha_elem.screenshot("temp_captcha.png")
        time.sleep(0.5)
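        # Get cleaned OCR result
        captcha_value = extract_simple_captcha()
        # An empty OCR result cannot be valid, so reload and retry instead of submitting it
        if not captcha_value:
            driver.refresh()
            time.sleep(1.5)
            attempt_login()
            return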
        # Locate input field and send text
        captcha_input = driver.find_element(By.XPATH, "//input[@name='captchaInput']")
        captcha_input.clear()
        captcha_input.send_keys(captcha_value)
        time.sleep(1.2)
        # Verify login success/failure
        if check_verify_error():
            driver.refresh()
            time.sleep(1.5)
            attempt_login()
    except NoSuchElementException:
        print("Captcha or input field not found; retrying page load...")
        driver.refresh()
        time.sleep(2)
        attempt_login()

Dynamic captchas (those that serve a new image each time the image source is loaded) cannot be reliably fetched with a direct URL download, because the separate request typically returns a different captcha from the one shown in the browser. Instead, use Selenium’s element-level screenshot method to capture exactly the captcha displayed during the automation session.
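The element screenshot does not have to go through a temporary file. As a sketch under assumptions, the snippet below reads the image through Selenium's `screenshot_as_png` property and hands the bytes to Pillow in memory; `captcha_png_bytes` and `ocr_element` are hypothetical helper names, and the PNG signature check is an extra safeguard rather than something Selenium requires.

```python
import io

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def captcha_png_bytes(element):
    """Return the element's screenshot as PNG bytes without writing to disk."""
    data = element.screenshot_as_png  # property available on Selenium WebElement objects
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("element screenshot is not PNG data")
    return data


def ocr_element(element, psm_mode=1):
    """Run Tesseract on the element's screenshot and return the stripped text."""
    # Imported lazily so captcha_png_bytes can be used and tested on its own.
    from PIL import Image
    import pytesseract

    with Image.open(io.BytesIO(captcha_png_bytes(element))) as img:
        grayscale = img.convert("L")
    return pytesseract.image_to_string(grayscale, config=f"--psm {psm_mode}").strip()
```

Skipping the file also removes the need to wait for `temp_captcha.png` to be written before OCR starts.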

If OCR returns an empty or invalid string, or if a verification error appears after submission, trigger a page refresh or captcha refresh and restart the login attempt.
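The retry rule above can be written as a bounded loop so that a captcha the OCR never reads correctly does not cause endless retries. This is a minimal sketch, not the article's implementation: `read_captcha`, `submit`, and `refresh` are hypothetical callables you would wire to your own Selenium code, and the four-character alphanumeric format is an assumption about the target site.

```python
import re


def clean_captcha_text(raw):
    """Drop whitespace and any character that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", raw)


def is_plausible(text, expected_length=4):
    # expected_length is an assumption about the captcha format of the target site
    return len(text) == expected_length


def solve_with_retries(read_captcha, submit, refresh, max_attempts=3, expected_length=4):
    """Return the accepted captcha text, or None after max_attempts failures.

    read_captcha() returns raw OCR text, submit(text) returns True when the site
    accepts the value, and refresh() loads a new captcha image.
    """
    for _ in range(max_attempts):
        text = clean_captcha_text(read_captcha())
        if is_plausible(text, expected_length) and submit(text):
            return text
        refresh()  # get a fresh captcha before the next attempt
    return None
```

Capping the attempts is a design choice: repeated failures usually mean the captcha style needs preprocessing or a different segmentation mode, which more retries will not fix.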
