Automating Captcha Entry in Selenium with Beginner-Friendly Tesseract OCR
Tesseract OCR Initial Setup & Basic Script
from PIL import Image
import pytesseract

# Tesseract page segmentation modes, simplified reference (common use cases highlighted)
# 1 = Automatic page segmentation with orientation and script detection (the default used here for simple captchas)
# 7 = Treat image strictly as a single text line
# 10 = Single character extraction only

def extract_simple_captcha(psm_mode=1, img_path="temp_captcha.png"):
    # Update path to match your Tesseract installation directory
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    # Convert input to 8-bit grayscale to reduce noise interference
    grayscale_img = Image.open(img_path).convert("L")
    # Extract text with specified configuration
    detected_text = pytesseract.image_to_string(grayscale_img, config=f"--psm {psm_mode}").strip()
    return detected_text

if __name__ == "__main__":
    result = extract_simple_captcha()
    print(f"Detected captcha value: {result}")
Tesseract OCR is an open-source optical character recognition engine that Python can call through the pytesseract library. Setup involves two steps: first, install the Tesseract executable from its official repository or a trusted software hub; second, if the executable is not on your system’s PATH environment variable, set the full path to tesseract.exe in your code. The pytesseract and Pillow packages used in the script also need to be installed in your Python environment.
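The path lookup described above can be sketched with the standard library alone. This is a minimal illustration, not part of pytesseract: the helper name `find_tesseract` and the fallback location are assumptions, so adjust them to your own system.

```python
import os
import shutil


def find_tesseract(fallbacks=(r"C:\Program Files\Tesseract-OCR\tesseract.exe",)):
    """Return a usable path to the Tesseract executable, or None if none is found."""
    # Prefer an executable that is already reachable through PATH
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise check explicit install locations (assumed; edit for your machine)
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    cmd = find_tesseract()
    if cmd is None:
        print("Tesseract not found; install it or add its full path to `fallbacks`.")
    else:
        # With pytesseract installed, the result would be assigned like this:
        # pytesseract.pytesseract.tesseract_cmd = cmd
        print(f"Using Tesseract at: {cmd}")
```

Resolving the path once at startup avoids hard-coding a Windows-only location and makes a missing installation show up before any OCR call is attempted.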
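For most straightforward captchas, mode 1 (automatic page segmentation with orientation and script detection) is usually sufficient without specialized image preprocessing. If the captcha is a single row of characters and mode 1 returns nothing, mode 7 (single text line) is worth trying.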
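When one segmentation mode returns nothing for a given image, one option is to try the next mode from the reference list. The sketch below assumes a hypothetical helper, `read_with_fallback`, that receives the OCR call as a function so the fallback logic can be exercised without Tesseract installed; the mode order (1, then 7) is an illustrative choice.

```python
def read_with_fallback(ocr, modes=(1, 7)):
    """Try each segmentation mode in order.

    `ocr` is any callable that takes a Tesseract config string and returns text.
    Returns (mode, text) for the first non-empty result, or (None, "") if all fail.
    """
    for mode in modes:
        text = ocr(f"--psm {mode}").strip()
        if text:
            return mode, text
    return None, ""


if __name__ == "__main__":
    # Stand-in OCR function for demonstration: mode 1 finds nothing, mode 7 succeeds
    def fake_ocr(config):
        return "" if config == "--psm 1" else " AB12 \n"

    print(read_with_fallback(fake_ocr))

    # With pytesseract and Pillow installed, the real call would look like:
    # img = Image.open("temp_captcha.png").convert("L")
    # read_with_fallback(lambda cfg: pytesseract.image_to_string(img, config=cfg))
```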
Selenium Integration for Dynamic Captcha Handling
import time
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

# Assume `driver` is a pre-configured Chrome WebDriver instance, `attempt_login` is your custom login trigger,
# and `check_verify_error` is your own helper that returns True when the page shows a captcha error
def capture_and_submit_captcha():
    time.sleep(0.8)
    try:
        # Locate captcha image element
        captcha_elem = driver.find_element(By.XPATH, "//img[contains(@id, 'verifyImg')]")
        # Capture only the captcha element, avoid full-page screenshots that waste time/space
        captcha_elem.screenshot("temp_captcha.png")
        time.sleep(0.5)
        # Get cleaned OCR result
        captcha_value = extract_simple_captcha()
        # Locate input field and send text
        captcha_input = driver.find_element(By.XPATH, "//input[@name='captchaInput']")
        captcha_input.clear()
        captcha_input.send_keys(captcha_value)
        time.sleep(1.2)
        # Verify login success/failure
        if check_verify_error():
            driver.refresh()
            time.sleep(1.5)
            attempt_login()
    except NoSuchElementException:
        print("Captcha or input field not found; retrying page load...")
        driver.refresh()
        time.sleep(2)
        attempt_login()
Dynamic captchas (those that serve a new image each time the image source is loaded or requested) cannot be reliably fetched with a direct URL download, because the downloaded image will not match the one shown in the browser. Instead, use Selenium’s built-in element screenshot method to capture the exact captcha displayed during the automation session.
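As a variation on the temp-file approach, the element screenshot can also be kept in memory. The sketch below relies on the `screenshot_as_png` property of a Selenium WebElement, which returns PNG bytes; the helper name `grab_captcha_png` and the signature check are illustrative additions, and the demo uses a stand-in object rather than a live browser.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def grab_captcha_png(element):
    """Return the element's screenshot as PNG bytes without writing a temp file."""
    data = element.screenshot_as_png  # WebElement property that returns bytes
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("Element screenshot is not valid PNG data")
    return data


if __name__ == "__main__":
    # Stand-in for a WebElement so the sketch runs without a browser
    class FakeElement:
        screenshot_as_png = PNG_SIGNATURE + b"fake image payload"

    png_bytes = grab_captcha_png(FakeElement())
    print(f"Captured {len(png_bytes)} bytes")

    # With Pillow installed, the bytes can be opened directly for OCR:
    # import io
    # img = Image.open(io.BytesIO(png_bytes)).convert("L")
```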
If OCR returns an empty or invalid string, or if a verification error appears after submission, refresh the page or the captcha and restart the login attempt.
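The retry rule above can be sketched as a bounded loop, so a captcha that never reads correctly does not retry forever. Everything here is an assumption for illustration: the names `is_plausible` and `solve_with_retries`, the 4 to 6 alphanumeric character pattern, and the three injected callables, which would wrap your own OCR, submission, and refresh steps.

```python
import re


def is_plausible(text, pattern=r"[A-Za-z0-9]{4,6}"):
    """Return True if the OCR text matches the expected captcha shape (assumed pattern)."""
    return re.fullmatch(pattern, text) is not None


def solve_with_retries(read_captcha, submit, refresh, max_attempts=3):
    """Read, validate, and submit a captcha, refreshing after each failed attempt.

    read_captcha() returns OCR text, submit(text) returns True on success,
    and refresh() loads a new captcha. Returns the accepted text or None.
    """
    for _ in range(max_attempts):
        text = read_captcha().strip()
        # Skip submission entirely when the OCR result is empty or malformed
        if is_plausible(text) and submit(text):
            return text
        refresh()
    return None


if __name__ == "__main__":
    # Stand-in callables: two bad reads, then a good one
    reads = iter(["", "x!", "AB12"])
    accepted = solve_with_retries(
        read_captcha=lambda: next(reads),
        submit=lambda text: text == "AB12",
        refresh=lambda: None,
    )
    print(f"Accepted captcha: {accepted}")
```

Validating before submitting saves a round trip to the server for results that cannot be correct, and the attempt cap keeps a persistent OCR failure from looping indefinitely.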