Automated Meme Battles: Scraping, Searching, and Sending Stickers with Python

Tech May 9 14

Meme battles (斗图) demand instant replies with the right reaction image. We can automate the entire pipeline in three stages: collect a large sticker dataset from a public website, search it locally by keyword, and hook into a WeChat messaging interface to send the matching images automatically.

Scraping Stickers from Doutula

The website http://www.doutula.com hosts thousands of stickers across many paginated gallery pages. Each page follows a simple structure, making it easy to parse with requests and a regex. The script below fetches images from a range of pages concurrently using ThreadPoolExecutor, extracts the image URL and caption, cleans the filename, and saves to a local doutula folder.

import requests
import re
import os
from concurrent.futures import ThreadPoolExecutor

def fetch_and_save(page):
    base_url = 'http://www.doutula.com/photo/list/?page='
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Accept': 'text/html,application/xhtml+xml,*/*',
        'Accept-Language': 'zh-CN,zh;q=0.9'
    }
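    target = f"{base_url}{page}"
    try:
        resp = requests.get(target, headers=headers, timeout=10)
        resp.raise_for_status()
    except requests.RequestException as e:
        # Report the failed page here; errors raised inside pool.map workers are otherwise easy to miss
        print(f"Failed page {page}: {e}")
        return
    content = resp.text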

    # Capture image URL and alt text simultaneously
    pattern = re.compile(r'data-original="(.*?)".*?alt="(.*?)"', re.DOTALL)
    matches = pattern.findall(content)

    os.makedirs('doutula', exist_ok=True)
    for img_url, alt_text in matches:
        # Clean alt text to be a valid filename
        safe_name = re.sub(r'[\\/:*?"<>|《》。？！.!&\#（）()]', '', alt_text)
        ext = img_url.split('.')[-1].split('?')[0]  # handle possible query strings
        file_path = os.path.join('doutula', f"{safe_name}.{ext}")
        try:
            # Download and save the image
            img_data = requests.get(img_url, headers=headers, timeout=10).content
            with open(file_path, 'wb') as f:
                f.write(img_data)
            print(f"Saved {file_path}")
        except Exception as e:
            print(f"Failed {img_url}: {e}")

if __name__ == '__main__':
    pages = range(1, 51)    # adjust range as needed
    with ThreadPoolExecutor(max_workers=10) as pool:
        pool.map(fetch_and_save, pages)
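
The parsing and filename-cleaning steps can be exercised without touching the network. The sketch below reuses the regex and character class from the script above. The helper name `parse_stickers`, the sample HTML, and the `sticker` fallback for captions that clean down to nothing are assumptions for illustration; they are not part of the original script and say nothing about Doutula's current markup.

```python
import re

IMG_PATTERN = re.compile(r'data-original="(.*?)".*?alt="(.*?)"', re.DOTALL)
UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|《》。？！.!&\#（）()]')


def parse_stickers(html):
    """Return (url, filename) pairs extracted from a gallery page's HTML."""
    results = []
    for img_url, alt_text in IMG_PATTERN.findall(html):
        # Strip characters that are invalid or awkward in filenames;
        # fall back to a placeholder name (hypothetical) if nothing is left
        safe_name = UNSAFE_CHARS.sub('', alt_text).strip() or 'sticker'
        # Take the extension from the URL, ignoring any query string
        ext = img_url.split('.')[-1].split('?')[0]
        results.append((img_url, f"{safe_name}.{ext}"))
    return results


# Made-up markup that mimics the attributes the regex looks for
sample_html = '''
<img data-original="http://example.com/a/1.jpg?x=1" alt="失望！">
<img data-original="http://example.com/b/2.gif" alt="what?!">
'''
for url, name in parse_stickers(sample_html):
    print(url, name)
```

Because two stickers can share a caption, a real run may still overwrite files with the same cleaned name; appending a counter or a hash of the URL would be one way around that.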

Local Fuzzy Search with Glob

After downloading thousands of stickers, we want to quickly find images whose filenames contain a given keyword. The standard library's glob module can do this with a wildcard pattern:

import glob
import os

keyword = "失望"
sticker_dir = os.path.join(os.getcwd(), "doutula")
pattern = os.path.join(sticker_dir, f"*{keyword}*.*")
for path in glob.glob(pattern):
    print(path)

Alternatively, with pathlib:

from pathlib import Path

keyword = "失望"
folder = Path('doutula')
# Path.glob yields Path objects for every filename containing the keyword
for img in folder.glob(f'*{keyword}*.*'):
    print(img)

Both approaches return the matching sticker paths, ready to be sent.
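
One caveat: glob treats `*`, `?`, and `[` in the keyword as wildcards, so a keyword containing them may match the wrong files or none at all. A minimal sketch, assuming a hypothetical helper `search_stickers` and a throwaway temp folder with made-up filenames, that uses the standard library's `glob.escape` to match the keyword literally:

```python
import glob
import os
import tempfile


def search_stickers(folder, keyword):
    """Return sorted paths in `folder` whose filename contains `keyword` literally."""
    # Escape folder and keyword so glob metacharacters are treated as plain text
    pattern = os.path.join(glob.escape(folder), f"*{glob.escape(keyword)}*.*")
    return sorted(glob.glob(pattern))


# Demo in a temporary folder with made-up filenames
with tempfile.TemporaryDirectory() as tmp:
    for name in ("失望的猫.jpg", "开心[1].gif", "开心1.png"):
        open(os.path.join(tmp, name), "wb").close()

    print([os.path.basename(p) for p in search_stickers(tmp, "失望")])
    print([os.path.basename(p) for p in search_stickers(tmp, "[1]")])
```

Without the escaping, the keyword `[1]` would be read as a character class and match any filename containing the digit 1.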

WeChat Automation with itchat

To participate in a meme battle, we use itchat to log into Web WeChat, listen for incoming text messages, and reply with up to three relevant stickers. The script matches the message text against local filenames and sends the first matches with a small delay for a natural feel.

import itchat
import glob
import time
import os

def find_stickers(keyword, limit=3):
    directory = os.path.join(os.getcwd(), 'doutula')
    pattern = os.path.join(directory, f'*{keyword}*.*')
    results = []
    for path in glob.glob(pattern):
        results.append(path)
        if len(results) >= limit:
            break
    return results

# itchat identifies text messages by the constant itchat.content.TEXT
@itchat.msg_register(itchat.content.TEXT)
def reply_with_meme(msg):
    kw = msg.text.strip()
    if not kw:
        return
    stickers = find_stickers(kw)
    if not stickers:
        # fallback: send a random sticker or do nothing
        pass
    else:
        for sticker in stickers:
            msg.user.send_image(sticker)
            time.sleep(0.3)

if __name__ == '__main__':
    itchat.auto_login(hotReload=True)
    itchat.run()
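
The `pass` branch above leaves the no-match case open. One possible fallback, sketched under the assumption that any random sticker is an acceptable reply, is to move the selection logic into a helper that can be tested without logging in to WeChat. The name `pick_stickers` and its `rng` parameter are hypothetical.

```python
import glob
import os
import random
import tempfile


def pick_stickers(keyword, folder, limit=3, rng=random):
    """Return up to `limit` paths matching `keyword`, or one random sticker if none match."""
    base = glob.escape(folder)
    matches = sorted(glob.glob(os.path.join(base, f"*{glob.escape(keyword)}*.*")))
    if matches:
        return matches[:limit]
    # Fallback (an assumption, not the original behaviour): any sticker at all
    everything = sorted(glob.glob(os.path.join(base, "*.*")))
    return [rng.choice(everything)] if everything else []


# Demo in a temporary folder with made-up filenames
with tempfile.TemporaryDirectory() as tmp:
    for name in ("失望1.jpg", "失望2.jpg", "开心.gif"):
        open(os.path.join(tmp, name), "wb").close()
    print(pick_stickers("失望", tmp))     # keyword matches
    print(pick_stickers("不存在", tmp))   # no match, falls back to a random sticker
```

Inside `reply_with_meme`, the returned paths could be passed to `msg.user.send_image` in the same loop as before.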

This setup enables fully automatic meme responses. The collection and search steps are independent, so you can rebuild the local library as needed, and the bot will always reply with matching (and sometimes hilarious) images.

Tags: Python
