Automated Meme Battles: Scraping, Searching, and Sending Stickers with Python
Meme battles (斗图) demand instant replies with the right reaction image. We can automate the entire pipeline in three stages: collect a large sticker dataset from a public website, search it locally by keyword, and integrate with a WeChat messaging interface to send the matching images automatically.
Scraping Stickers from Doutula
The website http://www.doutula.com hosts thousands of stickers across many paginated gallery pages. Each page follows a simple structure, so it can be fetched with requests and parsed with a single regex. The script below processes a range of pages concurrently using ThreadPoolExecutor: for each image it extracts the URL and caption, cleans the caption into a valid filename, and saves the file to a local doutula folder.
import requests
import re
import os
from concurrent.futures import ThreadPoolExecutor


def fetch_and_save(page):
    base_url = 'http://www.doutula.com/photo/list/?page='
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Accept': 'text/html,application/xhtml+xml,*/*',
        'Accept-Language': 'zh-CN,zh;q=0.9'
    }
    target = f"{base_url}{page}"
    try:
        resp = requests.get(target, headers=headers, timeout=10)
        resp.raise_for_status()
    except requests.RequestException as e:
        # Report the failed page here; pool.map would otherwise hide the error
        print(f"Failed page {page}: {e}")
        return
    content = resp.text
    # Capture image URL and alt text simultaneously
    pattern = re.compile(r'data-original="(.*?)".*?alt="(.*?)"', re.DOTALL)
    matches = pattern.findall(content)
    os.makedirs('doutula', exist_ok=True)
    for img_url, alt_text in matches:
        # Clean alt text to be a valid filename
        safe_name = re.sub(r'[\\/:*?"<>|《》。?!.!&\#()()]', '', alt_text)
        if not safe_name:
            continue  # nothing left to name the file after cleaning
        ext = img_url.split('.')[-1].split('?')[0]  # handle possible query strings
        file_path = os.path.join('doutula', f"{safe_name}.{ext}")
        try:
            # Download and save the image
            img_resp = requests.get(img_url, headers=headers, timeout=10)
            img_resp.raise_for_status()
            with open(file_path, 'wb') as f:
                f.write(img_resp.content)
            print(f"Saved {file_path}")
        except Exception as e:
            print(f"Failed {img_url}: {e}")


if __name__ == '__main__':
    pages = range(1, 51)  # adjust range as needed
    with ThreadPoolExecutor(max_workers=10) as pool:
        pool.map(fetch_and_save, pages)
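The extraction and filename-cleaning steps can be checked offline before hitting the site. The sketch below runs the same regex against a made-up HTML fragment (the markup, URLs, and captions are assumptions, not copied from doutula.com). It also adds three hypothetical helpers, clean_caption, extension_of, and unique_path; the last one avoids overwriting a file when two stickers share a caption, which the script above does not guard against.

```python
import os
import re

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = '''
<img class="lazy" data-original="http://example.com/a/1.jpg?x=1" alt="失望!">
<img class="lazy" data-original="http://example.com/a/2.jpg" alt="失望">
'''

IMG_PATTERN = re.compile(r'data-original="(.*?)".*?alt="(.*?)"', re.DOTALL)
UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|《》。?!.!&#()()]')


def clean_caption(caption):
    """Strip characters that are invalid or noisy in filenames."""
    return UNSAFE_CHARS.sub('', caption).strip() or 'untitled'


def extension_of(url):
    """Return the file extension of the URL, ignoring any query string."""
    return url.split('?')[0].rsplit('.', 1)[-1]


def unique_path(folder, name, ext, taken):
    """Pick a path not already in `taken`, appending _1, _2, ... if needed."""
    candidate = os.path.join(folder, f"{name}.{ext}")
    counter = 1
    while candidate in taken:
        candidate = os.path.join(folder, f"{name}_{counter}.{ext}")
        counter += 1
    taken.add(candidate)
    return candidate


taken = set()
planned = [
    unique_path('doutula', clean_caption(alt), extension_of(url), taken)
    for url, alt in IMG_PATTERN.findall(SAMPLE_HTML)
]
for path in planned:
    print(path)
```

Both sample captions clean to the same name, so the second file receives a numeric suffix instead of replacing the first.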
Local Fuzzy Search with Glob
After downloading thousands of stickers, we want to quickly find images whose filenames contain a given keyword. The standard library's glob module can do this with a wildcard pattern:
import glob
import os
keyword = "失望"
sticker_dir = os.path.join(os.getcwd(), "doutula")
pattern = os.path.join(sticker_dir, f"*{keyword}*.*")
for path in glob.glob(pattern):
    print(path)
Alternatively, with pathlib:
from pathlib import Path
folder = Path('doutula')
for img in folder.glob(f'*{keyword}*.*'):
    print(img)
Both approaches return the matching sticker paths, ready to be sent.
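One caveat when the pattern is built from raw user text: glob treats *, ?, and [ as wildcards, so a keyword such as "[doge]" would be read as a character class rather than literal text. A minimal sketch of a safer helper follows. The name search_stickers is hypothetical; glob.escape is the standard-library function that neutralizes those characters before the keyword is wrapped in wildcards.

```python
import glob
import os


def search_stickers(keyword, folder='doutula'):
    """Return sorted paths in `folder` whose filename contains `keyword` literally."""
    # glob.escape turns *, ? and [ into literal characters.
    pattern = os.path.join(glob.escape(folder), f"*{glob.escape(keyword)}*.*")
    return sorted(glob.glob(pattern))
```

Calling search_stickers("失望") behaves like the snippets above, while search_stickers("[doge]") matches only filenames that actually contain the bracketed text.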
WeChat Automation with itchat
To participate in a meme battle, we use itchat to log into Web WeChat, listen for incoming text messages, and reply with up to three relevant stickers. The script matches the message text against local filenames and sends the first matches one by one, pausing 0.3 seconds between images for a natural feel.
import itchat
from itchat.content import TEXT
import glob
import time
import os


def find_stickers(keyword, limit=3):
    directory = os.path.join(os.getcwd(), 'doutula')
    pattern = os.path.join(directory, f'*{keyword}*.*')
    results = []
    for path in glob.glob(pattern):
        results.append(path)
        if len(results) >= limit:
            break
    return results


# Register with itchat's TEXT constant (its value is 'Text', not 'TEXT')
@itchat.msg_register(TEXT)
def reply_with_meme(msg):
    kw = msg.text.strip()
    if not kw:
        return
    stickers = find_stickers(kw)
    if not stickers:
        # fallback: send a random sticker or do nothing
        pass
    else:
        for sticker in stickers:
            msg.user.send_image(sticker)
            time.sleep(0.3)


if __name__ == '__main__':
    itchat.auto_login(hotReload=True)
    itchat.run()
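The fallback branch in reply_with_meme currently does nothing. One way to fill it in, sketched here under the assumption that any sticker beats silence, is to separate selection from sending: choose_stickers (a hypothetical helper) returns keyword matches when there are any and otherwise one random file from the folder. It does not touch itchat, so it can be tried without logging in.

```python
import glob
import os
import random


def choose_stickers(keyword, folder='doutula', limit=3, rng=random):
    """Return up to `limit` matching stickers, or one random sticker if none match."""
    base = glob.escape(folder)
    matches = sorted(glob.glob(os.path.join(base, f"*{glob.escape(keyword)}*.*")))
    if matches:
        return matches[:limit]
    # Fallback (an assumption, not part of the original script): any sticker at all.
    everything = glob.glob(os.path.join(base, '*.*'))
    return [rng.choice(everything)] if everything else []
```

Inside the handler, the send loop would then iterate over choose_stickers(kw) instead of find_stickers(kw), which makes the if/else around it unnecessary.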
This setup enables fully automatic meme responses. The collection and search steps are independent, so you can rebuild the local library as needed, and the bot will reply with matching (and sometimes hilarious) images whenever an incoming message matches a sticker filename.