All tests run on an 8-year-old MacBook Air.
This is Part 3 of my series on training a card game AI with Google Colab.
Part 1: Google Colab basics
Part 2: Reinforcement learning + 200,000 episodes
In Part 2, I got the RL side working. The next challenge: feeding real game state into the model automatically. That meant recognizing cards from screenshots. I spent about two weeks on it. It didn't work. Here's exactly what happened.
The Goal
The pipeline I was aiming for:
Card image (JPG)
↓ OpenCV
Card name recognized
↓ Match against TOML
Effect + cost retrieved
↓ RL environment
Board evaluated (good move / bad move)
Simple in theory. Painful in practice.
What OpenCV Was Supposed to Do
The plan was template matching: compare a screenshot crop against ~300 master card images from GitHub and pick the closest match.
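```python
import cv2

img = cv2.imread("card.jpg")                  # screenshot crop (BGR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(...)                # returns (retval, image); arguments omitted
template = cv2.imread("master.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical master image path
result = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
```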
Straightforward enough. Except it kept failing.
The 3 Mistakes I Kept Making
After 6+ hours of debugging, I found three recurring issues:
Mistake 1: Not normalizing image size
The master images and the screenshot crops weren't the same dimensions. Template matching compares pixels at a fixed scale, so even a few pixels' difference in size means the card artwork no longer lines up, and accuracy tanks.
Mistake 2: Threshold set too strict
```python
if score > 0.99:  # This almost never triggers
    ...
```
Even visually identical images often scored below 0.95 because of minor rendering differences, and just 3 px of misalignment was enough to break the match.
Mistake 3: Comparing in color
Color variation between the master JPG and the screenshot was enough to cause mismatches. Always convert to grayscale first.
I Tried Everything
- ✅ Template matching → poor accuracy
- ✅ ORB feature matching → still failed on decorative card fonts
- ✅ Threw images at Gemini Vision → "I don't recognize this card"
- ✅ Threw images at Claude Opus → same answer
- ✅ Spent 2 days running experiments on Colab
The core problem: these are original game cards that no AI model has been trained on. And the visual difference between master data and a smartphone screenshot — even cropped carefully — was just enough to break every approach I tried.
What I Actually Learned
Giving up was the right call — but I learned something useful in the process.
If you have a decent machine, this problem is actually pretty solvable: just continuously capture the screen and process frames in real time. No template matching needed.
The reason I struggled wasn't the approach itself — it was the hardware constraint. I was trying to push an 8-year-old MacBook Air to its absolute limit, minimizing every layer of the stack to see how far I could get.
Turns out, some problems genuinely need more horsepower. That's a valid finding too.
Closing
Two weeks, multiple approaches, zero working card recognition. But I understand exactly why it failed — and what it would take to make it work.
If you're attempting something similar: normalize your image sizes, go grayscale early, and don't set your threshold above 0.95. And if your cards are from an obscure offline game, don't expect Vision AI to save you 😅