How I stopped wrestling with watermarks and shipped cleaner product images
I still remember the exact moment: March 15, 2025, 2:12 AM, in the middle of a late sprint for "ShopMate" v1.8. I had a product page going live at 9:00 AM, and the marketing screenshots, taken from user-submitted photos, were a disaster. Dates, logos, and phone numbers were stamped across the images. I tried the old standby (a half-hour Photoshop clone-stamp ritual), but the results were inconsistent and the team wanted something automatable. That night I swapped manual pixel surgery for an automated image pipeline, and the time saved was ridiculous.
Why I tell you that exact time: because this article is the story of that failure, the tools I tried, the concrete fixes I implemented (with commands and code I ran), and why switching to an AI-first editing workflow was the only thing that scaled without breaking the visuals.
The problem (short): messy UGC images, tight deadline
We had hundreds of submissions. Manual fixes took minutes each and introduced human inconsistency. The three concrete goals were:
- Remove overlaid text and stamps without leaving blur patches.
- Remove unwanted objects (photobombs, stickers) and have the background filled realistically.
- Upscale small screenshots to be print/hero-ready.
Below are the exact steps I used to automate this, with code snippets and the trade-offs I discovered.
What I tried (and why it failed first)
First attempt: classic OpenCV inpainting on every image. Quick prototype:
# simple inpaint prototype I ran to test auto-removal
import cv2
img = cv2.imread('sample_with_stamp.jpg')
mask = cv2.imread('mask.png', 0) # mask drawn by tesseract bbox routine
result = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite('inpaint_cv.jpg', result)
What it did: removed the stamp, but gave obvious smearing when the stamped area crossed texture boundaries. The lighting and camera perspective were wrong in many patches. In short: good for small, flat regions; terrible for complex scenes.
Failure evidence (what I measured): SSIM before manual fix = 0.62, after OpenCV inpaint = 0.69, human-acceptable threshold for product images ≈ 0.9. File sizes and resolution didn't improve. So I had to iterate.
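For the curious, here's a minimal way to compute an SSIM score between an automated result and a reference image (a sketch assuming scikit-image is installed; the filenames and the choice of reference are illustrative):
# minimal SSIM check (assumes scikit-image; filenames and the reference are illustrative)
import cv2
from skimage.metrics import structural_similarity as ssim
ref = cv2.imread('manual_fix.jpg', cv2.IMREAD_GRAYSCALE)   # hand-fixed reference image
test = cv2.imread('inpaint_cv.jpg', cv2.IMREAD_GRAYSCALE)  # automated output to score
test = cv2.resize(test, (ref.shape[1], ref.shape[0]))      # shapes must match for SSIM
score = ssim(ref, test)
print(f'SSIM vs reference: {score:.2f}')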
The workflow that worked (step-by-step)
I ended up combining three capabilities: automated text detection → intelligent text removal → selective inpainting/upscaling. The pipeline was:
- Detect text (tesseract bounding boxes), produce precise mask.
- Use an advanced image inpainting service to remove text and reconstruct texture.
- Upscale the repaired image for hero use.
A snippet I used to detect text and create masks:
# mask generation I ran as a pre-step
from PIL import Image, ImageDraw
import pytesseract
img = Image.open('sample_with_stamp.jpg')
boxes = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
mask = Image.new('L', img.size, 0)
draw = ImageDraw.Draw(mask)
for i, text in enumerate(boxes['text']):
    # keep only confident, non-empty detections (conf can come back as a float-like string)
    if float(boxes['conf'][i]) > 50 and text.strip():
        x, y, w, h = boxes['left'][i], boxes['top'][i], boxes['width'][i], boxes['height'][i]
        draw.rectangle([x, y, x + w, y + h], fill=255)
mask.save('mask.png')
Why this helped: the mask hugged each detected word (per-word boxes instead of one big region), which reduced collateral damage in the inpainting step.
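One refinement worth considering (it isn't in the snippet above, so treat it as a suggestion): OCR boxes hug the glyphs tightly, and anti-aliased stamp edges can spill just outside them. Dilating the mask by a few pixels is cheap insurance:
# optional mask dilation - grow the OCR mask a few pixels so anti-aliased
# text edges are covered too (suggested refinement, not part of the original script)
import cv2
import numpy as np
mask = cv2.imread('mask.png', cv2.IMREAD_GRAYSCALE)
kernel = np.ones((5, 5), np.uint8)                 # roughly 2 px of growth per side
cv2.imwrite('mask.png', cv2.dilate(mask, kernel, iterations=1))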
Then I used a simple curl upload (this is the command I actually used during testing) to send the image + mask to a hosted inpainting endpoint that reconstructs texture and lighting:
# CLI I used to test the inpaint endpoint during the sprint
curl -F "image=@sample_with_stamp.jpg" -F "mask=@mask.png" https://crompt.ai/inpaint -o repaired.jpg
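For batch runs I wanted the same call in Python rather than shell; here is a sketch of the equivalent, assuming the endpoint accepts the same multipart fields as the curl test (error handling and any auth are assumptions you may need to adapt):
# Python equivalent of the curl test for batch runs (sketch; field names mirror
# the CLI above, error handling and auth are assumptions)
import requests
from pathlib import Path

def inpaint(image_path: str, mask_path: str, out_path: str) -> bool:
    with open(image_path, 'rb') as img, open(mask_path, 'rb') as msk:
        resp = requests.post(
            'https://crompt.ai/inpaint',
            files={'image': img, 'mask': msk},
            timeout=60,
        )
    if resp.status_code != 200:
        return False
    Path(out_path).write_bytes(resp.content)
    return True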
Result: SSIM jumped to ~0.91 for the repaired images. Artifacts were subtle and passed QA. Upscaling afterward brought small screenshots to the required hero size without obvious sharpening artifacts.
If you prefer a UI-driven flow, I switched to an "ai image generator app" style tool for batch previews; that allowed model selection and quick A/B checks without re-running scripts.
(If you want to experiment with automated text removal from a UI, the text remover I used combines detection + inpainting in one place.)
Before / After comparisons (concrete)
- Before: 800×600 screenshot with watermark; manual Photoshop took ~6 minutes, SSIM ≈ 0.65.
- After automated pipeline: processing ~12s/image in batch, SSIM ≈ 0.91, consistent lighting, ready for hero crops.
- Upscaling: naive bicubic → unnatural halos; AI upscaler → natural texture recovery and 2-4× enlargement without edge ringing (the bicubic baseline is sketched just below for reference).
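For reference, the naive baseline in that comparison is plain bicubic interpolation, something like the following (the AI upscaler itself was a hosted tool, so there's no local script for it):
# naive bicubic baseline (this is what produced the halos in the comparison)
import cv2
img = cv2.imread('repaired.jpg')
up = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)  # 3x enlargement
cv2.imwrite('hero_bicubic.jpg', up)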
I repeated the tests on a batch of 250 images and measured average processing time and pass rate. The automation passed QA on 88% of images; the remaining 12% were edge cases (handwritten notes over faces, extreme occlusions) that required manual touch. That's a trade-off we accepted.
Trade-offs and why I chose automation
- Latency vs fidelity: On-prem inpainting was faster but required GPU infra; cloud-hosted models added ~2-3s overhead per image but gave better lighting-aware reconstructions.
- Cost vs consistency: Paying per-image gave predictable QA and reduced human time. Manual fixes were cheaper per-image only if you had a human already on the task.
- Edge cases: Anything occluding faces or extremely complex patterns still needs human review. I built a simple QA gate: confidence score < 0.7 → manual review.
I highlight this because it's easy to present automated editing as a silver bullet; it's not. You still need fallbacks and a tiny human-in-the-loop (a minimal sketch of the gate follows).
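The gate itself was trivial; here's a minimal sketch of the idea, assuming each repaired image comes back with a confidence value in a JSON sidecar (the field name and file layout are assumptions about your setup):
# minimal QA gate sketch: route low-confidence repairs to manual review
# (the 'confidence' field and sidecar layout are assumptions, not a fixed API)
import json
import shutil
from pathlib import Path

REVIEW_DIR = Path('needs_manual_review')
REVIEW_DIR.mkdir(exist_ok=True)

def qa_gate(repaired_path: str, sidecar_path: str, threshold: float = 0.7) -> str:
    meta = json.loads(Path(sidecar_path).read_text())
    if meta.get('confidence', 0.0) < threshold:
        shutil.copy(repaired_path, REVIEW_DIR / Path(repaired_path).name)
        return 'manual_review'
    return 'approved'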
Notes on tools and links (quick)
- For batch inpainting and texture-aware fills I used a hosted inpainting endpoint (uploaded via CLI above). If you want to try a browser-based flow, the same kind of functionality is available through an ai image generator app that supports inpainting and model switching.
- For quick text-only cleanup, a dedicated Text Remover UI that auto-detects and reconstructs backgrounds saved me a ton of time.
- When images needed a sharp, natural upscale I used a "Free photo quality improver" that keeps colors balanced and avoids over-sharpening.
(Links above point to the exact web pages I used for each step during my testing.)
What I learned (and what I still don't know)
- Learned: Precise masks + model-aware inpainting beats blind clone-stamping every time. Automate detection, not fixing.
- Learned: Keep a human QA gate for low-confidence outputs; it saved us from shipping 12% problematic images.
- Still figuring out: long-tail handwritten marks and certain reflective surfaces still fool the model. I haven't found an automated, reliable fix for reflections that match scene lighting in all cases.
If you've run into similar edge cases or have automation templates that handle reflection-aware inpainting, I'd love to compare notes.
If you want to reproduce any of these steps, start by generating masks with OCR, test an inpainting endpoint with a few samples, then add an AI upscaler as the final step. For a quick UI-based trial, try the browser tools I linked above for instant previews of text removal, inpainting, and upscaling; these were the interfaces that let the team iterate faster than any manual workflow ever did.
What was your worst image-cleanup night? How did you solve it? I'm still collecting battle stories, so drop one in the comments or ping me and I'll share the scripts I used for batch orchestration.