DEV Community: Michael Liu

Built a Chrome extension to right-click any social video and get an AI transcript

Michael Liu — Thu, 28 May 2026 11:24:05 +0000

I shipped a Chrome extension this week that I've been wanting for a while: right-click any social video on a page and get a clean transcript back, without leaving the tab.

Why I built this

I write content about short-form video and spend a lot of time pulling hooks, scripts, and outlines out of TikToks, Reels, and Shorts. The flow used to be:

Copy URL
Open another transcription tool
Paste, wait, download
Tab back to where I was

That's a lot of context switches for "I just want the words from this video." A right-click context menu felt like the right interface.

What it does

Voqusa Chrome Extension is the result. Right-click on any web page (or paste a video URL into the popup) and a transcript appears in 30–60 seconds.

Supports TikTok, YouTube Shorts, Instagram Reels, Facebook, Twitter/X, LinkedIn, Pinterest
Whisper-grade speech-to-text on the backend
Anonymous users get 3 free transcripts; sign in at voqusa.com to use the full free tier or pay-as-you-go credits
133 KiB install, no tracking pixels
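
A minimal sketch of how a URL could be checked against that platform list before it is submitted. The hostname table and the `detectPlatform` name are my assumptions for illustration, not taken from the extension's source.

```typescript
type Platform =
  | "tiktok" | "youtube" | "instagram" | "facebook"
  | "twitter" | "linkedin" | "pinterest";

// Hypothetical host-to-platform table (assumption, not the extension's real list).
const HOST_SUFFIXES: Array<[string, Platform]> = [
  ["tiktok.com", "tiktok"],
  ["youtube.com", "youtube"],
  ["youtu.be", "youtube"],
  ["instagram.com", "instagram"],
  ["facebook.com", "facebook"],
  ["fb.watch", "facebook"],
  ["twitter.com", "twitter"],
  ["x.com", "twitter"],
  ["linkedin.com", "linkedin"],
  ["pinterest.com", "pinterest"],
];

function detectPlatform(rawUrl: string): Platform | null {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return null; // not a parseable absolute URL
  }
  for (const [suffix, platform] of HOST_SUFFIXES) {
    // Match the bare domain or any subdomain (www., m., vm., ...),
    // but not unrelated hosts that merely end in the same letters.
    if (host === suffix || host.endsWith("." + suffix)) {
      return platform;
    }
  }
  return null;
}
```

Matching on a dot-prefixed suffix rather than a plain `endsWith` keeps a host like netflix.com from being mistaken for x.com.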

Architecture

The extension itself is tiny — it just hands the video URL to the cloud transcription API. The constraints I cared about:

Privacy: the extension only sends the URL you explicitly submit. No DOM, no cookies, no browsing history.
Anonymous-first: transcripts are keyed to a local device ID, so you can try it without signing in.
Fast: Whisper-grade model, and the backend downloads only the audio, not the full video.
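
The privacy constraint can be sketched as a payload builder that lets only two values leave the browser. The `ClickInfo` interface mirrors a subset of the fields Chrome passes to a context-menu click handler (`linkUrl`, `srcUrl`, `pageUrl`); the `TranscriptRequest` shape, the function name, and the idea that the device ID travels with the request are assumptions for illustration.

```typescript
// Subset of chrome.contextMenus OnClickData, declared locally so the
// sketch runs outside an extension.
interface ClickInfo {
  linkUrl?: string;
  srcUrl?: string;
  pageUrl?: string;
}

// Hypothetical request shape; the real API's field names are not
// documented in this post.
interface TranscriptRequest {
  url: string;
  deviceId: string;
}

function buildTranscriptRequest(
  info: ClickInfo,
  deviceId: string,
): TranscriptRequest | null {
  // Prefer the most specific target: a clicked link, then the media
  // source, then the page itself. Skip non-http(s) values such as
  // blob: media sources or chrome:// pages.
  const candidates = [info.linkUrl, info.srcUrl, info.pageUrl];
  const url = candidates.find(
    (u) => u !== undefined && /^https?:\/\//i.test(u),
  );
  if (url === undefined) {
    return null; // nothing submittable
  }
  // Only the URL and the anonymous device ID are included:
  // no DOM, no cookies, no history.
  return { url, deviceId };
}
```

Falling back to `pageUrl` matters on sites that play video from a `blob:` source, where the media URL itself is not something a backend could fetch.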

What I'm using it for

Pulling hooks and outlines from competitor short-form videos
Turning lecture clips into citable text
Building swipe files of high-performing scripts
Reading along when audio isn't an option (open offices, late nights)

If you find yourself transcribing social videos often, give it a try and let me know what's missing.

Transcribing TikTok and short-form social videos: a quick comparison of approaches

Michael Liu — Sun, 10 May 2026 14:06:17 +0000

When I started analyzing viral content for a side project, I assumed transcription would be the easy part. It's not — at least not for short-form social video. Here's what I learned trying a few different approaches.

The problem with file-based tools

Most popular transcription tools (Otter, Descript, VideoTranscriber.ai, Whisper-based desktop apps) expect you to feed them an audio or video file. That's fine for podcasts, Zoom recordings, or YouTube long-form videos you've already downloaded. But for TikTok / Reels / Shorts you usually start with a public URL, and converting that into a file means:

Find or pay for a TikTok/IG/X video downloader
Wait for the download
Upload to the transcription tool
Wait again for the transcription
Repeat for every single clip

For a 30-clip swipe file that's a real time sink.

URL-native transcription

The approach I ended up using is Voqusa — you paste the public URL of the video and it returns the transcript. Supports TikTok, YouTube, Instagram, Facebook, Twitter/X, LinkedIn, and Pinterest. Captions are free; speech-to-text is pay-as-you-go (no subscription) and failed transcripts cost zero credits, which is a nice detail when you're testing it on borderline-quality audio.

Support for 14 languages also helped when I was looking at Spanish and Portuguese creators in the same niche.

When each fits

File-based tools (Descript, VideoTranscriber.ai, Otter): long-form, multi-speaker, podcasts, meetings, anything you already have on disk. Editor features matter most here.
URL-based tools (Voqusa): short-form social, viral analysis, content repurposing, quick research where you just need the text fast.

Not a strict either/or — I use both depending on the input I'm starting from.

Tradeoffs to be aware of

URL-based tools depend on the social platform's public access. If a creator's account is private, you'll need a downloader anyway.
For very low-volume use, captions-only mode (free on Voqusa) is enough. If you need diarization or punctuation cleanup, file-based editors are still ahead.

Mostly posting this so I stop getting DMs asking how I'm pulling 50+ TikTok transcripts a week without losing my mind.

OCR for handwriting and math: comparing tools in 2026

Michael Liu — Sun, 10 May 2026 07:48:50 +0000

If you've ever tried to OCR handwritten notes or math equations from a screenshot, you know the standard tools (Google Vision, Tesseract, AWS Textract) all hit a wall once you leave printed Latin text.

I spent some time benchmarking what's out there in 2026. Here's what's actually working.

What breaks in generic OCR

Handwriting — especially cursive in non-Latin scripts. Most OCRs were trained on printed text and treat ligatures as noise.
Math equations — generic OCR returns "x2 + y2 = 1" instead of x² + y² = 1 or LaTeX.
Tables — column structure flattens into a paragraph; you lose the relationships.
CJK — character recognition is OK; vertical-text and traditional-character handling are not.
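
For the math case above, this is the LaTeX a math-aware OCR should return for that equation, with the exponents kept as exponents:

```latex
x^{2} + y^{2} = 1
```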

Tools I tried

ScanRead.ai — free OCR for the gap cases

Built on PP-OCRv5 + PaddleOCR-VL (~0.9B params). Has a dedicated Math → LaTeX path that actually preserves multi-line derivations when there's clear bracketing, and CJK accuracy that's competitive with Vision/Textract on my test set. 22 specialized tools (handwriting, receipts, tables, etc.). Free tier 20 pages/day, Pro from $10/mo for batch + watermark-free export.

Google Cloud Vision API

Best general-purpose OCR for printed Latin text. Falls apart on handwriting and math structure. ~$1.50 / 1000 pages.

AWS Textract

Strongest on tables and forms in printed documents. Math support is essentially nonexistent. Pricier.

Mistral OCR (released earlier this year)

Strong on document layout. Less specialized routes than purpose-built tools.

Tesseract (open source)

Free, but the main use case in 2026 is "I need to OCR something offline". Quality on handwriting is poor.

Picking one

For most indie/dev use cases I'd lean on ScanRead for the free tier and CJK + math; Vision if you're processing printed English at volume; and Textract if you have heavy form-extraction needs.
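
Written out as a decision function, that recommendation looks roughly like this. The `OcrJob` flags, the function name, and the order of the checks are my own framing for illustration; only the tool-to-use-case pairings come from the comparison above.

```typescript
type OcrTool = "ScanRead" | "Google Cloud Vision" | "AWS Textract" | "Tesseract";

// Hypothetical job description; every flag is optional and defaults to false.
interface OcrJob {
  mustRunOffline?: boolean;
  heavyFormExtraction?: boolean;
  hasMath?: boolean;
  hasCjk?: boolean;
  printedEnglishAtVolume?: boolean;
}

function pickOcrTool(job: OcrJob): OcrTool {
  // Hard constraint first: only the open-source option runs offline.
  if (job.mustRunOffline) return "Tesseract";
  // Forms and tables in printed documents.
  if (job.heavyFormExtraction) return "AWS Textract";
  // The gap cases generic OCR handles badly.
  if (job.hasMath || job.hasCjk) return "ScanRead";
  // Printed Latin text at scale.
  if (job.printedEnglishAtVolume) return "Google Cloud Vision";
  // Otherwise start on the free tier.
  return "ScanRead";
}
```

The ordering is a judgment call: a job that is both offline and math-heavy lands on Tesseract here, which may not be what you want.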

What's your stack? Curious what people are using for handwriting specifically — that's still the hardest case for me.

OCR is back: replacing Tesseract with PP-OCRv5 in my document pipelines

Michael Liu — Fri, 08 May 2026 14:23:58 +0000

I've been wrangling OCR pipelines for years — Tesseract for plain text, Google Vision when CJK comes up, AWS Textract for tables. Each has its own pain (Tesseract drops handwritten characters, Vision is pricey at scale, Textract's bbox layout is opinionated).

Recently I've been quietly piping a lot of work through ScanRead.ai instead. It's a free OCR tool built on PP-OCRv5 and the new PaddleOCR-VL model. Here's what changed for me.

What it actually does

Image → text in 100+ languages (including Arabic, Japanese, Chinese, Hindi, Thai)
22 specialized tools: image-to-text, PDF-to-Word, screenshot-to-text, handwriting recognition, math-to-LaTeX, receipt OCR
Outputs to .txt, .md, or .docx — Markdown export is great for pipelines into Notion or Obsidian
Free tier is generous: 20 pages/day, no signup
Pro is $10/mo for 3,000 pages with batch (up to 20 files at once)

Where it shined for me

Handwritten meeting notes. Tesseract gives me garbage on cursive. ScanRead reconstructed three pages of a colleague's whiteboard photos with maybe two errors per page. That's the difference between "useful" and "I'll just retype it."

CJK receipts. I had a folder of Japanese receipts to reconcile. PaddleOCR-VL handles vertical text and mixed kanji/kana way better than I expected — competitive with Google Vision in my spot-check, at zero cost.

Math → LaTeX. Pasting screenshots of equations from PDFs and getting back LaTeX source is the kind of small thing that saves a real amount of time over a week.

Where it's weaker

Layout reconstruction for complex multi-column PDFs is okay, but Textract is still better for forms with deeply nested tables.
The free tier is rate-limited per day, not per minute — fine for humans, awkward for batch jobs.
No public API yet (as of writing); Pro batch UI is the workaround.

Why I'm sharing

If you're paying for Vision/Textract for occasional OCR, try the free tier first. If you do batch scans, the $10/mo Pro plan can undercut both, depending on your volume and which features you're billed for. Link: https://scanread.ai
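
To compare a flat plan against a per-page bill, it helps to convert the plan to an effective price per 1,000 pages at your actual volume. A small sketch; the function name is mine, and the inputs in the usage note are the Pro figures quoted above.

```typescript
// Effective price per 1,000 pages for a flat monthly plan,
// given how many of the included pages you actually use.
function flatPlanCostPer1000(
  monthlyPriceUsd: number,
  pagesIncluded: number,
  pagesUsed: number,
): number {
  if (pagesUsed <= 0 || pagesUsed > pagesIncluded) {
    throw new RangeError("pagesUsed must be between 1 and pagesIncluded");
  }
  return (monthlyPriceUsd / pagesUsed) * 1000;
}

// $10/mo with 3,000 pages included:
console.log(flatPlanCostPer1000(10, 3000, 3000).toFixed(2)); // "3.33"
console.log(flatPlanCostPer1000(10, 3000, 500).toFixed(2));  // "20.00"
```

A flat plan gets cheaper per page only as you approach the quota, so run this with your own monthly volume before switching.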

Curious if anyone else has switched off Tesseract for handwriting. What's your stack?

How I Turn TikTok Videos into Searchable Transcripts in Seconds (Free Tool)

Michael Liu — Wed, 06 May 2026 16:14:45 +0000

Why I needed transcripts

I spend a lot of time studying short-form video — TikTok hooks, YouTube Shorts, Instagram Reels — and the part I actually want is the script, not the video. Re-watching to copy down a 30-second hook is painful, and most "free transcript tools" hide behind a signup wall or only work on YouTube.

So I built Voqusa — paste a TikTok / YouTube / Instagram / Facebook / Twitter / LinkedIn / Pinterest URL, get the transcript instantly. No signup, no paywall on captions.

How it works

Paste the video URL.
Voqusa pulls the audio + any embedded captions.
AI speech-to-text fills in the rest (14 languages supported).
Copy the text and search/repurpose/study it.

A few deliberate choices:

No account required for caption-based transcripts. You only spend a credit when the AI has to do speech-to-text from scratch.
Failed transcripts cost 0 credits. If we can't pull it, you don't pay.
Privacy: URLs and transcripts aren't kept after your session ends.
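
The captions-first flow and the credit rules above can be sketched as one function. This is an illustration, not Voqusa's actual code: the type names, the synchronous `SpeechToText` stand-in, and the flat charge of 1 credit per speech-to-text transcript are all assumptions.

```typescript
type Outcome =
  | { status: "ok"; source: "captions" | "speech-to-text"; text: string; creditsCharged: number }
  | { status: "failed"; creditsCharged: 0 };

// Stand-in for the real speech-to-text call; returns null on failure.
// A real implementation would be asynchronous.
type SpeechToText = (audioRef: string) => string | null;

function resolveTranscript(
  captions: string | null,
  audioRef: string,
  stt: SpeechToText,
): Outcome {
  // Embedded captions are used as-is and cost nothing.
  if (captions !== null && captions.trim() !== "") {
    return { status: "ok", source: "captions", text: captions, creditsCharged: 0 };
  }
  // No captions: fall back to speech-to-text.
  const text = stt(audioRef);
  if (text === null) {
    // Failed transcripts cost 0 credits.
    return { status: "failed", creditsCharged: 0 };
  }
  // Assumed flat charge of 1 credit per speech-to-text transcript.
  return { status: "ok", source: "speech-to-text", text, creditsCharged: 1 };
}
```

Returning the charge alongside the result keeps the billing rule in one place, so a failure path can never charge by accident.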

What I use it for

Reverse-engineering viral hooks (collect 50 transcripts, find patterns)
Building swipe files of proven video structures
Summarizing podcast clips into LinkedIn posts
Accessibility — adding text alternatives to video content

Try it

If you ever wanted "Ctrl+F for video," it's at voqusa.com. Captions are free; speech-to-text is pay-as-you-go (no subscription, credits valid 12 months). Curious if anyone has other use cases — drop them in the comments.