
Anton Zaharenko


I built a Chrome extension

I built a Chrome extension that extracts code from tutorial
videos using Claude's Vision API. Wrote up what I learned:

Key findings:

  1. Vision models handle code extraction surprisingly well, and
    better than OCR for anything syntax-sensitive

  2. Prompt engineering matters more than model choice.
    "Extract the code" vs "identify all code blocks, preserve
    indentation, differentiate from terminal output" makes a
    huge difference in quality

  3. Latency: ~2-3 seconds per 1080p frame. Acceptable for
    this use case, but too slow for real-time

  4. Edge case: slides with code + screenshots of code —
    model correctly identifies both as "code" even with
    different visual styling

Full writeup with prompt examples: [link to dev.to article]

The extension itself is free if anyone wants to try the
output quality: codera.click
