James Halder

I Built "Personal Software" That Watches TikTok Videos To Automate Research

I wanted to run an analysis on influencer marketing posts on TikTok. It's easy to get a spreadsheet of view counts and hashtags from TikTok, but to understand why a product goes viral, you need to know things like what the creator's face looked like when they opened the box, whether their voice sounded genuinely surprised or professionally enthusiastic, and whether the lighting was ASMR-clean or chaotic-authentic. None of that is in the metadata.

There isn't really a tool I can buy to do this. But creating "personal software" is so much easier now with vibe coding, so I built a system that extracts all this data. I think this is going to catch on more and more as the technical skill and effort required keeps going down.


The Problem(s)

TikTok isn't a great fan of you extracting data. Its bot detection is pretty good, and the data that I actually wanted - product sentiment, visual aesthetic, creator authenticity signals - is inside video frames and audio.


Architecture

Here is what happens between the moment I trigger a scrape and a structured insight landing in the database.

A flow chart showing the steps in the process: Discovery Service -> Playwright Extractor -> Metadata Storage -> Media Pipeline -> yt-dlp: Download -> ffmpeg: Frames + Audio -> AI Enrichment Service -> Gemini Flash -> Structured Visual DNA -> Prisma / SQLite


Stage 1 - Data Extraction

The first decision was important: don't scrape the DOM.

TikTok's visible page elements are unstable. Class names rotate, element hierarchies shift, and any selector-based scraper is one frontend deploy away from producing garbage. The more robust target is __UNIVERSAL_DATA_FOR_REHYDRATION__ - a JSON blob embedded in a script tag that TikTok uses to hydrate its React frontend on load.

This object contains clean, structured metadata: view counts, creation timestamps, original audio flags, author details, and more. Because it's consumed by TikTok's own frontend code, it's significantly more stable than the rendered output it produces.
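
As a rough sketch, pulling that blob out of the page HTML looks something like this. The exact script-tag shape is an assumption from inspecting the page source, and `extractRehydrationData` is a hypothetical helper; in the real extractor the HTML comes from Playwright's `page.content()`:

```typescript
// Sketch: locate the rehydration script tag in raw HTML and parse its JSON.
// The tag shape (id attribute + JSON body) is an assumption, not a spec.
function extractRehydrationData(html: string): unknown | null {
  const match = html.match(
    /<script[^>]*id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>([\s\S]*?)<\/script>/
  );
  if (!match || match[1] === undefined) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // malformed or partially-loaded page
  }
}
```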

For bot detection avoidance, the engine uses playwright-extra with StealthPlugin, combined with randomised user agents and viewport dimensions. Delays between extractions are deliberately non-uniform.
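
The non-uniform delay is the simplest piece to show. A minimal sketch, where the 4-second base and 6-second jitter window are illustrative numbers rather than tuned values:

```typescript
// A base wait plus random jitter, so request timing never forms a clean,
// bot-like pattern. Returns a delay in the range [baseMs, baseMs + jitterMs).
function jitteredDelayMs(baseMs = 4_000, jitterMs = 6_000): number {
  return baseMs + Math.floor(Math.random() * jitterMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Between extractions: await sleep(jitteredDelayMs());
```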


Stage 2 - Getting the Frames

Metadata alone tells you a video exists and performed well. It doesn't tell you why.

The media pipeline handles three jobs in sequence: download the video via yt-dlp, extract frames via ffmpeg, and strip the audio track for transcription. The frames give the AI a narrative structure to work with.
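
As a sketch, the three jobs reduce to three command lines. The file names, the 1 fps sampling rate, and the 16 kHz mono audio are illustrative choices, not the only sensible ones:

```typescript
// Build argv arrays for the three media-pipeline jobs; each array can be
// handed to spawn() from node:child_process (command first, then arguments).
function downloadArgs(url: string, outFile: string): string[] {
  return ["yt-dlp", "-o", outFile, url];
}

function frameArgs(videoFile: string, outPattern: string, fps = 1): string[] {
  // -vf fps=N samples N frames per second of video
  return ["ffmpeg", "-i", videoFile, "-vf", `fps=${fps}`, outPattern];
}

function audioArgs(videoFile: string, outWav: string): string[] {
  // -vn drops the video stream; 16 kHz mono WAV is a common transcription input
  return ["ffmpeg", "-i", videoFile, "-vn", "-ar", "16000", "-ac", "1", outWav];
}
```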


Stage 3 - Gemini Analysis

One request to Gemini Flash transcribes the audio; a second request then packages three things: the frames as images, the audio transcript, and the video's text metadata. The model returns JSON conforming to a strict schema that covers:

Visual dimensions:

  • Hook style (ASMR, Talking Head, Text-on-Screen, B-roll)
  • Aesthetic category (Clean-Girl, Maximalist, Minimalist, Raw)
  • Face presence and emotional expression at reveal
  • Setting and production quality tier

Audio dimensions:

  • Sentiment classification (Raving, Positive, Neutral, Critical)
  • Incentive signal (Gifted, Paid Partnership, Organic)
  • Purchase intent language ratio

Filtering layer:

  • Is this actually a product-focused video, or is the product incidental?
  • Is the locale consistent with the target market?

The AI acts as a gatekeeper. Videos that pass metadata filters but fail content filters are excluded before they pollute the dataset.
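
In TypeScript terms, the returned object looks roughly like this. The field and enum names are my paraphrase of the dimensions above, not the exact schema the prompt enforces:

```typescript
// Approximate shape of the structured output requested from the model.
interface VideoAnalysis {
  hookStyle: "ASMR" | "TalkingHead" | "TextOnScreen" | "BRoll";
  aesthetic: "CleanGirl" | "Maximalist" | "Minimalist" | "Raw";
  facePresent: boolean;
  sentiment: "Raving" | "Positive" | "Neutral" | "Critical";
  incentive: "Gifted" | "PaidPartnership" | "Organic";
  purchaseIntentRatio: number; // 0..1 share of purchase-intent language
  isProductFocused: boolean;   // filtering layer: product video or incidental?
  localeMatches: boolean;      // filtering layer: right target market?
}

// The gatekeeper: only videos passing both content checks enter the dataset.
function passesContentFilter(a: VideoAnalysis): boolean {
  return a.isProductFocused && a.localeMatches;
}
```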

Gemini was an easy choice: massive context window, multimodal input, and it's cheap.


Stage 4 - Persistence

I had to store the data somewhere, and since everything runs locally, it doesn't need to be complicated. I also wanted it to save its progress in case something crashed.

Each video's extraction, download, enrichment, and storage steps are tracked independently. A scrape can be paused, interrupted, or stopped at any point and resumed without reprocessing completed work or losing partial results.
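
A minimal sketch of that per-video tracking (the stage names here are illustrative):

```typescript
// Each stage completes independently; a resumed run asks for the first
// stage that is not yet done and skips everything before it.
const STAGES = ["extracted", "downloaded", "enriched", "stored"] as const;
type Stage = (typeof STAGES)[number];
type Progress = Record<Stage, boolean>;

// First pending stage for a video, or null when it is fully processed.
function nextStage(p: Progress): Stage | null {
  return STAGES.find((s) => !p[s]) ?? null;
}
```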

The database layer is Prisma over SQLite - because it's easy.
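
Something like this minimal Prisma model covers it (a hypothetical sketch with illustrative field names, not my exact schema):

```prisma
model Video {
  id         String   @id            // TikTok video ID
  metadata   String                  // raw JSON from the rehydration blob
  downloaded Boolean  @default(false)
  enriched   Boolean  @default(false)
  analysis   String?                 // structured JSON returned by Gemini
  createdAt  DateTime @default(now())
}
```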


Problems

The first problem was that the scraper got caught by bot filtering, which is what pushed me to the Playwright StealthPlugin. Then too many requests too fast got my IP banned; adding a random delay between requests resolved it.


What This Shows

Things like scraping and video analysis are now super easy. A researcher can build a simple tool and ask: across the 400 videos in this dataset tagged #PRUnboxing, what percentage of high-performing videos used an ASMR hook style, and what was the sentiment distribution among gifted vs. organic posts? That would have been really difficult just a few years ago.

In a few hours I had a usable dashboard, the process ran mostly unattended, and I got all the data I needed out of it! I used the data to build an interactive article about TikTok Reach.


What's Next

Every time I have a problem now, I find myself looking at subscription services and then just throwing something together myself. I've read that SaaS is dead, but I don't think that's true. It's just that small tools which do (relatively) simple things and don't need ongoing support can be built in a couple of hours. Sometimes that makes no sense, but if the other option is expensive or doesn't quite do what you want, it really does.

What personal software have you built to solve your own problems?
