How I built a local-first content processor with Python + Ollama

Bruce — Wed, 03 Jun 2026 18:40:00 +0000

I have 847 saved videos, articles, and podcasts I will never watch or read.

For two years I told myself I'd get to them. I didn't. The pile just grew.

The problem isn't discipline. It's that "save for later" is a one-way door — things go in but never come out as anything useful.

So I built something to fix that. It's called DRIP.

What it does

DRIP reads your saved content — YouTube videos, articles, podcasts — and converts each one into a structured Markdown note that drops directly into your Obsidian vault (or any folder you point it at).

A saved recipe becomes a note with an ingredient table and steps. A podcast becomes guest, topics, and key takeaways. A long article becomes a summary with the core arguments pulled out.

Everything runs locally on your machine. Nothing is uploaded anywhere.

The stack

Python for orchestration and file handling
Ollama for local LLM inference (llama3, but model-agnostic)
yt-dlp for pulling transcripts from YouTube
Readability for extracting clean article text
Whisper (optional) for podcasts without transcripts

The core loop is simple: fetch content → extract clean text → send to Ollama with a structured prompt → write Markdown to the output directory.

Why local-first

Two reasons.

Privacy. My saved content includes private reading habits, half-formed research, and things I save "just in case." I didn't want to pipe all of that through a third-party API.

Cost. Running a cloud LLM on 847 items would get expensive fast. Running Ollama locally is free after setup.

The tradeoff is speed — local inference is slower than API calls. But this is a background process; I kick it off and come back later.

The hard parts

Transcript quality varies a lot. YouTube auto-captions are often garbled, especially for technical talks. I added a cleaning pass before the LLM step to strip filler words and fix common OCR-style errors.

Getting reliable structured output from Ollama. I needed clean Markdown with consistent heading levels — not JSON, not prose. This took more prompt iteration than expected. The fix was being extremely explicit in the system prompt with a concrete example of the exact output format.

Concurrency limits. Even locally, hammering Ollama with many concurrent requests degrades output quality. I settled on a small queue with a configurable concurrency limit (default: 3).

Current state

It's working and I use it daily. I've processed about 600 items so far — the notes are genuinely useful and I've actually started surfacing things I saved years ago.

I packaged it as a one-time purchase tool at thebvl.com — $39, runs on your machine, no subscription, you own it.

Happy to answer questions about the implementation, particularly the Ollama prompt structure or the transcript pipeline. Both took longer to get right than I expected.

What do you do after you saved doomscrolling video's

Bruce — Mon, 01 Jun 2026 06:13:53 +0000

How I built DRIP — a local-first tool that turns years of saved bookmarks, videos, and posts into usable documents, without leaving a media library behind.
I had a problem most people have and never name: I am a compulsive saver. YouTube "Watch Later" hundreds deep. Browser bookmarks going back years. Saved posts across half a dozen platforms. All of it filed away with the quiet promise that I'd come back to it.

I never came back to it. Almost nobody does. Saving had become a substitute for reading, not a precursor to it.
So I built a tool to actually do something with all of it. This is a write-up of the decisions that turned out to matter more than I expected.

The first decision: don't keep the video
The obvious way to process a saved video is to download it and run something over it. That's how most tools in this space work, and it has two costs people underestimate: storage (downloading hundreds of saved videos is gigabytes you'll never watch) and terms-of-service exposure (pulling full video files is against most platforms' rules).

I didn't want either. And I realised I didn't need the video at all — for almost everything you save, the value is in the words, not the pixels. A tutorial, a recipe, a research talk: what you actually want is the transcript and the structure.

So the rule became: no media library, ever.
In practice that's two paths. For the majority of saved videos, there's already a caption or subtitle track — DRIP reads that directly, and no media touches your drive at all. For the minority with no captions, it downloads audio only to a temporary file, transcribes it locally, and deletes the file the moment transcription finishes. No video is ever stored, and nothing is left behind.

This is not a compromise. It's the right architecture — it keeps the common case completely clean and handles the edge case honestly instead of pretending it doesn't exist.

The second decision: keep it local, but give people the choice
Once it's a text problem, you decide where the processing happens. The thing being processed is a deeply personal map of someone's interests over years. That's not data I want to hold, and it's not data most people want to upload.

So DRIP is local-first: it runs against a local model through Ollama or LM Studio by default, and your content never leaves the machine. But I didn't want to be dogmatic — if you'd rather use a hosted model, you plug in your own Claude, GPT, or Grok key and pay per run. There's also an "auto" mode that tries local first and falls back to cloud only if local fails.
The principle: your AI, your cost, your call. No subscription, no key required to start.

The third decision: one document per item, routed by type
A generic "summarise this" output is almost useless, because saved content isn't homogeneous. A recipe and a podcast want completely different things from a processor.

So DRIP classifies each saved item, then generates one focused document in the format that content actually calls for:

A recipe becomes an ingredient table (imperial + metric) with a numbered method.
A workout becomes an exercise table with sets, reps, and rest.
A podcast becomes guest, host, key topics, quotes, and takeaways.
A business/strategy piece becomes sections with action items and source attribution.
A tech/learning item becomes a step-by-step guide with references.

One PDF per item, sorted into a folder for its topic — so a Workouts file is about one workout, not five unrelated things jammed together. Each PDF comes with a matching Markdown file, which drops straight into Obsidian and similar tools with zero friction.
The part I didn't expect to build: a learning loop
The classifier was good, but generic — it didn't know my patterns. So I added a feedback loop, loosely inspired by Karpathy's idea of generating, testing, and keeping what works.

It's deliberately low-effort: after a run, you go through the day's PDFs and mark each keep or trash — about 30 seconds. Every trashed item becomes a signal: the system derives a one-line rule from the mistake and stores it locally. Future runs inject those learned rules into the classifier as examples. There's also an experiment mode that generates new rules, tests them against examples you've already verified, and commits only the winners.

After a couple of weeks of light feedback, it classifies my content noticeably better than the generic prompt did — because it has actually seen how I sort things. All of that learning lives in a local folder. None of it is uploaded.

Where it is now:
Built in Python. Runs on macOS, Windows, and Linux. Reads from YouTube, X, Instagram, Facebook, TikTok, LinkedIn, and browser bookmarks. Processing is local by default, cloud if you choose. A background scheduler runs it each morning so the documents are just there when you wake up.

I packaged it as DRIP if you want to try it: thebvl.gumroad.com/l/rvwbcv — one-time, no subscription.
But mostly I wanted to write up the captions-first / throwaway-audio approach, because it's the decision I'm happiest with and the one I haven't seen others take.

Happy to answer anything.