DEV Community: networkingguru

How I Built a Web Interface for 1.4 Million Government Documents with FastAPI, HTMX, and SQLite

networkingguru — Fri, 10 Apr 2026 17:51:21 +0000

When government agencies release the same document multiple times with different redaction patterns — which happens more often than you'd think across FOIA batches, Congressional releases, and litigation disclosures — it's possible to cross-reference the releases and algorithmically recover the hidden text. I built a tool called Unobfuscator to do exactly this with the entire Epstein corpus.

The problem is, Unobfuscator's output is a SQLite database. This is annoying but workable if you're a developer. It's useless if you're a journalist, investigator, or anyone else who actually needs to find things in 1.4 million documents.

So I built TEREDACTA — a web interface that makes those recoveries searchable and explorable.

The Dataset

The current deployment covers the Congressional Epstein/Maxwell releases: DOJ volumes and House Oversight releases. That comes to 1.4 million documents, 15,220 document match groups, and 5,600+ substantive recovered passages.

Some of what's been recovered is genuinely significant: internal BOP/MCC emails about Epstein's case, staff interview lists mapping shifts at the Metropolitan Correctional Center, FBI evidence logs with 113 recovered passages, and Ghislaine Maxwell's own PR response drafts.

The Stack

Here's the part that might surprise you: FastAPI + HTMX + Jinja2 + SQLite. That's it. No React, no Vue, no webpack, no npm, no build step. The entire frontend is HTMX with vendored JS and server-rendered templates.

It sounds like it shouldn't work for an interactive investigation tool, but it does.

Why HTMX

I'll be honest — when I started this project, I assumed I'd end up reaching for React or at least Alpine.js. The feature requirements looked like they needed a proper frontend framework: boolean search with real-time results, an entity relationship explorer, a document viewer with highlighted recovered passages, progress indicators for long-running operations.

HTMX handles all of it. Partial page updates via hx-get and hx-swap. Server-sent events for real-time progress on operations that take more than a moment. The result is an interface that feels reactive without shipping a single line of application JavaScript.
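
To make the partial-update pattern concrete, here is a minimal sketch, not TEREDACTA's actual code. The hx-* attributes are real HTMX attributes, but the /search URL, the #results id, and the render_results_fragment helper are hypothetical, and the fragment is built with the standard library rather than a FastAPI route and a Jinja2 template so the example stays self-contained.

```python
from html import escape

# Hypothetical page markup: HTMX issues GET /search as the user types and
# swaps the response into #results. The URL and element ids are made up.
SEARCH_FORM = """
<input type="search" name="q"
       hx-get="/search"
       hx-trigger="keyup changed delay:300ms"
       hx-target="#results"
       hx-swap="innerHTML">
<ul id="results"></ul>
"""


def render_results_fragment(hits):
    """Return only the <li> rows: no <html>, no layout, no JSON."""
    if not hits:
        return "<li class='empty'>No matches</li>"
    return "\n".join(
        f"<li><a href='/doc/{escape(str(doc_id), quote=True)}'>{escape(title)}</a></li>"
        for doc_id, title in hits
    )
```

The server's whole job is to return that fragment as the response body. HTMX drops it into the target element, and the rest of the page never reloads.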

The advantage isn't just simplicity — it's debuggability. When something breaks, there's no component tree to inspect, no state management layer to untangle, no build pipeline to suspect. It's HTTP requests and HTML responses. The browser's network tab tells you everything.

The Performance Problem

Boolean search across 1.4 million documents needs to return results fast enough that investigators don't lose their train of thought. "Fast enough" in this context means under 2 seconds, and ideally under half a second.

SQLite is the database, and SQLite is single-writer. For a read-heavy investigation tool, this is actually fine: reads are concurrent and the dataset is effectively static between imports (new documents get added in batches, not continuously). But the query planning needed careful attention.

The entity index — people, organizations, locations, emails, phone numbers extracted from recovered text — lives in a separate SQLite database. This was a deliberate choice for query isolation. The entity queries (which involve relationship traversal) have very different access patterns than the document search queries, and separating them means neither workload contaminates the other's page cache.

Cold-cache performance was the real challenge. First query after a restart could take 10+ seconds as SQLite populated its page cache. The fix was careful index design and strategic PRAGMA tuning — mmap_size for memory-mapped I/O, cache_size for the page cache — rather than adding an external caching layer. Adding Redis or Memcached to what is otherwise a zero-dependency Python app would've been architectural vandalism.
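
For reference, this is roughly how those two PRAGMAs are applied from Python's built-in sqlite3 module. It is a sketch under stated assumptions: the open_tuned name and both sizes are illustrative guesses, not the values TEREDACTA uses.

```python
import sqlite3


def open_tuned(path, mmap_bytes=1 << 30, cache_kib=262_144):
    """Open a SQLite connection with memory-mapped I/O and a larger page cache.

    The default sizes are placeholders; tune them to the machine and dataset.
    """
    conn = sqlite3.connect(path)
    # Ask SQLite to memory-map up to mmap_bytes of the database file.
    conn.execute(f"PRAGMA mmap_size = {int(mmap_bytes)}")
    # A negative cache_size is interpreted as KiB rather than a page count.
    conn.execute(f"PRAGMA cache_size = {-int(cache_kib)}")
    return conn
```

Both settings are per-connection, so they have to be applied every time a connection is opened, not once at database creation.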

Security Considerations

This is a tool that lets people search through government documents. The security model needs to be airtight not because the data is secret (it's publicly released), but because the tool could be a target.

Authentication uses signed cookies with CSRF tokens. The Unobfuscator database is read-only — the application has no write access to the source data. Input validation includes regex backtracking prevention (a real attack vector against search tools that accept user-supplied patterns). The whole thing runs behind Caddy with automatic TLS.
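
Two of those pieces are easy to illustrate with the standard library alone. The sketch below is an assumption about how they could be done, not a copy of TEREDACTA's code: open_read_only, sign, and verify are hypothetical names, and a real deployment would more likely lean on a maintained signing library than hand-rolled HMAC.

```python
import hashlib
import hmac
import sqlite3


def open_read_only(path):
    """Open an existing SQLite file so that any write attempt fails."""
    # mode=ro is part of SQLite's URI filename syntax; uri=True enables it.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def sign(value: str, secret: bytes) -> str:
    """Append an HMAC-SHA256 tag so the server can detect tampering."""
    tag = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{tag}"


def verify(cookie: str, secret: bytes):
    """Return the original value, or None if the signature does not match."""
    value, _, tag = cookie.rpartition(".")
    expected = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched via timing.
    if hmac.compare_digest(tag.encode(), expected.encode()):
        return value
    return None
```

With the read-only connection, an INSERT or UPDATE raises sqlite3.OperationalError instead of touching the file. That puts the "no write access" guarantee at the database layer rather than in application logic.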

What I'd Do Differently

The SSE implementation for real-time progress went through three iterations before I got the connection management right. Server-sent events sound simple, but handling client disconnection, reconnection, and the inevitable proxy buffering issues (Caddy, Cloudflare, etc.) required more thought than I expected. I'd document this pattern better from the start.
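
For anyone who hasn't written SSE by hand, the wire format itself is small. The following is a minimal sketch of a message serializer and a resumable progress stream. The function names and payload shape are hypothetical, and it leaves out the disconnect handling and proxy buffering that made the real thing hard.

```python
import json


def sse_event(data, event=None, event_id=None, retry_ms=None):
    """Serialize one message in the text/event-stream wire format."""
    lines = []
    if retry_ms is not None:
        # Tells the browser how long to wait before reconnecting.
        lines.append(f"retry: {int(retry_ms)}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" prefix.
    for line in json.dumps(data).splitlines():
        lines.append(f"data: {line}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def progress_stream(total, start_after=0):
    """Yield progress events, resuming after the id the client last saw."""
    for done in range(start_after + 1, total + 1):
        yield sse_event({"done": done, "total": total}, event="progress", event_id=done)
    yield sse_event({"done": total, "total": total}, event="complete")
```

When a browser's EventSource reconnects, it sends the last id it received in the Last-Event-ID request header. A server could pass that value in as start_after so the client doesn't replay progress it has already seen.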

Try It

Everything is MIT licensed. The methodology is transparent and auditable — this is algorithmically recovered text from publicly released documents, not guessed or AI-generated content.

[Live site] | [GitHub]

How I Built a Tinder-Style Group Decision App with React Native and Firebase

networkingguru — Fri, 10 Apr 2026 17:39:13 +0000

My wife and I have a problem (no, not THAT kind of problem).

It's the same problem every couple has: nobody can decide where to eat. Or what movie to watch. Or what show to binge next.

The conversation follows a depressingly predictable script — "I don't care, what do you want?" repeated ad infinitum until someone either picks something out of frustration or you just stay home.

So I built an app to solve it. WhaTo lets a group of up to 8 people join a session with a 4-letter code, swipe through options (restaurants, movies, or TV shows), and find out what they agree on. Like Tinder, but for dinner.

The Stack

Framework: React Native with Expo (cross-platform iOS, Android, web)
Real-time sync: Firebase Realtime Database
API proxy: Cloudflare Worker (routes calls to Yelp, TMDB, Google Places)
Animations: React Native Gesture Handler + Reanimated
Testing: Jest + React Native Testing Library + Maestro (E2E)

The Interesting Problems

Real-Time Sync for 8 Concurrent Users

The core requirement was that everyone swipes simultaneously and sees results the instant the last person finishes. Firebase Realtime Database handles presence tracking and swipe state broadcast, but the matching algorithm runs client-side. Each client independently computes matches as swipe data arrives from other users. The server just broadcasts state changes.

This was a deliberate choice. Running the matching algorithm server-side would add a round-trip penalty on every swipe completion, and the algorithm itself is lightweight — it's just set intersection. The tradeoff is that every client computes the same result independently, which is redundant work, but the latency improvement is worth it. Results appear instantly as the last person finishes swiping, with no perceptible delay.
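
The set-intersection idea can be sketched in a few lines. This is written in Python purely as a language-neutral illustration (the app itself is React Native), and the find_matches name and data shapes are assumptions, not WhaTo's actual code.

```python
def find_matches(likes_by_user, expected_users):
    """Return the options every expected participant swiped right on.

    likes_by_user maps a user id to the set of option ids they liked.
    Returns None until all expected users have reported their swipes.
    """
    if not expected_users or not set(expected_users).issubset(likes_by_user):
        return None
    liked_sets = [set(likes_by_user[user]) for user in expected_users]
    return set.intersection(*liked_sets)


# Example: three people, one option everyone liked.
likes = {
    "ana": {"pizza", "sushi"},
    "ben": {"sushi", "tacos"},
    "cal": {"sushi", "pizza"},
}
matches = find_matches(likes, ["ana", "ben", "cal"])
```

Because the function depends only on the broadcast swipe data, every client that runs it over the same state arrives at the same answer. That is what makes skipping the server round trip safe.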

Session management was its own challenge. Sessions auto-expire after 24 hours via Firebase TTL rules. The 4-letter codes need to be unique within the active session window but recyclable after expiry — I didn't want to slowly exhaust the namespace.
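
For scale, four uppercase letters give 26^4 = 456,976 possible codes, so recycling matters less for running out of codes than for keeping collisions rare. Here is a rough sketch of the allocate-and-recycle logic, with an in-memory dict standing in for the database. The function name and the retry limit are hypothetical, and a real implementation would need to make the check-and-claim step atomic on the backend.

```python
import secrets
import string
import time

SESSION_TTL_SECONDS = 24 * 60 * 60  # sessions expire after 24 hours


def new_session_code(active, now=None, max_attempts=100):
    """Pick a 4-letter code not held by an unexpired session.

    active maps code -> creation timestamp. Expired entries are purged
    first so their codes return to the pool.
    """
    now = time.time() if now is None else now
    expired = [c for c, created in active.items() if now - created >= SESSION_TTL_SECONDS]
    for code in expired:
        del active[code]
    for _ in range(max_attempts):
        code = "".join(secrets.choice(string.ascii_uppercase) for _ in range(4))
        if code not in active:
            active[code] = now
            return code
    raise RuntimeError("no free session code found")
```

Purging before picking is what makes codes recyclable: once a session is past its 24-hour window, its code is immediately eligible again.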

Gesture Handling and Card Physics

This was the rabbit hole I didn't expect. React Native Gesture Handler + Reanimated handle the swipe animations, but getting the card physics to "feel right" took more iteration than any other feature.

The problem is that Tinder has trained everyone's muscle memory for how a swipe card should behave — the acceleration curve, the rotation on drag, the snap-back animation on an incomplete swipe, the way the card flies off screen on completion. If any of those are slightly off, the whole experience feels wrong, even if the user can't articulate why.

I ended up studying Tinder's actual animation curves by screen-recording the app and stepping through the footage frame by frame. Probably overkill, but the result is that WhaTo's swipe feels natural to anyone who's used a dating app.

Keeping API Keys Out of the Client

The app pulls restaurant data from Yelp, movie/show data from TMDB, and maps from Google Places. Shipping those API keys in the client binary is a non-starter — anyone with a decompiler gets your keys.

The solution is a Cloudflare Worker that acts as an API proxy. The client calls the Worker, the Worker calls the external API with the real key, and the response gets passed through. The Worker also handles rate limiting and request validation, so even if someone figures out the Worker endpoint, they can't abuse the upstream APIs through it.

What I'd Do Differently

If I were starting over, I'd skip Firebase Realtime Database and use something with better offline support. Firebase RTDB works fine when everyone has a connection, but handling the edge case where someone's phone drops to airplane mode mid-session and reconnects later is awkward. Firestore would've been a better choice for this, but by the time I realized it, migrating wasn't worth the effort.

I'd also invest in E2E testing earlier; I only added Maestro late in the process.

Try It

Free, no ads, no account required. Sessions auto-expire after 24 hours and I don't store your data beyond that.

[App Store]

Feedback welcome — especially if you've tried to solve this problem before and have opinions about what works and what doesn't.