DEV Community

Disha Sethi

Building Digester: A Cloud-Native Knowledge Concierge

Information overload is a massive drain on developer productivity. We all bookmark technical articles, deep-dives, and documentation pages that we promise to read later, but we rarely do. Static bookmark lists become digital graveyards where valuable knowledge goes to die.
Digester is an asynchronous, AI-powered knowledge concierge. Instead of giving you a generic summary, it ingests a URL, checks it against your reading history using semantic vector search, and delivers a structured three-bullet briefing that maps each new insight to things you've already read.

From a user's standpoint, the experience is fast and frictionless:

  1. Submit: You paste a technical article URL into a minimalist, command-palette style input bar.
  2. Observe: The dashboard instantly switches to a loading state ("Processing async scraper & vector search..."). You don't have to wait with a frozen screen while the backend does the heavy lifting.
  3. Read: The dashboard dynamically updates to display exactly three actionable insight cards. Each card tells you the core takeaway and includes a "Context Match" showing how it updates or connects to your previous data.
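The shape of that briefing can be sketched as a small data model. This is only an illustration of the "exactly three cards, each with a Context Match" rule described above; the class and field names (InsightCard, Briefing, takeaway, context_match) are hypothetical and not taken from Digester's codebase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InsightCard:
    """One card on the dashboard (hypothetical field names)."""
    takeaway: str        # core takeaway from the new article
    context_match: str   # how it updates or connects to earlier reading


@dataclass(frozen=True)
class Briefing:
    """The full briefing for one submitted URL."""
    url: str
    cards: tuple[InsightCard, ...]

    def __post_init__(self) -> None:
        # The dashboard shows exactly three cards, so reject anything else.
        if len(self.cards) != 3:
            raise ValueError(f"expected 3 insight cards, got {len(self.cards)}")
```

Validating the card count at construction time means a malformed LLM response fails early in the backend rather than surfacing as a half-empty dashboard.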

Technical Architecture & Flow

To ensure the app can handle heavy document processing without timing out, we decoupled the application into three clear layers:

1. Frontend Flow (The Edge Layer)

  • What we used: Next.js (App Router) + Tailwind CSS + Vercel
  • The Flow: This layer handles presentation and user state. When a user submits a URL, the frontend kicks off a background task and returns right away instead of holding the connection open, which keeps the UI responsive.
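The submit-then-return pattern can be sketched in a few lines. The example below is a language-neutral illustration written in Python with asyncio, not the actual Next.js code; the in-memory JOBS store and the function names (submit, process_url) are assumptions made for the sketch.

```python
import asyncio
import uuid

# In-memory job store (assumption); a real deployment would use a database or queue.
JOBS: dict[str, dict] = {}


async def process_url(job_id: str, url: str) -> None:
    """Stand-in for the slow scrape + vector search + LLM pipeline."""
    await asyncio.sleep(0.01)  # placeholder for the heavy work
    JOBS[job_id].update(status="done", result=f"briefing for {url}")


async def submit(url: str) -> str:
    """Register the job, schedule the work, and return without waiting for it."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "processing", "result": None}
    # Keep a reference to the task so it is not garbage-collected mid-flight.
    JOBS[job_id]["task"] = asyncio.create_task(process_url(job_id, url))
    return job_id


async def demo() -> tuple[str, str]:
    job_id = await submit("https://example.com/article")
    before = JOBS[job_id]["status"]   # read right after submit returns
    await JOBS[job_id]["task"]        # a UI would poll the status instead
    after = JOBS[job_id]["status"]
    return before, after
```

Calling asyncio.run(demo()) shows the status read immediately after submit() and again once the background task has finished, which is the same transition the loading state on the dashboard reflects.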

2. Backend Flow (The Engine & Memory Layer)

  • What we used: Node.js/Python API + Supabase (PostgreSQL with pgvector)
  • The Flow: Once a URL is captured, our backend scrapes the page down to clean, LLM-ready markdown. It splits this text into chunks, generates semantic embeddings, and runs a vector similarity search inside our database. The search retrieves the most relevant passages from your reading history, which are fed into our LLM pipeline. Finally, the newly learned knowledge is saved back into your memory graph.
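The chunk, embed, and search steps can be sketched without any external services. Note the assumptions: toy_embed is a bag-of-words stand-in for a real embedding model, the ranking runs in memory rather than inside PostgreSQL, and the chunk size and function names are illustrative only.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query: str, memory: list[str], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the query chunk."""
    q = toy_embed(query)
    ranked = sorted(memory, key=lambda m: cosine_similarity(q, toy_embed(m)), reverse=True)
    return ranked[:k]
```

With pgvector, the ranking step would typically move into SQL as an ORDER BY on a distance operator (for example `<=>` for cosine distance), so only the top matches leave the database. Whether Digester uses cosine distance specifically is not stated here.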

3. DevOps Flow (The Infrastructure & Automation Layer)

  • What we used: GitHub Actions + Docker + AWS EC2 + Nginx + CloudWatch
  • The Flow:
    • CI/CD: Every time we push code to GitHub, an automated workflow spins up, builds a secure Docker image of our backend, and deploys it straight to our AWS EC2 instance.
    • Routing & Gates: Nginx runs on the EC2 server as a reverse proxy that receives incoming web traffic and forwards it to the backend container.
    • Monitoring: AWS CloudWatch aggregates container logs, and metric alarms configured on top of them flag pipeline failures and LLM timeouts as soon as they occur.
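For a timeout to be flagged, the backend first has to detect it and write a log line that can be matched. The sketch below shows one way to do that in Python; the helper name call_with_timeout, the logger name, and the JSON field names are all assumptions, and it does not call any AWS API.

```python
import asyncio
import json
import logging

logger = logging.getLogger("digester.pipeline")  # hypothetical logger name


async def call_with_timeout(coro, timeout_s: float, stage: str):
    """Await coro; on timeout, emit one structured log line and re-raise."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # One JSON object per line keeps the event easy to match in a log aggregator.
        logger.error(json.dumps({"event": "timeout", "stage": stage, "timeout_s": timeout_s}))
        raise
```

Because each timeout is logged as a single JSON object, a CloudWatch Logs metric filter could count lines where the event field equals "timeout" and drive an alarm from that count. The exact filter and threshold would depend on the deployment.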

What's Next?

We are currently building out the heavy cloud architecture and agentic workflows behind the scenes. We will share the implementation details very soon, so stay tuned!
