Deepak Chauhan
I Built a Meeting Intelligence Platform Because I Was Drowning in Context-Switching

Every Monday morning, I'd open Microsoft Teams and stare at the same problem.
Somewhere in last week's 14 meeting transcripts was the answer to the question my stakeholder just asked. Somewhere in Jira, there was a ticket that explained why we made that architectural decision three months ago. Somewhere in Snowflake, there was a DDL that told me exactly what columns were available before I opened my BI tool.
The knowledge existed. It was just... scattered.
I work on an analytics team. We build dashboards, pipelines, and data products for a large enterprise. We run on Snowflake, Power BI, Tableau, Teams, and Jira. The context for any given question lives across all of them simultaneously — and switching between them to answer one question from a stakeholder was costing me more time than the actual answer did.
So I built a solution.

What It Actually Is
It's an enterprise meeting intelligence platform — but that description undersells it a bit. More precisely, it's a multi-source knowledge retrieval system that ingests:

Microsoft Teams transcripts (the actual .txt files from recorded meetings)
Jira tickets (pulled via API, linked to the project context)
Knowledge documents (internal guides, process docs, anything relevant)
Snowflake DDL (so it knows your actual data schema)

You ask it a question in plain English. It searches across all of those sources and gives you a grounded answer — with citations back to the original context.
Not hallucinated. Not generic. Grounded in your team's actual conversations and decisions.
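To make "grounded with citations" concrete, here's a minimal sketch of how retrieved chunks can be assembled into a prompt that forces the model to cite its sources. The names (`Chunk`, `build_grounded_prompt`) are illustrative, not the platform's real code:

```python
# Hypothetical sketch of the grounding step: each retrieved chunk keeps
# a reference back to its original source, and the prompt instructs the
# model to cite chunks by number.
from dataclasses import dataclass

@dataclass
class Chunk:
    source: str  # e.g. "teams", "jira", "snowflake_ddl"
    ref: str     # citation back to the original document
    text: str    # the retrieved passage

def build_grounded_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble a prompt that forces the LLM to cite its sources."""
    context = "\n\n".join(
        f"[{i + 1}] ({c.source}: {c.ref})\n{c.text}"
        for i, c in enumerate(chunks)
    )
    return (
        "Answer using ONLY the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The citation markers in the answer can then be mapped back to the original transcript, ticket, or DDL for stakeholder explainability.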

The Architecture (and Why I Made These Choices)
The stack might look unusual at first:
Gradio UI
→ Python pipeline (ingest, retrieve, generate)
→ ChromaDB (local vector store)
→ Snowflake Cortex (LLM answer generation)
→ Snowflake (persistence: runs, chunks, queries)
Why ChromaDB + Snowflake Cortex instead of a pure cloud vector DB?
My team already lives in Snowflake. All our data, pipelines, and query history are there. Using Snowflake Cortex for answer generation keeps sensitive enterprise data in an environment we already trust and govern. But Cortex doesn't replace a dedicated vector store for fast semantic retrieval — so ChromaDB handles the local embedding cache, and Snowflake handles persistence and auditability.
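The hybrid split looks roughly like this in practice: ChromaDB answers the "which chunks are relevant?" question locally, and Snowflake Cortex's `COMPLETE` function generates the answer inside the governed environment. This is a hedged sketch, assuming the `chromadb` and `snowflake-connector-python` packages; the model name and function names are illustrative:

```python
# Sketch of the hybrid retrieval + generation split. The collection and
# cursor objects are assumed to come from chromadb and the Snowflake
# Python connector respectively.
CORTEX_SQL = "SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s)"

def retrieve(collection, question: str, k: int = 5) -> list[str]:
    """Top-k semantic search against the local ChromaDB vector store."""
    res = collection.query(query_texts=[question], n_results=k)
    return res["documents"][0]

def generate(cursor, model: str, question: str, docs: list[str]) -> str:
    """Ask Cortex to answer, grounded in the retrieved chunks."""
    prompt = "Context:\n" + "\n---\n".join(docs) + f"\n\nQuestion: {question}"
    cursor.execute(CORTEX_SQL, (model, prompt))
    return cursor.fetchone()[0]
```

Because generation happens as a SQL call, every query is naturally auditable in Snowflake's query history.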
Why not Snowflake Document AI?
I don't have access, and I didn't want to go through the cycle of raising requests and justifying them to admins.
Why a pipeline instead of an agent framework?
The current architecture is a deliberate, modular pipeline: ingest → chunk → embed → retrieve → generate. Each step is a Python module with a clear responsibility. It's not agentic yet — and that's intentional. Getting the pipeline solid and reliable with real data came first. Agentic orchestration (think: dynamic source selection, multi-hop reasoning) is on the roadmap, but only once the foundation earns it.
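The pipeline shape described above can be sketched as plain function composition — each stage is an ordinary callable with one responsibility, which is what makes individual stages swappable later. The names here are illustrative stubs, not the actual modules:

```python
# Minimal sketch of a staged pipeline: each step is a plain callable,
# so any one stage can be replaced or tested in isolation.
from typing import Any, Callable

Stage = Callable[[Any], Any]

def run_pipeline(data: Any, stages: list[Stage]) -> Any:
    """Thread data through each stage in order."""
    for stage in stages:
        data = stage(data)
    return data

# In the real system the list would be something like:
# run_pipeline(raw_files, [ingest, chunk, embed, retrieve_index, generate])
```

An agent framework could later replace the static `stages` list with dynamic source selection, without touching the stages themselves.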

What I've Learned So Far
Real data breaks everything. My first test transcript was a clean sample I wrote myself. It ingested perfectly. The actual CI 2026.txt Teams export? Zero chunks. The format was completely different. Lesson: test on the ugliest, most real data you have from day one.
Metadata is not optional. ChromaDB will throw cryptic errors if you pass None values in document metadata. Every field needs a fallback. Every ingestion needs validation. This sounds obvious until 2am when you're staring at a ValueError with no stack trace.
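The fallback-everything rule can be captured in one small guard, run before every `collection.add()` call. This is a sketch under the assumption that your metadata is a flat dict; the field handling is illustrative:

```python
# Hedged sketch of the metadata guard described above: ChromaDB rejects
# None values in document metadata, so every field gets an explicit
# fallback and unsupported value types are coerced to strings.
def sanitize_metadata(meta: dict) -> dict:
    """Replace None with typed fallbacks; coerce unsupported types."""
    clean = {}
    for key, value in meta.items():
        if value is None:
            clean[key] = ""                       # fallback for missing fields
        elif isinstance(value, (str, int, float, bool)):
            clean[key] = value                    # types ChromaDB accepts
        else:
            clean[key] = str(value)               # coerce lists, dates, etc.
    return clean
```

Running this at ingestion time turns the 2am `ValueError` into a non-event.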
Users don't want automation. They want control. My initial design tried to automatically trigger ingestion pipelines on new data. My users (including myself) pushed back hard. They wanted to see what was being ingested, confirm it, and trigger it manually. The UI has explicit "ingest" buttons now. This is the right call.
Modularity pays off early. Keeping each step (ingestion, chunking, embedding, retrieval, generation) as a separate, replaceable module made debugging dramatically easier. When the transcript parser broke, I didn't have to touch the retrieval layer at all.

What's Coming Next
Recall is still in active development. Upcoming posts in this series will cover:

Post 2: Parsing messy Teams transcripts (and why regex failed me)
Post 3: ChromaDB + Snowflake — why a hybrid store made sense
Post 4: Snowflake Cortex for enterprise RAG — what it can and can't do

If you're building anything in the RAG / meeting intelligence / enterprise knowledge space, I'd genuinely love to compare notes. Drop a comment or follow along.

This is being built as an internal tool. The architecture decisions reflect real enterprise constraints — governed data environments, offshore team collaboration, and stakeholder explainability requirements. Not a startup. Not a demo. A real thing we actually use.
