Everybody says “just add RAG” like it is a button in settings.
It is not. I checked. Very disappointing.
The Brief: Personalized News Feeds
Pulse started as a personal AI intelligence feed.
Not a chatbot with a search bar glued to it. Not another app where an LLM confidently explains an article it has never seen. I wanted something more useful:
- collect AI engineering content from RSS, GitHub, arXiv, and Gmail newsletters
- summarize and classify articles
- store embeddings
- support exact, semantic, and hybrid search
- answer questions from my own corpus
- cite the articles it used
- say “I do not know” when the corpus has no answer
That last part is important.
A RAG system that cannot say “I do not know” is not intelligent. It is just overconfident autocomplete in formal clothes.
The simple version looked like this:
documents -> embeddings -> chatbot
Very clean. Very incomplete.
The useful version needed much more.
The Actual System Architecture
Pulse uses a FastAPI backend, PostgreSQL with pgvector, Groq for generation, and an Expo Android app.
At a high level:
For retrieval, the important database columns are:
class Article(Base):
    title: Mapped[str]
    summary: Mapped[str | None]
    category: Mapped[str | None]
    keywords: Mapped[list[str] | None]
    embedding: Mapped[list[float] | None] = mapped_column(Vector(384))
    embedding_model: Mapped[str]
    enrichment_status: Mapped[str]
    hidden: Mapped[bool]
The vector column uses pgvector, which supports vector similarity search inside Postgres, including cosine distance and approximate indexes (see the pgvector README).
PostgreSQL also gives full-text search, documented in the PostgreSQL full-text search docs.
So Pulse does not choose between SQL search and vector search.
It uses both.
Because of course one search mode was too peaceful.
Why “Just Use Embeddings” Was Not Enough
Embeddings are useful. They are not magic.
If the user searches:
on-device foundation models
semantic search is great. It can find articles about local AI, small models, mobile inference, and related topics even if the exact words do not match.
But if the user searches:
Anthropic
exact search is often better. The word itself matters. I do not need a poetic interpretation of Anthropic. I need articles that mention Anthropic.
This is where pure vector search becomes annoying.
Vector search is good at meaning. Full-text search is good at exact language. A useful product usually needs both.
So Pulse supports three modes:
Exact -> PostgreSQL full-text search
Semantic -> pgvector cosine similarity
Hybrid -> merge both result sets
Search Mode 1: Exact Search
Exact search uses PostgreSQL full-text search.
This works well for names, tools, companies, and terms that should match literally.
It is also fast and boring.
But boring is underrated. Many production systems are just boring things that work while exciting things are busy timing out.
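To make "exact search" concrete, here is a minimal sketch of what such a query could look like. It is not Pulse's actual query: the `articles` table name, the `'english'` text search configuration, and the helper name `build_exact_search_query` are all assumptions for illustration. The `:query` and `:limit` placeholders follow the named-parameter style used by SQLAlchemy's `text()`.

```python
def build_exact_search_query(query_text: str, limit: int = 20) -> tuple[str, dict]:
    """Return a parameterized full-text search statement and its bind values.

    Hypothetical helper: table name, columns searched, and the 'english'
    configuration are assumptions, not taken from the Pulse codebase.
    """
    sql = """
        SELECT id, title,
               ts_rank(
                   to_tsvector('english', title || ' ' || coalesce(summary, '')),
                   websearch_to_tsquery('english', :query)
               ) AS rank
        FROM articles
        WHERE hidden = false
          AND to_tsvector('english', title || ' ' || coalesce(summary, ''))
              @@ websearch_to_tsquery('english', :query)
        ORDER BY rank DESC
        LIMIT :limit
    """
    # User input travels only as a bind parameter, never inside the SQL string.
    return sql, {"query": query_text.strip(), "limit": limit}


sql, params = build_exact_search_query("Anthropic", limit=10)
```

In a real deployment the `to_tsvector(...)` expression would more likely live in a stored column with a GIN index so it is not recomputed for every row on every query.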
Search Mode 2: Semantic Search
Semantic search embeds the query and compares it with article embeddings using cosine distance.
query_embedding = await call_embedder(query_text)
distance = Article.embedding.cosine_distance(query_embedding)
rows = await session.execute(
    select(Article, distance)
    .where(
        Article.enrichment_status == "done",
        Article.embedding.is_not(None),
        Article.hidden.is_(False),
    )
    .order_by(distance, Article.ingested_at.desc())
    .limit(limit)
)
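For intuition about what that `distance` value means, here is a plain-Python sketch of cosine distance, defined as 1 minus cosine similarity. It only illustrates the arithmetic. In Pulse the computation happens inside Postgres, and the function below is not part of the project.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 0 for same direction, 1 for orthogonal, 2 for opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


same = cosine_distance([1.0, 0.0], [2.0, 0.0])        # same direction
unrelated = cosine_distance([1.0, 0.0], [0.0, 3.0])   # orthogonal
```

Because smaller means closer, ordering by distance ascending puts the best semantic matches first, which is what the `order_by(distance, ...)` clause relies on.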
Search Mode 3: Hybrid Search
Hybrid search combines exact and semantic results using Reciprocal Rank Fusion.
The idea is simple:
score = 1 / (k + rank)
If an article ranks well in exact search and semantic search, it rises. If it ranks well in only one, it still has a chance.
We merge both result lists:
scores[article_id] += rrf_score(exact_rank)
scores[article_id] += rrf_score(semantic_rank)
This made hybrid the default.
Why?
Because users do not wake up thinking:
“Today I shall formulate a query that is best served by cosine similarity.”
They type words. The system should adapt.
Hybrid search lets exact names win when they should, while semantic matches still catch broader ideas.
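A minimal, self-contained sketch of that fusion step is below. The function name `fuse` and the constant `k = 60` are assumptions. 60 is the value commonly used for Reciprocal Rank Fusion, but the post does not say which value Pulse uses. Ranks start at 1.

```python
from collections import defaultdict


def rrf_score(rank: int, k: int = 60) -> float:
    """Reciprocal rank contribution for a 1-based rank."""
    return 1.0 / (k + rank)


def fuse(exact_ids: list[int], semantic_ids: list[int], k: int = 60) -> list[int]:
    """Merge two ranked lists of article IDs into one list, best first."""
    scores: dict[int, float] = defaultdict(float)
    for ranked in (exact_ids, semantic_ids):
        for rank, article_id in enumerate(ranked, start=1):
            scores[article_id] += rrf_score(rank, k)
    # Highest fused score first; ties broken by ID to keep output deterministic.
    return sorted(scores, key=lambda article_id: (-scores[article_id], article_id))


merged = fuse(exact_ids=[1, 2, 3], semantic_ids=[3, 1, 4])
# merged == [1, 3, 2, 4]: articles 1 and 3 appear in both lists and rise to the top.
```

Only ranks are used, never raw scores, so the full-text rank and the cosine distance never need to be put on a common scale.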
Ask Mode: RAG With Brakes
The Ask mode is where retrieval becomes generation.
The user asks:
What are the recent themes around AI coding tools?
Pulse does this:
Here, the rejection step matters.
If the top retrieved articles are weak, Pulse does not call the LLM.
This is not a failure.
This is the product behaving responsibly.
If I ask:
What is the weather in Mumbai?
Pulse should not produce meteorology fan fiction.
It should say:
I do not have enough relevant context in the corpus.
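The gate itself can be very small. The sketch below assumes each retrieved article carries a similarity score (1 minus cosine distance) and that anything under a fixed threshold counts as weak. The `Retrieved` type, the `answer_or_refuse` name, and the `0.35` threshold are hypothetical. The post does not state the real cut-off.

```python
from dataclasses import dataclass
from typing import Callable

REFUSAL = "I do not have enough relevant context in the corpus."


@dataclass
class Retrieved:
    article_id: int
    similarity: float  # 1 - cosine distance; higher means more relevant


def answer_or_refuse(
    question: str,
    retrieved: list[Retrieved],
    generate: Callable[[str, list[Retrieved]], str],
    min_similarity: float = 0.35,  # assumed threshold, tune against real queries
    min_hits: int = 1,
) -> str:
    """Call the LLM only when retrieval produced enough strong matches."""
    strong = [r for r in retrieved if r.similarity >= min_similarity]
    if len(strong) < min_hits:
        return REFUSAL  # the generator is never invoked, so no quota is spent
    return generate(question, strong)
```

Passing `generate` in as a callable keeps the gate testable without a live model: a stub can confirm that weak retrieval never reaches the LLM.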
Prompting With Context, Not Hope
The Ask prompt includes only controlled context:
Article ID
Title
Summary
URL
Similarity score
Recent conversation messages
Not raw HTML. Not full article bodies. Not the entire database. Not “please be accurate” as a magical spell.
A simplified prompt shape:
def build_ask_prompt(question, articles):
    context = "\n\n".join(
        f"[{article.id}]\n"
        f"Title: {article.title}\n"
        f"Summary: {article.summary}\n"
        f"URL: {article.url}"
        for article in articles
    )
    return f"""
Answer the user using only the context below.
If the context is not enough, say so.
Context:
{context}
Question:
{question}
"""
The answer includes citations back to article IDs and URLs.
This keeps the system grounded.
Not perfectly. Nothing with an LLM is perfect. But much better than letting the model free-climb the truth.
Personalization: Ranking Is Also Retrieval
Search is not the only retrieval problem.
The feed itself is retrieval.
Pulse learns from reading behavior:
- short reads are weak signals
- longer reads are stronger signals
- read categories update category weights
- article keywords update interest terms
- bookmarks and hidden articles affect what should appear
The engagement score is intentionally simple:
def engagement_signal(duration_seconds: int):
    if duration_seconds < 5:
        return None
    if duration_seconds < 30:
        return 0.2
    if duration_seconds < 120:
        return 0.5
    return 1.0
No fake machine learning ceremony. No “neural preference engine” because I read one article for 14 seconds.
Category weights use an exponential moving average:
new_weight = old_weight + alpha * (signal - old_weight)
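A short worked example shows how that update behaves over a few reads. The smoothing factor `alpha = 0.2` and the starting weight of `0.0` are assumptions chosen for illustration. The post does not give the values Pulse uses.

```python
def update_weight(old_weight: float, signal: float, alpha: float = 0.2) -> float:
    """Exponential moving average: move a fraction alpha toward the new signal."""
    return old_weight + alpha * (signal - old_weight)


weight = 0.0
history = []
# Two long reads (signal 1.0) followed by one short read (signal 0.2).
for signal in (1.0, 1.0, 0.2):
    weight = update_weight(weight, signal)
    history.append(round(weight, 3))
# history == [0.2, 0.36, 0.328]
```

The weight climbs with sustained interest and drifts back down after a weak signal, without any single read swinging it to an extreme.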
The feed score combines:
importance + category preference + recency + keyword overlap
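One plausible way to combine those four terms is a weighted sum of values normalized to the 0 to 1 range. Everything specific below is an assumption made for the sketch: the weights, the 48-hour half-life for recency, and the overlap formula are not taken from Pulse.

```python
from datetime import datetime, timedelta, timezone


def recency_score(published_at: datetime, now: datetime, half_life_hours: float = 48.0) -> float:
    """Exponential decay: 1.0 when brand new, 0.5 after one half-life."""
    age_hours = max((now - published_at).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)


def keyword_overlap(article_keywords: list[str], interest_terms: set[str]) -> float:
    """Fraction of the article's distinct keywords that match the user's interests."""
    keywords = {k.lower() for k in article_keywords}
    if not keywords:
        return 0.0
    interests = {t.lower() for t in interest_terms}
    return len(keywords & interests) / len(keywords)


def feed_score(
    importance: float,
    category_weight: float,
    recency: float,
    overlap: float,
    weights: tuple[float, float, float, float] = (0.4, 0.3, 0.2, 0.1),  # assumed
) -> float:
    """Weighted sum of the four signals, each expected in the range 0..1."""
    w_imp, w_cat, w_rec, w_kw = weights
    return w_imp * importance + w_cat * category_weight + w_rec * recency + w_kw * overlap


now = datetime(2024, 1, 3, tzinfo=timezone.utc)
score = feed_score(
    importance=0.8,
    category_weight=0.5,
    recency=recency_score(now - timedelta(hours=48), now),
    overlap=keyword_overlap(["RAG", "pgvector"], {"rag"}),
)
```

Every input is a stored value or cheap arithmetic, so the feed can be ranked without any model call.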
Learning Features: RAG Was Only One Part Of The Loop
Once articles are cleaned, summarized, embedded, and ranked, other AI features become easier.
Pulse uses the same enriched corpus for:
1. Daily Digest
The digest selects recent high-importance enriched articles and asks Groq for a three-paragraph briefing.
This is not just summarization. It is scheduled synthesis.
2. Trends
Trend detection scans enriched entities from recent articles.
for entity in article.entities:
    mentions[normalized_entity].add(article.id)
trends = [
    entity for entity, article_ids in mentions.items()
    if len(article_ids) >= 3
]
This lets the app show repeated topics like companies, models, tools, or research themes.
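The fragment above leaves out the outer loop and the normalization step. A runnable sketch that fills them in might look like this. The `EnrichedArticle` type, the `normalize_entity` helper, and its whitespace-and-case normalization are assumptions. Pulse may normalize entities differently.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EnrichedArticle:
    id: int
    entities: list[str] = field(default_factory=list)


def normalize_entity(name: str) -> str:
    """Collapse whitespace and case so 'Anthropic' and ' anthropic ' match."""
    return " ".join(name.split()).casefold()


def detect_trends(articles: list[EnrichedArticle], min_articles: int = 3) -> list[tuple[str, int]]:
    """Return (entity, article_count) pairs mentioned in at least min_articles articles."""
    mentions: dict[str, set[int]] = defaultdict(set)
    for article in articles:
        for entity in article.entities:
            key = normalize_entity(entity)
            if key:
                # A set of IDs means repeats within one article count once.
                mentions[key].add(article.id)
    trends = [(entity, len(ids)) for entity, ids in mentions.items() if len(ids) >= min_articles]
    return sorted(trends, key=lambda item: (-item[1], item[0]))


recent = [
    EnrichedArticle(1, ["Anthropic", "Groq"]),
    EnrichedArticle(2, [" anthropic "]),
    EnrichedArticle(3, ["ANTHROPIC", "Anthropic", "Groq"]),
]
trends = detect_trends(recent)
# trends == [("anthropic", 3)]; "groq" appears in only two articles.
```

Counting distinct article IDs rather than raw mentions keeps one entity-heavy article from creating a trend by itself.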
3. LangGraph Quiz Agent
For learning retention, Pulse generates three-question quizzes from an article summary and entities.
LangGraph is useful for modeling multi-step agent flows.
Pulse uses the quiz flow for:
Quiz sessions are stored server-side with expiry. The answer key is not trusted from the client.
Because yes, even in a personal app, the client should not grade itself.
The Product Rule: Retrieval Before Generation
The biggest design rule became:
Retrieve first. Generate second. Refuse when retrieval is weak.
That rule shows up everywhere:
- Search can run without Groq.
- Ask mode refuses unrelated questions before spending quota.
- Digest uses selected articles, not the entire database.
- Quiz generation only works on enriched articles.
- Feed ranking uses stored signals, not live model calls.
This made the system cheaper, faster, and less ridiculous.
LLMs are powerful. They are also expensive, rate-limited, and occasionally very committed to being wrong.
So Pulse uses them where they add value, and keeps boring deterministic code around them.
The Final Shape
The final RAG architecture looked like this:
That is more work than:
documents -> embeddings -> chatbot
Takeaway
RAG is easy when the input data is clean, the query is friendly, and nobody asks anything weird.
Useful RAG is different.
Useful RAG needs:
- clean source data
- validated enrichment
- exact search
- semantic search
- hybrid ranking
- relevance thresholds
- citations
- refusal paths
- personalization
The hard part is not putting vectors in a database.
The hard part is deciding when the vector result is not good enough.
The hard part is not calling the LLM.
The hard part is knowing when not to call it.
That is what made Pulse useful.
Not because it could answer everything.
Because it knew when it could not.








