DEV Community: Dipro Bhowmik

Steal my code: I built a RAG agent for sales people

Dipro Bhowmik — Fri, 06 Feb 2026 05:00:00 +0000

Our sales team kept bugging us with the same questions: "How does the API handle rate limiting?" "Does the API support pagination?" "Can you explain embeddings to a customer?"

I thought, let me build them an AI agent instead.

I call him Allen - he searches our docs and answers questions instantly. No more Slack interruptions, no more stale wiki pages, no more "let me get back to you."

Source code here. Feel free to put this tutorial into Claude Code and see what it comes up with.

Here's how I did it.

What is RAG?

If you've been living under a rock for the last 3 years, let me explain what RAG is.

RAG stands for Retrieval-Augmented Generation.

It is a fancy term for: "search your docs, feed results to an LLM, get a smart answer."

Without RAG, your vanilla agent:

Won't know your product
Will make things up about features you don't have

With RAG:

The agent searches your actual docs
Uses that context to answer questions

RAG is also really easy to implement these days.
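To make the "search, feed, answer" loop concrete, here's a rough sketch in plain JavaScript. The keyword-overlap retriever, the toy docs, and the names `retrieve` and `buildPrompt` are all made up for illustration. A real setup swaps in a proper search backend and sends the prompt to an LLM.

```javascript
// Toy corpus standing in for your docs (hypothetical content).
const docs = [
  { id: "rate-limits", text: "The API returns HTTP 429 when the rate limit is exceeded." },
  { id: "pagination", text: "List endpoints in the API support pagination with a cursor parameter." },
];

// Naive retriever: score each doc by how many query words it contains.
function retrieve(query, corpus, topK = 1) {
  const words = query.toLowerCase().match(/[a-z0-9]+/g) || [];
  return corpus
    .map((doc) => ({
      doc,
      score: words.filter((w) => doc.text.toLowerCase().includes(w)).length,
    }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((hit) => hit.doc);
}

// "Augment" step: paste the retrieved text into the prompt sent to the LLM.
function buildPrompt(question, hits) {
  const context = hits.map((d) => `[${d.id}] ${d.text}`).join("\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}

const question = "Does the API support pagination?";
const ragPrompt = buildPrompt(question, retrieve(question, docs));
console.log(ragPrompt);
```

The generation step is just a normal LLM call with `ragPrompt` as input, which is why the retrieval quality ends up mattering more than anything else.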

The infra / stack

We're using a really common stack today -

Frontend: Next.js (streaming chat interface)
Agent: LangChain (handles the search→think→respond loop)
LLM: Claude Sonnet 4.5 (smart enough to know when to search)
Search: Vector database with semantic + keyword search

Total setup time was 1 day with Cursor (one morning and then another afternoon a few weeks later).

Most of that time was fussing with the frontend to get the chat interface to look nice.

Setting up the UI

Frontend is just Next.js. Messages go to /api/chat and responses stream back:

const response = await fetch("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ messages: apiMessages }),
});

By default the results will seem slow, because the client will wait for the whole response before rendering. I used Server-Sent Events so that words appear one-by-one like in ChatGPT or Cursor.
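For anyone who hasn't touched Server-Sent Events: the stream is plain text, each event is one or more `data:` lines, and events are separated by a blank line. Here's a minimal sketch of the client-side parsing, assuming `\n` line endings and one text fragment per event. The name `parseSseBuffer` is hypothetical, and the actual wire format of the /api/chat route isn't shown in this post.

```javascript
// Split a chunk of an SSE stream into complete events plus any leftover
// partial event. Events are separated by a blank line; payload lines start
// with "data:". Events with an empty payload are dropped in this sketch.
function parseSseBuffer(buffer) {
  const parts = buffer.split("\n\n");
  const rest = parts.pop(); // last piece may be an incomplete event
  const events = parts
    .map((block) =>
      block
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n")
    )
    .filter((data) => data.length > 0);
  return { events, rest };
}

// Simulate two network chunks, the second one completing a split event.
let pending = "";
const rendered = [];
for (const chunk of ["data: Hel\n\ndata: lo", " world\n\n"]) {
  const { events, rest } = parseSseBuffer(pending + chunk);
  rendered.push(...events); // in the UI, append these to the message
  pending = rest;
}
console.log(rendered);
```

Keeping the leftover in `pending` matters because network chunks don't line up with event boundaries.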

In light-mode, Allen is a random old white dude. In dark mode, Allen becomes someone trapped in a computer:

Building the agent

Wiring up to an LLM

I'm using Claude because I had an API key for it already:

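import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-sonnet-4-5-20250929",
  apiKey: process.env.ANTHROPIC_API_KEY,
  // Extended thinking: the budget must be smaller than maxTokens
  thinking: {
    type: "enabled",
    budget_tokens: 10000,
  },
  maxTokens: 20000,
});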

I initially did not have the "thinking" step there, but I realized it made my users trust the thing more. It's mainly filling the context window with reflection, but gives users a feeling that Allen is really trying.

Setting up LangChain agent

LangChain makes this easy:

import { createAgent } from "langchain";

const agent = createAgent({
  model,
  tools: [searchDocuments],
  systemPrompt: `
You are Allen (Al), a documentation assistant for Shaped. 
Help users find answers about the Shaped platform and API.

<basic_guidelines>
- Be concise. Prefer short, direct answers over long explanations.
- Use code examples when they clarify the answer. 
- Use search tools at your disposal. 
</basic_guidelines>

<prefer_search>
- Use search tools at your disposal. Run search at most 4 times per question.
- After retrieving search results, think carefully about whether the results are relevant to the user's query. If the results don't contain the information needed to answer the question, try searching again with a different query or search mode.
- After 4 searches, if the content is still not found or not relevant, tell the user: "I couldn't find information about this in the Shaped documentation. This topic may not be covered in the available documentation."
- When you have enough context, answer without extra searches.
- Prefer a single, focused search but use multiple when required.
</prefer_search>`,
});

The system prompt is where you teach the agent how to do things. I added the prefer_search section to prevent it from hallucinating.

Making a Search tool

LangChain allows agents to have "tools", which are basically functions that they can call. I wrote a search tool, which is where RAG actually happens.

The agent can call this search function whenever the context requires it:

import { tool } from "langchain";
import { z } from "zod";

export const searchDocuments = tool(
  async (input) => {
    // Search happens here
  },
  {
    name: "search_documents",
    description: "Search the Shaped documentation for relevant content about a given topic",
    schema: z.object({
      query: z.string().describe("The search query to find relevant documents"),
      mode: z.enum(["vector", "lexical", "hybrid"])
        .describe(`The search mode. 
          Choose "vector" for semantic search: to return docs containing similar semantic meaning or phrase content to the input. 
          Choose "lexical" for BM25 lexical search: to return docs with specific keywords or IDs.
          Choose "hybrid" for a mix of strategies: 50% vector and 50% lexical.`)
    }),
  }
);

The description tells Allen what the tool does, and the schema describes the inputs he will use.

Three Ways to Search

I implemented three different ways to search the documentation, to catch different ways that people may search. A single search tool handles all of them.

1. Vector Search (Semantic)

This is the "AI-powered" search everyone talks about. The items in your DB are turned into vectors using an embedding model, the query is also turned into a vector, and then the search engine compares the query vector to the item vectors and returns the closest ones.

Good for: Natural language queries like "How do I authenticate?" (matches "authentication", "login", "API keys")

Bad for: Exact terms like "BM25" (may not find an exact match)

Since I'm using Shaped, I can do this with SQL:

SELECT * 
FROM text_search(
    query='$query', 
    text_embedding_ref='text_embedding', 
    mode='vector'
)
LIMIT 20

2. Keyword Search (Lexical)

This is old-school keyword search. It uses the BM25 algorithm to find exact keyword matches.

Good for: "rate_limit parameter" (finds exact API names)

Bad for: "How do I log in?" (doesn't understand paraphrasing)

SELECT * 
FROM text_search(
    query='$query', 
    mode='lexical',
    fuzziness=0
)
LIMIT 20

3. Hybrid Search

This runs both searches and merges the results. I explicitly weight the results 50/50 between both approaches, but you can change this mix.

Good for: Almost everything.

SELECT *
FROM text_search(
       query='$query_text', 
       mode='vector',
       text_embedding_ref='text_embedding', 
       name='vector_search'
     ),
     text_search(
       query='$query_text', 
       mode='lexical', 
       name='lexical_search'
     )
ORDER BY score(expression='0.5 * retrieval.vector_search + 0.5 * retrieval.lexical_search')
LIMIT 20

I default to hybrid. Semantic search catches paraphrasing, keyword search ensures exact terms aren't missed.
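If the 50/50 weighting feels abstract, here's a rough sketch of what a weighted blend of two score lists does. This is plain JavaScript for illustration, not how Shaped implements `score()`. The name `blendScores` and the example scores are made up, and it assumes both score sets are on a comparable scale (say 0 to 1).

```javascript
// Blend two score maps ({ docId: score }) with a configurable weight.
// Docs missing from one list contribute 0 for that side.
function blendScores(vectorScores, lexicalScores, vectorWeight = 0.5) {
  const ids = new Set([
    ...Object.keys(vectorScores),
    ...Object.keys(lexicalScores),
  ]);
  return [...ids]
    .map((id) => ({
      id,
      score:
        vectorWeight * (vectorScores[id] ?? 0) +
        (1 - vectorWeight) * (lexicalScores[id] ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}

const ranked = blendScores(
  { "auth-guide": 0.9, "api-keys": 0.6 },
  { "api-keys": 1.0, "rate-limits": 0.4 }
);
console.log(ranked.map((r) => r.id));
```

A doc that shows up in both lists ("api-keys" here) beats a doc that scores high in only one, which is the behaviour you want from hybrid search.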

Actually hitting the search API

The search tool hits your vector database API. I'm using Shaped because I work there and get a free account:

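// SEARCH_API_URL and API_KEY are expected to come from your config/environment
export async function getSearchResults(query, mode) {
  const res = await fetch(SEARCH_API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": API_KEY,
    },
    body: JSON.stringify({
      // Note: this SQL always runs the hybrid search; `mode` is sent as a
      // parameter, but the query shown here never references it
      query: `SELECT *
FROM text_search(
       query='$query_text', 
       mode='vector',
       text_embedding_ref='text_embedding', 
       name='vector_search'
     ),
     text_search(
       query='$query_text', 
       mode='lexical', 
       name='lexical_search'
     )
ORDER BY score(expression='0.5 * retrieval.vector_search + 0.5 * retrieval.lexical_search')
LIMIT 20`,
      parameters: {
        query_text: query,
        mode: mode,
      },
      return_metadata: true,
    }),
  });
  // Surface HTTP errors instead of handing an error body to the agent
  if (!res.ok) {
    throw new Error(`Search request failed with status ${res.status}`);
  }
  return await res.json();
}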
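The `getSearchResults` function above always sends the hybrid query, even though the tool schema lets the agent pick a mode. One possible way to honour the mode is a small lookup from mode to SQL, sketched below. The name `queryForMode` is hypothetical. The SQL is copied from the three search sections, except that I use `$query_text` as the parameter name everywhere so it matches the `parameters` object.

```javascript
// SQL templates from the three search sections, keyed by search mode.
const QUERIES = new Map([
  [
    "vector",
    `SELECT *
FROM text_search(query='$query_text', text_embedding_ref='text_embedding', mode='vector')
LIMIT 20`,
  ],
  [
    "lexical",
    `SELECT *
FROM text_search(query='$query_text', mode='lexical', fuzziness=0)
LIMIT 20`,
  ],
  [
    "hybrid",
    `SELECT *
FROM text_search(query='$query_text', mode='vector', text_embedding_ref='text_embedding', name='vector_search'),
     text_search(query='$query_text', mode='lexical', name='lexical_search')
ORDER BY score(expression='0.5 * retrieval.vector_search + 0.5 * retrieval.lexical_search')
LIMIT 20`,
  ],
]);

// Return the SQL for a mode, falling back to hybrid for anything unknown.
function queryForMode(mode) {
  return QUERIES.get(mode) ?? QUERIES.get("hybrid");
}

console.log(queryForMode("lexical"));
```

Inside `getSearchResults`, the `query` field would then become `queryForMode(mode)` instead of the hardcoded string.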

Lessons Learned

1. Chunk Size Matters

I started with tiny 100-token chunks. Allen retrieved single sentences that didn't give him enough context to answer meaningfully.

The sweet spot I found was chunks that are around a paragraph long. Small enough to be specific, large enough to have context.
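Here's a minimal sketch of paragraph-sized chunking, assuming your docs separate paragraphs with blank lines. The function name `chunkByParagraph` and the character budget are my own illustration (the post doesn't give an exact size), and it counts characters rather than tokens to keep things simple.

```javascript
// Split text on blank lines, then pack consecutive paragraphs into chunks
// of at most maxChars characters. A single paragraph longer than maxChars
// is kept whole rather than being cut mid-sentence.
function chunkByParagraph(text, maxChars = 800) {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks = [];
  let current = "";
  for (const p of paragraphs) {
    const candidate = current ? `${current}\n\n${p}` : p;
    if (candidate.length <= maxChars || current === "") {
      current = candidate; // still fits, or first paragraph of a new chunk
    } else {
      chunks.push(current);
      current = p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const sampleDoc = "Intro line.\n\nSecond paragraph here.\n\nThird.";
const paragraphChunks = chunkByParagraph(sampleDoc, 40);
console.log(paragraphChunks);
```

Merging very short paragraphs avoids the "single sentence with no context" problem, while the size cap keeps each chunk specific.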

2. Metadata Is Clutch

Add metadata to every chunk so you can display it to the user without an additional hop:

{
  "content": "...",
  "source": "API Reference",
  "section": "Tables",
  "h1": "Create Table",
  "h2": "Parameters",
  "last_updated": "2024-01-15"
}

It also lets you do filtered searches: "What changed in the API recently?" filters by last_updated.
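As a rough illustration of both uses, here's a sketch that filters chunks by `last_updated` and builds a citation string from the metadata. The helper names `updatedSince` and `formatCitation` are hypothetical and the second chunk is made up. In a real setup you would likely push the date filter into the search query rather than filtering client-side.

```javascript
// Chunks shaped like the metadata example above (contents are made up).
const docChunks = [
  { content: "...", source: "API Reference", h1: "Create Table", h2: "Parameters", last_updated: "2024-01-15" },
  { content: "...", source: "Guides", h1: "Quickstart", h2: "Install", last_updated: "2023-06-02" },
];

// ISO dates (YYYY-MM-DD) sort correctly as plain strings.
function updatedSince(chunks, isoDate) {
  return chunks.filter((c) => c.last_updated >= isoDate);
}

// Build a human-readable citation straight from the chunk metadata.
function formatCitation(chunk) {
  return [chunk.source, chunk.h1, chunk.h2].filter(Boolean).join(" > ");
}

const recentChunks = updatedSince(docChunks, "2024-01-01");
console.log(recentChunks.map(formatCitation));
```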

3. Don't underestimate the system prompts

My first version was garbage, too simple. Key insights:

Tell it to search before answering (otherwise it guesses)
Set a max search limit (or it goes into an infinite loop on hard questions)
Tell it to admit when docs don't have the answer
Make it cite sources (sales team loves this)
It's better to be more verbose and specific
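The search limit in this post lives in the prompt. If you also want a hard backstop in code, one option is to wrap the tool function in a counter, as in the sketch below. The helper `withCallLimit` is hypothetical, not part of LangChain, and you would create a fresh wrapper for each user question.

```javascript
// Wrap a function so it refuses to run more than maxCalls times.
// After the limit, it returns limitMessage instead of calling fn.
function withCallLimit(fn, maxCalls, limitMessage) {
  let calls = 0;
  return (...args) => {
    if (calls >= maxCalls) return limitMessage;
    calls += 1;
    return fn(...args);
  };
}

// Stand-in for the real search function.
const fakeSearch = (query) => `results for "${query}"`;
const limitedSearch = withCallLimit(
  fakeSearch,
  4,
  "Search limit reached. Answer with what you have, or say the docs don't cover it."
);

for (let i = 1; i <= 5; i++) {
  console.log(limitedSearch(`attempt ${i}`));
}
```

Returning a message (rather than throwing) gives the model something it can read and act on.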

4. Monitor what is retrieved and cited

If the agent retrieves docs but doesn't cite them, your retrieval is not great. I log every search and check if the results actually appear in the response.
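A check like that can be as simple as the sketch below. It assumes citations show up in the answer as the source title, which may not match how your agent cites. The function name `citationReport` and the sample data are made up.

```javascript
// Report which retrieved sources are mentioned in the answer text.
function citationReport(retrieved, answer) {
  const text = answer.toLowerCase();
  const cited = retrieved.filter((r) => text.includes(r.source.toLowerCase()));
  return {
    cited: cited.map((r) => r.source),
    citationRate: retrieved.length ? cited.length / retrieved.length : 0,
  };
}

const retrievedDocs = [{ source: "API Reference" }, { source: "Quickstart" }];
const report = citationReport(
  retrievedDocs,
  "Per the API Reference, requests are limited per key."
);
console.log(report);
```

A consistently low `citationRate` across logged questions is the signal that retrieval is returning chunks the model doesn't find useful.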

Stuff I'd add next

Re-ranking: After the initial search, run the results through another model to score them on semantic relevance. You can use a model like ColBERT for this.

Feedback loop: Track which answers get thumbs-down, use that to improve chunking.

Multi-modal search: Our docs have diagrams, would be cool to search those too.

Boring stuff: ROI

This was fun to build, but what makes it appealing to my manager is:

Makes pre-sales team feel powerful
Fewer engineering Slack questions
Sales can answer questions without raising with a technical person

Setup cost: <2 days of dev time, Shaped usage is within the $100 free tier, plus LLM API calls.

Build it yourself

I'm 90% sure if you give Cursor this tutorial and the GitHub repo, it could build you the same thing in less than 1 hour.

The actual code is pretty simple:

Convert documentation into a chunked JSON file
Upload JSON file to Shaped
Wire up LangChain agent with search tool
Deploy to Vercel or wherever
Send to your boss

The hardest part isn't the code, it's figuring out how to chunk your docs appropriately. But once it clicks, you've got an agent that actually knows your product.

Roast my approach in the comments lol

How I built a movie suggestion app with zero ML experience

Dipro Bhowmik — Mon, 27 Oct 2025 15:29:49 +0000

In the last week or so, I've been building an app to generate real-time movie recommendations based on a user's activity. I chose this simple use case to learn more about how recommender systems work.

Shaped is a relevance database: storage optimized to fetch data based on user behaviour, not just static rules.

It makes building personalized applications easy for developers by packaging three distinct layers into a single API:

A data layer that can host your data or connect to an external source
An ML layer that indexes on your data and supports the latest recommender models and architectures
A query layer to interface with client applications and power real-time recommendations

In this article, I will show you how I used Shaped to build a movie recommendation system. Click here to check out my final demo application.

Here is what the architecture of my app looks like:

Uploading my data

Any machine learning system is only as good as the data it is trained on.

For data, I started with a public dataset called MovieLens that is well-known in the machine learning industry. It contains 100,000 ratings of 9,000 movies, ranging from the early 1900s to 2018.

My suggestion system will be built with two data sources from MovieLens:

Movies: a catalog of 9000 movies
Ratings: a list of user-generated ratings

The first step was to load this data into Shaped. This was a relatively easy process; MovieLens data is very clean, so the only step was to convert the data files to JSONL format.
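For reference, the conversion can be sketched in a few lines of JavaScript. This is an illustration, not the script I actually used. The names `parseCsvLine` and `csvToJsonl` are made up, every value stays a string, and it assumes no newlines inside quoted fields. The quote handling matters because some MovieLens titles contain commas.

```javascript
// Parse one CSV line, honouring double-quoted fields and "" escapes.
function parseCsvLine(line) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Turn CSV text (header row first) into JSONL: one JSON object per line.
function csvToJsonl(csv) {
  const [headerLine, ...rows] = csv.trim().split(/\r?\n/);
  const header = parseCsvLine(headerLine);
  return rows
    .map((row) => {
      const values = parseCsvLine(row);
      return JSON.stringify(Object.fromEntries(header.map((h, i) => [h, values[i]])));
    })
    .join("\n");
}

const sampleCsv =
  'movieId,title,genres\n1,Toy Story (1995),Adventure|Animation\n11,"American President, The (1995)",Comedy|Drama';
console.log(csvToJsonl(sampleCsv));
```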

Shaped also supports automated import from systems like Postgres, MySQL, S3, Apache, and more.

Enriching my dataset with semantic information

To give my model more to work with, I wrote a small Python script to get metadata from the IMDb API. This enrichment step is crucial to enable semantic search on my dataset.

I added columns for description, cast, writers, etc, so my model can respond to searches like - Movies written by Paul Thomas Anderson.

import json
import requests

# NOTE: `headers` and the per-movie `url` (built from imdb_id) are defined
# in the full script; they are omitted from this excerpt

# Load movies from JSONL file
movies = []
with open('movies.jsonl', 'r') as f:
  for line in f:
    movies.append(json.loads(line))

# Process each movie with API enrichment
for i, movie in enumerate(movies):
  imdb_id = movie.get('imdbId')

  try:
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    result = response.json()

    # Extract and process movie data
    directors = result.get("directors", [])
    directors_string = ','.join([d.get('fullName', '') for d in directors])
    writers = result.get("writers", [])
    writers_string = ','.join([x.get('fullName', '') for x in writers])
    cast = result.get("cast", [])
    cast_string = ','.join([x.get('fullName', '') for x in cast])

    # Update movie with enriched data
    movies[i].update({
      "description": result.get("description"),
      "interests": result.get("interests"),
      "release_date": result.get("releaseDate"),
      "directors": directors_string,
      "cast": cast_string,
      "writers": writers_string,
    })
  except (requests.RequestException, ValueError) as e:
    # Skip movies whose lookup fails instead of aborting the whole run
    print(f"Skipping {imdb_id}: {e}")

# Save enriched movies to JSONL file
with open('enriched_movies.jsonl', 'w') as f:
  for movie in movies:
    f.write(json.dumps(movie) + '\n')

The full enrichment script is in /model/scripts/enrich-movies.py

Defining my model

After my data was loaded, it was time to configure my model. Shaped makes it very easy to set up your first model: just upload a YAML file.

There are three config components to know: connectors, fetch, and model.

connectors: Defines which datasets to connect to my model.
fetch: Defines the SQL that Shaped will use to get my training data. For this model, I configured an items table (movies) and an events table (user behaviour like ratings and clicks).
model: Declares how the model will actually score and rank items. It exposes two important fields:
- policy_config: Defines the ranking algorithm and how the model learns
- inference_config: Tweaks how the model serves results at runtime (inputs, retrieval methods, diversity, etc)

I'll save the details for another blog post, but here's the full model config for your reference: model.yaml

Building the frontend

Since I'm creating this demo from scratch, I spent some time building a Next.js app to showcase the model.

I built some generic components to start:

Carousel to show a category of movies
Search bar
Card that shows further details when you click on a movie
Similar movies

I also wrote some logic to track which items a user clicks on. These are sent back to Shaped as new events in the “events” dataset.

Here's what the first version looked like, with dummy data:

Wiring my app to the Shaped API

After building my frontend and training my model, it was time to wire my app to the Shaped API.

The benefit of using Shaped is its single-model versatility. A single model can serve multiple use cases across my app. I don’t have to train a recommendation model, a separate semantic search one, and then a third similarity model.

As you’ll see, the same model will be used to get personalized recommendations, run semantic search, and get trending movies, similar movies that other people liked, and recommendations in a specific category.

This dramatically reduces complexity and ensures consistent ranking logic across my application.

Feature 1: Personalized "For you" carousel

The topmost carousel should show a personalized “For you” feed of movies that the user may like. To do this, we call the Shaped /rank endpoint, which returns a personalized list of rankings based on user IDs, interactions, a text query, and anything else you want to pass it.

For this carousel, we want rankings that are:

Conditioned on the current user’s unique ID
Conditioned on any recent interactions that the model may not have been trained on, but do not return these items
Include item metadata (title, genre, etc) to save a trip to the server
Include some less-relevant items to prompt exploration

The final call to the /rank endpoint looks like this. Notice we include interactions, user_id, and an exploration_factor to adjust the flavour of our results set:

const forYouResponse = await fetch("/models/movie_recs/rank", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": token,
    },
    body: JSON.stringify({
      return_metadata: true,
      limit: 20,
      user_id: userId,
      interactions: stringInteractions,
      config: {
        filter_interaction_iids: true,
        exploration_factor: 0.2,
      },
    }),
  });
// fetch() resolves to a Response, so parse the JSON body before rendering
const forYouMovies = await forYouResponse.json();
return (
 <MovieList movies={forYouMovies} />
)

Feature 2: Semantic search using the same model

As mentioned before, we’ve trained this ranking model and get semantic search for free. In this case, we use the /retrieve endpoint with a text query. This returns a set of relevant results with no personalization. This is important because a search should be agnostic to a user’s preferences.

const getMovies = async (searchQuery) => {
  try {
    const response = await fetch("/models/movie_recs/retrieve", {
      method: "POST", headers,
      body: JSON.stringify({
        return_metadata: true,
        explain: true,
        text_query: searchQuery,
        config: {
          exploration_factor: 0,
          diversity_factor: 0,
          diversity_attributes: [],
          limit: 50
        }
      }),
    });

    const searchResults = await response.json();
    setMovies(searchResults?.data?.metadata || []);
  } catch (error) { ... }
};
const handleInputChange = (event) => {
  const searchQuery = event.target.value;
  setQuery(searchQuery);
  getMovies(searchQuery);
};
return (
  <div>
    <Input 
      value={query}
      onChange={handleInputChange} 
      placeholder="Search for movies..."
    />
    <MovieList movies={movies} />
  </div>
);

Feature 3: Powering a “People also liked…” section

If we pass the model a movie ID, it will show us similar movies. To do this, we call the /similar_items endpoint with an item_id parameter. This returns the movies that are most similar to the selected one.

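const similarResponse = await fetch("/models/movie_recs/similar_items", {
  method: "POST",
  headers,
  body: JSON.stringify({ item_id: item_id }),
});
// fetch() resolves to a Response, so parse the JSON body before rendering
const similarMovies = await similarResponse.json();
return (
  <MovieList movies={similarMovies} />
)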

Feature 4: Adding genre filters

Again we can support a new use case with our same model. I can add carousels for a specific genre, with personalized recommendations based on the user’s activity. I use a similar API call as the first example, but filtered for only a specific genre. For this, I use the /rank endpoint with a filter_predicate attribute:

const actionResponse = await fetch("/models/movie_recs/rank", {
  method: "POST", headers,
  body: JSON.stringify({
    filter_predicate: `array_has_any(genres, ['Action'])`,
    user_id: userId,
    interactions: stringInteractions,
    limit: 20,
    return_metadata: true,
  }),
});
// fetch() resolves to a Response, so parse the JSON body before rendering
const actionMovies = await actionResponse.json();
return (
  <MovieList movies={actionMovies} />
)
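If you want one carousel per genre, the predicate string can be generated instead of hardcoded. Here's a small sketch. The helper name `genrePredicate` is hypothetical, and doubling single quotes to escape them is an assumption borrowed from standard SQL, so check it against the Shaped docs before relying on it.

```javascript
// Build an array_has_any predicate for one or more genres.
// Assumption: single quotes are escaped SQL-style by doubling them.
function genrePredicate(genres) {
  const quoted = genres.map((g) => `'${g.replace(/'/g, "''")}'`);
  return `array_has_any(genres, [${quoted.join(", ")}])`;
}

console.log(genrePredicate(["Action"]));
console.log(genrePredicate(["Comedy", "Romance"]));
```

The first call produces exactly the predicate used in the request above, so the result can be dropped straight into `filter_predicate`.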

Feature 5: Adding real-time interactions

Finally, we can make our model better over time by inserting the interactions back to our events table, using /datasets/{name}/insert:

// The handler must be async because it awaits the request.
// `payload` holds the click details gathered by the component.
const trackClick = async () => {
  await fetch("/datasets/events_table/insert", {
    method: "POST", headers,
    body: JSON.stringify({
      data: [
        {
          event_value: payload.event_value,
          movieId: payload.movieId,
          timestamp: payload.timestamp,
          userId: payload.userId
        }
      ]
    }),
  })
};
return (
  <button type="button" onClick={trackClick} className="text-left w-full">
    <MovieCard />
  </button>
)
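The snippet above reads from a `payload` object without showing where it comes from. One way to build it is sketched below. The field names match the insert call, but the helper name `buildClickEvent`, the `event_value` of 1 for a click, and the Unix-seconds timestamp (the format MovieLens ratings use) are my assumptions, so adjust them to your events schema.

```javascript
// Build the event row for a click on a movie card.
function buildClickEvent(userId, movieId, now = Date.now()) {
  return {
    event_value: 1, // assumption: 1 marks a click
    movieId,
    timestamp: Math.floor(now / 1000), // assumption: Unix time in seconds
    userId,
  };
}

const clickPayload = buildClickEvent("user_42", 318, 1700000000000);
console.log(clickPayload);
```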

Conclusion

I built a real-time, production-ready movie recommendation system without deep machine learning expertise.

Shaped abstracts the complex training and deployment pipeline, allowing me to go from raw data to a fully functional application quickly. I powered personalized ranking, semantic search, and item similarity using a single model and without touching any infrastructure.

If you're curious to train your own models, sign up for a 14-day free trial and test it yourself.

The full code for this project (including data and model config) is available on GitHub.