<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Madhesh T</title>
    <description>The latest articles on DEV Community by Madhesh T (@madhesh00).</description>
    <link>https://dev.to/madhesh00</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3797600%2Fd69512c2-987d-4bc4-83ba-9da2a2bb15ba.png</url>
      <title>DEV Community: Madhesh T</title>
      <link>https://dev.to/madhesh00</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/madhesh00"/>
    <language>en</language>
    <item>
      <title>What Drives User Success?</title>
      <dc:creator>Madhesh T</dc:creator>
      <pubDate>Sun, 15 Mar 2026 10:53:39 +0000</pubDate>
      <link>https://dev.to/madhesh00/what-drives-user-success-2hn9</link>
      <guid>https://dev.to/madhesh00/what-drives-user-success-2hn9</guid>
      <description>&lt;h1&gt;
  
  
  I Built an ML Dashboard on Zerve to Find Out
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;A complete walkthrough of the idea, the models, the dashboard, and the finding that changed how I think about user retention.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem No One Talks About Honestly
&lt;/h2&gt;

&lt;p&gt;Every SaaS platform has a churn problem. But most of the time, the conversation around churn is reactive — someone cancels, someone stops logging in, a renewal gets missed. By the time the signal arrives, the user is already gone.&lt;/p&gt;

&lt;p&gt;The real question is earlier than that. Not &lt;em&gt;why did they leave&lt;/em&gt; — but &lt;em&gt;who is about to leave, and what do we do right now?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That was the challenge Zerve put in front of me. They provided a dataset of user event data — every action a user takes on the platform generates an event — and asked a single, open-ended question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which user behaviours are most predictive of long-term success?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No predefined answer. No suggested approach. Just the data and the question.&lt;/p&gt;

&lt;p&gt;Here is exactly what I built, how I built it, and what the data actually said.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Zerve Is and Why It Made This Possible
&lt;/h2&gt;

&lt;p&gt;Zerve is a notebook-based data platform with built-in AI capabilities. Users create canvases, write and run code, call AI agents, use generative AI tools, and collaborate with teammates — all inside one workspace.&lt;/p&gt;

&lt;p&gt;What makes Zerve genuinely different as a development environment is the &lt;code&gt;variable()&lt;/code&gt; function. Any output computed in a Zerve notebook — a dataframe, a model result, a summary table — can be called directly into a Streamlit application with a single line of code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;archetype_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;variable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kmeans_user_archetypes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;archetype_cluster_df&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means the model and the dashboard are one connected system. No exporting CSVs. No manual uploads. No stale data. Every time the dashboard loads, it pulls live results directly from the notebook. That is not a minor convenience — it is a fundamental architectural advantage that made this entire project possible as a single coherent pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dataset
&lt;/h2&gt;

&lt;p&gt;Each row in the dataset is one user, described by 43 behavioural features — signals capturing what that user has done on the platform. The most important ones:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;events_first_7d&lt;/code&gt;&lt;/strong&gt; — Total activity in the first seven days. The single most intuitive early signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;days_active_first_7d&lt;/code&gt;&lt;/strong&gt; — How many of those seven days did they return? Frequency matters as much as volume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;agent_tool_calls&lt;/code&gt;&lt;/strong&gt; — How many times did they call an AI agent? This is the platform's core capability. Usage here is the strongest predictor of everything that follows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;genai_events&lt;/code&gt;&lt;/strong&gt; — Broader generative AI engagement. Prompting, AI-assisted code, model interactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;credits_used_total&lt;/code&gt;&lt;/strong&gt; — Credits are the platform's primary resource. Zero credit usage means zero real output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;feature_adoption_breadth&lt;/code&gt;&lt;/strong&gt; — Did they explore multiple features or stay in one corner of the platform?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;collaboration_index&lt;/code&gt;&lt;/strong&gt; — How much did they work with others? Collaborative users embed themselves in team workflows, which dramatically increases retention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;execution_success_rate&lt;/code&gt;&lt;/strong&gt; — When they ran code, did it work? A proxy for whether they are getting real value or hitting frustration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;target_success&lt;/code&gt;&lt;/strong&gt; — The outcome variable. One or zero. Succeeded or did not. This is what both models are built around.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Approach — Two Models, One System
&lt;/h2&gt;

&lt;p&gt;I built two completely separate models. Understanding why both are necessary is the core of the whole project.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model One — XGBoost Gradient Boosting Machine
&lt;/h3&gt;

&lt;p&gt;A supervised learning model trained on all 43 features against the &lt;code&gt;target_success&lt;/code&gt; label. The model learns which combinations of behaviour predict success and produces a probability for every user — which I scale into a zero to one hundred success likelihood score.&lt;/p&gt;

&lt;p&gt;I chose XGBoost because user behaviour data is non-linear and messy. The relationship between agent calls and success is not a straight line — it interacts with credit usage, with feature breadth, with early engagement patterns. GBM handles these interactions naturally.&lt;/p&gt;

&lt;p&gt;I used &lt;strong&gt;SHAP values&lt;/strong&gt; — SHapley Additive exPlanations — to make the model interpretable. SHAP tells you not just what the model predicts, but which features drove each individual prediction up or down. This is how I can tell you with confidence what the top predictive signals are, rather than just reporting an accuracy score.&lt;/p&gt;
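&lt;p&gt;&lt;em&gt;A minimal sketch of this setup, using scikit-learn's GradientBoostingClassifier as a lightweight stand-in for XGBoost and synthetic data in place of the real 43-feature schema; illustrative only, not the project's actual code:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch of Model One: train a gradient boosting classifier and scale
# probabilities into a 0-100 success score. Feature names and data are
# synthetic stand-ins, not the real Zerve dataset.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 4))                   # 4 toy behavioural features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # toy target_success label

model = GradientBoostingClassifier().fit(X, y)
proba = model.predict_proba(X)[:, 1]       # P(success) per user
score = np.round(proba * 100).astype(int)  # 0-100 success likelihood
print(score[:5])
```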

&lt;h3&gt;
  
  
  Model Two — KMeans Clustering, k=4
&lt;/h3&gt;

&lt;p&gt;An unsupervised model — no success label involved. It looks purely at behavioural similarity and finds natural groups in the data. The number four was chosen because it produces four genuinely distinct archetypes without over-fragmenting the user base.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why run clustering separately from prediction?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because a score alone is not actionable. If a customer success manager sees a user scored 24, the immediate question is — what do I do? The score tells you &lt;em&gt;how urgent&lt;/em&gt; the situation is. The cluster tells you &lt;em&gt;what action to take&lt;/em&gt;. A Casual Explorer scoring 24 needs a re-engagement tutorial. An At-Risk User scoring 24 needs free credits to try AI features for the first time. Same score. Completely different response. You need both.&lt;/p&gt;
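&lt;p&gt;&lt;em&gt;The clustering step itself is only a few lines; the data here is synthetic, standing in for the real behavioural features:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch of Model Two: KMeans with k=4 over (toy) behavioural features.
# Each user gets a cluster label, independent of the success score.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.random((300, 4))  # toy stand-in for the behavioural feature matrix

km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
labels = km.labels_       # one archetype label per user, 0..3
print(np.bincount(labels))
```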




&lt;h2&gt;
  
  
  The Four Archetypes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Power Coders&lt;/strong&gt;&lt;br&gt;
Heavy agent usage, high credit consumption, scripting and deploying to production. These users found the platform's core value immediately. They have the highest success rate of any group and respond to capability-led upsells — more compute, advanced APIs, dedicated support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collaborators&lt;/strong&gt;&lt;br&gt;
Socially active builders working in shared canvases and team workflows. Their viral potential is enormous — every collaborator is a gateway to additional team seats. The right intervention is a frictionless team upgrade prompt when their collaboration score crosses a threshold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Casual Explorers&lt;/strong&gt;&lt;br&gt;
They signed up, explored, created some things — but never formed a consistent habit. There is genuine intent here. It just has not converted. The thirty-day window is critical. A well-timed tutorial showing them a relevant use case reactivates more than twelve percent of this group.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At-Risk Users&lt;/strong&gt;&lt;br&gt;
The largest group. They browse canvases, perform basic actions, appear active — but they have never touched the AI features that drive real retention. Without a catalyst, they churn silently. The highest-leverage intervention is a time-sensitive free credit grant that removes the cost barrier to first AI use.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dashboard
&lt;/h2&gt;

&lt;p&gt;Built in Streamlit, connected directly to Zerve notebook variables, and designed to serve five completely different business audiences from one tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform Overview&lt;/strong&gt; — The executive layer. Success rate, score distribution across tiers, cluster breakdown, archetype success rate comparison, and score distribution by cluster. The full picture in one screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User Lookup&lt;/strong&gt; — For the customer success team. Paste any user ID and get their score, cluster, raw probability, and a plain-English recommended action. A comparison chart shows their behavioural profile against the dataset median — you can see exactly which dimensions they are underperforming on and why they scored the way they did.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster Recommendations&lt;/strong&gt; — For the growth team. Each archetype has a specific recommended intervention with a trigger condition written as an executable rule — ready to drop directly into a CRM or marketing automation workflow. No manual review required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ROI Table&lt;/strong&gt; — For leadership and finance. Cluster size multiplied by expected lift gives estimated users impacted. Multiply by average contract value and you have a business case, not just a data science result.&lt;/p&gt;
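&lt;p&gt;&lt;em&gt;The ROI arithmetic is deliberately simple; with made-up numbers it looks like this:&lt;/em&gt;&lt;/p&gt;

```python
# Back-of-envelope ROI calculation from the table above. All numbers
# are hypothetical; the dashboard uses real cluster sizes and lifts.
cluster_size = 1200        # hypothetical At-Risk Users count
expected_lift = 0.12       # hypothetical reactivation rate
avg_contract_value = 500   # hypothetical ACV in dollars

users_impacted = cluster_size * expected_lift
estimated_value = users_impacted * avg_contract_value
print(users_impacted, estimated_value)
```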

&lt;p&gt;&lt;strong&gt;Cluster Narratives&lt;/strong&gt; — For everyone else. Plain-English portraits of each archetype written for product managers, marketers, and executives who need the story rather than the statistics. This is the layer that makes the intelligence portable — it travels into strategy documents, board decks, and onboarding playbooks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Finding
&lt;/h2&gt;

&lt;p&gt;After all of it — the modelling, the clustering, the SHAP analysis — the answer to Zerve's question is this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Early, deep engagement with AI features is the single strongest predictor of long-term user success.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not tenure. Not the number of canvases created. Not account age. Whether a user called an agent, consumed credits, and engaged with generative AI in their first seven days is what separates the users who succeed from the ones who quietly disappear.&lt;/p&gt;

&lt;p&gt;The users who stay on the surface — active but never crossing into the AI core — are not disengaged. They are one well-timed intervention away from becoming Power Coders. The platform's job is to make that crossing happen before day thirty.&lt;/p&gt;

&lt;p&gt;That is what this dashboard is built to do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Zerve Made This the Right Tool for the Job
&lt;/h2&gt;

&lt;p&gt;I could have built this in any notebook environment. But Zerve's &lt;code&gt;variable()&lt;/code&gt; system meant the model outputs flowed directly into the dashboard without any intermediate steps. The clustering results, the feature importances, the scored dataframe — all live in the notebook, all accessible in the app with one line.&lt;/p&gt;

&lt;p&gt;That is not a workflow convenience. It is the reason the system works as a single coherent product rather than a collection of disconnected scripts. For anyone building production-grade data applications, that kind of tight integration between computation and presentation is genuinely rare.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;#MachineLearning #DataScience #Streamlit #Python #XGBoost #KMeans #SHAP #ProductAnalytics #UserSuccess #ChurnPrediction #Zerve #BuildInPublic #MLEngineering #HackathonProject #DataDriven&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
    <item>
      <title>RAG — Building Reliable AI Pipelines</title>
      <dc:creator>Madhesh T</dc:creator>
      <pubDate>Sat, 28 Feb 2026 17:46:53 +0000</pubDate>
      <link>https://dev.to/madhesh00/rag-building-reliable-ai-pipelines-4kfi</link>
      <guid>https://dev.to/madhesh00/rag-building-reliable-ai-pipelines-4kfi</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;AUTHOR INTRO&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I am Madhesh, a passionate developer with a strong interest in Agentic AI and DevOps. I enjoy learning new things, and I have always wanted to start writing blogs to connect with people. I chose to work on RAG because large language models (LLMs) are everywhere, and RAG adds significant power to them by providing proper context for user queries.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ABSTRACT&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;LLMs often hallucinate on domain-specific or recent data because they don’t have the proper context for user queries. Traditional LLM outputs rely solely on trained data, which may not contain up-to-date or domain-specific information. RAG overcomes these problems with strong retrieval pipelines. In this blog, I walk through designing and implementing a complete RAG pipeline using Elastic as the vector database. From ingesting documents to semantic retrieval and LLM augmentation, discover how Elastic’s vector capabilities deliver accurate, hallucination-resistant AI applications.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;NAIVE SEARCH (KEYWORD SEARCH)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The naive way to search for relevant content in a document or database is by using a basic keyword search.&lt;/p&gt;

&lt;p&gt;Example - search in a file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"keyword"&lt;/span&gt; file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example - SQL keyword search in a database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;column_name&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%keyword%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keyword search works by finding exact matches. But if the user uses different words with the same meaning, keyword search fails. That is where semantic search and vector embeddings become useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;TF-IDF&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;TF-IDF is a classic method to score how important a term is in a document relative to a corpus.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TF (Term Frequency)&lt;/strong&gt; looks at how many times a word appears in a specific document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DF (Document Frequency)&lt;/strong&gt; is the number of documents where the word appears.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IDF (Inverse Document Frequency)&lt;/strong&gt; measures the importance of the word across the entire document set.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DF(t) = number of documents containing term t

IDF(t) = log(N / DF(t)),   where N = total number of documents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TF-IDF weights terms that are frequent in a document but rare in the corpus, giving more relevant ranking than pure keyword counts.&lt;/p&gt;
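&lt;p&gt;&lt;em&gt;Those formulas can be computed by hand in a few lines of stdlib Python. Note that libraries differ in smoothing variants; this uses the plain log(N / DF):&lt;/em&gt;&lt;/p&gt;

```python
# Hand-rolled TF-IDF matching the formulas above.
import math

docs = [
    "machine learning is powerful",
    "deep learning uses neural networks",
    "machine learning and ai",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def tf_idf(term, doc_tokens):
    tf = doc_tokens.count(term) / len(doc_tokens)            # term frequency
    df = sum(1 for toks in tokenized if term in toks)        # document frequency
    idf = math.log(N / df)                                   # inverse doc frequency
    return tf * idf

# "learning" appears in every document, so its IDF (and score) is zero;
# "is" appears in only one document, so it scores higher than "machine".
print(tf_idf("is", tokenized[0]), tf_idf("machine", tokenized[0]))
```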




&lt;h2&gt;
  
  
  &lt;strong&gt;BM25&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;BM25 is a ranking algorithm used in retrieval systems to determine how relevant documents are to a given user query. It is the default ranking algorithm in systems like Elasticsearch and Whoosh. BM25 improves over TF-IDF by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normalizing for document length&lt;/li&gt;
&lt;li&gt;Saturating term frequency (more occurrences do not increase importance linearly)&lt;/li&gt;
&lt;li&gt;Producing better relevance scoring in practice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compute BM25 in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rank_bm25&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BM25Okapi&lt;/span&gt;

&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;machine learning is powerful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep learning uses neural networks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;machine learning and AI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;tokenized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;bm25&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BM25Okapi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;machine learning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bm25&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_scores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BM25 produces a score for each document based on the query and ranks them by relevance.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;VECTOR EMBEDDINGS&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When a user query uses a different word but similar meaning, keyword methods fail. This is where &lt;strong&gt;vector embeddings&lt;/strong&gt; solve the problem.&lt;/p&gt;

&lt;p&gt;Embeddings transform text into numerical vectors that capture semantic meaning. Similar texts have vectors close to each other in vector space.&lt;/p&gt;

&lt;p&gt;Generate embeddings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;machine learning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep learning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# (2, 384)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
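&lt;p&gt;&lt;em&gt;"Close in vector space" is usually measured with cosine similarity. A sketch with hand-made toy vectors standing in for real embeddings:&lt;/em&gt;&lt;/p&gt;

```python
# Cosine similarity over toy 3-dimensional vectors. Real embeddings
# (e.g. the 384-dim vectors above) work exactly the same way.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v_ml   = np.array([0.9, 0.1, 0.2])   # pretend: "machine learning"
v_dl   = np.array([0.8, 0.2, 0.3])   # pretend: "deep learning"
v_cook = np.array([0.1, 0.9, 0.1])   # pretend: "cooking recipes"

print(cosine(v_ml, v_dl))    # high: related topics
print(cosine(v_ml, v_cook))  # low: unrelated
```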






&lt;h2&gt;
  
  
  &lt;strong&gt;INTRO TO RAG PIPELINES&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;A RAG pipeline has two phases. Offline, documents are ingested: chunked, embedded, and indexed. Online, when a query arrives, the most relevant chunks are retrieved, the prompt is augmented with that context, and the LLM generates a grounded, accurate response.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fmadhesh60%2FElastic-Blogathon%2Fblob%2Fmain%2FRAGArch1.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fmadhesh60%2FElastic-Blogathon%2Fblob%2Fmain%2FRAGArch1.png%3Fraw%3Dtrue" alt="RAG Architecture" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;RELEVANT CONTEXT AND PREPROCESSING&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;First, ingest raw data into the RAG system. To make it effective, choose proper preprocessing techniques:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Chunking&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Chunking breaks large documents into smaller pieces that are easier to index and retrieve. Good chunking balances context with retrieval efficiency.&lt;/p&gt;
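&lt;p&gt;&lt;em&gt;A minimal character-based chunker with overlap, as a sketch; production chunkers typically split on sentences or tokens instead:&lt;/em&gt;&lt;/p&gt;

```python
# Fixed-size chunking with overlap: each chunk repeats the tail of the
# previous one so context is not cut mid-thought at a boundary.
def chunk_text(text, size=40, overlap=10):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

doc = "RAG pipelines retrieve relevant context before generation."
for c in chunk_text(doc, size=30, overlap=5):
    print(repr(c))
```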




&lt;h2&gt;
  
  
  &lt;strong&gt;VECTOR DATABASE&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once text is chunked and embedded into vectors, store it in a vector database (e.g., Elasticsearch). The vector DB stores embeddings and performs similarity search to match user queries with relevant chunks.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ELASTICSEARCH – SETUP &amp;amp; CODE&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Create index with vector field&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; PUT &lt;span class="s2"&gt;"localhost:9200/docs"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'
{
  "mappings": {
    "properties": {
      "text":       { "type": "text" },
      "vector":     { "type": "dense_vector", "dims": 384, "index": true, "similarity": "cosine" }
    }
  }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;2. Insert document with embedding&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"localhost:9200/docs/_doc"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'
{
  "text":   "machine learning is powerful",
  "vector": [0.12, -0.93, ...]   # real embedding vector
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;3. Query using BM25 (keyword search)&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET &lt;span class="s2"&gt;"localhost:9200/docs/_search"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'
{
  "query": {
    "match": {
      "text": "machine learning"
    }
  }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;4. Query using Vector Similarity&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET &lt;span class="s2"&gt;"localhost:9200/docs/_search"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'
{
  "knn": {
    "field":        "vector",
    "query_vector": [0.12, -0.93, ...],
    "k":            3,
    "num_candidates": 10
  }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;5. Hybrid Search (BM25 + Vector)&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET &lt;span class="s2"&gt;"localhost:9200/docs/_search"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'
{
  "query": {
    "bool": {
      "should": [
        { "match": { "text": "machine learning" }},
        {
          "knn": {
            "field":        "vector",
            "query_vector": [0.12, -0.93, ...],
            "k": 3,
            "num_candidates": 10
          }
        }
      ]
    }
  }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hybrid search combines keyword ranking (BM25) and semantic ranking (vector similarity).&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;RERANKING&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Reranking is a post-processing step that improves result relevance by applying stronger scoring methods. It considers semantic relevance and similarity to reorder results for better quality. Reranking is more computationally expensive and is usually applied only to top results.&lt;/p&gt;
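&lt;p&gt;&lt;em&gt;A toy illustration of the pattern: a cheap first-pass score selects candidates, then a stronger scorer (here a pretend one, standing in for a cross-encoder or LLM judge) reorders only the top few:&lt;/em&gt;&lt;/p&gt;

```python
# Two-stage retrieval: cheap scoring over everything, expensive scoring
# only over the shortlist. score_strong is a hypothetical stand-in.
def score_cheap(query, doc):
    return len(set(query.split()) & set(doc.split()))

def score_strong(query, doc):
    # Pretend "strong" scorer: weight exact phrase matches much higher.
    return score_cheap(query, doc) + (5 if query in doc else 0)

docs = [
    "learning about machines in factories",
    "machine learning beats rule-based systems",
    "machines and learning environments",
]
query = "machine learning"

candidates = sorted(docs, key=lambda d: score_cheap(query, d), reverse=True)[:2]
reranked = sorted(candidates, key=lambda d: score_strong(query, d), reverse=True)
print(reranked[0])
```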




&lt;h2&gt;
  
  
  &lt;strong&gt;INTEGRATING ELASTIC WITH LLMS&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Elastic can serve as the retrieval backend for a RAG system. When a user query arrives:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The query is embedded (converted into a vector).&lt;/li&gt;
&lt;li&gt;Elastic retrieves the most similar chunks (vector search).&lt;/li&gt;
&lt;li&gt;The retrieved chunks are passed to the LLM.&lt;/li&gt;
&lt;li&gt;The LLM generates an answer grounded in retrieved context.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This integration reduces hallucination and increases response accuracy.&lt;/p&gt;
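&lt;p&gt;&lt;em&gt;The four steps can be sketched end to end with a toy bag-of-words embedder and an in-memory index standing in for the real embedding model and Elastic:&lt;/em&gt;&lt;/p&gt;

```python
# End-to-end RAG skeleton. embed() is a toy bag-of-words stand-in for a
# real embedding model, and the numpy index stands in for Elastic.
import numpy as np

VOCAB = ["machine", "learning", "neural", "networks", "cooking"]

def embed(text):
    toks = text.lower().split()
    return np.array([toks.count(w) for w in VOCAB], dtype=float)

chunks = ["machine learning basics", "neural networks explained"]
index = np.stack([embed(c) for c in chunks])      # ingestion: index chunks

query_vec = embed("intro to machine learning")    # step 1: embed the query
sims = index @ query_vec                          # step 2: similarity search
context = chunks[int(np.argmax(sims))]            # best-matching chunk

prompt = f"Answer using this context:\n{context}\n\nQ: intro to machine learning"
# steps 3-4: pass `prompt` to your LLM of choice; generation omitted here.
print(context)
```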




&lt;h2&gt;
  
  
  &lt;strong&gt;PRODUCTION INSIGHT&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When building a RAG pipeline, most developers focus heavily on the LLM and ignore the retrieval layer. In practice, retrieval quality matters more than model size. If the retriever returns irrelevant chunks, even the best LLM will confidently generate incorrect answers. I realized this while experimenting with chunk sizes and indexing strategies: small changes in chunking and overlap significantly changed answer quality.&lt;/p&gt;

&lt;p&gt;Another important point is that hybrid search often performs better than pure vector search. Vector similarity is powerful for semantic understanding, but keyword signals still matter in production. In many cases, combining BM25 with vector search improved precision and reduced noise. Reranking also made a visible difference, especially when the initial retrieval returned loosely related results.&lt;/p&gt;

&lt;p&gt;Latency is another real-world factor that is often underestimated. Running embeddings, querying vectors, reranking, and then calling an LLM adds up quickly. In production systems, you must balance accuracy with response time. Tuning the top-K retrieval size, embedding model selection, and reranking depth directly impacts both performance and cost.&lt;/p&gt;

&lt;p&gt;Finally, data freshness matters. RAG systems must support continuous indexing. If documents are not updated properly, the system becomes stale and starts returning outdated context. In production, retrieval pipelines must be monitored just like any other backend service.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;DEPLOY RAG MODELS ON CLOUD&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Elastic Cloud provides a fully managed Elasticsearch environment with built-in scaling, security, and monitoring. Instead of managing nodes, shard allocation, replication, and cluster health manually, Elastic Cloud handles infrastructure operations. This allows developers to focus on indexing documents, embedding pipelines, hybrid retrieval, and LLM integration rather than maintaining search infrastructure.&lt;/p&gt;

&lt;p&gt;For a RAG pipeline, Elastic Cloud supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dense vector fields for storing embeddings&lt;/li&gt;
&lt;li&gt;kNN vector search for semantic retrieval&lt;/li&gt;
&lt;li&gt;BM25-based keyword search&lt;/li&gt;
&lt;li&gt;Hybrid search combining lexical and semantic signals&lt;/li&gt;
&lt;li&gt;Secure deployment with role-based access control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A production-ready RAG architecture on Elastic Cloud typically includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An embedding model (self-hosted or API-based)&lt;/li&gt;
&lt;li&gt;An Elastic Cloud deployment with vector-enabled indices&lt;/li&gt;
&lt;li&gt;A backend service that performs retrieval and prompt construction&lt;/li&gt;
&lt;li&gt;An LLM provider for generation&lt;/li&gt;
&lt;li&gt;Monitoring via Elastic Stack (metrics, logs, performance tracking)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As embeddings scale into millions of vectors, cluster sizing becomes critical. Elastic Cloud allows vertical and horizontal scaling by adjusting node size and instance count without downtime. This is essential when handling increasing search traffic or expanding document collections.&lt;/p&gt;

&lt;p&gt;Security is also a major factor. Elastic Cloud provides TLS encryption, API keys, and access controls out of the box. In AI applications dealing with private documents or enterprise data, this becomes non-negotiable.&lt;/p&gt;

&lt;p&gt;In real-world systems, RAG is not only about retrieval and generation quality. It is about cluster stability, index performance, scaling strategy, and operational visibility. Elastic Cloud provides the infrastructure layer that makes large-scale RAG systems stable, secure, and production-ready.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;CONCLUSION&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;It is easy to over-engineer RAG. Its true value lies in strengthening LLM responses with real context from scalable systems like Elasticsearch, which makes LLMs less prone to hallucination and vastly improves relevance and accuracy.&lt;/p&gt;

&lt;p&gt;If answer quality is poor, examine both stages of the pipeline, starting with retrieval: in practice, most RAG failures originate in the retrieval layer rather than in generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Repository:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/madhesh60/Elastic-Blogathon" rel="noopener noreferrer"&gt;Github on RAG&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Note: The content of this blog is fully organic. AI was utilized solely for grammatical error correction and structural alignment.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
