<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Marques</title>
    <description>The latest articles on DEV Community by Daniel Marques (@daniel_marques_8020831cd9).</description>
    <link>https://dev.to/daniel_marques_8020831cd9</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3129608%2F39b2a56d-f2c9-45d0-992d-e7691b8e6681.jpeg</url>
      <title>DEV Community: Daniel Marques</title>
      <link>https://dev.to/daniel_marques_8020831cd9</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/daniel_marques_8020831cd9"/>
    <language>en</language>
    <item>
      <title>From 70K to 2K Tokens: Optimizing SQL Generation with RAG Architecture</title>
      <dc:creator>Daniel Marques</dc:creator>
      <pubDate>Tue, 28 Oct 2025 12:04:04 +0000</pubDate>
      <link>https://dev.to/daniel_marques_8020831cd9/from-70k-to-2k-tokens-optimizing-sql-generation-with-rag-architecture-3nki</link>
      <guid>https://dev.to/daniel_marques_8020831cd9/from-70k-to-2k-tokens-optimizing-sql-generation-with-rag-architecture-3nki</guid>
      <description>&lt;p&gt;My goal was to improve data insights for my company's business using RAG. I wanted to create a feature that would allow us to gain deeper insights into our business without needing to memorize every table created in the system.&lt;/p&gt;

&lt;p&gt;Currently, my company has more than 100 tables—that's a lot to remember, along with all the specific rules that apply to each one.&lt;br&gt;
To achieve this goal, I built a RAG solution using Amazon Bedrock (Claude 3) + Qdrant.&lt;/p&gt;

&lt;p&gt;Using only the LLM, my implementation consumed at least 70,584 tokens when passing the entire schema in each request to the agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ri515qpn7v7210cvlfp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ri515qpn7v7210cvlfp.png" alt=" " width="429" height="59"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, after implementing Qdrant with the RAG architecture, we reduced this number to just 1,906 tokens consumed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0hls82m6o59cwe05m0v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0hls82m6o59cwe05m0v.png" alt=" " width="429" height="59"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Considering the cost per request (using the model: anthropic.claude-3-haiku-20240307-v1:0):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before RAG: 70,584 ÷ 1,000,000 × $0.25 = $0.017646&lt;/li&gt;
&lt;li&gt;After RAG: 1,906 ÷ 1,000,000 × $0.25 = $0.0004765&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs6niewo7rzz5wvewaof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs6niewo7rzz5wvewaof.png" alt=" " width="628" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This clearly demonstrates how significant the cost savings are when using RAG.&lt;/p&gt;
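
&lt;p&gt;For reference, the figures above are simply Bedrock's Claude 3 Haiku input rate ($0.25 per million input tokens) applied to the token counts. A minimal sketch of the calculation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Claude 3 Haiku on Bedrock: $0.25 per million input tokens
const PRICE_PER_MILLION_INPUT_TOKENS = 0.25

function inputCostUSD(tokens: number): number {
  return (tokens / 1_000_000) * PRICE_PER_MILLION_INPUT_TOKENS
}

console.log(inputCostUSD(70_584)) // before RAG: ~$0.017646
console.log(inputCostUSD(1_906)) // after RAG: ~$0.0004765
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;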



&lt;p&gt;What I Used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Bedrock&lt;/strong&gt;: To provide the LLM and create the RAG implementation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt;: To run Qdrant locally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant&lt;/strong&gt;: To vectorize the large SQL schema, making queries cheaper and faster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ts-node&lt;/strong&gt;: &lt;a href="https://www.npmjs.com/package/ts-node" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/ts-node&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find the code here: &lt;a href="https://github.com/DanielMarquesz/ai-query" rel="noopener noreferrer"&gt;https://github.com/DanielMarquesz/ai-query&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Get Your Database Schema
&lt;/h3&gt;

&lt;p&gt;First, you'll need your database schema. You can generate the DDL using tools like DBeaver, MySQL Workbench, or similar database management tools.&lt;/p&gt;
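
&lt;p&gt;If you're on MySQL, one way to export only the DDL from the command line is &lt;code&gt;mysqldump&lt;/code&gt; with the &lt;code&gt;--no-data&lt;/code&gt; flag (here &lt;code&gt;mydb&lt;/code&gt; is a placeholder for your database name):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Export only the schema (CREATE TABLE statements), no table data
mysqldump --no-data -u root -p mydb &amp;gt; database.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;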
&lt;h3&gt;
  
  
  2. Set Up Qdrant Locally
&lt;/h3&gt;

&lt;p&gt;I used Qdrant locally by following the official setup documentation: &lt;a href="https://qdrant.tech/documentation/quickstart/" rel="noopener noreferrer"&gt;https://qdrant.tech/documentation/quickstart/&lt;/a&gt;&lt;/p&gt;
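
&lt;p&gt;In short, the quickstart boils down to running the official Docker image, which exposes the REST API on port 6333:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pull and run Qdrant; the REST API becomes available at http://localhost:6333
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;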
&lt;h3&gt;
  
  
  3. Install Required Dependencies
&lt;/h3&gt;

&lt;p&gt;You will need to install &lt;code&gt;@aws-sdk/client-bedrock-runtime&lt;/code&gt; and &lt;code&gt;@qdrant/js-client-rest&lt;/code&gt;.&lt;/p&gt;
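
&lt;p&gt;Assuming an existing Node.js/TypeScript project, the installation looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Runtime dependencies: Bedrock client and Qdrant REST client
npm install @aws-sdk/client-bedrock-runtime @qdrant/js-client-rest

# Dev dependencies, to run the .ts files directly
npm install --save-dev ts-node typescript
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;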

&lt;p&gt;Then create a file called &lt;code&gt;collection.ts&lt;/code&gt; with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { QdrantClient } from '@qdrant/js-client-rest'

const client = new QdrantClient({ url: 'http://localhost:6333' })

async function createCollection() {
  await client.createCollection('sql_schema_1', {
    vectors: {
      size: 1536, // embedding dimension for amazon.titan-embed-text-v1 (check your model)
      distance: 'Cosine', // similarity metric
    },
  })
}

;(async () =&amp;gt; {
  try {
    await createCollection()
    console.log('Created')
  } catch (error) {
    console.log(error)
  }
})()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
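
&lt;p&gt;As a quick sanity check (assuming Qdrant is running on the default port), you can list the collections on the local instance with the client's &lt;code&gt;getCollections&lt;/code&gt; method and confirm &lt;code&gt;sql_schema_1&lt;/code&gt; exists:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { QdrantClient } from '@qdrant/js-client-rest'

const client = new QdrantClient({ url: 'http://localhost:6333' })

async function verifyCollection() {
  // List all collections and check ours is among them
  const { collections } = await client.getCollections()
  const names = collections.map(function (c) { return c.name })
  console.log(names.includes('sql_schema_1') ? 'sql_schema_1 is ready' : 'collection missing, run collection.ts first')
}

verifyCollection().catch(console.error)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;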



&lt;h3&gt;
  
  
  4. Create the Embeddings Handler
&lt;/h3&gt;

&lt;p&gt;Create a file called &lt;code&gt;embeddings.ts&lt;/code&gt; with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { readFileSync } from 'fs'

import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime'
import { QdrantClient } from '@qdrant/js-client-rest'

const bedrock = new BedrockRuntimeClient({ region: process.env.REGION || 'us-east-1' })
const client = new QdrantClient({ url: 'http://localhost:6333' })
const sqlSchema = readFileSync('database.sql', 'utf-8')

const chunks = sqlSchema
  .split(';')
  .map(c =&amp;gt; c.trim())
  .filter(Boolean)

async function getEmbedding(text: string) {
  const input = {
    inputText: text,
  }
  const command = new InvokeModelCommand({
    modelId: 'amazon.titan-embed-text-v1',
    contentType: 'application/json',
    accept: 'application/json',
    body: new TextEncoder().encode(JSON.stringify(input)),
  })
  const response = await bedrock.send(command)
  const raw = await response.body?.transformToString()
  const output = raw ? JSON.parse(raw) : {}
  return output.embedding
}

// Push embeddings to Qdrant.
// Note: this IIFE runs every time the file is imported, so run it once to
// index the schema, then comment it out before importing queryRAG elsewhere.
;(async () =&amp;gt; {
  for (const [i, chunk] of chunks.entries()) {
    const vector = await getEmbedding(chunk)
    await client.upsert('sql_schema_1', {
      points: [
        {
          id: i,
          vector,
          payload: { text: chunk },
        },
      ],
    })
  }
})()

export async function queryRAG(prompt: string) {
  const promptVector = await getEmbedding(prompt)

  const searchResult = await client.search('sql_schema_1', {
    vector: promptVector,
    limit: 5, // top 5 relevant chunks
  })

  const context = searchResult.map(p =&amp;gt; p.payload?.text).join('\n')

  return context
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
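
&lt;p&gt;With both files in place, you can create the collection and index the schema using ts-node (assuming the files sit in the project root next to &lt;code&gt;database.sql&lt;/code&gt;, and that the indexing run happens only once):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Create the Qdrant collection
npx ts-node collection.ts

# 2. Embed each schema chunk and upsert it into the collection
npx ts-node embeddings.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;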



&lt;h3&gt;
  
  
  5. Create the AI Handler
&lt;/h3&gt;

&lt;p&gt;Create a file called &lt;code&gt;ai.handler.ts&lt;/code&gt; with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* eslint-disable no-console */
import { APIGatewayProxyEventV2 } from 'aws-lambda'
// import { readFileSync } from 'fs'

import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime'

import { queryRAG } from './embeddings'
// const sqlSchema = readFileSync('./database.sql', 'utf-8')

const REGION = process.env.REGION || 'us-east-1'
const MODEL_ID = 'anthropic.claude-3-haiku-20240307-v1:0'

const bedrock = new BedrockRuntimeClient({ region: REGION })

export const handler = async (event: APIGatewayProxyEventV2) =&amp;gt; {
  try {
    console.time('Tempo')
    const body = event.body ? JSON.parse(event.body) : {}
    const prompt =
      body?.prompt ||
      'How many payment orders for exam purchases exist in the system today? And are they for appointment payments?'

    if (!prompt) {
      return {
        statusCode: 400,
        body: JSON.stringify({ error: "missing 'prompt' in the request body." }),
      }
    }

    const context = await queryRAG(prompt)

    const command = new InvokeModelCommand({
      modelId: MODEL_ID,
      contentType: 'application/json',
      accept: 'application/json',
      body: JSON.stringify({
        messages: [
          {
            role: 'user',
            content: [
              { type: 'text', text: `Relevant Context:\n${context}` },
              // { type: 'text', text: `Here is the schema you should base your work on.: ${sqlSchema}` },
              { type: 'text', text: prompt },
            ],
          },
        ],
        system: `
          - You are an expert in relational databases who creates accurate and optimized SQL queries.
          - Return only the raw query, ready to be copied and executed in the database, in the most optimized and performative way.
          - Return the SQL without line breaks.
        `,
        max_tokens: 500, // sets the maximum number of output tokens
        temperature: 0.2, // optional, to control creativity
        anthropic_version: 'bedrock-2023-05-31', // mandatory for claude models
      }),
    })

    const response = await bedrock.send(command)

    const rawOutput = await response.body?.transformToString()
    const output = rawOutput ? JSON.parse(rawOutput) : {}

    console.log(output.content[0].text)
    console.timeEnd('Tempo')
    // console.log(output)
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        prompt,
        response: output.content?.[0]?.text || output,
      }),
    }
  } catch (error: any) {
    console.error('Error invoking Bedrock:', error)

    return {
      statusCode: 500,
      body: JSON.stringify({
        error: 'Failed to invoke Bedrock',
        details: error.message || error,
      }),
    }
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. Run locally
&lt;/h3&gt;

&lt;p&gt;By executing the &lt;code&gt;ai.handler.ts&lt;/code&gt; file, you should get an output containing a custom SQL query tailored to your schema.&lt;/p&gt;

&lt;p&gt;I was planning to deploy this as a Lambda function. To make it easier to test locally, you can create another file that simulates the event coming from a request and then triggers the handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* eslint-disable @typescript-eslint/no-empty-function */

import { APIGatewayProxyEventV2 } from 'aws-lambda'
import { handler } from './ai.handler'

const event = {}


handler(event as unknown as APIGatewayProxyEventV2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Feel free to comment or reach out to me: &lt;a href="https://www.linkedin.com/in/daniel-marque-dev/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/daniel-marque-dev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>aws</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
