<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Marques</title>
    <description>The latest articles on DEV Community by Daniel Marques (@daniel_marques_8020831cd9).</description>
    <link>https://dev.to/daniel_marques_8020831cd9</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3129608%2F39b2a56d-f2c9-45d0-992d-e7691b8e6681.jpeg</url>
      <title>DEV Community: Daniel Marques</title>
      <link>https://dev.to/daniel_marques_8020831cd9</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/daniel_marques_8020831cd9"/>
    <language>en</language>
    <item>
      <title>From 70K to 2K Tokens: Optimizing SQL Generation with RAG Architecture</title>
      <dc:creator>Daniel Marques</dc:creator>
      <pubDate>Tue, 28 Oct 2025 12:04:04 +0000</pubDate>
      <link>https://dev.to/daniel_marques_8020831cd9/from-70k-to-2k-tokens-optimizing-sql-generation-with-rag-architecture-3nki</link>
      <guid>https://dev.to/daniel_marques_8020831cd9/from-70k-to-2k-tokens-optimizing-sql-generation-with-rag-architecture-3nki</guid>
      <description>&lt;p&gt;My goal was to improve data insights for my company's business using RAG. I wanted to create a feature that would allow us to gain deeper insights into our business without needing to memorize every table created in the system.&lt;/p&gt;

&lt;p&gt;Currently, my company has more than 100 tables—that's a lot to remember, along with all the specific rules that apply to each one.&lt;br&gt;
To achieve this goal, I built a RAG solution using Amazon Bedrock (Claude 3) + Qdrant.&lt;/p&gt;

&lt;p&gt;Using only the LLM, my implementation consumed at least 70,584 tokens when passing the entire schema in each request to the agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ri515qpn7v7210cvlfp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ri515qpn7v7210cvlfp.png" alt=" " width="429" height="59"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, after implementing Qdrant with the RAG architecture, we reduced this number to just 1,906 tokens consumed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0hls82m6o59cwe05m0v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0hls82m6o59cwe05m0v.png" alt=" " width="429" height="59"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Considering the cost per request (using the model: anthropic.claude-3-haiku-20240307-v1:0):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before RAG: 70,584 ÷ 1,000,000 × $0.25 = $0.017646&lt;/li&gt;
&lt;li&gt;After RAG: 1,906 ÷ 1,000,000 × $0.25 = $0.0004765&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs6niewo7rzz5wvewaof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs6niewo7rzz5wvewaof.png" alt=" " width="628" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This clearly demonstrates how significant the cost savings are when using RAG.&lt;/p&gt;
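
&lt;p&gt;For reference, the figures above are simply Bedrock's Claude 3 Haiku input rate ($0.25 per million input tokens) applied to the token counts. A minimal sketch of the calculation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Claude 3 Haiku on Bedrock: $0.25 per million input tokens
const PRICE_PER_MILLION_INPUT_TOKENS = 0.25

function inputCostUSD(tokens: number): number {
  return (tokens / 1_000_000) * PRICE_PER_MILLION_INPUT_TOKENS
}

console.log(inputCostUSD(70_584)) // before RAG: ~$0.017646
console.log(inputCostUSD(1_906)) // after RAG: ~$0.0004765
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;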



&lt;p&gt;What I Used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Bedrock&lt;/strong&gt;: To provide the LLM and create the RAG implementation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt;: To run Qdrant locally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant&lt;/strong&gt;: To vectorize the large SQL schema, making queries cheaper and faster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ts-node&lt;/strong&gt;: &lt;a href="https://www.npmjs.com/package/ts-node" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/ts-node&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find the code here: &lt;a href="https://github.com/DanielMarquesz/ai-query" rel="noopener noreferrer"&gt;https://github.com/DanielMarquesz/ai-query&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Get Your Database Schema
&lt;/h3&gt;

&lt;p&gt;First, you'll need your database schema. You can generate the DDL using tools like DBeaver, MySQL Workbench, or similar database management tools.&lt;/p&gt;
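
&lt;p&gt;If you're on MySQL, one way to export only the DDL from the command line is &lt;code&gt;mysqldump&lt;/code&gt; with the &lt;code&gt;--no-data&lt;/code&gt; flag (here &lt;code&gt;mydb&lt;/code&gt; is a placeholder for your database name):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Export only the schema (CREATE TABLE statements), no table data
mysqldump --no-data -u root -p mydb &amp;gt; database.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;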
&lt;h3&gt;
  
  
  2. Set Up Qdrant Locally
&lt;/h3&gt;

&lt;p&gt;I used Qdrant locally by following the official setup documentation: &lt;a href="https://qdrant.tech/documentation/quickstart/" rel="noopener noreferrer"&gt;https://qdrant.tech/documentation/quickstart/&lt;/a&gt;&lt;/p&gt;
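
&lt;p&gt;In short, the quickstart boils down to running the official Docker image, which exposes the REST API on port 6333:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pull and run Qdrant; the REST API becomes available at http://localhost:6333
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;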
&lt;h3&gt;
  
  
  3. Install Required Dependencies
&lt;/h3&gt;

&lt;p&gt;You will need to install &lt;code&gt;@aws-sdk/client-bedrock-runtime&lt;/code&gt; and &lt;code&gt;@qdrant/js-client-rest&lt;/code&gt;.&lt;/p&gt;
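
&lt;p&gt;Assuming an existing Node.js/TypeScript project, the installation looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Runtime dependencies: Bedrock client and Qdrant REST client
npm install @aws-sdk/client-bedrock-runtime @qdrant/js-client-rest

# Dev dependencies, to run the .ts files directly
npm install --save-dev ts-node typescript
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;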

&lt;p&gt;Then create a file called &lt;code&gt;collection.ts&lt;/code&gt; with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { QdrantClient } from '@qdrant/js-client-rest'

const client = new QdrantClient({ url: 'http://localhost:6333' })

async function createCollection() {
  await client.createCollection('sql_schema_1', {
    vectors: {
      size: 1536, // embedding dimension for amazon.titan-embed-text-v1 (check your model)
      distance: 'Cosine', // similarity metric
    },
  })
}

;(async () =&amp;gt; {
  try {
    await createCollection()
    console.log('Created')
  } catch (error) {
    console.log(error)
  }
})()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
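
&lt;p&gt;As a quick sanity check (assuming Qdrant is running on the default port), you can list the collections on the local instance with the client's &lt;code&gt;getCollections&lt;/code&gt; method and confirm &lt;code&gt;sql_schema_1&lt;/code&gt; exists:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { QdrantClient } from '@qdrant/js-client-rest'

const client = new QdrantClient({ url: 'http://localhost:6333' })

async function verifyCollection() {
  // List all collections and check ours is among them
  const { collections } = await client.getCollections()
  const names = collections.map(function (c) { return c.name })
  console.log(names.includes('sql_schema_1') ? 'sql_schema_1 is ready' : 'collection missing, run collection.ts first')
}

verifyCollection().catch(console.error)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;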



&lt;h3&gt;
  
  
  4. Create the Embeddings Handler
&lt;/h3&gt;

&lt;p&gt;Create a file called &lt;code&gt;embeddings.ts&lt;/code&gt; with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { readFileSync } from 'fs'

import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime'
import { QdrantClient } from '@qdrant/js-client-rest'

const bedrock = new BedrockRuntimeClient({ region: process.env.REGION || 'us-east-1' })
const client = new QdrantClient({ url: 'http://localhost:6333' })
const sqlSchema = readFileSync('database.sql', 'utf-8')

const chunks = sqlSchema
  .split(';')
  .map(c =&amp;gt; c.trim())
  .filter(Boolean)

async function getEmbedding(text: string) {
  const input = {
    inputText: text,
  }
  const command = new InvokeModelCommand({
    modelId: 'amazon.titan-embed-text-v1',
    contentType: 'application/json',
    accept: 'application/json',
    body: new TextEncoder().encode(JSON.stringify(input)),
  })
  const response = await bedrock.send(command)
  const raw = await response.body?.transformToString()
  const output = raw ? JSON.parse(raw) : {}
  return output.embedding
}

// Push embeddings to Qdrant.
// Note: this IIFE runs every time the file is imported, so run it once to
// index the schema, then comment it out before importing queryRAG elsewhere.
;(async () =&amp;gt; {
  for (const [i, chunk] of chunks.entries()) {
    const vector = await getEmbedding(chunk)
    await client.upsert('sql_schema_1', {
      points: [
        {
          id: i,
          vector,
          payload: { text: chunk },
        },
      ],
    })
  }
})()

export async function queryRAG(prompt: string) {
  const promptVector = await getEmbedding(prompt)

  const searchResult = await client.search('sql_schema_1', {
    vector: promptVector,
    limit: 5, // top 5 relevant chunks
  })

  const context = searchResult.map(p =&amp;gt; p.payload?.text).join('\n')

  return context
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
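
&lt;p&gt;With both files in place, you can create the collection and index the schema using ts-node (assuming the files sit in the project root next to &lt;code&gt;database.sql&lt;/code&gt;, and that the indexing run happens only once):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Create the Qdrant collection
npx ts-node collection.ts

# 2. Embed each schema chunk and upsert it into the collection
npx ts-node embeddings.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;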



&lt;h3&gt;
  
  
  5. Create the AI Handler
&lt;/h3&gt;

&lt;p&gt;Create a file called &lt;code&gt;ai.handler.ts&lt;/code&gt; with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* eslint-disable no-console */
import { APIGatewayProxyEventV2 } from 'aws-lambda'
// import { readFileSync } from 'fs'

import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime'

import { queryRAG } from './embeddings'
// const sqlSchema = readFileSync('./database.sql', 'utf-8')

const REGION = process.env.REGION || 'us-east-1'
const MODEL_ID = 'anthropic.claude-3-haiku-20240307-v1:0'

const bedrock = new BedrockRuntimeClient({ region: REGION })

export const handler = async (event: APIGatewayProxyEventV2) =&amp;gt; {
  try {
    console.time('Tempo')
    const body = event.body ? JSON.parse(event.body) : {}
    const prompt =
      body?.prompt ||
      'How many payment orders for exam purchases exist in the system today? And are they for appointment payments?'

    if (!prompt) {
      return {
        statusCode: 400,
        body: JSON.stringify({ error: "missing 'prompt' in the request body." }),
      }
    }

    const context = await queryRAG(prompt)

    const command = new InvokeModelCommand({
      modelId: MODEL_ID,
      contentType: 'application/json',
      accept: 'application/json',
      body: JSON.stringify({
        messages: [
          {
            role: 'user',
            content: [
              { type: 'text', text: `Relevant Context:\n${context}` },
              // { type: 'text', text: `Here is the schema you should base your work on.: ${sqlSchema}` },
              { type: 'text', text: prompt },
            ],
          },
        ],
        system: `
          - You are an expert in relational databases who creates accurate and optimized SQL queries.
          - Return only the raw query, ready to be copied and executed in the database, in the most optimized and performative way.
          - Return the SQL without line breaks.
        `,
        max_tokens: 500, // sets the maximum number of output tokens
        temperature: 0.2, // optional, to control creativity
        anthropic_version: 'bedrock-2023-05-31', // mandatory for claude models
      }),
    })

    const response = await bedrock.send(command)

    const rawOutput = await response.body?.transformToString()
    const output = rawOutput ? JSON.parse(rawOutput) : {}

    console.log(output.content[0].text)
    console.timeEnd('Tempo')
    // console.log(output)
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        prompt,
        response: output.content?.[0]?.text || output,
      }),
    }
  } catch (error: any) {
    console.error('Error invoking Bedrock:', error)

    return {
      statusCode: 500,
      body: JSON.stringify({
        error: 'Failed to invoke Bedrock',
        details: error.message || error,
      }),
    }
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. Run locally
&lt;/h3&gt;

&lt;p&gt;By executing the &lt;code&gt;ai.handler.ts&lt;/code&gt; file, you should get an output containing a custom SQL query tailored to your schema.&lt;/p&gt;

&lt;p&gt;I was planning to deploy this as a Lambda function. To make it easier to test locally, you can create another file that simulates the event coming from a request and then triggers the handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* eslint-disable @typescript-eslint/no-empty-function */

import { APIGatewayProxyEventV2 } from 'aws-lambda'
import { handler } from './ai.handler'

const event = {}


handler(event as unknown as APIGatewayProxyEventV2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Feel free to comment or reach out to me: &lt;a href="https://www.linkedin.com/in/daniel-marque-dev/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/daniel-marque-dev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>aws</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
