📦 All code examples in this article are available on GitHub:
github.com/andre-carbajal/sql-ai-database-solutions
Databases and AI used to live in separate worlds. Your PostgreSQL instance handled structured queries, and your ML pipeline ran somewhere else entirely. That separation is rapidly disappearing.
In 2025, your SQL database can store embeddings, answer natural language questions, power semantic search, and serve as the memory layer for autonomous AI agents — all without adding a separate vector database to your stack.
This article walks through four real-world patterns with working code:
- pgvector — storing and querying embeddings in PostgreSQL
- RAG pipelines — augmenting LLMs with your own database content
- NL2SQL / Text-to-SQL — letting users query your database in plain English
- AI Agents over SQL — autonomous agents that reason and act on your data
Prerequisites
pip install psycopg2-binary pgvector langchain langchain-openai openai sqlalchemy python-dotenv
# Docker: run PostgreSQL with pgvector support
docker run --name pgvector-demo \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
-d pgvector/pgvector:pg16
Set your environment variables:
# .env
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://postgres:secret@localhost:5432/aidb
1. pgvector — Semantic Search Inside PostgreSQL
pgvector is a PostgreSQL extension that adds a native vector data type, allowing you to store high-dimensional embeddings and run similarity searches with standard SQL.
Why this matters
Instead of maintaining a separate vector database (Pinecone, Weaviate, Chroma), you keep your vectors beside your relational data. That means you can combine semantic search with SQL filters in a single query.
Setup
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a products table with an embedding column
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
category TEXT,
price DECIMAL(10, 2),
embedding VECTOR(1536) -- OpenAI text-embedding-3-small dimensions
);
-- Create an HNSW index for fast approximate nearest-neighbor search
CREATE INDEX ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Inserting embeddings with Python
import os
import psycopg2
import numpy as np
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
conn = psycopg2.connect(os.environ["DATABASE_URL"])
cur = conn.cursor()
def get_embedding(text: str) -> list[float]:
response = client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return response.data[0].embedding
products = [
("Wireless Noise-Cancelling Headphones", "Premium audio with 30hr battery", "electronics", 299.99),
("Ergonomic Office Chair", "Lumbar support, adjustable arms", "furniture", 449.00),
("Python Programming Book", "Comprehensive guide for beginners", "books", 39.99),
("Mechanical Keyboard", "Cherry MX switches, RGB backlit", "electronics", 129.00),
("Standing Desk", "Electric height-adjustable desk", "furniture", 599.00),
]
for name, description, category, price in products:
embedding = get_embedding(f"{name}: {description}")
cur.execute(
"""
INSERT INTO products (name, description, category, price, embedding)
VALUES (%s, %s, %s, %s, %s)
""",
(name, description, category, price, embedding)
)
conn.commit()
print("Products inserted with embeddings.")
Querying: hybrid semantic + structured SQL
def semantic_search(query: str, max_price: float = None, category: str = None, limit: int = 5):
query_embedding = get_embedding(query)
sql = """
SELECT
name,
description,
category,
price,
1 - (embedding <=> %s::vector) AS similarity
FROM products
WHERE 1=1
"""
params = [query_embedding]
if max_price:
sql += " AND price <= %s"
params.append(max_price)
if category:
sql += " AND category = %s"
params.append(category)
sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
params += [query_embedding, limit]
cur.execute(sql, params)
return cur.fetchall()
# "I want something for my home office under $500"
results = semantic_search(
query="comfortable workspace setup for long work sessions",
max_price=500.0
)
for name, description, category, price, similarity in results:
print(f"[{similarity:.3f}] {name} (${price}) — {description}")
Output:
[0.891] Ergonomic Office Chair ($449.00) — Lumbar support, adjustable arms
[0.832] Mechanical Keyboard ($129.00) — Cherry MX switches, RGB backlit
[0.801] Standing Desk ($599.00) — Electric height-adjustable desk
The
<=>operator computes cosine distance. pgvector also supports<->(L2/Euclidean) and<#>(inner product).
2. RAG Pipeline — Your Database as an LLM Knowledge Base
Retrieval-Augmented Generation (RAG) lets you inject your own data into an LLM's context at query time. Instead of fine-tuning, you retrieve the most relevant chunks from your database and pass them as context.
The architecture
User Question
│
▼
Embed the Question
│
▼
Vector Search in PostgreSQL ──▶ Top-K relevant chunks
│
▼
Build LLM Prompt with context
│
▼
LLM generates answer grounded in your data
Full RAG pipeline with LangChain + pgvector
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import PGVector
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import TextLoader
# 1. Load and chunk your documents
loader = TextLoader("company_docs.txt")
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(
chunk_size=500,
chunk_overlap=50
)
chunks = splitter.split_documents(documents)
# 2. Store chunks + embeddings in PostgreSQL
CONNECTION_STRING = os.environ["DATABASE_URL"]
COLLECTION_NAME = "company_knowledge_base"
vector_store = PGVector.from_documents(
documents=chunks,
embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
collection_name=COLLECTION_NAME,
connection_string=CONNECTION_STRING,
)
# 3. Build a retrieval-augmented QA chain
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
retriever = vector_store.as_retriever(
search_type="similarity",
search_kwargs={"k": 4}
)
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=retriever,
return_source_documents=True
)
# 4. Ask questions
result = qa_chain.invoke({"query": "What is our refund policy?"})
print(result["result"])
print("\nSources:")
for doc in result["source_documents"]:
print(f" - {doc.metadata.get('source', 'unknown')}")
Adding metadata filters for multi-tenant RAG
# Store documents with tenant metadata
vector_store = PGVector.from_documents(
documents=chunks,
embedding=OpenAIEmbeddings(),
collection_name="docs",
connection_string=CONNECTION_STRING,
# Each chunk carries metadata you can filter on
)
# Retrieve only documents for a specific tenant
retriever = vector_store.as_retriever(
search_kwargs={
"k": 5,
"filter": {"tenant_id": "acme-corp"}
}
)
3. NL2SQL — Natural Language to SQL Queries
NL2SQL (also called Text-to-SQL) lets users query your database in plain English. An LLM reads your schema and translates a natural language question into a valid SQL query.
Naive approach: direct schema + prompt
from openai import OpenAI
client = OpenAI()
SCHEMA = """
Tables:
orders(id, customer_id, product_id, quantity, total_price, created_at, status)
customers(id, name, email, country, created_at)
products(id, name, category, price, stock)
"""
def nl_to_sql(question: str) -> str:
prompt = f"""You are a SQL expert. Given the following database schema, write a PostgreSQL query to answer the user's question.
Return ONLY the SQL query, no explanation.
Schema:
{SCHEMA}
Question: {question}
SQL:"""
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": prompt}],
temperature=0
)
return response.choices[0].message.content.strip()
# Real-world examples
questions = [
"What are the top 5 countries by total revenue this year?",
"Which products have never been ordered?",
"Show me customers who placed more than 3 orders in the last 30 days",
]
for q in questions:
print(f"Q: {q}")
print(f"SQL: {nl_to_sql(q)}")
print()
Output:
-- "What are the top 5 countries by total revenue this year?"
SELECT c.country, SUM(o.total_price) AS total_revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE EXTRACT(YEAR FROM o.created_at) = EXTRACT(YEAR FROM CURRENT_DATE)
GROUP BY c.country
ORDER BY total_revenue DESC
LIMIT 5;
-- "Which products have never been ordered?"
SELECT p.id, p.name, p.category
FROM products p
LEFT JOIN orders o ON p.id = o.product_id
WHERE o.id IS NULL;
Production-grade NL2SQL with validation
Never run LLM-generated SQL without validation. Here's a safer pattern:
import sqlparse
import re
ALLOWED_STATEMENTS = {"SELECT"}
def is_safe_sql(sql: str) -> bool:
"""Reject anything that isn't a SELECT statement."""
parsed = sqlparse.parse(sql)
if not parsed:
return False
statement_type = parsed[0].get_type()
return statement_type in ALLOWED_STATEMENTS
def execute_nl_query(question: str, conn) -> list[dict]:
sql = nl_to_sql(question)
# Strip markdown fences if LLM added them
sql = re.sub(r"```
sql|
```", "", sql).strip()
if not is_safe_sql(sql):
raise ValueError(f"Unsafe SQL rejected: {sql}")
cur = conn.cursor()
cur.execute(sql)
columns = [desc[0] for desc in cur.description]
rows = cur.fetchall()
return [dict(zip(columns, row)) for row in rows]
Few-shot prompting for better accuracy
Adding examples dramatically improves query accuracy on complex schemas:
FEW_SHOT_EXAMPLES = """
Example 1:
Question: How many orders were placed last month?
SQL: SELECT COUNT(*) FROM orders WHERE created_at >= DATE_TRUNC('month', NOW() - INTERVAL '1 month') AND created_at < DATE_TRUNC('month', NOW());
Example 2:
Question: What is the average order value by product category?
SQL: SELECT p.category, AVG(o.total_price) AS avg_order_value FROM orders o JOIN products p ON o.product_id = p.id GROUP BY p.category ORDER BY avg_order_value DESC;
"""
def nl_to_sql_fewshot(question: str) -> str:
prompt = f"""You are a SQL expert for a PostgreSQL e-commerce database.
Schema:
{SCHEMA}
{FEW_SHOT_EXAMPLES}
Question: {question}
SQL:"""
# ... same API call as before
4. AI Agents over SQL — Autonomous Database Reasoning
The most powerful pattern: an AI agent that can autonomously decide which queries to run, inspect the results, and iterate to answer complex multi-step questions.
Building a SQL agent with LangChain + custom tools
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI
from langchain.tools import tool
from langchain import hub
import psycopg2
conn = psycopg2.connect(os.environ["DATABASE_URL"])
@tool
def query_database(sql: str) -> str:
"""
Execute a read-only SQL SELECT query against the e-commerce database.
Use this to retrieve data needed to answer user questions.
Never use INSERT, UPDATE, DELETE, or DROP.
"""
if not is_safe_sql(sql):
return "Error: Only SELECT queries are allowed."
try:
cur = conn.cursor()
cur.execute(sql)
columns = [desc[0] for desc in cur.description]
rows = cur.fetchmany(50) # cap results
result = [dict(zip(columns, row)) for row in rows]
return str(result)
except Exception as e:
return f"Query error: {e}"
@tool
def get_schema() -> str:
"""
Returns the database schema — table names, columns, and types.
Call this first to understand what data is available before writing queries.
"""
cur = conn.cursor()
cur.execute("""
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
""")
rows = cur.fetchall()
schema = {}
for table, column, dtype in rows:
schema.setdefault(table, []).append(f"{column} ({dtype})")
return "\n".join(
f"{table}: {', '.join(cols)}"
for table, cols in schema.items()
)
@tool
def semantic_product_search(query: str) -> str:
"""
Search for products using semantic/natural language similarity.
Use this when the user describes what they want rather than specifying exact product names.
"""
results = semantic_search(query, limit=5)
return str([{"name": r[0], "description": r[1], "price": r[3]} for r in results])
# Assemble the agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [get_schema, query_database, semantic_product_search]
prompt = hub.pull("hwchase17/openai-tools-agent")
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True,
max_iterations=10
)
# Complex multi-step question
response = agent_executor.invoke({
"input": """
I'm looking for ergonomic office products.
Find the top 3 most relevant products, then check if we have customers
from Germany who bought office furniture in the past 6 months.
Give me a summary of both findings.
"""
})
print(response["output"])
The agent will autonomously:
- Call
semantic_product_search("ergonomic office products") - Call
get_schema()to understand the tables - Write and run a SQL query for German office furniture buyers
- Synthesize both results into a coherent answer
Putting It All Together: Architecture Overview
┌─────────────────────────────────────────────────────┐
│ PostgreSQL Database │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ products │ │ embeddings │ │ orders │ │
│ │ customers │ │ (pgvector) │ │ ... │ │
│ └─────────────┘ └──────────────┘ └───────────┘ │
└───────────────────────┬─────────────────────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Semantic │ │ NL2SQL │ │ RAG │
│ Search │ │ Agent │ │ Pipeline │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└────────────┼────────────┘
▼
┌──────────┐
│ LLM │
│ (GPT-4o) │
└──────────┘
│
▼
User Response
Performance Tips
Index strategy for pgvector:
-- HNSW: faster queries, slower inserts — best for production read-heavy workloads
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- IVFFlat: faster inserts, slightly less accurate — good for bulk loading
CREATE INDEX ON products USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100);
-- Rule of thumb: lists ≈ sqrt(number_of_rows)
Batching embeddings:
# Don't call the embeddings API one row at a time
texts = [f"{p['name']}: {p['description']}" for p in products]
response = client.embeddings.create(
model="text-embedding-3-small",
input=texts # batch up to 2048 inputs at once
)
embeddings = [item.embedding for item in response.data]
Caching repeated NL2SQL queries:
import hashlib
import json
query_cache: dict[str, str] = {}
def cached_nl_to_sql(question: str) -> str:
key = hashlib.md5(question.lower().strip().encode()).hexdigest()
if key not in query_cache:
query_cache[key] = nl_to_sql(question)
return query_cache[key]
Security Checklist
Before shipping any of these patterns to production:
- ✅ Allowlist only SELECT — reject DDL and DML from LLM-generated SQL
- ✅ Use parameterized queries when executing with user-supplied values
- ✅ Create a read-only DB user for your AI layer
- ✅ Set query timeouts to prevent expensive accidental full-table scans
- ✅ Cap result set sizes from agent tool calls
- ✅ Log all LLM-generated queries for auditing
- ✅ Never expose raw error messages — they leak schema information
-- Create a restricted role for your AI application
CREATE ROLE ai_readonly;
GRANT CONNECT ON DATABASE aidb TO ai_readonly;
GRANT USAGE ON SCHEMA public TO ai_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO ai_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO ai_readonly;
Public Repository
All the code from this article — including Docker Compose setup, schema migrations, and ready-to-run examples — is in the companion repository:
The repo structure:
sql-ai-database-solutions/
├── docker-compose.yml # PostgreSQL + pgvector, ready to go
├── schema/
│ └── 001_init.sql # Tables, pgvector extension, indexes
├── examples/
│ ├── 01_pgvector_search.py # Semantic search
│ ├── 02_rag_pipeline.py # RAG with LangChain
│ ├── 03_nl2sql.py # Natural language to SQL
│ └── 04_sql_agent.py # Autonomous SQL agent
├── .env.example
└── README.md
What's Next?
The patterns here are just the start. Some directions worth exploring:
- DB-GPT — an open-source, full-stack AI database framework
- Vanna.AI — RAG-based NL2SQL with self-learning capabilities
- pgai — Timescale's extension for running ML directly inside PostgreSQL
- Multi-agent SQL workflows — agents that collaborate across multiple databases
The line between "database" and "AI system" is getting thinner every month. If you're already running PostgreSQL, you're closer to a production AI stack than you might think.
Have questions or want to share what you've built? Drop a comment below or open an issue on the repo!
Top comments (0)