From raw CSVs to a production-ready AI assistant that queries data safely — without hallucinating SQL.
In this post, I’ll walk through how I built an AI-powered data analyst using:
- Azure OpenAI
- LangChain
- LangGraph
- Function Calling
- SQLite
The assistant can:
- Analyze CSV data using pandas
- Query a SQL database safely
- Choose predefined backend functions automatically
- Explain results clearly
- Avoid hallucinations and unsafe SQL
🧩 Problem Statement
When working with AI models and databases, common problems include:
- ❌ Hallucinated SQL queries
- ❌ Unsafe eval or raw SQL execution
- ❌ No control over what the model can access
- ❌ No explanation of how results were computed
Goal:
Build an AI assistant that:
- Answers analytical questions about COVID data
- Uses only allowed tools
- Never guesses
- Explains every answer
📊 Dataset
We use the COVID all-states history dataset, which includes:
- state
- date
- hospitalizedIncrease
- positiveIncrease
- …and more
The dataset is used in two forms:
- A pandas DataFrame (for direct analysis)
- A SQLite database (for SQL querying)
🧱 Architecture Overview
```
User Question
      ↓
Azure OpenAI (Assistant / LangChain)
      ↓
Tool Selection (Function / SQL / DataFrame)
      ↓
Safe Backend Execution
      ↓
Result
      ↓
Final Explanation
```
Key idea:
The model decides WHAT to do.
Your backend decides HOW it is done.
🔹 Part 1: Talking to Azure OpenAI via LangChain
We start by connecting to Azure OpenAI using AzureChatOpenAI:
```python
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_endpoint="https://<your-endpoint>.cognitiveservices.azure.com/",
    api_key="YOUR_API_KEY",
    api_version="2024-12-01-preview",
    model="gpt-4o-mini",
)
```
A simple sanity check:
```python
from langchain_core.messages import HumanMessage

response = llm.invoke([
    HumanMessage(content="Hello, Azure OpenAI via LangChain!")
])
print(response.content)
```
🔹 Part 2: DataFrame Agent (CSV Analysis)
We load the CSV into pandas and expose controlled computation via a tool.
DataFrame Tool
```python
from langchain_core.tools import tool

@tool
def run_df(query: str) -> str:
    """Run a Python expression against the global DataFrame `df` and return the result."""
    return str(eval(query))
```
⚠️ Note: In production, replace eval with a restricted execution layer.
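One way to harden that tool is to strip the builtins from the evaluation namespace and reject obviously dangerous tokens before evaluating. This is a minimal sketch of my own (the function name `run_df_safe` and the blocklist are illustrative, not from the original post), not a complete sandbox:

```python
import pandas as pd

# Tokens that should never appear in a pandas expression from the model.
BLOCKED = ("__", "import", "open(", "exec", "eval")

def run_df_safe(query: str, df: pd.DataFrame) -> str:
    """Evaluate a pandas expression against `df` in a restricted namespace."""
    if any(token in query for token in BLOCKED):
        raise ValueError(f"Query rejected by safety filter: {query!r}")
    # No builtins; only `df` and `pd` are visible to the expression.
    result = eval(query, {"__builtins__": {}}, {"df": df, "pd": pd})
    return str(result)

df = pd.DataFrame({"positiveIncrease": [10, 20, 30]})
print(run_df_safe("df['positiveIncrease'].sum()", df))  # prints 60
```

A string blocklist is a speed bump, not a sandbox; a real deployment would run the expression in a separate, resource-limited process.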
Enforcing Tool Usage
```python
llm_with_tools = llm.bind_tools([run_df])
```
The prompt forces the model to:
- Use the tool
- Perform actual pandas calculations
- Explain results
🔹 Part 3: Moving from CSV → SQL (SQLite)
We convert the CSV into SQLite:
```python
from sqlalchemy import create_engine

engine = create_engine("sqlite:///./db/test.db")
df.to_sql(
    name="all_states_history",
    con=engine,
    if_exists="replace",
    index=False,
)
```
Now the same dataset can be queried via SQL.
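The round trip can be demonstrated end-to-end with an in-memory database and a tiny stand-in frame (the real post uses the full all-states-history CSV and a file-backed engine):

```python
import sqlite3
import pandas as pd

# Tiny stand-in for the COVID dataset.
df = pd.DataFrame({
    "state": ["NY", "NY", "CA"],
    "date": ["2021-03-01", "2021-03-02", "2021-03-01"],
    "hospitalizedIncrease": [50, 60, 40],
})

# Write the frame into SQLite, then query it back with plain SQL.
conn = sqlite3.connect(":memory:")
df.to_sql("all_states_history", conn, if_exists="replace", index=False)

result = pd.read_sql(
    "SELECT SUM(hospitalizedIncrease) AS total "
    "FROM all_states_history WHERE state = 'NY'",
    conn,
)
print(result["total"].iloc[0])  # prints 110
```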
🔹 Part 4: SQL Agent with LangGraph
Using LangGraph’s ReAct agent:
```python
from langgraph.prebuilt import create_react_agent

# `toolkit` is a LangChain SQLDatabaseToolkit built over the SQLite engine
agent_executor_SQL = create_react_agent(
    model=llm,
    tools=toolkit.get_tools(),
)
```
The system prompt enforces:
- Only valid tables
- Only specific columns
- No hallucinated values
- Markdown-only output
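Those guardrails live in the system prompt. The exact wording below is my own sketch of what such a prompt might look like, not the original author's:

```python
SQL_AGENT_SYSTEM_PROMPT = """
You are a SQL analyst for a COVID statistics database.
Rules:
- Query ONLY the table `all_states_history`.
- Use ONLY these columns: state, date, hospitalizedIncrease, positiveIncrease.
- Never invent table names, column names, or values.
- If a question cannot be answered from this schema, say so explicitly.
- Format every answer as Markdown and include the SQL you executed.
"""
```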
🔹 Part 5: Function Calling (No Raw SQL)
Instead of letting the model generate SQL, we define pre-approved backend functions.
Example Functions
```python
def get_hospitalized_increase_for_state_on_date(state_abbr, specific_date):
    ...

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    ...
```
Function Registry (Critical!)
```python
FUNCTION_MAP = {
    "get_hospitalized_increase_for_state_on_date": get_hospitalized_increase_for_state_on_date,
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}
```
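The post elides the function bodies, but one plausible implementation runs a fixed, parameterized query against the SQLite table. This sketch assumes a module-level connection and the schema shown earlier; the seeded data is illustrative:

```python
import sqlite3
import pandas as pd

# Seed an in-memory database with sample rows for the demo.
conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "state": ["NY", "CA"],
    "date": ["2021-03-01", "2021-03-01"],
    "positiveIncrease": [1200, 900],
}).to_sql("all_states_history", conn, index=False)

def get_positive_cases_for_state_on_date(state_abbr: str, specific_date: str) -> int:
    """Return positiveIncrease for one state on one date via a fixed, parameterized query."""
    row = conn.execute(
        "SELECT positiveIncrease FROM all_states_history "
        "WHERE state = ? AND date = ?",
        (state_abbr, specific_date),
    ).fetchone()
    return row[0] if row else 0

print(get_positive_cases_for_state_on_date("NY", "2021-03-01"))  # prints 1200
```

Because the SQL is fixed and the inputs are bound parameters, the model can never smuggle arbitrary SQL through this function.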
This ensures:
- ✅ Only allowed functions run
- ❌ No arbitrary code execution
🔹 Part 6: Azure OpenAI Function Calling (No Assistant API)
Using Chat Completions + functions:
```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    functions=functions,
    function_call="auto",
)
```
If the model calls a function:
- Extract the arguments
- Route the call through `FUNCTION_MAP`
- Execute the backend logic
- Send the result back to the model
- Get the final, grounded answer
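The routing step can be sketched in isolation: given the model's function-call payload (a name plus JSON-encoded arguments), look the name up in `FUNCTION_MAP` and refuse anything unregistered. The stub function here stands in for the real database-backed one:

```python
import json

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    return 1200  # stub standing in for the real backend function

FUNCTION_MAP = {
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}

def dispatch(function_name: str, raw_arguments: str):
    """Route a model function call through the registry; reject unknown names."""
    fn = FUNCTION_MAP.get(function_name)
    if fn is None:
        raise ValueError(f"Function not allowed: {function_name}")
    return fn(**json.loads(raw_arguments))

result = dispatch(
    "get_positive_cases_for_state_on_date",
    '{"state_abbr": "NY", "specific_date": "2021-03-01"}',
)
print(result)  # prints 1200
```

The dict lookup is the entire security boundary: if the model hallucinates a function name, the call fails loudly instead of executing anything.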
🔹 Part 7: Assistant API (Persistent Context)
Now we level up.
Creating the Assistant
```python
assistant = client.beta.assistants.create(
    name="Covid Data Assistant",
    model="gpt-4o-mini",
    tools=[{"type": "function", "function": fn} for fn in functions],
)
```
Assistant Loop (Key Concept)
```python
import time

while True:
    run_status = client.beta.threads.runs.retrieve(...)
    if run_status.status == "requires_action":
        # extract the function name and arguments,
        # dispatch via FUNCTION_MAP,
        # then submit the tool output
        ...
    elif run_status.status == "completed":
        break
    time.sleep(1)  # poll politely
```
The assistant remembers conversation context,
but never caches database results.
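The `requires_action` handshake can be exercised offline with a stub run object: the run moves from `requires_action` to `completed`, and each tool call is routed through `FUNCTION_MAP` exactly as in the real loop. Everything named `Fake*` below is test scaffolding of mine, not part of the Assistants API:

```python
import json

FUNCTION_MAP = {"add": lambda a, b: a + b}

class FakeRun:
    """Stub that replays a fixed sequence of run statuses."""
    def __init__(self, statuses, tool_calls):
        self._statuses = iter(statuses)
        self.tool_calls = tool_calls
        self.outputs = []

    def retrieve(self):
        self.status = next(self._statuses)
        return self

run = FakeRun(
    statuses=["requires_action", "completed"],
    tool_calls=[{"name": "add", "arguments": '{"a": 2, "b": 3}'}],
)

while True:
    run_status = run.retrieve()
    if run_status.status == "requires_action":
        for call in run_status.tool_calls:
            fn = FUNCTION_MAP[call["name"]]            # registry lookup
            out = fn(**json.loads(call["arguments"]))  # safe execution
            run.outputs.append(out)                    # would be submit_tool_outputs
    elif run_status.status == "completed":
        break

print(run.outputs)  # prints [5]
```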
🧠 Key Takeaways
✅ What This Design Solves
- Prevents SQL hallucinations
- Enforces backend safety
- Keeps AI answers grounded in data
- Scales cleanly as tools grow
🧩 Mental Model
| Layer | Responsibility |
|---|---|
| LLM | Reasoning & intent |
| Assistant | Tool selection |
| Backend | Data access |
| Function Map | Security |
🎯 When to Use What?
| Use Case | Best Choice |
|---|---|
| One-shot queries | Chat + function calling |
| Multi-turn analysis | Assistant API |
| CSV exploration | DataFrame tools |
| Production DB | Predefined SQL functions |
🚀 Final Thoughts
This approach mirrors how real production AI systems are built:
- AI decides what
- Backend controls how
- Data remains authoritative
- Explanations remain transparent
Connect With Me
Let’s learn and build cool data science and AI projects together!
- 💼 LinkedIn: https://www.linkedin.com/in/singla-khushi/
- 🔗 GitHub: https://github.com/KhushiSingla-tech
- 📩 Comments below are always welcome!