Khushi Singla
Building a Safe AI Database Assistant with Azure OpenAI, LangChain & Function Calling

From raw CSVs to a production-ready AI assistant that queries data safely — without hallucinating SQL.

In this post, I’ll walk through how I built an AI-powered data analyst using:

  • Azure OpenAI
  • LangChain
  • LangGraph
  • Function Calling
  • SQLite

The assistant can:

  • Analyze CSV data using pandas
  • Query a SQL database safely
  • Choose predefined backend functions automatically
  • Explain results clearly
  • Avoid hallucinations and unsafe SQL

🧩 Problem Statement

When working with AI models and databases, common problems include:

  • ❌ Hallucinated SQL queries
  • ❌ Unsafe eval or raw SQL execution
  • ❌ No control over what the model can access
  • ❌ No explanation of how results were computed

Goal:
Build an AI assistant that:

  • Answers analytical questions about COVID data
  • Uses only allowed tools
  • Never guesses
  • Explains every answer

📊 Dataset

We use the COVID all-states history dataset, which includes:

  • state
  • date
  • hospitalizedIncrease
  • positiveIncrease
  • …and more

The dataset is first used as:

  • A pandas DataFrame
  • A SQLite database

🧱 Architecture Overview

User Question
     ↓
Azure OpenAI (Assistant / LangChain)
     ↓
Tool Selection (Function / SQL / DataFrame)
     ↓
Safe Backend Execution
     ↓
Result
     ↓
Final Explanation

Key idea:

The model decides WHAT to do.
Your backend decides HOW it is done.


🔹 Part 1: Talking to Azure OpenAI via LangChain

We start by connecting to Azure OpenAI using AzureChatOpenAI:

from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_endpoint="https://<your-endpoint>.cognitiveservices.azure.com/",
    api_key="YOUR_API_KEY",
    api_version="2024-12-01-preview",
    model="gpt-4o-mini"
)

A simple sanity check:

from langchain_core.messages import HumanMessage

response = llm.invoke([
    HumanMessage(content="Hello, Azure OpenAI via LangChain!")
])
print(response.content)

🔹 Part 2: DataFrame Agent (CSV Analysis)

We load the CSV into pandas and expose controlled computation via a tool.

DataFrame Tool

import pandas as pd
from langchain_core.tools import tool

df = pd.read_csv("all-states-history.csv")  # path to your CSV

@tool
def run_df(query: str) -> str:
    """Run Python code on the global dataframe `df` and return the result."""
    return str(eval(query))

⚠️ Note: In production, replace eval with a restricted execution layer.
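One hedged sketch of what such a restricted layer could look like: strip the builtins, expose only a whitelist of names, and reject dunder access. The helper name `run_df_safe` and the `allowed` dict are mine, not from any library, and this still isn't a full sandbox — just a much smaller attack surface than bare `eval`.

```python
import pandas as pd

def run_df_safe(query: str, df: pd.DataFrame) -> str:
    """Evaluate a pandas expression with no builtins and only whitelisted
    names in scope. Blocks obvious abuse like __import__('os') or
    df.__class__, though it is not a complete sandbox."""
    if "__" in query:  # reject dunder access
        raise ValueError("Disallowed token in query")
    allowed = {"df": df, "pd": pd}
    return str(eval(query, {"__builtins__": {}}, allowed))

df = pd.DataFrame({"state": ["NY", "CA"], "positiveIncrease": [100, 250]})
run_df_safe("df['positiveIncrease'].sum()", df)
```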

Enforcing Tool Usage

llm_with_tools = llm.bind_tools([run_df])

The prompt forces the model to:

  • Use the tool
  • Perform actual pandas calculations
  • Explain results
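Binding tools only tells the model what it *may* call — the backend still has to execute the calls it returns. A minimal dispatch sketch, where the `TOOLS` registry and the hard-coded `tool_calls` payload are mine (shaped like LangChain's `response.tool_calls` list of `{"name": ..., "args": ...}` dicts):

```python
import pandas as pd

df = pd.DataFrame({"state": ["NY", "CA"], "hospitalizedIncrease": [5, 12]})

def run_df(query: str) -> str:
    """Stand-in for the @tool above: evaluate pandas code on df."""
    return str(eval(query, {"__builtins__": {}}, {"df": df, "pd": pd}))

TOOLS = {"run_df": run_df}  # only registered tools are ever executed

# Shape of a LangChain tool call the model would return:
tool_calls = [{"name": "run_df",
               "args": {"query": "df['hospitalizedIncrease'].max()"}}]

results = [TOOLS[tc["name"]](**tc["args"]) for tc in tool_calls]
```

Each result is then sent back to the model as a tool message so it can write the final explanation.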

🔹 Part 3: Moving from CSV → SQL (SQLite)

We convert the CSV into SQLite:

from sqlalchemy import create_engine

engine = create_engine("sqlite:///./db/test.db")

df.to_sql(
    name="all_states_history",
    con=engine,
    if_exists="replace",
    index=False
)

Now the same dataset can be queried via SQL.
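A quick sanity check that the table round-trips: read it back with `pd.read_sql`. The snippet below uses an in-memory database and toy rows so it stands alone; swap in the `./db/test.db` engine above for the real thing.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
df = pd.DataFrame({
    "state": ["NY", "NY", "CA"],
    "date": ["2021-03-05", "2021-03-06", "2021-03-05"],
    "hospitalizedIncrease": [3, 7, 2],
})
df.to_sql("all_states_history", con=engine, if_exists="replace", index=False)

result = pd.read_sql(
    "SELECT SUM(hospitalizedIncrease) AS total "
    "FROM all_states_history WHERE state = 'NY'",
    con=engine,
)
```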


🔹 Part 4: SQL Agent with LangGraph

Using LangGraph’s ReAct agent:

from langgraph.prebuilt import create_react_agent
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase

toolkit = SQLDatabaseToolkit(db=SQLDatabase(engine), llm=llm)
agent_executor_SQL = create_react_agent(
    model=llm,
    tools=toolkit.get_tools()
)

The system prompt enforces:

  • Only valid tables
  • Only specific columns
  • No hallucinated values
  • Markdown-only output
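A sketch of what such a system prompt might look like — the wording is mine, and the column list should be adapted to your schema:

```python
SQL_SYSTEM_PROMPT = """You are a SQL analyst for a SQLite database.
Rules:
- Query ONLY the table `all_states_history`.
- Use ONLY these columns: state, date, hospitalizedIncrease, positiveIncrease.
- If the data cannot answer the question, say so; never invent values.
- Return the final answer in Markdown only.
"""
```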

🔹 Part 5: Function Calling (No Raw SQL)

Instead of letting the model generate SQL, we define pre-approved backend functions.

Example Functions

def get_hospitalized_increase_for_state_on_date(state_abbr, specific_date):
    ...

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    ...
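As an illustration, here is one way the first stub could be filled in — a parameterized query, so the model only ever supplies *values*, never SQL text. The in-memory table and toy row exist just to keep the snippet self-contained; point `conn` at the real database in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_states_history "
             "(state TEXT, date TEXT, hospitalizedIncrease INTEGER)")
conn.execute("INSERT INTO all_states_history VALUES ('NY', '2021-03-05', 7)")

def get_hospitalized_increase_for_state_on_date(state_abbr, specific_date):
    """Parameterized lookup: placeholders (?) prevent SQL injection
    and keep the model away from raw SQL entirely."""
    row = conn.execute(
        "SELECT SUM(hospitalizedIncrease) FROM all_states_history "
        "WHERE state = ? AND date = ?",
        (state_abbr, specific_date),
    ).fetchone()
    return {"state": state_abbr, "date": specific_date,
            "hospitalizedIncrease": row[0]}
```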

Function Registry (Critical!)

FUNCTION_MAP = {
    "get_hospitalized_increase_for_state_on_date": get_hospitalized_increase_for_state_on_date,
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}

This ensures:

  • ✅ Only allowed functions run
  • ❌ No arbitrary code execution

🔹 Part 6: Azure OpenAI Function Calling (No Assistant API)

Using Chat Completions + functions:

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    functions=functions,
    function_call="auto"
)
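The `functions` list passed above is a set of JSON schemas that describe the registered functions to the model. A sketch for our two backend functions — the descriptions are mine, but the names and parameters match the function signatures:

```python
functions = [
    {
        "name": "get_hospitalized_increase_for_state_on_date",
        "description": "Daily hospitalization increase for a US state on a date.",
        "parameters": {
            "type": "object",
            "properties": {
                "state_abbr": {"type": "string",
                               "description": "Two-letter state code, e.g. NY"},
                "specific_date": {"type": "string",
                                  "description": "Date as YYYY-MM-DD"},
            },
            "required": ["state_abbr", "specific_date"],
        },
    },
    {
        "name": "get_positive_cases_for_state_on_date",
        "description": "Daily increase in positive cases for a US state on a date.",
        "parameters": {
            "type": "object",
            "properties": {
                "state_abbr": {"type": "string"},
                "specific_date": {"type": "string"},
            },
            "required": ["state_abbr", "specific_date"],
        },
    },
]
```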

If the model calls a function:

  1. Extract arguments
  2. Route via FUNCTION_MAP
  3. Execute backend logic
  4. Send result back
  5. Get final grounded answer
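Steps 1–3 above in code. A stub stands in for `response.choices[0].message.function_call` (the model returns arguments as a JSON *string*), and the lambda stands in for the real database function — both are mine for illustration:

```python
import json

FUNCTION_MAP = {
    "get_positive_cases_for_state_on_date":
        lambda state_abbr, specific_date: {"positiveIncrease": 1234},  # stub backend
}

# Stand-in for response.choices[0].message.function_call:
function_call = {
    "name": "get_positive_cases_for_state_on_date",
    "arguments": '{"state_abbr": "NY", "specific_date": "2021-03-05"}',
}

name = function_call["name"]
if name not in FUNCTION_MAP:                 # refuse anything unregistered
    raise ValueError(f"Unknown function: {name}")
args = json.loads(function_call["arguments"])  # arguments arrive as JSON text
result = FUNCTION_MAP[name](**args)
# `result` is then appended as a {"role": "function", ...} message and the
# model is called again to produce the final grounded answer (steps 4-5).
```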

🔹 Part 7: Assistant API (Persistent Context)

Now we level up.

Creating the Assistant

assistant = client.beta.assistants.create(
    name="Covid Data Assistant",
    model="gpt-4o-mini",
    tools=[{"type": "function", "function": fn} for fn in functions]
)

Assistant Loop (Key Concept)

import json
import time

# `thread` and `run` are assumed created via client.beta.threads earlier.
while True:
    run_status = client.beta.threads.runs.retrieve(
        thread_id=thread.id, run_id=run.id
    )
    if run_status.status == "requires_action":
        outputs = []
        for call in run_status.required_action.submit_tool_outputs.tool_calls:
            fn = FUNCTION_MAP[call.function.name]      # dispatch via the registry
            args = json.loads(call.function.arguments)
            outputs.append({"tool_call_id": call.id,
                            "output": json.dumps(fn(**args))})
        client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs
        )
    elif run_status.status in ("completed", "failed"):
        break
    time.sleep(1)

The assistant remembers conversation context,
but never caches database results.


🧠 Key Takeaways

✅ What This Design Solves

  • Prevents SQL hallucinations
  • Enforces backend safety
  • Keeps AI answers grounded in data
  • Scales cleanly as tools grow

🧩 Mental Model

| Layer | Responsibility |
| --- | --- |
| LLM | Reasoning & intent |
| Assistant | Tool selection |
| Backend | Data access |
| Function map | Security |

🎯 When to Use What?

| Use case | Best choice |
| --- | --- |
| One-shot queries | Chat + function calling |
| Multi-turn analysis | Assistant API |
| CSV exploration | DataFrame tools |
| Production DB | Predefined SQL functions |

🚀 Final Thoughts

This approach mirrors how real production AI systems are built:

  • AI decides what
  • Backend controls how
  • Data remains authoritative
  • Explanations remain transparent

Connect With Me

Let’s learn and build cool data science and AI projects together!
