You know the scenario. You are deep in flow state, solving a complex backend problem, when a Slack notification pops up from a Product Manager:
“Hey, can you quickly pull a list of all users who signed up in Germany last week? I need it for a meeting in 10 minutes.”
You know SQL. You could write that query in your sleep. But you shouldn’t have to.
Every minute you spend writing ad-hoc reporting queries is a minute stolen from building actual product features.
The solution is not to build another complex Admin Dashboard. The solution is to give your team a tool that speaks their language.
In this tutorial, we will build a “Chat with your Database” application. It allows non-technical users to ask questions in plain English, while an AI writes and executes the safe, read-only SQL under the hood.
Best of all? It can run 100% locally.
- No API Keys: It costs $0.00.
- No Data Privacy Risks: Your production schema never goes to OpenAI.
- No “Glue Code” Nightmares: We will use the new Model Context Protocol (MCP) to keep the architecture clean.
By the end of this guide, you will have a deployable Docker container that turns “Hey, can you pull this data?” into “Sure, just ask the bot.”
Let’s reclaim your time.
The Blueprint
To keep this tutorial digestible, I will focus on the core logic of the AI integration. I won’t bore you with 50 lines of Python import statements.
You can grab the complete code (including the Docker setup and Seed Data generator) here:
👉 github.com/fabiothiroki/mcp-local-analyst
Clone it, run docker-compose up, and follow along.
The Architecture: “The USB-C for AI”
Before we write a single line of code, we need to understand why we are using the Model Context Protocol (MCP).
In the early days of LLMs, connecting a model to a database meant writing messy “glue code.” You had to manually inject table definitions into prompts, dig the SQL out of free-form replies with regex, and handle every error edge case yourself. It was brittle and hard to maintain.
MCP solves this by standardizing the connection. Think of it like a USB-C port for AI:
- Resources (Context): The AI asks, “What data do I have?” Your server replies with the database schema.
- Tools (Action): The AI decides, “I need to run a query.” Your server executes the function safely.
We are decoupling the Brain (the LLM) from the Hands (the Database execution).
Here is the stack we will build:
- The Brain: Mistral 7B running on Ollama. (We chose Mistral because it follows strict JSON instructions better than many larger models).
- The Interface: A Streamlit app where the user chats.
- The Server: A Python script using FastMCP that holds the SQLite connection.
Step 1: The Environment (Docker is Your Friend)
We will use Docker to avoid “it works on my machine” syndrome. We need a container that holds our UI and our Database, but can still talk to the Ollama instance running on your host machine.
Create a docker-compose.yml file. This is your “one-click” setup:
services:
  mcp-reporter:
    build: .
    container_name: pocket_analyst
    ports:
      - "8501:8501"
    volumes:
      - ./data:/app/data
    environment:
      # Crucial: This lets the container talk to Ollama on your laptop
      - OLLAMA_HOST=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
Step 2: The Server (Giving the AI “Hands”)
We use the FastMCP library to create a server. This script does two things: exposes the database schema (Context) and provides a function to query it (Tool).
Note: I’ve omitted code for brevity. Check src/server.py in the repo for the full context.
# ... standard imports and DB setup (see repo) ...

@mcp.resource("sqlite://schema")
def get_schema() -> str:
    """Reads the database schema and returns it as context."""
    # In a real app, you would query sqlite_master dynamically
    return "CREATE TABLE transactions..."

@mcp.tool()
def query_database(sql: str) -> str:
    """
    Executes a read-only SQL query against the database.

    Args:
        sql: The SELECT statement to execute.
    """
    # Connect and execute (simplified; DB_PATH comes from the DB setup above)
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    cursor = conn.execute(sql)
    # Return results as clean JSON dictionaries
    return json.dumps([dict(row) for row in cursor.fetchall()])
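Because the intro promised “safe, read-only SQL,” it is worth enforcing that promise in code rather than trusting the model. Here is a small guard you could add at the top of query_database; this is an illustrative sketch, not necessarily how the repo does it:

def assert_readonly(sql: str) -> None:
    """Reject anything that is not a plain SELECT statement."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed.")

# For a second layer of safety, you can also open SQLite in read-only mode:
# conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)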
Step 3: The Brain (Why Mistral Wins)
Now we need the “Host” — the Streamlit app that manages the conversation.
We are using Ollama to run the model. But here is a critical insight from my testing: Use Mistral 7B, not Llama 3.
While Llama 3 is powerful, at the sizes you can run locally it often struggles with strict JSON formatting. You ask for { “sql”: “…” } and Llama might reply with “Sure! Here is your JSON: { … }”. That extra conversational text breaks your parser.
Mistral 7B is remarkably disciplined. It shuts up and outputs the JSON we need to trigger our tool.
Here is the logic for src/app.py:
# ... imports and Streamlit page config omitted ...

# The System Prompt is where we "program" the behavior.
# We explicitly tell the model about the schema and the output format.
SYSTEM_PROMPT = """
You are a Data Analyst.
1. Output ONLY a JSON object with the tool call.
2. The format must be: { "tool": "query_database", "sql": "SELECT..." }
3. Always convert cents to main currency units in the final answer.
"""

# ... inside the main chat loop ...
if prompt := st.chat_input("Ask a question..."):
    # 1. Send the user's request to Ollama.
    #    Note: We use format='json' to force structured output.
    response = client.chat(
        model="mistral",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt}
        ],
        format="json"
    )

    # 2. Parse the tool call.
    #    (Full error handling and JSON parsing logic is in the repo.)
    tool_call = json.loads(response['message']['content'])

    if tool_call['tool'] == 'query_database':
        # 3. Execute the SQL via our MCP Server
        result = call_mcp_tool(tool_call['sql'])

        # 4. Feed the data back to Mistral to summarize
        final_answer = client.chat(
            model="mistral",
            messages=[..., {"role": "user", "content": f"Data: {result}"}]
        )
        st.write(final_answer['message']['content'])
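The repo hides the full error handling, but even with format="json" it pays to parse defensively: a 7B model will occasionally go off-script. A hedged sketch of that parsing step (the helper name is mine, not the repo's):

import json

def parse_tool_call(raw: str) -> dict | None:
    """Return the expected tool-call dict, or None if the model went off-script."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Accept only the exact shape the system prompt asked for
    if call.get("tool") == "query_database" and isinstance(call.get("sql"), str):
        return call
    return None

# Inside the chat loop:
# tool_call = parse_tool_call(response['message']['content'])
# if tool_call is None:
#     st.warning("The model did not return a valid tool call. Try rephrasing.")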
Step 4: Talking to Your Data
Once you run docker-compose up, the container starts and automatically runs a seed_data.py script (included in the repo) that fills your SQLite database with 2,000 realistic “Mock Stripe” transactions.
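If you are curious what those rows look like, here is a rough sketch of the kind of generator seed_data.py implements. The column names follow the queries later in this post, but the value lists, the amount_cents column, and the file path are illustrative, so treat the repo's script as the source of truth:

import random
import sqlite3
from datetime import datetime, timedelta

COUNTRIES = ["US", "DE", "GB", "BR", "FR", "JP"]
STATUSES = ["succeeded", "failed"]
METHODS = ["card", "paypal", "bank_transfer"]

def seed(db_path: str = "data/analytics.db", n: int = 2000) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS transactions (
        id INTEGER PRIMARY KEY,
        country_code TEXT, status TEXT, payment_method TEXT,
        currency TEXT, amount_cents INTEGER, created_at TEXT)""")
    for _ in range(n):
        # Spread transactions over the last ~90 days
        created = datetime.now() - timedelta(days=random.randint(0, 90),
                                             minutes=random.randint(0, 1440))
        conn.execute(
            "INSERT INTO transactions "
            "(country_code, status, payment_method, currency, amount_cents, created_at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (random.choice(COUNTRIES), random.choice(STATUSES),
             random.choice(METHODS), "EUR",
             random.randint(500, 50000), created.isoformat()),
        )
    conn.commit()
    conn.close()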
Now, let’s see why this is better than a static dashboard.
Test 1: The “Complex Filter”
In a traditional dashboard, if you wanted to see “Failed payments from Germany,” you’d need a filter for Status and a filter for Country.
The Prompt:
How many failed credit card payments did we receive from Germany yesterday?
The Logic:
Mistral analyzes the schema. It sees country_code, status, payment_method, and created_at. It generates:
SELECT count(*) FROM transactions
WHERE country_code = 'DE'
AND status = 'failed'
AND payment_method = 'card'
AND created_at > date('now', '-1 day');
Test 2: The “Ad-Hoc Transformation”
This is the killer feature. Product Managers often want data formatted in a specific way for their own reports.
The Prompt:
Calculate the total EUR payments per country in the last month, format in CSV
The Result:
The MCP tool returns raw JSON. The LLM then takes that JSON and re-writes it as:
"Country Code","Total EUR Payments"
"US",2873932
"DE",1093878
"GB",533471
"BR",492770
"FR",454523
"JP",348649
The fatal flaw of traditional dashboards is that they only answer the questions you predicted users would ask. This architecture handles the problems you never saw coming, replacing a finite set of hard-coded buttons with the infinite flexibility of language.