DEV Community

shashank ms
shashank ms

Posted on

Leveraging LLM for Business Intelligence

We are building a conversational BI agent that turns plain English questions into SQL, runs them against a local SQLite database, and summarizes the results for stakeholders. This helps teams get insights from structured data without maintaining a complex semantic layer. I shipped a version of this for an internal ops dashboard, and the code below is exactly what I used.

What you'll need

  • Python 3.10 or newer
  • The OpenAI SDK: pip install openai
  • An Oxlo.ai API key from https://portal.oxlo.ai
  • A local SQLite database (we will create one in Step 1)

Step 1: Create a sample database

I will start with a standalone SQLite file so you can run this without any external database infrastructure. The schema mimics a simple sales pipeline with orders, products, and regions.

import sqlite3

def init_db():
    conn = sqlite3.connect("sales.db")
    cursor = conn.cursor()
    cursor.executescript("""
        CREATE TABLE IF NOT EXISTS regions (
            region_id INTEGER PRIMARY KEY,
            region_name TEXT
        );
        CREATE TABLE IF NOT EXISTS products (
            product_id INTEGER PRIMARY KEY,
            product_name TEXT,
            category TEXT
        );
        CREATE TABLE IF NOT EXISTS orders (
            order_id INTEGER PRIMARY KEY,
            region_id INTEGER,
            product_id INTEGER,
            quantity INTEGER,
            unit_price REAL,
            order_date TEXT,
            FOREIGN KEY (region_id) REFERENCES regions(region_id),
            FOREIGN KEY (product_id) REFERENCES products(product_id)
        );
        INSERT INTO regions VALUES (1, 'North'), (2, 'South'), (3, 'East'), (4, 'West');
        INSERT INTO products VALUES 
            (1, 'Widget A', 'Hardware'), 
            (2, 'Widget B', 'Hardware'), 
            (3, 'SaaS Basic', 'Software');
        INSERT INTO orders VALUES 
            (1, 1, 1, 10, 99.0, '2024-01-15'),
            (2, 1, 3, 5, 199.0, '2024-02-10'),
            (3, 2, 1, 7, 99.0, '2024-01-20'),
            (4, 3, 2, 12, 149.0, '2024-03-05'),
            (5, 4, 3, 5, 199.0, '2024-01-30'),
            (6, 2, 2, 3, 149.0, '2024-02-25');
    """)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    init_db()
    print("Created sales.db")

Step 2: Define the system prompt

The system prompt is the most important part of this build. It tells the model exactly what tables exist, what dialect to use, and how to format its answer so we can parse the SQL reliably.

SYSTEM_PROMPT = """You are a senior business intelligence analyst. Your job is to answer user questions by writing SQLite queries against the provided schema.

Database schema:
{schema}

Rules:
- Use only the tables and columns listed above.
- Write valid SQLite syntax.
- When calculating revenue, use quantity * unit_price.
- Return dates in YYYY-MM-DD format.
- Respond in this exact format:
  SQL: <the raw SQL query>
  ANALYSIS: <one sentence describing what the query does>

Do not include markdown code fences around the SQL."""

Step 3: Introspect the schema

Hardcoding schema strings breaks as soon as the data model changes. I pull the CREATE statements directly from SQLite so the prompt always stays current.

import sqlite3

def get_schema():
    conn = sqlite3.connect("sales.db")
    cursor = conn.cursor()
    cursor.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'"
    )
    rows = cursor.fetchall()
    conn.close()
    return "\n".join([r[0] for r in rows if r[0]])

Step 4: Generate SQL with Oxlo.ai

Now I wire the schema and the user question into an Oxlo.ai chat completion. I use Llama 3.3 70B because it follows structured formatting instructions reliably for code generation tasks. Because Oxlo.ai uses request-based pricing, these iterative BI calls stay predictable even when the schema prompt grows long. You can review pricing at https://oxlo.ai/pricing.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

def generate_sql(question: str, schema: str) -> str:
    prompt = SYSTEM_PROMPT.format(schema=schema)
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

Step 5: Execute SQL safely

The model returns a block of text with SQL and an analysis line. I parse that out, run it against the local database, and guard against destructive operations by allowing only SELECT statements.

import re

def run_query(generated: str):
    match = re.search(r"SQL:\s*(.+?)(?:\nANALYSIS:|$)", generated, re.DOTALL)
    if not match:
        raise ValueError("Could not parse SQL from model output.")
    sql = match.group(1).strip()

    if not sql.upper().startswith("SELECT"):
        raise ValueError("Only SELECT statements are allowed.")

    conn = sqlite3.connect("sales.db")
    conn.row_factory = sqlite3.Row
    cursor = conn.cursor()
    cursor.execute(sql)
    rows = cursor.fetchall()
    columns = [description[0] for description in cursor.description]
    conn.close()

    lines = ["| " + " | ".join(columns) + " |"]
    lines.append("| " + " | ".join(["---"] * len(columns)) + " |")
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")

    return sql, "\n".join(lines)

Step 6: Summarize results

Raw tables are not enough for stakeholders. I send the results back to Oxlo.ai, this time using Qwen 3 32B for concise reasoning, and ask for a one sentence business insight.

def summarize(question: str, sql: str, table: str) -> str:
    summary_prompt = f"""You are a BI analyst presenting findings to a non-technical executive.
Question: {question}
SQL used: {sql}
Results:
{table}

Provide a one sentence business insight. Be specific about numbers."""

    response = client.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": "You summarize data clearly and concisely."},
            {"role": "user", "content": summary_prompt},
        ],
    )
    return response.choices[0].message.content.strip()

Run it

Here is the complete entry point. I ask for the top region by revenue in Q1 2024.

if __name__ == "__main__":
    init_db()
    schema = get_schema()

    question = "What was the top region by revenue in Q1 2024?"

    raw = generate_sql(question, schema)
    print("=== Raw LLM Output ===")
    print(raw)
    print()

    sql, table = run_query(raw)
    print("=== Result Table ===")
    print(table)
    print()

    insight = summarize(question, sql, table)
    print("=== Business Insight ===")
    print(insight)

Example output:

=== Raw LLM Output ===
SQL: SELECT r.region_name, SUM(o.quantity * o.unit_price) as revenue
FROM orders o
JOIN regions r ON o.region_id = r.region_id
WHERE o.order_date >= '2024-01-01' AND o.order_date < '2024-04-01'
GROUP BY r.region_name
ORDER BY revenue DESC
LIMIT 1
ANALYSIS: Finds the top region by total revenue in Q1 2024.

=== Result Table ===
| region_name | revenue |
| --- | --- |
| North | 1985.0 |

=== Business Insight ===
North region led Q1 2024 with $1,985 in total revenue, driven by strong software subscription sales.

Next steps

Swap SQLite for your data warehouse by replacing the connection string and schema query with Postgres, BigQuery, or Snowflake equivalents. You can also add a second Oxlo.ai agent using DeepSeek V3.2 to generate matplotlib chart code from the result table and render visualizations automatically.

Top comments (0)