shashank ms

Building Business Intelligence Tools with LLMs

Business intelligence is shifting from static dashboards to interactive, language-driven interfaces. Instead of learning SQL or navigating drag-and-drop builders, analysts and operators can ask questions in plain English and receive structured answers, generated charts, and narrative summaries. Large language models make this possible, but building a reliable BI agent requires careful prompt engineering, structured output constraints, and an inference backend that handles long schemas and multi-step reasoning without unpredictable costs.

Architecture of an LLM-Powered BI Assistant

A modern BI assistant typically combines three layers: a semantic layer that maps business terms to database schemas, a reasoning layer that plans queries and validates results, and a presentation layer that formats outputs into tables, charts, or summaries. The reasoning layer is where the LLM lives.
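
As a rough illustration of the semantic layer, the sketch below maps a few business terms to SQL expressions over the orders and users tables used later in this post. The term list, the `SEMANTIC_LAYER` dictionary, and `resolve_terms` are hypothetical names chosen for this example, not part of any Oxlo.ai API.

```python
# Hypothetical semantic layer: business terms mapped to SQL expressions.
SEMANTIC_LAYER = {
    "average order value": "AVG(orders.total)",
    "revenue": "SUM(orders.total)",
    "region": "users.region",
}


def resolve_terms(question: str) -> dict[str, str]:
    """Return the business terms found in the question with their SQL mapping."""
    lowered = question.lower()
    return {term: expr for term, expr in SEMANTIC_LAYER.items() if term in lowered}


if __name__ == "__main__":
    print(resolve_terms("What is the average order value by region last quarter?"))
```

The resolved mappings can be passed to the reasoning layer alongside the schema, so the model does not have to guess how a business term translates into columns.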

For the reasoning layer, you need models that handle long context windows and structured reasoning. Oxlo.ai offers DeepSeek V4 Flash with a 1M context window, which is useful when you need to fit entire database schemas, sample rows, and business logic into a single prompt. Llama 3.3 70B and Qwen 3 32B are strong options for agent workflows that require tool use and multi-turn planning.

Natural Language to Structured Queries

The core of most BI tools is translating natural language into SQL or API calls. This requires providing the model with schema context, relationship definitions, and business rules.
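
One way to assemble that context is a small helper that renders tables, relationships, and business rules into a single prompt string. This is a minimal sketch: `build_schema_prompt` is a hypothetical helper, and the relationship and rule shown are assumptions made for illustration, since the post does not state them.

```python
def build_schema_prompt(tables, relationships, rules, question):
    """Assemble schema, relationships, and business rules into one user prompt."""
    lines = ["Schema:"]
    for table, columns in tables.items():
        lines.append(f"Table: {table}")
        lines.extend(f"- {name} ({col_type})" for name, col_type in columns)
    lines.append("")
    lines.append("Relationships:")
    lines.extend(f"- {rel}" for rel in relationships)
    lines.append("")
    lines.append("Business rules:")
    lines.extend(f"- {rule}" for rule in rules)
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


prompt = build_schema_prompt(
    tables={
        "orders": [("id", "int"), ("user_id", "int"), ("total", "decimal"), ("created_at", "timestamp")],
        "users": [("id", "int"), ("region", "varchar"), ("segment", "varchar")],
    },
    # Assumed for illustration: the join key and the rule are not given in the post.
    relationships=["orders.user_id references users.id"],
    rules=["Average order value is AVG(orders.total)"],
    question="What is the average order value by region last quarter?",
)
print(prompt)
```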

Here is an example using Oxlo.ai with JSON mode to enforce structured output:

from openai import OpenAI
import json

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key="YOUR_OXLO_API_KEY"
)

schema_context = """
Table: orders
- id (int)
- user_id (int)
- total (decimal)
- created_at (timestamp)

Table: users
- id (int)
- region (varchar)
- segment (varchar)
"""

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a BI assistant. Convert user questions into SQL. Respond in JSON with keys: sql, explanation, chart_type."},
        {"role": "user", "content": f"Schema:\n{schema_context}\n\nQuestion: What is the average order value by region last quarter?"}
    ],
    response_format={"type": "json_object"},
    stream=False
)

result = json.loads(response.choices[0].message.content)
print(result["sql"])

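JSON mode constrains the reply to syntactically valid JSON, so the frontend can parse it reliably. It typically does not guarantee that the keys named in the system prompt are present, so check for them before using the result. With Oxlo.ai, you can also stream the explanation while sending the structured query through a separate function call.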
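
Before the generated SQL reaches a database, it is worth validating the parsed reply. The sketch below checks for the three keys requested in the system prompt and applies a coarse read-only filter. `parse_bi_response` and `REQUIRED_KEYS` are hypothetical names, and the keyword check is only a first line of defence, assuming the query also runs under a read-only database role.

```python
import json

REQUIRED_KEYS = {"sql", "explanation", "chart_type"}


def parse_bi_response(raw: str) -> dict:
    """Parse the model's JSON reply; reject missing keys or non-read-only SQL."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    sql = str(data["sql"]).strip().rstrip(";").strip()
    # Coarse guard only: a read-only database role is the real safety net.
    if not sql.lower().startswith(("select", "with")):
        raise ValueError("only read-only queries are allowed")
    if ";" in sql:
        raise ValueError("multiple statements are not allowed")
    data["sql"] = sql
    return data


if __name__ == "__main__":
    reply = '{"sql": "SELECT 1;", "explanation": "demo", "chart_type": "table"}'
    print(parse_bi_response(reply)["sql"])
```

In the example above, this would replace the bare `json.loads(response.choices[0].message.content)` call.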

Multi-Step Reasoning and Data Validation

Complex BI questions often require breaking a question into subqueries, executing them, and synthesizing results. This is where reasoning models excel. DeepSeek R1 671B MoE and Kimi K2.6 are designed for advanced chain-of-thought reasoning and agentic coding, making them suitable for tasks that require join detection, aggregation validation, or anomaly explanation.

You can implement a ReAct-style loop where the LLM generates a thought, calls a tool to execute SQL, observes the result, and decides whether to refine the query or present the answer. Because Oxlo.ai uses request-based pricing, the cost of such a loop depends on the number of model calls rather than on token volume, so long system prompts and verbose reasoning within each step do not inflate the bill. For BI workloads that involve large schemas and lengthy system prompts, this can be significantly more predictable than token-based billing. See Oxlo.ai's pricing details at https://oxlo.ai/pricing.
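
The loop can be sketched as follows. To keep the example runnable offline, the model is replaced by a scripted stand-in (`fake_llm`) and the warehouse by an in-memory SQLite table; the JSON step format with `thought`, `action`, and `input` keys is an assumption made for this sketch, not a format defined by Oxlo.ai.

```python
import json
import sqlite3

PREFIX = "Observation: "


def run_sql(conn, sql):
    """Tool: execute SQL and return rows, or an error string the model can react to."""
    try:
        return {"rows": conn.execute(sql).fetchall()}
    except sqlite3.Error as exc:
        return {"error": str(exc)}


def react_loop(ask_llm, conn, question, max_steps=5):
    """Alternate model turns and SQL tool calls until the model returns an answer."""
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        step = json.loads(ask_llm(history))  # {"thought", "action", "input"}
        history.append({"role": "assistant", "content": json.dumps(step)})
        if step["action"] == "final_answer":
            return step["input"]
        observation = run_sql(conn, step["input"])
        history.append({"role": "user", "content": PREFIX + json.dumps(observation)})
    raise RuntimeError("no answer within max_steps")


def fake_llm(history):
    """Scripted stand-in for a chat completion call: guess, retry on error, answer."""
    last = history[-1]["content"]
    if not last.startswith(PREFIX):
        return json.dumps({"thought": "try a query", "action": "sql",
                           "input": "SELECT AVG(amount) FROM orders"})
    observation = json.loads(last[len(PREFIX):])
    if "error" in observation:
        return json.dumps({"thought": "wrong column, retry", "action": "sql",
                           "input": "SELECT AVG(total) FROM orders"})
    value = observation["rows"][0][0]
    return json.dumps({"thought": "done", "action": "final_answer",
                       "input": f"Average order value is {value}"})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 30.0)])
answer = react_loop(fake_llm, conn, "What is the average order value?")
print(answer)
```

In a real agent, `ask_llm` would wrap `client.chat.completions.create` with JSON mode enabled, and `max_steps` bounds how many requests a single question can consume.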

Handling Large Schemas and Documentation

Enterprise data warehouses often contain hundreds of tables with thousands of columns. Fitting this context into a prompt requires a model with a large context window. DeepSeek V4 Flash supports up to 1M tokens, allowing you to include comprehensive schema documentation, data dictionaries, and even sample data within the prompt.

Because Oxlo.ai charges per request rather than per token, sending a full schema dump in every prompt does not increase the inference cost. This encourages more accurate, context-rich queries instead of forcing developers to aggressively compress or truncate schema metadata to save on token expenses.

Streaming Narratives and Real-Time Dashboards

Beyond SQL generation, LLMs can generate executive summaries, anomaly narratives, and trend explanations. Streaming these responses improves perceived latency in the UI.

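# Assumes `client` and `json` from the earlier example.
# Placeholder: replace with the rows returned by the generated SQL.
sales_data = [{"region": "example", "average_order_value": 0}]

stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "Summarize the following sales data for an executive audience."},
        {"role": "user", "content": json.dumps(sales_data)}
    ],
    stream=True
)

for chunk in stream:
    # Some chunks carry no choices or an empty delta, so guard both.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)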

Oxlo.ai supports streaming across its LLM catalog with no cold starts on popular models, so BI dashboards can remain responsive even under variable load.

Vision-Enabled Reporting

Some BI workflows involve interpreting charts, screenshots of legacy reports, or scanned financial documents. Models like Kimi K2.6 and Gemma 3 27B support vision inputs, allowing your tool to accept an image of a chart and generate a structured analysis or updated SQL to reproduce it.

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Recreate the chart in this dashboard image as a SQL query against our warehouse."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}}
        ]}
    ]
)
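
Screenshots and scanned reports are often local files rather than public URLs. The OpenAI chat format accepts images as base64 data URLs in the same `image_url` field; whether a given Oxlo.ai model accepts them is an assumption to verify. `image_to_data_url` and `image_message` are hypothetical helpers.

```python
import base64
import mimetypes


def image_to_data_url(path: str) -> str:
    """Read a local image and return it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(prompt: str, path: str) -> dict:
    """Build a user message in the same shape as the request above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

The returned dictionary can be passed as the single entry of `messages` in the request above.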

Schema Retrieval with Embeddings

When schemas are too large for even long-context windows, use embeddings to retrieve relevant tables. Oxlo.ai provides embedding models like BGE-Large and E5-Large through the same OpenAI-compatible API.

embedding = client.embeddings.create(
    model="bge-large",
    input="monthly recurring revenue by cohort"
)

# Use embedding to fetch relevant tables from a vector store,
# then inject only those tables into the LLM prompt.

This retrieval-augmented approach keeps prompts focused while maintaining accuracy.
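
The retrieval step itself can be as simple as a cosine-similarity ranking over pre-computed table embeddings. The sketch below uses toy three-dimensional vectors and hypothetical table names in place of real embeddings and a vector store; with the SDK, the query vector would come from `embedding.data[0].embedding`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_tables(query_vec, table_vecs, k=2):
    """Return the k table names whose description vectors are closest to the query."""
    ranked = sorted(table_vecs, key=lambda name: cosine(query_vec, table_vecs[name]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional vectors standing in for real embeddings.
table_vecs = {
    "subscriptions": [0.9, 0.1, 0.0],
    "users": [0.6, 0.4, 0.1],
    "support_tickets": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]
print(top_tables(query_vec, table_vecs))
```

Only the schema definitions for the returned tables are then included in the prompt.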

Why Request-Based Pricing Fits BI Workloads

BI tools are inherently agentic and context-heavy. A single user question might trigger schema retrieval, multi-step SQL generation, result interpretation, and narrative generation. Under token-based pricing, costs scale with schema length, conversation history, and reasoning verbosity.

Oxlo.ai uses flat per-request pricing. This means a request containing a 100,000-token schema costs the same as a short greeting. For teams building BI agents, this removes the penalty for rich context and makes budgeting straightforward. The platform also offers a free tier with 60 requests per day, which is sufficient for prototyping, and paid plans that scale to thousands of requests daily. Details are available at https://oxlo.ai/pricing.
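
The difference between the two billing models comes down to simple arithmetic. The sketch below compares them for one user question that triggers four model calls. Every number in it is a made-up placeholder, not a real price from Oxlo.ai or any other provider; substitute figures from the relevant pricing pages.

```python
def token_based_cost(requests, tokens_per_request, price_per_million_tokens):
    """Cost when billing scales with the number of tokens processed."""
    return requests * tokens_per_request * price_per_million_tokens / 1_000_000


def request_based_cost(requests, price_per_request):
    """Cost when every request is billed the same flat amount."""
    return requests * price_per_request


# All figures below are made-up placeholders, not real prices.
requests_per_question = 4      # retrieval, SQL generation, interpretation, narrative
tokens_per_request = 100_000   # schema-heavy prompt, as in the example above

per_token = token_based_cost(requests_per_question, tokens_per_request, price_per_million_tokens=1.0)
per_request = request_based_cost(requests_per_question, price_per_request=0.01)
print(f"token-based: {per_token:.2f}, request-based: {per_request:.2f}")
```

Under request-based billing the schema size drops out of the formula entirely, while the number of steps in the agent loop still matters. With short prompts the comparison can come out the other way, so it is worth running the numbers for your own workload.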

Putting It Together

Building a BI tool with an LLM requires more than a chat interface. It demands reliable structured outputs, long-context support for schemas, multi-step reasoning for complex queries, and predictable costs that do not punish rich context. Oxlo.ai provides the model variety, OpenAI SDK compatibility, and request-based pricing structure that align with these requirements. Whether you are prototyping a natural language SQL interface or deploying an enterprise analytics agent, Oxlo.ai offers the inference backend to support it.
