Introduction
Building dynamic reports often requires writing SQL queries, which can be inefficient and difficult to scale in a production environment. To streamline this process, I built a solution using Large Language Models (LLMs) that generates reports from natural language queries.
I implemented this in Ruby on Rails 💎, my preferred language, but the approach can be applied in any language like Python 🐍, Java ☕, or JavaScript 📜.
The Problem
Users need to retrieve insights from databases without writing SQL manually. Such a system should:
- Understand user intent 🧠 – Identify the relevant database table.
- Generate the appropriate SQL query 📝 – Convert natural language queries into executable SQL.
- Fetch and return results in a structured format 📊 – Output the data in an easy-to-consume format.
This approach ensures a seamless experience for non-technical users while leveraging the power of LLMs to dynamically generate reports.
The Solution
I built a modular system that consists of three primary components:
1. Llm::Chat - The OpenAI Chat API Wrapper
This class handles communication with OpenAI's Chat API.
require 'net/http'
require 'json'

class Llm::Chat
  OPENAI_API_URL = "https://api.openai.com/v1/chat/completions"
  API_KEY = ENV['OPENAI_API_KEY']

  def initialize(payload:)
    @payload = payload
  end

  def call
    response = request_openai
    parse_response(response)
  end

  private

  def request_openai
    uri = URI(OPENAI_API_URL)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    request = Net::HTTP::Post.new(uri.path, headers)
    request.body = @payload.to_json
    http.request(request)
  end

  def headers
    {
      "Content-Type" => "application/json",
      "Authorization" => "Bearer #{API_KEY}"
    }
  end

  def parse_response(response)
    # dig returns nil instead of raising if any key in the chain is missing
    JSON.parse(response.body).dig("choices", 0, "message", "content")&.strip
  rescue JSON::ParserError
    nil
  end
end
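As a quick sanity check, the extraction logic can be exercised against a canned response body rather than a live API call. Using Hash#dig keeps it nil-safe when any key in the chain is missing (the hash contents below are illustrative, not a real API response):

```ruby
require 'json'

# A canned body shaped like an OpenAI chat completion response
body = {
  "choices" => [
    { "message" => { "role" => "assistant", "content" => "  users  " } }
  ]
}.to_json

content = JSON.parse(body).dig("choices", 0, "message", "content")&.strip
# content => "users"

# With an empty body, dig short-circuits to nil instead of raising
missing = JSON.parse("{}").dig("choices", 0, "message", "content")&.strip
# missing => nil
```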
2. Identifying the Relevant Table
The first step is to determine the database table that best matches the user's query. This is handled by TableIdentifier.
class TableIdentifier
  def initialize(query:)
    @query = query
  end

  def call
    chat
  end

  private

  def chat
    Llm::Chat.new(payload: chat_payload).call
  end

  def chat_payload
    {
      model: "gpt-4",
      messages: [
        { role: "system", content: "Given a user query, determine the most relevant table or tables from [users, departments, tickets]. If the query involves multiple tables (e.g., grouping users by department), return a comma-separated list of table names. Only return the table name(s) with no extra text." },
        { role: "user", content: @query }
      ],
      max_tokens: 100
    }
  end
end
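Note that the system prompt may return either a single name like "users" or a comma-separated list like "users,departments" for multi-table queries. Downstream code needs to normalize that reply; here is a small helper sketch for doing so (parse_table_names is my own addition, not part of the classes above):

```ruby
# Normalize the LLM's reply ("users" or "users, departments")
# into an array of table names, tolerating nil and stray whitespace.
def parse_table_names(llm_reply)
  llm_reply.to_s.split(",").map(&:strip).reject(&:empty?)
end

parse_table_names("users, departments") # => ["users", "departments"]
parse_table_names("tickets")            # => ["tickets"]
parse_table_names(nil)                  # => []
```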
Example Table Structures
For reference, here are example table structures:
Users Table 👥
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  email VARCHAR(255),
  status VARCHAR(50),
  department_id INT,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
Departments Table 🏢
CREATE TABLE departments (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  manager_id INT,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
Tickets Table 🎫
CREATE TABLE tickets (
  id INT PRIMARY KEY,
  user_id INT,
  subject VARCHAR(255),
  status VARCHAR(50),
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
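Because the table name returned by the LLM is later interpolated into a raw SQL string (SHOW CREATE TABLE), it is worth validating it against the known tables before use. This allowlist helper is my own sketch, not part of the original classes:

```ruby
# Allowlist of tables the system is permitted to touch
KNOWN_TABLES = %w[users departments tickets].freeze

# Reject any LLM reply that is not an exact known table name,
# so it can never smuggle extra SQL into an interpolated string.
def safe_table_name(name)
  raise ArgumentError, "unknown table: #{name}" unless KNOWN_TABLES.include?(name)
  name
end

safe_table_name("users") # => "users"
# safe_table_name("users; DROP TABLE users") # raises ArgumentError
```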
3. Generating and Executing the SQL Query
Once the table is identified, the next step is generating a valid SQL query. This is done using ReportGenerator.
class ReportGenerator
  require "#{Rails.root}/lib/llm"

  def initialize(query:)
    @query = query
  end

  def call
    report
  end

  private

  def report
    [
      { type: "text", data: "Here is your report" },
      { type: "table", data: ActiveRecord::Base.connection.select_all(query).to_a }
    ]
  end

  # MySQL-specific; assumes TableIdentifier returned a single table name
  def table_structure
    ActiveRecord::Base.connection.execute("SHOW CREATE TABLE #{table_name}").first[1]
  end

  def table_name
    TableIdentifier.new(query: @query).call
  end

  def query
    Llm::Chat.new(payload: query_payload).call
  end

  def query_payload
    {
      model: "gpt-4",
      messages: [
        { role: "system", content: "Generate a MySQL query based on the table structure: #{table_structure}. Support queries involving multiple tables where applicable (e.g., grouping users by department). Only return the SQL query as plain text with no formatting, explanations, or markdown." },
        { role: "user", content: @query }
      ],
      max_tokens: 1000
    }
  end
end
Example Usage 🛠️
With this setup, generating a report is as simple as making a method call:
ReportGenerator.new(query: "count of inactive users").call
ReportGenerator.new(query: "list of active users").call
ReportGenerator.new(query: "number of users per department").call
For a query like "count of inactive users", the system will generate:
SELECT COUNT(*) FROM users WHERE status = 'inactive';
For a query like "number of users per department", the system will generate:
SELECT d.name, COUNT(u.id)
FROM users u
JOIN departments d ON u.department_id = d.id
GROUP BY d.name;
Disclaimer ⚠️
The prompts may require tweaking through trial and error to achieve good results, and you will likely need to adapt them to your database schema and specific reporting requirements. The generated SQL should also be reviewed or validated before it is executed against production data.
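One cheap safeguard is a read-only gate in front of the database call. The helper below is a minimal sketch of my own (not part of the classes above): it accepts a single SELECT or WITH statement and rejects everything else before the generated SQL reaches the connection:

```ruby
# Returns true only for a single read-only statement.
def safe_select?(sql)
  stripped = sql.to_s.strip
  # Must start with a read-only entry point
  return false unless stripped.match?(/\A(SELECT|WITH)\b/i)
  # Reject multi-statement strings (a trailing semicolon alone is fine)
  return false if stripped.sub(/;\s*\z/, "").include?(";")
  true
end

safe_select?("SELECT COUNT(*) FROM users;") # => true
safe_select?("DROP TABLE users")            # => false
safe_select?("SELECT 1; DROP TABLE users")  # => false
```

This is string-level screening only, not a parser; pairing it with a read-only database user is the sturdier design choice.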
Benefits of This Approach 🚀
- No manual SQL writing ✅ – Users can generate reports using natural language.
- Adaptability 🔄 – The model can be fine-tuned to support new tables or complex queries.
- Security 🔒 – Prompts constrain generation to a known set of tables, though generated SQL should still be validated before execution.
- Scalability 📈 – Works across multiple datasets without custom development for each request.
Conclusion 🎯
By leveraging LLMs, we can automate the process of translating user intent into SQL queries, making data retrieval seamless and efficient. This approach eliminates the need for manual query writing while ensuring accuracy and adaptability.
Would you implement a similar solution in your application? Let me know your thoughts!