angu10

Posted on Oct 9

Building Self-Correcting Database Agents with Meta's Llama-4-Scout: From Natural Language to SQL Insights

#llm #ai #agenticai #python

Introduction

The ability to query databases using natural language represents one of the most practical applications of large language models in enterprise settings. However, most text-to-SQL implementations suffer from a critical flaw: they generate queries once and hope for the best. When a query fails—due to incorrect table names, misunderstood schema relationships, or logical errors—the system simply returns an error message, leaving users frustrated.

In this technical guide, we'll build a sophisticated database agent using Meta's Llama-4-Scout that doesn't just generate SQL queries—it thinks through the problem, validates its approach, and most importantly, learns from its mistakes to self-correct. This agent implements a five-phase cognitive framework: Understand → Plan → Generate → Validate → Execute, with intelligent retry logic when queries fail.

Why Database Agents Matter

Business analysts, product managers, and non-technical stakeholders frequently need data insights but lack SQL expertise. Traditional BI tools require pre-built dashboards, creating bottlenecks when ad-hoc questions arise. A well-designed database agent democratizes data access while maintaining query safety and accuracy.

Understanding Llama-4-Scout for Structured Data Tasks

Llama-4-Scout excels at structured reasoning tasks for several reasons:

Schema comprehension: Understands relational database concepts including foreign keys, joins, and aggregations
Error diagnosis: Can parse SQL error messages and identify root causes
Iterative refinement: Maintains context across retry attempts to avoid repeating mistakes
Safety awareness: Can be prompted to avoid destructive operations

Implementation Architecture

Our implementation consists of four layers:

Database Layer: SQLite database with realistic employee/department data
Agent Cognitive Layer: Five-phase reasoning pipeline
Safety & Validation Layer: SQL injection prevention and query validation
User Interface Layer: Streamlit application for interaction

Let's build this step by step.

Step 1: Environment Setup and Dependencies

`python

Required packages

pip install streamlit together sqlite3 pandas

import streamlit as st
import sqlite3
import pandas as pd
from together import Together
import re
from datetime import datetime
import json
`

Step 2: Database Schema and Sample Data Creation

First, we'll create a realistic database with three interconnected tables representing a company's organizational structure:

`python
def create_sample_database():
"""Create a sample SQLite database with employee, department, and project tables"""
conn = sqlite3.connect('company.db')
cursor = conn.cursor()

# Drop existing tables if they exist
cursor.execute('DROP TABLE IF EXISTS projects')
cursor.execute('DROP TABLE IF EXISTS employees')
cursor.execute('DROP TABLE IF EXISTS departments')

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    department_id INTEGER PRIMARY KEY,
    department_name TEXT NOT NULL,
    budget DECIMAL(12, 2),
    location TEXT
)
''')

# Create employees table with self-referential manager relationship
cursor.execute('''
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT UNIQUE,
    hire_date DATE,
    salary DECIMAL(10, 2),
    department_id INTEGER,
    manager_id INTEGER,
    FOREIGN KEY (department_id) REFERENCES departments(department_id),
    FOREIGN KEY (manager_id) REFERENCES employees(employee_id)
)
''')

# Create projects table
cursor.execute('''
CREATE TABLE projects (
    project_id INTEGER PRIMARY KEY,
    project_name TEXT NOT NULL,
    department_id INTEGER,
    start_date DATE,
    end_date DATE,
    status TEXT,
    budget DECIMAL(10, 2),
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
)
''')

# Insert sample departments
departments = [
    (1, 'Engineering', 2500000.00, 'San Francisco'),
    (2, 'Sales', 1500000.00, 'New York'),
    (3, 'Marketing', 800000.00, 'Austin'),
    (4, 'Human Resources', 600000.00, 'San Francisco'),
    (5, 'Finance', 700000.00, 'New York')
]
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?, ?)', departments)

# Insert sample employees (managers first, then their reports)
employees = [
    # Managers (no manager_id)
    (1, 'Sarah', 'Johnson', 'sarah.j@company.com', '2018-01-15', 150000, 1, None),
    (2, 'Michael', 'Chen', 'michael.c@company.com', '2017-03-20', 145000, 2, None),
    (3, 'Emily', 'Rodriguez', 'emily.r@company.com', '2019-06-10', 125000, 3, None),
    (4, 'David', 'Kim', 'david.k@company.com', '2018-09-05', 130000, 4, None),
    (5, 'Lisa', 'Patel', 'lisa.p@company.com', '2017-11-12', 140000, 5, None),

    # Engineering team
    (6, 'James', 'Wilson', 'james.w@company.com', '2020-02-14', 115000, 1, 1),
    (7, 'Anna', 'Martinez', 'anna.m@company.com', '2021-05-18', 95000, 1, 1),
    (8, 'Robert', 'Taylor', 'robert.t@company.com', '2019-08-22', 120000, 1, 1),
    (9, 'Jennifer', 'Anderson', 'jennifer.a@company.com', '2022-01-10', 85000, 1, 6),
    (10, 'William', 'Thomas', 'william.t@company.com', '2021-11-30', 90000, 1, 6),

    # Sales team
    (11, 'Jessica', 'Moore', 'jessica.m@company.com', '2020-04-05', 95000, 2, 2),
    (12, 'Christopher', 'Jackson', 'chris.j@company.com', '2021-07-19', 88000, 2, 2),
    (13, 'Amanda', 'White', 'amanda.w@company.com', '2019-12-01', 105000, 2, 2),
    (14, 'Daniel', 'Harris', 'daniel.h@company.com', '2022-03-15', 75000, 2, 11),

    # Marketing team
    (15, 'Michelle', 'Martin', 'michelle.m@company.com', '2020-09-08', 82000, 3, 3),
    (16, 'Kevin', 'Thompson', 'kevin.t@company.com', '2021-10-20', 78000, 3, 3),
    (17, 'Rachel', 'Garcia', 'rachel.g@company.com', '2022-02-28', 72000, 3, 15),

    # HR team
    (18, 'Brian', 'Martinez', 'brian.m@company.com', '2019-05-14', 85000, 4, 4),
    (19, 'Nicole', 'Robinson', 'nicole.r@company.com', '2021-08-03', 75000, 4, 4),

    # Finance team
    (20, 'Andrew', 'Clark', 'andrew.c@company.com', '2018-07-22', 95000, 5, 5),
    (21, 'Stephanie', 'Lewis', 'stephanie.l@company.com', '2020-11-11', 88000, 5, 5),
    (22, 'Joshua', 'Lee', 'joshua.l@company.com', '2022-04-06', 80000, 5, 20)
]
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?, ?, ?)', employees)

# Insert sample projects
projects = [
    (1, 'Cloud Migration', 1, '2024-01-01', '2024-12-31', 'In Progress', 500000),
    (2, 'Mobile App Redesign', 1, '2024-03-15', '2024-09-30', 'In Progress', 300000),
    (3, 'Q4 Sales Campaign', 2, '2024-10-01', '2024-12-31', 'Planning', 200000),
    (4, 'Enterprise Sales Initiative', 2, '2024-02-01', '2024-11-30', 'In Progress', 150000),
    (5, 'Brand Refresh', 3, '2024-05-01', '2024-08-31', 'Completed', 180000),
    (6, 'Social Media Strategy', 3, '2024-07-01', '2025-06-30', 'In Progress', 120000),
    (7, 'HR System Upgrade', 4, '2024-04-01', '2024-10-31', 'In Progress', 100000),
    (8, 'Financial Audit 2024', 5, '2024-01-01', '2024-03-31', 'Completed', 80000)
]
cursor.executemany('INSERT INTO projects VALUES (?, ?, ?, ?, ?, ?, ?)', projects)

conn.commit()
conn.close()

return "Database created successfully!"

def get_database_schema():
"""Extract and format database schema for the agent"""
conn = sqlite3.connect('company.db')
cursor = conn.cursor()

schema_info = []

# Get all tables
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()

for table in tables:
    table_name = table[0]
    schema_info.append(f"\n--- Table: {table_name} ---")

    # Get column information
    cursor.execute(f"PRAGMA table_info({table_name})")
    columns = cursor.fetchall()

    for col in columns:
        col_name = col[1]
        col_type = col[2]
        is_pk = " (PRIMARY KEY)" if col[5] else ""
        schema_info.append(f"  - {col_name}: {col_type}{is_pk}")

    # Get foreign key information
    cursor.execute(f"PRAGMA foreign_key_list({table_name})")
    fks = cursor.fetchall()

    if fks:
        schema_info.append("  Foreign Keys:")
        for fk in fks:
            schema_info.append(f"    - {fk[3]} -> {fk[2]}({fk[4]})")

conn.close()
return "\n".join(schema_info)`

Step 3: The Five-Phase Agent Cognitive Framework

Now we implement the core intelligence of our agent. Each phase has a specific responsibility in the query generation pipeline:

`
python

Initialize Together AI client

together_client = Together(api_key="your_api_key_here")

def phase_1_understand(question, schema):
"""
PHASE 1: UNDERSTAND
Analyze the natural language question and understand what data is needed
"""
instruction = f"""
You are a database expert analyzing a natural language question.

DATABASE SCHEMA:
{schema}

USER QUESTION: "{question}"

TASK: Analyze this question to understand:

What specific information is the user requesting?
Which tables contain this information?
What columns are relevant?
Are any calculations, aggregations, or groupings needed?
Are there any ambiguities in the question?

Provide a clear analysis of what data needs to be retrieved.
"""

try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=400,
        temperature=0.2
    )
    return response.choices[0].message.content
except Exception as e:
    return f"Error in understanding phase: {str(e)}"

def phase_2_plan(question, understanding, schema, previous_error=None):
"""
PHASE 2: PLAN
Create a detailed plan for constructing the SQL query
"""
error_context = ""
if previous_error:
error_context = f"\n\nPREVIOUS ATTEMPT FAILED WITH ERROR:\n{previous_error}\n\nAdjust your plan to avoid this error."

instruction = f"""

You are a database expert creating a query plan.

DATABASE SCHEMA:
{schema}

USER QUESTION: "{question}"

YOUR UNDERSTANDING: {understanding}
{error_context}

TASK: Create a detailed SQL query plan:

Which tables need to be queried?
What JOIN operations are required (if any)?
What WHERE conditions are needed?
Are GROUP BY or aggregate functions needed?
Should results be sorted or limited?
What specific columns should be selected?

Provide a step-by-step plan in numbered format.
"""

try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=500,
        temperature=0.2
    )
    return response.choices[0].message.content
except Exception as e:
    return f"Error in planning phase: {str(e)}"

def phase_3_generate(question, understanding, plan, schema, previous_error=None, previous_sql=None):
"""
PHASE 3: GENERATE
Generate the actual SQL query based on the plan
"""
error_context = ""
if previous_error and previous_sql:
error_context = f"""
\n\nPREVIOUS SQL ATTEMPT:
{previous_sql}

ERROR RECEIVED:
{previous_error}

IMPORTANT: Fix the specific error above. Common issues:

Incorrect table or column names (check schema carefully)
Missing JOIN conditions
Syntax errors
Incorrect aggregate functions
"""

instruction = f"""
You are a SQL expert generating a query.

DATABASE SCHEMA:
{schema}

USER QUESTION: "{question}"

YOUR UNDERSTANDING: {understanding}

YOUR PLAN: {plan}
{error_context}

TASK: Generate a valid SQLite SQL query that:

Follows the plan exactly
Uses correct table and column names from the schema
Includes proper JOIN syntax if needed
Has correct WHERE, GROUP BY, and ORDER BY clauses
Is properly formatted and includes comments

CRITICAL RULES:

Only use tables and columns that exist in the schema above
For SQLite, use proper syntax (e.g., no TOP keyword, use LIMIT instead)
Always use table aliases for clarity in JOINs
Double-check column names match the schema exactly

Respond with ONLY the SQL query, nothing else. Do not include markdown code blocks or explanations.
"""

try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=600,
        temperature=0.1
    )
    sql_query = response.choices[0].message.content.strip()

    # Clean up the SQL (remove markdown code blocks if present)
    sql_query = re.sub(r'^```sql\s*', '', sql_query)
    sql_query = re.sub(r'^```\s*', '', sql_query)
    sql_query = re.sub(r'\s*```$', '', sql_query)

    return sql_query.strip()
except Exception as e:
    return f"Error in generation phase: {str(e)}"

def phase_4_validate(sql_query, schema):
"""
PHASE 4: VALIDATE
Validate the SQL query for safety and correctness before execution
"""
validation_results = {
"is_safe": True,
"is_valid": True,
"warnings": [],
"errors": []
}

## Check for dangerous operations
dangerous_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE', 'GRANT', 'REVOKE']
sql_upper = sql_query.upper()

for keyword in dangerous_keywords:
    if keyword in sql_upper:
        validation_results["is_safe"] = False
        validation_results["errors"].append(f"Dangerous operation detected: {keyword}")

# Check for SELECT statement
if not sql_upper.strip().startswith('SELECT'):
    validation_results["is_safe"] = False
    validation_results["errors"].append("Only SELECT queries are allowed")

# Check for semicolon (potential SQL injection)
if sql_query.count(';') > 1:
    validation_results["warnings"].append("Multiple statements detected - potential security risk")

# Basic syntax check
if 'FROM' not in sql_upper:
    validation_results["is_valid"] = False
    validation_results["errors"].append("Invalid SQL: Missing FROM clause")

return validation_results

def phase_5_execute(sql_query):
"""
PHASE 5: EXECUTE
Execute the SQL query and return results
"""
try:
conn = sqlite3.connect('company.db')
df = pd.read_sql_query(sql_query, conn)
conn.close()
return df, None
except Exception as e:
return None, str(e)
`

Step 4: Self-Correction and Retry Logic

This is where our agent becomes truly intelligent. When a query fails, the agent analyzes the error and attempts to fix it:

`
python
def diagnose_error(sql_query, error_message, schema):
"""
Use the LLM to diagnose what went wrong and provide guidance
"""
instruction = f"""
You are a SQL debugging expert.

DATABASE SCHEMA:
{schema}

FAILED SQL QUERY:
{sql_query}

ERROR MESSAGE:
{error_message}

TASK: Diagnose the specific problem:

What caused this error?
Which part of the query is incorrect?
What should be changed to fix it?

Be specific and concise. Focus on the exact fix needed.
"""

try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=300,
        temperature=0.2
    )
    return response.choices[0].message.content
except Exception as e:
    return f"Error in diagnosis: {str(e)}"

def execute_query_with_retry(question, schema, max_attempts=3):
"""
Complete pipeline with retry logic
"""
attempt_history = []

for attempt in range(max_attempts):
    attempt_info = {
        "attempt_number": attempt + 1,
        "understanding": "",
        "plan": "",
        "sql": "",
        "validation": {},
        "result": None,
        "error": None,
        "diagnosis": ""
    }

    # Get previous error context if this is a retry
    previous_error = attempt_history[-1]["error"] if attempt_history else None
    previous_sql = attempt_history[-1]["sql"] if attempt_history else None

    # Phase 1: Understand
    understanding = phase_1_understand(question, schema)
    attempt_info["understanding"] = understanding

    # Phase 2: Plan (with error context if retrying)
    plan = phase_2_plan(question, understanding, schema, previous_error)
    attempt_info["plan"] = plan

    # Phase 3: Generate SQL (with error context if retrying)
    sql_query = phase_3_generate(question, understanding, plan, schema, previous_error, previous_sql)
    attempt_info["sql"] = sql_query

    # Phase 4: Validate
    validation = phase_4_validate(sql_query, schema)
    attempt_info["validation"] = validation

    if not validation["is_safe"] or not validation["is_valid"]:
        attempt_info["error"] = "Query failed validation: " + ", ".join(validation["errors"])
        attempt_history.append(attempt_info)
        continue

    # Phase 5: Execute
    result, error = phase_5_execute(sql_query)
    attempt_info["result"] = result
    attempt_info["error"] = error

    if error:
        # Diagnose the error
        diagnosis = diagnose_error(sql_query, error, schema)
        attempt_info["diagnosis"] = diagnosis
        attempt_history.append(attempt_info)
        continue

    # Success!
    attempt_history.append(attempt_info)
    return result, sql_query, attempt_history

# All attempts failed
return None, None, attempt_history

def format_natural_language_response(question, df, sql_query):
"""
Generate a natural language explanation of the results
"""
if df is None or df.empty:
return "No results found for your query."

result_preview = df.head(10).to_string()
row_count = len(df)

instruction = f"""

You are a data analyst explaining query results to a business user.

USER QUESTION: "{question}"

SQL QUERY EXECUTED:
{sql_query}

RESULTS ({row_count} rows total):
{result_preview}

TASK: Provide a clear, business-friendly explanation of these results:

Directly answer the user's question
Highlight key insights from the data
Mention the total number of results if relevant
Keep it concise but informative

Do not show the SQL query or technical details. Focus on the business meaning.
"""

try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=400,
        temperature=0.3
    )
    return response.choices[0].message.content
except Exception as e:
    return f"Results retrieved successfully. {row_count} rows returned."

Step 5: Building the Streamlit User Interface

Now let's create an interactive interface that showcases the agent's cognitive process:

`
python
def main():
st.set_page_config(page_title="SQL Agent with Llama-4-Scout", layout="wide")

st.title("🤖 Self-Correcting Database Agent")
st.markdown("*Powered by Meta's Llama-4-Scout*")

# Initialize session state
if 'db_initialized' not in st.session_state:
    st.session_state.db_initialized = False
if 'query_history' not in st.session_state:
    st.session_state.query_history = []

# Sidebar for database setup
with st.sidebar:
    st.header("⚙️ Database Setup")

    if not st.session_state.db_initialized:
        if st.button("Initialize Sample Database", type="primary"):
            with st.spinner("Creating database..."):
                result = create_sample_database()
                st.session_state.db_initialized = True
                st.success(result)
                st.rerun()
    else:
        st.success("✅ Database ready")

        if st.button("Reset Database"):
            create_sample_database()
            st.session_state.query_history = []
            st.success("Database reset!")
            st.rerun()

    st.markdown("---")
    st.header("📊 Database Schema")

    if st.session_state.db_initialized:
        with st.expander("View Schema", expanded=False):
            schema = get_database_schema()
            st.code(schema, language="text")

    st.markdown("---")
    st.header("💡 Example Questions")
    st.markdown("""
    **Simple:**
    - How many employees do we have?
    - Show all departments

    **Medium:**
    - What's the average salary by department?
    - Who are the managers?

    **Complex:**
    - Which employees earn more than their manager?
    - Show departments with highest average salaries

    **Advanced:**
    - Compare project budgets to department budgets
    - Find managers with most direct reports
    """)

# Main content area
if not st.session_state.db_initialized:
    st.info("👈 Please initialize the database using the sidebar to get started.")
    return

st.header("💬 Ask Questions About Your Data")

# Query input
col1, col2 = st.columns([3, 1])

with col1:
    user_question = st.text_input(
        "Your Question:",
        placeholder="e.g., Who are the top 5 highest paid employees?",
        key="question_input"
    )

with col2:
    st.write("")  # Spacing
    st.write("")  # Spacing
    show_thinking = st.checkbox("Show reasoning", value=True)

if st.button("🔍 Get Answer", type="primary", use_container_width=True):
    if not user_question:
        st.warning("Please enter a question.")
        return

    # Get schema
    schema = get_database_schema()

    # Execute query with retry logic
    with st.spinner("🧠 Agent is thinking..."):
        result_df, final_sql, attempt_history = execute_query_with_retry(
            user_question, 
            schema, 
            max_attempts=3
        )

    # Display results
    if result_df is not None:
        st.success("✅ Query successful!")

        # Show the agent's thinking process
        if show_thinking:
            with st.expander("🧠 Agent's Cognitive Process", expanded=True):
                for i, attempt in enumerate(attempt_history):
                    if len(attempt_history) > 1:
                        if i < len(attempt_history) - 1:
                            st.error(f"❌ Attempt {attempt['attempt_number']} (Failed)")
                        else:
                            st.success(f"✅ Attempt {attempt['attempt_number']} (Success)")

                    col1, col2 = st.columns(2)

                    with col1:
                        st.markdown("**1️⃣ UNDERSTAND**")
                        st.info(attempt['understanding'])

                    with col2:
                        st.markdown("**2️⃣ PLAN**")
                        st.info(attempt['plan'])

                    st.markdown("**3️⃣ GENERATE SQL**")
                    st.code(attempt['sql'], language="sql")

                    if attempt['validation']['warnings']:
                        st.warning("Validation Warnings: " + ", ".join(attempt['validation']['warnings']))

                    if attempt['error']:
                        st.error(f"**Error:** {attempt['error']}")

                        if attempt['diagnosis']:
                            st.markdown("**🔍 Diagnosis:**")
                            st.warning(attempt['diagnosis'])

                    if i < len(attempt_history) - 1:
                        st.markdown("---")
                        st.markdown("↻ **Retrying with corrections...**")
                        st.markdown("---")

        # Display natural language response
        st.markdown("### 📝 Answer")
        with st.spinner("Generating explanation..."):
            explanation = format_natural_language_response(user_question, result_df, final_sql)
        st.markdown(explanation)

        # Display results table
        st.markdown("### 📊 Data")
        st.dataframe(result_df, use_container_width=True)

        # Show final SQL
        with st.expander("View SQL Query"):
            st.code(final_sql, language="sql")

        # Add to history
        st.session_state.query_history.append({
            "question": user_question,
            "sql": final_sql,
            "rows": len(result_df),
            "attempts": len(attempt_history),
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        })

    else:
        st.error("❌ Query failed after all retry attempts")

        if show_thinking:
            with st.expander("🧠 All Attempts", expanded=True):
                for i, attempt in enumerate(attempt_history):
                    st.error(f"**Attempt {attempt['attempt_number']}**")
                    st.code(attempt['sql'], language="sql")
                    st.error(f"Error: {attempt['error']}")
                    if attempt['diagnosis']:
                        st.warning(f"Diagnosis: {attempt['diagnosis']}")
                    st.markdown("---")

# Query history
if st.session_state.query_history:
    st.markdown("---")
    st.header("📜 Query History")

    history_df = pd.DataFrame(st.session_state.query_history)
    st.dataframe(history_df, use_container_width=True)

if name == "main":
main()
`

Technical Implementation Insights

1. The Power of Multi-Phase Reasoning

The five-phase approach significantly improves query accuracy compared to single-shot generation. Here's why:

Phase Separation Benefits:

Understand: Forces the model to explicitly identify required data before coding
Plan: Creates a logical roadmap, reducing the chance of structural errors
Generate: Focuses solely on syntax, with context from previous phases
Validate: Catches dangerous operations before execution
Execute: Provides real-world feedback for learning

This separation of concerns mirrors how human SQL experts approach complex queries—they don't immediately write code; they think through the problem first.

2. Self-Correction Through Error Analysis

The retry mechanism is sophisticated because it provides the agent with:

`
python
error_context = f"""
PREVIOUS SQL ATTEMPT:
{previous_sql}

ERROR RECEIVED:
{previous_error}

IMPORTANT: Fix the specific error above.
"""
`

This context allows the model to:

See exactly what it tried before
Understand why it failed
Adjust its approach specifically

Most text-to-SQL systems fail silently. Our agent learns and adapts.

3. Safety Through Validation

The validation phase prevents dangerous operations:

python dangerous_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE']
This is critical for production systems where a malicious or misunderstood prompt could corrupt data. The agent is constrained to read-only operations, making it safe for business users.

4. Schema-Aware Query Generation

By providing the complete schema to the agent at each phase, we eliminate a common failure mode: queries that reference non-existent tables or columns. The schema serves as ground truth that the agent must respect.

Conclusion

Building a self-correcting database agent with Llama-4-Scout demonstrates the power of structured agentic reasoning. Unlike simple prompt-and-hope approaches, our five-phase cognitive framework—Understand → Plan → Generate → Validate → Execute—creates a system that thinks through problems methodically and learns from mistakes.

The key innovations in this implementation are:

Multi-phase reasoning: Each phase has a specific cognitive responsibility
Intelligent retry logic: The agent learns from failures and adapts
Safety-first design: Validation prevents destructive operations
Natural language interface: Business users can access data without SQL knowledge
Transparent reasoning: Users can inspect the agent's thought process

This approach can be extended beyond SQL to other structured tasks: API query generation, configuration file creation, or any domain where precision matters and mistakes have consequences. The think-plan-execute-validate-retry pattern is broadly applicable to agentic systems.

As LLMs continue to improve, we can expect even more sophisticated error recovery, better schema understanding, and more nuanced query optimization. The combination of powerful base models like Llama-4-Scout with well-designed cognitive frameworks opens up new possibilities for making complex technical systems accessible to everyone.

Ready to build your own database agent? The complete code is provided in this article. Start with the sample database, test with the example questions, and then adapt it to your own data infrastructure. The future of data access is conversational, safe, and intelligent.

DEV Community

Building Self-Correcting Database Agents with Meta's Llama-4-Scout: From Natural Language to SQL Insights

Introduction

Why Database Agents Matter

Understanding Llama-4-Scout for Structured Data Tasks

Implementation Architecture

Step 1: Environment Setup and Dependencies

Required packages

pip install streamlit together sqlite3 pandas

Step 2: Database Schema and Sample Data Creation

Step 3: The Five-Phase Agent Cognitive Framework

Initialize Together AI client

Step 4: Self-Correction and Retry Logic

Step 5: Building the Streamlit User Interface

Technical Implementation Insights

1. The Power of Multi-Phase Reasoning

2. Self-Correction Through Error Analysis

3. Safety Through Validation

4. Schema-Aware Query Generation

Conclusion

Top comments (0)