Karl Weinmeister

Originally published at Medium

The Six Failures of Text-to-SQL (And How to Fix Them with Agents)

I’ve written countless SQL queries over the years. Unfortunately, like my golf game, I don’t write SQL enough to be a pro at it. Outside of straightforward SELECT statements, I approach SQL queries iteratively. I’ll inspect the tables, draft a query, and hope for the best. If there are any errors, I’ll go through this loop again.

While AI models are much better than me at SQL, they aren’t perfect. And that loop I described is just as important for automated approaches to be effective. Text-to-SQL is a deceptively difficult problem with challenges including linguistic ambiguity and rare SQL operations.

This is where a multi-agent architecture, built with a framework like Google’s Agent Development Kit (ADK), becomes essential. We can build a “virtual data analyst” by composing a team of specialized agents. A SchemaExtractor can find the right tables, a SqlGenerator can write the draft, and a SqlCorrector can critique and fix it. A SequentialAgent acts as the manager, ensuring the process is followed, every single time.

In this guide, we’ll walk through the six most common failure points for Text-to-SQL and show how to solve each one by building out our team of agents, moving from a simple script to a full-fledged agentic system. We’ll use the sample project kweinmeister/text-to-sql-agent to illustrate these solutions.

Problem 1: Agent Order Issues

Here’s the issue with a single LlmAgent that holds all the tools: it decides the order of operations. It might confidently skip fetching the schema and invent a table name. Or it might try to run a query before validating it. A single LLM is deciding what to do next, and it can (and will) make mistakes. That’s not a reliable process.

Solution: SequentialAgent for Order Control

The ADK gives us “Workflow Agents” for this. These specialized agents don’t use an LLM for flow control. They’re deterministic.

The SequentialAgent is the simplest and most powerful one to start with. It runs its sub-agents in the exact order you list them. Using a sequential agent also separates the concerns of “what to do” (our specialized agents) from “the order to do it in” (the workflow agent).

The SequentialAgent also acts as a guardrail. It turns our best practices (“always get the schema first,” “always validate before running”) into enforced infrastructure, not just suggestions in a prompt.

Code Example: Defining the Workflow Manager

Let’s define our root agent. Instead of a single LlmAgent, our root_agent will be a SequentialAgent. The specialists it wires together are just imports for now; we’ll build each of them out in the next sections:

from google.adk.agents import SequentialAgent

from .agents import (
    schema_extractor_agent,
    sql_correction_loop,
    sql_generator_agent,
)
from .callbacks import capture_user_message

root_agent = SequentialAgent(
    name="TextToSqlRootAgent",
    before_agent_callback=capture_user_message,
    sub_agents=[
        schema_extractor_agent,
        sql_generator_agent,
        sql_correction_loop,
    ],
)

Problem 2: LLM Schema Hallucinations

This is the classic failure mode. The LLM just doesn’t know your schema.

A common but flawed fix is to dump the entire database schema into the prompt. This backfires for two reasons. First, huge enterprise schemas won’t even fit in the context window. Second, even if they did, giving the LLM 100 irrelevant tables to find the 2 relevant ones just drowns it in noise and leads to worse results.

Solution: Dedicated Schema-Retrieval Tool

The answer is dynamic retrieval. Don’t give the agent a static block of schema; give it a tool to fetch schema. This lets the LLM reason about what it needs first, and then request only that specific information.

We can build a simple Python function for this. The ADK makes it easy to turn any function into an agent-callable tool with FunctionTool. The agent automatically figures out how to use it from its docstring, a best practice you’ll see in projects like gabrielpreda/adk-sql-agent.

Code Example: The Schema Tool

💡 In the kweinmeister/text-to-sql-agent project, these functions are not wrapped as tools, since they are called directly by a deterministic agent. They live centrally in a tools.py file so they can easily be exposed as tools to a future LlmAgent.

import logging
from typing import Any

from .config import DB_URI
from .dialects.dialect import DatabaseDialect

logger = logging.getLogger(__name__)

def load_schema_into_state(state: dict[str, Any], dialect: DatabaseDialect) -> None:
    """
    Loads the DDL and SQLGlot schema into the state dictionary.
    This function relies on the caching mechanism within the dialect object.
    """
    logger.info(f"Loading schema for dialect: {dialect.name}")

    db_uri = DB_URI
    # Error handling code omitted

    try:
        logger.info(f"Loading schema from database: {db_uri}")
        # The dialect object handles its own caching.
        # The first call to get_ddl will trigger the DB query and cache the DDL.
        logger.info("Calling dialect.get_ddl...")
        state["schema_ddl"] = dialect.get_ddl(db_uri)
        logger.info("DDL loaded successfully")

        # The call to get_sqlglot_schema will use the cached DDL if available,
        # then parse it and cache the result.
        logger.info("Calling dialect.get_sqlglot_schema...")
        state["sqlglot_schema"] = dialect.get_sqlglot_schema(db_uri)
        logger.info("SQLGlot schema loaded successfully")
        logger.info(f"SQLGlot schema keys: {list(state['sqlglot_schema'].keys())}")

    except Exception as e:
        error_msg = f"Error extracting schema: {e}"
        logger.error(error_msg, exc_info=True)
        state["schema_ddl"] = f"Error loading schema: {error_msg}"
        state["sqlglot_schema"] = {"error": error_msg}
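The project keeps these schema helpers as plain functions, but the same code is one step away from being an agent-callable tool. Here is a minimal sketch, assuming a hypothetical get_schema_ddl() helper that returns the DDL as a string (the real load_schema_into_state() writes into session state instead) and a placeholder model name:

from google.adk.agents import Agent
from google.adk.tools import FunctionTool


def get_schema_ddl() -> str:
    """Return the DDL for the tables in the configured database."""
    # Hypothetical stand-in: the project's load_schema_into_state() reads the
    # DDL from the live database and caches it on the dialect object.
    return "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"


# The docstring above is what the LLM reads to decide when and how to call
# the tool, so keep it accurate and specific.
schema_aware_agent = Agent(
    name="schema_aware_agent",
    model="gemini-2.5-flash",  # placeholder; the project uses a MODEL_NAME constant
    instruction="Always fetch the schema before writing any SQL.",
    tools=[FunctionTool(func=get_schema_ddl)],
)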

Problem 3: Query Logic Errors

Even with the right schema, the LLM can still make logical mistakes with complex joins or aggregations. A human analyst would spot the error, critique it (“That join is wrong, you need to use user_id”), and refine it.

Our SequentialAgent is too simple for this. It’s a waterfall. It can’t go backwards and iterate.

Solution: LoopAgent for Iterative Refinement

The ADK has another workflow agent for this: the LoopAgent. This agent runs its sub-agents iteratively until a condition is met. It’s perfect for a “generate-and-critique” pattern.

We don’t have to replace our SequentialAgent. We can enhance it by nesting workflow agents: right after the generator drafts its query, we add a new LoopAgent to the sequence. This loop contains a team of two specialists:

  1. A Processor Agent: A deterministic, non-LLM agent (built in the next two sections) that validates and executes the current draft.
  2. A Critic Agent: A second LlmAgent with a different prompt, whose only job is to correct the SQL whenever the processor reports a failure.

This is a powerful way to get LLMs to self-correct, which improves the quality of the final query.

Code Example: Building a “Generate-and-Critique” Loop

from google.adk.agents import Agent, LoopAgent

# MODEL_NAME, the instruction providers, clean_sql_query, and
# sql_processor_agent are defined elsewhere in the project.
sql_generator_agent = Agent(
    name="sql_generator_agent",
    model=MODEL_NAME,
    description="Generates an initial SQL query from a natural language question.",
    instruction=get_generator_instruction,
    output_key="sql_query",
    after_model_callback=clean_sql_query,
)

sql_corrector_agent = Agent(
    name="sql_corrector_agent",
    model=MODEL_NAME,
    description="Corrects a failed SQL query.",
    instruction=get_corrector_instruction,
    output_key="sql_query",
    tools=[],
    after_model_callback=clean_sql_query,
)

sql_correction_loop = LoopAgent(
    name="SQLCorrectionLoop",
    sub_agents=[
        sql_processor_agent,
        sql_corrector_agent,
    ],
    max_iterations=3,
)
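The critic is only as good as the context it sees. In the project this comes from get_corrector_instruction; here is a minimal sketch (not the project’s actual implementation) of what such an instruction provider might look like, assuming the processor step records its results in session state under keys like validation_result and execution_result, and that the original question was captured as user_message (those key names are assumptions):

from google.adk.agents.readonly_context import ReadonlyContext


def get_corrector_instruction(ctx: ReadonlyContext) -> str:
    """Build the critic prompt from whatever earlier steps left in session state."""
    # The state keys user_message, validation_result, and execution_result are
    # illustrative assumptions; sql_query is written by the generator's output_key.
    state = ctx.state
    return f"""You are a SQL reviewer. The previous query failed.

Original question: {state.get("user_message", "")}
Failed query: {state.get("sql_query", "")}
Validation result: {state.get("validation_result", "")}
Execution result: {state.get("execution_result", "")}

Rewrite the query so it is valid for the target dialect and answers the
original question. Return only the corrected SQL, with no commentary."""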

Problem 4: Agent Performance and Cost

We’re now using three LLM-powered agents. This is great for quality, but it’s slow and costs money with every API call.

What about simple, deterministic steps? Things like validating SQL syntax, formatting data, or cleaning up LLM output. Using a powerful LLM for these jobs is like using a sledgehammer to hang a picture. It’s slow, expensive, and surprisingly unreliable.

Solution: Custom Agents for Code-Based Logic

The ADK isn’t just for LLMs. You can create a “Custom Agent” by inheriting from BaseAgent and implementing the _run_async_impl method.

This agent has no LLM. It runs pure Python code. It’s fast and 100% deterministic. We’ll create a custom agent for our next problem: validation.

Code Example: Building a Non-LLM Validation Agent (SQLProcessor)

This agent will use the sqlglot library (which we’ll discuss in detail next) and will be a custom BaseAgent.

class SQLProcessor(BaseAgent):
    """
    Agent that handles the mechanical steps of:
    1. Validating the current SQL.
    2. Executing it ONLY if validation passed.
    3. Escalating to exit the loop on successful execution.
    """

    async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]:
        logger.info(f"[{self.name}] Starting SQL processing.")
        # ...

Problem 5: Dangerous Query Execution

This is the big one. You can’t execute LLM-generated code directly against your database. Ever. It’s a massive security and stability risk.

We need a fast, reliable check for syntax errors. What if the LLM produces a query that’s syntactically invalid? Or for the wrong SQL dialect?

Solution: Non-Destructive Dry Run with sqlglot

This is where our custom SQLProcessor agent shines. We’ll use the sqlglot library, a pure-Python SQL parser and transpiler.

Why sqlglot? It’s fast and local, building a real Abstract Syntax Tree (AST) which is infinitely more reliable than regex. It’s also dialect-aware, so it can catch syntax errors specific to, say, PostgreSQL.

We can just wrap sqlglot.parse_one(sql) in a try…except block. If it parses, the syntax is valid. If it throws a ParseError, it’s not. This gives us a fast and cheap validation signal.
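In isolation, that check is only a few lines. A minimal sketch (the function name validate_sql_syntax and the hard-coded default dialect are illustrative, not the project’s API):

import sqlglot
from sqlglot.errors import ParseError


def validate_sql_syntax(sql: str, dialect: str = "postgres") -> tuple[bool, str]:
    """Return (is_valid, message) for a candidate query, without touching the DB."""
    try:
        # parse_one builds an AST for a single statement in the given source
        # dialect; a ParseError means the query never reaches the database.
        sqlglot.parse_one(sql, read=dialect)
        return True, "OK"
    except ParseError as e:
        return False, str(e)


# Example: an unbalanced parenthesis is caught locally, before execution.
print(validate_sql_syntax("SELECT * FROM users WHERE (id = 1"))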

Code Example: Full SQLProcessor Implementation

Here is the full implementation of the SQLProcessor agent we previewed, including its sqlglot-backed validation step.

import logging
from typing import Any, AsyncGenerator

from google.adk.agents import BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event
from google.genai.types import Content, Part

# Project helpers (assumed to live in the project's tools.py, per the earlier
# note): get_dialect(), run_sql_validation(), and run_sql_execution() wrap the
# sqlglot parsing and the actual database calls.
from .tools import get_dialect, run_sql_execution, run_sql_validation

logger = logging.getLogger(__name__)

class SQLProcessor(BaseAgent):
    """
    Agent that handles the mechanical steps of:
    1. Validating the current SQL.
    2. Executing it ONLY if validation passed.
    3. Escalating to exit the loop on successful execution.
    """

    async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]:
        logger.info(f"[{self.name}] Starting SQL processing.")
        state = ctx.session.state
        dialect = get_dialect()

        val_result: dict[str, Any] = run_sql_validation(state, dialect)
        yield Event(
            author=self.name,
            invocation_id=ctx.invocation_id,
            custom_metadata={"validation_result": val_result},
        )

        if val_result.get("status") == "success":
            exec_result: dict[str, Any] = run_sql_execution(state, dialect)

            result_event = Event(
                author=self.name,
                invocation_id=ctx.invocation_id,
                custom_metadata={"execution_result": exec_result},
            )

            # If execution succeeds, this is the final answer.
            # Escalate to exit the loop and provide the final content.
            if exec_result.get("status") == "success":
                logger.info(
                    f"[{self.name}] SQL execution successful. Escalating to exit loop."
                )
                result_event.actions.escalate = True

                final_query: str | None = state.get("sql_query")
                state["final_sql_query"] = final_query

                if final_query:
                    result_event.content = Content(
                        role="model", parts=[Part(text=final_query)]
                    )

            yield result_event
        else:
            logger.info(f"[{self.name}] Skipping execution due to validation failure.")
            state["execution_result"] = {
                "status": "skipped",
                "reason": "validation_failed",
            }

Problem 6: Messy LLM Output

One last thing. LLMs are trained to be helpful conversationalists. So when you ask for a SQL query, you often get this:

“Sure! Here is the SQL query you asked for: SELECT * FROM users;”

That conversational fluff will break our SqlValidationAgent every single time. We need a way to programmatically clean the LLM’s output before it’s passed to the next agent.

Solution: Callbacks for Post-Processing

We could add another CustomAgent just to strip the text, but that feels a bit heavy for such a simple task.

The ADK offers a more elegant solution: Callbacks.

An AfterAgentCallback is a function you attach to an agent that’s guaranteed to run immediately after the agent finishes. It can even modify the agent’s final output.

Code Example: Attaching a Cleanup Callback

import re

from google.adk.agents.invocation_context import InvocationContext
from google.genai.types import Content, Part


def cleanup_sql_output(
    context: InvocationContext,
    agent_output: Content,
) -> Content:
    """
    This callback runs *after* the agent and cleans its output.
    """
    raw_text = agent_output.parts[0].text or ""

    # Simple regex to find content within a ```sql ... ``` fenced block.
    match = re.search(r"```sql\s*(.*?)\s*```", raw_text, re.DOTALL | re.IGNORECASE)

    cleaned_text = raw_text
    if match:
        cleaned_text = match.group(1)
    else:
        # Fallback: simple stripping
        cleaned_text = raw_text.strip().strip("`").strip()

    # Add a semicolon if it's missing (another common cleanup)
    if not cleaned_text.endswith(";"):
        cleaned_text += ";"

    # Return a *new* Content object to *replace* the original output
    return Content(role="model", parts=[Part(text=cleaned_text)])



Final Architecture

We’ve systematically tackled the six hardest problems in Text-to-SQL, evolving a brittle script into an extensible multi-agent system.

Our final root_agent is a [SequentialAgent](https://google.github.io/adk-docs/api-reference/python/google.adk.agents.html#google.adk.agents.SequentialAgent) that orchestrates a team of specialists: a schema-fetching agent, an LLM query writer, and a correction loop that pairs a fast, deterministic sqlglot-backed processor with an LLM critic.

The point is that modern agent development is about _composition_. You have to choose the right ADK construct for the right task. This table is a cheat sheet for making that decision.

Agent Design: The “Right Tool for the Job”

![](https://cdn-images-1.medium.com/max/1024/1*OuHQ_Jb7kQ0IaBrlWKVICg.png)

Conclusion: Building Reliable AI Systems

This pattern of **Specialization**, **Orchestration**, and **Safeguards** is the future of building production-ready AI. It’s not just for SQL, either. You can use this same architecture for autonomous code generation, document analysis, and much more.

So stop trying to build one “super-prompt” and start building teams of specialized agents. Welcome to the world of reliable, agentic systems.

What’s next? Get started in 3 simple steps in the [sample repository](https://github.com/kweinmeister/text-to-sql-agent). If you want a hands-on lab exercise, check out [Build Multi-Agent Systems with ADK](https://codelabs.developers.google.com/codelabs/production-ready-ai-with-gc/3-developing-agents/build-a-multi-agent-system-with-adk?hl=en#0&utm_campaign=CDR_0x2b6f3004_default_b459252462&utm_medium=external&utm_source=blog). To learn about powerful, built-in natural language capabilities in AlloyDB, try out the [AlloyDB AI NL SQL](https://codelabs.developers.google.com/alloydb-ai-nl-sql?hl=en#0&utm_campaign=CDR_0x2b6f3004_default_b459252462&utm_medium=external&utm_source=blog) codelab.

Want to keep the discussion going about multi-agent systems? Connect with me on [LinkedIn](https://www.linkedin.com/in/karlweinmeister/), [X](https://x.com/kweinmeister), or [Bluesky](https://bsky.app/profile/kweinmeister.bsky.social).

* * *