Cristian Soto

Posted on Jul 2

Building an AI SQL Validation Agent with Codex and a REST API

#ai #database #sql

Most AI database demos start with natural language.

A user asks, "Which customers bought something last month?", the application generates SQL, and the query is sent to a database. That is useful, but it skips a problem many developers face every day: they already have SQL, and they need to know whether it is correct, safe, and compatible with the database engine they are using.

This article builds a small educational project that focuses on that second problem.

We are going to create an AI-style SQL validation agent. It receives SQL statements, analyzes them before execution, detects common mistakes, and returns a structured explanation that feels closer to a helpful code review than a raw database error.

The project does not connect to a database. It does not use an ORM. It does not call OpenAI. The "AI" behavior is simulated with deterministic validation rules and an explanation module, which makes the project easy to run, test, and extend.

Source code:

View the complete project on GitHub

By the end, we will have:

a Node.js and TypeScript REST API
a POST /api/validate endpoint
Zod request validation
rule-based SQL validation
engine-specific checks for ANSI, MySQL, PostgreSQL, Oracle, and SQL Server
an AI explanation module
Vitest tests
a reusable Codex Skill for improving the validator

The problem with SQL error messages

SQL errors are accurate, but they are not always helpful.

Imagine writing this query:

SELECT * FORM users;

A database parser might return an error near FORM. That is technically correct, but the developer still has to understand the actual problem:

FORM is not a valid keyword here. The intended keyword is probably FROM.

That is a tiny typo. The experience gets worse with missing commas, unclosed quotes, multiple statements, unsafe commands, or dialect-specific syntax.

For example:

SELECT TOP 5 * FROM users;

That can make sense in SQL Server, but it is not the normal PostgreSQL syntax. PostgreSQL expects something like:

SELECT * FROM users LIMIT 5;

Traditional validation often answers:

Can the database parse this?

An AI-style validation agent can answer:

What is wrong, why is it wrong, and what should the developer try next?

That difference is the heart of this project.

Why AI Agents are useful for SQL validation

AI agents are useful when they coordinate steps, inspect context, explain results, and suggest actions. SQL validation fits that pattern nicely.

A validation agent can:

inspect SQL before it reaches a database
block unsafe statements in read-only workflows
detect simple syntax mistakes
warn about dialect mismatches
explain errors in developer-friendly language
integrate with an IDE, pull request bot, or CI pipeline

The important boundary is that validation should happen before execution.

In this project, SQL is treated as untrusted input. The API receives the query, validates the request shape, runs rule-based checks, generates an explanation, and returns JSON. No database connection is required.

The architecture looks like this:

Developer
   |
   v
REST API
   |
   v
SQL Validator
   |
   v
Validation Rules
   |
   v
AI Explanation Agent
   |
   v
Corrected SQL
   |
   v
JSON Response

This is small enough for a tutorial, but the structure is close to what a larger SQL review agent could use.

Project Architecture

The repository is organized around a few clear responsibilities:

sql-ai-validator-agent/
  src/
    agent/
      explanationAgent.ts
    api/
      routes.ts
      schema.ts
    types/
      validation.ts
    validation/
      rules.ts
      utils.ts
      validator.ts
    app.ts
    server.ts
  tests/
    validator.test.ts
  docs/
  skill/
    SKILL.md
  examples/
  README.md
  ARTICLE_DEVTO.md

The request flow is:

HTTP request
   |
   v
Zod schema
   |
   v
validateSql()
   |
   v
runValidationRules()
   |
   v
explainValidation()
   |
   v
JSON response

This separation keeps the project flexible. The API layer does not know the details of SQL validation. The rule engine does not care about Express. The explanation agent receives structured errors and turns them into readable feedback.

That gives us room to improve one piece without rewriting the whole project.

Running the project locally

Clone the repository and install dependencies:

git clone https://github.com/cs2026086510-a11y/sql-ai-validator-agent.git
cd sql-ai-validator-agent
npm install

Run the development server:

npm run dev

The API starts on:

http://localhost:3000

You can check that the server is alive with:

curl http://localhost:3000/api/health

Expected response:

{
  "status": "ok"
}

REST API Design

The main endpoint is:

POST /api/validate
Content-Type: application/json

The request body has two fields:

{
  "engine": "postgresql",
  "query": "SELECT * FORM users;"
}

The engine field tells the validator which SQL dialect to use. Supported values are:

ansi
mysql
postgresql
oracle
sqlserver

The response is intentionally structured:

{
  "valid": false,
  "engine": "postgresql",
  "errors": [
    {
      "code": "TYPO_FORM",
      "message": "Possible typo detected.",
      "suggestion": "FROM",
      "severity": "error"
    }
  ],
  "explanation": "The keyword FORM is not valid SQL in this position. It looks like you meant FROM.",
  "correctedSql": "SELECT * FROM users;"
}

This shape is useful because humans can read it, and tools can consume it. An IDE extension could display errors[0].message. A CI job could fail when valid is false. A pull request bot could post the explanation as a review comment.

The request is validated with Zod:

import { z } from "zod";
import { sqlEngines } from "../types/validation.js";

export const validateSqlSchema = z.object({
  engine: z.enum(sqlEngines),
  query: z
    .string()
    .trim()
    .min(1, "query is required")
    .max(10_000, "query must be smaller than 10,000 characters")
});

The Express route stays small:

apiRouter.post("/validate", (request, response) => {
  const parsed = validateSqlSchema.safeParse(request.body);

  if (!parsed.success) {
    return response.status(400).json({
      valid: false,
      errors: parsed.error.flatten()
    });
  }

  return response.json(validateSql(parsed.data));
});

That is the whole HTTP boundary. Everything else belongs to the validator.

Building the Validation Engine

The validator detects:

FORM instead of FROM
missing commas in simple SELECT lists
unclosed quotes
unclosed parentheses
multiple SQL statements
forbidden write or schema statements
SELECT without columns
engine-specific syntax problems

The orchestration function is short:

export function validateSql(request: ValidationRequest): ValidationResult {
  const query = normalizeSql(request.query);
  const errors = runValidationRules(query, request.engine);
  const { explanation, correctedSql } = explainValidation(query, request.engine, errors);

  return {
    valid: errors.every((error) => error.severity !== "error"),
    engine: request.engine,
    errors,
    explanation,
    correctedSql
  };
}

Each rule produces a structured error:

if (/\bFORM\b/i.test(literalSafeSql)) {
  errors.push({
    code: "TYPO_FORM",
    message: "Possible typo detected.",
    suggestion: "FROM",
    severity: "error"
  });
}

One important detail is that the validator strips string literals before checking keywords. This prevents false positives in queries like:

SELECT 'FORM is mentioned inside a string' AS note;

The project also blocks multiple statements:

SELECT * FROM users; DROP TABLE users;

That query returns an error before any database could execute it. The same read-only policy blocks statements like DROP, DELETE, UPDATE, INSERT, ALTER, and TRUNCATE.

This is not because those commands are always wrong. They are blocked because this validator is designed as a safe educational review layer.

Creating an AI Explanation Agent

The explanation agent does not call a language model. Instead, it uses validation codes to generate helpful explanations.

function explainError(error: ValidationError, engine: SqlEngine): string {
  switch (error.code) {
    case "TYPO_FORM":
      return "The keyword FORM is not valid SQL in this position. It looks like you meant FROM.";
    case "MISSING_COMMA":
      return "The SELECT list contains adjacent identifiers without a comma. SQL expects each selected column expression to be separated clearly.";
    case "ENGINE_SPECIFIC":
      return `The SQL uses syntax that does not match ${formatEngine(engine)}.`;
    default:
      return error.message;
  }
}

This gives us an agent-like response without the cost or unpredictability of a real LLM. It is deterministic, testable, and easy to review.

For corrections, the agent only changes SQL when the fix is obvious:

if (error.code === "TYPO_FORM") {
  corrected = replaceWord(corrected, "FORM", "FROM");
}

That restraint matters. A validation agent should not invent table names, guess missing columns, or rewrite business logic. It should explain the detected issue and suggest a safe next step.

Real API Examples

Here are examples that show how the API behaves.

Typo: FORM instead of FROM

Request:

curl -s http://localhost:3000/api/validate \
  -H "Content-Type: application/json" \
  -d '{"engine":"postgresql","query":"SELECT * FORM users;"}'

Response:

{
  "valid": false,
  "engine": "postgresql",
  "errors": [
    {
      "code": "TYPO_FORM",
      "message": "Possible typo detected.",
      "suggestion": "FROM",
      "severity": "error"
    }
  ],
  "explanation": "The keyword FORM is not valid SQL in this position. It looks like you meant FROM.",
  "correctedSql": "SELECT * FROM users;"
}

Missing commas

Request:

curl -s http://localhost:3000/api/validate \
  -H "Content-Type: application/json" \
  -d '{"engine":"ansi","query":"SELECT id name email FROM users;"}'

Response:

{
  "valid": false,
  "engine": "ansi",
  "errors": [
    {
      "code": "MISSING_COMMA",
      "message": "Possible missing comma in SELECT column list.",
      "suggestion": "Separate selected columns with commas.",
      "severity": "error"
    }
  ],
  "explanation": "The SELECT list contains adjacent identifiers without a comma. SQL expects each selected column expression to be separated clearly."
}

Forbidden statement

Request:

curl -s http://localhost:3000/api/validate \
  -H "Content-Type: application/json" \
  -d '{"engine":"postgresql","query":"DROP TABLE users;"}'

Response:

{
  "valid": false,
  "engine": "postgresql",
  "errors": [
    {
      "code": "FORBIDDEN_STATEMENT",
      "message": "Forbidden statement detected: DROP.",
      "suggestion": "Use read-only SELECT queries in this educational validator.",
      "severity": "error"
    }
  ],
  "explanation": "This validator is intentionally read-only. Write, schema, and execution statements are blocked before any database could run them."
}

PostgreSQL dialect issue

Request:

curl -s http://localhost:3000/api/validate \
  -H "Content-Type: application/json" \
  -d '{"engine":"postgresql","query":"SELECT TOP 5 * FROM users;"}'

Response:

{
  "valid": false,
  "engine": "postgresql",
  "errors": [
    {
      "code": "ENGINE_SPECIFIC",
      "message": "TOP is not valid PostgreSQL SELECT syntax.",
      "suggestion": "Use LIMIT in PostgreSQL.",
      "severity": "error"
    }
  ],
  "explanation": "The SQL uses syntax that does not match PostgreSQL. Use LIMIT in PostgreSQL."
}

Valid query

Request:

curl -s http://localhost:3000/api/validate \
  -H "Content-Type: application/json" \
  -d '{"engine":"postgresql","query":"SELECT id, name FROM users WHERE active = true;"}'

Response:

{
  "valid": true,
  "engine": "postgresql",
  "errors": [],
  "explanation": "The SQL statement looks valid for PostgreSQL based on the educational rule set."
}

Creating a reusable Codex Skill

The repository includes a reusable Codex Skill in:

skill/SKILL.md

The skill describes how Codex should work with the validator:

what the validator is for
what inputs it accepts
what outputs it should return
which validation workflow to follow
which security constraints matter
how to add new rules safely

The workflow section looks like this:

1. Normalize whitespace without changing SQL meaning.
2. Validate request shape before analyzing SQL.
3. Reject multiple executable statements.
4. Reject forbidden write, schema, or execution statements for read-only workflows.
5. Check mechanical syntax issues.
6. Run engine-specific checks for the selected dialect.
7. Generate an explanation from detected errors.
8. Suggest corrected SQL only when the correction is deterministic.

This is useful because future improvements can be guided by the same operating instructions. For example, a developer could ask Codex to add a new rule for Oracle syntax, update the tests, and keep the API response contract unchanged.

The skill turns the project into something more reusable than a one-off tutorial.

Testing

The project uses Vitest:

npm test

One test checks the typo flow:

it("detects FORM instead of FROM and suggests corrected SQL", () => {
  const result = validateSql({
    engine: "postgresql",
    query: "SELECT * FORM users;"
  });

  expect(result.valid).toBe(false);
  expect(result.errors[0]).toMatchObject({
    code: "TYPO_FORM",
    suggestion: "FROM"
  });
  expect(result.correctedSql).toBe("SELECT * FROM users;");
});

Testing is especially important for validators because every rule can create false positives. A good workflow is:

Add a failing test for the SQL issue.
Implement the smallest rule that catches it.
Add at least one test that should not be flagged.
Keep the API response stable.

That keeps the validator useful instead of noisy.

How this could be used in real projects

This educational API can fit into several workflows.

In an IDE, an extension could call /api/validate when a developer saves a .sql file. The response could become editor diagnostics with a short explanation.

In CI, a script could scan SQL files and fail the build if the response contains blocking errors.

In pull requests, a review bot could post comments like:

SQL validation failed.

Issue: TOP is not valid PostgreSQL SELECT syntax.
Suggestion: Use LIMIT in PostgreSQL.
Explanation: The SQL uses syntax that does not match PostgreSQL.

In a larger AI system, a model could use this validator as a tool. The model would not need to guess whether a query is safe. It could call the API, read the structured response, and explain the result to the developer.

Future Improvements

This project is intentionally small, but the architecture leaves room to grow.

The next step would be adding a real SQL parser for deeper dialect support. Regular expressions are fine for a tutorial, but parser-backed validation is more reliable for nested queries, functions, joins, and complex expressions.

Another improvement would be line and column positions. IDEs and pull request comments are much more useful when they can point to the exact location of the issue.

Policy configuration would also help. Some teams may want to block all write statements in dashboards but allow INSERT or ALTER in migration folders.

Finally, the explanation module could become a real LLM-backed agent. If that happens, the deterministic rules should still run first. The model should explain structured findings, not freely invent validation results. It should also follow prompt-injection protections: treat SQL as untrusted input, separate instructions from user content, and validate model output before returning it.

Conclusion

Text-to-SQL is popular, but SQL validation is just as important.

Developers need tools that review SQL before execution, explain mistakes clearly, catch unsafe statements, and teach dialect differences. This project shows one way to build that kind of tool with a small TypeScript REST API and an agent-style explanation layer.

The result is simple, but useful:

no database connection
no ORM
no frontend
deterministic validation rules
structured JSON responses
AI-style explanations
a reusable Codex Skill

It is a good foundation for an IDE helper, CI validator, pull request reviewer, or larger SQL code review agent.

Repository:

https://github.com/cs2026086510-a11y/sql-ai-validator-agent.git

References

OpenAI Codex documentation: https://developers.openai.com/codex
Codex CLI repository: https://github.com/openai/codex
Codex Skills documentation: https://developers.openai.com/codex/concepts/customization#skills
OWASP LLM Prompt Injection Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html
ISO SQL standard overview: https://www.iso.org/standard/76583.html
PostgreSQL SELECT documentation: https://www.postgresql.org/docs/current/sql-select.html
MySQL SELECT documentation: https://dev.mysql.com/doc/refman/8.4/en/select.html
Oracle SELECT documentation: https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/SELECT.html
SQL Server Transact-SQL reference: https://learn.microsoft.com/en-us/sql/t-sql/language-reference?view=sql-server-ver17

Top comments (1)

GIAN FRANCO AROCUTIPA AROCUTIPA • Jul 2

Great article, Cristian! It's very clear how you walk through building an AI-powered SQL validation agent — from the REST API layer built with Codex, to how the agent reviews SQL statements and returns structured feedback before anything touches a real database. I really liked how you explained the reasoning behind each technical decision, not just the implementation itself. Nice work!