Most AI database demos start with natural language.
A user asks, "Which customers bought something last month?", the application generates SQL, and the query is sent to a database. That is useful, but it skips a problem many developers face every day: they already have SQL, and they need to know whether it is correct, safe, and compatible with the database engine they are using.
This article builds a small educational project that focuses on that second problem.
We are going to create an AI-style SQL validation agent. It receives SQL statements, analyzes them before execution, detects common mistakes, and returns a structured explanation that feels closer to a helpful code review than a raw database error.
The project does not connect to a database. It does not use an ORM. It does not call OpenAI. The "AI" behavior is simulated with deterministic validation rules and an explanation module, which makes the project easy to run, test, and extend.
Source code:
View the complete project on GitHub
By the end, we will have:
- a Node.js and TypeScript REST API
- a
POST /api/validateendpoint - Zod request validation
- rule-based SQL validation
- engine-specific checks for ANSI, MySQL, PostgreSQL, Oracle, and SQL Server
- an AI explanation module
- Vitest tests
- a reusable Codex Skill for improving the validator
The problem with SQL error messages
SQL errors are accurate, but they are not always helpful.
Imagine writing this query:
SELECT * FORM users;
A database parser might return an error near FORM. That is technically correct, but the developer still has to understand the actual problem:
FORM is not a valid keyword here. The intended keyword is probably FROM.
That is a tiny typo. The experience gets worse with missing commas, unclosed quotes, multiple statements, unsafe commands, or dialect-specific syntax.
For example:
SELECT TOP 5 * FROM users;
That can make sense in SQL Server, but it is not the normal PostgreSQL syntax. PostgreSQL expects something like:
SELECT * FROM users LIMIT 5;
Traditional validation often answers:
Can the database parse this?
An AI-style validation agent can answer:
What is wrong, why is it wrong, and what should the developer try next?
That difference is the heart of this project.
Why AI Agents are useful for SQL validation
AI agents are useful when they coordinate steps, inspect context, explain results, and suggest actions. SQL validation fits that pattern nicely.
A validation agent can:
- inspect SQL before it reaches a database
- block unsafe statements in read-only workflows
- detect simple syntax mistakes
- warn about dialect mismatches
- explain errors in developer-friendly language
- integrate with an IDE, pull request bot, or CI pipeline
The important boundary is that validation should happen before execution.
In this project, SQL is treated as untrusted input. The API receives the query, validates the request shape, runs rule-based checks, generates an explanation, and returns JSON. No database connection is required.
The architecture looks like this:
Developer
|
v
REST API
|
v
SQL Validator
|
v
Validation Rules
|
v
AI Explanation Agent
|
v
Corrected SQL
|
v
JSON Response
This is small enough for a tutorial, but the structure is close to what a larger SQL review agent could use.
Project Architecture
The repository is organized around a few clear responsibilities:
sql-ai-validator-agent/
src/
agent/
explanationAgent.ts
api/
routes.ts
schema.ts
types/
validation.ts
validation/
rules.ts
utils.ts
validator.ts
app.ts
server.ts
tests/
validator.test.ts
docs/
skill/
SKILL.md
examples/
README.md
ARTICLE_DEVTO.md
The request flow is:
HTTP request
|
v
Zod schema
|
v
validateSql()
|
v
runValidationRules()
|
v
explainValidation()
|
v
JSON response
This separation keeps the project flexible. The API layer does not know the details of SQL validation. The rule engine does not care about Express. The explanation agent receives structured errors and turns them into readable feedback.
That gives us room to improve one piece without rewriting the whole project.
Running the project locally
Clone the repository and install dependencies:
git clone https://github.com/cs2026086510-a11y/sql-ai-validator-agent.git
cd sql-ai-validator-agent
npm install
Run the development server:
npm run dev
The API starts on:
http://localhost:3000
You can check that the server is alive with:
curl http://localhost:3000/api/health
Expected response:
{
"status": "ok"
}
REST API Design
The main endpoint is:
POST /api/validate
Content-Type: application/json
The request body has two fields:
{
"engine": "postgresql",
"query": "SELECT * FORM users;"
}
The engine field tells the validator which SQL dialect to use. Supported values are:
ansi
mysql
postgresql
oracle
sqlserver
The response is intentionally structured:
{
"valid": false,
"engine": "postgresql",
"errors": [
{
"code": "TYPO_FORM",
"message": "Possible typo detected.",
"suggestion": "FROM",
"severity": "error"
}
],
"explanation": "The keyword FORM is not valid SQL in this position. It looks like you meant FROM.",
"correctedSql": "SELECT * FROM users;"
}
This shape is useful because humans can read it, and tools can consume it. An IDE extension could display errors[0].message. A CI job could fail when valid is false. A pull request bot could post the explanation as a review comment.
The request is validated with Zod:
import { z } from "zod";
import { sqlEngines } from "../types/validation.js";
export const validateSqlSchema = z.object({
engine: z.enum(sqlEngines),
query: z
.string()
.trim()
.min(1, "query is required")
.max(10_000, "query must be smaller than 10,000 characters")
});
The Express route stays small:
apiRouter.post("/validate", (request, response) => {
const parsed = validateSqlSchema.safeParse(request.body);
if (!parsed.success) {
return response.status(400).json({
valid: false,
errors: parsed.error.flatten()
});
}
return response.json(validateSql(parsed.data));
});
That is the whole HTTP boundary. Everything else belongs to the validator.
Building the Validation Engine
The validator detects:
-
FORMinstead ofFROM - missing commas in simple
SELECTlists - unclosed quotes
- unclosed parentheses
- multiple SQL statements
- forbidden write or schema statements
-
SELECTwithout columns - engine-specific syntax problems
The orchestration function is short:
export function validateSql(request: ValidationRequest): ValidationResult {
const query = normalizeSql(request.query);
const errors = runValidationRules(query, request.engine);
const { explanation, correctedSql } = explainValidation(query, request.engine, errors);
return {
valid: errors.every((error) => error.severity !== "error"),
engine: request.engine,
errors,
explanation,
correctedSql
};
}
Each rule produces a structured error:
if (/\bFORM\b/i.test(literalSafeSql)) {
errors.push({
code: "TYPO_FORM",
message: "Possible typo detected.",
suggestion: "FROM",
severity: "error"
});
}
One important detail is that the validator strips string literals before checking keywords. This prevents false positives in queries like:
SELECT 'FORM is mentioned inside a string' AS note;
The project also blocks multiple statements:
SELECT * FROM users; DROP TABLE users;
That query returns an error before any database could execute it. The same read-only policy blocks statements like DROP, DELETE, UPDATE, INSERT, ALTER, and TRUNCATE.
This is not because those commands are always wrong. They are blocked because this validator is designed as a safe educational review layer.
Creating an AI Explanation Agent
The explanation agent does not call a language model. Instead, it uses validation codes to generate helpful explanations.
function explainError(error: ValidationError, engine: SqlEngine): string {
switch (error.code) {
case "TYPO_FORM":
return "The keyword FORM is not valid SQL in this position. It looks like you meant FROM.";
case "MISSING_COMMA":
return "The SELECT list contains adjacent identifiers without a comma. SQL expects each selected column expression to be separated clearly.";
case "ENGINE_SPECIFIC":
return `The SQL uses syntax that does not match ${formatEngine(engine)}.`;
default:
return error.message;
}
}
This gives us an agent-like response without the cost or unpredictability of a real LLM. It is deterministic, testable, and easy to review.
For corrections, the agent only changes SQL when the fix is obvious:
if (error.code === "TYPO_FORM") {
corrected = replaceWord(corrected, "FORM", "FROM");
}
That restraint matters. A validation agent should not invent table names, guess missing columns, or rewrite business logic. It should explain the detected issue and suggest a safe next step.
Real API Examples
Here are examples that show how the API behaves.
Typo: FORM instead of FROM
Request:
curl -s http://localhost:3000/api/validate \
-H "Content-Type: application/json" \
-d '{"engine":"postgresql","query":"SELECT * FORM users;"}'
Response:
{
"valid": false,
"engine": "postgresql",
"errors": [
{
"code": "TYPO_FORM",
"message": "Possible typo detected.",
"suggestion": "FROM",
"severity": "error"
}
],
"explanation": "The keyword FORM is not valid SQL in this position. It looks like you meant FROM.",
"correctedSql": "SELECT * FROM users;"
}
Missing commas
Request:
curl -s http://localhost:3000/api/validate \
-H "Content-Type: application/json" \
-d '{"engine":"ansi","query":"SELECT id name email FROM users;"}'
Response:
{
"valid": false,
"engine": "ansi",
"errors": [
{
"code": "MISSING_COMMA",
"message": "Possible missing comma in SELECT column list.",
"suggestion": "Separate selected columns with commas.",
"severity": "error"
}
],
"explanation": "The SELECT list contains adjacent identifiers without a comma. SQL expects each selected column expression to be separated clearly."
}
Forbidden statement
Request:
curl -s http://localhost:3000/api/validate \
-H "Content-Type: application/json" \
-d '{"engine":"postgresql","query":"DROP TABLE users;"}'
Response:
{
"valid": false,
"engine": "postgresql",
"errors": [
{
"code": "FORBIDDEN_STATEMENT",
"message": "Forbidden statement detected: DROP.",
"suggestion": "Use read-only SELECT queries in this educational validator.",
"severity": "error"
}
],
"explanation": "This validator is intentionally read-only. Write, schema, and execution statements are blocked before any database could run them."
}
PostgreSQL dialect issue
Request:
curl -s http://localhost:3000/api/validate \
-H "Content-Type: application/json" \
-d '{"engine":"postgresql","query":"SELECT TOP 5 * FROM users;"}'
Response:
{
"valid": false,
"engine": "postgresql",
"errors": [
{
"code": "ENGINE_SPECIFIC",
"message": "TOP is not valid PostgreSQL SELECT syntax.",
"suggestion": "Use LIMIT in PostgreSQL.",
"severity": "error"
}
],
"explanation": "The SQL uses syntax that does not match PostgreSQL. Use LIMIT in PostgreSQL."
}
Valid query
Request:
curl -s http://localhost:3000/api/validate \
-H "Content-Type: application/json" \
-d '{"engine":"postgresql","query":"SELECT id, name FROM users WHERE active = true;"}'
Response:
{
"valid": true,
"engine": "postgresql",
"errors": [],
"explanation": "The SQL statement looks valid for PostgreSQL based on the educational rule set."
}
Creating a reusable Codex Skill
The repository includes a reusable Codex Skill in:
skill/SKILL.md
The skill describes how Codex should work with the validator:
- what the validator is for
- what inputs it accepts
- what outputs it should return
- which validation workflow to follow
- which security constraints matter
- how to add new rules safely
The workflow section looks like this:
1. Normalize whitespace without changing SQL meaning.
2. Validate request shape before analyzing SQL.
3. Reject multiple executable statements.
4. Reject forbidden write, schema, or execution statements for read-only workflows.
5. Check mechanical syntax issues.
6. Run engine-specific checks for the selected dialect.
7. Generate an explanation from detected errors.
8. Suggest corrected SQL only when the correction is deterministic.
This is useful because future improvements can be guided by the same operating instructions. For example, a developer could ask Codex to add a new rule for Oracle syntax, update the tests, and keep the API response contract unchanged.
The skill turns the project into something more reusable than a one-off tutorial.
Testing
The project uses Vitest:
npm test
One test checks the typo flow:
it("detects FORM instead of FROM and suggests corrected SQL", () => {
const result = validateSql({
engine: "postgresql",
query: "SELECT * FORM users;"
});
expect(result.valid).toBe(false);
expect(result.errors[0]).toMatchObject({
code: "TYPO_FORM",
suggestion: "FROM"
});
expect(result.correctedSql).toBe("SELECT * FROM users;");
});
Testing is especially important for validators because every rule can create false positives. A good workflow is:
- Add a failing test for the SQL issue.
- Implement the smallest rule that catches it.
- Add at least one test that should not be flagged.
- Keep the API response stable.
That keeps the validator useful instead of noisy.
How this could be used in real projects
This educational API can fit into several workflows.
In an IDE, an extension could call /api/validate when a developer saves a .sql file. The response could become editor diagnostics with a short explanation.
In CI, a script could scan SQL files and fail the build if the response contains blocking errors.
In pull requests, a review bot could post comments like:
SQL validation failed.
Issue: TOP is not valid PostgreSQL SELECT syntax.
Suggestion: Use LIMIT in PostgreSQL.
Explanation: The SQL uses syntax that does not match PostgreSQL.
In a larger AI system, a model could use this validator as a tool. The model would not need to guess whether a query is safe. It could call the API, read the structured response, and explain the result to the developer.
Future Improvements
This project is intentionally small, but the architecture leaves room to grow.
The next step would be adding a real SQL parser for deeper dialect support. Regular expressions are fine for a tutorial, but parser-backed validation is more reliable for nested queries, functions, joins, and complex expressions.
Another improvement would be line and column positions. IDEs and pull request comments are much more useful when they can point to the exact location of the issue.
Policy configuration would also help. Some teams may want to block all write statements in dashboards but allow INSERT or ALTER in migration folders.
Finally, the explanation module could become a real LLM-backed agent. If that happens, the deterministic rules should still run first. The model should explain structured findings, not freely invent validation results. It should also follow prompt-injection protections: treat SQL as untrusted input, separate instructions from user content, and validate model output before returning it.
Conclusion
Text-to-SQL is popular, but SQL validation is just as important.
Developers need tools that review SQL before execution, explain mistakes clearly, catch unsafe statements, and teach dialect differences. This project shows one way to build that kind of tool with a small TypeScript REST API and an agent-style explanation layer.
The result is simple, but useful:
- no database connection
- no ORM
- no frontend
- deterministic validation rules
- structured JSON responses
- AI-style explanations
- a reusable Codex Skill
It is a good foundation for an IDE helper, CI validator, pull request reviewer, or larger SQL code review agent.
Repository:
https://github.com/cs2026086510-a11y/sql-ai-validator-agent.git
References
- OpenAI Codex documentation: https://developers.openai.com/codex
- Codex CLI repository: https://github.com/openai/codex
- Codex Skills documentation: https://developers.openai.com/codex/concepts/customization#skills
- OWASP LLM Prompt Injection Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html
- ISO SQL standard overview: https://www.iso.org/standard/76583.html
- PostgreSQL SELECT documentation: https://www.postgresql.org/docs/current/sql-select.html
- MySQL SELECT documentation: https://dev.mysql.com/doc/refman/8.4/en/select.html
- Oracle SELECT documentation: https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/SELECT.html
- SQL Server Transact-SQL reference: https://learn.microsoft.com/en-us/sql/t-sql/language-reference?view=sql-server-ver17
Top comments (1)
Great article, Cristian! It's very clear how you walk through building an AI-powered SQL validation agent — from the REST API layer built with Codex, to how the agent reviews SQL statements and returns structured feedback before anything touches a real database. I really liked how you explained the reasoning behind each technical decision, not just the implementation itself. Nice work!