MD. Zariful Huq

Posted on Mar 16

How AST Made AI-Generated Functions Actually Reliable

#ai #llm #webdev #programming

When I stopped asking the AI to write code and started asking it to describe logic, everything changed — security, predictability, even multi-language support.

I needed AI to generate small, dynamic pieces of logic inside a running system. Nothing exotic — simple calculations, input validations, conditional rules, data transformations. The kind of thing you'd normally write a quick function for, except the rules kept changing and I wanted users to define them without deploying new code.

The obvious approach was to let the model generate a function and execute it. And at first, it worked:

function calculateTotal(price, quantity) {
  return price * quantity;
}

Clean. Correct. Exactly what was needed. So I kept going. And that's when the cracks started to show.

Sometimes the syntax was slightly wrong. Sometimes two runs of the same prompt produced wildly different implementations. Sometimes the model referenced globals it assumed would exist. And occasionally — the part that should make you pause — it reached for eval, fetch, or fs.

Once you're executing AI-generated code inside a running system, you're trusting the model not to do anything dangerous. The model has no idea what your system looks like. It pattern-matches on the code it was trained on. That's not a foundation you want to build on.

I didn't actually want the AI to generate code. I wanted it to generate logic structure. That distinction turned out to be everything.

What AST actually is

AST stands for Abstract Syntax Tree: a tree-shaped representation of a program's syntactic structure. Instead of treating code as text, it breaks every operation down into nodes with explicit relationships.

Take a simple function:

function sum(a, b) {
  return a + b;
}

Internally, that's not a string. It's a tree:

FunctionDeclaration
  ├── name: "sum"
  ├── params: ["a", "b"]
  └── body
        └── ReturnStatement
              └── BinaryExpression
                    ├── operator: "+"
                    ├── left:  Identifier("a")
                    └── right: Identifier("b")

Nearly every modern language toolchain works this way. Compilers, linters, formatters, refactoring tools, and security scanners typically convert source text into an AST as early as possible, because structure is far easier to reason about than text.

Security scanners walk the tree looking for dangerous patterns. Type checkers resolve variables through scope chains. Formatters rewrite the tree and emit clean code regardless of the original's whitespace.
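To make "walking the tree" concrete, here is a minimal sketch. The tree is a hand-written, simplified version of the `sum` example above (a real parser's output carries more fields), and the `walk` helper is a hypothetical name for illustration, not any particular tool's API.

```javascript
// Hand-written, simplified tree for: function sum(a, b) { return a + b; }
const sumTree = {
  type: "FunctionDeclaration",
  name: "sum",
  params: ["a", "b"],
  body: {
    type: "ReturnStatement",
    argument: {
      type: "BinaryExpression",
      operator: "+",
      left: { type: "Identifier", name: "a" },
      right: { type: "Identifier", name: "b" },
    },
  },
};

// Visit every object that carries a string `type` field, depth-first.
function walk(node, visit) {
  if (node === null || typeof node !== "object") return;
  if (typeof node.type === "string") visit(node);
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) value.forEach((child) => walk(child, visit));
    else walk(value, visit);
  }
}

const seen = [];
walk(sumTree, (node) => seen.push(node.type));
console.log(seen);
// FunctionDeclaration, ReturnStatement, BinaryExpression, Identifier, Identifier
```

A scanner, linter, or formatter is essentially this loop with a more interesting `visit` callback.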

The question I eventually asked myself: if every serious tool does this for human-written code, why was I letting AI-generated code stay as raw text?

The problem with raw AI-generated code

When a model generates a function, it's producing text. You can parse and lint that text, but you often can't tell what it really does until you execute it, and by then it's already running in your process.

Here's the kind of thing that can happen:

// The model isn't being malicious — it's pattern-matching
// on "reading config" from its training data
const fs = require('fs');
const config = JSON.parse(fs.readFileSync('/etc/secrets.json'));

function calculate(price) {
  return price * config.taxRate;
}

Looks like a reasonable calculation. But if your system runs it, you've just done a file system read inside what was supposed to be simple business logic. Multiply this across hundreds of dynamically generated functions, and the attack surface becomes enormous.

And that's before accounting for the consistency problem. Ask the same model to generate equivalent logic twice and you'll often get structurally different code — different variable names, different control flow, sometimes different behavior in edge cases. There's no stable contract.

Letting the model generate AST instead

The shift is conceptually simple: instead of generating code, the model generates a data structure that describes what the code should do. Your system then compiles that structure into actual executable code — under complete control, with full validation before anything runs.

The same price * quantity logic that used to be a function now looks like this:

{
  "type": "BinaryExpression",
  "operator": "*",
  "left":  { "type": "Identifier", "name": "price" },
  "right": { "type": "Identifier", "name": "quantity" }
}

The model is no longer writing code. It's describing the shape of logic. Your system receives that description, validates every node against a schema, checks the tree for disallowed patterns, and only then produces a function — using a code generator that you wrote and fully understand.
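As a rough illustration of that last step, here is a minimal sketch of a code generator for the tree above. It assumes only three node types (`Literal`, `Identifier`, `BinaryExpression`) and four arithmetic operators; `emitExpression` and `compile` are hypothetical names, and a real generator would cover more nodes and run after schema validation.

```javascript
const BINARY_OPERATORS = new Set(["+", "-", "*", "/"]);
const IDENTIFIER_PATTERN = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Turn one node into a JavaScript expression string, rejecting anything unexpected.
function emitExpression(node, params) {
  if (node === null || typeof node !== "object") {
    throw new Error("Expected a node object");
  }
  switch (node.type) {
    case "Literal":
      if (typeof node.value !== "number" || !Number.isFinite(node.value)) {
        throw new Error("Only finite numeric literals are supported");
      }
      return `(${node.value})`;
    case "Identifier":
      // Only names passed in as parameters may be referenced.
      if (!params.includes(node.name)) {
        throw new Error(`Unknown identifier: ${String(node.name)}`);
      }
      return node.name;
    case "BinaryExpression":
      if (!BINARY_OPERATORS.has(node.operator)) {
        throw new Error(`Unsupported operator: ${String(node.operator)}`);
      }
      return `(${emitExpression(node.left, params)} ${node.operator} ${emitExpression(node.right, params)})`;
    default:
      throw new Error(`Unsupported node type: ${String(node.type)}`);
  }
}

// Build a function whose source text comes entirely from emitExpression.
function compile(ast, params) {
  for (const name of params) {
    if (!IDENTIFIER_PATTERN.test(name)) throw new Error(`Invalid parameter name: ${name}`);
  }
  return new Function(...params, `"use strict"; return ${emitExpression(ast, params)};`);
}

const modelOutput = `{
  "type": "BinaryExpression",
  "operator": "*",
  "left":  { "type": "Identifier", "name": "price" },
  "right": { "type": "Identifier", "name": "quantity" }
}`;
const calculateTotal = compile(JSON.parse(modelOutput), ["price", "quantity"]);
console.log(calculateTotal(20, 3)); // 60
```

The only strings that ever reach `new Function` are ones the generator assembled itself from checked pieces: parameter names, numeric literals, and operators from a fixed set.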

The architecture that held up

After a few iterations, the pipeline stabilized into something clean:

User request (natural language)
        ↓
LLM generates AST (structured JSON)
        ↓
Schema validation          ← reject malformed trees
        ↓
Security checks            ← node allowlist, depth limits
        ↓
Code generator             ← controlled compilation
        ↓
Executable function

The critical insight: the model is never, at any point, producing something that runs directly. There are two independent checkpoints between model output and execution — and the code that runs is always generated by your system, not the AI.
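The pipeline can be sketched in a few dozen lines. Everything here is an illustrative assumption: the three-node schema, the depth limit of 8, the function names, and in particular the `generate` stage, which stands in for a real code generator by returning a closure that interprets the checked tree.

```javascript
const OPERATORS = new Map([
  ["+", (a, b) => a + b],
  ["-", (a, b) => a - b],
  ["*", (a, b) => a * b],
  ["/", (a, b) => a / b],
]);

// Required fields per node type: a tiny stand-in for a real JSON schema.
const NODE_SHAPES = new Map([
  ["Literal", { value: "number" }],
  ["Identifier", { name: "string" }],
  ["BinaryExpression", { operator: "operator", left: "node", right: "node" }],
]);

// Checkpoint 1: structural validation. Returns a list of error strings.
function validateSchema(node, path = "root", errors = []) {
  const isObject = node !== null && typeof node === "object";
  const shape = isObject ? NODE_SHAPES.get(node.type) : undefined;
  if (!shape) {
    errors.push(`${path}: not a recognized node`);
    return errors;
  }
  for (const [field, kind] of Object.entries(shape)) {
    const value = node[field];
    if (kind === "node") validateSchema(value, `${path}.${field}`, errors);
    else if (kind === "operator" ? !OPERATORS.has(value) : typeof value !== kind) {
      errors.push(`${path}.${field}: expected ${kind}`);
    }
  }
  return errors;
}

// Checkpoint 2: one example security rule, a cap on nesting depth.
function checkDepth(ast, maxDepth = 8) {
  const depthOf = (node) =>
    node.type === "BinaryExpression"
      ? 1 + Math.max(depthOf(node.left), depthOf(node.right))
      : 1;
  return depthOf(ast) > maxDepth ? [`tree is deeper than ${maxDepth} levels`] : [];
}

// Stand-in for the code generator: a closure that interprets the checked tree.
function generate(ast, params) {
  const run = (node, env) => {
    if (node.type === "Literal") return node.value;
    if (node.type === "Identifier") return env.get(node.name);
    return OPERATORS.get(node.operator)(run(node.left, env), run(node.right, env));
  };
  return (...values) => run(ast, new Map(params.map((name, i) => [name, values[i]])));
}

// Run the stages in order; the first stage that reports errors stops the pipeline.
function buildFunction(rawModelOutput, params) {
  let ast;
  try {
    ast = JSON.parse(rawModelOutput);
  } catch (err) {
    return { ok: false, stage: "parse", errors: [err.message] };
  }
  for (const [stage, check] of [["schema", validateSchema], ["security", checkDepth]]) {
    const errors = check(ast);
    if (errors.length > 0) return { ok: false, stage, errors };
  }
  return { ok: true, fn: generate(ast, params) };
}

const good = buildFunction(
  '{"type":"BinaryExpression","operator":"*","left":{"type":"Identifier","name":"price"},"right":{"type":"Identifier","name":"quantity"}}',
  ["price", "quantity"]
);
console.log(good.ok && good.fn(20, 3)); // 60

const bad = buildFunction('{"type":"CallExpression","callee":"eval"}', []);
console.log(bad.stage, bad.errors); // rejected at the "schema" stage
```

Returning the failing stage and its error list, rather than throwing, keeps rejections easy to log and to feed back into a follow-up prompt.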

Why security becomes easy

Once logic is a tree, enforcing security constraints becomes a tree traversal problem. You define what's allowed; everything else is rejected before compilation even begins.

Allowed node types:

BinaryExpression
IfStatement
LogicalExpression
ReturnStatement
CallNode (pre-approved built-ins only)
Identifier, Literal, ConditionalExpression

Blocked node types and patterns:

FunctionDeclaration
ImportDeclaration
CallExpression with eval
MemberExpression accessing fs, process, global
WhileStatement (unbounded loops)
NewExpression, AwaitExpression, ThrowStatement

You can also enforce structural constraints that are awkward to check reliably on raw text: maximum tree depth, maximum node count, an explicit allowlist of which identifiers a function may reference. If a generated AST tries to use a variable that wasn't passed in as an argument, it fails validation before touching the runtime.
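Those checks fit in a single traversal. The sketch below is one possible shape for it, not a definitive implementation: the limits (`maxDepth` of 16, `maxNodes` of 200) and the names `checkSecurity` and `childrenOf` are assumptions, and `CallNode` is left out because its inputs would need their own rule.

```javascript
// Node types this sketch accepts; everything else is rejected.
const ALLOWED_NODE_TYPES = new Set([
  "BinaryExpression", "LogicalExpression", "ConditionalExpression",
  "IfStatement", "ReturnStatement", "Identifier", "Literal",
]);

// Child nodes are any object-valued fields (arrays are flattened one level).
function childrenOf(node) {
  return Object.values(node)
    .flatMap((value) => (Array.isArray(value) ? value : [value]))
    .filter((value) => value !== null && typeof value === "object");
}

// Walk the tree once and return a list of violations (empty means it passed).
function checkSecurity(ast, { allowedIdentifiers, maxDepth = 16, maxNodes = 200 }) {
  const allowed = new Set(allowedIdentifiers);
  const errors = [];
  let nodeCount = 0;
  let limitHit = false;

  function visit(node, depth) {
    nodeCount += 1;
    if (depth > maxDepth || nodeCount > maxNodes) {
      limitHit = true;
      return;
    }
    if (!ALLOWED_NODE_TYPES.has(node.type)) {
      errors.push(`disallowed node type: ${String(node.type)}`);
      return; // do not descend into a rejected subtree
    }
    if (node.type === "Identifier" && !allowed.has(node.name)) {
      errors.push(`unknown identifier: ${String(node.name)}`);
    }
    childrenOf(node).forEach((child) => visit(child, depth + 1));
  }

  if (ast === null || typeof ast !== "object") return ["root is not a node object"];
  visit(ast, 1);
  if (limitHit) errors.push("tree exceeds the depth or node-count limit");
  return errors;
}

const options = { allowedIdentifiers: ["price", "quantity"] };

const safe = {
  type: "BinaryExpression",
  operator: "*",
  left: { type: "Identifier", name: "price" },
  right: { type: "Identifier", name: "quantity" },
};
console.log(checkSecurity(safe, options)); // no violations

const unsafe = {
  type: "BinaryExpression",
  operator: "*",
  left: { type: "Identifier", name: "price" },
  right: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "process" },
    property: "env",
  },
};
console.log(checkSecurity(unsafe, options)); // one violation: MemberExpression
```

Because the check is an allowlist, nothing has to anticipate `eval` or `fs` specifically; any node type the sketch doesn't recognize is rejected by default.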

Reusable nodes: the real power move

Once you have an AST-based pipeline, a natural next step is exposing pre-built nodes that the model can reference but never reimplement. Think of them as a safe standard library — each node has a vetted internal implementation, and the model only decides how they're composed.

Your system might expose built-in nodes like:

CalculateDiscount
ValidateEmail
NormalizeCurrency
CheckInventory

Instead of generating the discount logic from scratch, the model produces:

{
  "type": "CallNode",
  "node": "CalculateDiscount",
  "inputs": {
    "price": "price",
    "discount": "discountRate"
  }
}

The model isn't writing the discount calculation. It's declaring that the calculation should happen and specifying the wiring. This has a few compounding benefits:

Security — the model can't inject unsafe logic because implementations are controlled by your system
Consistency — the same operation behaves identically across every generated function that uses it
Simpler reasoning — the model only needs to decide which nodes to use and how to connect them, not how to implement each one

It starts to look a lot like how real software is built with libraries. The AST is just the graph that describes how the pieces connect.
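A minimal sketch of such a registry follows. The implementations of `CalculateDiscount` and `ValidateEmail` are invented for illustration (the discount is assumed to be a fraction between 0 and 1, and the email check is a deliberately loose pattern), and `runCallNode` is a hypothetical name.

```javascript
// Vetted implementations live in the system; the model only refers to them by name.
const BUILTIN_NODES = new Map([
  ["CalculateDiscount", {
    inputs: ["price", "discount"],
    run: ({ price, discount }) => price * (1 - discount),
  }],
  ["ValidateEmail", {
    inputs: ["email"],
    run: ({ email }) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email),
  }],
]);

// Resolve a CallNode against the registry and the caller's arguments.
function runCallNode(callNode, args) {
  const builtin = BUILTIN_NODES.get(callNode.node);
  if (!builtin) throw new Error(`Unknown built-in node: ${String(callNode.node)}`);

  const wiring = callNode.inputs ?? {};
  const resolved = {};
  for (const inputName of builtin.inputs) {
    const argName = wiring[inputName];
    // Every declared input must be wired to an argument the caller supplied.
    if (!Object.prototype.hasOwnProperty.call(args, argName)) {
      throw new Error(`Input "${inputName}" is wired to missing argument "${String(argName)}"`);
    }
    resolved[inputName] = args[argName];
  }
  return builtin.run(resolved);
}

const callNode = {
  type: "CallNode",
  node: "CalculateDiscount",
  inputs: { price: "price", discount: "discountRate" },
};
console.log(runCallNode(callNode, { price: 200, discountRate: 0.25 })); // 150
```

The model's output only selects a registry entry and names the wiring; there is no path from the JSON to code the system didn't write.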

Language independence as a bonus

The AST isn't tied to any one language. It's a description of logic, not a JavaScript function or a Rust closure or a Python method. That means the same AI-generated tree can be compiled into different targets just by swapping the code generator.

Same AST → JavaScript:

function applyRule(price, quantity) {
  if (quantity > 10) {
    return price * quantity * 0.9;
  }
  return price * quantity;
}

Same AST → Rust:

fn apply_rule(price: f64, quantity: u32) -> f64 {
    if quantity > 10 {
        price * quantity as f64 * 0.9
    } else {
        price * quantity as f64
    }
}

In my case, this wasn't the primary motivation — but once it came for free, it opened up possibilities I hadn't planned for. The AI generates the logic once; the system decides where it runs.
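Swapping the generator can be sketched as a table of per-target rendering rules over a shared tree walk. This is a simplified illustration with its own assumptions: the rule is modeled as a single `ConditionalExpression` rather than an `IfStatement`, every Rust parameter is typed as `f64` (so no `u32` cast as in the hand-written version above), and the names `TARGETS`, `emit`, and `generate` are hypothetical.

```javascript
// Small helpers to keep the example tree readable.
const id = (name) => ({ type: "Identifier", name });
const num = (value) => ({ type: "Literal", value });
const bin = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });

const total = bin("*", id("price"), id("quantity"));
const rule = {
  type: "ConditionalExpression",
  test: bin(">", id("quantity"), num(10)),
  consequent: bin("*", total, num(0.9)),
  alternate: total,
};

// Each target supplies its own literals, conditionals, and function wrapper.
const TARGETS = new Map([
  ["javascript", {
    literal: (v) => String(v),
    conditional: (t, c, a) => `(${t} ? ${c} : ${a})`,
    wrap: (name, params, body) =>
      `function ${name}(${params.join(", ")}) { return ${body}; }`,
  }],
  ["rust", {
    // Render whole numbers as floats so they type-check against f64.
    literal: (v) => (Number.isInteger(v) ? v.toFixed(1) : String(v)),
    conditional: (t, c, a) => `(if ${t} { ${c} } else { ${a} })`,
    wrap: (name, params, body) => {
      const snake = name.replace(/[A-Z]/g, (c) => `_${c.toLowerCase()}`);
      const typed = params.map((p) => `${p}: f64`).join(", ");
      return `fn ${snake}(${typed}) -> f64 { ${body} }`;
    },
  }],
]);

// Shared tree walk; only the target-specific pieces differ.
function emit(node, target) {
  switch (node.type) {
    case "Literal":
      return target.literal(node.value);
    case "Identifier":
      return node.name;
    case "BinaryExpression":
      return `(${emit(node.left, target)} ${node.operator} ${emit(node.right, target)})`;
    case "ConditionalExpression":
      return target.conditional(
        emit(node.test, target),
        emit(node.consequent, target),
        emit(node.alternate, target)
      );
    default:
      throw new Error(`Unsupported node type: ${String(node.type)}`);
  }
}

function generate(ast, name, params, language) {
  const target = TARGETS.get(language);
  if (!target) throw new Error(`Unknown target language: ${language}`);
  return target.wrap(name, params, emit(ast, target));
}

const jsSource = generate(rule, "applyRule", ["price", "quantity"], "javascript");
const rustSource = generate(rule, "applyRule", ["price", "quantity"], "rust");
console.log(jsSource);
console.log(rustSource);
```

The emitted source is fully parenthesized and less pretty than hand-written code, which is a reasonable trade for a generator this small; a formatter can clean it up afterwards.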

What this resembles, on reflection

Compilers cracked this problem decades ago. Source code as text is just a convenient input format for humans. The moment a compiler gets hold of it, the text is converted to an AST, analyzed, optimized, and compiled into something entirely different. What finally executes is derived from the tree, not from the raw text.

Programs shouldn't remain as text for long. They should become structured representations as early as possible. AI-generated logic is no different — the sooner it becomes a tree you can inspect, the sooner you can trust it.

What surprised me most about this architecture wasn't the security improvement, though that was significant. It was the predictability. Once the model was generating AST instead of code, the outputs became remarkably stable. The same logical requirement produced structurally equivalent trees. Edge cases that used to produce bizarre JavaScript now produced schema errors that were easy to catch and re-prompt for.

The model turned out to be quite good at describing logic structure. It was the job of emitting valid, safe, consistent code that it was bad at — and that was always going to be the system's job anyway.

The shift from code generation to structure generation didn't require a new model, a new framework, or any AI-specific tooling. It just required treating AI output the same way compilers have always treated source code: as a starting point, not a final form.