Onyedikachi Ejim

Posted on Jan 28

From Prompts to Programs: The Promise and Problem of AI-Generated Code

#ai #security #programming #python

Over the course of my AI engineering journey (20+ days and counting), I’ve seen just how many possibilities exist when you start working closely with large language models.

At first glance, LLMs don’t seem that magical.

You send a prompt -> tokens are generated -> text comes back.

We’ve been doing some version of this for years now. Just better models, better refinement, better UX.

But things get really interesting when you stop treating an LLM as just a text generator and start embedding it inside a system.

When LLMs Stop Talking and Start Doing

The real power shows up when an LLM’s output is no longer the final result, but an instruction for something else to happen.

Generate code
Trigger workflows
Transform files
Call tools
Execute logic

Once you let model outputs drive actions, you open the door to a completely different class of applications.

That shift hit me hard around Day 10 of my AI engineering journey, when we covered code generation with structured outputs.

Structured Output: Forcing the Model to Behave

The idea was simple:

Instead of letting the model return any text, you:

Define a structure (schema, format, contract)

Tell the model exactly what the output must look like

Reject anything that doesn’t comply

Now you’re not just “asking for code”, you’re constraining how code is generated.

As I went through the lessons and tasks, my brain immediately jumped to a bigger idea.

The “What If” Moment

What if I built a system where:

A user describes a problem in plain English

The system has no prebuilt feature for that problem

The LLM generates code on the fly based on the request

The code runs and solves a real-world task

Example:

A user uploads an Excel file and says:

“I want this reorganized, grouped, and summarized in a specific way.”

My app doesn’t support this feature at all.

But instead of saying “Sorry, not supported”, the system:

Interprets the request

Generates a custom script

Runs it

Returns the result

That felt… powerful.
Almost too powerful.

And Then Security Enters the Room

That excitement didn’t last long 😅

Because the next question immediately became:

How do you make this safe?

Once you allow:

Dynamic code generation

Execution based on user input

Open-ended instructions

You’re basically inviting abuse.

Prompt injection
Code injection
Escaping sandboxes
Resource exhaustion
Unintended file access
System manipulation

And that’s just the obvious stuff.

Guardrails Everywhere… and the Cost of Them

Naturally, I started thinking about defenses:

Prompt guardrails
Input validation
Keyword blocking
Delimiters and escaping
Schema enforcement
Allowlists
Sandboxing
Adversarial testing

But the more I thought about it, the clearer something became:

Every layer of protection limits the model’s freedom.

And here’s the uncomfortable truth I ran into:

If you already know exactly what code can be generated,
and exactly how it should behave,
why not just write the code yourself?

The only scenario where this system truly makes sense is the most dangerous one:

You don’t know what code will be generated

The schema is created dynamically

Guards are applied dynamically

Code is generated and executed without prior knowledge of the steps

That’s where the real value is.
And that’s also where the real risk lives.

The Hidden Cost: Validation at Scale

Another thought hit me while learning about prompt injection attacks.

There are so many of them.
I’ve already seen more than 10, and I can think of even more.

Each one adds:

Another check
Another regex
Another condition
Another validation pass Now imagine:

20+ validations per request

Multiple users hitting your system simultaneously

What does that do to:

Latency?
Cost?
Complexity?
Reliability?

This is where risk prioritization starts to matter more than perfection.

The Big Takeaway (So Far)

What I’m enjoying most about this journey is how every lesson leads to another question.

You start with:

“Can we do this?”

Then quickly move to:

“Should we do this?”
“At what cost?”
“And for whom?”

LLMs don’t just force you to think about intelligence —
they force you to think about systems, trade-offs, and responsibility.

And honestly?
That’s what’s making AI engineering genuinely exciting for me.

If you’re building systems where models don’t just respond, but act, security isn’t an add-on.

It’s the design.

And I’m still learning how to get that balance right.

DEV Community