Imran Siddique

Posted on Jan 1 • Originally published at Medium on Dec 30, 2025

The Self-Evolving Agent (Part 3): The Human in the Loop

#architecture #aigovernance #evaluation #engineeringleadershi

In Part 1 (Beyond Fine-Tuning: Architecting the Self-Evolving Agent), we built the “Brain”: an Async Observer that learns from experience.

In Part 2 (The Self-Evolving Agent (Part 2): Engineering the Signal), we built the “Eyes”: a Signal Engineering system that detects friction without asking for feedback.

This brings us to the final, existential question: If the system observes itself, learns from itself, and manages its own rollout… what do we do?

Do we just sit on the beach? No.

The role of the Senior Engineer doesn’t disappear; it elevates. We move from being the Authors of the code to being the Curators of the system. We stop micromanaging the syntax and start managing the boundaries.

Here is the “Scale by Subtraction” guide to the new Human Role: Subtract the routine, add the rigor.

1. The “Wisdom Curator” (Reviewing Design, Not Syntax)

The Old World:

“I need to review every Pull Request line-by-line. I need to check for missing semicolons, variable naming conventions, and simple logic bugs.”

The Engineering Reality:

This is low-leverage toil. The AI (and the compiler) can handle the syntax.

When you review an AI Agent’s work, you shouldn’t be looking at the variable names. You should be looking at the Alignment.

The Human Role shifts to High-Level Verification:

The “Design Check”: I don’t care how you wrote the function. I care: Did this implementation actually match the Architectural Design Proposal we agreed on?
The “Strategic Sample”: You cannot review 10,000 AI interactions a day. Instead, you review a random sample of 50 to check the “Vibe” and Strategy.
The “Policy Review”: The most critical review is no longer of the code; it is of the Memory. If the “Async Observer” (from Part 1) wants to save a new lesson saying, “Always ignore 500 errors to keep the user happy,” a Human must reject that policy.
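To make the “Strategic Sample” and the “Policy Review” concrete, here is a minimal sketch in Python. The names (Lesson, strategic_sample, review_lesson, commit_to_memory) are hypothetical and are not taken from Part 1; the only assumption is that a proposed lesson stays pending until a human decides on it.

```python
import random
from dataclasses import dataclass


@dataclass
class Lesson:
    """A lesson the Async Observer proposes to save (hypothetical shape)."""
    text: str
    status: str = "pending"  # pending -> approved | rejected


def strategic_sample(interaction_ids, k=50, seed=None):
    """Pick k interactions uniformly at random for human review."""
    rng = random.Random(seed)
    ids = list(interaction_ids)
    return rng.sample(ids, min(k, len(ids)))


def review_lesson(lesson, approve):
    """Record the human curator's decision on a proposed lesson."""
    lesson.status = "approved" if approve else "rejected"
    return lesson


def commit_to_memory(memory, lesson):
    """Only lessons a human approved are allowed into memory."""
    if lesson.status != "approved":
        raise PermissionError("lesson has not been approved by a human curator")
    memory.append(lesson.text)


# 10,000 interactions in a day, 50 pulled for review.
sample = strategic_sample(range(10_000), k=50, seed=7)

# The curator rejects the dangerous lesson, so it can never be committed.
memory = []
bad = Lesson("Always ignore 500 errors to keep the user happy")
review_lesson(bad, approve=False)
```

The point of the design is that the write path to memory fails closed: a lesson that nobody reviewed is treated the same as one that was rejected.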

The Lesson:

We stop being Editors (fixing grammar) and become Curators (approving the knowledge).

2. Evaluation Engineering (The New TDD)

The Old World:

“I write the code, then I write a unit test to prove it works.”

The Engineering Reality:

In a probabilistic world, you can’t write a unit test that covers every creative variation of an AI’s answer.

If the AI is the “Coder,” the Human is the “Examiner.” We don’t write the implementation anymore. We write the Exam.

This is Evaluation Engineering, the most valuable code a Senior Engineer writes today.

Instead of writing the function parseDate(), you write a dataset of 50 tricky, malformed date strings and the expected output.
You build the Scoring Rubric: “If the answer is correct but rude, score 5/10. If incorrect but polite, score 0/10.”

This is the evolution of TDD (Test Driven Development) into Eval-DD: You write the “Golden Dataset” first and let the AI iterate until it scores >90% against your rubric.
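A rough sketch of what such an exam could look like in Python follows. Everything here is illustrative: the six golden cases stand in for the 50 you would actually write, candidate_parse_date is a stand-in for whatever the AI produces, and the 10/10 score for “correct and polite” is an assumption, since the rubric above only states the 5/10 and 0/10 cases.

```python
from datetime import datetime

# Hypothetical golden dataset: (raw input, expected ISO date or None).
GOLDEN = [
    ("2025-12-30", "2025-12-30"),
    ("30/12/2025", "2025-12-30"),
    (" 20251230 ", "2025-12-30"),
    ("2025-02-30", None),   # February 30th does not exist
    ("", None),
    ("not a date", None),
]


def candidate_parse_date(text):
    """Stand-in for the AI-generated implementation under test."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def evaluate(parse, dataset):
    """Fraction of golden cases the candidate gets exactly right."""
    passed = sum(1 for raw, expected in dataset if parse(raw) == expected)
    return passed / len(dataset)


def rubric_score(correct, polite):
    """Rubric from the text; the 10 for correct-and-polite is an assumption."""
    if not correct:
        return 0
    return 10 if polite else 5


score = evaluate(candidate_parse_date, GOLDEN)
accepted = score > 0.90  # the >90% bar from the text
```

Note that the human-written part is GOLDEN, evaluate, and rubric_score. The candidate can be regenerated as many times as needed without anyone touching the exam.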

The Lesson:

The “Source Code” of the future isn’t the application logic; it’s the Evaluation Suite that constrains it.

3. Constraint Engineering (The Logic Firewall)

The Old World:

“Prompt Engineering. We need to find the perfect magic words to tell the AI not to delete the database.”

The Engineering Reality:

Prompting is fragile. A “jailbreak” can bypass your polite instructions in seconds.

We cannot rely on the AI’s “self-control” for safety. We need a Logic Firewall.

The Architecture:

We treat Constraint Engineering as a distinct, deterministic architectural layer.

The Brain (LLM): Generates a Plan (e.g., “I will query the DB and email the user”).
The Firewall (Constraint Engine): This is a deterministic code layer (Python/Go/C#). It intercepts the Plan before execution.
    Check: Does this SQL query contain DROP TABLE?
    Check: Is the user allowed to email this domain?
    Check: Is the cost of this action < $0.05?
The Hand (Executor): Only if the Firewall approves, the action is executed.

This allows us to use “Wild/Creative” models for the Brain (High Temperature) because we have a “Strict/Boring” Firewall guarding the door.
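The Brain / Firewall / Hand split can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the Action shape, the example.com allow-list, and the function names are all hypothetical, and a regex blocklist is only the simplest possible stand-in for a real SQL policy check.

```python
import re
from dataclasses import dataclass


@dataclass
class Action:
    """One step of the LLM's plan (hypothetical shape)."""
    kind: str                       # "sql" or "email"
    payload: str                    # SQL text or recipient address
    estimated_cost_usd: float = 0.0


ALLOWED_EMAIL_DOMAINS = {"example.com"}   # hypothetical allow-list
MAX_COST_USD = 0.05                       # each action must cost < $0.05
FORBIDDEN_SQL = re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE)


def check_action(action):
    """Return a list of violations; an empty list means approved."""
    violations = []
    if action.kind == "sql" and FORBIDDEN_SQL.search(action.payload):
        violations.append("SQL contains DROP TABLE")
    if action.kind == "email":
        domain = action.payload.rsplit("@", 1)[-1].lower()
        if domain not in ALLOWED_EMAIL_DOMAINS:
            violations.append(f"email domain not allowed: {domain}")
    if action.estimated_cost_usd >= MAX_COST_USD:
        violations.append("cost limit exceeded")
    return violations


def execute_plan(plan, executor):
    """Run the plan only if every action passes; otherwise run nothing."""
    problems = [v for action in plan for v in check_action(action)]
    if problems:
        return False, problems
    for action in plan:
        executor(action)
    return True, []


executed = []  # stands in for the Hand (Executor)
safe_plan = [
    Action("sql", "SELECT email FROM users WHERE id = 42", 0.01),
    Action("email", "user@example.com", 0.01),
]
unsafe_plan = [Action("sql", "drop table users", 0.01)]

ok, _ = execute_plan(safe_plan, executed.append)
blocked_ok, reasons = execute_plan(unsafe_plan, executed.append)
```

Because the whole plan is validated before the first action runs, a single violation blocks everything, so a half-executed plan is never left behind.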

The Lesson:

Never let the AI touch the infrastructure directly. The Human builds the walls; the AI plays inside them.

Conclusion: The Architect’s New Job

The fear that “AI will replace engineers” is based on the idea that engineering is just typing code.

If your job is just typing syntax, yes, you are obsolete.

But if your job is Architecture, Strategy, and Safety, you have never been more valuable.

We Curate the Wisdom (ensuring the AI learns the right lessons).
We Engineer the Evaluations (defining what “Good” looks like).
We Build the Constraints (ensuring the system can’t destroy itself).

We are no longer the builders of the machine. We are the architects of its evolution.

Originally published at https://www.linkedin.com.
