Pascal CESCATO

Building with AI without losing control: Claude, Mistral, and me

TL;DR

This isn't really about WordPress or a security plugin. It's about how to effectively collaborate with AI without becoming a blind copy-paster. My lab: refactoring a plugin used on 6000+ sites. My AI partners: Claude and Mistral. Result: 1 hour instead of 10-15 hours, with better code quality than I would have written alone. The secret? Cross-validation and human-in-the-loop.

The real story: It's about the collaboration method

The setup:

  • Lab: Refactoring a WordPress plugin (6000+ active installs)
  • AI Partners: Claude (Anthropic) and Mistral AI
  • My role: Conductor, validator, final decision-maker

The results:
✅ 1 hour of development instead of 10-15 hours
✅ 100% compliant with WordPress standards
✅ Zero regressions on existing sites
✅ Bugs detected that I would have missed
✅ Architecture I wouldn't have dared attempt manually

The secret? Not the AI. Not the human. The collaboration method.

What you'll discover:

  • How to use two AIs in cross-validation
  • Why Mistral proposed buggy code (and how Claude caught it)
  • The human-in-the-loop workflow that changes everything
  • The 40 invisible errors AI fixed in one pass
  • What only humans can (and must) do

Spoiler: The WordPress plugin is just a pretext. The real story is how to orchestrate multiple AIs to multiply your productivity by 20 without sacrificing quality.

Act 1: Mistral analyzes (without being asked)

One day, out of curiosity, I presented my WordPress plugin to Mistral AI:

My message (not a request, just a presentation):

"Developed a WordPress plugin - blocks XML-RPC attacks while allowing Jetpack - 6000+ active sites - zero configuration"

[19 lines of code]

I didn't ask for anything. Just presented my work.

Mistral's response: Complete unsolicited analysis

Mistral responded with a thorough technical analysis I hadn't requested:

"Here's a technical and strategic analysis of your plugin, plus improvements to make it more robust while keeping its simplicity."

Strengths identified:
✅ Simplicity and code efficiency
✅ Precise targeting of dangerous methods
✅ Jetpack/WooCommerce compatibility preserved
✅ Optimal performance

Weaknesses detected:
⚠️ Blacklist vs whitelist approach
⚠️ No attack logging
⚠️ X-Pingback header could be re-added
⚠️ Cache plugin compatibility uncertain

And most importantly: Mistral proposed a complete "optimized" version with OOP architecture, admin interface, three security modes, and optional logging.

My first reaction: 🤔 "Interesting... but is it really better?"

With 6000 sites in production, I couldn't just copy-paste AI-generated code, no matter how convincing.

This is where the real story begins: cross-validation.

Act 2: Claude detects what Mistral missed

Facing this impressive but unsolicited proposal, I had the cautious developer's reflex: ask for a second opinion.

Cross-validation in action

I took Mistral's proposed code and submitted it to Claude (Anthropic) with a methodical approach:

My question to Claude:

"Can you compare these 2 approaches for a WordPress plugin?"

[Original code] vs [Mistral's proposed code]

What Claude immediately detected

Critical bug #1: Filter conflicts

// Line 13: Option to disable XML-RPC
add_filter('xmlrpc_enabled', '__return_false');

// Line 57: Filter that ALWAYS runs (CONFLICT!)
add_filter('xmlrpc_enabled', function ($enabled) {
    if (!is_user_logged_in()) return false;
    return $enabled;
});

Problem detected by Claude: The two filters contradict each other. The second makes the first completely useless.

Mistral hadn't seen this logical conflict.
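To make the fix concrete, here is a minimal sketch of the kind of correction Claude suggested: consolidate the logic into a single filter so there is one source of truth. This is a hypothetical reconstruction, not the plugin's actual code, and the option name `my_plugin_disable_xmlrpc` is invented for illustration:

```php
// Hypothetical consolidated version: one xmlrpc_enabled filter instead of two
// conflicting ones. The option name is assumed, not the plugin's real one.
add_filter('xmlrpc_enabled', function ($enabled) {
    // Hard-disable mode: an explicit setting always wins.
    if (get_option('my_plugin_disable_xmlrpc', false)) {
        return false;
    }
    // Otherwise keep Mistral's original logged-in check.
    if (!is_user_logged_in()) {
        return false;
    }
    return $enabled;
});
```

With a single registration point, the two behaviors can no longer silently override each other.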

Critical bug #2: Broken logging

$blocked = array_diff(array_keys($methods), array_keys($methods));
if (!empty($blocked)) {
    error_log('Blocked methods: ' . implode(', ', $blocked));
}

Problem detected by Claude: array_diff() of an array with itself always returns an empty array! Logging will never work.

Mistral had proposed this feature... that didn't work.
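A working version has to capture the method list before filtering and diff it against the list after. The sketch below is a hypothetical reconstruction (the two `unset()` targets are example methods, not necessarily the ones the plugin blocks):

```php
// Hypothetical fix: diff the method list before and after filtering,
// instead of diffing an array with itself (which is always empty).
add_filter('xmlrpc_methods', function ($methods) {
    $original = array_keys($methods);

    // Example targets only; the real plugin's blocklist may differ.
    unset($methods['pingback.ping'], $methods['system.multicall']);

    $blocked = array_diff($original, array_keys($methods));
    if (!empty($blocked)) {
        error_log('Blocked XML-RPC methods: ' . implode(', ', $blocked));
    }
    return $methods;
});
```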

Critical bug #3: Performance overhead

add_filter('rewrite_rules_array', function ($rules) {
    foreach ($rules as $rule => $rewrite) {
        if (strpos($rewrite, 'xmlrpc.php') !== false) {
            unset($rules[$rule]);
        }
    }
    return $rules;
});

Problem detected by Claude: This filter loops through ALL WordPress rewrite rules every time they are rebuilt, and its closure is registered on every request. Expensive and unnecessary.

Mistral thought it was improving security, but at the cost of performance.

The finding: Mistral was right about the concept, wrong about the implementation

What Mistral did well:
✅ Identified the need for flexibility
✅ Proposed modern architecture
✅ Suggested relevant features
✅ Correct strategic vision

What Mistral missed:
❌ Critical implementation bugs
❌ Logical conflicts in code
❌ WordPress standards not respected
❌ Performance impacts not anticipated

What Claude brought:
✅ Precise bug detection
✅ Understanding of WordPress standards
✅ Correct implementation solutions
✅ Pedagogical explanations for each problem

Act 3: The human conducts and decides

From there, development became a real ping-pong between the two AIs and me through 5 iterations, each tested with WP_DEBUG enabled.

My role in this orchestration

What I did NOT do:
❌ Write architecture from scratch
❌ Manually implement admin interface
❌ Search WordPress documentation for hours
❌ Do trial & error on hooks

What I DID (and only humans can do):
✅ Decide which of Mistral's proposals to accept or reject
✅ Arbitrate between Claude's suggestions and my constraints
✅ Test at each step with WP_DEBUG
✅ Understand impact on 6000 production sites
✅ Maintain "zero configuration by default" philosophy
✅ Validate that added complexity brings real value

AI proposes. Humans decide.

The revelation: 40 invisible errors

After several iterations and all local tests passing, moment of truth: submit to WordPress.org Plugin Check.

First attempt: The shock

❌ 40+ ERRORS AND WARNINGS

Stunned. The plugin worked perfectly locally:

✅ WP_DEBUG: 0 errors
✅ Functional tests: OK
✅ Jetpack compatibility: OK
✅ Performance: Excellent

But WordPress.org categorically refused it.

The errors neither Mistral, nor Claude, nor I had seen

Main issues:

  1. Text domain must be a literal string, not a constant
  2. All outputs must be escaped (esc_html_e instead of _e)
  3. load_plugin_textdomain() unnecessary since WP 4.6 (translations load automatically)
  4. Missing translator comments for sprintf() placeholders
  5. $_SERVER['REMOTE_ADDR'] must be sanitized

None of the AIs knew all these strict rules.
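To make the list above concrete, here is a hedged before/after sketch of the kinds of fixes Plugin Check demands. The text domain, strings, and variable names are invented for illustration; the plugin's real identifiers may differ:

```php
// Superglobals must be unslashed and sanitized before use (item 5):
$ip = isset($_SERVER['REMOTE_ADDR'])
    ? sanitize_text_field(wp_unslash($_SERVER['REMOTE_ADDR']))
    : '';

// Output must be escaped, and the text domain must be a literal string,
// never a constant (items 1 and 2):
esc_html_e('Attack blocked', 'my-plugin-textdomain'); // instead of _e(..., MY_CONSTANT)

// Placeholders need a translator comment (item 4):
/* translators: %s: blocked IP address */
printf(esc_html__('Blocked XML-RPC request from %s', 'my-plugin-textdomain'), esc_html($ip));
```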

Back to Claude with the complete report

My request to Claude:

"Here are the 40 Plugin Check errors. Can you fix all this?"

Claude's response: A completely corrected version with explanations for each error, why WordPress.org rejects it, and the appropriate fix.

Second attempt: Perfection

✅ 0 ERRORS
✅ 0 WARNINGS
✅ 100% COMPLIANT

Impressive. Claude fixed all 40 problems in a single pass.

Time spent on this fix: ~10 minutes with Claude. Without AI: probably 4-5 hours of documentation reading and trial & error.

💡 Key takeaway
Using two AIs in cross-validation + human validation = 20× faster development, 0 regressions, and 100% compliance.

Lessons from this collaboration

Lesson 1: AIs have complementary strengths

Mistral AI: The strategic visionary
✅ Excellent macro vision
✅ Detects conceptual weaknesses
✅ Proposes ambitious architectures
❌ Can propose code with bugs
❌ Implementation details sometimes incorrect

Claude: The rigorous technician
✅ Very precise bug detection
✅ Deep knowledge of standards
✅ Detailed pedagogical explanations
✅ Production-ready code
✅ One-pass corrections

Lesson 2: Cross-validation is essential

What could have gone wrong with blind trust:

Scenario A: Copy-paste Mistral's code
❌ Filter conflict bugs in production
❌ Completely non-functional logging
❌ Performance overhead on 6000 sites
😱 Negative impact on thousands of sites

Scenario B: Trust 100% in AIs
❌ 40 errors the AIs hadn't anticipated
❌ WordPress.org standards AIs didn't all know
❌ Counter-intuitive rules (load_plugin_textdomain unnecessary since WP 4.6, etc.)

Lesson 3: The human remains the conductor

What AIs did for me:
✅ Proposed complete architecture
✅ Detected bugs in Mistral's code
✅ Corrected non-compliances with standards
✅ Generated production-ready code

What only humans can (and must) do:
🧠 Decide: Accept or reject each proposal
🧠 Arbitrate: Between complexity and simplicity
🧠 Understand: Impact on 6000+ users
🧠 Test: In real conditions, not just theory
🧠 Maintain: Project philosophy
🧠 Assume: Responsibility for deployed code

AI accelerates. Humans direct.

The ideal human-in-the-loop workflow

Here's the method that emerged from this experience:

Step 1: Brainstorming with a "visionary" AI

Action: Present the problem to Mistral (or similar)
Goal: Get ideas, macro vision, architectural proposals
Attitude: Listen, note, but don't implement directly

Step 2: Cross-validation with a "rigorous" AI

Action: Submit proposals to Claude (or similar)
Goal: Detect bugs, validate implementation, correct errors
Attitude: Compare both opinions, identify divergences

Step 3: Informed human decision

Action: Choose what makes sense for the project
Goal: Preserve philosophy, anticipate user impact
Attitude: AI proposes, human decides

Step 4: Iterative implementation

Action: Develop in small testable iterations
Goal: Validate at each step, correct quickly
Attitude: Systematic local testing (WP_DEBUG enabled)
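For reference, the local testing in this step relies on the standard WordPress debug constants in wp-config.php (these are documented WordPress settings, not plugin-specific code):

```php
// wp-config.php debug settings for local testing.
define('WP_DEBUG', true);           // surface notices, warnings, deprecations
define('WP_DEBUG_LOG', true);       // write them to wp-content/debug.log
define('WP_DEBUG_DISPLAY', false);  // keep them out of the rendered page
```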

Step 5: Validation with official tools

Action: Plugin Check, compliance tests
Goal: Detect what AIs and local tests missed
Attitude: Accept that even AI can ignore certain rules

Step 6: Final correction with AI

Action: Submit detected errors to Claude
Goal: Fix all compliance issues in one pass
Attitude: AI excels at this type of repetitive task

The numbers

  • Initial version: v1.0.1 (19 lines)
  • Final version: v2.0.0 (387 lines)
  • Exchanges with Claude: ~20 back-and-forths
  • Bugs detected by Claude: 4 critical bugs in Mistral's code
  • Plugin Check iterations: 2 (40 → 0 errors)
  • Total dev time: ~1 hour
  • Estimated time without AI: ~10-15 hours
  • Time saved: ~95%
  • Active installs: 6000+

Final result: Flexible solution without sacrificing robustness

What changed between v1.0.1 and v2.0.0:

  • 19 lines → 387 lines (but organized in OOP class)
  • 1 mode → 3 configurable security modes
  • 0 options → Complete admin interface
  • No logs → Optional logging with IP tracking

What did NOT change (and this is crucial):
✅ Still active from activation
✅ Zero mandatory configuration
✅ Jetpack/WooCommerce compatibility preserved
✅ Optimal performance maintained
✅ Default behavior identical to v1.0.1

Conclusion: AI multiplies, humans direct

This project showed me that the future of software development is neither:
❌ "AI completely replaces developers"
❌ "Developers do everything manually"

But rather:

Developers orchestrate multiple specialized AIs

  • Each AI brings complementary strengths
  • Cross-validation detects each one's weaknesses
  • Humans make informed final decisions

Systematic and rigorous validation

  • Local tests with WP_DEBUG
  • Validation with official tools
  • Real-world testing
  • Accept that even AI can be wrong

Irreducible human responsibility

  • Understand impact on users
  • Choose between contradictory proposals
  • Maintain project philosophy
  • Assume responsibility for production code

The real gain: 95% time, 100% quality

Without AI: 10-15 hours

With AI: ~1 hour

But: Final quality is identical, or even superior, because:

  • More rigorous cross-validation than a developer alone
  • Bug detection I probably would have missed
  • Standards compliance guaranteed
  • Cleaner architecture than I would have done manually

AI didn't replace me. It made me better.

My final advice

If you're developing in 2024 without using AI, you're wasting precious time.

If you're developing in 2024 by blindly trusting AI, you're taking huge risks.

The right approach:

  1. ✅ Use multiple AIs in cross-validation
  2. ✅ Test systematically at each step
  3. ✅ Validate with official tools
  4. ✅ Understand before implementing
  5. ✅ Remain the conductor

AI is an accelerator. Not an autopilot.


Your experience with AI in development?

👉 Have you tried orchestrating multiple AIs for coding?
What did you learn — synergy or chaos?

Share your experience in the comments! 👇
