SpicyCode

The evolution of AI prompting: how 4 years of research inspired my new Claude Code Skill

We use Large Language Models every day to write code across different languages and frameworks. But how does an AI actually reason about our code?

I recently read six major research papers published between 2022 and 2026.
Together they trace how the field's understanding of AI reasoning evolved, from blind trust to a sharp reality check.

Rather than merely taking notes, I decided to turn this academic research into a practical tool.
I built a custom Claude Code skill called cot-skill-claude-code.
It forces the AI to apply the best prompting strategies directly in my terminal.


The golden age of prompting

In 2022, researchers discovered a technique called Chain-of-Thought (CoT).
They found that asking an AI to explain its logic step by step drastically improved its answers.
This mirrors asking a senior developer to explain their architecture before writing a single line of Dart code.
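The technique is easy to reproduce yourself: you simply append an explicit reasoning cue to the prompt before sending it to any model. A minimal sketch in Python (the question text here is just an illustration, not from the paper):

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a zero-shot Chain-of-Thought cue.

    The trailing sentence nudges the model to emit its
    intermediate reasoning steps before the final answer.
    """
    return f"{question}\n\nLet's think step by step."


prompt = build_cot_prompt(
    "A function is called 3 times and allocates 128 bytes per call. "
    "How many bytes are allocated in total?"
)
print(prompt)
```

The same cue works with any LLM client; the paper's key finding was that this one sentence alone measurably improves multi-step answers.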

That same year, another strategy emerged: Least-to-Most Prompting.
Instead of solving a massive problem at once, the AI broke it into smaller sequential tasks.
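The second stage of that decomposition can be sketched as a chain of prompts, where each subproblem's answer is fed into the next prompt as context. This is a minimal illustration, assuming the decomposition (stage one) has already produced the subproblem list:

```python
def least_to_most_prompts(problem: str, subproblems: list[str]) -> list[str]:
    """Build the sequential prompts used in Least-to-Most prompting.

    Stage 1 (not shown) asks the model to decompose `problem` into
    `subproblems`; stage 2 solves them in order, appending each earlier
    question (and, in a real run, the model's answer) to the context.
    """
    prompts = []
    context = f"Problem: {problem}\n"
    for i, sub in enumerate(subproblems, 1):
        prompts.append(f"{context}Subproblem {i}: {sub}\nAnswer:")
        # In a real run, the model's actual answer would replace this stub:
        context += f"Subproblem {i}: {sub}\nAnswer: <model answer {i}>\n"
    return prompts
```

Each later prompt therefore carries the full trail of earlier subproblems, which is what lets the model build toward the final answer.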

Then came Progressive-Hint Prompting in 2023.
This method fed the AI's previous answers back into the prompt as hints, allowing it to refine its own logic iteratively.
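The loop is simple to sketch: each round's prompt embeds the answers from previous rounds as a hint, and iteration stops once the answer stabilizes. A minimal version of the hint-building step (the exact wording follows the paper's "the answer is near" pattern; the function name is my own):

```python
def progressive_hint_prompt(question: str, previous_answers: list[str]) -> str:
    """Build one round of a Progressive-Hint prompt.

    The first round is the bare question; later rounds feed the
    model's earlier answers back as a hint so it can refine them.
    """
    if not previous_answers:
        return question
    hint = ", ".join(previous_answers)
    return f"{question}\n(Hint: the answer is near {hint}.)"
```

In practice you would call this in a loop, appending each new model answer to `previous_answers` until two consecutive answers agree.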

The reality check

The honeymoon phase ended with a 2025 paper asking whether CoT is a mirage.
Its authors argued that the model is not genuinely reasoning.
It relies on sophisticated pattern matching over its training data.
When tasked with building a highly custom architecture, the AI may sound confident yet fail completely.

To solve this trust issue, a 2026 paper introduced the Thinker-Executor model.
It proposes splitting the work into two separate roles:
one agent plans the strict logic while another agent simply executes the code.
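The split is easiest to see as two differently-constrained prompts. This is my own minimal sketch of the idea, not code from the paper; the role instructions and helper name are illustrative assumptions:

```python
# Hypothetical Thinker-Executor split: one prompt may only plan,
# the other may only implement the plan it is handed.
THINKER_SYSTEM = (
    "You are the Thinker. Produce a numbered implementation plan. "
    "Do NOT write any code."
)
EXECUTOR_SYSTEM = (
    "You are the Executor. Implement the plan exactly as written. "
    "Do NOT deviate from or extend the plan."
)


def build_thinker_executor_prompts(task: str, plan: str) -> tuple[str, str]:
    """Return (thinker_prompt, executor_prompt) for a task.

    `plan` is the Thinker's output, handed verbatim to the Executor.
    """
    thinker = f"{THINKER_SYSTEM}\n\nTask: {task}"
    executor = f"{EXECUTOR_SYSTEM}\n\nTask: {task}\n\nPlan:\n{plan}"
    return thinker, executor
```

Because the plan is an explicit artifact between the two calls, you can review or verify it before any code is generated, which is exactly the trust property the paper is after.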


What I Built: the CoT Claude Code Skill

I realized that developers need a way to control how much "reasoning" an AI applies to a problem.
I therefore built a Claude Code skill that puts these research findings into practice.

When you run my plugin, it asks you what kind of reasoning mode you need:

  • Flash Mode: A direct, fast answer for simple syntax checks.
  • Normal Mode: Full structured reasoning using the Least-to-Most decomposition strategy.
  • Deep Mode: A multi-step validation process inspired by the Thinker-Executor model, used for complex architectures.

The skill forces Claude to break down problems, analyze constraints, and verify its own logic before generating any code.
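To make the three modes concrete, here is an illustrative dispatcher showing how a mode choice could map to a prompting strategy. This is not the plugin's actual code; the function, mode names, and wording are assumptions for illustration only:

```python
# Illustrative only: a mode switch mapping a reasoning level to a
# prompt strategy, in the spirit of the skill's three modes.
def build_mode_prompt(mode: str, question: str) -> str:
    if mode == "flash":
        # Direct answer, no reasoning scaffold.
        return f"{question}\nAnswer directly and concisely."
    if mode == "normal":
        # Least-to-Most style decomposition.
        return (f"{question}\n"
                "First decompose the problem into ordered subproblems, "
                "then solve them one by one (Least-to-Most).")
    if mode == "deep":
        # Thinker-Executor style plan / execute / verify.
        return (f"{question}\n"
                "Step 1: produce a plan only. Step 2: execute the plan. "
                "Step 3: verify the result against the plan.")
    raise ValueError(f"unknown mode: {mode}")
```

The point of the design is that reasoning depth becomes a parameter you choose per task, instead of a fixed behavior baked into every prompt.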

If you want to try it out, you can find the plugin on my GitHub:
isSpicyCode/cot-skill-claude-code.
It's fully open-source and built for developers who want reliable answers, not just fast ones.


The six source references, in recommended reading order:

  • arXiv:2201.11903 — Chain-of-Thought Prompting, Wei et al., 2022
  • arXiv:2203.11171 — Self-Consistency, Wang et al., 2022
  • arXiv:2205.10625 — Least-to-Most Prompting, Zhou et al., 2022
  • arXiv:2304.09797 — Progressive-Hint Prompting, Zheng et al., 2023
  • arXiv:2508.01191 — Is CoT a Mirage, Zhao et al., 2025
  • arXiv:2602.17544 — Reusability and Verifiability of CoT, Aggarwal et al., 2026
