Making AI a Better Coder by Teaching It to Doubt Itself

Introducing /criticalthink: A Command to Sanity-Check Your AI's Suggestions

We've all been there. Your AI coding assistant confidently suggests a brilliant solution. But then you notice it uses a non-existent API, or happily skips all error handling.

AI assistants are powerful, but their unwavering confidence can be misleading. The burden is on us, the developers, to validate their output. The problem is, coming up with the right critical questions on the spot is hard. What exactly should I be checking?

To solve this, I built /criticalthink, a slash command that forces an AI to take a second look at its own proposals.

GitHub: abagames / slash-criticalthink


Overview

Modern LLMs (Large Language Models) excel at generating confident, plausible-sounding responses. However, these responses often ignore real-world constraints or contain logical flaws.

The `criticalthink` command is a custom slash command that embeds healthy skepticism into the dialogue process itself, countering both the AI's "confirmation bias" and the human "authority bias" of blindly trusting AI responses. By having the AI critically analyze its own previous response, it reveals hidden assumptions and overlooked risks.

Target Audience

  • Developers who routinely use coding agents (Claude Code / Codex CLI, etc.)
  • Engineers who want to critically verify AI suggestions rather than accepting them at face value

Setup

Option 1: Manual Installation

  1. Place the command file in the appropriate directory for your tool:

    • Claude Code: .claude/commands/criticalthink.md (in project root or home directory)
    • Codex CLI: ~/.codex/prompts/criticalthink.md (in home directory)
    • Gemini CLI: .gemini/commands/criticalthink.toml (in project root)
  2. Create directory and copy file:
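A minimal Python sketch of that step, assuming the Claude Code project-root location from step 1 (the repository's README may show shell commands instead):

```python
# Minimal sketch (assumed layout): copy criticalthink.md into the Claude Code
# command directory from step 1. Adjust the target path for Codex CLI or
# Gemini CLI accordingly.
from pathlib import Path
import shutil

command_file = Path("criticalthink.md")        # command file from this repository
target_dir = Path(".claude") / "commands"      # project-root location for Claude Code

target_dir.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing
shutil.copy(command_file, target_dir / command_file.name)
print(f"Installed {command_file} to {target_dir}")
```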

Why the CQoT Framework Sparked This Idea

My inspiration came from a paper by Federico Castagna et al. titled "Critical-Questions-of-Thought" (CQoT). The core idea is simple but powerful: Large Language Models (LLMs) become more accurate when they are forced to critically evaluate their own reasoning before giving a final answer.

The paper uses Toulmin's model of argumentation to check the LLM's reasoning process with eight Critical Questions (CQs):

  1. Does the reasoning start from a clear premise?
  2. Are the premises supported by evidence or facts?
  3. Is there a logical link between the premises and the conclusion?
  4. Is that logical link valid?
  5. Does the reasoning avoid logical fallacies?
  6. Is the conclusion logically derived from the premises?
  7. Is the reasoning consistent with existing knowledge or principles?
  8. Is the conclusion of the reasoning plausible and reasonable?

The AI evaluates its own reasoning against these questions, marking each as Pass/Fail. It repeats this process until it meets a certain standard (e.g., at least 7 out of 8 Passes).
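As a rough illustration only (this is not the paper's implementation), the check loop might be sketched in Python like this, with `generate_reasoning` and `passes_cq` as hypothetical stand-ins for the actual LLM calls:

```python
# Illustrative sketch of a CQoT-style verification loop; not the paper's code.
# generate_reasoning() and passes_cq() are hypothetical stand-ins for LLM calls.

CRITICAL_QUESTIONS = [
    "Does the reasoning start from a clear premise?",
    "Are the premises supported by evidence or facts?",
    "Is there a logical link between the premises and the conclusion?",
    "Is that logical link valid?",
    "Does the reasoning avoid logical fallacies?",
    "Is the conclusion logically derived from the premises?",
    "Is the reasoning consistent with existing knowledge or principles?",
    "Is the conclusion plausible and reasonable?",
]

PASS_THRESHOLD = 7  # e.g., require at least 7 of the 8 questions to pass


def generate_reasoning(task: str, feedback: list[bool] | None = None) -> str:
    # Stand-in: a real pipeline would prompt the LLM, optionally including the
    # previous round's Pass/Fail feedback, and return its reasoning.
    return f"reasoning for: {task}"


def passes_cq(reasoning: str, question: str) -> bool:
    # Stand-in: a real pipeline would ask the LLM to judge Pass/Fail.
    return True


def refine_until_passing(task: str, max_rounds: int = 5) -> str:
    reasoning = generate_reasoning(task)
    for _ in range(max_rounds):
        verdicts = [passes_cq(reasoning, q) for q in CRITICAL_QUESTIONS]
        if sum(verdicts) >= PASS_THRESHOLD:
            break  # reasoning meets the standard; hand it off for the final answer
        reasoning = generate_reasoning(task, feedback=verdicts)
    return reasoning
```

In the real pipeline these checks run automatically as part of answer generation; the sketch only shows the shape of the loop.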

This made me wonder whether we could apply the same principle to coding agents like Claude Code or the Codex CLI.

Introducing /criticalthink

Using the command is straightforward. After the AI generates a response, you simply type /criticalthink. The AI then analyzes its own immediately preceding answer based on the following criteria:

  • Assumptions: What assumptions did I make?
  • Validity: Are those assumptions valid?
  • Logical Flaws: Are there any logical inconsistencies?
  • Risks: What risks or trade-offs have I overlooked?
  • Common AI Pitfalls: Am I falling into common traps like problem avoidance, happy path bias, over-engineering, or hallucination?

For example, if the AI suggests, "Let's use Redis for rate limiting," running /criticalthink might return feedback like, "I have not proposed a fallback strategy for when Redis is unavailable," or "This design introduces Redis as a single point of failure (SPOF)."

How It Differs from CQoT

While both CQoT and /criticalthink are based on AI self-critique, their goals and applications differ.

CQoT is an automated pipeline that integrates critical evaluation into the answer generation process. It's designed to improve accuracy in domains with correct answers, like math and logic problems, achieving a ~5% average accuracy boost on the MT-Bench benchmark.

/criticalthink, on the other hand, is a manual, post-hoc tool that the user triggers after receiving an answer. It's specialized for domains without a single "right" answer, like software design, where the goal is to uncover trade-offs and risks.

Put another way:

  • CQoT aims to turn an AI into a better logician.
  • /criticalthink aims to turn an AI into a more cautious design partner.

Putting It to the Test

The insights it generates can be surprisingly useful. In one case, I asked the AI to "review my README." It responded with a simple summary. After running /criticalthink, it pointed out its own flaw:

"I proceeded without clarifying the user's intent for 'review.' It was unclear whether they wanted a summary, error-checking, or a quality analysis."

It was right. My request was ambiguous.

Of course, the AI's critical analysis isn't perfect. It can be overly conservative or sometimes miss the mark entirely. As always, the developer must apply their own judgment.

Best Practices

You don't need to run /criticalthink on every single response. It consumes tokens and takes time. It's most valuable in high-stakes situations:

  • Architectural decisions
  • Large-scale refactoring
  • Security or performance-related implementations
  • Adopting a new external library

Occasionally the extra critique can clutter the context and negatively influence the AI's subsequent responses. I recommend using it with a "checkpoint" feature in your AI client:

  1. Receive a proposal from the AI.
  2. Run /criticalthink.
  3. Evaluate the analysis.
  4. If the analysis isn't useful, simply revert to the message before you ran the command.
  5. Continue your conversation with a clean context.

Using /criticalthink on This Very Article

I had the AI write the first draft of this blog post. And after it was done, I ran /criticalthink.

It spotted a major gap.

"I wrote this article without having read the actual CQoT paper PDF."

My AI had only read my project's README and assumed it understood the paper. Prompted by its own critique, I had it read the source PDF, and it then identified several crucial pieces of missing information:

  • The specific content of the eight Critical Questions.
  • The four-step pipeline structure (Plan -> Verify with CQs -> Judge -> Final Answer).
  • The quantitative evaluation results on MT-Bench (~5% accuracy improvement).
  • The fact that the critical checks run during the generation process, rather than as a post-hoc review.

Adding these details made the article substantially better. Without /criticalthink, this post would have been based on a superficial understanding. It served its exact purpose: to give me pause and help me, the human, think more critically.

Final Thoughts

AI is an incredible tool, but blindly trusting it is a recipe for disaster. The ideas behind CQoT show that we can improve the quality of AI output by forcing it to doubt itself.

/criticalthink is a simple tool that brings this concept to your daily workflow with coding agents. Use it as a quick sanity check before you commit to a path recommended by your AI partner.

The final decision is, and should remain, human.
