<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hamza Khan</title>
    <description>The latest articles on DEV Community by Hamza Khan (@hamza_khan_d2f1854314be1f).</description>
    <link>https://dev.to/hamza_khan_d2f1854314be1f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3699130%2F2d79f7cb-2b22-432d-9745-7c27f26b2b5e.png</url>
      <title>DEV Community: Hamza Khan</title>
      <link>https://dev.to/hamza_khan_d2f1854314be1f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hamza_khan_d2f1854314be1f"/>
    <language>en</language>
    <item>
      <title>3 Prompt Engineering Techniques That Unlock Better AI Reasoning</title>
      <dc:creator>Hamza Khan</dc:creator>
      <pubDate>Sun, 11 Jan 2026 20:56:09 +0000</pubDate>
      <link>https://dev.to/hamza_khan_d2f1854314be1f/3-prompt-engineering-techniques-that-unlock-better-ai-reasoning-22ab</link>
      <guid>https://dev.to/hamza_khan_d2f1854314be1f/3-prompt-engineering-techniques-that-unlock-better-ai-reasoning-22ab</guid>
      <description>&lt;h2&gt;
  
  
  Steering the AI: A Developer's Guide to Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;If you've ever felt frustrated by an LLM giving you surface-level answers or completely missing the point, you're not alone. The good news? You don't need to retrain the model or tweak its weights. You just need better prompts.&lt;/p&gt;

&lt;p&gt;Prompt engineering is your steering wheel for Large Language Models. Small adjustments to how you phrase instructions can completely transform the quality of output you get back.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Exactly Is Prompt Engineering?
&lt;/h2&gt;

&lt;p&gt;LLMs are incredibly powerful, but they're not mind readers. They need clear direction to deliver what you actually want.&lt;/p&gt;

&lt;p&gt;Think of prompt engineering as the art and science of crafting instructions that help the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Think through problems step-by-step&lt;/strong&gt; instead of jumping to conclusions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Follow specific constraints&lt;/strong&gt; you've defined for the task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stay focused&lt;/strong&gt; on what actually matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid shallow or generic responses&lt;/strong&gt; that waste your time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's the key insight: you're not modifying the model's underlying parameters—you're simply changing the instructions. And that changes &lt;em&gt;everything&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It's the fastest, lowest-effort way to get dramatically better results from any LLM, whether you're using GPT, Claude, Gemini, or any other model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Techniques That Unlock Better Reasoning
&lt;/h2&gt;

&lt;p&gt;What makes AI coding assistants so powerful isn't just their ability to generate code—it's their ability to &lt;em&gt;reason through it&lt;/em&gt;. This same reasoning capability applies to math problems, logic puzzles, debugging sessions, and any multi-step challenge.&lt;/p&gt;

&lt;p&gt;Let's explore three prompting techniques that significantly boost an LLM's reasoning abilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Chain of Thought (CoT): Show Your Work
&lt;/h3&gt;

&lt;p&gt;Chain of Thought is the simplest and most widely adopted technique in the prompt engineering toolkit.&lt;/p&gt;

&lt;p&gt;Instead of asking the LLM to jump straight to the answer, you nudge it to reason step by step—just like showing your work in a math class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; When the model articulates its reasoning process, it can catch logical errors and arrive at more accurate conclusions. It's the difference between guessing and understanding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Standard prompt:
"What is 15% of 80?"

Chain of Thought prompt:
"What is 15% of 80? Let's think step by step."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That tiny addition, "Let's think step by step", turns a plain request into zero-shot CoT and can surface reasoning that a direct, answer-only prompt would miss entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world coding example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Debug this function. First, explain what it's supposed to do, 
then identify potential issues, and finally suggest fixes."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; You can also provide an example of step-by-step reasoning (few-shot CoT) to guide the model's thinking pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Calculate 23% of 150.

Example: To find 15% of 80:
1. Convert 15% to decimal: 15/100 = 0.15
2. Multiply: 0.15 × 80 = 12
3. Answer: 12

Now solve for 23% of 150 using the same approach."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
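&lt;p&gt;As a quick illustration, both prompt styles above can be built programmatically. This is a minimal sketch; the function names and the exact cue phrase are illustrative choices, not part of any library:&lt;/p&gt;

```python
# Minimal sketch: building zero-shot and few-shot CoT prompts.
# Function names and the exact cue phrase are illustrative choices.

COT_SUFFIX = "Let's think step by step."

def cot_prompt(question):
    """Zero-shot CoT: append a reasoning cue to the question."""
    return f"{question} {COT_SUFFIX}"

def few_shot_cot_prompt(question, worked_example):
    """Few-shot CoT: show one worked example, then pose the new question."""
    return f"Example:\n{worked_example}\n\nNow solve: {question} {COT_SUFFIX}"

worked = (
    "To find 15% of 80:\n"
    "1. Convert 15% to decimal: 15/100 = 0.15\n"
    "2. Multiply: 0.15 x 80 = 12\n"
    "3. Answer: 12"
)
print(cot_prompt("What is 23% of 150?"))
print(few_shot_cot_prompt("What is 23% of 150?", worked))
```

&lt;p&gt;Calling &lt;code&gt;few_shot_cot_prompt&lt;/code&gt; with the worked 15%-of-80 example reproduces the few-shot prompt shown above.&lt;/p&gt;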



&lt;h3&gt;
  
  
  2. Self-Consistency: Democracy of Answers
&lt;/h3&gt;

&lt;p&gt;Chain of Thought is powerful, but it has a weakness: inconsistency.&lt;/p&gt;

&lt;p&gt;Run the same CoT prompt multiple times (especially with higher temperature settings), and you might get different answers. Which one is correct?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-Consistency&lt;/strong&gt; embraces this variation strategically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The approach is simple:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate multiple reasoning paths for the same question (typically 5-10 attempts)&lt;/li&gt;
&lt;li&gt;Collect all the final answers&lt;/li&gt;
&lt;li&gt;Select the most common answer (majority voting)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The logic:&lt;/strong&gt; When in doubt, ask the model several times and trust the wisdom of the crowd.&lt;/p&gt;

&lt;p&gt;This technique often produces more robust results, especially for ambiguous or complex problems. The original self-consistency research reported double-digit absolute accuracy gains on arithmetic reasoning benchmarks compared to single-pass CoT.&lt;/p&gt;

&lt;p&gt;However, it focuses on the &lt;em&gt;final answer&lt;/em&gt; rather than evaluating the quality of the reasoning itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-offs to consider:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Higher accuracy&lt;/strong&gt; on complex reasoning tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More robust&lt;/strong&gt; against model inconsistencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple API calls&lt;/strong&gt; = more latency and higher costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not suitable&lt;/strong&gt; for creative tasks where diversity is desired&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt; Critical calculations, important decisions, or situations where accuracy matters more than speed or cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation tip:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pseudocode for Self-Consistency
&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_with_cot&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;extract_final_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;final_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
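&lt;p&gt;The pseudocode above becomes runnable with a stubbed model call. Here &lt;code&gt;sample_answer&lt;/code&gt; is a placeholder standing in for one CoT completion plus final-answer extraction; swap in your real LLM client there:&lt;/p&gt;

```python
import random
from collections import Counter

# Runnable sketch of Self-Consistency with a stubbed model.
# sample_answer stands in for one CoT completion plus final-answer
# extraction; replace it with a real LLM call in practice.

def sample_answer(prompt, rng):
    """Placeholder: a noisy final answer, as repeated sampling might give."""
    return rng.choice(["12", "12", "12", "11", "13"])

def self_consistency(prompt, n_samples=5, seed=0):
    rng = random.Random(seed)
    answers = [sample_answer(prompt, rng) for _ in range(n_samples)]
    # Majority vote over the extracted final answers.
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 15% of 80? Let's think step by step."))
```

&lt;p&gt;&lt;code&gt;Counter.most_common(1)&lt;/code&gt; does the majority voting; ties go to whichever answer was seen first.&lt;/p&gt;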



&lt;h3&gt;
  
  
  3. Tree of Thoughts (ToT): Exploring the Decision Tree
&lt;/h3&gt;

&lt;p&gt;While Self-Consistency varies the final answer, Tree of Thoughts varies the &lt;em&gt;reasoning steps themselves&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Instead of following a single linear path, ToT explores multiple branches at each decision point—like a chess player considering different moves before committing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;At each reasoning step, the model generates multiple possible next steps (typically 2-5 alternatives)&lt;/li&gt;
&lt;li&gt;These branches form a tree structure of possibilities&lt;/li&gt;
&lt;li&gt;A separate evaluation process (another LLM call or heuristic) determines which path looks most promising&lt;/li&gt;
&lt;li&gt;The model continues down the best path and repeats the process&lt;/li&gt;
&lt;li&gt;If a path hits a dead end, backtrack and try another branch&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Visual representation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Problem
├─ Approach A
│  ├─ Step A1 (best)
│  │  └─ Solution
│  └─ Step A2
└─ Approach B
   ├─ Step B1
   └─ Step B2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
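&lt;p&gt;A toy version of the loop above, simplified to a greedy search with no backtracking, might look like this. &lt;code&gt;propose_steps&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; are placeholders for LLM calls or heuristics:&lt;/p&gt;

```python
# Toy sketch of the Tree of Thoughts loop above, simplified to a greedy
# search with no backtracking. propose_steps and score are placeholders
# for LLM calls or heuristics.

def propose_steps(state, k=3):
    """Placeholder: real code would ask the model for k candidate next steps."""
    return [f"{state} -step{i}" for i in range(k)]

def score(state):
    """Placeholder evaluator: real code would rate how promising a path is."""
    return sum(ord(ch) for ch in state) % 11  # arbitrary deterministic score

def greedy_tot(problem, depth=3, k=3):
    state = problem
    for _ in range(depth):
        candidates = propose_steps(state, k)   # branch at this step
        state = max(candidates, key=score)     # keep the most promising branch
    return state

print(greedy_tot("design a schema"))
```

&lt;p&gt;A full ToT implementation would also keep rejected branches around so it can backtrack when the chosen path dead-ends.&lt;/p&gt;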



&lt;p&gt;This is the most sophisticated of the three techniques. It's particularly useful for complex problems where there are multiple valid approaches and the "best" path isn't obvious upfront.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strategic planning and architecture decisions&lt;/li&gt;
&lt;li&gt;Complex debugging with multiple potential root causes&lt;/li&gt;
&lt;li&gt;Creative problem-solving where exploration adds value&lt;/li&gt;
&lt;li&gt;Game-playing or optimization problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Design a database schema for a social media app.

At each step:
1. Generate 3 different approaches
2. Evaluate each based on scalability, simplicity, and performance
3. Choose the best path forward
4. Continue until complete

Start by considering different ways to model user relationships."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important note:&lt;/strong&gt; ToT requires more sophisticated orchestration—you'll likely need to write code to manage the tree exploration, evaluation, and backtracking rather than relying on a single prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right Technique
&lt;/h2&gt;

&lt;p&gt;Here's a practical decision framework:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Use When&lt;/th&gt;
&lt;th&gt;Trade-off&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chain of Thought&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Default for any reasoning task&lt;/td&gt;
&lt;td&gt;Single call, good baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Need high confidence, can afford multiple calls&lt;/td&gt;
&lt;td&gt;5-10x cost, better accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tree of Thoughts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex problems with multiple valid paths&lt;/td&gt;
&lt;td&gt;Requires orchestration code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Quick decision tree:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does it need reasoning? → Use &lt;strong&gt;CoT&lt;/strong&gt; at minimum&lt;/li&gt;
&lt;li&gt;Is accuracy critical? → Add &lt;strong&gt;Self-Consistency&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Are there multiple valid approaches worth exploring? → Consider &lt;strong&gt;ToT&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Practical Tips for Implementation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Start simple, then scale up&lt;/strong&gt;&lt;br&gt;
Begin with basic CoT prompts. Only add complexity when you hit limitations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Measure the improvement&lt;/strong&gt;&lt;br&gt;
Track metrics before and after applying these techniques. Sometimes the added complexity isn't worth it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Combine techniques&lt;/strong&gt;&lt;br&gt;
You can use CoT &lt;em&gt;within&lt;/em&gt; Self-Consistency, or use Self-Consistency at each node of a Tree of Thoughts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Adjust temperature settings&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CoT: Use moderate temperature (0.3-0.7) for consistent reasoning&lt;/li&gt;
&lt;li&gt;Self-Consistency: Higher temperature (0.7-1.0) to get diverse paths&lt;/li&gt;
&lt;li&gt;ToT: Vary based on exploration vs. exploitation needs&lt;/li&gt;
&lt;/ul&gt;
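&lt;p&gt;One way to keep these defaults handy is a small lookup table; the numbers are just the starting points suggested above, not tuned values:&lt;/p&gt;

```python
# Illustrative starting points for sampling temperature per technique.
# The numbers mirror the guidance above; tune them for your model and task.
TEMPERATURE_PRESETS = {
    "cot": 0.5,               # moderate: consistent step-by-step reasoning
    "self_consistency": 0.8,  # higher: diverse reasoning paths to vote over
    "tot_explore": 0.9,       # branch generation: favor exploration
    "tot_evaluate": 0.2,      # branch scoring: favor stability
}

def pick_temperature(technique):
    """Fall back to a middle-of-the-road default for unknown techniques."""
    return TEMPERATURE_PRESETS.get(technique, 0.7)
```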

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Prompt engineering isn't just about being polite to AI or adding "please" to your requests. It's about understanding how to structure instructions so the model can leverage its full reasoning capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The best part?&lt;/strong&gt; These techniques work across different models and domains. Whether you're generating code, analyzing data, solving logic puzzles, or debugging complex systems, better prompts lead to better results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start today:&lt;/strong&gt; Add "Let's think step by step" to your next complex prompt and watch the quality improve. It's a small change that makes a massive difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your Turn
&lt;/h2&gt;

&lt;p&gt;What prompting techniques have you found most effective? Have you experimented with any of these approaches? Share your experiences and learnings in the comments below—I'd love to hear what's working for you!&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Resources to dive deeper:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.promptingguide.ai/techniques/cot" rel="noopener noreferrer"&gt;Chain-of-Thought Guide&lt;/a&gt; - Practical examples and techniques&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.promptingguide.ai/techniques/consistency" rel="noopener noreferrer"&gt;Self-Consistency Tutorial&lt;/a&gt; - Implementation guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/princeton-nlp/tree-of-thought-llm" rel="noopener noreferrer"&gt;Tree of Thoughts GitHub&lt;/a&gt; - Official implementation with code and prompts&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.promptingguide.ai/techniques/tot" rel="noopener noreferrer"&gt;Tree of Thoughts Guide&lt;/a&gt; - Step-by-step implementation tutorial&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Found this helpful? Follow me for more AI engineering insights and practical LLM techniques.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>promptengineering</category>
      <category>agents</category>
      <category>llm</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>Complete Guide to Setting Up Claude Code Router with Qwen on macOS</title>
      <dc:creator>Hamza Khan</dc:creator>
      <pubDate>Sat, 10 Jan 2026 20:52:41 +0000</pubDate>
      <link>https://dev.to/hamza_khan_d2f1854314be1f/complete-guide-to-setting-up-claude-code-router-with-qwen-on-macos-a89</link>
      <guid>https://dev.to/hamza_khan_d2f1854314be1f/complete-guide-to-setting-up-claude-code-router-with-qwen-on-macos-a89</guid>
      <description>&lt;p&gt;A complete guide to setting up Claude Code Router (CCR) as a middleware layer to use Qwen's free AI models with Claude Code on macOS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Use This Setup
&lt;/h2&gt;

&lt;p&gt;Claude Code Router redirects Claude Code requests to alternative AI providers like Qwen, giving you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code's agentic coding capabilities&lt;/li&gt;
&lt;li&gt;Free access to Qwen's hosted models, with zero API costs&lt;/li&gt;
&lt;li&gt;Full control over your AI provider&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Claude Code requires Node.js 18 or higher. Check your version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you need to install or update Node.js, download it from &lt;a href="https://nodejs.org" rel="noopener noreferrer"&gt;nodejs.org&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install Required Tools
&lt;/h2&gt;

&lt;p&gt;Install Qwen Code, Claude Code, and Claude Code Router globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @qwen-code/qwen-code@latest
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @anthropic-ai/claude-code @musistudio/claude-code-router
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Authenticate with Qwen OAuth
&lt;/h2&gt;

&lt;p&gt;Qwen Code uses OAuth authentication, which is the recommended way to access Qwen models. The free tier has a quota of 60 requests per minute and 2,000 requests per day.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start the authentication flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;qwen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command will automatically open your browser to complete the OAuth login.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete browser login
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Log in with your qwen.ai account (create one if needed)&lt;/li&gt;
&lt;li&gt;Click "Confirm" to authorize access&lt;/li&gt;
&lt;li&gt;The CLI will confirm successful authentication&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Credentials stored locally
&lt;/h3&gt;

&lt;p&gt;After authorization, your OAuth credentials are automatically saved to &lt;code&gt;/Users/YOUR_USERNAME/.qwen/oauth_creds.json&lt;/code&gt; and you won't need to log in again. The credentials will be automatically refreshed when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting your access token for Claude Code Router
&lt;/h3&gt;

&lt;p&gt;After successful authentication, you need to extract the access token from the OAuth credentials file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /Users/YOUR_USERNAME/.qwen/oauth_creds.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;YOUR_USERNAME&lt;/code&gt; with your actual macOS username.&lt;/p&gt;

&lt;p&gt;This will display a JSON file similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"access_token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-long-access-token-here"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"refresh_token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-refresh-token-here"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"token_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bearer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resource_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"portal.qwen.ai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"expiry_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1767490948168&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy the value of &lt;code&gt;access_token&lt;/code&gt; (the long string after &lt;code&gt;"access_token":&lt;/code&gt;). You'll need this token in the next step when configuring Claude Code Router.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The access token is automatically managed by Qwen Code and refreshes as needed. However, for Claude Code Router integration, you'll need to manually update the token in the router configuration if it expires.&lt;/p&gt;
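&lt;p&gt;If you script this setup, a small helper can read the token instead of copying it by hand. This is a sketch assuming the default credentials path described above; &lt;code&gt;read_qwen_token&lt;/code&gt; is our own name, not part of any official tooling:&lt;/p&gt;

```python
import json
from pathlib import Path

# Sketch: read the current access token from the Qwen OAuth credentials
# file so it can be pasted (or scripted) into the router config. The
# default path matches the location described above; the helper name is
# our own, not part of any official tooling.

def read_qwen_token(creds_path=None):
    default = Path.home() / ".qwen" / "oauth_creds.json"
    path = Path(creds_path) if creds_path else default
    creds = json.loads(path.read_text())
    return creds["access_token"]
```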

&lt;h2&gt;
  
  
  Step 3: Configure the Router
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Create the configuration directory
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.claude-code-router
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create the configuration file
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ~/.claude-code-router/config.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
{
  "LOG": true,
  "LOG_LEVEL": "info",
  "HOST": "127.0.0.1",
  "PORT": 3456,
  "API_TIMEOUT_MS": 600000,
  "Providers": [
    {
      "name": "qwen",
      "api_base_url": "https://portal.qwen.ai/v1/chat/completions",
      "api_key": "YOUR_ACCESS_TOKEN_HERE",
      "models": [
        "qwen3-coder-plus",
        "qwen3-coder-plus",
        "qwen3-coder-plus"
      ]
    }
  ],
  "Router": {
    "default": "qwen,qwen3-coder-plus",
    "background": "qwen,qwen3-coder-plus",
    "think": "qwen,qwen3-coder-plus",
    "longContext": "qwen,qwen3-coder-plus",
    "longContextThreshold": 60000,
    "webSearch": "qwen,qwen3-coder-plus"
  }
}
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Replace &lt;code&gt;YOUR_ACCESS_TOKEN_HERE&lt;/code&gt; with the access token you copied from &lt;code&gt;/Users/YOUR_USERNAME/.qwen/oauth_creds.json&lt;/code&gt; in Step 2.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration Breakdown
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LOG settings&lt;/strong&gt; — Enable console output for debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HOST/PORT&lt;/strong&gt; — Local address where CCR listens (127.0.0.1:3456)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API_TIMEOUT_MS&lt;/strong&gt; — Maximum wait time for API responses (10 minutes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Providers&lt;/strong&gt; — Defines Qwen API connection and available models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Router&lt;/strong&gt; — Specifies which model handles each task type&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 4: Start the Router
&lt;/h2&gt;

&lt;p&gt;Start (or restart) Claude Code Router:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ccr restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see console logs confirming successful startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Launch Claude Code
&lt;/h2&gt;

&lt;p&gt;Open the interactive CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ccr code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test with a simple message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Hello Claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should receive a response from Qwen3-Coder-Plus.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;404 errors&lt;/strong&gt; — Verify your api_base_url and access token are correct.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection timeouts&lt;/strong&gt; — Check your network connection and Qwen's service status.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Router won't start&lt;/strong&gt; — Look for port conflicts or JSON syntax errors in your config file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model not responding&lt;/strong&gt; — Ensure the model name in your config matches exactly with Qwen's available models.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advanced Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Using Different Models
&lt;/h3&gt;

&lt;p&gt;Update the models array in the config to try different Qwen models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="s2"&gt;"qwen3-coder-plus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="s2"&gt;"qwen2.5-coder-32b-instruct"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="s2"&gt;"qwq-32b-preview"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Adding Multiple Providers
&lt;/h3&gt;

&lt;p&gt;You can configure multiple AI providers in the same config file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"Providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"api_base_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://portal.qwen.ai/v1/chat/completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"api_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_QWEN_TOKEN"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"another-provider"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"api_base_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.example.com/v1/chat/completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"api_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_OTHER_KEY"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Adjusting Timeout Settings
&lt;/h3&gt;

&lt;p&gt;For longer-running tasks, increase the timeout; for example, 900000 milliseconds is 15 minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"API_TIMEOUT_MS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;900000&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;minutes&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;With this setup, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Claude Code's full capabilities powered by Qwen models&lt;/li&gt;
&lt;li&gt;Update the configuration file to try different models&lt;/li&gt;
&lt;li&gt;Add multiple providers for different use cases&lt;/li&gt;
&lt;li&gt;Work seamlessly from your terminal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The router handles everything behind the scenes while you maintain Claude Code's familiar interface and workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.claude.com" rel="noopener noreferrer"&gt;Claude Code Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen.ai" rel="noopener noreferrer"&gt;Qwen Models Portal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/musistudio/claude-code-router" rel="noopener noreferrer"&gt;Claude Code Router GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>claude</category>
      <category>coding</category>
      <category>ai</category>
      <category>claudecode</category>
    </item>
  </channel>
</rss>
