DEV Community: Sungwoo Lee

Chain-of-Thought Prompting, Explained (with the Research Behind It)

Sungwoo Lee — Thu, 18 Jun 2026 17:57:59 +0000

If you've ever typed "let's think step by step" into ChatGPT and watched the answer quality jump, you've already used chain-of-thought prompting without knowing it. That phrase isn't magic — it's a deliberate technique backed by peer-reviewed research.

What It Is

Chain-of-thought (CoT) prompting instructs an AI model to reason through a problem step by step before delivering its final answer. Instead of predicting a response in one leap, the model generates a sequence of intermediate reasoning steps — the "chain of thought" — that leads to the solution.

Why it works comes down to how language models operate: they predict the next token. Without CoT, a model answering a complex problem must compress all reasoning into a single prediction — it can't "work in its head" the way a human uses scratch paper. CoT changes that. Each intermediate step becomes a scaffold that grounds the next, reducing the compounding error that makes AI unreliable on multi-step tasks.

When a model writes "Step 1: identify the variables" before doing algebra, the context window now contains useful intermediate state — and the next token is predicted against something far more constrained than the raw question. The model is, in effect, using its own output as working memory.

Zero-Shot vs Few-Shot CoT

Zero-shot CoT needs only a trigger phrase — "Let's think step by step" — no examples. The model reasons from scratch.
Few-shot CoT provides 2–5 worked examples that demonstrate the reasoning process before the actual question, constraining the pattern more tightly.

Zero-shot is easier to implement; few-shot is more reliable for specialized or high-stakes tasks where the reasoning format matters.

Dimension	Zero-Shot CoT	Few-Shot CoT
What you provide	Trigger phrase only	2–5 worked examples + trigger
Prompt length	Short	Longer
Reliability	Good for general tasks	Higher for specialized tasks
Research basis	Kojima et al. (2022)	Wei et al. (2022)
API cost	Low	Higher (examples add tokens)
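
To make the difference concrete, here is a minimal sketch of how the two prompt styles could be assembled in code. The `WorkedExample` type, the function names, and the "Q:/A:" layout are my own illustrative choices, not part of any library. The sample questions are made up for the demo.

```python
from dataclasses import dataclass

TRIGGER = "Let's think step by step."


@dataclass
class WorkedExample:
    """One demonstration for few-shot CoT (hypothetical helper type)."""
    question: str
    reasoning: str
    answer: str


def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: the question plus the trigger phrase, no examples."""
    return f"Q: {question}\nA: {TRIGGER}"


def few_shot_cot(question: str, examples: list[WorkedExample]) -> str:
    """Few-shot CoT: worked examples first, then the real question."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in examples
    ]
    blocks.append(zero_shot_cot(question))
    return "\n\n".join(blocks)


# Made-up demonstration and question, for illustration only.
demo = WorkedExample(
    question="A box holds 4 pens. How many pens are in 3 boxes?",
    reasoning="Each box holds 4 pens, so 3 boxes hold 3 * 4 = 12 pens.",
    answer="12",
)
print(few_shot_cot("A shelf holds 6 books. How many books are on 5 shelves?", [demo]))
```

The few-shot version is just the zero-shot prompt with demonstrations in front of it, which is also why it costs more tokens per call.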

A concrete before/after: ask a multi-step apple word problem directly and a model may answer "22 apples." Append "Let's think step by step" and it lays out each operation (sell 1/3, receive a delivery, sell half) and lands on the correct 17. Same model, same question — the trigger alone shifts it into step-by-step mode.

The Research Behind It

CoT was formalized by Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (NeurIPS 2022, Google Brain). The headline result: on the GSM8K grade-school math benchmark, CoT pushed PaLM 540B from about 18% to about 57% accuracy, surpassing the previous best result from a fine-tuned model. Adding self-consistency (Wang et al., 2022) later raised that to about 74%.

Two findings matter for practitioners:

Emergent threshold effect. In the paper's experiments, CoT showed little improvement below roughly 100B parameters and could actually hurt, because smaller models produced confident-sounding but wrong chains. Above that threshold, gains were dramatic. This is why CoT pays off most on large frontier models such as GPT-4, Claude 3.5/3.7 Sonnet, and Gemini 1.5/2.0 Pro, and much less on small models.
Self-consistency amplifies CoT. A follow-up technique (Wang et al., 2022) samples multiple reasoning chains and takes a majority vote — improving reliability on tasks with a single correct answer.
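
The majority-vote step of self-consistency can be sketched in a few lines. This is a simplified illustration, not the paper's code: it assumes the final answer is the last number in each chain, and the sampled chains are hard-coded stand-ins for what would normally be several model calls at a non-zero temperature.

```python
import re
from collections import Counter
from typing import Optional


def extract_answer(chain: str) -> Optional[str]:
    """Return the last number in a reasoning chain, or None if there is none.

    Treating the last number as the final answer is a simplifying assumption.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", chain)
    return numbers[-1] if numbers else None


def majority_vote(chains: list[str]) -> Optional[str]:
    """Pick the most common final answer across the sampled chains."""
    answers = [a for a in (extract_answer(c) for c in chains) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical sampled chains for the same question.
sampled = [
    "3 boxes of 4 pens is 3 * 4 = 12. The answer is 12",
    "4 + 4 + 4 = 12, so the answer is 12",
    "3 + 4 = 7, so the answer is 7",
]
print(majority_vote(sampled))  # prints 12
```

One chain goes wrong, but the vote still lands on the answer two of the three chains agree on. That is the whole idea: individual chains can fail, and agreement across chains is the signal.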

The second key paper is Kojima et al. (2022), "Large Language Models are Zero-Shot Reasoners," which showed "Let's think step by step" alone triggers reasoning behavior even without examples — making CoT practical without a library of worked examples per task.

When to Use It — and When to Skip

Use CoT for multi-step math/logic, planning, debugging, argument analysis, decision-making with trade-offs, and research synthesis.

Skip it for simple factual lookups, short creative writing where flow matters, conversational replies, emotional contexts, and high-volume API calls where token cost compounds.

A rule of thumb: if the task is one you'd solve on scratch paper, CoT will help. If it's one you'd answer off the top of your head, CoT just makes the response longer without improving accuracy. Match the technique to the cognitive demand.
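
On the token-cost point, a back-of-the-envelope calculation shows how the extra reasoning tokens add up at volume. Every number below is hypothetical (call volume, tokens per call, and price), so plug in your own figures from your provider's pricing page.

```python
def extra_cot_cost(calls: int, extra_tokens_per_call: int,
                   usd_per_million_tokens: float) -> float:
    """Added spend from the extra reasoning tokens that CoT produces."""
    return calls * extra_tokens_per_call * usd_per_million_tokens / 1_000_000


# Hypothetical: 1,000,000 calls, 200 extra reasoning tokens each,
# at $10 per million output tokens.
print(extra_cot_cost(1_000_000, 200, 10.0))  # prints 2000.0
```

For a single question the overhead is noise. For a pipeline running a million simple lookups, it is a line item that buys no accuracy.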

How Many Steps?

You generally don't need to specify a count. "Reason through this step by step" lets the model self-determine depth. For more structure, ask it to "reason in numbered steps." For very complex problems, add "If any step involves significant uncertainty, flag it explicitly," which surfaces model uncertainty instead of hiding it in a confident chain.

For the full guide — six copy-ready CoT patterns (universal zero-shot, math/logic few-shot, decision-making, debugging, argument analysis, research synthesis), worked decision examples, and the complete FAQ with sources — I wrote it up here:
https://my-blog.org/tangents/post/chain-of-thought-prompting-explained

Perplexity AI Review: Is It Worth It in 2026?

Sungwoo Lee — Thu, 18 Jun 2026 17:57:50 +0000

Perplexity has carved out a distinct position: it's not a chatbot and it's not a traditional search engine. It's an answer engine that retrieves live web results and hands them back with numbered citations you can actually click.

That's a specific value proposition — worth examining closely before you decide whether the free tier is enough or the $20/month Pro earns its keep.

What Perplexity Actually Does

Perplexity retrieves live web results and synthesizes them with numbered citations, so every claim links back to a source you can verify. Unlike ChatGPT, which reasons from training data, Perplexity's answers are grounded in current web content — including news published minutes ago. The trade-off: its reasoning depth is shallower than a dedicated language model.

The mental model: ChatGPT is a brilliant colleague who's read enormously and reasons deeply — but hasn't checked the news and doesn't footnote. Perplexity is a research assistant who runs to the library in real time, pulls five articles, and hands them over with footnotes. Shallower synthesis, but the receipts are there.

Four Features That Matter

Pro Search — decomposes a complex question into 3–5 sub-queries, searches each independently, then synthesizes across all of them. For simple lookups the difference is marginal; for multi-component research questions it's meaningful. Limited to 5/day on the free tier.
Focus Modes — restrict the search to source types: Academic (peer-reviewed papers — verify DOIs), Reddit (community sentiment, genuinely better than summarizing marketing copy), News (recent events, no social noise), YouTube (titles + transcripts).
Spaces — collaborative research workspaces: pin sources, share, run searches within a collection. Underrated for anyone tracking a beat or building a literature base.
File upload — document analysis (limited on free, full on Pro).
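
The decompose, search, synthesize pattern behind Pro Search can be sketched generically. To be clear, this is not Perplexity's implementation: the three callables are hypothetical stand-ins, and the stubs below exist only so the sketch runs without network access.

```python
from typing import Callable


def multi_step_search(
    question: str,
    decompose: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
    synthesize: Callable[[str, dict[str, list[str]]], str],
) -> str:
    """Split a question into sub-queries, search each, then combine.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    sub_queries = decompose(question)
    # One independent search per sub-query.
    results = {q: search(q) for q in sub_queries}
    return synthesize(question, results)


# Stub implementations for demonstration only.
def fake_decompose(q: str) -> list[str]:
    return [f"{q} pricing", f"{q} features", f"{q} limits"]


def fake_search(q: str) -> list[str]:
    return [f"source about {q}"]


def fake_synthesize(q: str, results: dict[str, list[str]]) -> str:
    sources = [s for hits in results.values() for s in hits]
    return f"{q}: {len(results)} sub-queries, {len(sources)} sources"


print(multi_step_search("tool X", fake_decompose, fake_search, fake_synthesize))
# prints: tool X: 3 sub-queries, 3 sources
```

The shape explains the behavior described above: a simple lookup decomposes into one query and gains little, while a multi-part question gets a separate search for each part.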

Free vs Pro at a glance

Feature	Free	Pro ($20/mo)
Standard web search	Unlimited	Unlimited
Pro Search (multi-step)	5 per day	Unlimited
Focus Modes (Web/News/Reddit/YouTube)	Yes	Yes
Academic Focus Mode	No	Yes
File upload	Limited	Full
Spaces	No	Yes
Model choice (Claude, GPT-4o, Gemini)	No	Yes
API access	No	Yes (usage-based)

Where It Excels — and Where It Falls Short

Wins:

Current events and recent data — live retrieval beats any model with a training cutoff.
Fact-checking with an audit trail — click through to verify; the receipt is in the response.
Source discovery — three credible sources on a narrow topic, fast.
Cutting "tab overload" — one structured query replaces ten browser tabs.
Reddit/community sentiment — surfaces real user opinions traditional search buries under SEO review sites.

Falls short:

Reasoning and analysis — it summarizes what sources say, not what they imply. For depth, ChatGPT (especially o3) or Claude are stronger.
Hallucination is still present — citations make it auditable, not absent. It can misread a source or surface a citation that doesn't support its claim. Always verify what matters.
Long-form writing — optimized for brief cited summaries, not coherent documents.
No persistent memory — each session starts fresh (Spaces help at the source level, not the conversational level).
Variable source quality — niche topics sometimes return aggregator or paywalled junk.

Is the $20/Month Worth It?

Worth it if you regularly need cited, current information and hit the daily Pro Search limit on the free tier. The clearest signal: if you're doing real research — not casual lookup — and find yourself rationing Pro Searches, the upgrade pays for itself in reduced friction.

The free tier is genuinely useful: unlimited standard web search with citations, plus 5 Pro Searches/day. For casual use that's often enough.

The most underrated Pro feature: model choice. Pro users can switch the underlying model to Claude, GPT-4o, or Gemini — running Perplexity's citation layer on top of a more capable reasoning model. For questions needing both current sources and analytical depth, that combination is genuinely powerful.

Who Should Use It

The right primary tool for people whose core need is current, verifiable information: journalists tracking beats, researchers building literature bases, analysts monitoring industries, students doing source-based work.

The wrong primary tool for people who mainly need synthesis, reasoning, long-form writing, or multi-turn document work — where ChatGPT or Claude serve better at the same price.

The most useful frame: Perplexity isn't trying to replace ChatGPT. It's trying to replace your search workflow. If your current process is opening five tabs, reading three articles, and extracting the relevant parts by hand — Perplexity compresses that into one interaction. Whether that's worth $20/month depends on how often you do it.

For the full review — Pro Search transcripts, the complete feature/tier table, the use-case match table, seven copy-ready research prompt cards, and the FAQ — I wrote it up here:
https://my-blog.org/tangents/post/perplexity-ai-review

Best AI Coding Assistants Compared (2026): Copilot, Cursor, Claude Code & ChatGPT

Sungwoo Lee — Thu, 18 Jun 2026 17:57:41 +0000

Every AI coding tool claims to make you faster. Most of them do — but not in the same way, and not for the same tasks. Picking the wrong tool for your workflow doesn't just waste $20 a month — it creates friction exactly where you need acceleration.

The four tools developers actually debate in 2026 differ at the architecture level, not the feature level:

GitHub Copilot — built for inline completion inside your existing IDE.
Cursor — built around reading your entire codebase.
Claude Code — a terminal-first agent that reasons across files.
ChatGPT — a conversational debugger with excellent explanations.

The right AI coding assistant depends on your workflow, not the feature list.

At a Glance

Tool	Core Strength	Pricing (paid)	Interface	Best For
GitHub Copilot	Inline completion, team adoption, enterprise security	$10/mo individual, $19 Business, $39 Enterprise	VS Code, JetBrains, Neovim	Teams wanting low-friction completion
Cursor	Full codebase context, multi-file edits	Free (limited), $20/mo Pro, $40 Business	VS Code fork	Refactoring, building features from scratch
Claude Code	Multi-file reasoning, agentic loops	Included in Claude Pro ($20) / Max ($100)	Terminal CLI	Senior engineers, complex agentic tasks
ChatGPT	Conversational debugging, explanation, code interpreter	Free, Plus $20, Pro $200	Browser, API, mobile	Learning, debugging with explanation

A few research data points worth knowing (verify each at its source before citing):

GitHub's own 2023 impact study reported 55% faster task completion with Copilot.
Stack Overflow's 2024 Developer Survey found 76% of developers use or plan to use AI coding tools.
McKinsey (June 2023) reported 45–75% faster code completion in structured tasks with AI tools.

GitHub Copilot: The Team Standard

Right choice when your team is already in VS Code or JetBrains, you want completion with minimal disruption, and enterprise compliance matters. It's the most widely deployed assistant precisely because it sits inside the IDE you already use.

The trade-off: its default context is the active file, not your whole codebase. Multi-file refactoring requires manual context injection.

Use it if: your team has a standardized IDE, you want adoption without new tooling, and you need SOC 2 compliance and IP indemnification.
Skip it if: your primary need is refactoring across dozens of files.

Cursor: Full Codebase Context

Cursor's edge is Composer — a multi-file editor that reads your entire repository index and makes coordinated changes across files in a single session, with a unified diff you review before accepting. No other tool here does that as cleanly in 2026.

It's a VS Code fork, so it inherits the extension ecosystem. You can also choose the underlying model per task (Claude or GPT-4o) — reasoning-heavy refactors lean Claude; quick autocomplete leans GPT-4o.

Use it if: active feature development or significant refactoring on a medium-to-large codebase, and you want multi-file edits with a clear diff workflow.

Claude Code: Agentic Reasoning

Terminal-first agent: it reads files, writes code, runs commands, observes output, and iterates — in a loop — without you orchestrating each step. For complex debugging, architecture discussions, or tasks that require running tests until they pass, it's the most capable tool here.
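
The "run, observe, iterate" loop is easier to picture as code. This is a generic sketch of an agentic test-fix loop under my own assumptions, not how Claude Code is built: `run_tests` and `propose_fix` are hypothetical callables, and the stubs simulate a codebase with two bugs.

```python
from typing import Callable


def run_until_green(
    run_tests: Callable[[], tuple[bool, str]],
    propose_fix: Callable[[str], None],
    max_iterations: int = 5,
) -> bool:
    """Run tests, feed failures to a fixer, repeat until pass or the cap is hit.

    run_tests returns (passed, output). propose_fix is a hypothetical stand-in
    for the model editing files based on the failure output.
    """
    for _ in range(max_iterations):
        passed, output = run_tests()
        if passed:
            return True
        propose_fix(output)  # observe the output, then act on it
    return run_tests()[0]  # final check after the last fix


# Stubs: a pretend codebase with two bugs, one fixed per iteration.
state = {"bugs": 2}


def fake_tests() -> tuple[bool, str]:
    return (state["bugs"] == 0, f"{state['bugs']} failing")


def fake_fix(output: str) -> None:
    state["bugs"] -= 1


print(run_until_green(fake_tests, fake_fix))  # prints True
```

The iteration cap matters in any loop like this: without it, a fixer that never converges would run (and bill) forever.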

The trade-off is real: it's entirely CLI. No GUI, no inline completion, no visual diff panel. If you're not comfortable in a terminal, this isn't your tool.

Use it if: you're comfortable in the terminal, working on complex multi-file tasks or gnarly debugging. It pairs well with Cursor — Cursor for day-to-day editing, Claude Code for the hard problems.

ChatGPT: Conversational Debugging and Learning

Best for the back-and-forth of "here's my error, here's my code, what's wrong?" and for learning. It explains clearly, walks through concepts, and Code Interpreter (Advanced Data Analysis) actually executes Python and returns results.

No codebase context by default — you paste code rather than reference files — but for its use cases that barely matters. Frequently the right first choice for developers new to AI-assisted coding.

How to Choose

Match the tool to the task, not the brand:

Inline completion without changing your IDE → Copilot
Heavy refactoring with full codebase context → Cursor
Complex multi-file debugging / agentic automation, terminal-comfortable → Claude Code
Learning, conversational explanation, sandboxed code execution → ChatGPT

A common setup among senior developers is two tools: a daily driver (Copilot or Cursor) and a heavy lifter (Claude Code) for the hard problems.

One honest caveat: AI-generated code can include insecure patterns, especially in auth, input validation, and cryptography. Treat it like human-written code — review, lint, and don't ship it unreviewed.

For the full comparison — the master table with Amazon Q Developer added, real before/after agentic session transcripts, copy-ready coding prompts, a situation-to-tool decision table, and the complete FAQ — I wrote it up here:
https://my-blog.org/tangents/post/best-ai-coding-assistants

ChatGPT vs Perplexity: Which AI Tool Actually Wins for Research?

Sungwoo Lee — Thu, 18 Jun 2026 17:57:32 +0000

If you've asked the same research question to both ChatGPT and Perplexity, you've noticed they feel fundamentally different — not just in interface, but in how they respond. That difference isn't a preference thing. It's a design thing, and it has real consequences for your research quality.

Here's the one-line version: ChatGPT is a reasoning engine. Perplexity is a search engine with an AI layer. Neither is "better" in the abstract. But for any specific research task — checking a claim, writing a literature review, tracking breaking news — one of them is almost always the right tool and the other is almost always the wrong one.

The Core Difference: How Each Tool "Thinks"

ChatGPT reasons from its training data — it synthesizes, analyzes, and generates original text based on what it has learned. Perplexity retrieves live web results and summarizes them with citations.

A mental model that makes this concrete: ChatGPT is a brilliant colleague who has read an enormous amount and can reason deeply about it — but hasn't checked the news in a while, and doesn't footnote their claims. Perplexity is a research assistant who runs to the library in real time, pulls five relevant articles, and hands them to you with footnotes — but their synthesis is shallower.

One important caveat: neither tool hallucinates less than the other in any categorical sense. ChatGPT can confidently cite a paper that doesn't exist. Perplexity can misread a source it retrieved. The difference is that Perplexity gives you the receipt — you can click through to verify. With ChatGPT, if you don't already know the space, you can't easily audit the output.

Quick-Reference Comparison

Dimension	ChatGPT	Perplexity
Core approach	Reasoning from training data	Real-time web retrieval + summarization
Source citations	Rarely, unless you push	Every response, by default
Current events	Limited by training cutoff	Live — updated to today
Reasoning depth	Strong — multi-step, cross-domain	Shallower — optimized for retrieval
Long-form writing	Excellent	Weak — outputs are brief summaries
Hallucination risk	Present, hard to detect	Present, easier to verify via citations
Pricing (paid)	$20/mo (Plus), $200/mo (Pro)	$20/mo (Pro)
Best for	Analysis, writing, synthesis	Current facts, source discovery, fact-checking

When ChatGPT Wins

ChatGPT outperforms when the task requires multi-step reasoning, synthesizing ideas across domains, generating novel hypotheses, drafting literature-style overviews, or producing structured analytical output (frameworks, decision trees, comparisons). These are thinking tasks, not retrieval tasks.

The clearest signal: if you'd benefit from an expert collaborator who can think through a problem with you — rather than a research assistant who fetches documents — use ChatGPT.

A copy-ready research prompt for ChatGPT:

(Role) You are a research analyst with expertise in [field].
(Context) I'm writing a [industry report / literature review] for [audience].
This topic is relatively stable — I'm not looking for breaking news.
(Task) Analyze [topic] using [first principles / SWOT / comparative analysis].
Identify the three most important tensions or trade-offs.
(Format) Start with a 2-sentence synthesis. Then numbered sections.
Flag any claim that would benefit from external verification.
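
If you reuse this prompt often, filling the brackets by hand gets old. Here is a small sketch that turns it into a template. The function name, the parameter names, and the sample values are my own illustrative choices.

```python
RESEARCH_PROMPT = (
    "(Role) You are a research analyst with expertise in {field}.\n"
    "(Context) I'm writing a {document_type} for {audience}.\n"
    "This topic is relatively stable; I'm not looking for breaking news.\n"
    "(Task) Analyze {topic} using {method}.\n"
    "Identify the three most important tensions or trade-offs.\n"
    "(Format) Start with a 2-sentence synthesis. Then numbered sections.\n"
    "Flag any claim that would benefit from external verification."
)


def build_research_prompt(field: str, document_type: str, audience: str,
                          topic: str, method: str) -> str:
    """Fill the template's placeholders with the caller's values."""
    return RESEARCH_PROMPT.format(
        field=field, document_type=document_type, audience=audience,
        topic=topic, method=method,
    )


# Sample values, made up for the demo.
print(build_research_prompt(
    field="energy policy",
    document_type="industry report",
    audience="utility executives",
    topic="grid-scale battery storage",
    method="comparative analysis",
))
```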

When Perplexity Wins

Perplexity is stronger when you need information from the last few weeks or months, explicit citations you can click and verify, a fast bibliography on a narrow topic, or fact-checking against the live web. Its Pro Search feature decomposes complex questions into sub-queries, searches each, and synthesizes the results.

The clearest signal: if you'd ask a reference librarian rather than a subject-matter expert — quick, verifiable, current — use Perplexity.

Where it wins in practice:

Current events — "What happened in EU AI Act enforcement in the last 30 days?" pulls live, dated, clickable sources.
Source discovery — three credible sources on a narrow topic, fast.
Fact-checking — "What is the current federal funds rate?" gives an auditable answer instead of a guess.
Academic discovery (with caveats) — Pro's academic mode surfaces real papers, but verify DOIs; it still occasionally hallucinates citations.

Same Question, Two Tools

Send the same question to both and the outputs diverge predictably. ChatGPT gives a coherent, well-structured analysis with no source trail. Perplexity gives a current, cited summary that's shorter and less analytically deep.

The professional workflow uses both in sequence: Perplexity first for source discovery and recency, ChatGPT second for synthesis and writing.

For context on adoption scale: according to McKinsey's 2024 Global Survey on AI, 65% of organizations are now regularly using generative AI — up from 33% the previous year. At that level of adoption, the question isn't whether to use AI for research; it's whether you're using the right tool for each task.

Which One Should You Pay For?

If you do research daily, ChatGPT Plus ($20/mo) and Perplexity Pro ($20/mo) serve different enough needs that having both is defensible. If you must choose one:

Perplexity Pro if your work is primarily current-events research where citations matter.
ChatGPT Plus if your work is primarily reasoning, writing, and synthesis on established topics.

One underrated detail: Perplexity Pro lets you choose GPT-4o or Claude as the underlying model — so you're effectively getting those models' reasoning with Perplexity's citation layer on top.

The thirty-second tool switch is worth it. Use Perplexity to gather current sources and key facts, then bring those findings into a ChatGPT conversation for analysis and writing. The improvement in output quality is consistent.

For the full breakdown — including six copy-ready prompts (three tuned for each tool), a head-to-head table, and the complete FAQ — I wrote it up here:
https://my-blog.org/tangents/post/chatgpt-vs-perplexity-comparison