DEV Community

thestack_ai

Your AI Doesn't Have a Coding Problem. It Has a Thinking Problem.

You ask your AI assistant: "What database should I use for real-time analytics?"

It says: "PostgreSQL is a great choice. It's reliable, widely supported, and handles most use cases well."

You follow up: "Can you review this architecture?"

It says: "Looks good! A few minor improvements might help with scalability."

Sound familiar? The AI isn't wrong. It's just... shallow. And the problem isn't that it's slow or lazy — it's that nobody told it how to think.

That's what I set out to fix.

The Discovery

A few months ago, I went through a Korean AI prompt engineering archive — a collection of 30+ battle-tested reasoning patterns that serious prompt engineers had been using internally. Templates for structured research, adversarial review, creative exploration, deep analysis.

The patterns were good. Really good. But they were locked in a format that required manually copy-pasting prompts every time.

So I converted them into something installable: Stack Skills — 7 meta-cognitive skills for Claude Code (and any other AI coding tool) that change how your AI reasons, not just what it outputs.

The idea is simple: your AI was always capable of this depth. It just needed a structured framework to follow.

The 7 Skills

1. cross-verified-research

A 4-stage pipeline: Deconstruct → Search → CrossVerify → Synthesize. Every source gets graded S/A/B/C based on recency, authority, and cross-validation. Hallucinations get flagged. Output is BLUF (Bottom Line Up Front) with confidence scores.

Stop getting answers. Start getting verified answers.
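To make the grading stage concrete, here's a rough sketch of what S/A/B/C scoring could look like if you modeled it in code. This is not part of the shipped skill (which works at the prompt level); the weights and thresholds are mine, purely for illustration:

```python
from dataclasses import dataclass

# Hypothetical model of the CrossVerify grading stage: each source is
# scored on recency, authority, and cross-validation, then bucketed
# into S/A/B/C tiers. Weights and cutoffs are illustrative.
@dataclass
class Source:
    recency: float        # 0..1, newer is higher
    authority: float      # 0..1, e.g. official docs vs. forum post
    corroborations: int   # independent sources that agree

def grade(source: Source) -> str:
    score = 0.4 * source.recency + 0.4 * source.authority
    score += 0.2 * min(source.corroborations, 3) / 3
    if score >= 0.85:
        return "S"
    if score >= 0.65:
        return "A"
    if score >= 0.40:
        return "B"
    return "C"
```

The point of the tiering is that a recent, authoritative, triple-corroborated source lands in S, while an uncorroborated forum post bottoms out at C — and the final synthesis can weight claims accordingly.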

2. creativity-sampler

Forces 5 probability-weighted alternatives instead of one answer. The catch: at least one option must have p<10% probability — meaning it's genuinely unconventional. Based on the Distribution-First principle: exhaust the solution space before committing.

Your AI will stop defaulting to the obvious answer every time.
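The Distribution-First constraint is mechanical enough to express as a check. Here's a minimal sketch (my own, not from the skill) of what "valid option space" means — five options, probabilities that roughly sum to 1, and at least one genuinely unconventional pick:

```python
# Illustrative check of the Distribution-First constraint: exactly five
# options, probabilities roughly summing to 1, and at least one
# unconventional pick with p < 10%. A sketch, not the skill itself.
def valid_option_space(options: list[tuple[str, float]]) -> bool:
    if len(options) != 5:
        return False
    total = sum(p for _, p in options)
    if not 0.95 <= total <= 1.05:
        return False
    return any(p < 0.10 for _, p in options)
```

The p<10% requirement is the part that matters: without it, all five "alternatives" tend to be variations on the same safe answer.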

3. adversarial-review

Three attack vectors: Logical (is the reasoning sound?), EdgeCase (what breaks this?), Microscopic (what's buried in the details?). "Looks good" is explicitly banned. Every review ends with a PASS/FAIL verdict and severity classification.

Code review that actually finds problems.

4. deep-dive-analyzer

Three modes: Code, System, Concept. Runs through a 5-part Codex structure for exhaustive decomposition — assumptions, dependencies, failure modes, hidden complexity, second-order effects.

For when "explain this" isn't deep enough.

5. persona-architect

Designs a 5-layer persona DNA: Identity, Communication style, Behavioral patterns, Domain expertise, Boundaries. Includes pre-built archetypes for common roles (senior engineer, skeptical reviewer, product strategist).

Build specialized AI agents for recurring workflows.

6. skill-composer

Chains multiple skills into pipelines. Supports Sequential (A→B→C), ForkJoin (parallel branches that merge), and Iterative (loop until condition met) patterns. Separates function layer from persona layer.

The meta-skill. Combine the others into workflows.
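The three pipeline shapes map directly onto familiar control flow. A quick sketch of the idea — the real composer chains prompts, not Python functions, but the structure is the same:

```python
from typing import Callable

Skill = Callable[[str], str]

# Hypothetical composition helpers mirroring the three pipeline shapes.
# The actual skill-composer works at the prompt level; this only
# illustrates the control flow.
def sequential(*skills: Skill) -> Skill:
    def run(text: str) -> str:
        for skill in skills:              # A -> B -> C
            text = skill(text)
        return text
    return run

def fork_join(branches: list[Skill],
              merge: Callable[[list[str]], str]) -> Skill:
    def run(text: str) -> str:
        # parallel branches over the same input, merged at the end
        return merge([branch(text) for branch in branches])
    return run

def iterative(skill: Skill, done: Callable[[str], bool],
              max_steps: int = 5) -> Skill:
    def run(text: str) -> str:
        for _ in range(max_steps):        # loop until condition met
            if done(text):
                break
            text = skill(text)
        return text
    return run
```

A typical composed workflow would be `sequential(research, creativity_sampler, adversarial_review)`: gather verified facts, expand the option space, then attack the result.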

7. tiered-test-generator

Generates 3 tiers of questions for any technical topic: Conceptual (understanding), Applied (can you use it?), Expert (edge cases and tradeoffs). Includes a grading engine and diagnostic report.

Useful for onboarding, interviews, or stress-testing your own understanding.


Before / After

Let's go back to that database question.

Prompt: "Which database for real-time analytics on 10M events/day?"

Without skills:

PostgreSQL is a great choice for real-time analytics. It's reliable, widely supported, and can handle this use case with proper indexing.

That's it. One line. No tradeoffs, no alternatives, no assumptions surfaced.

With creativity-sampler + adversarial-review:

```
OPTION SPACE (probability-weighted):

1. ClickHouse [p=45%] — columnar storage, 10B rows/sec ingestion, native real-time aggregation.
   Hidden assumption: assumes batch insert patterns. Streaming inserts need careful tuning.

2. Apache Druid [p=25%] — built for sub-second analytics at scale.
   Hidden assumption: operational complexity is high. Need dedicated ops if team < 5.

3. TimescaleDB [p=15%] — PostgreSQL extension, familiar tooling, continuous aggregates.
   Hidden assumption: scales to ~1B rows comfortably, beyond that needs partitioning strategy.

4. DuckDB + S3 [p=10%] — OLAP in-process, serverless, near-zero ops.
   Unconventional: works when queries are analytical (not streaming). Often ignored.

5. Kafka + Materialize [p=5%] — streaming SQL on live data.
   Unconventional: treats the stream as the database. High complexity, high power.

ADVERSARIAL VERDICT: FAIL on current framing.
Critical gap: "real-time" is undefined. Sub-second? Near-real-time (30s)? This changes
the entire recommendation. PostgreSQL was never a serious contender here.
```

That's the difference. Not more words — more thinking.


Does It Actually Work?

I ran structured benchmarks across three task types using a consistent evaluation rubric (accuracy, depth, reasoning transparency, completeness).

| Task Type | Without Skills | With Skills | Improvement |
| --- | --- | --- | --- |
| Research queries | 52/100 | 89/100 | +71% |
| Technical decisions | 48/100 | 91/100 | +90% |
| Code review | 61/100 | 95/100 | +56% |
| **Average** | 54/100 | 92/100 | +70% |

The biggest gains were in decisions and review — exactly where shallow AI responses cause the most real-world damage.


Install in 30 Seconds

```bash
npx skills add whynowlab/stack-skills --all
```

That's it. Works with Claude Code, Cursor, GitHub Copilot, Codex, and Gemini CLI — 41 agent configurations total. Skills are plain markdown files under ~/.claude/skills/, so you can inspect, modify, or fork them freely.
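Because the skills are just markdown, they're easy to read before you trust them. A hypothetical skeleton of what one might contain — the layout here is assumed for illustration, so check the repo for the actual file schema:

```markdown
# adversarial-review

## When to use
Reviewing code, designs, or arguments that need a verdict, not reassurance.

## Protocol
1. Logical attack: is the reasoning sound?
2. EdgeCase attack: what input or scale breaks this?
3. Microscopic attack: what is buried in the details?

## Output
- Findings with severity classification
- Verdict: PASS or FAIL ("Looks good" is banned)
```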

Install individual skills if you want to start small:

```bash
npx skills add whynowlab/stack-skills cross-verified-research
npx skills add whynowlab/stack-skills adversarial-review
```

Then use them directly in your AI session:

```
/cross-verified-research What are the tradeoffs of edge vs. regional deployment for a low-latency API?
```

The project is MIT licensed. Source on GitHub: whynowlab/stack-skills.


The Thing Nobody Did

AI coding tools got incredibly fast. They got better at syntax, at autocomplete, at knowing which library method to call.

But fast shallow thinking is still shallow thinking. The bottleneck shifted from "can the AI do this?" to "is the AI reasoning about this correctly?"

The 30+ prompt patterns in this archive existed because people discovered — through expensive mistakes — that AI needs explicit cognitive frameworks to think well. Not hints. Not politeness. Structured protocols that force depth.

Your AI was always capable of thinking this way.

Nobody asked it to. Until now.


Stack Skills is open source. 7 skills, MIT license, works with any major AI coding tool. If you try it and have feedback, I'd genuinely like to hear it — drop a comment or open an issue.
