David Russell

Posted on May 30 • Edited on Jun 1

Prompt Packs Are Dead. Long Live Skills

#ai #promptengineering

The freebie

Comment "REVOPS ROCKS" and I will DM you my 350 custom RevOps prompts for ChatGPT!

You have scrolled past it a hundred times. Join my list, get a billion prompts. Comment GROWTH for the swipe file. I revolutionized RevOps, join my Slack community to get the 350 prompts that prove it. The prompts are not the product. They are the bait. Somebody wants you on a list, and a fat number does the fishing.

So you comment. The DM arrives. You open the PDF, and 350 prompts read like this:

Act as a RevOps leader and write a LinkedIn post about pipeline hygiene.
Act as a RevOps leader and write a LinkedIn post about forecast accuracy.
Act as a RevOps leader and write a LinkedIn post about lead routing.

Same prompt. Different noun.

The author generated the whole file with AI in one sitting, using the same three formulas, so the number could carry the offer. "Custom" meant swapping the topic in a sentence. Pipeline. Forecast. Routing. Churn. Onboarding.

That was not prompt engineering. That was prompt inflation. The 350-prompt swipe file is not a library. It is a mail merge with a lead-capture form bolted on.

Here for the build, not the history? Skip to the actual Skill. But the history is not filler. How prompting got this brittle is the same story as how the new AI works behind the scenes. Read on and the design choices stop looking arbitrary.

Acronym soup

Good prompt writing came down to a few simple points. Everyone invented their own framework anyway. RTF, RACE, BFD, WTF.

The real ones, roughly:

RTF: Role, Task, Format.
CTF: Context, Task, Format.
RACE: Role, Action, Context, Expectation.
CO-STAR: Context, Objective, Style, Tone, Audience, Response.
CREATE: Character, Request, Examples, Adjustments, Type, Extras.

Then APE, CARE, CLEAR, ICIO, and a fresh one every few weeks.

Stack them and the trick shows. They rearrange the same nine ingredients like refrigerator magnets:

Role. The stance the AI answers from. The same tax question answered "as a CFO" lands nowhere near the same question answered "as an auditor." The role frames everything before the AI reads a single fact.
Context. The situation the answer has to fit. Leave it out and the AI fills the gaps with the average case, which is rarely yours.
Task. The actual verb. Write, rank, diagnose, rewrite. "Help me with this" returns mush. A sharp verb returns a sharp deliverable.
Audience. Who reads the result. A board memo and a Slack message carry the same facts and almost no shared sentences. Naming the reader sets the vocabulary, the depth, and what you can leave unsaid.
Goal. What the output should accomplish, which is not the task. The task is "write the follow-up email." The goal is "get the meeting." Name the goal and the AI optimizes for it instead of for word count.
Tone. The register. Direct, warm, formal, contrarian. Skip it and you get the house default, which reads like everyone else's output.
Format. The shape of the answer. Table, bullets, two paragraphs, JSON. The wrong shape hands you a reformatting job, the exact work you were trying to skip.
Constraints. The fences. Word count, what to avoid, what never to claim, which sources to trust. Honored, they raise quality more than any clever phrasing. Buried in a long prompt, they drop out first.
Examples. A sample of what good looks like. One worked example teaches the AI more than a paragraph describing the standard, because it shows the bar instead of asserting it. Until the AI mistakes the sample for the script and hands you your own example back, verbatim.

Good ingredients, every one. The frameworks taught a real lesson, and they earned their moment.

But they served one-shot prompting. Type it fresh, paste it from your swipe file, load it into a custom GPT or a Gemini Gem. However the prompt arrives, the shape holds: one input, one output, done. That world is closing.

The ground moved

A million years ago, which is to say last year, a working prompt was worth its weight in gold. The good ones traveled hand to hand, screenshotted and hoarded. And the gold misfired anyway, fifteen to twenty percent of the time, the rate climbing with prompt complexity. The prompts worth keeping were the complex ones, so the prized prompts ended up failing the most.

People poured weeks into the perfect prompt.

Write it, watch it misinterpret an instruction, patch that line.
Run it again, watch it ignore a different one, wrap that in IMPORTANT.
Run it again, reach for capitals, then bold, then DO NOT and NEVER.

... until the instructions read like a ransom note. Every fix made the prompt longer, and a longer prompt raised the odds that the model would miss any one instruction. The AI started ignoring instructions seemingly at random. The fixes meant to protect the important lines now buried them. Eventually it mostly worked. Then the next model came out, read the same words a little differently, and the prompt was broken.

The perfection never lived in the prompt. It lived in one version's quirks and expired on the next upgrade. A prompt was a key filed to fit a lock the vendor kept recutting.

Three shifts carry the weight:

Reasoning models think longer before they answer.
Agents pursue multi-step goals and decide on their own when to reach for a tool.
Skills preserve a workflow so the AI runs it the same way every time.

Call them whatever this quarter's marketing calls them. The label keeps changing; the shift does not. The AI now carries far more of a task on its own.

So the old question loses its grip.

Old question: what prompt should I type?

New question: what process should the AI follow?

A real prompt worth saving

The team had a LinkedIn buyer-journey audit prompt. It scored a client's posts against a five-stage buyer-awareness framework, ran an intake interview, translated the framework into the client's business, gated on a confirmation step, then audited the posts. One rule stood out: a CSV of analytics alone does not cut it. The audit needs the post text.

That prompt already beat its peers. It had sequence, gates, and the sense to stop and ask before classifying anything.

It stayed a prompt, though. You could paste it from a doc for the next client, but you also had to hand-edit every line that named the last one, and nothing enforced the rules baked into it. Forget to restate the CSV rule in the edit and it vanished. The prompt remembered nothing. You did, or you did not.

The framework it leaned on, the spine everything else hangs from:

Stage 1  Unaware        Buyer does not know the problem exists.
Stage 2  Problem-aware  Buyer feels the pain, cannot name the cause.
Stage 3  Solution-aware Buyer knows approaches exist, comparing methods.
Stage 4  Provider-aware Buyer compares specific vendors and mechanisms.
Stage 5  Ready          Buyer wants to act, needs the last objection cleared.

A prompt can name those five stages in a sentence. A Skill must know what to do at each one, when reach is hiding zero pipeline, and what to refuse. That gap is the whole article.

What makes a prompt worth promoting

A prompt graduates to a Skill when it carries:

A task you run more than once.
Required intake the AI must collect before it starts.
A known sequence.
Failure modes worth naming.
A reusable framework.
A structured output.
A quality bar.
Edge cases nobody should solve from scratch again.

The LinkedIn audit cleared every line. The job never meant generating ideas. It meant running a diagnostic.

So I packaged it as linkedin-buyer-journey-auditor, a Skill any consultant can run against any client. The layout:

linkedin-buyer-journey-auditor/
├── SKILL.md
├── references/
│   ├── framework.md            # the five stages, fully defined
│   ├── classification-rubric.md # intent tests, not format tests
│   └── objection-library.md    # proof, risk reversal, decision friction
├── assets/
│   ├── intake-schema.yaml       # required inputs before any work
│   ├── content-template.csv     # the shape of the post export
│   └── audit-output.md          # the deliverable template
└── scripts/
    └── stage_breakdown.py       # deterministic distribution math

None of it is exotic. All of it separates a prompt that works once from a workflow that works every time.

Layer 1: stop assuming the operator is the subject

The original prompt said "audit my LinkedIn content." The Skill audits anyone. That one word, my, baked an assumption into a prompt meant for reuse.

The SKILL.md opens by killing it.

## When to use
Run this when a consultant, agency, or founder needs to audit
ANY person's or company's LinkedIn content against the buyer
journey. The operator is rarely the subject. Never assume the
person invoking the Skill is the person being audited.

## When invoked
Begin at intake unless the operator has already supplied
interview answers AND post text. If both exist, skip to
classification. If either is missing, collect it first.

That second block matters more than it looks. The throwaway prompt said "START NOW with Phase 1, Question 1." That belongs to one conversation. The Skill states the entry condition instead, so it picks up wherever the operator already is.
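The entry condition is small enough to read as a function. A minimal sketch, assuming the Skill can tell whether interview answers and post text have already been supplied; the function name, the two flags, and the step labels are hypothetical, not part of the packaged Skill.

```python
def entry_step(has_interview_answers: bool, has_post_text: bool) -> str:
    """Pick the first step from what the operator has already supplied."""
    # Both inputs present: nothing left to collect, skip to classification.
    if has_interview_answers and has_post_text:
        return "classification"
    # Either input missing: begin at intake and collect it first.
    return "intake"


print(entry_step(True, True))   # both supplied
print(entry_step(True, False))  # post text still missing
```

The point of writing it this way is that the starting position depends on state, not on a hard-coded "Question 1."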

Layer 2: intake the AI cannot skip

A prompt asks for context and hopes. A Skill defines the inputs as a schema and refuses to proceed without them.

# assets/intake-schema.yaml
required:
  client_name:        string   # who is being audited
  company:            string
  offer:              string   # what they actually sell
  buyer:              string   # the ICP, by role and context
  sales_cycle_days:   integer  # shapes how much mid-funnel matters
  awareness_level:    enum[low, mixed, high]  # what the buyer already knows
  content_goal:       enum[pipeline, authority, recruiting, fundraising]
required_artifacts:
  post_text:          required   # the words, not just the numbers
  analytics_csv:      optional   # impressions/reactions if available
refusal_rules:
  - if post_text missing: ask for it, do not classify from a CSV
  - if only analytics_csv present: explain numbers cannot reveal
    a buyer stage; a post about churn and a post about pricing
    can post identical impressions and serve opposite stages

Seven fields and two refusal rules. The sales_cycle_days field is not decoration. A 14-day sale tolerates a thin middle. A nine-month enterprise sale dies in the middle, so the audit weights Stages 3 and 4 harder when the cycle runs long. The Skill reads the field and adjusts. A prompt would have shrugged.
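One way to picture "refuses to proceed" is a check that runs before any classification. This is a sketch under assumptions, not the Skill's actual mechanism: the field names come from the schema above, while check_intake, the two dict arguments, and the message wording are hypothetical.

```python
REQUIRED_FIELDS = (
    "client_name", "company", "offer", "buyer",
    "sales_cycle_days", "awareness_level", "content_goal",
)


def check_intake(intake: dict, artifacts: dict) -> list[str]:
    """Return the reasons to pause; an empty list means the audit may proceed."""
    # A field counts as missing when it is absent, None, or an empty string.
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if intake.get(name) in (None, "")
    ]
    # Refusal rules: never classify from analytics alone.
    if not artifacts.get("post_text"):
        if artifacts.get("analytics_csv"):
            problems.append("analytics alone cannot reveal a buyer stage; ask for post text")
        else:
            problems.append("post text missing; ask for it before classifying")
    return problems


intake = {
    "client_name": "Example Client", "company": "Example Co",
    "offer": "fractional CRO", "buyer": "PE-backed SaaS founders",
    "sales_cycle_days": 270, "awareness_level": "mixed",
    "content_goal": "pipeline",
}
print(check_intake(intake, {"analytics_csv": "export.csv"}))
```

With only a CSV attached, the list comes back non-empty and the audit stays parked at intake.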

Layer 3: translate the framework, then stop and confirm

The five stages are generic. The buyer is not. Before the Skill touches a single post, it maps the abstract stages onto the client's real buying motion and asks the operator to confirm.

For a fractional CRO selling to PE-backed SaaS founders, the map comes back like this:

Stage 1  "Revenue is fine, we just need more reps."
Stage 2  "Hiring more reps did not fix it. Something upstream is broken."
Stage 3  "Maybe the GTM motion itself needs an operator, not headcount."
Stage 4  "A fractional CRO could do this. Is that better than a full-time hire?"
Stage 5  "This person. Now. What does the engagement look like?"

Then the gate:

## Confirmation gate
Present the translated map. Ask: "Does this match how your
buyer actually moves?" Do NOT classify any post until the
operator confirms or corrects the map. A wrong map produces
a confident, useless audit.

This gate is cheap to write and expensive to skip. Run the audit against a mismapped funnel and you get a polished report that misreads every post. The operator confirms in ten seconds. The Skill spends those ten seconds buying the rest of its own credibility.
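In code terms, the gate is a precondition on classification. A rough sketch, assuming the confirmation arrives as a boolean and the judgment step is passed in as a callable; FunnelMapNotConfirmed, classify_posts, and classify_one are hypothetical names used only for illustration.

```python
class FunnelMapNotConfirmed(RuntimeError):
    """Raised when classification is attempted before the operator confirms the map."""


def classify_posts(posts, stage_map, confirmed, classify_one):
    """Classify every post, but only after the translated map is confirmed."""
    if not confirmed:
        raise FunnelMapNotConfirmed("Confirm or correct the translated map first.")
    # classify_one stands in for the judgment step (the rubric); it is not defined here.
    return [classify_one(post, stage_map) for post in posts]


try:
    classify_posts(["post text"], {1: "Revenue is fine"}, False, lambda p, m: 1)
except FunnelMapNotConfirmed as err:
    print(f"blocked: {err}")
```

The design choice is that the refusal lives in the workflow itself, so no operator has to remember it.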

Layer 4: classify on intent, not format

Most audits die here. People classify by what a post looks like. A hot take must be top-of-funnel. A framework must be mid-funnel. A case study must be bottom. The surface lies.

The rubric classifies by what belief the post moves, not what shape it takes.

## Classification rubric (references/classification-rubric.md)
Ask of every post: which belief does this shift, for a buyer
at which stage? Format is a hint, never the verdict.

Three worked examples, lifted from a real run.

Post A. "Most 'AI strategy' decks are last year's digital-transformation deck with find-and-replace."

Surface reads Stage 1. Contrarian, punchy, built for reach. Intent says Stage 2. It names a pain the buyer already feels, wasted strategy spend, without offering a fix. That does not move someone from unaware to aware. It moves them from "vaguely annoyed" to "I have a named problem." Problem-aware.

Post B. "The four-part framework we run before touching a single GTM tactic."

Surface reads Stage 3, and intent agrees. It teaches a method, carrying the buyer from "I have a problem" toward "problems like mine get solved this way." Solution-aware. Genuine middle-funnel.

Post C. "We cut a client's sales cycle 40% in one quarter. Before and after."

Surface reads Stage 5, the closing proof. Intent says Stage 4. It is comparison fuel for a buyer asking whether this provider delivers, not the final nudge for a buyer ready to start. The Stage 5 version would clear the last objection: how the engagement begins, what the risk reversal is, why now. This post does not. Provider-aware.

Three posts, three formats, and the format predicted the stage only once out of three. That is why the rubric ships as a reference file and not a sentence.

Layer 5: score buyer value apart from noise

A popular post and a valuable post share a metric and almost nothing else. The Skill scores every post across axes that pull apart on purpose.

post_id | stage | impressions | engagement_rate | buyer_relevance | commercial_value
--------+-------+-------------+-----------------+-----------------+-----------------
  A     |   2   |   18,400    |     4.1%        |      high       |     medium
  B     |   3   |    2,100    |     1.2%        |      high       |     high
  C     |   4   |    3,800    |     2.0%        |      high       |     high
  D     |   1   |   41,000    |     6.8%        |      low        |     none

Post D is the trap. Forty-one thousand impressions, the best engagement rate in the set, and zero commercial value because it reached the wrong crowd with the wrong belief. A metrics-only audit crowns Post D. The Skill flags it as reach without revenue and moves on. Engagement is a vanity axis. The Skill treats it as one.
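The same four rows make the point in a few lines of code. A sketch only: the numbers are copied from the table above, but the ordinal ranking of the labels, the extra "low" level, the tie-break on impressions, and the 10,000-impression threshold are all assumptions made for illustration.

```python
POSTS = [
    {"post_id": "A", "stage": 2, "impressions": 18400, "commercial_value": "medium"},
    {"post_id": "B", "stage": 3, "impressions": 2100, "commercial_value": "high"},
    {"post_id": "C", "stage": 4, "impressions": 3800, "commercial_value": "high"},
    {"post_id": "D", "stage": 1, "impressions": 41000, "commercial_value": "none"},
]

# Assumed ordering of the labels; "low" does not appear in the table.
VALUE_RANK = {"none": 0, "low": 1, "medium": 2, "high": 3}


def rank_by_value(posts):
    """Sort by commercial value first; impressions only break ties."""
    return sorted(
        posts,
        key=lambda p: (VALUE_RANK[p["commercial_value"]], p["impressions"]),
        reverse=True,
    )


def reach_traps(posts, min_impressions=10_000):
    """Flag posts with large reach and no commercial value (threshold is assumed)."""
    return [
        p["post_id"]
        for p in posts
        if p["impressions"] >= min_impressions
        and VALUE_RANK[p["commercial_value"]] == 0
    ]


print([p["post_id"] for p in rank_by_value(POSTS)])
print(reach_traps(POSTS))
```

Sorted by impressions alone, Post D would lead the list. Sorted this way it lands last and gets flagged.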

Layer 6: name the missing middle

Now the deterministic part. stage_breakdown.py takes the classified posts and reports the distribution. No AI judgment, just arithmetic the AI should never eyeball.

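# scripts/stage_breakdown.py
"""Print the stage distribution of classified posts from a CSV with a `stage` column."""
import csv
import sys
from collections import Counter


def breakdown(rows):
    """Print per-stage counts, percentages, and the share of posts in Stages 2-4."""
    stages = Counter(int(r["stage"]) for r in rows)
    total = sum(stages.values())

    for s in range(1, 6):
        pct = 100 * stages[s] / total if total else 0
        bar = "█" * int(pct / 4)  # one block per 4 percentage points
        print(f"Stage {s}: {stages[s]:>3} ({pct:4.1f}%) {bar}")

    middle = sum(stages[s] for s in (2, 3, 4))
    middle_pct = 100 * middle / total if total else 0  # guard against an empty file
    print(f"\nMiddle (2-4): {middle_pct:.1f}% of content")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: stage_breakdown.py <classified_posts.csv>")
    # newline="" is what the csv module expects for file objects
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        breakdown(list(csv.DictReader(f)))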

A typical founder's feed prints something brutal:

Stage 1:  22 (44.0%) ███████████
Stage 2:   6 (12.0%) ███
Stage 3:   3 ( 6.0%) █
Stage 4:   4 ( 8.0%) ██
Stage 5:  15 (30.0%) ███████

Middle (2-4): 26.0% of content

Forty-four percent reach plays at the top. Thirty percent "book a call" at the bottom. Twenty-six percent doing the work in the middle, where a long sales cycle actually closes. The audit stops saying "here is your content mix" and starts saying "your pipeline dies in the middle because you starved it." That sentence is the product.

Layer 7: tie every recommendation to a belief

Weak advice names a stage. Strong advice names a belief the buyer has not yet adopted. The Skill carries a ladder that maps each stage to the belief it must install.

Stage 2  "I have a real, specific problem worth solving now."
Stage 3  "There is a known way to solve this. Here is the method."
Stage 4  "This provider's mechanism is the obvious path for me."
Stage 5  "Acting now is safe. The risk of starting is low."

So instead of "create more Stage 3 content," the Skill writes:

Your buyer feels the pain and reads your case studies, but nothing in the feed makes your method feel inevitable. Stage 3 is the gap. Write posts that show the mechanism working, step by step, so a skeptic concludes there is no other sensible way to do this.

That recommendation a client can act on Monday. The stage label they cannot.
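The ladder is easy to hold as data. A minimal sketch, assuming "the gap" can be approximated as the ladder stage with the fewest posts; the real diagnosis involves judgment, and thinnest_stage is a hypothetical helper. The counts are the ones from the founder feed printed earlier.

```python
BELIEF_LADDER = {
    2: "I have a real, specific problem worth solving now.",
    3: "There is a known way to solve this. Here is the method.",
    4: "This provider's mechanism is the obvious path for me.",
    5: "Acting now is safe. The risk of starting is low.",
}


def thinnest_stage(stage_counts):
    """Return the ladder stage with the fewest posts; the lower stage wins ties."""
    return min(BELIEF_LADDER, key=lambda s: (stage_counts.get(s, 0), s))


counts = {1: 22, 2: 6, 3: 3, 4: 4, 5: 15}
gap = thinnest_stage(counts)
print(f"Stage {gap} is the gap. Belief to install: {BELIEF_LADDER[gap]}")
```

The lookup turns a stage number into the sentence the buyer has to end up believing, which is what the recommendation gets written around.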

Layer 8: load the objection library

Late-stage content lives or dies on proof and friction. The Skill ships a reference file so it never improvises the hard part.

## objection-library.md
proof_assets:    named results, before/after, third-party validation
risk_reversal:   guarantees, pilots, staged commitments, exit ramps
decision_friction: "who owns this internally", "what breaks if we wait",
                   "what does week one actually look like"

When the audit reaches Stage 4 and 5 gaps, it pulls from this file instead of guessing what a nervous buyer needs to hear.

Layer 9: the deliverable is a template, not a vibe

The output ships as a fixed structure so two different operators produce comparable audits.

# audit-output.md
1. Client funnel map (confirmed)
2. Content distribution chart
3. The missing middle: where pipeline leaks
4. Top 5 posts by commercial value (not by reach)
5. Stage-by-stage gap diagnosis
6. 10 post recommendations, each tied to a belief shift
7. The one move that matters most this quarter

Section 7 is the discipline. It forces the audit to rank its own recommendations and stake one. A report with ten equal-weight suggestions is a report the client ignores.

Layer 10: the Ralph Wiggum loop

Before the operator sees a word, the Skill grades its own draft against a checklist. The role separation is the point. An agent that writes and grades in one move acts exactly like Ralph Wiggum declaring "I'm helping!" while the room burns down around him. (I'm breaking down the full mechanics of the Ralph Wiggum loop in my next white paper, but here is the short version.)

The check must run as a distinct pass with its own rubric. You have to isolate the critic from the creator. If the same prompt writes the copy and checks the box in a single breath, the review simply inherits the draft's blind spots. The review layer has to look at the draft from the outside.

## Self-review (run before output)
- [ ] Did I classify by buyer intent, or did format decide?
- [ ] Did I flag any high-reach, low-value post as the trap it is?
- [ ] Did I quantify the cost of the biggest gap, not just name it?
- [ ] Does every recommendation tie to a belief shift?
- [ ] Did I rank one move above the rest?
- [ ] Could the client act on this without asking me a question?
Any unchecked box: revise, do not ship.

That loop separates a deliverable from a draft. A prompt has no idea whether its output is good, or if it just smelled smoke and smiled. The Skill checks.
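The "any unchecked box" rule can be sketched as a gate over the reviewer's verdicts. This assumes a separate review pass has already produced a pass/fail per item; the shortened item labels, the review function, and the "ship"/"revise" strings are hypothetical.

```python
CHECKLIST = (
    "classified by buyer intent, not format",
    "flagged high-reach, low-value posts",
    "quantified the cost of the biggest gap",
    "every recommendation tied to a belief shift",
    "one move ranked above the rest",
    "client can act without asking a question",
)


def review(verdicts: dict[str, bool]) -> tuple[str, list[str]]:
    """Return ('ship', []) only if every box is checked; otherwise ('revise', failures)."""
    # An item the reviewer never scored counts as unchecked.
    unchecked = [item for item in CHECKLIST if not verdicts.get(item, False)]
    return ("ship" if not unchecked else "revise", unchecked)


verdicts = {item: True for item in CHECKLIST}
verdicts["quantified the cost of the biggest gap"] = False
print(review(verdicts))
```

The verdicts are an input here on purpose: the function only enforces the rule, and the grading itself belongs to the separate pass.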

Prompt versus Skill

A prompt says:

Act as a B2B content strategist and audit my LinkedIn.

A Skill says:

Collect seven intake fields and refuse to start without post text. Translate the five stages into this buyer's motion and confirm the map. Classify each post by intent, not format. Score buyer value apart from reach. Run the distribution math. Diagnose the missing middle in dollars. Tie every recommendation to a belief shift. Grade the work against a checklist. Then deliver a fixed template that stakes one move above the rest.

The prompt makes a request. The Skill runs the work.

Mine the packs, then leave them

The packs still hold something. They are ore. Buried in the slop sits the occasional framework, a recurring task, a clean output spec, a checklist someone actually thought about.

Most of it dies as written. Some of it seeds a Skill. The play is to strip-mine the packs for the few durable parts and throw the rest back.

How to convert a prompt into a Skill

Eight questions turn a prompt-shaped idea into a workflow-shaped system. The LinkedIn audit answered each one, which is how it earned the package.

What task would someone run more than once? Auditing any client's content against the buyer journey.
What must the AI know before it starts? Client, company, offer, buyer, cycle length, awareness, goal. The intake schema.
What should it ask, and in what order? The intake interview, one question at a time, before anything else.
When should it pause, confirm, or refuse? Confirm the funnel map. Refuse to classify from a CSV alone.
What judgment should never be reinvented? The classification rubric and the belief ladder.
What does the finished deliverable look like? The seven-section output template.
How does the AI challenge its own output first? The Wiggum self-review checklist.
What goes in the package? SKILL.md, three references, three assets, one script.

Answer those and you hold a Skill, not an incantation.

The demotion

Prompt mastery is not dead. It just got demoted.

Clean phrasing still matters. Garbage in, garbage out survived the upgrade. But phrasing was always the small half of the job. The large half lives in workflow design: what to collect up front, when to refuse, how to grade the work before anyone else sees it.

Nobody needs 350 prompts in a DM. They need ten workflows that know when to ask, when to wait, when to analyze, and when to ship.

Prompt frameworks taught us the ingredients. Skills teach the kitchen how to cook.

DEV Community