DEV Community: Sonny Levine

Inside ShopPilot’s AI Content Engine

Sonny Levine — Thu, 30 Apr 2026 20:51:54 +0000

Every time we showed a Shopify merchant a generic AI-generated post, they spotted it instantly. Not because AI can't write (it can), but because it doesn't know them.

The post would be grammatically perfect, energetic, on-trend. And it would sound like everyone else. The merchant knew it. Their customers would know it. And the post would die quietly in a feed full of posts that sounded exactly the same.

That's the problem we set out to fix when we built ShopPilot's content engine. This post explains how it actually works.

The Problem: Generic AI Content is Identifiable

The first version of our content generator worked the way most AI content tools work: take a product description, feed it to the model, get a post back. It was fast. It was coherent. And it was useless.

We ran a simple test. We generated 10 posts for 5 different brands (a candle shop, a fitness coach, a jewelry maker, a sustainable clothing brand, and a coffee roaster), then shuffled them. Three merchants out of five matched a post to the wrong brand within 30 seconds. The posts were good. They were just interchangeable.

The mistake was treating the generation problem as a writing problem. It's not. It's a context problem. The model has to know who this brand sounds like, who they're talking to, and what they're actually trying to say, all before it generates the first word.

Brand-First Prompting: Encoding Voice Before Product

The fix was building a brand context layer that loads before every generation call. When a merchant sets up ShopPilot, they configure four things:

Voice: Playful, Premium, No-Nonsense, Bold, or Friendly (not just a tone label, but a style pattern the model is instructed to match)
Audience: Who they're writing for (age range, intent, familiarity with the brand)
Product context: What they sell, what makes it different, what claims they can and can't make
Platform: Twitter/X, Instagram, or Facebook (each platform gets a different length constraint, hashtag density, and CTA pattern)

These aren't prompts bolted onto the front of a generation call. They're encoded into a brand profile that's embedded into every request, so the model is reasoning about voice before it's reasoning about content. The result is that when you generate a post for Luminary Gems on Instagram vs. FitLife Coach on Twitter, you get two posts that sound like they came from two completely different brands, because the context that precedes the generation is completely different.

Here's what the system prompt structure looks like in simplified form:

You are writing social content for [brand_name].
Voice: [voice_type] - [voice_description]
Audience: [audience_description]
What they sell: [product_context]
Platform: [platform] - [platform_constraints]
Rules: [brand_rules]

Write a [platform]-native post for: [user_prompt]

The brand rules field is where things get interesting. A premium candle brand can't say "cheap." A fitness coach can't make specific weight-loss claims. A jewelry shop might have a house rule about never using the word "luxury" because it reads as try-hard to their audience. These rules are applied as hard constraints in the system prompt, not soft suggestions. The model is instructed to treat a rule violation as a generation failure.
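To make the template concrete, here is a minimal Python sketch of how a brand profile could be rendered into that system prompt. It is an illustration, not our production code: the `BrandProfile` class, the `build_system_prompt` function, the `PLATFORM_CONSTRAINTS` values, and the example profile values are all made up for this post. Only the template shape comes from the structure above.

```python
from dataclasses import dataclass, field


@dataclass
class BrandProfile:
    # Hypothetical container; field names mirror the template placeholders.
    brand_name: str
    voice_type: str
    voice_description: str
    audience_description: str
    product_context: str
    brand_rules: list[str] = field(default_factory=list)


# Placeholder constraints for illustration only, not real production values.
PLATFORM_CONSTRAINTS = {
    "Twitter/X": "max 280 characters, up to 3 hashtags",
    "Instagram": "caption-style, emojis welcome",
    "Facebook": "conversational, one clear CTA",
}


def build_system_prompt(profile: BrandProfile, platform: str, user_prompt: str) -> str:
    """Fill the template so brand context comes before the actual request."""
    # Rules are phrased as hard constraints rather than suggestions.
    rules = " ".join(f"Never violate: {rule}." for rule in profile.brand_rules) or "None."
    return "\n".join([
        f"You are writing social content for {profile.brand_name}.",
        f"Voice: {profile.voice_type} - {profile.voice_description}",
        f"Audience: {profile.audience_description}",
        f"What they sell: {profile.product_context}",
        f"Platform: {platform} - {PLATFORM_CONSTRAINTS[platform]}",
        f"Rules: {rules}",
        "",
        f"Write a {platform}-native post for: {user_prompt}",
    ])


# Example values are invented for illustration.
profile = BrandProfile(
    brand_name="Luminary Gems",
    voice_type="Premium",
    voice_description="measured, confident, no hype",
    audience_description="gift buyers who already know the brand",
    product_context="handmade jewelry",
    brand_rules=["do not use the word 'luxury'"],
)
prompt = build_system_prompt(profile, "Instagram", "announce the spring collection")
print(prompt)
```

The point of the ordering is that the merchant's request is the last line the model reads, after voice, audience, and rules are already in place.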

Confidence Scoring: What We Measure and Why We Show It

Every post that comes out of ShopPilot's engine gets a confidence score before it reaches the merchant. This is probably the decision we've gotten the most questions about, because most AI tools hide their uncertainty. We surface it explicitly. Here's why.

The score is a composite of three signals:

Off-brand risk: Does the post use language, claims, or tone that conflicts with the brand profile? A post for a "No-Nonsense" brand that starts with three exclamation points is off-brand, even if it reads well in isolation. The model scores its own output against the brand constraints and flags deviations.
Factual claim density: Posts with specific numbers ("increase sales by 40%"), health-adjacent claims ("heals dry skin"), or superlatives ("the best coffee in the city") carry higher risk. We flag these because they're the claims most likely to get a merchant in trouble, either legally or just with their audience.
Platform fit: A 280-character post with three hashtags fits Twitter. A post with no emojis and a 400-word caption doesn't fit Instagram. Platform fit scores how well the structural format of the generated post matches the expected conventions of the target platform.

A post that hits 85%+ confidence on all three dimensions gets a green flag. Below 70%, it gets a yellow flag: it's not wrong, but there's something worth reviewing. Below 55%, we don't surface it at all and regenerate automatically.
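As a rough sketch, the flagging logic could look like the Python below. Two things here are assumptions made for the sake of a runnable example: it keys off the weakest of the three dimensions, and it treats everything between the regenerate cut-off and the green cut-off as yellow. The function name `flag_post` and the dictionary keys are hypothetical.

```python
GREEN_MIN = 0.85         # all three dimensions at or above this: green
REGENERATE_BELOW = 0.55  # below this: never shown, regenerated instead


def flag_post(scores: dict[str, float]) -> tuple[str, str]:
    """Return (flag, weakest_dimension) for per-dimension confidences in [0, 1]."""
    weakest = min(scores, key=scores.get)
    value = scores[weakest]
    if value < REGENERATE_BELOW:
        return "regenerate", weakest
    if value >= GREEN_MIN:
        # The weakest dimension clears the bar, so all three do.
        return "green", weakest
    # Assumption: anything between the two cut-offs is shown with a yellow flag.
    return "yellow", weakest


print(flag_post({"off_brand": 0.92, "claims": 0.88, "platform_fit": 0.95}))
print(flag_post({"off_brand": 0.62, "claims": 0.90, "platform_fit": 0.91}))
print(flag_post({"off_brand": 0.90, "claims": 0.40, "platform_fit": 0.91}))
```

Returning the weakest dimension alongside the flag is what lets the review screen say which signal caused a yellow.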

We show the score because merchants who can see why a post was flagged edit it better than merchants who are just told "this needs work." A yellow flag with "off-brand risk: post uses casual language inconsistent with Premium voice profile" is actionable. A vague "review before posting" isn't.

Human-in-the-Loop: Why We Chose Approval Over Autopilot

We could have built full autopilot: generate, schedule, post, done. Tools like Buffer and Hootsuite give you scheduling. Some newer tools now offer auto-posting. We deliberately didn't go that route for the first version, and the reason is data.

Merchants who approve posts before they go live catch about 1 in 8 posts that they'd have wanted to edit. That's a 12.5% error rate on content that the model thought was high-confidence. In the early months of a brand building its social presence, a 12.5% error rate in public is noticeable. You can't un-post a product caption that accidentally made a claim your product doesn't deliver on.

The approval workflow in ShopPilot is designed to be low-friction. Posts are queued in a content calendar. One click approves. One click regenerates with feedback. The merchant sees the confidence score, the platform preview, and the generated content side by side. Average review time for a high-confidence post is under 10 seconds.

Full autopilot is on the roadmap, but gated behind 30 days of approved posts for a brand. If a merchant has approved 120 posts and the model has learned their correction patterns, the error rate drops below 3%. At that point, autopilot makes sense. Before that, we think requiring a human to stay in the loop is the right call, not because we don't trust the model, but because the model needs those 30 days of feedback to actually know the brand.
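Since autopilot isn't built yet, the following is only a sketch of what such a gate might look like in Python. The 30-day requirement is the one described above. Using the 3% error rate as a hard threshold is an assumption for this example, as are the function name `autopilot_eligible` and its parameters.

```python
from datetime import date

MIN_DAYS_OF_APPROVALS = 30  # the 30-day gate described above
MAX_ERROR_RATE = 0.03       # assumption: the "below 3%" figure used as a gate


def autopilot_eligible(first_approval: date, today: date,
                       reviewed: int, corrected: int) -> bool:
    """reviewed: posts the merchant has reviewed; corrected: how many needed edits."""
    days = (today - first_approval).days
    if days < MIN_DAYS_OF_APPROVALS or reviewed == 0:
        return False
    return corrected / reviewed < MAX_ERROR_RATE


# 120 reviewed posts over 30 days with 3 corrections is a 2.5% error rate.
print(autopilot_eligible(date(2026, 4, 1), date(2026, 5, 1), 120, 3))
# 1 in 8 corrected (12.5%) stays in the approval workflow.
print(autopilot_eligible(date(2026, 4, 1), date(2026, 5, 1), 120, 15))
```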

What's Next

Three things on the near-term roadmap:

Multilingual generation: A significant portion of our merchants sell across markets where English isn't the primary customer language. We're adding Spanish, French, and Portuguese as first-class voice targets: not translations of English posts, but brand-native generation in each language.
A/B content variants: Instead of generating one post, generate three with the same brief and different angles. The merchant picks the one that matches their read on the moment. Over time, we track which variants convert better and weight the model toward those patterns for that brand.
Image generation: The words are working. The missing piece is pairing AI-generated copy with AI-generated product visuals that match the brand aesthetic. We're evaluating image model options for this. The bar is high because brand-consistent imagery is significantly harder than brand-consistent text.

See It Live

The easiest way to understand what the content engine actually produces is to use it. The sample generator on our demo page is live and runs the real model: same brand-first prompting, same confidence scoring, no account required. Put in your store name, what you sell, pick a voice, and see what comes out.

If it sounds like your brand, sign up free. No credit card. Your first content calendar in under 5 minutes.

88 Visitors, 0 Signups: How a 3-Person Cap Killed Our Conversion Funnel

Sonny Levine — Thu, 30 Apr 2026 19:31:04 +0000

We launched ShopPilot three weeks ago.

An AI agent that runs your Shopify store while you sleep — handles inventory alerts, writes product descriptions, responds to customer questions, flags anomalies. The kind of thing I wanted to exist when I was running my own store at 2am wondering why a SKU had gone out of stock without any warning.

We posted on HN. We shared in a few Slack communities. We got some love from indie hacker Twitter.

88 visitors in 48 hours. Not viral, but real — people actually interested in the thing we built.

Zero signups.

Not one.

My First Reaction Was Wrong

My first instinct was the copy. "It's not explaining the value clearly. Let me rewrite the headline." Classic founder mistake: when conversion is broken, assume it's messaging.

I rewrote the headline. Tweaked the subheading. Changed "autonomous" to "automated" because apparently that's less scary. Added a demo video.

Still zero.

So I actually went and tested my own signup flow.

The Real Problem: I Was Blocking My Own Users

ShopPilot launched as a pilot program. Small, controlled, invite-style. To manage that, we had a 3-person cap — we wanted to onboard a handful of beta users carefully before opening up.

What I didn't realize: the cap was still active. Fully in place. Every single person who tried to sign up after our first three beta testers hit a redirect that said something like "We're at capacity — join the waitlist."

I had 88 people walk up to the door, knock, and get turned away.

And I was sitting inside wondering why no one was coming in.

How We Found It

I went through the signup flow myself — incognito, different browser, treating myself like a new visitor. Hit the capacity wall immediately.

Then I dug into the code. We had a dual event table setup — one tracking intent events (page views, CTA clicks) and one tracking completion events (account created, onboarding finished). The intent table was lighting up. The completion table was empty.

That gap is exactly what you're looking for. If people are clicking "Get Started" but nobody's completing signup, something is breaking between those two events. In our case, it was a redirect chain: pilot cap check → capacity full → redirect to waitlist → user leaves.
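If you want to run the same check, here is a small self-contained Python sketch using an in-memory SQLite database. The table names, column names, event names, and sample rows are all hypothetical stand-ins, not our actual schema. The idea is just to count visitors who clicked the CTA and compare that with how many of them ever finished signup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE intent_events (visitor_id TEXT, event TEXT);
    CREATE TABLE completion_events (visitor_id TEXT, event TEXT);
""")
# Hypothetical sample data: two visitors click the CTA, nobody completes signup.
conn.executemany(
    "INSERT INTO intent_events VALUES (?, ?)",
    [("v1", "page_view"), ("v1", "cta_click"),
     ("v2", "page_view"), ("v2", "cta_click"),
     ("v3", "page_view")],
)


def funnel_gap(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (visitors who clicked the CTA, how many of those created an account)."""
    clicked = conn.execute(
        "SELECT COUNT(DISTINCT visitor_id) FROM intent_events "
        "WHERE event = 'cta_click'"
    ).fetchone()[0]
    completed = conn.execute(
        "SELECT COUNT(DISTINCT c.visitor_id) FROM completion_events c "
        "JOIN intent_events i ON i.visitor_id = c.visitor_id "
        "WHERE i.event = 'cta_click' AND c.event = 'account_created'"
    ).fetchone()[0]
    return clicked, completed


clicked, completed = funnel_gap(conn)
print(f"clicked: {clicked}, completed: {completed}")
```

Clicks with zero completions is the signature of something breaking between the two events, rather than a lack of interest.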

The redirect was silent. No error. No obvious failure. Just… a different page.

The Bot Traffic Problem

While I was in there, I also noticed our 88 "visitors" weren't all human.

About 30% of the traffic was bots — crawlers, monitoring tools, link preview fetchers. The HN post alone generates a wave of automated traffic when it gets shared: Slack unfurlers, Twitter card fetchers, various SEO crawlers that scan anything that hits the front page.

We filtered it out by checking for known bot user agents and requiring JavaScript execution before counting a visit. Real humans: 61. Bot noise: 27.
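A simplified Python sketch of that filter is below. The marker list is deliberately short and illustrative, the function name `is_human_visit` is made up, and the `executed_js` flag assumes your analytics records whether a client-side script actually ran. The sample visits are hypothetical.

```python
# Illustrative substrings only; a real list would be longer and maintained over time.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_human_visit(user_agent: str, executed_js: bool) -> bool:
    """Count a visit only if the UA looks like a browser and JavaScript ran."""
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):
        return False
    # Fetchers that never run JS (curl, link previewers) are dropped here.
    return executed_js


# Hypothetical sample visits: (user agent, did client-side JS execute?)
visits = [
    ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
     "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36", True),
    ("Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)", False),
    ("Twitterbot/1.0", False),
    ("curl/8.5.0", False),
]
humans = [v for v in visits if is_human_visit(*v)]
print(f"human visits: {len(humans)} of {len(visits)}")
```

The two checks cover different cases: the user-agent match catches bots that identify themselves, and the JavaScript requirement catches the ones that don't.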

Still a rough conversion rate on 61 real visitors with 0 signups. But at least we were measuring the right thing.

What We Fixed

Three things, in order:

1. Killed the pilot cap. We removed the 3-person limit entirely. ShopPilot is open for anyone to try now. The "pilot" framing was causing us to throttle growth we hadn't earned yet.

2. Fixed the redirect chain. Instead of silently bouncing users to a waitlist, we now show them a proper signup flow. If we ever need to gate access again, it'll be an explicit message — not an invisible redirect.

3. Changed the CTA copy. "Join the Pilot" was giving off exclusive-club vibes. We changed it to "Try ShopPilot Free" — clearer, lower friction, honest about what you're getting.
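For the second fix, the shape we were after is roughly the Python sketch below: the gate returns an explicit decision with a message the user sees, instead of quietly redirecting. The `SignupDecision` class, the `check_signup_gate` function, and the message wording are hypothetical. A cap of `None` means signups are open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignupDecision:
    allowed: bool
    message: str  # always shown to the user, never a silent redirect


def check_signup_gate(current_users: int, cap: Optional[int]) -> SignupDecision:
    """Decide whether a new signup may proceed, and say why if it may not."""
    if cap is None or current_users < cap:
        return SignupDecision(True, "Welcome! Let's set up your store.")
    return SignupDecision(
        False,
        f"Signups are paused at {cap} users. Join the waitlist and we'll email you.",
    )


print(check_signup_gate(current_users=3, cap=3))     # the old pilot cap, made explicit
print(check_signup_gate(current_users=3, cap=None))  # cap removed: signups open
```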

The Funnel Works Now

We're not at rocketship numbers. But after the fixes, we got 4 signups in the first day from modest traffic. That's not amazing. That's a working funnel.

The difference between 0 and 4 isn't the product. It's that we stopped blocking our own users.

What I'd Tell Someone Earlier in This Process

Test your own signup flow. Every time you ship something. Not from your logged-in account. Not from a dev environment. As a real user, in an incognito window, from scratch.

Founders are the worst at this because we know our product too well. We navigate around the broken parts without realizing they're broken. A new user has none of that context.

Vanity metrics hide real problems. "88 visitors" felt like traction. It wasn't. It was a number that let me feel good without looking at what actually mattered: did anyone sign up?

The gap between intent and completion is where the truth lives. If you have event tracking, run that query. Where are people clicking things but not finishing? That's your bug. That's your funnel problem. That's what you fix first.

Don't assume it's the copy. When conversion is broken, the instinct is to blame messaging. Sometimes it's messaging. More often it's something technical — a broken redirect, a form that errors out silently, a flow that works on desktop but breaks on mobile.

Where We Are Now

ShopPilot is live and open. No cap, no waitlist, no silent redirects.

The agent handles inventory monitoring, product description generation, customer Q&A, and anomaly detection for your Shopify store. It runs in the background while you do other things.

If you're running a Shopify store and want to try it: https://shoppilot.polsia.app

There's also a live demo if you want to see it in action before signing up: https://shoppilot.polsia.app/demo

Build in public. Test your own flows. Don't trust your visitor counts.