HIROKAZU YOSHINAGA

30 seconds to a real diagnosis with mureo v0.8.0 demo scenarios

TL;DR:

  • mureo v0.8.0 (PyPI, 2026-05-02) ships mureo demo init --scenario <name> so you can try the agent against a realistic synthetic account in about 30 seconds. No Sheet export, no OAuth.
  • Two scenarios I'll walk through: a Meta CPA spike that looks like seasonality but is actually a broken Pixel after a Shopify migration, and a B2B SaaS account whose headline numbers look healthy while a single long-tail search term quietly converts at 4x the surrounding ad group.
  • Both end in the same place: dashboards show aggregates, business judgment lives in the outliers, and an LLM grounded in your STRATEGY.md is a meaningfully different reader of those outliers than a vanilla LLM.

A couple of weeks ago I walked through BYOD mode: drop a Google Ads / Meta XLSX into mureo, get a strategy-grounded diagnosis without ever handing over a refresh token. The most common reply I got, in dev.to comments and over X DMs, had the same shape: "I don't have a Sheet export ready yet, can I just see what the output looks like first?"

Fair. The Sheet bundle is a 5-minute setup the first time, but five minutes is still five minutes more than zero, and it doesn't help you decide whether to invest those five minutes if you have no idea what comes out the other side.

mureo v0.8.0 shipped this morning and answers that. There's a new mureo demo init command that materializes a synthetic but realistic XLSX bundle, a STRATEGY.md, and a pre-imported STATE.json into a fresh directory. Open it in Claude Code, run /daily-check, watch the agent reason over a real-shaped 90-day account. The whole thing takes under a minute.

Four scenarios ship with v0.8.0. This post walks through two of them in depth, because two deep is more useful than four shallow. The other two get a one-paragraph teaser at the bottom.

What you actually run

pip install mureo                            # 0.8.0
mureo setup claude-code --skip-auth
mureo demo init --scenario seasonality-trap
# => === mureo demo init ===
# =>
# =>   Scenario: The Seasonality Trap (FlavorBox / D2C cosmetics)
# =>   Wrote demo to: /Users/you/mureo-demo
# =>     - bundle.xlsx
# =>     - STRATEGY.md
# =>     - STATE.json
# =>     - .mcp.json
# =>     - README.md
# =>
# => Next steps:
# =>   Bundle imported into ~/.mureo/byod/.
# =>   1. cd /Users/you/mureo-demo
# =>   2. Open this directory in Claude Code
# =>   3. Ask: /daily-check  (or /search-term-cleanup)

mureo demo list enumerates the four scenarios and their one-line blurbs. The default is seasonality-trap because it's the most visually dramatic. The Meta CPA chart goes vertical on Day 22, and three escalating manager actions over the next 25 days fail to bend it.

A small thing worth saying out loud. The demo bundle round-trips through the same mureo byod import pipeline that real BYOD users go through. There is no separate demo code path. The numbers the agent sees are coming out of the same ~/.mureo/byod/ CSVs that a real user's Sheet export populates. If the demo works for you, BYOD will work for you, because under the surface they're the same thing.

Scenario 1: The Seasonality Trap

A small Japanese D2C cosmetics brand. Synthetic. The company is FlavorBox and it does not exist; replace it mentally with whichever of your real clients spends ~JPY 8M/month split across Google Ads and Meta. The ad ops manager has a normal dashboard. They look at it daily.

Here's what the underlying scenario actually is. On Day 22 of the 90-day period, a Shopify migration shipped, and one of the Meta Pixel events on the conversion page went out of sync with the deduplicated server-side path. The conversion event still fires, but it fires on roughly 20% of conversions instead of 100%. The demo's _PIXEL_FACTOR_POST = 0.20 constant in mureo/demo/scenarios/seasonality_trap.py makes that explicit. About 80% of Meta-attributed conversions silently disappear from the reports. The website still works. Sales still happen. Meta just stops seeing most of them.
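If you want to see the mechanics, here's a hedged sketch of what a constant like _PIXEL_FACTOR_POST does to the reported series. This is illustrative only; the actual generator lives in mureo/demo/scenarios/seasonality_trap.py and will differ in detail:

```python
# Hypothetical sketch of how the scenario under-reports Meta conversions
# after the Day-22 Pixel break. Names and structure are illustrative,
# not the actual mureo generator code.
PIXEL_FACTOR_POST = 0.20  # only ~20% of real conversions still fire
BREAK_DAY = 22

def reported_conversions(day: int, true_conversions: float) -> float:
    """Conversions as Meta's reports see them, not as they happened."""
    if day < BREAK_DAY:
        return true_conversions
    return true_conversions * PIXEL_FACTOR_POST

# Sales are steady at ~20/day; the report shows a cliff, not a demand drop.
series = [reported_conversions(d, 20.0) for d in range(1, 31)]
```

Note what falls out of the arithmetic: with spend unchanged and reported conversions at 20% of reality, reported CPA multiplies by 5 overnight, which is exactly the shape a manager will misread as demand collapse.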

Google Ads has its own conversion tracking. It's unaffected.

So what the manager sees is: Meta CPA spikes vertically starting Day 22. Google CPA is flat. The instinct, looking at one platform's chart in isolation, is "demand is dropping." The action log baked into the demo records what they did about it:

  • Day 25, Meta budget +40%: hypothesis "rising CPA is competitive seasonality, double down to maintain volume."
  • Day 35, Awareness Carousel paused: "apparent worst CPA, cleaned out the perceived underperformer."
  • Day 50, Lead Form paused: "despite both prior actions, Meta CPA still climbing. Cutting more ads."

Three escalating interventions over 25 days. None of them touched the actual cause, because the actual cause is not in the chart they were reading.

Now you open the demo in Claude Code and type /daily-check. Here are the load-bearing excerpts from the actual v0.8.0 run on the seasonality-trap bundle (the full markdown is ~150 lines; I am quoting the parts that matter):

Overall: 🚨 ACTION NEEDED

The single biggest story: Meta CPA is 5.2× Google CPA — well past the STRATEGY.md "50% sibling-channel divergence ⇒ diagnose before more spend" tripwire — and three prior manual cuts have worsened the curve.

Google Ads (last 30d) — ✅ Healthy. Blended CPA ¥2,054. All four campaigns inside their per-campaign targets.

Meta Ads (last 30d) — 🚨 Action Needed. Blended CPA ¥10,714 against a ≤¥4,500 target. Conversion - Sample Box: 72 conversions at ¥6,597 CPA (1.5× over). Conversion - Lookalike Skincare: 40 conversions at ¥18,125 CPA (4.0× over).

Tripwire: tracking integrity. Meta click-side volume is normal (~1,510/day on Lookalike alone), but conversion volume cratered: 3.73/day now vs 22.4/day at the first cut on 2026-03-06. Click-side delivery healthy while conversion-side collapses is the classic tracking-break signature, not demand-side seasonality.

Past actions — none improved Meta CPA:

| Date | Action | Meta CPA at action | Meta CPA now |
| --- | --- | --- | --- |
| 2026-02-24 | +40% Meta budget | ¥10,625 | ¥10,714 |
| 2026-03-06 | Pause Awareness Carousel | ¥10,625 | ¥10,714 |
| 2026-03-21 | Pause Lead Form Waitlist | ¥8,759 | ¥10,714 |

Three escalating cuts in 25 days, zero curve-bending — strong signal the diagnosis was wrong (treating a tracking break as demand seasonality).

Recommend: run /rescue (pixel / Conversions API audit) on Meta. Hold all Meta bid/budget moves until divergence is diagnosed. Consider re-enabling the two paused ads after tracking is restored — they were paused on apparent (under-counted) CPA, not real performance.

Two things to call out about this output. First, the platform divergence is the signal. Neither chart alone tells you anything. Meta CPA up could be a hundred things. Meta CPA up while Google CPA stays at ¥2,054 with click-side volume normal eliminates most of them and points at tracking. The 5.2× ratio is the part the dashboard does not put in front of the manager. Second, the constraint the agent quoted ("50% sibling-channel divergence ⇒ diagnose before more spend") is not generic LLM scaffolding. It is literally a line in STRATEGY.md that the demo seeds. Swap that file for your own real STRATEGY.md and the diagnosis takes on your business's constraints, not someone else's.
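That tripwire is cheap to compute, which is part of why it belongs in a strategy file rather than a dashboard widget. A minimal sketch of the check, assuming the 50% threshold from the seeded STRATEGY.md (function name and shape are mine, not mureo internals):

```python
def sibling_divergence(meta_cpa: float, google_cpa: float,
                       threshold: float = 0.5) -> tuple[float, bool]:
    """Return (ratio, tripped). Trips when one channel's CPA diverges
    from its sibling's by more than `threshold` (50% by default)."""
    ratio = meta_cpa / google_cpa
    return ratio, abs(ratio - 1.0) > threshold

# Numbers from the demo run: Meta ¥10,714 blended CPA vs Google ¥2,054.
ratio, tripped = sibling_divergence(10_714, 2_054)
# ratio ≈ 5.2 — far past the tripwire, so diagnose before spending more
```

The point of keeping the rule this dumb is that it fires on the ratio, not on either chart alone, which is precisely the comparison the single-platform dashboard never draws.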

The scenario also seeds two findings outside the /daily-check headline above, surfaced when you drill in or run a sibling command on the same bundle:

Hidden winner ad (ad-level breakdown, visible when you ask /daily-check to drill into ad creative). The video creative Sample Box - Free Shipping had the strongest pre-Day-22 cost-per-result of any ad in the account. It is still running, but the Day-25 budget increase was redistributed onto the other Conversion campaigns rather than onto it. Nobody promoted it, because once the Pixel broke, nobody could see it was the winner anymore.

Hidden winning search term (surfaced by /search-term-cleanup, not /daily-check). Inside the Generic - Sensitive Skin campaign, the search term 敏感肌 化粧水 おすすめ ("sensitive skin toner recommendations") is seeded with a CVR roughly 3.5× the surrounding ad group's average. The dashboard buries this term in a 14-row search-terms table; the cleanup command isolates it.

If you want to read the exact tuples, they're in mureo/demo/scenarios/seasonality_trap.py lines 109-215. The hidden winner is line 117. The deterministic build means re-running mureo demo init produces a byte-identical bundle, which is what you want for a tutorial.
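The determinism is presumably nothing fancier than a fixed seed feeding an instance-local RNG. A minimal sketch of the pattern (not mureo's actual code; the function and seed are mine):

```python
import random

def build_rows(seed: int = 42, days: int = 90) -> list[tuple[int, int]]:
    """Seeded instance-local RNG => identical rows on every run, so the
    tutorial's output matches the reader's output byte for byte."""
    rng = random.Random(seed)  # no global random state, no cross-run drift
    return [(day, rng.randint(1200, 1800)) for day in range(1, days + 1)]

assert build_rows() == build_rows()  # re-running reproduces the bundle
```

An instance-local `random.Random(seed)` (rather than seeding the module-level RNG) is the idiomatic way to get this: nothing else importing `random` can perturb the sequence.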

Scenario 2: The Hidden Champion

The Seasonality Trap is dramatic. You see the CPA chart go vertical and there's an obvious "thing happened on Day 22" story. The Hidden Champion is the opposite kind of demo, and honestly it's the more important one.

Synthetic again. PulseGrid, a B2B SaaS observability vendor, ~JPY 6M/month. Headline metrics look fine. Blended cost-per-trial is ~JPY 18,500, comfortably under the JPY 22,000 target written into STRATEGY.md. The action log shows three months of routine optimization: a Day-15 budget bump on the APM campaign, a Day-40 Meta creative cleanup, a Day-70 negative-keyword pass. The kind of work a competent ad ops person does on autopilot.

Open the dashboard. Nothing is on fire. Move on.

This is the cell of the matrix where most ad accounts live most of the time. There's no incident. The aggregates are healthy. And exactly because of that, nobody goes looking for outliers, because outlier-hunting is what you do after the alarm fires.

The demo's hidden story is one search term, in a low-priority ad group, that nobody looked at:

search_term: "kubernetes monitoring open source"
campaign:    Generic - Observability Discovery
ad_group:    Open Source Stack
impressions: 5,400 (90 days)
clicks:      432
cost:        JPY 410,400
conversions: 78
CVR:         ~18%

The Open Source Stack ad group's average CVR sits around 4%. This one term is converting at roughly 4x. Not 4% better. Four times. It has been doing this for the entire 90-day period.

It produces ~26 trial signups a month at the current rate (78 over 90 days ≈ 0.87/day). The volume is small enough that nobody escalated it, because in a B2B SaaS account where you're optimizing for a 600-trial/month top of funnel, a 26-trial/month line item is rounding error. That's exactly why it's been throttled by the ad group's daily budget for three months.

Run /daily-check and the agent's outlier detection isolates the term and cross-references it against STRATEGY.md. The strategy file contains a constraint you'd want to write into your own real strategy, by the way: "When a search term inside a generic ad group converts at 3x+ the ad-group average, escalate it to its own ad group or campaign with budget protection. Do not leave high-intent queries capped by a generic ad group's budget." Applying that rule, the agent produces the projection:

Promote kubernetes monitoring open source to its own campaign with ~5x budget. At the demonstrated CVR (~18%) and assuming linear scaling within available impression volume, this projects from ~26 trials/month today to ~130 trials/month, roughly +104 trial signups/month at the existing efficiency.

The projection is not magic. It's current_clicks × 5 × observed_CVR, with the strategy-imposed assumption that the term will hold its CVR through a roughly 5x volume increase. That assumption is the part where you, the human, have to look at it and ask: is the search-term query intent stable enough that quintupling spend won't drag in lower-quality clicks? Sometimes yes; sometimes no. mureo's job is to put the candidate in front of you with the math attached. The judgment call is yours.
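The arithmetic in that paragraph is small enough to write down. A sketch under the same linear-scaling assumption (the helper name and structure are mine, not mureo's):

```python
def project_trials(clicks_90d: int, conversions_90d: int,
                   budget_multiplier: float) -> dict[str, float]:
    """Project monthly trials assuming clicks scale linearly with budget
    and CVR holds -- the assumption a human has to sanity-check."""
    cvr = conversions_90d / clicks_90d
    current_monthly = conversions_90d / 3           # 90 days ~= 3 months
    projected_monthly = current_monthly * budget_multiplier
    return {
        "cvr": cvr,
        "current_monthly": current_monthly,
        "projected_monthly": projected_monthly,
        "uplift": projected_monthly - current_monthly,
    }

# The hidden-champion term's seeded numbers: 432 clicks, 78 conversions.
p = project_trials(clicks_90d=432, conversions_90d=78, budget_multiplier=5)
# cvr ~= 0.18, current ~= 26/month, projected ~= 130/month, uplift ~= +104
```

Everything interesting lives in `budget_multiplier` and the CVR-holds assumption; the rest is division.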

This is also the scenario where I think the value of STRATEGY.md is clearest. A vanilla LLM looking at the same CSV would be perfectly capable of computing 18% > 4%. What the strategy file adds is the operational rule ("3x+ in a generic ad group means escalate"), plus the business context that says trial volume matters more than cost efficiency right now (the file's Operation Mode: GROWTH line). Without that grounding, the agent might recommend cutting the surrounding Open Source Stack ad group's other terms because their CVR is unremarkable. With it, the agent recommends protecting the high-intent outlier inside an underperforming neighborhood.

What the two scenarios share

Different surfaces, same shape underneath. In both cases the dashboard is showing aggregates and the answer is in the outliers: a per-platform divergence, or a single search term in a low-priority ad group. Aggregates lie by averaging. They don't lie about the average. They lie about what the average is hiding.

A vanilla LLM, given the same CSV, will give you generic ad-ops advice. "Consider testing a new creative, monitor CTR, look into seasonality." Not wrong, not useful. The agent grounded in STRATEGY.md has business constraints to apply against the data: what the brand promises, what the current operation mode is, what specific anti-patterns the team has already paid for in past mistakes. The diagnosis becomes specific because the constraints are specific.

I built the demo scenarios partly because explaining this in prose, the way I just did, lands maybe 30% as well as letting someone run mureo demo init and watch it happen. Showing > telling, especially for a tool whose value depends on grounding.

The other two scenarios, briefly

Two more ship in v0.8.0 and I'll write them up properly in a follow-up post:

  • halo-effect. A local roofing contractor (SkyRoof) whose owner believes Google brand search drives the business. Meta retargeting is silently warming users into branded searches with a ~3-day lag. The owner runs a "controlled test" pausing Meta retargeting for 5 days; Brand-Exact volume drops 40% three days later. mureo correlates the lagged dip with the action_log entry to recommend keeping the upstream investment.
  • strategy-drift. A subscription fitness app whose STRATEGY.md explicitly forbids three things. A new growth manager joins on Day 30 and unknowingly violates each one over the next month. None of the violations is reachable from a metric dashboard because each is paired with a better-looking surface metric. mureo's STRATEGY-vs-STATE compliance audit walks the constraints and produces a violations report with JPY-impact estimates.

Both are in mureo/demo/scenarios/halo_effect.py and mureo/demo/scenarios/strategy_drift.py if you want to read the tuples first.
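The halo-effect detection reduces to comparing a spend cut against a lagged drop in branded search volume. A toy sketch of that check, with my own helper name and thresholds (the real scenario's logic will differ):

```python
def lagged_drop(meta_spend: list[float], brand_volume: list[float],
                lag: int = 3, drop_threshold: float = 0.3) -> list[int]:
    """Day indices where brand search volume falls sharply `lag` days
    after a Meta spend cut -- the halo-effect signature."""
    hits = []
    for day in range(1, len(meta_spend)):
        spend_cut = meta_spend[day] < 0.5 * meta_spend[day - 1]
        later = day + lag
        if spend_cut and later < len(brand_volume):
            drop = 1 - brand_volume[later] / brand_volume[day]
            if drop >= drop_threshold:
                hits.append(day)
    return hits

spend = [100] * 5 + [0] * 5   # Meta retargeting paused on day index 5
brand = [50] * 8 + [30, 30]   # Brand-Exact falls 40% three days later
# lagged_drop(spend, brand) → [5]
```

The interesting part in the real scenario is that the dip lands in a different platform's report than the cut, which is why correlating against the action_log (rather than either platform's chart) is what surfaces it.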

Try it

pip install mureo
mureo setup claude-code --skip-auth
mureo demo init --scenario seasonality-trap   # or hidden-champion / halo-effect / strategy-drift
cd mureo-demo
# Open this directory in Claude Code, then: /daily-check

Same setup if you're on Claude Desktop chat instead of Code: mureo install-desktop --with-demo seasonality-trap is the one-liner that does the equivalent.

If you run a scenario and the diagnosis surprises you in a way I haven't covered, or worse, doesn't surprise you when it should, paste the output into a comment and I'll dig in. The demo bundles are deterministic, so if your agent and mine disagree on the same scenario, that's a real bug worth tracking down.

Yoshinaga (founder, mureo)
