For two years I pulled data manually. Shopify dashboard, Meta Ads dashboard, Klaviyo dashboard, a spreadsheet, and about three hours every Monday morning turning numbers into something actionable.
I'm not doing that anymore. Claude Code manages my entire data pipeline now - from ingestion to transformation to the report that lands in my inbox every morning. This is how I built it and what I'd do differently.
The Problem with Manual Data Work
Manual data pipelines break in two ways. The obvious way: you forget to pull something, or pull the wrong date range, and your analysis is wrong. The less obvious way: they consume so much time and attention that you stop looking at data that would change your decisions.
I was in the second category. I had access to good data but the friction of getting it meant I only looked deeply when something was visibly wrong. By then, problems had been running for weeks.
The goal wasn't automation for its own sake. It was removing friction so I'd actually use the data.
The best analytics setup isn't the most sophisticated one. It's the one you actually look at every day.
What the Pipeline Does
My current setup handles four data sources:
- Shopify - orders, revenue, products, customers
- Meta Ads - spend, impressions, clicks, conversions, ROAS by campaign and ad set
- Klaviyo - email open rates, click rates, revenue attribution by flow
- Inventory - stock levels, reorder triggers, days-of-stock remaining
Every morning at 6am, Claude Code pulls fresh data from each source, transforms it into a standard format, calculates the metrics I care about, and writes a daily briefing that I can read in five minutes.
The full build took about two weeks of evenings. Maintaining it takes almost nothing.
Phase 1: The Data Extraction Layer
Each source has its own extractor. These are simple scripts with one job: pull data from the API and write it to a local file in a consistent format.
For Shopify, Claude helped me build this:
```javascript
const extractShopifyOrders = async (daysBack = 7) => {
  const since = new Date();
  since.setDate(since.getDate() - daysBack);

  const response = await fetch(
    `https://${SHOP_DOMAIN}/admin/api/2024-01/orders.json?` +
      `status=any&created_at_min=${since.toISOString()}&limit=250`,
    {
      headers: {
        'X-Shopify-Access-Token': SHOPIFY_TOKEN,
        'Content-Type': 'application/json'
      }
    }
  );

  const data = await response.json();
  return data.orders.map(order => ({
    id: order.id,
    created_at: order.created_at,
    total_price: parseFloat(order.total_price),
    line_items: order.line_items.map(item => ({
      product_id: item.product_id,
      quantity: item.quantity,
      price: parseFloat(item.price)
    }))
  }));
};
```
The key design decision: extract only what you need, normalize it immediately, write it as clean JSON. Don't try to be clever at the extraction layer.
For Meta, the extraction is similar but requires handling pagination and rate limits, which Claude generated correctly on the first try when I described the API structure.
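The source doesn't include the Meta extractor, but the pagination pattern looks roughly like this. It assumes the Graph API's response shape (a `data` array plus a `paging.next` URL while more pages remain); the JSON fetcher is injected so the loop can run without network access:

```javascript
// Follow Meta-style cursor pagination until no `paging.next` URL remains.
// fetchJson is injected (e.g. url => fetch(url).then(r => r.json())) so the
// loop itself has no network dependency. Rate-limit backoff would also live
// inside fetchJson in a real extractor.
const fetchAllPages = async (firstUrl, fetchJson) => {
  const rows = [];
  let url = firstUrl;
  while (url) {
    const page = await fetchJson(url);
    rows.push(...page.data);
    url = page.paging && page.paging.next; // undefined ends the loop
  }
  return rows;
};
```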
Phase 2: The Transformation Layer
Raw data from different sources doesn't talk to each other. The transformation layer translates everything into a common language.
My common language has three entity types:
- Revenue events - any money coming in, tagged by source and date
- Spend events - any money going out on ads or marketing, tagged by platform and campaign
- Performance metrics - calculated ratios like ROAS, conversion rate, email revenue per send
Claude Code built most of the transformation logic. I'd describe what I wanted in plain language:
```
I need a function that takes a list of Meta Ads campaign objects
and produces a list of spend events. Each spend event should have:
- date (YYYY-MM-DD)
- platform (always "meta")
- campaign_name
- campaign_id
- spend_usd
- impressions
- clicks
- conversions
- roas ((conversions * average_order_value) / spend, where aov is passed as a parameter)
```
The generated code was clean and handled edge cases I would have missed - campaigns with zero spend but active status, timezone normalization, campaigns that ran across midnight.
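A sketch of what that transformation looks like for a single campaign. The field names mirror the prompt; the input shape (`spend`, `date_start`, etc. as strings) and the zero-spend guard are my assumptions about the edge cases mentioned:

```javascript
// Transform one Meta campaign object into a spend event in the common schema.
// aov (average order value) is passed in by the caller, per the prompt above.
const metaCampaignToSpendEvent = (campaign, aov) => {
  const spend = parseFloat(campaign.spend) || 0;
  const conversions = Number(campaign.conversions) || 0;
  return {
    date: campaign.date_start,            // assumed YYYY-MM-DD from the API
    platform: 'meta',
    campaign_name: campaign.name,
    campaign_id: campaign.id,
    spend_usd: spend,
    impressions: Number(campaign.impressions) || 0,
    clicks: Number(campaign.clicks) || 0,
    conversions,
    roas: spend > 0 ? (conversions * aov) / spend : 0 // zero-spend campaigns stay active
  };
};
```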
At the 60% mark of building any data pipeline, you'll want to start using the data. Don't. Finish the transformation layer first. Half-transformed data is worse than no data.
Phase 3: The Calculation Layer
This is where business logic lives. What does ROAS actually mean for my specific operation? What's the threshold between a good day and a bad day? Which metrics should trigger alerts?
I asked Claude to build a calculation module based on my specific economics:
```
Here are my unit economics:
- COGS per order: $17 USD (product + fulfillment, fixed)
- Average order value: varies, pull from last 30 days actual data
- Meta target ROAS: 2.0 (breakeven at 1.77, I want 13% buffer)
- Email revenue threshold: $500/day is strong, $200-500 is acceptable

Build me a function that takes today's data and returns:
1. Whether we're above or below target ROAS
2. Whether today's email performance is strong/acceptable/weak
3. Net margin for the day (revenue - COGS - ad spend)
4. A traffic light status: GREEN / YELLOW / RED with specific reason
```
The output: a structured daily assessment that tells me in one glance whether things are working.
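A sketch of what that calculation module might return, using the thresholds from the prompt (target ROAS 2.0, breakeven 1.77, $500/$200 email bands, $17 COGS per order). The exact rules for YELLOW versus RED are my assumption, not the author's generated code:

```javascript
// Daily assessment against the unit economics described in the prompt above.
const assessDay = ({ revenue, orders, adSpend, roas, emailRevenue }) => {
  const COGS_PER_ORDER = 17;
  const netMargin = revenue - orders * COGS_PER_ORDER - adSpend;

  const emailStatus =
    emailRevenue >= 500 ? 'strong' : emailRevenue >= 200 ? 'acceptable' : 'weak';

  // Assumed rules: breakeven/margin breaches are RED, soft misses are YELLOW.
  let status = 'GREEN';
  let reason = 'all metrics on target';
  if (roas < 1.77 || netMargin < 0) {
    status = 'RED';
    reason = roas < 1.77 ? 'ROAS below breakeven' : 'negative net margin';
  } else if (roas < 2.0 || emailStatus === 'weak') {
    status = 'YELLOW';
    reason = roas < 2.0 ? 'ROAS below target' : 'weak email revenue';
  }

  return { netMargin, aboveTargetRoas: roas >= 2.0, emailStatus, status, reason };
};
```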
Phase 4: The Reporting Layer
I tried building a dashboard. It was beautiful and I stopped looking at it within two weeks.
What I use instead: a plain text briefing that gets generated each morning and sent to my email. No clicking, no logging in, no app to open.
Claude generates the briefing template and the generation logic:
```javascript
const generateDailyBrief = (metrics) => {
  const status = metrics.overall_status;
  const emoji = status === 'GREEN' ? '🟢' : status === 'YELLOW' ? '🟡' : '🔴';

  return `
DAILY BRIEF - ${metrics.date}
Status: ${emoji} ${status}

REVENUE: $${metrics.revenue_today.toFixed(0)}
  vs yesterday: ${metrics.revenue_vs_yesterday > 0 ? '+' : ''}${metrics.revenue_vs_yesterday.toFixed(1)}%
  vs 30d avg: ${metrics.revenue_vs_30d_avg > 0 ? '+' : ''}${metrics.revenue_vs_30d_avg.toFixed(1)}%

META ADS:
  Spend: $${metrics.meta_spend.toFixed(0)} | ROAS: ${metrics.meta_roas.toFixed(2)}x
  Status: ${metrics.meta_status}

EMAIL:
  Revenue: $${metrics.email_revenue.toFixed(0)} | Status: ${metrics.email_status}

NET MARGIN: $${metrics.net_margin.toFixed(0)} (${metrics.net_margin_pct.toFixed(1)}%)

${metrics.alerts.length > 0 ? 'ALERTS:\n' + metrics.alerts.map(a => '- ' + a).join('\n') : 'No alerts today.'}
`.trim();
};
```
This lands in my inbox at 6:15am. I read it while having coffee. Anything yellow or red gets my attention before the day starts.
What Claude Code Does in the Pipeline Daily
The pipeline runs without my involvement. But there's one point where Claude Code's intelligence adds real value: anomaly explanation.
When a metric hits red, the briefing includes a pre-investigation:
```
The pipeline detected that Meta ROAS dropped to 1.2x today (threshold: 1.77x).

Claude's pre-diagnosis: [auto-generated explanation based on campaign data]

Suggested investigation: [specific queries to run]
```
This is generated by passing the anomaly data to Claude with context about what normally causes this type of drop. Not always correct, but it saves 20 minutes of initial investigation on 80% of anomaly days.
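The hand-off itself can be as simple as assembling a structured prompt. This is a sketch of my reading of that step; the prompt wording and the `knownCauses` context field are assumptions, since the author only says anomaly data plus typical causes get passed to Claude:

```javascript
// Assemble the anomaly context that gets handed to Claude for pre-diagnosis.
// Field names (metric, value, threshold) are illustrative assumptions.
const buildAnomalyPrompt = (anomaly, knownCauses) => [
  `Metric: ${anomaly.metric}`,
  `Observed: ${anomaly.value} (threshold: ${anomaly.threshold})`,
  `Date: ${anomaly.date}`,
  '',
  'Common causes for this type of drop:',
  ...knownCauses.map((c) => `- ${c}`),
  '',
  'Give a likely diagnosis and 2-3 specific queries to run next.'
].join('\n');
```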
Building It: The Practical Order
If I were starting over, this is the sequence I'd use:
Week 1 - Start with one source, complete end-to-end. Pick Shopify or whatever your primary revenue source is. Build extraction, transformation, and a simple report for just that one source. Run it manually every day for a week. Find what's wrong before adding complexity.
Week 2 - Add your second source. The hard part isn't the second extraction - it's reconciling two data sources. Build the unified schema before you build the second extractor.
Week 3 - Automate. Only after the pipeline runs correctly manually do you schedule it. Automating a broken pipeline makes bugs harder to find, not easier.
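When you do automate, a small runner helps keep scheduled failures debuggable. This is a minimal sketch of one way to structure it, not the author's actual script: stages run in order, a failure stops downstream stages, and every outcome is logged so a cron-triggered run leaves a trail.

```javascript
// Run pipeline stages in order; stop at the first failure so a broken
// extraction never feeds the transform or the report. The OS scheduler
// (e.g. a 6am cron entry invoking this script) handles timing.
const runPipeline = async (stages) => {
  const log = [];
  for (const [name, fn] of stages) {
    try {
      await fn();
      log.push({ stage: name, ok: true });
    } catch (err) {
      log.push({ stage: name, ok: false, error: String(err) });
      break; // halt downstream stages on failure
    }
  }
  return log;
};
```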
Week 4 - Add intelligence. Alerting, anomaly detection, automated pre-diagnosis. This is where Claude Code's reasoning capabilities add the most value.
The Maintenance Reality
Six months in, the pipeline requires about 30 minutes per month of maintenance. Most of that is API changes - Meta Ads especially loves to deprecate fields.
When an API changes and breaks something, my process: paste the error to Claude Code along with the relevant extraction code and the API changelog if I have it. It identifies the breaking change and proposes the fix in under five minutes.
This is the other payoff from building with Claude Code: when things break, fixing them is fast.
A data pipeline you maintain is worth more than a sophisticated pipeline you eventually abandon. Build for maintainability first.
Resources
The extraction templates and transformation utilities I use are available at mynextools.com. If you're building for Shopify + Meta, those templates will save you several days of initial setup.
The pattern works for any combination of data sources. The principles - extract clean, transform to common schema, calculate last, report simply - apply regardless of your stack.
What data sources are you trying to connect? Drop them in the comments and I'll share whether I've built an extractor for that API.
Follow for a new Claude Code deep-dive every week.