Evgenii Milevich

How We Built a Fully Automated Shopify Blog Pipeline with AI

Six months ago, we had a client problem. We run a Shopify development agency (MILEDEVS) and our clients kept asking the same thing: "Can you help us blog consistently?" They understood that organic traffic matters, but nobody had the bandwidth to research topics, write articles, optimize for SEO, create images, and publish — every single week.

So we built a pipeline that does it automatically. Not a WordPress plugin, not a SaaS tool — a custom Node.js system that runs on a cron job and handles the entire workflow from topic discovery to published Shopify blog post.

Here's exactly how it works.


The Architecture

The pipeline has four stages, each handled by a different AI or API:

[Perplexity API] → Topic Research & Trend Detection
       ↓
[GPT-4o]         → Article Generation (SEO-optimized)
       ↓
[Imagen 3 (Gemini API)] → Featured Image Generation
       ↓
[Shopify Admin API] → Publishing with metadata

A single orchestrator script runs nightly, coordinates all four stages, and handles failures gracefully. If image generation fails, the article still publishes with a fallback image. If the Shopify API is rate-limited, posts queue for the next run.

Let's walk through each stage.


Stage 1: Topic Discovery with Perplexity

The hardest part of consistent blogging isn't writing — it's figuring out what to write about. We use Perplexity's API to find topics that are trending in the client's niche but don't yet have strong competition.

async function discoverTopics(niche, existingTitles) {
  const prompt = `You are an SEO content strategist for a ${niche} ecommerce store.

Find 5 blog topic ideas that meet ALL criteria:
1. Currently trending or seasonally relevant (April 2026)
2. Have search intent that leads to product purchases
3. Not already covered: ${existingTitles.join(', ')}

For each topic return:
- title (under 60 chars)
- primary_keyword
- search_intent (informational | commercial | transactional)
- estimated_monthly_volume (rough range)

Return as JSON array.`;

  const response = await perplexity.chat.completions.create({
    model: 'sonar-pro',
    messages: [{ role: 'user', content: prompt }],
  });

  // The model sometimes wraps JSON in markdown fences — strip them before parsing
  const raw = response.choices[0].message.content;
  return JSON.parse(raw.replace(/^```(?:json)?\s*|\s*```$/g, ''));
}

Why Perplexity instead of a keyword tool? Traditional keyword research tools give you historical data. Perplexity searches the live web, which means it catches trends before they show up in Ahrefs or SEMrush. For a blog pipeline that runs daily, this matters.

We also pass in existingTitles — an array of all previously published post titles — so the model doesn't suggest topics we've already covered. Duplicate coverage dilutes rankings and wastes an API run, and a simple membership check against that array avoids it.
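A plain string match misses near-duplicates ("Spring Decor Trends" vs "Spring Decor Trends 2026"), so a normalized overlap check is safer in practice. A minimal sketch of that idea — the helper name and the 0.8 threshold are illustrative, not from the pipeline:

```javascript
// Hypothetical helper: flag a candidate title as a duplicate when its
// normalized word set overlaps heavily with an existing title.
function normalize(title) {
  return new Set(
    title.toLowerCase().replace(/[^a-z0-9\s]/g, '').split(/\s+/).filter(Boolean)
  );
}

function isDuplicateTopic(candidate, existingTitles, threshold = 0.8) {
  const a = normalize(candidate);
  return existingTitles.some((existing) => {
    const b = normalize(existing);
    const overlap = [...a].filter((w) => b.has(w)).length;
    // Ratio against the smaller set catches "X" vs "X 2026"-style variants
    return overlap / Math.min(a.size, b.size) >= threshold;
  });
}
```

With this, `isDuplicateTopic('Spring Decor Trends 2026', existingTitles)` is rejected even though the exact string was never published.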


Stage 2: Article Generation with GPT-4o

Once we have a topic, we generate the full article. This is where most "AI blog" tools fall flat — they produce generic, surface-level content. Our prompt engineering took weeks of iteration to get right.

async function generateArticle(topic, storeContext) {
  const systemPrompt = `You are a senior content writer for an ecommerce blog.
Store context: ${JSON.stringify(storeContext)}

Rules:
- Write 1200-1800 words
- Use the primary keyword in H1, first paragraph, one H2, and meta description
- Include at least one data point or statistic with a source
- Add internal link opportunities as [INTERNAL_LINK:collection-handle] markers
- Write a meta_description (under 155 chars)
- Tone: expert but approachable, not corporate
- DO NOT use filler phrases like "In today's world" or "It's no secret that"
- Every paragraph must contain actionable information`;

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: `Write a blog post about: ${topic.title}\nPrimary keyword: ${topic.primary_keyword}\nSearch intent: ${topic.search_intent}` }
    ],
    temperature: 0.7,
  });

  const article = response.choices[0].message.content;
  return postProcess(article, storeContext);
}

The postProcess function handles several things that the LLM can't do reliably:

function postProcess(article, storeContext) {
  // Replace internal link markers with real Shopify URLs
  let processed = article.replace(
    /\[INTERNAL_LINK:([\w-]+)\]/g,
    (_, handle) => {
      const collection = storeContext.collections.find(c => c.handle === handle);
      return collection
        ? `[${collection.title}](/collections/${handle})`
        : '';
    }
  );

  // Inject product recommendation block after 3rd paragraph
  const paragraphs = processed.split('\n\n');
  if (paragraphs.length > 4) {
    const productBlock = buildProductRecommendation(storeContext.featuredProducts);
    paragraphs.splice(3, 0, productBlock);
    processed = paragraphs.join('\n\n');
  }

  // Add table of contents from H2 headings
  const toc = buildTableOfContents(processed);
  processed = toc + '\n\n' + processed;

  return processed;
}

The [INTERNAL_LINK:handle] marker pattern is a technique we use a lot. Instead of asking the LLM to generate correct URLs (which it will hallucinate), we ask it to place semantic markers. Our code then resolves those markers against real data. This separation of concerns — LLM handles content, code handles data — eliminates an entire class of errors.


Stage 3: Featured Image Generation with Gemini

Every blog post needs a featured image. Stock photos look generic. Custom photography is expensive. AI-generated images hit a sweet spot when done right.

async function generateFeaturedImage(topic, articleExcerpt) {
  const prompt = `Create a clean, professional blog header image for an article titled "${topic.title}".

Style: modern flat illustration, muted color palette, no text overlays,
suitable for an ecommerce blog. The image should suggest the topic
without being literal. Aspect ratio 16:9.

Article context: ${articleExcerpt.substring(0, 300)}`;

  const response = await genai.models.generateImages({
    model: 'imagen-3.0-generate-002',
    prompt: prompt,
    config: {
      numberOfImages: 1,
      aspectRatio: '16:9',
    },
  });

  const imageBuffer = Buffer.from(response.generatedImages[0].image.imageBytes, 'base64');

  // Optimize before upload — Shopify doesn't compress on ingest
  const optimized = await sharp(imageBuffer)
    .resize(1200, 675, { fit: 'cover' })
    .webp({ quality: 82 })
    .toBuffer();

  return optimized;
}

The sharp optimization step is non-optional. Raw AI-generated images are typically 2-4 MB. After resizing to 1200x675 and converting to WebP at quality 82, they're under 100 KB with no visible quality loss. Since these images load on every blog listing page, the cumulative bandwidth savings are significant.

We also explicitly tell the model "no text overlays" because AI-generated text in images is still unreliable — misspellings and garbled characters would look unprofessional.


Stage 4: Publishing to Shopify

The final stage uploads the image and creates the blog post via Shopify's Admin API.

async function publishToShopify(article, image, topic, shopifyClient) {
  // Upload image first
  const stagedTarget = await shopifyClient.graphql(`
    mutation {
      stagedUploadsCreate(input: [{
        resource: IMAGE
        filename: "${topic.primary_keyword.replace(/\s+/g, '-')}.webp"
        mimeType: "image/webp"
        httpMethod: POST
      }]) {
        stagedTargets {
          url
          resourceUrl
          parameters { name value }
        }
        userErrors { field message }
      }
    }
  `);

  // POST image to staged URL (simplified)
  const target = stagedTarget.stagedUploadsCreate.stagedTargets[0];
  await uploadToStaged(target, image);

  // Create the blog article
  const result = await shopifyClient.graphql(`
    mutation {
      articleCreate(article: {
        blogId: "gid://shopify/Blog/${BLOG_ID}"
        title: "${escapeGql(topic.title)}"
        body: "${escapeGql(article)}"
        summary: "${escapeGql(topic.meta_description)}"
        image: { url: "${target.resourceUrl}" }
        isPublished: true
        publishDate: "${new Date().toISOString()}"
      }) {
        article { id handle }
        userErrors { field message }
      }
    }
  `);

  // Unwrap the mutation payload so callers can read result.article directly
  return result.articleCreate;
}

Why GraphQL instead of REST? Shopify's REST blog API works fine for simple cases. But the GraphQL mutation lets us handle staged uploads and article creation in a predictable flow. More importantly, userErrors in GraphQL responses are structured — we can parse and retry specific failures instead of guessing from HTTP status codes.
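In practice that means a thin check on `userErrors` after each mutation that decides whether a failure is worth retrying. A sketch of the idea — the message-based classification rules here are illustrative assumptions, not Shopify's documented error categories:

```javascript
// Hypothetical helper: treat throttling-style messages as retryable,
// treat validation errors (bad field values) as permanent failures.
function classifyUserErrors(userErrors = []) {
  if (userErrors.length === 0) return { ok: true, retryable: false };
  const retryable = userErrors.every((e) =>
    /throttl|rate limit|try again/i.test(e.message)
  );
  return { ok: false, retryable, errors: userErrors };
}
```

A validation error like "Title can't be blank" gets logged and skipped; a throttling message gets queued for the next run instead of being retried blindly.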


The Orchestrator

The cron job that ties it all together runs at 21:00 UTC daily:

// Simplified orchestrator
async function runPipeline(storeConfig) {
  const existingPosts = await fetchExistingBlogPosts(storeConfig.shopify);
  const existingTitles = existingPosts.map(p => p.title);

  const topics = await discoverTopics(storeConfig.niche, existingTitles);
  const selectedTopic = topics[0]; // Take highest-priority topic

  const article = await generateArticle(selectedTopic, storeConfig.context);

  let image;
  try {
    image = await generateFeaturedImage(selectedTopic, article);
  } catch (err) {
    console.error('Image generation failed, using fallback:', err.message);
    image = storeConfig.fallbackImage;
  }

  const result = await publishToShopify(
    article, image, selectedTopic, storeConfig.shopify
  );

  await logToDatabase({
    topic: selectedTopic,
    articleId: result.article?.id,
    status: result.userErrors?.length ? 'error' : 'published',
    errors: result.userErrors,
    timestamp: new Date(),
  });
}
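The scheduling itself can be a single crontab entry invoking the script that calls runPipeline. The paths and entry-point name below are illustrative, and cron uses the server's local time, so this assumes a UTC server clock:

```shell
# Run the orchestrator at 21:00 UTC daily; append output to a log file.
# /srv/blog-pipeline and run.js are hypothetical — adjust to your deployment.
0 21 * * * cd /srv/blog-pipeline && node run.js >> /var/log/blog-pipeline.log 2>&1
```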

Error handling philosophy: Every stage can fail independently. Image generation failures shouldn't block publishing. Shopify rate limits shouldn't crash the process. Each failure is logged with enough context to debug, and the pipeline moves on. Over six months, we've had a 94% success rate on first attempt, with failures almost always being transient API issues that resolve on retry.
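The "transient API issues that resolve on retry" part maps naturally to a small backoff wrapper around each stage. A minimal sketch, where the attempt count and delays are illustrative defaults rather than the production values:

```javascript
// Hypothetical helper: retry an async stage with exponential backoff
// before giving up and letting the caller log the failure.
async function withRetry(fn, { attempts = 3, baseDelayMs = 1000 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        // 1s, 2s, 4s, ... between attempts
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```

Wrapping a stage is then `await withRetry(() => publishToShopify(article, image, topic, client))`, and a stage that still fails after the last attempt surfaces its original error to the logger.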


Results After 6 Months

For one client store in the home decor niche:

  • 180+ articles published automatically
  • Organic traffic up 340% (from ~2,000 to ~8,800 monthly sessions)
  • 42 keywords ranking on page 1 that had no presence before
  • Average time to publish: 4 minutes from trigger to live post
  • Monthly cost: ~$45 in API calls (Perplexity + OpenAI + Gemini)

The ROI math is straightforward: $45/month in API costs replaces what would be $2,000-3,000/month in freelance writing fees for equivalent output.

You can see examples of pipeline-generated content on our blog — some of our own posts use the same system.


What We'd Do Differently

1. Add a human review queue. Fully autonomous publishing works 90% of the time. The other 10% produces posts that are technically correct but tonally off, or that cover a topic too similar to an existing post. A Slack notification with a 2-hour approval window would catch these.

2. A/B test titles. We generate the article but not title variations. Running two titles against each other on social channels before committing to one for SEO would improve click-through rates.

3. Multi-language support. Several clients sell internationally. Generating articles in 2-3 languages per topic would multiply the organic reach with minimal additional API cost.


Should You Build This?

If you're publishing fewer than 4 blog posts per month because of bandwidth constraints, and you have a Shopify store with products that benefit from content marketing — yes. The pipeline pays for itself within the first month.

If you're interested in this kind of automation for your store, reach out to our team at MILEDEVS. We build custom pipelines, not one-size-fits-all tools.

The full pipeline is about 800 lines of JavaScript. The hardest part isn't the code — it's the prompt engineering and the edge case handling. Both take weeks of iteration that you only learn by running the system in production.
