DEV Community

Aakash Gour

How PostAll's Content Pipeline Actually Works: A Full Architecture Breakdown

I'm going to show you the entire internal architecture of PostAll — the content automation platform I've been building — because I think there's a gap in how people talk about AI content tools publicly.

Most founders describe their product in terms of outcomes: "generate 10x content faster," "consistent brand voice," "scales with your team." That's fine for a landing page. But if you're a developer evaluating whether to build something similar, or trying to understand what makes an AI content pipeline actually work at scale, outcome-speak tells you nothing.

So here's what I wish I had read when I started: how a production content automation system is actually structured, what each layer does, where things break, and the decisions I made (and in some cases regret) along the way.

This is not a tutorial. It's a transparency post. PostAll is live. This is how it works.


The 30-second version

PostAll has five distinct layers:

  1. Queue system — ingests jobs, prioritizes them, handles retries
  2. Orchestration layer — breaks content requests into atomic tasks
  3. LLM layer — calls the models, handles rate limits and fallbacks
  4. Formatting engine — transforms raw LLM output into structured content
  5. CMS connectors — pushes formatted content to wherever the client wants it

Each layer is independently deployable and has its own failure modes. The worst mistake I made early on was treating them as one monolith, and I'll explain exactly what broke.


The queue system

What it does

Every content request — whether it's one blog post or 500 product descriptions — enters PostAll as a job. Jobs hit the queue before anything else happens. The queue is the single source of truth for what work exists, what's in progress, and what's failed.

I use BullMQ (backed by Redis) for this. I evaluated Postgres-backed queues (pg-boss, Graphile Worker) first. They're fine for lower throughput. At scale, the Redis-backed approach handles bursts better, and the job-visibility dashboards available for BullMQ saved me significant debugging time in production.

// job-producer.ts
import { Queue } from 'bullmq';
import { redis } from './redis-client';

const contentQueue = new Queue('content-generation', {
  connection: redis,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 2000, // base delay; attempts: 3 means two retries, after ~2s and ~4s
    },
    removeOnComplete: { count: 500 }, // keep last 500 completed jobs for audit
    removeOnFail: false,              // keep all failed jobs — you'll want to inspect them
  },
});

export async function enqueueContentJob(payload: ContentJobPayload) {
  const job = await contentQueue.add(
    'generate',
    payload,
    {
      priority: payload.bulkBatch ? 2 : 1, // single requests get higher priority
      // deterministic ID, so re-enqueueing is safe; recent BullMQ versions reject custom IDs containing ':'
      jobId: `${payload.clientId}_${payload.contentId}`,
    }
  );
  return job.id;
}

The jobId pattern matters. Setting it to a deterministic string means that if a client double-submits (network retry, user clicking twice), BullMQ ignores the second add for as long as a job with that ID still exists in the queue, including retained completed and failed jobs. You want this. Without it, you'll spend a Friday afternoon tracking down why the same article got generated three times.
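To make the "first add wins" behaviour concrete, here is a minimal in-memory sketch. It is not BullMQ: `InMemoryQueue`, `StoredJob` and `makeJobId` are hypothetical names, and the underscore separator is just a choice made for the illustration.

```typescript
// Hypothetical in-memory stand-in for a queue that deduplicates by job ID.
// It only illustrates the "first add wins" behaviour; it is not BullMQ.
type ContentJobPayload = { clientId: string; contentId: string; bulkBatch?: boolean };

type StoredJob = { id: string; payload: ContentJobPayload; enqueuedAt: number };

// Deterministic ID: the same client + content pair always maps to the same key.
function makeJobId(payload: ContentJobPayload): string {
  return `${payload.clientId}_${payload.contentId}`;
}

class InMemoryQueue {
  private jobs = new Map<string, StoredJob>();

  // Returns the existing job when the ID is already known, so a double
  // submit never creates a second unit of work.
  add(payload: ContentJobPayload): { job: StoredJob; created: boolean } {
    const id = makeJobId(payload);
    const existing = this.jobs.get(id);
    if (existing) return { job: existing, created: false };

    const job: StoredJob = { id, payload, enqueuedAt: Date.now() };
    this.jobs.set(id, job);
    return { job, created: true };
  }

  get size(): number {
    return this.jobs.size;
  }
}

const queue = new InMemoryQueue();
const first = queue.add({ clientId: 'acme', contentId: 'post-42' });
const second = queue.add({ clientId: 'acme', contentId: 'post-42' }); // double click
console.log(first.created, second.created, queue.size); // true false 1
```

The caller gets the original job back on the second submit, which is usually what a client-facing API wants to return anyway.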

What I got wrong: worker concurrency

My initial worker config:

// WRONG — don't do this
const worker = new Worker('content-generation', processJob, {
  connection: redis,
  concurrency: 50, // I thought "more = faster"
});

With 50 jobs in flight against OpenAI at once, I was constantly saturating my rate limit. The retries from failed requests caused a feedback loop: more retries → more failures → more retries.

The fix:

// RIGHT — sized to your actual API tier
const worker = new Worker('content-generation', processJob, {
  connection: redis,
  concurrency: 8, // OpenAI Tier 3 gives us ~3,500 RPM across all workers
  limiter: {
    max: 8,
    duration: 1000, // max 8 jobs started per second across all worker instances
  },
});

The limiter option in BullMQ is underused. It's not just about your worker — it's a global rate governor across however many worker instances you're running. With autoscaling, you can have 10 pods all pulling from the same queue. Without the limiter, 10 pods × 50 concurrency = 500 simultaneous LLM calls. That's how you get a $3,000 bill and a support ticket in the same hour.
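The arithmetic above can be sketched with a tiny fixed-window limiter shared by every worker. This is a simulation under simplifying assumptions, not BullMQ's implementation (which keeps its counter in Redis); `FixedWindowLimiter` and `simulateBurst` are hypothetical names.

```typescript
// Minimal fixed-window rate limiter shared by every worker instance.
// Hypothetical sketch for illustration only.
class FixedWindowLimiter {
  private readonly max: number;
  private readonly durationMs: number;
  private windowStart = 0;
  private started = 0;

  constructor(max: number, durationMs: number) {
    this.max = max;
    this.durationMs = durationMs;
  }

  // Returns true when a job may start at time `nowMs`, false when the window is full.
  tryStart(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.durationMs) {
      this.windowStart = nowMs; // open a new window
      this.started = 0;
    }
    if (this.started >= this.max) return false;
    this.started += 1;
    return true;
  }
}

// Simulate `pods` workers that each try to start `concurrency` jobs at the same instant.
function simulateBurst(pods: number, concurrency: number, limiter?: FixedWindowLimiter): number {
  let startedNow = 0;
  for (let i = 0; i < pods * concurrency; i++) {
    if (!limiter || limiter.tryStart(0)) startedNow += 1;
  }
  return startedNow;
}

console.log(simulateBurst(10, 50)); // 500
console.log(simulateBurst(10, 50, new FixedWindowLimiter(8, 1000))); // 8
```

Without the shared limiter every slot fires at once; with it, the number of pods stops mattering for the request rate.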


The orchestration layer

What it does

A "content job" sounds atomic. It isn't. A single blog post request in PostAll decomposes into:

  • Keyword research task (if enabled)
  • Outline generation task
  • Section drafts (one per section, parallelizable)
  • Introduction/conclusion stitching task
  • SEO metadata generation task
  • Quality score evaluation task

The orchestration layer handles this decomposition, tracks task dependencies, and aggregates results into a coherent output.

I built this with a simple DAG (directed acyclic graph) structure rather than reaching for a workflow engine like Temporal or Inngest. For PostAll's use case, a purpose-built DAG was simpler and cheaper to operate. If I were doing multi-tenant workflows with human-in-the-loop steps, I'd reconsider.

// content-dag.ts
type TaskNode = {
  id: string;
  type: 'outline' | 'section' | 'metadata' | 'quality';
  dependsOn: string[];
  input: (context: JobContext) => Promise<TaskInput>;
  execute: (input: TaskInput) => Promise<TaskOutput>;
};

export function buildContentDAG(request: ContentRequest): TaskNode[] {
  const outline: TaskNode = {
    id: 'outline',
    type: 'outline',
    dependsOn: [],
    input: async (ctx) => ({ keyword: request.keyword, tone: request.tone }),
    execute: generateOutline,
  };

  const sections: TaskNode[] = request.sections.map((section, i) => ({
    id: `section-${i}`,
    type: 'section',
    dependsOn: ['outline'],
    input: async (ctx) => ({
      outline: ctx.results['outline'],
      sectionIndex: i,
      sectionTitle: section.title,
    }),
    execute: generateSection,
  }));

  const metadata: TaskNode = {
    id: 'metadata',
    type: 'metadata',
    dependsOn: sections.map((s) => s.id), // needs all sections complete
    input: async (ctx) => ({ fullContent: assembleContent(ctx) }),
    execute: generateMetadata,
  };

  return [outline, ...sections, metadata];
}

The executor resolves the DAG topologically, runs independent nodes in parallel, and passes results through context:

// dag-executor.ts
async function executeDag(nodes: TaskNode[], jobId: string): Promise<JobContext> {
  const context: JobContext = { jobId, results: {}, errors: {} };
  const remaining = [...nodes];

  while (remaining.length > 0) {
    // find nodes whose dependencies are all complete
    const ready = remaining.filter((node) =>
      node.dependsOn.every((dep) => dep in context.results)
    );

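    if (ready.length === 0) {
      // Nothing left can run. If a prerequisite failed, stop and return the partial
      // results so the caller can fail the job; otherwise the graph has a cycle.
      if (Object.keys(context.errors).length > 0) break;
      throw new Error(`DAG deadlock in job ${jobId}: circular dependency`);
    }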

    // run ready nodes in parallel
    await Promise.allSettled(
      ready.map(async (node) => {
        try {
          const input = await node.input(context);
          context.results[node.id] = await node.execute(input);
        } catch (err) {
          context.errors[node.id] = err;
          // don't throw — let other branches complete, fail job at end
        }
      })
    );

    // remove completed nodes
    ready.forEach((node) => {
      const idx = remaining.indexOf(node);
      remaining.splice(idx, 1);
    });
  }

  return context;
}

The Promise.allSettled choice is intentional. If one section fails, you still want the other sections to complete. You can fail the job at the end with partial results, which is far better than losing all work because one 500-word section errored.
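A cycle or a misspelled dependency can also be caught before any LLM call is made. The following is a minimal sketch of a pre-flight check using a layered topological sort; `DagNode` and `planWaves` are hypothetical names, and `DagNode` is a pared-down stand-in for the task node type.

```typescript
// Pre-flight check for a task graph: unknown dependencies and cycles.
// `DagNode` is a pared-down, hypothetical version of a task node.
type DagNode = { id: string; dependsOn: string[] };

// Groups nodes into "waves": every node in a wave depends only on earlier waves,
// so each wave can run in parallel. Throws on a missing dependency or a cycle.
function planWaves(nodes: DagNode[]): string[][] {
  const ids = new Set(nodes.map((n) => n.id));
  for (const node of nodes) {
    for (const dep of node.dependsOn) {
      if (!ids.has(dep)) throw new Error(`Node "${node.id}" depends on unknown node "${dep}"`);
    }
  }

  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = [...nodes];

  while (remaining.length > 0) {
    const ready = remaining.filter((n) => n.dependsOn.every((dep) => done.has(dep)));
    if (ready.length === 0) {
      const stuck = remaining.map((n) => n.id).join(', ');
      throw new Error(`Circular dependency among: ${stuck}`);
    }
    waves.push(ready.map((n) => n.id));
    ready.forEach((n) => done.add(n.id));
    remaining = remaining.filter((n) => !done.has(n.id));
  }
  return waves;
}

const waves = planWaves([
  { id: 'outline', dependsOn: [] },
  { id: 'section-0', dependsOn: ['outline'] },
  { id: 'section-1', dependsOn: ['outline'] },
  { id: 'metadata', dependsOn: ['section-0', 'section-1'] },
]);
console.log(waves); // [ [ 'outline' ], [ 'section-0', 'section-1' ], [ 'metadata' ] ]
```

Running a check like this at job creation time means a stall at run time can only mean a failed prerequisite, never a malformed graph.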


The LLM layer

What it does

Every LLM call in PostAll goes through a single abstraction: the LLMClient. This class handles model selection, prompt construction, retry logic, fallback routing, and cost tracking. Nothing in the codebase calls OpenAI directly.

// llm-client.ts
export class LLMClient {
  private providers: ProviderConfig[];

  constructor(config: LLMClientConfig) {
    this.providers = config.providers; // ordered by preference
  }

  async complete(request: CompletionRequest): Promise<CompletionResponse> {
    for (const provider of this.providers) {
      try {
        const result = await this.callProvider(provider, request);
        await this.trackUsage(provider.name, result.usage);
        return result;
      } catch (err) {
        if (this.isRateLimit(err) || this.isServerError(err)) {
          // try next provider
          continue;
        }
        throw err; // bad request, auth error — don't retry with another provider
      }
    }
    throw new Error('All providers exhausted');
  }

  private async callProvider(
    provider: ProviderConfig,
    request: CompletionRequest
  ): Promise<CompletionResponse> {
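    // assumes getClient returns a client exposing an OpenAI-style chat.completions API for every provider
    const client = this.getClient(provider.name);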
    const response = await client.chat.completions.create({
      model: provider.model,
      messages: this.buildMessages(request),
      max_tokens: request.maxTokens ?? 2000,
      temperature: request.temperature ?? 0.7,
      response_format: request.structured ? { type: 'json_object' } : undefined,
    });

    return {
      content: response.choices[0].message.content ?? '',
      usage: {
        promptTokens: response.usage?.prompt_tokens ?? 0,
        completionTokens: response.usage?.completion_tokens ?? 0,
        provider: provider.name,
        model: provider.model,
      },
    };
  }

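  private isRateLimit(err: unknown): boolean {
    return this.statusOf(err) === 429;
  }

  private isServerError(err: unknown): boolean {
    const status = this.statusOf(err);
    return status !== undefined && status >= 500 && status < 600;
  }

  // The OpenAI and Anthropic SDK errors expose the HTTP status as `status`.
  // Matching on the message text is fragile: includes('5') matches almost anything.
  private statusOf(err: unknown): number | undefined {
    const status = (err as { status?: unknown } | null)?.status;
    return typeof status === 'number' ? status : undefined;
  }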
}

The fallback routing config

// llm-config.ts
export const llmConfig: LLMClientConfig = {
  providers: [
    {
      name: 'openai',
      model: 'gpt-4o',
      apiKey: process.env.OPENAI_API_KEY!,
    },
    {
      name: 'anthropic',
      model: 'claude-sonnet-4-20250514', // fallback for rate limit hits
      apiKey: process.env.ANTHROPIC_API_KEY!,
    },
  ],
};

Multi-provider fallback sounds complicated. It's mostly just the ordering. The insight that took me too long to reach: don't try to be smart about routing at call time. Keep a priority list, try them in order, move on if one fails. Smart routing (latency-based, cost-based) added complexity with no meaningful real-world gain at my current scale.
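The priority-list rule is easy to exercise with fake providers. This is a provider-agnostic sketch under stated assumptions: `Provider`, `httpError`, `statusOf` and `completeWithFallback` are hypothetical names, and errors are assumed to carry a numeric `status` property.

```typescript
// "Try providers in order" with fake providers standing in for real SDK clients.
type Provider = { name: string; call: (prompt: string) => Promise<string> };

// Builds an Error carrying an HTTP status, the shape this sketch assumes.
function httpError(status: number, message: string): Error & { status: number } {
  return Object.assign(new Error(message), { status });
}

function statusOf(err: unknown): number | undefined {
  if (typeof err === 'object' && err !== null && 'status' in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === 'number' ? status : undefined;
  }
  return undefined;
}

// Rate limits and server errors are worth trying elsewhere; a 400 or 401 is not.
function isRetryableElsewhere(err: unknown): boolean {
  const status = statusOf(err);
  return status !== undefined && (status === 429 || status >= 500);
}

async function completeWithFallback(providers: Provider[], prompt: string) {
  const skipped: string[] = [];
  for (const provider of providers) {
    try {
      const content = await provider.call(prompt);
      return { content, servedBy: provider.name, skipped };
    } catch (err) {
      if (!isRetryableElsewhere(err)) throw err; // the same request would fail everywhere
      skipped.push(provider.name);
    }
  }
  throw new Error(`All providers exhausted: ${skipped.join(', ')}`);
}

// Usage: the primary is rate limited, so the second provider serves the request.
const primary: Provider = {
  name: 'primary',
  call: async () => { throw httpError(429, 'rate limited'); },
};
const secondary: Provider = { name: 'secondary', call: async (p) => `draft for: ${p}` };

completeWithFallback([primary, secondary], 'outline').then((r) => {
  console.log(r.servedBy, r.skipped); // secondary [ 'primary' ]
});
```

Returning `skipped` alongside the result is optional, but it makes it cheap to log how often the fallback path is actually taken.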

Token budgeting

One thing that bit me early: LLM requests for long-form content can exceed context limits in ways that are hard to predict at job creation time. PostAll now estimates token usage before each call:

function estimatePromptTokens(messages: Message[]): number {
  // rough estimate: 1 token ≈ 4 characters for English prose
  const charCount = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(charCount / 4) + 100; // +100 overhead for formatting
}

async function callWithBudget(
  client: LLMClient,
  request: CompletionRequest
): Promise<CompletionResponse> {
  const MODEL_CONTEXT_LIMIT = 128_000; // gpt-4o context window
  const estimatedPrompt = estimatePromptTokens(request.messages);
  const safeMaxTokens = Math.min(
    request.maxTokens ?? 2000,
    MODEL_CONTEXT_LIMIT - estimatedPrompt - 500 // leave buffer
  );

  if (safeMaxTokens < 200) {
    throw new Error('Prompt too long for requested output length — consider chunking');
  }

  return client.complete({ ...request, maxTokens: safeMaxTokens });
}

This saves you from silent truncation — which is the failure mode where the model just... stops mid-sentence because you ran out of context window and you have no idea why.
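A worked example of the budget arithmetic, plus an explicit truncation check, may help here. It is a sketch: `budgetFor` and `wasTruncated` are hypothetical helpers, the 4-characters-per-token figure is a heuristic rather than a tokenizer, and the truncation check assumes an OpenAI-style response where `finish_reason` is `"length"` when generation stopped at the token limit.

```typescript
// Budget arithmetic on concrete numbers, plus a truncation check.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

const CHARS_PER_TOKEN = 4;   // rough heuristic for English prose, not a tokenizer
const FORMAT_OVERHEAD = 100; // allowance for message framing
const SAFETY_BUFFER = 500;

function estimatePromptTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / CHARS_PER_TOKEN) + FORMAT_OVERHEAD;
}

// Returns how many output tokens are safe to request for a given context window.
function budgetFor(messages: Message[], requested: number, contextLimit: number): number {
  const available = contextLimit - estimatePromptTokens(messages) - SAFETY_BUFFER;
  return Math.max(0, Math.min(requested, available));
}

// OpenAI-style chat responses set finish_reason to "length" when generation
// stopped because the token limit was reached.
function wasTruncated(finishReason: string | null): boolean {
  return finishReason === 'length';
}

const messages: Message[] = [{ role: 'user', content: 'x'.repeat(40_000) }];
console.log(estimatePromptTokens(messages));     // 10100
console.log(budgetFor(messages, 2000, 128_000)); // 2000
console.log(budgetFor(messages, 2000, 12_000));  // 1400
console.log(wasTruncated('length'));             // true
```

Checking the finish reason after the call complements the estimate before it: the estimate prevents most overruns, and the check catches the ones the heuristic misses.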


The formatting engine

What it does

Raw LLM output is text. What clients actually need is structured, formatted content ready to paste into a CMS — proper heading hierarchy, internal links in the right format, meta descriptions within character limits, image alt text suggestions, and so on.

The formatting engine handles this transformation. It runs after every LLM call that produces content (not metadata calls).

// formatter.ts
export type FormatTarget = 'markdown' | 'html' | 'wordpress' | 'contentful-richtext';

export async function formatContent(
  rawContent: string,
  target: FormatTarget,
  options: FormatOptions
): Promise<FormattedContent> {
  // 1. Parse raw text into a content tree
  const tree = parseContentTree(rawContent);

  // 2. Apply structural fixes
  const normalized = normalizeHeadings(tree);         // enforce H2 → H3 hierarchy
  const withKeyword = injectKeywordDensity(normalized, options.keyword, 0.012); // ~1.2%
  const withLinks = applyInternalLinks(withKeyword, options.internalLinkMap);

  // 3. Validate SEO constraints
  validateSeoConstraints(withLinks, {
    minWords: options.minWords ?? 800,
    maxMetaDescLength: 160,
    titleTagRange: [50, 60],
  });

  // 4. Serialize to target format
  return serialize(withLinks, target);
}

The most underrated function in this whole stack is normalizeHeadings. LLMs are inconsistent about heading levels. You'll get H1s inside H3 sections, H4s appearing before H2s, and occasionally no structure at all. A single regex pass doesn't fix it — you need to understand the tree.

function normalizeHeadings(tree: ContentTree): ContentTree {
  const MIN_LEVEL = 2; // PostAll-generated content starts at H2 (H1 is the page title)
  let previousLevel = MIN_LEVEL - 1; // treat the page title as the implicit parent

  return walkTree(tree, (node) => {
    if (node.type !== 'heading') return node;

    // Clamp into the valid range: never shallower than H2, and never more than
    // one level deeper than the previous heading (so H2 → H4 becomes H2 → H3).
    // Comparing against the previous heading, not a sticky "expected" level,
    // lets a new H2 section start after an H3.
    const level = Math.min(Math.max(node.level, MIN_LEVEL), previousLevel + 1);
    previousLevel = level;

    return level === node.level ? node : { ...node, level };
  });
}
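One way to state the normalization rule is "never shallower than H2, never more than one level deeper than the previous heading". Here is a minimal sketch of that rule on a flat list of heading levels; `normalizeLevels` is a hypothetical helper, and a real formatter would apply the same clamp while walking a content tree.

```typescript
// Flat-list version of the heading clamp, to show the rule on concrete input.
const MIN_LEVEL = 2; // H1 is reserved for the page title

function normalizeLevels(levels: number[]): number[] {
  let previous = MIN_LEVEL - 1; // pretend the page title precedes the body
  return levels.map((level) => {
    // Never shallower than H2, never more than one level deeper than the last heading.
    const clamped = Math.min(Math.max(level, MIN_LEVEL), previous + 1);
    previous = clamped;
    return clamped;
  });
}

console.log(normalizeLevels([1, 4, 3, 2, 5])); // [ 2, 3, 3, 2, 3 ]
```

The stray H1 becomes an H2, the H4 that follows is pulled up to H3, and the later H2 is left alone so a new section can start.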

What I wish I'd built earlier: a validation layer with human-readable errors

For six months, validation failures surfaced as generic job errors. I'd get a failed job, dig through logs, and eventually find something like ValidationError: heading_hierarchy_violated. Not helpful.

Now every validation failure returns a structured report:

type ValidationReport = {
  passed: boolean;
  issues: Array<{
    severity: 'error' | 'warning';
    code: string;
    message: string;       // human-readable
    location?: string;     // e.g. "Section 2, paragraph 3"
    suggestion?: string;   // what to do about it
  }>;
};

This also feeds back into the LLM layer for automatic retry with targeted instructions: if the meta description is too long, the retry prompt includes "The previous meta description was 184 characters. Rewrite it to be under 160 characters." This reduced formatting-related job failures by 67% over two weeks.
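The feedback loop can be sketched as two small functions: one that produces a report, one that turns its errors into retry instructions. The issue code, the `location` string and the helper names are hypothetical; only the report shape and the 160-character limit come from the text above.

```typescript
// Turning validation issues into targeted retry instructions.
type Issue = {
  severity: 'error' | 'warning';
  code: string;
  message: string;
  location?: string;
  suggestion?: string;
};
type ValidationReport = { passed: boolean; issues: Issue[] };

const MAX_META_DESC = 160;

function validateMetaDescription(metaDescription: string): ValidationReport {
  const issues: Issue[] = [];
  if (metaDescription.length > MAX_META_DESC) {
    issues.push({
      severity: 'error',
      code: 'meta_description_too_long', // hypothetical code name
      message: `The previous meta description was ${metaDescription.length} characters.`,
      location: 'metadata.metaDescription',
      suggestion: `Rewrite it to be under ${MAX_META_DESC} characters.`,
    });
  }
  return { passed: issues.length === 0, issues };
}

// Only errors go into the retry prompt; warnings are left for human review.
function buildRetryInstructions(report: ValidationReport): string {
  return report.issues
    .filter((issue) => issue.severity === 'error')
    .map((issue) => [issue.message, issue.suggestion].filter(Boolean).join(' '))
    .join('\n');
}

const report = validateMetaDescription('a'.repeat(184));
console.log(buildRetryInstructions(report));
// The previous meta description was 184 characters. Rewrite it to be under 160 characters.
```

Because the message and suggestion are already written for humans, the same strings can go to the client dashboard and into the retry prompt.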


The CMS connectors

What it does

Formatted content needs to go somewhere. PostAll currently supports WordPress (REST API), Contentful, Webflow CMS, and a generic webhook target. Each connector is a thin adapter that translates PostAll's internal FormattedContent type to whatever the target CMS expects.

// connectors/wordpress.ts
export class WordPressConnector implements CMSConnector {
  async publish(content: FormattedContent, config: WordPressConfig): Promise<PublishResult> {
    const payload = {
      title: content.metadata.title,
      content: content.body.html,    // WP expects HTML
      status: config.publishImmediately ? 'publish' : 'draft',
      slug: slugify(content.metadata.title),
      meta: {
        _yoast_wpseo_metadesc: content.metadata.metaDescription,
        _yoast_wpseo_title: content.metadata.seoTitle,
        _yoast_wpseo_focuskw: content.metadata.keyword,
      },
      categories: config.categoryIds ?? [],
      tags: config.tagIds ?? [],
    };

    const response = await fetch(`${config.siteUrl}/wp-json/wp/v2/posts`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Basic ${Buffer.from(`${config.username}:${config.appPassword}`).toString('base64')}`,
      },
      body: JSON.stringify(payload),
    });

    if (!response.ok) {
      const error = await response.json();
      throw new CMSConnectorError('wordpress', response.status, error.message);
    }

    const post = await response.json();
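    // report the status WordPress actually stored; assumes PublishResult.status allows 'draft'
    return { contentId: post.id.toString(), url: post.link, status: post.status === 'publish' ? 'published' : 'draft' };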
  }
}

Each connector handles its own error mapping. A 409 from Contentful (slug conflict) is a different recovery path than a 401 (expired auth token). The CMSConnectorError type carries enough context for the job system to decide whether to retry, notify the client, or surface a specific error message.
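As an illustration of what that decision might look like, here is a small sketch that maps an HTTP status to a recovery action. The action names and the status-to-action table are assumptions made for the example, not PostAll's actual rules.

```typescript
// Mapping connector failures to a recovery action (hypothetical table).
type RecoveryAction = 'retry' | 'notify-client' | 'resolve-conflict' | 'fail';

type ConnectorFailure = { connector: string; status: number; message: string };

function decideRecovery(failure: ConnectorFailure): RecoveryAction {
  const { status } = failure;
  if (status === 401 || status === 403) return 'notify-client'; // credentials need attention
  if (status === 409) return 'resolve-conflict';                // e.g. slug already taken
  if (status === 408 || status === 429 || status >= 500) return 'retry'; // likely transient
  return 'fail';                                                // other 4xx: retrying will not help
}

console.log(decideRecovery({ connector: 'contentful', status: 409, message: 'slug conflict' })); // resolve-conflict
console.log(decideRecovery({ connector: 'wordpress', status: 401, message: 'expired token' }));  // notify-client
console.log(decideRecovery({ connector: 'webhook', status: 503, message: 'unavailable' }));      // retry
```

Keeping the table in one function means the job system never has to know which CMS produced the error.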

The webhook target

The most useful connector I built turned out to be the simplest: a generic webhook. When a client doesn't use any of the supported CMS platforms, they give PostAll a URL and we POST formatted content as JSON to it. They handle their own delivery.

// connectors/webhook.ts
import { createHmac } from 'node:crypto';
export class WebhookConnector implements CMSConnector {
  async publish(content: FormattedContent, config: WebhookConfig): Promise<PublishResult> {
    const payload = {
      version: '1.0',
      timestamp: new Date().toISOString(),
      jobId: content.jobId,
      content: {
        title: content.metadata.title,
        body: {
          markdown: content.body.markdown,
          html: content.body.html,
        },
        metadata: content.metadata,
        seoData: content.seoData,
      },
    };

    // serialize once so the bytes that are signed are exactly the bytes that are sent
    const body = JSON.stringify(payload);

    // sign the payload so clients can verify it came from PostAll
    const signature = createHmac('sha256', config.signingSecret)
      .update(body)
      .digest('hex');

    const response = await fetch(config.url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-PostAll-Signature': `sha256=${signature}`,
        'X-PostAll-Event': 'content.published',
      },
      body,
    });

    // treat 2xx as success, anything else as failure
    if (!response.ok) {
      throw new CMSConnectorError('webhook', response.status, `Webhook returned ${response.status}`);
    }

    return { contentId: content.jobId, url: config.url, status: 'delivered' };
  }
}

HMAC signing is something I added after the first time a client asked "how do we know this request is actually from you?" Now every webhook delivery is signed with a per-client secret. The verification code on their end is four lines of Python or Node. Worth adding from day one.
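For reference, a receiver-side check in Node might look like the sketch below. `verifySignature` is a hypothetical helper name and the secret and payload are made up; the header format (`sha256=<hex>`) follows the connector code above. The one non-obvious requirement is that the receiver must hash the raw request body, not a re-serialized copy of the parsed JSON.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Receiver-side check for the X-PostAll-Signature header ("sha256=<hex>").
// `rawBody` must be the exact bytes received, before any JSON parsing.
function verifySignature(rawBody: string, header: string, secret: string): boolean {
  const expected = `sha256=${createHmac('sha256', secret).update(rawBody).digest('hex')}`;
  const a = Buffer.from(expected);
  const b = Buffer.from(header);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Usage with a made-up secret and payload.
const secret = 'example-signing-secret';
const body = JSON.stringify({ version: '1.0', jobId: 'job-1' });
const header = `sha256=${createHmac('sha256', secret).update(body).digest('hex')}`;

console.log(verifySignature(body, header, secret));         // true
console.log(verifySignature(body + ' ', header, secret));   // false
console.log(verifySignature(body, header, 'wrong-secret')); // false
```

The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.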


The architecture in one diagram

Here's how these five layers connect at runtime for a single content job:

Client Request
        │
        ▼
┌───────────────┐
│     Queue     │  BullMQ / Redis: ingests, prioritizes, handles retries
│    System     │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│ Orchestration │  DAG executor: decomposes job into parallel tasks
│     Layer     │
└───────┬───────┘
        │  (multiple parallel tasks)
        ▼
┌───────────────┐
│      LLM      │  Multi-provider client: OpenAI primary, Anthropic fallback
│     Layer     │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│  Formatting   │  Heading normalization, SEO validation, keyword density
│    Engine     │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│      CMS      │  WordPress / Contentful / Webflow / Webhook
│  Connectors   │
└───────────────┘

Each box is a separate module with its own error handling and retry budget. The queue system is the only layer that knows about job persistence — everything below it is stateless.


What I'd do differently

1. I would have built the formatting engine first.

The LLM layer gets all the attention because models are interesting. But the formatting engine is where most client complaints come from. Heading was wrong, meta description was too long, link format didn't match their CMS. Investing in formatting validation early would have saved two months of reactive fixes.

2. I would have instrumented the DAG execution from day one.

Right now I can tell you how long individual LLM calls take. For the first six months, I couldn't tell you how long a job took end-to-end, or which tasks inside the DAG were the bottleneck. Turns out: metadata generation was consistently the slowest step, not section drafting. I only discovered this when I added distributed tracing. It was a 15-minute fix once I knew where to look.
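Even without a tracing backend, per-task timing is a few lines. The sketch below is a simplified stand-in for distributed tracing: `withTiming` and `slowest` are hypothetical helpers, and the sleeps stand in for real task work.

```typescript
// Per-task timing for a DAG run, collected in memory.
type Timing = { taskId: string; durationMs: number; ok: boolean };

async function withTiming<T>(taskId: string, timings: Timing[], run: () => Promise<T>): Promise<T> {
  const startedAt = Date.now();
  let ok = false;
  try {
    const result = await run();
    ok = true;
    return result;
  } finally {
    // Recorded for failures too, so slow failing tasks stay visible.
    timings.push({ taskId, durationMs: Date.now() - startedAt, ok });
  }
}

// Returns the task that took longest, or undefined for an empty run.
function slowest(timings: Timing[]): Timing | undefined {
  return timings.reduce<Timing | undefined>(
    (worst, t) => (worst === undefined || t.durationMs > worst.durationMs ? t : worst),
    undefined
  );
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demo(): Promise<void> {
  const timings: Timing[] = [];
  await Promise.all([
    withTiming('section-0', timings, () => sleep(10)),
    withTiming('metadata', timings, () => sleep(40)),
  ]);
  console.log(slowest(timings)?.taskId); // metadata
}
demo();
```

Wrapping each node's `execute` call this way is enough to answer "which task is the bottleneck" before investing in full tracing.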

3. The multi-provider fallback is table stakes, not a "nice-to-have."

I treated provider fallback as a future optimization. During an OpenAI incident in early production, I had 40 jobs stuck with no fallback, clients emailing me, and nothing to do but wait. Anthropic as a fallback target took me a few hours to wire in. It should have been there before the first paying customer.


Where PostAll is now

The current pipeline processes around 800 content jobs per day with a p95 job latency of 94 seconds (from queue entry to CMS delivery). The LLM layer accounts for about 78% of that time. The formatting engine accounts for about 14%. The queue system and CMS connectors split the rest.

Error rate over the last 30 days: 1.3% of jobs require manual intervention. About 40% of those are auth issues on the client's CMS (expired credentials, changed endpoints). The other 60% are LLM-layer failures that exhaust retries — usually structured output that doesn't parse correctly on the first three attempts.

There's a lot still to build. Quality scoring is currently rule-based (reading level, keyword density, heading count). The next version will use a lightweight model eval layer to assess coherence and factual consistency. That's a different post.


If you're building something similar — a content pipeline, a document generation system, anything that orchestrates LLM calls at scale — I'm curious what your DAG execution looks like. Did you use a workflow engine, or roll your own? The tradeoffs there are non-obvious and I'd genuinely like to hear what others landed on.


The complete source for PostAll's connector layer is on GitHub at github.com/postall-platform/connectors. The LLM client abstraction is published as a standalone npm package: @postall-platform/llm-client. Drop a star if any of this was useful — it's the best signal I have for what to write about next.
