shashank ms

Posted on Jun 28

Engineering LLM Solutions for Enhanced Content Creation

#engineering #oxlo #ai

We are building a Content Studio Agent that turns a raw topic brief into a full blog post, SEO metadata, and social media snippets. It chains three specialized Oxlo.ai models into a pipeline so that each stage does one job well, and because Oxlo.ai charges a flat rate per request instead of per token, long outlines and drafts do not inflate your bill. If you are automating content workflows, this is a concrete system you can run today and extend tomorrow.

What you'll need

Python 3.10 or newer
The openai SDK (pip install openai)
An Oxlo.ai API key from https://portal.oxlo.ai

I also recommend reviewing https://oxlo.ai/pricing. Because Oxlo.ai uses request-based pricing, running a multi-step pipeline with large context windows costs the same per call whether you send two sentences or two thousand tokens. That predictability matters when you are shipping agentic workflows.
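
To make the pricing argument concrete, here is a small sketch that compares the two billing models. Every number in it is a made-up assumption (the flat price, the per-token price, and the token counts), so treat it as arithmetic for illustration and check the pricing page for real figures.

```python
# Hypothetical prices for illustration only; these are NOT Oxlo.ai's real rates.
PRICE_PER_REQUEST = 0.01      # assumed flat price per call, in dollars
PRICE_PER_1K_TOKENS = 0.002   # assumed per-token price, per 1,000 tokens


def per_request_cost(num_requests: int) -> float:
    """Cost under flat per-request billing: independent of token counts."""
    return num_requests * PRICE_PER_REQUEST


def per_token_cost(token_counts: list[int]) -> float:
    """Cost under per-token billing: grows with the total tokens processed."""
    return sum(token_counts) / 1000 * PRICE_PER_1K_TOKENS


# Three pipeline calls with short vs. long contexts (token counts are invented).
short_run = [200, 500, 800]
long_run = [2_000, 6_000, 9_000]

print("per-request:", per_request_cost(len(short_run)), per_request_cost(len(long_run)))
print("per-token:  ", per_token_cost(short_run), per_token_cost(long_run))
```

Under the flat model both runs cost the same because only the number of calls matters; under the per-token model the long run costs more in proportion to its token count.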

Step 1: Configure the Oxlo.ai client

Oxlo.ai exposes an OpenAI-compatible endpoint, so the OpenAI SDK works as a drop-in client once you point base_url at it. I instantiate the client once and reuse it across every stage of the pipeline.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY", "YOUR_OXLO_API_KEY")
)
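
The snippet above falls back to a placeholder string when OXLO_API_KEY is unset, which means a missing key only shows up later as an authentication error. A minimal sketch of a fail-fast alternative follows; require_api_key is a hypothetical helper name of mine, not part of the OpenAI SDK or Oxlo.ai.

```python
import os


def require_api_key(env_var: str = "OXLO_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail at startup instead of sending a placeholder key to the API.
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

If you prefer this behavior, pass api_key=require_api_key() when constructing the client.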

Step 2: Define the content strategist system prompt

This system prompt anchors the planning stage. It forces the model to return strict JSON that the rest of the pipeline can consume without fragile string parsing.

SYSTEM_PROMPT = """You are a senior content strategist.
Given a brief, return a JSON object with exactly these keys:
- title: a compelling H1 title
- sections: an array of section objects, each with "heading" and "bullet_points"
- keywords: an array of 5 SEO keywords
- meta_description: a 160-character summary

Be concise. Do not wrap the JSON in markdown fences."""

Step 3: Generate the structured outline with JSON mode

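I use Kimi K2.6 for planning because its reasoning and long-context window handle detailed outlines reliably. Enabling JSON mode asks the model for syntactically valid JSON that we can parse with json.loads. It does not enforce the keys listed in the system prompt, so the prompt still has to spell out the schema.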

import json

def generate_outline(brief: str) -> dict:
    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": brief},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

brief = "Write an article about how request-based LLM pricing reduces cost for multi-step agent pipelines."
outline = generate_outline(brief)
print(json.dumps(outline, indent=2))
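
Because the later stages index into the outline by key, it can help to check its shape before spending another request. The following is a minimal sketch, assuming the schema from the system prompt in Step 2; validate_outline and REQUIRED_KEYS are names I made up for illustration.

```python
# Keys the Step 2 system prompt asks the model to return.
REQUIRED_KEYS = {"title", "sections", "keywords", "meta_description"}


def validate_outline(outline: dict) -> list[str]:
    """Return a list of problems; an empty list means the outline looks usable."""
    problems = []
    missing = REQUIRED_KEYS - outline.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    sections = outline.get("sections", [])
    if not isinstance(sections, list) or not sections:
        problems.append("sections must be a non-empty list")
    else:
        for i, section in enumerate(sections):
            ok = (
                isinstance(section, dict)
                and "heading" in section
                and "bullet_points" in section
            )
            if not ok:
                problems.append(f"section {i} needs 'heading' and 'bullet_points'")
    return problems
```

A caller could raise or regenerate the outline when the returned list is non-empty, for example by checking validate_outline(outline) right after generate_outline(brief).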

Step 4: Expand the outline into a full markdown draft

With the outline in hand, we pass it to Llama 3.3 70B to write the prose. Because Oxlo.ai bills per request, sending the entire outline back into context costs the same as a short ping, which makes this expansion step cheap even when the outline is long.

DRAFT_PROMPT = """You are a technical writer.
Expand the provided outline into a complete markdown blog post.
Write in a plain, practical voice. Include the title as an H1.
Return only the article markdown."""

def generate_draft(outline: dict) -> str:
    user_msg = f"Outline:\n{json.dumps(outline)}\n\nWrite the full article."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": DRAFT_PROMPT},
            {"role": "user", "content": user_msg},
        ],
    )
    return response.choices[0].message.content

draft = generate_draft(outline)
print(draft[:500])
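
Nothing forces the writer model to cover every planned section, so a cheap sanity check on the draft is worth considering. This sketch assumes the outline uses the "sections" and "heading" keys from Step 2 and does a simple case-insensitive substring match; missing_headings is a hypothetical helper, and a model that rephrases headings will trigger false positives.

```python
def missing_headings(draft: str, outline: dict) -> list[str]:
    """Return outline headings that do not appear anywhere in the draft text."""
    draft_lower = draft.lower()
    return [
        section["heading"]
        for section in outline.get("sections", [])
        if section["heading"].lower() not in draft_lower
    ]
```

An empty result suggests the draft follows the outline; a non-empty one tells you which sections to inspect or regenerate.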

Step 5: Generate social snippets

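Finally, we derive Twitter and LinkedIn snippets from the draft. I use Qwen 3 32B here because it handles short, tightly constrained text transformations well. The function sends only the first 4,000 characters of the draft, which is enough for the model to pick up the title and the opening argument.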

SOCIAL_PROMPT = """You are a social media editor.
Given a blog post, return a JSON object with:
- twitter: one thread hook under 280 characters
- linkedin: one professional post under 150 words

Return only the JSON."""

def generate_snippets(draft: str) -> dict:
    response = client.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": SOCIAL_PROMPT},
            {"role": "user", "content": draft[:4000]},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

snippets = generate_snippets(draft)
print(json.dumps(snippets, indent=2))
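
The length limits in SOCIAL_PROMPT are instructions, not guarantees, so you may want to verify them before publishing. Here is a minimal sketch that checks the two limits stated in the prompt (under 280 characters, under 150 words); check_snippets is a hypothetical helper, and it counts words by splitting on whitespace.

```python
def check_snippets(snippets: dict) -> list[str]:
    """Return a list of limit violations; an empty list means both snippets fit."""
    problems = []
    twitter = snippets.get("twitter", "")
    linkedin = snippets.get("linkedin", "")

    if not twitter:
        problems.append("twitter is missing or empty")
    elif len(twitter) >= 280:
        problems.append(f"twitter has {len(twitter)} characters (limit: under 280)")

    word_count = len(linkedin.split())
    if not linkedin:
        problems.append("linkedin is missing or empty")
    elif word_count >= 150:
        problems.append(f"linkedin has {word_count} words (limit: under 150)")

    return problems
```

If the list is non-empty, you could re-run generate_snippets or trim the text by hand.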

Step 6: Assemble the pipeline

Now we wire the three stages into a single callable agent. Running the full flow costs three requests on Oxlo.ai, regardless of how long the brief or draft becomes.

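class ContentStudioAgent:
    def __init__(self, client):
        # Stored for future extensions. The stage functions defined above
        # currently use the module-level client directly.
        self.client = client

    def run(self, brief: str) -> dict:
        # Each stage makes exactly one API request: plan, draft, snippets.
        outline = generate_outline(brief)
        draft = generate_draft(outline)
        snippets = generate_snippets(draft)
        return {
            "outline": outline,
            "draft": draft,
            "snippets": snippets,
        }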

agent = ContentStudioAgent(client)
result = agent.run(brief)

print("=== TITLE ===")
print(result["outline"]["title"])
print("\n=== DRAFT PREVIEW ===")
print(result["draft"][:800])
print("\n=== SOCIAL ===")
print(json.dumps(result["snippets"], indent=2))

Run it

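Save the script as content_agent.py, set your OXLO_API_KEY, and run python content_agent.py. If you pasted every step verbatim, remove the standalone test calls and prints at the end of Steps 3 to 5 first; otherwise each stage runs twice and the script makes six requests instead of three. Here is what the output looks like for the brief above.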

=== TITLE ===
How Per-Request LLM Pricing Cuts Agent Pipeline Costs

=== DRAFT PREVIEW ===
# How Per-Request LLM Pricing Cuts Agent Pipeline Costs

Most teams building LLM agents quickly notice that token bills scale with context length...

=== SOCIAL ===
{
  "twitter": "Token costs killing your agent pipeline? Flat per-request pricing changes the math entirely.",
  "linkedin": "We rebuilt our content agent on a flat per-request provider and eliminated the surprise token bills that come with long outlines and multi-step reasoning."
}

If you hit rate limits on the free tier, upgrade to Pro or wait for the daily reset. The 7-day full-access trial on Oxlo.ai is enough to test this pipeline end-to-end.
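
For short-lived rate limits, a retry with exponential backoff is a common pattern. The sketch below is generic and not tied to Oxlo.ai: call_with_retry and its parameters are hypothetical names, and it will not help once a daily quota is exhausted.

```python
import time


def call_with_retry(fn, retry_on=(Exception,), attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a listed exception, wait and retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

With the OpenAI SDK you could pass retry_on=(openai.RateLimitError,) and wrap a single stage, for example call_with_retry(lambda: generate_outline(brief), retry_on=(openai.RateLimitError,)). Wrapping one stage at a time avoids repeating requests that already succeeded.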

Next steps

Try exposing the agent through a FastAPI endpoint so your CMS can call it directly, or add an image generation stage with Oxlo.ai Image Pro to produce a hero banner for each article. Both extensions stay predictable under request-based pricing because even large image payloads do not inflate the inference cost.
