DEV Community

shashank ms

Unlocking LLM Potential for Content Generation

I built a content generation pipeline that turns a raw topic into a complete blog post with SEO metadata. It uses Oxlo.ai's flat per-request API, so I can stuff long style guides into the context without the bill scaling by token count. Here is exactly how I wired it together.

What you'll need

- Python 3 with the openai package installed (pip install openai)
- An Oxlo.ai API key

Step 1: Initialize the client and system prompt

I start by pinning the system prompt. This keeps every section consistent, which matters when you are generating an article across multiple requests.

from openai import OpenAI

SYSTEM_PROMPT = """You are an expert B2B technical content writer.
Follow these rules strictly:
- Write short, scannable paragraphs.
- Favor concrete examples over adjectives.
- Do not use em-dashes or en-dashes; use commas, periods, or hyphens instead.
- When asked for JSON, output only valid JSON."""

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
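Every step in this pipeline sends the same two-message list. A small helper can guarantee that the pinned system prompt always goes first. This is a minimal sketch: build_messages is a hypothetical name, and SYSTEM_PROMPT here is a shortened stand-in for the full prompt above.

```python
# Hypothetical helper: keeps the shared system prompt first on every request.
# SYSTEM_PROMPT is shortened here; in the real script, use the full prompt from Step 1.
SYSTEM_PROMPT = "You are an expert B2B technical content writer."


def build_messages(user_message, system_prompt=SYSTEM_PROMPT):
    """Return a chat message list with the system prompt pinned in position 0."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


msgs = build_messages("Topic: request-based pricing")
print(msgs[0]["role"], len(msgs))  # → system 2
```

With this helper, each later call could pass messages=build_messages(user_message) instead of repeating the inline list.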

Step 2: Generate a structured outline

I ask Llama 3.3 70B to build the skeleton first. Separating outline from drafting lets me review the narrative before burning requests on full sections.

def generate_outline(topic):
    user_message = (
        f"Topic: {topic}\n\n"
        "Create a blog outline with exactly 5 sections. "
        "For each section provide a headline and 2 bullets describing what to cover. "
        "Use plain text, not markdown headers."
    )
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

topic = "Reducing LLM inference costs with request-based pricing"
outline = generate_outline(topic)
print(outline)
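Because the outline is its own request, you can sanity-check it cheaply before spending requests on drafts. The sketch below assumes the model follows the numbered plain-text format the prompt asks for. count_sections and check_outline are hypothetical helpers, and the sample text is adapted from the example output later in this post.

```python
import re

# Matches numbered section lines such as "1. Title" or "2) Title".
# Bullet lines ("- ...") and decimals like "3.5x" do not match.
SECTION_LINE = re.compile(r"^\s*\d+[.)]\s+\S")


def count_sections(outline_text):
    """Count numbered section headlines in a plain-text outline."""
    return sum(1 for line in outline_text.splitlines() if SECTION_LINE.match(line))


def check_outline(outline_text, expected=5):
    """Raise before drafting if the outline has the wrong number of sections."""
    found = count_sections(outline_text)
    if found != expected:
        raise ValueError(
            f"Expected {expected} sections, found {found}; regenerate or edit the outline."
        )
    return found


sample = """1. The Hidden Cost of Token-Based Billing
   - How per-token pricing penalizes long system prompts
2. How Request-Based Pricing Works
   - Flat rate per API call regardless of prompt length
"""
print(count_sections(sample))  # → 2
```

In the pipeline, you would call check_outline(outline) right after generate_outline and regenerate on a ValueError. That costs one extra request instead of a full round of drafts.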

Step 3: Expand the outline into article sections

I parse the outline and call the model once per section. Because Oxlo.ai uses request-based pricing, each section costs one flat request even if my bullets run long. For a different voice, you can swap in qwen-3-32b or kimi-k2.6 by changing only the model argument in draft_section.

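import re

# Numbered section lines such as "1. Title" or "2) Title"; group 1 captures the title.
SECTION_RE = re.compile(r"^\d+[.)]\s*(.+)$")

def parse_outline(outline_text):
    """Turn the plain-text outline into a list of (title, guidance) tuples."""
    sections = []
    current_title = None
    current_bullets = []
    for line in outline_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            # A new numbered headline closes the previous section.
            if current_title:
                sections.append((current_title, " ".join(current_bullets)))
            # Take the title from the regex so "1)" and titles containing "." both parse.
            current_title = match.group(1).strip()
            current_bullets = []
        elif line.startswith(("-", "*", "•")) and current_title:
            # Bullet line: drop the marker and keep the text as drafting guidance.
            current_bullets.append(line[1:].strip())
    if current_title:
        sections.append((current_title, " ".join(current_bullets)))
    return sections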

def draft_section(title, guidance):
    user_message = (
        f"Write a blog section with the headline: {title}\n"
        f"Guidance: {guidance}\n\n"
        "Write 120 to 150 words. Use a practical, technical tone."
    )
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

sections = parse_outline(outline)
drafts = [draft_section(title, guidance) for title, guidance in sections]
full_article = "\n\n".join(
    f"## {title}\n\n{body}" for (title, _), body in zip(sections, drafts)
)
print(full_article)
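The request count for one article is simple to work out: one outline call, one call per section, and one SEO call, so a 5-section outline takes 7 requests. The sketch below compares that flat count with a token-based bill. Both prices are hypothetical placeholders, not real Oxlo.ai or competitor rates. The model also assumes the same input size on every call and ignores output tokens.

```python
# Hypothetical placeholder prices; they are NOT real rates (see https://oxlo.ai/pricing).
PRICE_PER_REQUEST = 0.002           # flat price per API call (assumed)
PRICE_PER_1K_INPUT_TOKENS = 0.0005  # token-based input price (assumed)


def pipeline_requests(num_sections):
    # 1 outline call + one draft call per section + 1 SEO metadata call
    return 1 + num_sections + 1


def flat_cost(num_sections):
    # Request-based billing: prompt length does not enter the formula.
    return pipeline_requests(num_sections) * PRICE_PER_REQUEST


def token_cost(num_sections, prompt_tokens_per_call):
    # Token-based billing, input side only: the system prompt is resent on every call.
    total_tokens = pipeline_requests(num_sections) * prompt_tokens_per_call
    return total_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS


for prompt_tokens in (500, 4000):
    print(
        f"{prompt_tokens} tokens/call: "
        f"flat ${flat_cost(5):.4f}, token-based ${token_cost(5, prompt_tokens):.4f}"
    )
```

Under these assumptions, the flat total stays fixed as the style guide grows, while the token-based total scales linearly with prompt size. That is the trade-off the intro describes. Plug in real prices before drawing conclusions.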

Step 4: Generate SEO metadata with JSON mode

I switch to Qwen 3 32B for the structured extraction and force JSON mode. The OpenAI SDK passes the response_format parameter straight through on Oxlo.ai.

import json

def generate_seo_metadata(article_text, topic):
    user_message = (
        f"Topic: {topic}\n\n"
        f"Article:\n{article_text[:2000]}\n\n"
        "Return a JSON object with keys: title (50-60 chars), "
        "meta_description (150-160 chars), keywords (array of 5 strings)."
    )
    response = client.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

seo = generate_seo_metadata(full_article, topic)
print(json.dumps(seo, indent=2))
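JSON mode is meant to make the output parseable. It does not enforce the character limits written into the prompt. By my count, the sample meta_description in the example output later in this post is 142 characters, short of the requested 150-160. A small check like this one can flag such misses before publishing. seo_problems is a hypothetical helper, and its limits simply mirror the prompt in generate_seo_metadata.

```python
def seo_problems(seo):
    """Return a list of human-readable problems; an empty list means the metadata passes."""
    problems = []
    title = seo.get("title", "")
    desc = seo.get("meta_description", "")
    keywords = seo.get("keywords", [])
    if not 50 <= len(title) <= 60:
        problems.append(f"title is {len(title)} chars (want 50-60)")
    if not 150 <= len(desc) <= 160:
        problems.append(f"meta_description is {len(desc)} chars (want 150-160)")
    if not (
        isinstance(keywords, list)
        and len(keywords) == 5
        and all(isinstance(k, str) for k in keywords)
    ):
        problems.append("keywords must be a list of 5 strings")
    return problems


example = {
    "title": "Cut LLM Costs with Request-Based Inference Pricing",  # 50 chars
    "meta_description": "x" * 155,  # placeholder text of a valid length
    "keywords": ["a", "b", "c", "d", "e"],
}
print(seo_problems(example))  # → []
```

If the list is non-empty, you can either re-request the metadata or trim and fix the fields by hand. Either way, the generated package stays within your publishing constraints.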

Step 5: Assemble and save the final package

I bundle the SEO metadata and the article into one JSON file so the next step in my pipeline can pick it up without parsing markdown.

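import datetime

def assemble_package(seo, article):
    # datetime.utcnow() is deprecated as of Python 3.12; build an aware UTC timestamp instead.
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "generated_at": now.isoformat().replace("+00:00", "Z"),
        "seo": seo,
        "article_markdown": article,
    }

package = assemble_package(seo, full_article)
with open("content_package.json", "w", encoding="utf-8") as f:
    json.dump(package, f, indent=2)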

print("Saved content_package.json")
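Here is a sketch of how a downstream step might read the package back without touching the markdown. load_package is a hypothetical helper. The demo writes a throwaway file shaped like the Step 5 output, with made-up demo values, so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_package(path):
    """Read a content package and return (title, keywords, article_markdown)."""
    package = json.loads(Path(path).read_text(encoding="utf-8"))
    seo = package["seo"]
    return seo.get("title"), seo.get("keywords", []), package["article_markdown"]


# Demo: a throwaway file with the same keys Step 5 writes (values are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "content_package.json"
    demo.write_text(
        json.dumps(
            {
                "generated_at": "2025-01-01T00:00:00Z",
                "seo": {"title": "Demo title", "meta_description": "...", "keywords": ["a"]},
                "article_markdown": "## Intro\n\nBody",
            }
        ),
        encoding="utf-8",
    )
    title, keywords, body = load_package(demo)

print(title, keywords)  # → Demo title ['a']
```

Because the SEO fields live under their own key, a CMS importer can map them straight to its title and description fields.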

Run it

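This is the entrypoint I actually run. It chains the steps together, prints each stage, and writes the result to disk. If you paste the earlier snippets into the same file, keep their imports and function definitions but drop the top-level demo lines that call them, so each API request runs only once.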

if __name__ == "__main__":
    topic = "Reducing LLM inference costs with request-based pricing"

    print("Generating outline...")
    outline = generate_outline(topic)
    print(outline)

    print("\nDrafting sections...")
    sections = parse_outline(outline)
    drafts = [draft_section(title, guidance) for title, guidance in sections]
    full_article = "\n\n".join(
        f"## {title}\n\n{body}" for (title, _), body in zip(sections, drafts)
    )
    print(full_article)

    print("\nGenerating SEO metadata...")
    seo = generate_seo_metadata(full_article, topic)
    print(json.dumps(seo, indent=2))

    print("\nAssembling package...")
    package = assemble_package(seo, full_article)
    with open("content_package.json", "w", encoding="utf-8") as f:
        json.dump(package, f, indent=2)

    print("Done. See content_package.json")

Example output:

Generating outline...
1. The Hidden Cost of Token-Based Billing
   - How per-token pricing penalizes long system prompts and few-shot examples
   - Real-world scenarios where input tokens dominate the bill

2. How Request-Based Pricing Works
   - Flat rate per API call regardless of prompt length
   - Why this model favors agents and retrieval-augmented generation

...

Drafting sections...
## The Hidden Cost of Token-Based Billing

Most inference providers bill by the token. That means every paragraph of context you add, every few-shot example, and every retrieved document nudges the cost upward...

Generating SEO metadata...
{
  "title": "Cut LLM Costs with Request-Based Inference Pricing",
  "meta_description": "Learn how request-based LLM pricing replaces unpredictable token bills with flat per-call costs, ideal for long-context and agentic workloads.",
  "keywords": [
    "LLM pricing",
    "request-based inference",
    "AI cost optimization",
    "token billing alternatives",
    "Oxlo.ai"
  ]
}

Assembling package...
Done. See content_package.json
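Each draft_section call is independent, so the per-section requests could also run concurrently. This sketch relies on assumptions the original does not cover: that your Oxlo.ai plan permits a few parallel requests, and that sharing one client across threads works in your setup. fake_draft_section is a hypothetical stand-in so the snippet runs without an API key. Pass the real draft_section to use it.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_draft_section(title, guidance):
    # Stand-in for draft_section so this sketch runs offline.
    return f"Draft for {title} ({guidance})"


def draft_all(sections, draft_fn=fake_draft_section, max_workers=4):
    """Draft every (title, guidance) section in parallel and join them as markdown."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so headings stay paired with bodies.
        drafts = list(pool.map(lambda s: draft_fn(*s), sections))
    return "\n\n".join(
        f"## {title}\n\n{body}" for (title, _), body in zip(sections, drafts)
    )


sections = [("Intro", "why"), ("Pricing", "how")]
print(draft_all(sections))
```

Under request-based pricing, the number of requests, and therefore the cost, stays the same. The expected gain is mainly shorter wall-clock time for long outlines.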

Wrap-up

Two concrete ways to push this further. First, hook the script into a GitHub Action or CMS webhook so publishing a new topic automatically queues a draft. Second, experiment with kimi-k2.6 for reasoning-heavy technical white papers, or use deepseek-v3.2 on the Oxlo.ai free tier to keep early experiments at zero cost. You can compare plans at https://oxlo.ai/pricing.
