I built a content generation pipeline that turns a raw topic into a complete blog post with SEO metadata. It runs on Oxlo.ai's API, which charges a flat rate per request, so I can put long style guides into the context without the bill scaling with token count. Here is exactly how I wired it together.
What you'll need
- Python 3.10 or newer
- An Oxlo.ai API key from https://portal.oxlo.ai
- The OpenAI SDK:
pip install openai
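Rather than pasting the key into source, you can read it from an environment variable. This is a minimal sketch, assuming a variable named `OXLO_API_KEY`; that name is my own choice for illustration, not something Oxlo.ai requires.

```python
import os


def load_api_key(env_var: str = "OXLO_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing.

    OXLO_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your Oxlo.ai API key before running.")
    return key
```

Then pass `api_key=load_api_key()` to the `OpenAI(...)` constructor in Step 1 instead of the literal placeholder.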
Step 1: Initialize the client and system prompt
I start by pinning the system prompt. This keeps every section consistent, which matters when you are generating an article across multiple requests.
from openai import OpenAI
SYSTEM_PROMPT = """You are an expert B2B technical content writer.
Follow these rules strictly:
- Write short, scannable paragraphs.
- Favor concrete examples over adjectives.
- Do not use em-dashes or en-dashes; use commas, periods, or hyphens instead.
- When asked for JSON, output only valid JSON."""
# Oxlo.ai's endpoint is OpenAI-compatible, so the stock SDK only needs a different base_url.
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
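The system prompt asks the model to avoid em-dashes and en-dashes, but an instruction is not a guarantee. A cheap safeguard is to normalize any that slip through after each call. This is a sketch of my own, not part of the original pipeline. The replacement choices (a comma for an em-dash, a hyphen for an en-dash) are assumptions you can adjust.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"


def normalize_dashes(text: str) -> str:
    """Replace em-dashes with a comma and en-dashes with a plain hyphen."""
    # An em-dash used as a clause break, with or without surrounding spaces, becomes ", ".
    text = re.sub(rf"\s*{EM_DASH}\s*", ", ", text)
    # An en-dash (usually in ranges like 120-150) becomes a hyphen.
    return text.replace(EN_DASH, "-")
```

To apply it, you could wrap the return value of `draft_section` in Step 3, for example `return normalize_dashes(response.choices[0].message.content)`.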
Step 2: Generate a structured outline
I ask Llama 3.3 70B to build the skeleton first. Separating outline from drafting lets me review the narrative before burning requests on full sections.
def generate_outline(topic):
    user_message = (
        f"Topic: {topic}\n\n"
        "Create a blog outline with exactly 5 sections. "
        "For each section provide a headline and 2 bullets describing what to cover. "
        "Use plain text, not markdown headers."
    )
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
topic = "Reducing LLM inference costs with request-based pricing"
outline = generate_outline(topic)
print(outline)
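Because the prompt asks for exactly 5 sections, it is worth checking that before spending requests on drafts. This is a minimal sketch that counts numbered headline lines, assuming the model numbers them like `1.` or `1)` as in the example output at the end of this post. `count_outline_sections` and `check_outline` are hypothetical helpers, not part of the original script.

```python
import re

# Matches headline lines such as "1. Title" or "2) Title".
HEADLINE = re.compile(r"^\s*\d+[.)]\s+\S")


def count_outline_sections(outline_text: str) -> int:
    """Count numbered headline lines in a plain-text outline."""
    return sum(1 for line in outline_text.splitlines() if HEADLINE.match(line))


def check_outline(outline_text: str, expected: int = 5) -> None:
    """Raise if the outline does not have the expected number of sections."""
    found = count_outline_sections(outline_text)
    if found != expected:
        raise ValueError(f"Expected {expected} sections, found {found}; regenerate the outline.")
```

Calling `check_outline(outline)` right after `generate_outline` stops the run before any drafting requests are made.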
Step 3: Expand the outline into article sections
I parse the outline and call the model once per section. Because Oxlo.ai uses request-based pricing, the cost is flat per section even if my bullets run long. For a different voice, you can swap in qwen-3-32b or kimi-k2.6 by changing only the model argument.
import re

# Matches numbered headlines such as "1. Title" or "1) Title" and captures the title.
HEADLINE_RE = re.compile(r"^\d+[.)]\s*(.+)$")


def parse_outline(outline_text):
    """Turn the plain-text outline into (title, guidance) pairs."""
    sections = []
    current_title = None
    current_bullets = []
    for line in outline_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADLINE_RE.match(line)
        if match:
            # A new headline closes out the previous section.
            if current_title:
                sections.append((current_title, " ".join(current_bullets)))
            current_title = match.group(1).strip()
            current_bullets = []
        elif line.startswith("-"):
            current_bullets.append(line.lstrip("- ").strip())
    if current_title:
        sections.append((current_title, " ".join(current_bullets)))
    return sections
def draft_section(title, guidance):
    user_message = (
        f"Write a blog section with the headline: {title}\n"
        f"Guidance: {guidance}\n\n"
        "Write 120 to 150 words. Use a practical, technical tone."
    )
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
sections = parse_outline(outline)
drafts = [draft_section(title, guidance) for title, guidance in sections]
full_article = "\n\n".join(
    f"## {title}\n\n{body}" for (title, _), body in zip(sections, drafts)
)
print(full_article)
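To make the flat-pricing argument concrete: the pipeline makes one outline request, one request per section, and one SEO request. The sketch below puts that next to a per-token estimate. The function names are my own, and no prices are built in; plug in real rates from whichever pricing pages you are comparing.

```python
def pipeline_requests(num_sections: int) -> int:
    """Requests per article: one outline call, one call per section, one SEO call."""
    return 1 + num_sections + 1


def flat_cost(num_requests: int, price_per_request: float) -> float:
    """Request-based pricing: cost depends only on how many calls you make."""
    return num_requests * price_per_request


def token_cost(prompt_tokens: list[int], completion_tokens: list[int],
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Token-based pricing: every prompt and completion token adds to the bill.

    Prices are per million tokens; pass one entry per request in each list.
    """
    input_cost = sum(prompt_tokens) / 1_000_000 * input_price_per_m
    output_cost = sum(completion_tokens) / 1_000_000 * output_price_per_m
    return input_cost + output_cost
```

With five sections, `pipeline_requests(5)` is 7. Lengthening the style guide leaves `flat_cost` unchanged but raises every entry in `prompt_tokens`. If the response carries an OpenAI-style `usage` object, `response.usage.prompt_tokens` and `response.usage.completion_tokens` are where real counts would come from; I have not confirmed that Oxlo.ai populates them.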
Step 4: Generate SEO metadata with JSON mode
I switch to Qwen 3 32B for the structured extraction and force JSON mode. The OpenAI SDK passes the response_format parameter straight through on Oxlo.ai.
import json


def generate_seo_metadata(article_text, topic):
    # Only the first 2,000 characters of the article are sent, to keep the prompt short.
    user_message = (
        f"Topic: {topic}\n\n"
        f"Article:\n{article_text[:2000]}\n\n"
        "Return a JSON object with keys: title (50-60 chars), "
        "meta_description (150-160 chars), keywords (array of 5 strings)."
    )
    response = client.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        # JSON mode: ask the server to return a single JSON object.
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
seo = generate_seo_metadata(full_article, topic)
print(json.dumps(seo, indent=2))
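Even when JSON mode returns valid JSON, nothing guarantees the model respected the length limits in the prompt. This is a small validator, sketched under the assumption that you want to flag violations rather than silently fix them. `validate_seo` is a hypothetical helper that checks each constraint from the prompt above.

```python
def validate_seo(seo: dict) -> list[str]:
    """Return a list of problems; an empty list means every prompt constraint was met."""
    problems = []
    title = seo.get("title")
    if not isinstance(title, str) or not 50 <= len(title) <= 60:
        problems.append("title should be a 50-60 character string")
    desc = seo.get("meta_description")
    if not isinstance(desc, str) or not 150 <= len(desc) <= 160:
        problems.append("meta_description should be a 150-160 character string")
    keywords = seo.get("keywords")
    if (not isinstance(keywords, list) or len(keywords) != 5
            or not all(isinstance(k, str) for k in keywords)):
        problems.append("keywords should be an array of 5 strings")
    return problems
```

If `validate_seo(seo)` returns anything, you can retry the request or pass the list to a human editor.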
Step 5: Assemble and save the final package
I bundle the SEO metadata and the article into one JSON file so the next step in my pipeline can pick it up without parsing markdown.
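import datetime


def assemble_package(seo, article):
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp instead.
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "generated_at": now.isoformat().replace("+00:00", "Z"),
        "seo": seo,
        "article_markdown": article,
    }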
package = assemble_package(seo, full_article)
with open("content_package.json", "w") as f:
json.dump(package, f, indent=2)
print("Saved content_package.json")
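On the consuming side, the next step only needs the standard `json` module. This sketch shows what that reader might look like. `load_package` and the required-key check are my own additions, keyed to the three fields `assemble_package` writes.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"generated_at", "seo", "article_markdown"}


def load_package(path: str = "content_package.json") -> dict:
    """Read a saved content package and confirm it has the expected top-level keys."""
    package = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - set(package)
    if missing:
        raise KeyError(f"Package is missing keys: {sorted(missing)}")
    return package
```

A CMS importer could then read `load_package()["seo"]["title"]` and `load_package()["article_markdown"]` directly, with no markdown parsing.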
Run it
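This is the entrypoint I actually run. It chains the steps together and writes the result to disk. If you keep everything in one file, delete the top-level demo calls from Steps 2 through 5 first; otherwise the pipeline runs twice.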
if __name__ == "__main__":
    topic = "Reducing LLM inference costs with request-based pricing"
    print("Generating outline...")
    outline = generate_outline(topic)
    print(outline)
    print("\nDrafting sections...")
    sections = parse_outline(outline)
    drafts = [draft_section(title, guidance) for title, guidance in sections]
    full_article = "\n\n".join(
        f"## {title}\n\n{body}" for (title, _), body in zip(sections, drafts)
    )
    print(full_article)
    print("\nGenerating SEO metadata...")
    seo = generate_seo_metadata(full_article, topic)
    print(json.dumps(seo, indent=2))
    print("\nAssembling package...")
    package = assemble_package(seo, full_article)
    with open("content_package.json", "w") as f:
        json.dump(package, f, indent=2)
    print("Done. See content_package.json")
Example output:
Generating outline...
1. The Hidden Cost of Token-Based Billing
- How per-token pricing penalizes long system prompts and few-shot examples
- Real-world scenarios where input tokens dominate the bill
2. How Request-Based Pricing Works
- Flat rate per API call regardless of prompt length
- Why this model favors agents and retrieval-augmented generation
...
Drafting sections...
## The Hidden Cost of Token-Based Billing
Most inference providers bill by the token. That means every paragraph of context you add, every few-shot example, and every retrieved document nudges the cost upward...
Generating SEO metadata...
{
"title": "Cut LLM Costs with Request-Based Inference Pricing",
"meta_description": "Learn how request-based LLM pricing replaces unpredictable token bills with flat per-call costs, ideal for long-context and agentic workloads.",
"keywords": [
"LLM pricing",
"request-based inference",
"AI cost optimization",
"token billing alternatives",
"Oxlo.ai"
]
}
Assembling package...
Done. See content_package.json
Wrap-up
Two concrete ways to push this further. First, hook the script into a GitHub Action or CMS webhook so publishing a new topic automatically queues a draft. Second, experiment with kimi-k2.6 for reasoning-heavy technical white papers, or use deepseek-v3.2 on the Oxlo.ai free tier to keep early experiments at zero cost. You can compare plans at https://oxlo.ai/pricing.
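For the GitHub Action or webhook route, the hard-coded topic in the entrypoint has to become an input. This is a minimal sketch using the standard `argparse` module. The `--topic` and `--out` flag names are hypothetical, chosen for illustration.

```python
import argparse

DEFAULT_TOPIC = "Reducing LLM inference costs with request-based pricing"


def parse_args(argv=None) -> argparse.Namespace:
    """Parse command-line options so a CI job can pass the topic in."""
    parser = argparse.ArgumentParser(description="Generate a blog post package for a topic.")
    parser.add_argument("--topic", default=DEFAULT_TOPIC, help="Topic to write about.")
    parser.add_argument("--out", default="content_package.json", help="Where to save the package.")
    return parser.parse_args(argv)
```

In the entrypoint, `args = parse_args()` would replace the literal topic with `args.topic` and the output path with `args.out`, so a workflow step could run something like `python pipeline.py --topic "..."` (the file name `pipeline.py` is also just an example).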