shashank ms

Posted on Jul 4

LLM for Media: Opportunities and Challenges

#learnai #oxlo #ai

Media monitoring teams lose hours each day reading full articles to extract sentiment, entities, and narrative shifts. In this tutorial, we will build a media intelligence agent that takes raw article text and returns structured JSON analysis, then scales to batch and comparative workflows. I run this stack on Oxlo.ai because request-based pricing means a 3,000-word investigative piece costs the same as a headline, which flattens costs for long-form media workloads.

What you'll need

Python 3.10 or newer
The OpenAI SDK: pip install openai
An Oxlo.ai API key from https://portal.oxlo.ai

Step 1: Set up the Oxlo.ai client

Instantiate the OpenAI-compatible client pointing at Oxlo.ai and verify connectivity with a low-token ping. This confirms your key and network path are healthy before we send full articles.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY", "YOUR_OXLO_API_KEY")
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Reply 'ready' only."}],
    max_tokens=10
)
assert "ready" in response.choices[0].message.content.lower()
print("Client authenticated and responsive.")

Step 2: Define the media analyst system prompt

The system prompt is the only training the agent gets. It constrains Llama 3.3 70B to return strict JSON and defines the analysis dimensions. Tune these fields for your own news vertical.

SYSTEM_PROMPT = """You are a media intelligence analyst. Analyze the article provided by the user and return ONLY a valid JSON object with no markdown formatting. Use this exact schema:

{
  "headline_summary": "One sentence summarizing the core news.",
  "sentiment": "positive | neutral | negative",
  "key_entities": ["Entity Name", "..."],
  "topics": ["topic tag", "..."],
  "bias_flag": "left | center | right | unclear",
  "confidence": "high | medium | low",
  "actionable_insight": "One sentence on why this matters to a media monitor."
}

Rules:
- Do not include preamble or explanation outside the JSON.
- If the article is an opinion piece, note that in topics.
- Base bias_flag solely on lexical cues and framing, not your prior knowledge of the outlet."""

Step 3: Ingest and structure a single article

Raw text needs clear delimiters so the model does not confuse instructions with content. This helper wraps the article and sets a low temperature to keep outputs deterministic.

def analyze_article(article_text: str) -> str:
    user_message = f"""ARTICLE TEXT START
{article_text}
ARTICLE TEXT END

Analyze the above article according to your instructions. Return only valid JSON."""

    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0.2,
        max_tokens=1024
    )
    return response.choices[0].message.content

Step 4: Parse structured JSON output

Models occasionally wrap JSON in markdown fences. We strip those and parse with the standard library so downstream code gets a clean dict.

import json

def parse_analysis(raw_response: str) -> dict:
    cleaned = raw_response.strip()
    if cleaned.startswith("

```json"):
        cleaned = cleaned.split("```

json", 1)[-1]
    if cleaned.endswith("

```"):
        cleaned = cleaned.rsplit("```

", 1)[0]
    cleaned = cleaned.strip()
    return json.loads(cleaned)

Step 5: Batch process a feed of articles

Real workflows handle multiple sources at once. We loop over a list of article dicts, collect parsed results, and attach metadata. Because Oxlo.ai charges per request rather than per token, running ten long articles is predictable.

def batch_process(articles: list[dict]) -> list[dict]:
    results = []
    for art in articles:
        raw = analyze_article(art["text"])
        parsed = parse_analysis(raw)
        parsed["source"] = art.get("source", "unknown")
        parsed["slug"] = art.get("slug", "untitled")
        results.append(parsed)
    return results

Step 6: Cross-reference coverage for narrative drift

To detect how different outlets frame the same story, we feed two articles into Kimi K2.6 and ask for a structured comparison. Its 131K context window handles long-form pieces easily on Oxlo.ai.

def compare_coverage(article_a: str, article_b: str) -> dict:
    prompt = f"""Compare the following two articles covering the same event. Return ONLY valid JSON.

ARTICLE A:
{article_a}

ARTICLE B:
{article_b}

JSON schema:
{{
  "narrative_overlap": "What both agree on.",
  "narrative_divergence": "Where they differ in facts or emphasis.",
  "tone_delta": "How the sentiment or framing shifts between A and B.",
  "missing_in_a": "Key details present in B but not A.",
  "missing_in_b": "Key details present in A but not B."
}}"""

    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": "You are a media comparison engine. Return only valid JSON."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.3,
        max_tokens=1200
    )
    raw = response.choices[0].message.content
    cleaned = raw.strip()
    if cleaned.startswith("

```json"):
        cleaned = cleaned.split("```

json", 1)[-1]
    if cleaned.endswith("

```"):
        cleaned = cleaned.rsplit("```

", 1)[0]
    return json.loads(cleaned.strip())

Run it

The snippet below defines two short articles about a fictional semiconductor announcement, then runs both the batch analyzer and the coverage comparator.

if __name__ == "__main__":
    articles = [
        {
            "slug": "techcrunch-chip-announcement",
            "source": "TechCrunch",
            "text": "NovaChip unveiled its 3nm Arctic processor today, claiming 40% better efficiency than rivals. CEO Jane Doe called the launch a 'turning point for edge AI.' Investors pushed the stock up 8% in early trading, though analysts warned supply-chain constraints could delay mass production until Q3."
        },
        {
            "slug": "policy-daily-chip-reaction",
            "source": "Policy Daily",
            "text": "Regulators are scrutinizing NovaChip's 3nm Arctic processor over export-control implications. The announcement emphasized military-grade encryption features, prompting concerns from trade officials. While the technology is impressive, lawmakers may block sales in key Asian markets, damping the firm's bullish revenue projections."
        }
    ]

    print("=== BATCH ANALYSIS ===")
    for row in batch_process(articles):
        print(json.dumps(row, indent=2))

    print("\n=== COVERAGE COMPARISON ===")
    comp = compare_coverage(articles[0]["text"], articles[1]["text"])
    print(json.dumps(comp, indent=2))

Example output:

=== BATCH ANALYSIS ===
{
  "headline_summary": "NovaChip launches 3nm Arctic processor with efficiency gains.",
  "sentiment": "positive",
  "key_entities": ["NovaChip", "Jane Doe", "Arctic processor"],
  "topics": ["semiconductor", "AI hardware", "earnings"],
  "bias_flag": "center",
  "confidence": "high",
  "actionable_insight": "Watch supply-chain commentary for Q3 guidance revisions.",
  "source": "TechCrunch",
  "slug": "techcrunch-chip-announcement"
}
{
  "headline_summary": "NovaChip faces regulatory scrutiny over military encryption features in new chip.",
  "sentiment": "negative",
  "key_entities": ["NovaChip", "Arctic processor", "trade officials"],
  "topics": ["semiconductor", "export controls", "policy"],
  "bias_flag": "center",
  "confidence": "high",
  "actionable_insight": "Monitor legislative blocks in Asian markets for revenue impact.",
  "source": "Policy Daily",
  "slug": "policy-daily-chip-reaction"
}

=== COVERAGE COMPARISON ===
{
  "narrative_overlap": "Both acknowledge NovaChip's 3nm Arctic processor announcement.",
  "narrative_divergence": "TechCrunch focuses on efficiency and investor reaction, while Policy Daily centers on regulatory and military implications.",
  "tone_delta": "TechCrunch is cautiously optimistic, whereas Policy Daily is wary due to policy risk.",
  "missing_in_a": "Export-control scrutiny and military-grade encryption details.",
  "missing_in_b": "Specific efficiency metrics, CEO quote, and stock price movement."
}

Wrap-up and next steps

Two concrete ways to extend this. First, persist daily results to SQLite so you can trend sentiment and entity frequency over time. Second, add Oxlo.ai's embeddings endpoint to cluster articles by semantic similarity, which makes deduplication and topic grouping automatic.

DEV Community