LLM for Text Analysis: A Comprehensive Overview of Natural Language Processing

In this tutorial we build a batch text analysis pipeline that turns unstructured customer feedback into structured JSON. It extracts sentiment, topics, action items, and a one-line summary in a single pass. I run this on Oxlo.ai because flat per-request pricing means long transcripts do not inflate the cost, and because its OpenAI SDK compatibility means existing client code only needs a new base URL and API key.

What you'll need

Python 3.10 or newer
The OpenAI SDK, plus pandas for the CSV export in Step 5: pip install openai pandas
An Oxlo.ai API key from https://portal.oxlo.ai

If you have not signed up yet, grab a key from the portal. Oxlo.ai offers a free tier with 60 requests per day, which is enough to prototype this pipeline.

Step 1: Configure the Oxlo.ai client

I instantiate the standard OpenAI client but point its base_url at Oxlo.ai. A quick test request confirms the key is live.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Say OK"}
    ],
)

print(response.choices[0].message.content)
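The snippet above hardcodes the key for brevity. A minimal sketch of loading it from the environment instead is shown below; the variable name OXLO_API_KEY and the helper load_api_key are my own illustrative choices, not something the Oxlo.ai documentation prescribes.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, var_name: str = "OXLO_API_KEY") -> str:
    """Return the API key stored in `var_name`, or raise a clear error.

    `OXLO_API_KEY` is a hypothetical variable name chosen for this sketch.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key


# Usage with the client from Step 1 (not executed here):
# client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=load_api_key())
```

Passing the environment in as a mapping keeps the helper easy to test without touching the real process environment.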

Step 2: Define the system prompt and schema

The system prompt is the contract. It instructs the model to return only JSON and defines the exact keys I expect. I keep it strict so downstream parsing has a predictable shape to rely on.

SYSTEM_PROMPT = """You are a text analysis engine. Analyze the user-provided text and return a single JSON object with exactly these keys:

- sentiment: one of "positive", "neutral", or "negative"
- topics: an array of up to 5 short topic tags
- action_items: an array of specific, actionable tasks implied by the text, or an empty array if none
- summary: one concise sentence summarizing the text

Rules:
1. Output valid JSON only. No markdown, no explanation.
2. Do not add keys or nesting beyond the structure described above.
3. Be concise."""
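A prompt is a request, not a guarantee, so it can help to check each reply against the same contract in code. The sketch below is one possible validator for the four keys listed in the prompt; the names validate_analysis, ALLOWED_SENTIMENTS, and EXPECTED_KEYS are hypothetical helpers I introduce for illustration.

```python
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}
EXPECTED_KEYS = {"sentiment", "topics", "action_items", "summary"}


def validate_analysis(obj: object) -> list:
    """Return a list of problems; an empty list means the object matches the prompt's schema."""
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    missing = EXPECTED_KEYS - set(obj)
    extra = set(obj) - EXPECTED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")

    # sentiment must be one of the three allowed strings
    if "sentiment" in obj:
        value = obj["sentiment"]
        if not (isinstance(value, str) and value in ALLOWED_SENTIMENTS):
            problems.append(f"sentiment must be one of {sorted(ALLOWED_SENTIMENTS)}")

    # topics and action_items must be arrays of strings
    for key in ("topics", "action_items"):
        if key in obj:
            value = obj[key]
            if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
                problems.append(f"{key} must be an array of strings")

    # the prompt caps topics at 5 entries
    if isinstance(obj.get("topics"), list) and len(obj["topics"]) > 5:
        problems.append("topics has more than 5 entries")

    if "summary" in obj and not isinstance(obj["summary"], str):
        problems.append("summary must be a string")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that went wrong with a single reply.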

Step 3: Build the analysis function

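I wrap the call in a reusable function. I request JSON mode through response_format so the reply should be a syntactically valid JSON object (the key names still come from the system prompt), then parse the result with the standard library.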

import json

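def analyze_text(raw_text: str) -> dict:
    """Send one document to the model and return the parsed JSON analysis."""
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": raw_text},
        ],
        # JSON mode: ask the API to return a JSON object rather than free text.
        response_format={"type": "json_object"},
    )

    # The reply is a JSON string; json.loads raises json.JSONDecodeError if it is malformed.
    content = response.choices[0].message.content
    return json.loads(content)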
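The function above assumes the reply is always a non-empty JSON object. As a hedged sketch of a more defensive parse step, the hypothetical helper parse_model_json below handles an empty reply, a reply wrapped in a markdown fence, and a reply that parses to something other than an object. Whether a given model ever produces these cases in JSON mode is an assumption on my part, not something I have measured.

```python
import json
from typing import Optional

FENCE = "`" * 3  # a markdown code fence, built this way to keep the literal out of the source


def parse_model_json(content: Optional[str]) -> dict:
    """Parse a model reply into a dict, raising ValueError with context on failure."""
    if not content or not content.strip():
        raise ValueError("model returned an empty reply")

    lines = content.strip().splitlines()
    # Drop a leading fence line (possibly with a language tag) and a trailing fence line.
    if lines[0].startswith(FENCE):
        lines = lines[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
    text = "\n".join(lines)

    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(parsed, dict):
        raise ValueError("reply is valid JSON but not an object")
    return parsed
```

In analyze_text, this would replace the final json.loads(content) call, so every failure mode surfaces as a single ValueError that the caller can catch.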

Step 4: Process a batch of documents

In production these strings come from a database or CSV. For the tutorial I use a hardcoded list of feedback snippets and collect the parsed results.

documents = [
    "The onboarding flow is smooth, but the export feature crashes every time I select PDF. Please fix this before our audit next week.",
    "Love the new dark mode. No issues so far, just wanted to say great work to the team.",
    "Pricing is confusing. I cannot tell if the Pro tier includes API access. Also, the docs page for billing returns a 404."
]

results = []
for doc in documents:
    parsed = analyze_text(doc)
    results.append(parsed)
    print(json.dumps(parsed, indent=2))
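The loop above stops at the first exception, which is fine for three snippets but awkward for a long batch. Below is a minimal sketch of a batch runner that retries each document with exponential backoff and records failures instead of aborting. The name analyze_batch and its parameters are hypothetical; the analyze argument stands in for analyze_text, and catching Exception is deliberately broad for the sketch and would normally be narrowed to the errors you expect.

```python
import time
from typing import Callable, Iterable


def analyze_batch(
    docs: Iterable[str],
    analyze: Callable[[str], dict],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Run `analyze` on each document; return (results, failures).

    A document is retried up to `max_attempts` times, waiting
    base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    """
    results, failures = [], []
    for index, doc in enumerate(docs):
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(analyze(doc))
                break  # success: move on to the next document
            except Exception as exc:  # broad on purpose in this sketch
                if attempt == max_attempts:
                    # Out of attempts: record the failure and keep going.
                    failures.append({"index": index, "error": str(exc)})
                else:
                    sleep(base_delay * 2 ** (attempt - 1))
    return results, failures


# Usage with the tutorial's names (not executed here):
# results, failures = analyze_batch(documents, analyze_text)
```

Injecting sleep as a parameter keeps the backoff testable without real delays. On a daily request quota such as the free tier's, note that every retry counts as another request.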

Step 5: Aggregate and export

JSON is great for machines, but most teams want a spreadsheet. I flatten the arrays and write a CSV.

import pandas as pd

flat_results = []
for r in results:
    flat_results.append({
        "sentiment": r["sentiment"],
        "topics": "; ".join(r["topics"]),
        "action_items": "; ".join(r["action_items"]) if r["action_items"] else "None",
        "summary": r["summary"]
    })

df = pd.DataFrame(flat_results)
df.to_csv("analysis_output.csv", index=False)
print(df)
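The CSV keeps one row per document. For the aggregate view, a rough sketch using only the standard library is shown below; summarize_results is a hypothetical helper, and lowercasing topic tags before counting is my own normalization choice so that "PDF" and "pdf" are counted together.

```python
from collections import Counter


def summarize_results(results: list) -> dict:
    """Roll per-document analyses up into batch-level counts."""
    sentiment_counts = Counter(r["sentiment"] for r in results)
    # Normalize tags so case and stray whitespace do not split one topic into several.
    topic_counts = Counter(
        topic.strip().lower() for r in results for topic in r["topics"]
    )
    return {
        "documents": len(results),
        "sentiment_counts": dict(sentiment_counts),
        "top_topics": topic_counts.most_common(3),
        "action_item_count": sum(len(r["action_items"]) for r in results),
    }


# Usage with the list built in Step 4 (not executed here):
# print(summarize_results(results))
```

The returned dictionary is plain JSON-serializable data, so it can be printed, logged, or written alongside the CSV.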

Run it

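Save the complete script as analyzer.py and run it with python analyzer.py. The block below shows the entry point I use, followed by sample output. Arrays are condensed onto one line here for readability, and the model's exact wording will vary from run to run.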

if __name__ == "__main__":
    for doc in documents:
        print("=" * 50)
        print("INPUT:", doc[:60] + "...")
        out = analyze_text(doc)
        print("OUTPUT:", json.dumps(out, indent=2))

Terminal output:

==================================================
INPUT: The onboarding flow is smooth, but the export feature crashe...
OUTPUT: {
  "sentiment": "negative",
  "topics": ["export", "PDF", "crash", "onboarding", "audit"],
  "action_items": ["Fix PDF export crash before audit next week"],
  "summary": "User reports smooth onboarding but consistent PDF export crashes."
}
==================================================
INPUT: Love the new dark mode. No issues so far, just wanted to say...
OUTPUT: {
  "sentiment": "positive",
  "topics": ["dark mode", "UI", "feedback"],
  "action_items": [],
  "summary": "User praises the new dark mode and reports no issues."
}
==================================================
INPUT: Pricing is confusing. I cannot tell if the Pro tier includes...
OUTPUT: {
  "sentiment": "negative",
  "topics": ["pricing", "API access", "documentation", "billing"],
  "action_items": ["Clarify Pro tier API access in pricing page", "Fix 404 on billing docs page"],
  "summary": "User finds pricing confusing and encounters a 404 on billing documentation."
}

Wrap up and next steps

This pipeline is now a reusable component you can drop into any ETL or support workflow. Because Oxlo.ai uses request-based pricing, you can feed it multi-page transcripts or other long documents without the cost ballooning the way it would under per-token pricing.

Two concrete next steps: run this script from an hourly cron job that pulls new messages from a support mailbox, or swap the model to kimi-k2.6 to analyze documents up to 131k tokens in a single request. For current plans and pricing, see https://oxlo.ai/pricing.