shashank ms

Posted on Jun 17

Using LLM for Sentiment Analysis

#learnai #oxlo #ai

We are going to build a batch sentiment analysis pipeline that classifies customer feedback into structured JSON. It is useful for support teams and product managers who need to track opinion trends across thousands of unstructured text snippets without maintaining a custom NLP model.

What you'll need

Python 3.10 or newer, the OpenAI SDK, and an API key from https://portal.oxlo.ai. Oxlo.ai uses request-based pricing, so long reviews or multi-turn prompts do not inflate your cost. See https://oxlo.ai/pricing for details. Install the SDK:

pip install openai

Step 1: Initialize the client

Start by creating an OpenAI client pointed at Oxlo.ai. I keep my key in an environment variable, but you can paste it directly for local testing.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.getenv("OXLO_API_KEY", "YOUR_OXLO_API_KEY")
)

# Quick connectivity check
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Say OK"}],
    max_tokens=10
)
print(response.choices[0].message.content)
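
If the environment variable is missing, the client above falls back to the placeholder string and the first request fails with an authentication error that can be confusing to read. One option is to fail fast before any request is sent. This is a minimal sketch, not part of the original script; `load_api_key` is a hypothetical helper name, and the `env` parameter exists only to make the function easy to test.

```python
import os

def load_api_key(env=None, var_name="OXLO_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    # `env` defaults to the real process environment; pass a dict in tests.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key
```

You could then pass `api_key=load_api_key()` when constructing the client instead of relying on the placeholder default.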

Step 2: Design the system prompt

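The system prompt instructs the model to follow a strict JSON schema. I ask for a sentiment label, a confidence score, a one-sentence rationale, and the specific phrases that drove the decision. This keeps the output predictable and easy to parse, although a prompt alone cannot guarantee that every response is valid JSON.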

SYSTEM_PROMPT = """You are a sentiment analysis engine. Analyze the user-provided text and respond with a single JSON object containing exactly these keys:
- sentiment: one of "positive", "negative", "neutral", or "mixed"
- confidence: a float between 0.0 and 1.0
- reasoning: one sentence explaining why the text received that label
- key_phrases: an array of up to three verbatim text snippets that most influenced the sentiment

Rules:
- Do not include markdown code fences.
- Do not add extra keys or commentary outside the JSON object.
- If the text is ambiguous, label it "mixed" and lower the confidence accordingly."""
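
Because the schema lives only in the prompt, it can help to check each parsed response against it before trusting the values. The following is a sketch under that assumption; `validate_result`, `ALLOWED_LABELS`, and `REQUIRED_KEYS` are hypothetical names that mirror the keys listed in the prompt above.

```python
ALLOWED_LABELS = {"positive", "negative", "neutral", "mixed"}
REQUIRED_KEYS = {"sentiment", "confidence", "reasoning", "key_phrases"}

def validate_result(obj):
    """Return a list of problems; an empty list means the object matches the schema."""
    if not isinstance(obj, dict):
        return ["response is not a JSON object"]
    problems = []
    missing = REQUIRED_KEYS - set(obj)
    extra = set(obj) - REQUIRED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    label = obj.get("sentiment")
    if not (isinstance(label, str) and label in ALLOWED_LABELS):
        problems.append("sentiment is not one of the allowed labels")
    conf = obj.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not (is_number and 0.0 <= conf <= 1.0):
        problems.append("confidence is not a number between 0.0 and 1.0")
    if not isinstance(obj.get("reasoning"), str):
        problems.append("reasoning is not a string")
    phrases = obj.get("key_phrases")
    phrases_ok = (
        isinstance(phrases, list)
        and len(phrases) <= 3
        and all(isinstance(p, str) for p in phrases)
    )
    if not phrases_ok:
        problems.append("key_phrases is not a list of up to three strings")
    return problems
```

A response that fails validation can be logged and skipped in the same way as one that fails to parse.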

Step 3: Build the analyzer function

Wrap the API call in a helper that sends one piece of text and returns a Python dict. I use llama-3.3-70b here because it follows structured instructions reliably. If you are testing, deepseek-v3.2 is also available on Oxlo.ai and includes a free tier.

import json

def analyze_sentiment(text: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        temperature=0.1,
        max_tokens=256,
    )
    
    # content can be None; fall back to "" so the failure surfaces as a JSONDecodeError
    raw = (response.choices[0].message.content or "").strip()
    # Some models append a trailing period after the JSON object
    if raw.endswith("."):
        raw = raw[:-1]
    return json.loads(raw)
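
Stripping a trailing period covers one failure mode. Models sometimes also wrap the object in markdown fences or add a short sentence around it despite the rules in the prompt. A slightly more tolerant parser, sketched here as an assumption rather than as part of the original script, parses only the span from the first `{` to the last `}`. `extract_json_object` is a hypothetical helper name.

```python
import json

def extract_json_object(raw: str) -> dict:
    """Parse the span from the first '{' to the last '}' in raw."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        # Raise the same exception type json.loads would, so callers need one handler
        raise json.JSONDecodeError("no JSON object found", raw, 0)
    return json.loads(raw[start:end + 1])
```

Inside `analyze_sentiment`, the final lines could then become `return extract_json_object(raw)`. This approach still fails if the surrounding text itself contains braces, so it is a fallback rather than a guarantee.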

Step 4: Prepare sample data

I created a short list of fake app-store reviews to simulate a real workload. In production this would come from a CSV, database, or support ticket API.

reviews = [
    "The new dashboard is incredible. I can finally see all my metrics in one place.",
    "It crashes every time I try to export a PDF. Completely unusable.",
    "Login page loads fine. Nothing special, nothing broken.",
    "Great customer support, but the onboarding flow is confusing and took too long.",
    "I waited three days for a response. Horrible experience.",
]
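
For the CSV case mentioned above, a loader could look roughly like this. It is a sketch only: the column name `review_text` and the helper name `load_reviews` are assumptions, so adjust them to whatever your export actually contains.

```python
import csv
import io

def load_reviews(csv_file, column="review_text"):
    """Read non-empty review strings from one column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    # Skip rows where the column is missing or blank
    return [row[column].strip() for row in reader if (row.get(column) or "").strip()]

# In-memory sample standing in for a real file opened with open(path, newline="")
sample = io.StringIO("id,review_text\n1,Great app\n2,\n3,Too slow\n")
print(load_reviews(sample))  # ['Great app', 'Too slow']
```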

Step 5: Run batch analysis

Loop over the reviews, call the analyzer for each, and collect the results. I add a small delay between requests out of politeness, though Oxlo.ai has no cold starts on popular models, so throughput is already high. A review whose response fails to parse is logged and skipped rather than stopping the batch.

import time

results = []
for review in reviews:
    try:
        result = analyze_sentiment(review)
        result["source_text"] = review
        results.append(result)
        time.sleep(0.5)
    except json.JSONDecodeError as e:
        print(f"Failed to parse response for: {review[:50]}... Error: {e}")

# Pretty-print the structured output
for r in results:
    print(f"{r['sentiment']:10} | confidence: {r['confidence']:.2f} | {r['reasoning']}")
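
The loop above only catches JSON parsing errors, so a network failure or a rate-limit response would still abort the batch. One common pattern is to retry with exponential backoff. The helper below is a generic sketch; `with_retries` and its parameters are hypothetical names, and catching every `Exception` is deliberately broad for brevity. In real code you would likely narrow it to the error types your client raises.

```python
import time

def with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args); on an exception wait base_delay * 2**attempt, then try again."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

In the batch loop this would be used as `result = with_retries(analyze_sentiment, review)`. The `sleep` parameter is there so tests can swap in a fake instead of actually waiting.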

Step 6: Aggregate and report

Raw JSON is useful, but stakeholders usually want counts and percentages. This small aggregation block summarizes the batch.

from collections import Counter

sentiment_counts = Counter(r["sentiment"] for r in results)
total = len(results)

print("\n--- Sentiment Summary ---")
for label in ["positive", "negative", "neutral", "mixed"]:
    count = sentiment_counts.get(label, 0)
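    # Guard against an empty batch (for example, every response failed to parse)
    pct = (count / total) * 100 if total else 0.0
    print(f"{label.capitalize():10}: {count} ({pct:.1f}%)")

avg_confidence = sum(r["confidence"] for r in results) / total if total else 0.0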
print(f"\nAverage confidence: {avg_confidence:.2f}")

# Show negative items for immediate triage
negatives = [r for r in results if r["sentiment"] == "negative"]
if negatives:
    print("\n--- Negative items requiring attention ---")
    for item in negatives:
        print(f"- {item['reasoning']} (text: {item['source_text'][:60]}...)")
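
The script collects `key_phrases` for every review but never reports them. As one possible extension, the phrases from negative items can be tallied to show which complaints recur. This is a sketch; `top_phrases` is a hypothetical helper, and lowercasing the phrases before counting is my own normalization choice.

```python
from collections import Counter

def top_phrases(results, label="negative", n=5):
    """Count key_phrases across results that carry the given sentiment label."""
    counts = Counter(
        phrase.lower().strip()
        for r in results
        if r.get("sentiment") == label
        for phrase in r.get("key_phrases", [])
    )
    # most_common returns (phrase, count) pairs, highest count first
    return counts.most_common(n)
```

Calling `top_phrases(results)` after the batch finishes returns a short ranked list that can be printed under the triage section.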

Run it

Save everything in a file named sentiment.py, export your key, and run it. Here is the complete script followed by the expected output.

import os
import json
import time
from collections import Counter
from openai import OpenAI

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.getenv("OXLO_API_KEY", "YOUR_OXLO_API_KEY")
)

SYSTEM_PROMPT = """You are a sentiment analysis engine. Analyze the user-provided text and respond with a single JSON object containing exactly these keys:
- sentiment: one of "positive", "negative", "neutral", or "mixed"
- confidence: a float between 0.0 and 1.0
- reasoning: one sentence explaining why the text received that label
- key_phrases: an array of up to three verbatim text snippets that most influenced the sentiment

Rules:
- Do not include markdown code fences.
- Do not add extra keys or commentary outside the JSON object.
- If the text is ambiguous, label it "mixed" and lower the confidence accordingly."""

def analyze_sentiment(text: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        temperature=0.1,
        max_tokens=256,
    )
    raw = (response.choices[0].message.content or "").strip()
    if raw.endswith("."):
        raw = raw[:-1]
    return json.loads(raw)

reviews = [
    "The new dashboard is incredible. I can finally see all my metrics in one place.",
    "It crashes every time I try to export a PDF. Completely unusable.",
    "Login page loads fine. Nothing special, nothing broken.",
    "Great customer support, but the onboarding flow is confusing and took too long.",
    "I waited three days for a response. Horrible experience.",
]

results = []
for review in reviews:
    try:
        result = analyze_sentiment(review)
        result["source_text"] = review
        results.append(result)
        time.sleep(0.5)
    except json.JSONDecodeError as e:
        print(f"Failed to parse response for: {review[:50]}... Error: {e}")

for r in results:
    print(f"{r['sentiment']:10} | confidence: {r['confidence']:.2f} | {r['reasoning']}")

sentiment_counts = Counter(r["sentiment"] for r in results)
total = len(results)

print("\n--- Sentiment Summary ---")
for label in ["positive", "negative", "neutral", "mixed"]:
    count = sentiment_counts.get(label, 0)
    pct = (count / total) * 100 if total else 0.0
    print(f"{label.capitalize():10}: {count} ({pct:.1f}%)")

avg_confidence = sum(r["confidence"] for r in results) / total if total else 0.0
print(f"\nAverage confidence: {avg_confidence:.2f}")

negatives = [r for r in results if r["sentiment"] == "negative"]
if negatives:
    print("\n--- Negative items requiring attention ---")
    for item in negatives:
        print(f"- {item['reasoning']} (text: {item['source_text'][:60]}...)")

Example output (the exact wording and confidence scores will vary between runs and models):

positive   | confidence: 0.95 | The review expresses strong enthusiasm about the new dashboard.
negative   | confidence: 0.92 | The review describes a consistent crash and calls the product unusable.
neutral    | confidence: 0.78 | The review states functional facts without strong emotion.
mixed      | confidence: 0.85 | The review praises support but criticizes the onboarding experience.
negative   | confidence: 0.91 | The review highlights a long wait time and a horrible experience.

--- Sentiment Summary ---
Positive  : 1 (20.0%)
Negative  : 2 (40.0%)
Neutral   : 1 (20.0%)
Mixed     : 1 (20.0%)

Average confidence: 0.88

--- Negative items requiring attention ---
- The review describes a consistent crash and calls the product unusable. (text: It crashes every time I try to export a PDF. Completely unus...)
- The review highlights a long wait time and a horrible experience. (text: I waited three days for a response. Horrible experience....)

Next steps

Replace the hardcoded reviews list with a CSV reader or a webhook that pulls fresh support tickets every hour. If you need to process high volumes, consider switching to deepseek-v3.2 on Oxlo.ai, which offers a free tier and strong instruction following for classification tasks. You could also add async batching with asyncio and semaphore limits to speed up large backlogs without hitting rate limits.
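
The asyncio idea can be sketched without touching the network. The helper below caps the number of in-flight calls with a semaphore; `run_bounded` and `fake_analyze` are hypothetical names, and `fake_analyze` is only a stand-in for a real async request. In an actual pipeline the worker would wrap an async client call (the OpenAI SDK provides an `AsyncOpenAI` class for this), which is not shown here.

```python
import asyncio

async def run_bounded(items, worker, limit=5):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        # Only `limit` coroutines can hold the semaphore at the same time
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in items))

async def fake_analyze(text):
    # Stand-in for a real async API call (hypothetical)
    await asyncio.sleep(0.01)
    return {"source_text": text, "sentiment": "neutral"}

if __name__ == "__main__":
    out = asyncio.run(run_bounded(["a", "b", "c"], fake_analyze, limit=2))
    print(len(out))
```

Picking `limit` is a judgment call: start low, watch for rate-limit errors, and raise it gradually.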
