We are building a batch text analysis agent that reads raw .txt files and emits structured JSON containing sentiment, entities, topics, and a summary. It is aimed at developers who need to process unstructured documents without maintaining separate NLP libraries for each task. We will run it against Oxlo.ai, where flat per-request pricing means cost does not scale with input length, unlike token-based providers such as Together AI, Fireworks AI, or OpenRouter. That makes Oxlo.ai a strong fit for long-context text analysis workloads. See https://oxlo.ai/pricing for details.
What you'll need
- Python 3.10 or newer
- The OpenAI Python SDK, installed with pip install openai
- An Oxlo.ai API key from https://portal.oxlo.ai
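If you would rather not paste the key into source code, you can read it from an environment variable. The sketch below is a minimal version of that. The variable name OXLO_API_KEY and the get_api_key helper are our own choices. Oxlo.ai and the OpenAI SDK do not require either of them.

```python
import os
import sys


def get_api_key(var_name: str = "OXLO_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    OXLO_API_KEY is a name chosen for this sketch (an assumption), not a
    variable that Oxlo.ai itself looks for.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a readable message instead of an auth error from the API later.
        sys.exit(f"Set the {var_name} environment variable to your Oxlo.ai API key.")
    return key
```

You would then pass api_key=get_api_key() wherever the examples use the "YOUR_OXLO_API_KEY" placeholder.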
Step 1: Define the system prompt
The system prompt instructs the model to return only a JSON object with exactly four keys. A strict format reduces parsing errors later, but it does not eliminate them.
```python
SYSTEM_PROMPT = """You are a precise text analysis engine. Analyze the user provided text and respond with a single JSON object containing exactly these keys:
- sentiment: one of Positive, Negative, Neutral, or Mixed
- entities: an array of objects with keys name and type (Person, Organization, Location, Product, or Event)
- topics: an array of up to five strings representing main themes
- summary: a one sentence summary of the text
Respond with only the JSON object. No markdown fences, no commentary."""
```
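A prompt alone cannot guarantee the shape of the reply, so you may want to check each parsed result against the same four-key schema. The sketch below does that. validate_analysis and its constants are our own illustrative names and do not come from any library. The rules copy the constraints written into SYSTEM_PROMPT.

```python
ALLOWED_SENTIMENTS = {"Positive", "Negative", "Neutral", "Mixed"}
ALLOWED_ENTITY_TYPES = {"Person", "Organization", "Location", "Product", "Event"}
REQUIRED_KEYS = {"sentiment", "entities", "topics", "summary"}


def validate_analysis(result: object) -> list[str]:
    """Return human-readable schema problems; an empty list means the result conforms."""
    if not isinstance(result, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    keys = set(result)
    if missing := REQUIRED_KEYS - keys:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra := keys - REQUIRED_KEYS:
        problems.append(f"unexpected keys: {sorted(extra)}")
    sentiment = result.get("sentiment")
    # isinstance guards avoid TypeError when the model returns an unhashable value.
    if "sentiment" in result and not (isinstance(sentiment, str) and sentiment in ALLOWED_SENTIMENTS):
        problems.append(f"invalid sentiment: {sentiment!r}")
    entities = result.get("entities", [])
    if not isinstance(entities, list):
        problems.append("entities is not an array")
    else:
        for i, ent in enumerate(entities):
            if not isinstance(ent, dict) or set(ent) != {"name", "type"}:
                problems.append(f"entity {i} must have exactly the keys name and type")
            elif not (isinstance(ent["type"], str) and ent["type"] in ALLOWED_ENTITY_TYPES):
                problems.append(f"entity {i} has invalid type {ent['type']!r}")
    topics = result.get("topics", [])
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        problems.append("topics is not an array of strings")
    elif len(topics) > 5:
        problems.append(f"too many topics: {len(topics)}")
    if "summary" in result and not isinstance(result["summary"], str):
        problems.append("summary is not a string")
    return problems
```

Run this check before the pipeline attaches its "file" key. Otherwise that key will be reported as unexpected.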
Step 2: Initialize the client and build the analysis function
We point the OpenAI SDK at Oxlo.ai and wrap the call in a small function that strips accidental markdown code fences and parses the JSON. We use Llama 3.3 70B here because it follows structured instructions reliably.
```python
import json

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")


def analyze_text(text: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    raw = response.choices[0].message.content.strip()
    # Despite the prompt, some replies arrive wrapped in a markdown code fence:
    # strip the backticks and the optional "json" language tag before parsing.
    if raw.startswith("```"):
        raw = raw.strip("`").strip()
        if raw.lower().startswith("json"):
            raw = raw[4:].strip()
    return json.loads(raw)
```
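The fence stripping above handles only one case: a reply that begins with backticks. It does not handle a reply like "Sure! {...}". The sketch below is a more tolerant parser built on the standard library's json.JSONDecoder.raw_decode, which parses a value starting at a given index and ignores whatever follows it. extract_json_object is a hypothetical helper, not part of the OpenAI SDK. When it finds nothing, it raises json.JSONDecodeError, so existing except clauses keep working.

```python
import json


def extract_json_object(raw: str) -> dict:
    """Parse the first valid JSON object in raw, ignoring text around it."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one value beginning at `start` and ignores the rest.
            obj, _end = decoder.raw_decode(raw, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open a valid object; try the next one.
            start = raw.find("{", start + 1)
    raise json.JSONDecodeError("No JSON object found", raw, 0)
```

To try it, replace the fence-stripping lines and the final json.loads(raw) in analyze_text with return extract_json_object(raw).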
Step 3: Add batch file ingestion
Real workloads rarely involve a single string. This helper reads every .txt file in a directory and attaches the filename to the result so we can trace output back to its source.
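```python
from pathlib import Path


def analyze_directory(directory: str) -> list[dict]:
    results = []
    for path in Path(directory).glob("*.txt"):
        text = path.read_text(encoding="utf-8")
        try:
            analysis = analyze_text(text)
            # Record the source file so each result can be traced back to it.
            analysis["file"] = path.name
            results.append(analysis)
        except Exception as e:
            # Log and keep going: one bad file should not abort the whole batch.
            print(f"Failed on {path.name}: {e}")
    return results
```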
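Each file is one network round trip, and that work is I/O-bound, so a thread pool can overlap the requests instead of sending them one at a time. The sketch below shows one way to do it. analyze_directory_concurrent is a hypothetical name, and the analysis function is passed in so the code can run without network access. max_workers=4 is an arbitrary default. A safe level of concurrency depends on the rate limits of your Oxlo.ai plan, and we have not verified those limits.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def analyze_directory_concurrent(
    directory: str,
    analyze: Callable[[str], dict],
    max_workers: int = 4,
) -> list[dict]:
    """Run analyze() over every .txt file in directory using a thread pool.

    analyze would be analyze_text or safe_analyze in this tutorial.
    Results are sorted by filename because threads finish in arbitrary order.
    """
    paths = sorted(Path(directory).glob("*.txt"))
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the file it is processing.
        futures = {
            pool.submit(analyze, p.read_text(encoding="utf-8")): p for p in paths
        }
        for future in as_completed(futures):
            path = futures[future]
            try:
                analysis = future.result()
            except Exception as e:
                # Same policy as analyze_directory: log and continue.
                print(f"Failed on {path.name}: {e}")
                continue
            analysis["file"] = path.name
            results.append(analysis)
    return sorted(results, key=lambda r: r["file"])
```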
Step 4: Harden against malformed output
Even with a strong prompt, an LLM can occasionally prepend a stray word. Rather than crashing the pipeline, we catch JSON errors and make one retry with a corrected prompt.
```python
import json


def safe_analyze(text: str) -> dict:
    try:
        return analyze_text(text)
    except json.JSONDecodeError:
        # Retry once. The correction goes in as a *user* turn: an assistant turn
        # here would put words in the model's mouth instead of instructing it.
        response = client.chat.completions.create(
            model="llama-3.3-70b",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": text},
                {"role": "user", "content": "Respond with only the JSON object, with no text before or after it."},
            ],
        )
        raw = response.choices[0].message.content.strip()
        if raw.startswith("```"):
            raw = raw.strip("`").strip()
            if raw.lower().startswith("json"):
                raw = raw[4:].strip()
        return json.loads(raw)
```
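You can generalize the single retry into a bounded loop. The variant below differs from the step's code in one deliberate way. It replays the model's own malformed reply as an assistant turn and then adds the correction request as a user turn, which is a common way to let a chat model repair its output. This is a sketch and not the tutorial's method. analyze_with_retries and the injected complete function are hypothetical names, and max_attempts=3 is an arbitrary default. With per-request pricing, each retry is another billed request, so keep the limit small.

```python
import json
from collections.abc import Callable


def analyze_with_retries(
    text: str,
    system_prompt: str,
    complete: Callable[[list[dict]], str],
    max_attempts: int = 3,
) -> dict:
    """Request JSON, showing the model each unparseable reply before retrying.

    complete() is an injected function that sends a message list and returns
    the reply text. With the tutorial's client it could be:

        def complete(messages):
            resp = client.chat.completions.create(model="llama-3.3-70b", messages=messages)
            return resp.choices[0].message.content
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]
    for attempt in range(1, max_attempts + 1):
        raw = complete(messages).strip()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last parse error.
            # Replay the bad reply as an assistant turn, then ask for a fix.
            messages = messages + [
                {"role": "assistant", "content": raw},
                {"role": "user", "content": "That was not valid JSON. Return only the JSON object."},
            ]
```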
Step 5: Build the CLI interface
We add a small argparse wrapper so the script accepts a directory and writes newline-delimited JSON (JSONL). This makes it easy to pipe results into jq or load them into pandas.
```python
import argparse
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a precise text analysis engine. Analyze the user provided text and respond with a single JSON object containing exactly these keys:
- sentiment: one of Positive, Negative, Neutral, or Mixed
- entities: an array of objects with keys name and type (Person, Organization, Location, Product, or Event)
- topics: an array of up to five strings representing main themes
- summary: a one sentence summary of the text
Respond with only the JSON object. No markdown fences, no commentary."""


def analyze_text(text: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    raw = response.choices[0].message.content.strip()
    # Strip an accidental markdown code fence and its optional "json" tag.
    if raw.startswith("```"):
        raw = raw.strip("`").strip()
        if raw.lower().startswith("json"):
            raw = raw[4:].strip()
    return json.loads(raw)


def safe_analyze(text: str) -> dict:
    try:
        return analyze_text(text)
    except json.JSONDecodeError:
        # One retry, with the correction sent as a user turn.
        response = client.chat.completions.create(
            model="llama-3.3-70b",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": text},
                {"role": "user", "content": "Respond with only the JSON object, with no text before or after it."},
            ],
        )
        raw = response.choices[0].message.content.strip()
        if raw.startswith("```"):
            raw = raw.strip("`").strip()
            if raw.lower().startswith("json"):
                raw = raw[4:].strip()
        return json.loads(raw)


def analyze_directory(directory: str) -> list[dict]:
    results = []
    for path in Path(directory).glob("*.txt"):
        text = path.read_text(encoding="utf-8")
        try:
            analysis = safe_analyze(text)
            analysis["file"] = path.name
            results.append(analysis)
        except Exception as e:
            print(f"Failed on {path.name}: {e}")
    return results


def main():
    parser = argparse.ArgumentParser(description="Batch text analysis with Oxlo.ai")
    parser.add_argument("directory", help="Path to directory containing .txt files")
    parser.add_argument("--output", default="analysis.jsonl", help="Output file")
    args = parser.parse_args()

    results = analyze_directory(args.directory)
    # Newline-delimited JSON: one analysis object per line.
    with open(args.output, "w", encoding="utf-8") as f:
        for r in results:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")
    print(f"Wrote {len(results)} analyses to {args.output}")


if __name__ == "__main__":
    main()
```
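The script keeps every result in memory and writes the file only at the end, so an interrupted run loses all completed work. One alternative is to turn the directory loop into a generator and write each line as soon as it arrives, flushing after every line. This is a sketch under our own naming: iter_analyses and write_jsonl_incrementally are hypothetical helpers, not part of the script above.

```python
import json
import sys
from collections.abc import Callable, Iterable, Iterator
from pathlib import Path


def iter_analyses(directory: str, analyze: Callable[[str], dict]) -> Iterator[dict]:
    """Yield one analysis per .txt file, in filename order, as soon as it is ready."""
    for path in sorted(Path(directory).glob("*.txt")):
        try:
            analysis = analyze(path.read_text(encoding="utf-8"))
        except Exception as e:
            # Report on stderr so stdout stays clean; skip the file and continue.
            print(f"Failed on {path.name}: {e}", file=sys.stderr)
            continue
        analysis["file"] = path.name
        yield analysis


def write_jsonl_incrementally(records: Iterable[dict], output: str) -> int:
    """Write each record as one JSON line, flushing after every line.

    Returns the number of records written. Because every line is flushed as
    it is produced, an interrupted run keeps the analyses completed so far.
    """
    count = 0
    with Path(output).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            f.flush()
            count += 1
    return count
```

In main(), these two helpers would replace both the analyze_directory call and the write loop: count = write_jsonl_incrementally(iter_analyses(args.directory, safe_analyze), args.output). Unlike the original loop, this version processes files in sorted order and reports failures on stderr.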
Run it
Save the Step 5 script as analyze.py, create a few sample documents, and run it. Because Oxlo.ai uses flat per-request pricing, each file costs the same whether it is two hundred or two thousand tokens long.
```bash
mkdir sample_docs
echo "Apple Inc. reported record quarterly earnings yesterday, driven by strong iPhone sales across Asia and a new partnership with a major Japanese carrier." > sample_docs/tech.txt
echo "The city council approved a new downtown park project after months of debate. Local residents and the Green Earth Organization celebrated the decision." > sample_docs/local.txt
python analyze.py sample_docs --output results.jsonl
cat results.jsonl
```
Example output (model responses vary, so your entities, topics, wording, and line order may differ):
```json
{"sentiment": "Positive", "entities": [{"name": "Apple Inc.", "type": "Organization"}, {"name": "Asia", "type": "Location"}], "topics": ["earnings", "iPhone sales", "partnership"], "summary": "Apple Inc. reported record quarterly earnings driven by strong iPhone sales in Asia and a new Japanese partnership.", "file": "tech.txt"}
{"sentiment": "Positive", "entities": [{"name": "Green Earth Organization", "type": "Organization"}, {"name": "city council", "type": "Organization"}], "topics": ["urban planning", "public parks", "local government"], "summary": "The city council approved a new downtown park project celebrated by residents and the Green Earth Organization.", "file": "local.txt"}
```
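Each line of results.jsonl is a standalone JSON object, so the standard library alone can read the file back and aggregate it. The sketch below does this. load_jsonl and summarize are illustrative helper names. Applied to the two example lines above, summarize reports two Positive documents, three Organization entities, and one Location. If you use pandas instead, pandas.read_json("results.jsonl", lines=True) loads the same file into a DataFrame.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per non-blank line."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize(records: list[dict]) -> dict:
    """Count sentiments and entity types across all analysed files."""
    sentiments = Counter(r.get("sentiment") for r in records)
    entity_types = Counter(
        e.get("type") for r in records for e in r.get("entities", [])
    )
    return {"sentiment": dict(sentiments), "entity_types": dict(entity_types)}
```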
Wrap-up
You now have a working batch text analysis pipeline that turns unstructured files into structured JSON. There are two concrete next steps. You could deploy this as a FastAPI endpoint so other services can POST text directly for analysis. You could also try qwen-3-32b on Oxlo.ai for multilingual document processing by changing only the model argument.