Unlocking LLM Potential in Education and Research

#learnai #oxlo #ai

We will build a research synthesis agent that turns a pile of paper abstracts into a structured literature review and study guide. It is aimed at grad students and research assistants who need to map a field quickly. I run it on Oxlo.ai because the flat per-request pricing means I can feed it long concatenated abstracts without watching token costs scale, which matters when you are processing dozens of papers (see https://oxlo.ai/pricing).

What you'll need

Python 3.10 or newer
An Oxlo.ai API key from https://portal.oxlo.ai
The OpenAI SDK installed with pip install openai

Step 1: Configure the Oxlo.ai client

First, import the SDK and point the base URL at Oxlo.ai. I keep the key in an environment variable; the snippet below uses a placeholder string that you can replace with your own key for local testing.

import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key="YOUR_OXLO_API_KEY"
)
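
If you would rather not paste the key into source, here is a minimal sketch of the environment-variable approach. The variable name OXLO_API_KEY and the helper load_api_key are assumptions made for illustration; neither is defined by Oxlo.ai or the OpenAI SDK.

```python
import os


def load_api_key(env=None, var_name="OXLO_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    OXLO_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before running the script.")
    return key


# Usage with the client configured in this step (not executed here):
# client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=load_api_key())
```

Failing early with a named variable tends to be easier to debug than an authentication error from the first API call.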

Step 2: Define the system prompt

The system prompt is the contract between you and the model. It tells the agent to act as a research reviewer and to return valid JSON with specific keys.

SYSTEM_PROMPT = """You are a research synthesis assistant. Analyze the provided academic abstracts and emit a structured literature review. Be concise, accurate, and note contradictions.

Return valid JSON with exactly these keys:
- summary: a 2-sentence overview of the field state
- themes: an array of 3 to 5 objects, each with label and description
- gaps: an array of 2 to 3 open research gaps
- contradictions: an array of conflicting findings, or an empty array if none exist
"""

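Because the prompt is a contract, it can help to check the reply against it before using it downstream. The sketch below is one possible check, not part of the original agent; validate_synthesis and EXPECTED_KEYS are hypothetical names, and the rules simply mirror the keys and array sizes listed in the prompt.

```python
EXPECTED_KEYS = {"summary", "themes", "gaps", "contradictions"}


def validate_synthesis(data):
    """Return a list of problems found in a synthesis dict (empty if none)."""
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    missing = EXPECTED_KEYS - set(data)
    extra = set(data) - EXPECTED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")

    # The prompt asks for 3 to 5 themes, each with label and description.
    themes = data.get("themes", [])
    if not isinstance(themes, list) or not 3 <= len(themes) <= 5:
        problems.append("themes should be an array of 3 to 5 objects")
    else:
        for i, theme in enumerate(themes, start=1):
            if not isinstance(theme, dict) or not {"label", "description"} <= set(theme):
                problems.append(f"theme {i} needs label and description")

    # The prompt asks for 2 to 3 gaps and a (possibly empty) contradictions array.
    gaps = data.get("gaps", [])
    if not isinstance(gaps, list) or not 2 <= len(gaps) <= 3:
        problems.append("gaps should be an array of 2 to 3 items")
    if not isinstance(data.get("contradictions", []), list):
        problems.append("contradictions should be an array")
    return problems
```

An empty list means the reply matches the contract; anything else can be logged or used as a reason to retry.
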
Step 3: Format raw inputs into a research bundle

Next, we need a helper that takes a topic and a list of abstracts and turns them into a single clean string. This keeps the chat message tidy and separates sources with numeric markers.

def format_research_bundle(topic, abstracts):
    lines = [f"Topic: {topic}", "", "Abstracts:"]
    for idx, text in enumerate(abstracts, start=1):
        lines.append(f"[{idx}] {text.strip()}")
    return "\n".join(lines)
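
To see what the model actually receives, here is a quick run of the helper on two made-up abstracts. The function is repeated so the snippet runs on its own; the topic and abstract strings are placeholders.

```python
def format_research_bundle(topic, abstracts):
    # Same helper as above, repeated so this snippet is self-contained.
    lines = [f"Topic: {topic}", "", "Abstracts:"]
    for idx, text in enumerate(abstracts, start=1):
        lines.append(f"[{idx}] {text.strip()}")
    return "\n".join(lines)


bundle = format_research_bundle(
    "Example topic",
    ["  First abstract.  ", "Second abstract.\n"],
)
print(bundle)
# Topic: Example topic
#
# Abstracts:
# [1] First abstract.
# [2] Second abstract.
```

The numeric markers give the model a stable way to refer back to individual sources.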

Step 4: Synthesize structured output via JSON mode

Now we can call Llama 3.3 70B through Oxlo.ai and enforce JSON mode so the response is machine-readable. I set the temperature low because a literature review should be grounded, not creative.

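def run_synthesis(topic, abstracts):
    """Send the bundled abstracts to the model and return the parsed synthesis."""
    user_message = format_research_bundle(topic, abstracts)

    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        # JSON mode: ask the API for a single JSON object as the reply.
        response_format={"type": "json_object"},
        # Low temperature keeps the review close to the source abstracts.
        temperature=0.3,
    )

    # The reply text is a JSON string; parse it into a Python dict.
    return json.loads(response.choices[0].message.content)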
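
JSON mode is meant to return parseable output, but the json.loads call can still fail, for instance if a reply comes back empty or cut off. Here is a minimal retry sketch under that assumption; parse_with_retries and its fetch_raw argument are hypothetical names, and the helper is kept independent of the SDK so it can wrap any call that returns raw text.

```python
import json


def parse_with_retries(fetch_raw, attempts=3):
    """Call fetch_raw() until its return value parses as a JSON object.

    fetch_raw is any zero-argument callable that returns the model's raw
    reply text (or None), for example a lambda around the API call.
    """
    last_error = None
    for _ in range(attempts):
        raw = fetch_raw()
        try:
            # An empty or missing reply is treated as a parse failure.
            parsed = json.loads(raw or "")
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if isinstance(parsed, dict):
            return parsed
        last_error = ValueError("reply was valid JSON but not an object")
    raise RuntimeError(f"no usable JSON after {attempts} attempts") from last_error
```

Retrying re-sends the request, so on a per-request plan each attempt presumably counts as a separate request; a small attempts value keeps that bounded.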

Step 5: Generate study questions from the synthesis

Finally, we reuse the synthesis to generate review questions. I switch to Qwen 3 32B here because it handles structured agent workflows cleanly, and on Oxlo.ai the request cost stays flat even though the input now includes the full JSON payload from the previous step.

STUDY_PROMPT = """You are an exam prep assistant. Given a research topic and a literature synthesis, generate 5 graduate-level review questions. Return valid JSON with key 'study_questions' containing objects with 'question' and 'concept_tested'."""

def generate_study_questions(topic, synthesis):
    """Turn the Step 4 synthesis into graduate-level review questions."""
    # Pass the topic and the full synthesis as one JSON document.
    user_message = json.dumps({"topic": topic, "synthesis": synthesis}, indent=2)

    response = client.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": STUDY_PROMPT},
            {"role": "user", "content": user_message},
        ],
        response_format={"type": "json_object"},
        temperature=0.4,
    )

    # Parse the JSON reply into a dict with a 'study_questions' key.
    return json.loads(response.choices[0].message.content)

Run it

Save the complete script as research_agent.py and run it. The example below passes three mock abstracts on chain-of-thought reasoning and prints the synthesis followed by study questions.

if __name__ == "__main__":
    topic = "Chain-of-thought reasoning in small language models"

    abstracts = [
        "We investigate whether chain-of-thought prompting improves reasoning in models under 10B parameters. Results show minimal gains without explicit intermediate supervision.",
        "Our distillation framework transfers chain-of-thought capabilities from a 70B teacher to a 7B student, achieving strong reasoning benchmarks.",
        "We find that step-by-step reasoning is largely emergent above 100B parameters, suggesting architectural bottlenecks in smaller models."
    ]

    print("=== Synthesis ===")
    synthesis = run_synthesis(topic, abstracts)
    print(json.dumps(synthesis, indent=2))

    print("\n=== Study Questions ===")
    questions = generate_study_questions(topic, synthesis)
    print(json.dumps(questions, indent=2))

Typical output looks like this:

{
  "summary": "Research on chain-of-thought reasoning in small language models shows mixed results, with some work achieving gains through distillation while others find reasoning largely emergent only at larger scales.",
  "themes": [
    {"label": "Prompting strategies", "description": "Methods for eliciting step-by-step reasoning through manual or automated prompts."},
    {"label": "Distillation", "description": "Transferring reasoning patterns from large teachers to compact students."},
    {"label": "Emergent capabilities", "description": "The observation that certain reasoning behaviors appear only above parameter thresholds."}
  ],
  "gaps": [
    "Limited evaluation on multilingual reasoning benchmarks.",
    "Lack of consensus on minimum viable model size for reliable chain-of-thought."
  ],
  "contradictions": [
    "Some studies report minimal gains under 10B parameters, while distillation work demonstrates strong reasoning in a 7B student."
  ]
}

{
  "study_questions": [
    {"question": "What distinguishes zero-shot chain-of-thought from few-shot variants?", "concept_tested": "Prompting taxonomy"},
    {"question": "How does distillation transfer reasoning capabilities across scale boundaries?", "concept_tested": "Knowledge transfer"},
    {"question": "What evidence suggests that chain-of-thought is an emergent rather than learnable behavior in small models?", "concept_tested": "Emergence vs learnability"},
    {"question": "Which architectural bottlenecks limit step-by-step reasoning in sub-10B parameter models?", "concept_tested": "Model architecture"},
    {"question": "Why might intermediate supervision be necessary for reliable reasoning in small models?", "concept_tested": "Supervision signals"}
  ]
}
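
The two JSON payloads above can be flattened into a plain-text study guide for reading or printing. This is only a sketch: render_study_guide and the section headings are hypothetical, and it assumes the replies follow the key names requested in the two prompts.

```python
def render_study_guide(synthesis, questions):
    """Flatten the synthesis and question payloads into plain text."""
    lines = ["SUMMARY", synthesis.get("summary", ""), "", "THEMES"]
    for theme in synthesis.get("themes", []):
        lines.append(f"- {theme['label']}: {theme['description']}")

    lines += ["", "OPEN GAPS"]
    lines += [f"- {gap}" for gap in synthesis.get("gaps", [])]

    lines += ["", "CONTRADICTIONS"]
    # Fall back to a placeholder line when the array is empty.
    lines += [f"- {c}" for c in synthesis.get("contradictions", [])] or ["- none reported"]

    lines += ["", "REVIEW QUESTIONS"]
    for n, q in enumerate(questions.get("study_questions", []), start=1):
        lines.append(f"{n}. {q['question']} ({q['concept_tested']})")
    return "\n".join(lines)


# Usage with the variables from the script (not executed here):
# print(render_study_guide(synthesis, questions))
```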

Next steps

Wire this agent to the Semantic Scholar API so it fetches abstracts automatically instead of relying on manual paste. Or wrap the script in a lightweight Streamlit app so lab mates can upload a PDF and get the same structured output without touching code.