Using Oxlo.ai for Academic Research with LLMs

I built a lightweight literature synthesis agent that ingests paper abstracts and returns structured summaries, method tags, and limitation flags. It runs entirely on Oxlo.ai, whose flat per-request pricing (https://oxlo.ai/pricing) keeps costs predictable when I feed it long PDF excerpts or multi-paper context windows. If you are a researcher or grad student tired of metering tokens during literature reviews, this tutorial will get you running in under ten minutes.

What you'll need

Before starting, make sure you have:

Python 3.10+
The OpenAI SDK (pip install openai)
An Oxlo.ai API key from https://portal.oxlo.ai

Step 1: Instantiate the client

Oxlo.ai is compatible with the OpenAI SDK, so the only changes from a stock OpenAI setup are the base URL, the API key, and the model name. A quick test request confirms that the connection works.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": "List three best practices for prompt engineering."},
    ],
)

print(response.choices[0].message.content)
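
Hardcoding the key is fine for a first test, but it is easy to commit by accident. A minimal sketch of reading it from the environment instead, assuming a hypothetical variable name OXLO_API_KEY (the name is my choice for illustration, not something Oxlo.ai prescribes):

```python
import os

OXLO_BASE_URL = "https://api.oxlo.ai/v1"  # base URL taken from the snippet above


def load_oxlo_settings(env=None) -> dict:
    """Return keyword arguments for OpenAI(...), reading the key from the environment.

    OXLO_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("OXLO_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set the OXLO_API_KEY environment variable first.")
    return {"base_url": OXLO_BASE_URL, "api_key": api_key}


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**load_oxlo_settings())
```

Failing early with a clear message tends to be easier to debug than an authentication error on the first request.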

Step 2: Define the system prompt

Next, I define the system prompt that shapes every analysis. Keeping this in a dedicated variable makes it easy to iterate without touching the request logic.

SYSTEM_PROMPT = """You are an academic literature analyzer. Given a paper title and abstract, output a JSON object with exactly these keys:

- summary: one paragraph describing the core contribution.
- methods: a list of techniques, datasets, or experimental setups used.
- limitations: a list of stated or implied limitations.
- related_work_queries: three concrete search queries to find related papers.

Be concise, technical, and accurate. Output only valid JSON."""
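
The prompt asks for exactly four keys, but a model can still drift from that shape. A small checker along these lines can catch drift before the result reaches your notes. This is a sketch: validate_analysis and EXPECTED_KEYS are hypothetical names, and the type expectations are my reading of the prompt rather than a formal schema.

```python
EXPECTED_KEYS = {
    "summary": str,
    "methods": list,
    "limitations": list,
    "related_work_queries": list,
}


def validate_analysis(result: dict) -> list[str]:
    """Return a list of problems found; an empty list means the shape looks right."""
    problems = []
    missing = EXPECTED_KEYS.keys() - result.keys()
    extra = result.keys() - EXPECTED_KEYS.keys()
    for key in sorted(missing):
        problems.append(f"missing key: {key}")
    for key in sorted(extra):
        problems.append(f"unexpected key: {key}")
    # Check the type of every expected key that is present.
    for key, expected_type in EXPECTED_KEYS.items():
        if key in result and not isinstance(result[key], expected_type):
            problems.append(f"{key} should be a {expected_type.__name__}")
    # The prompt asks for three search queries specifically.
    queries = result.get("related_work_queries")
    if isinstance(queries, list) and len(queries) != 3:
        problems.append("related_work_queries should contain exactly three items")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that went wrong with a given reply.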

Step 3: Analyze a single paper

Now I wrap the call in a function that accepts a title and abstract, then returns parsed JSON. I use Llama 3.3 70B because it handles structured instructions reliably.

import json

def analyze_paper(title: str, abstract: str) -> dict:
    user_message = f"Title: {title}\n\nAbstract: {abstract}"

    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        response_format={"type": "json_object"},
    )

    raw = response.choices[0].message.content
    return json.loads(raw)
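
Calling json.loads directly on the reply raises if the model wraps its answer in a Markdown code fence or returns something that is not an object. A more forgiving parser might look like the sketch below; parse_model_json is a hypothetical helper, not part of the OpenAI SDK.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def parse_model_json(raw: str) -> dict:
    """Parse a model reply as a JSON object, tolerating a Markdown code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a "json" tag).
        first_newline = text.find("\n")
        text = text[first_newline + 1:] if first_newline != -1 else ""
        # Drop the closing fence if there is one.
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model reply was not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level.")
    return parsed
```

To try it, the last line of analyze_paper could become return parse_model_json(raw).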

Step 4: Synthesize multiple papers

For literature reviews, I often need to compare several papers at once. Because Oxlo.ai bills per request rather than per token, I can pack multiple full abstracts into a single prompt without paying for extra input tokens, which is where it has the clearest edge over token-based billing for long-context research tasks. I switch to Kimi K2.6 here for its reasoning over long context windows.

COMPARE_PROMPT = """You are a literature review synthesizer. Given several paper abstracts, output a JSON object with exactly these keys:

- synthesis: one paragraph integrating the main findings.
- method_comparison: a list comparing the techniques used in each paper.
- gaps: a list of research gaps not addressed by the provided papers.

Output only valid JSON."""

def synthesize_papers(papers: list[dict]) -> dict:
    blocks = []
    for idx, paper in enumerate(papers, 1):
        blocks.append(f"[{idx}] {paper['title']}\n{paper['abstract']}")

    user_message = "\n\n".join(blocks)

    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": COMPARE_PROMPT},
            {"role": "user", "content": user_message},
        ],
        response_format={"type": "json_object"},
    )

    return json.loads(response.choices[0].message.content)
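
Flat pricing removes the cost pressure, but a model's context window is still finite. One way to keep each request within a budget is to group papers into batches first. This is a sketch: batch_papers is a hypothetical helper, and the 20,000-character default is an illustrative number, not a documented Oxlo.ai or model limit.

```python
def batch_papers(papers: list[dict], max_chars: int = 20_000) -> list[list[dict]]:
    """Group papers so each batch's combined title + abstract length is at most max_chars.

    max_chars is an illustrative budget, not a documented limit.
    A single paper longer than the budget still gets its own batch.
    """
    batches: list[list[dict]] = []
    current: list[dict] = []
    current_size = 0
    for paper in papers:
        size = len(paper["title"]) + len(paper["abstract"])
        # Start a new batch when adding this paper would exceed the budget.
        if current and current_size + size > max_chars:
            batches.append(current)
            current, current_size = [], 0
        current.append(paper)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each batch can then go through synthesize_papers separately. Note that the [1], [2], ... labels restart in every batch, so cross-batch comparisons need their own bookkeeping.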

Step 5: Generate BibTeX entries

Finally, I add a helper to generate BibTeX entries. DeepSeek V3.2 handles the formatting consistently, and I can call it in the same script without managing a second provider.

def generate_bibtex(title: str, authors: list[str], year: str, venue: str) -> str:
    user_message = (
        "Generate a clean BibTeX entry for the following paper.\n\n"
        f"Title: {title}\n"
        f"Authors: {', '.join(authors)}\n"
        f"Year: {year}\n"
        f"Venue: {venue}\n\n"
        "Output only the BibTeX block."
    )

    response = client.chat.completions.create(
        model="deepseek-v3.2",
        messages=[
            {"role": "system", "content": "You generate valid BibTeX citations."},
            {"role": "user", "content": user_message},
        ],
    )

    return response.choices[0].message.content
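
When the metadata is already structured, a deterministic builder can act as a cross-check on the model's output, since it cannot invent fields. The sketch below is an alternative to the model call, not a replacement the original script uses. It assumes names are written "First Last", treats every venue as a conference, and uses a citation key format (lastname + year + first title word) that is simply a convention picked for illustration.

```python
import re


def _slug(text: str) -> str:
    """Lowercase the text and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def build_bibtex(title: str, authors: list[str], year: str, venue: str) -> str:
    """Build an @inproceedings entry from known metadata without calling a model."""
    name_parts = authors[0].split() if authors else []
    last_name = name_parts[-1] if name_parts else "anon"
    words = title.split()
    first_word = words[0] if words else "untitled"
    key = f"{_slug(last_name)}{year}{_slug(first_word)}"
    author_field = " and ".join(authors)  # BibTeX separates authors with "and"
    lines = [
        f"@inproceedings{{{key},",
        f"  title = {{{title}}},",
        f"  author = {{{author_field}}},",
        f"  booktitle = {{{venue}}},",
        f"  year = {{{year}}}",
        "}",
    ]
    return "\n".join(lines)
```

Comparing this output with the model's entry is a quick way to spot a hallucinated page range or a misspelled author.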

Run it

I test the analyzer with a well-known paper. The JSON output is ready to drop into a note-taking app or reference manager.

if __name__ == "__main__":
    paper = {
        "title": "Attention Is All You Need",
        "abstract": (
            "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks "
            "that include an encoder and a decoder. The best performing models also connect the encoder and decoder through "
            "an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention "
            "mechanisms, dispensing with recurrence and convolutions entirely."
        ),
    }

    result = analyze_paper(paper["title"], paper["abstract"])
    print(json.dumps(result, indent=2))

Example output:

{
  "summary": "The authors introduce the Transformer, a novel sequence transduction architecture that replaces recurrent and convolutional layers with multi-head self-attention, achieving state-of-the-art translation quality while enabling significantly faster training.",
  "methods": [
    "Multi-head self-attention",
    "Positional encoding",
    "Encoder-decoder architecture",
    "WMT 2014 English-German and English-French datasets"
  ],
  "limitations": [
    "Quadratic complexity with respect to sequence length",
    "Positional encoding may not generalize to lengths unseen during training"
  ],
  "related_work_queries": [
    "efficient attention mechanisms long sequences transformer",
    "non-autoregressive neural machine translation",
    "positional encoding generalization length extrapolation"
  ]
}
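
To get this JSON into a note-taking app, one option is to render it as Markdown first. A minimal sketch, assuming the result has the four keys shown above; to_markdown_note is a hypothetical helper name.

```python
def to_markdown_note(title: str, result: dict) -> str:
    """Render an analyze_paper-style result as a Markdown note."""
    lines = [f"# {title}", "", result.get("summary", ""), ""]
    sections = [
        ("Methods", "methods"),
        ("Limitations", "limitations"),
        ("Related work queries", "related_work_queries"),
    ]
    for heading, key in sections:
        items = result.get(key) or []
        if not items:
            continue  # skip empty sections rather than printing a bare heading
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Calling to_markdown_note(paper["title"], result) and writing the string to a .md file should be enough for most Markdown-based note tools.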

Wrap-up

This agent is already useful for my weekend literature sweeps. Two concrete next steps: wire it to the arXiv API to auto-ingest papers by keyword, or swap in deepseek-r1-671b-moe on Oxlo.ai for deeper methodological critique when evaluating complex experimental designs. Because Oxlo.ai does not charge by the token, I can stuff entire related-work sections into context without watching a meter run up.
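
For the arXiv next step, a starting point could look like the sketch below. It builds a keyword query URL and parses the Atom feed that the arXiv API returns into the title/abstract dictionaries the functions above expect. The helper names are hypothetical, and the URL shape reflects my understanding of the public arXiv API; check its documentation and rate-limit guidance before running this in a loop.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_url(keyword: str, max_results: int = 5) -> str:
    """Build a keyword search URL for the arXiv API."""
    params = {"search_query": f"all:{keyword}", "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_arxiv_feed(xml_text: str) -> list[dict]:
    """Extract title/abstract pairs from an Atom feed returned by the arXiv API."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM_NS)
        # Titles and abstracts may be wrapped across lines; collapse the whitespace.
        papers.append({"title": " ".join(title.split()), "abstract": " ".join(summary.split())})
    return papers
```

Fetching the feed with urllib.request.urlopen(build_arxiv_url("transformer attention")) and passing the decoded body to parse_arxiv_feed should yield a list that can be fed to analyze_paper one paper at a time or to synthesize_papers as a whole.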