Building a Literature Learning Agent that turns raw academic abstracts into structured summaries, critical questions, and Anki flashcards saves hours for students and researchers. In this tutorial, I will walk through the exact Python script I use to process papers. We will call Oxlo.ai's OpenAI-compatible endpoint so that long abstracts do not inflate the bill, because Oxlo.ai charges per request rather than per token.
What you'll need
- Python 3.10 or newer
- The OpenAI SDK: pip install openai
- An Oxlo.ai API key from https://portal.oxlo.ai
I also recommend setting your key in an environment variable named OXLO_API_KEY so you do not commit it to git.
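One way to make a missing key fail loudly, rather than sending a placeholder string to the API, is a small loader like the sketch below. The helper name load_api_key is hypothetical and not part of the original script.

```python
import os


def load_api_key(env_var: str = "OXLO_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: the original script falls back to a placeholder
    string instead of raising.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set. Export it before running the script.")
    return key
```

If you adopt it, the result can be passed as the api_key argument when the client is created in Step 1.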
Step 1: Scaffold the client
First, I create an OpenAI client pointed at Oxlo.ai and verify that the connection works with a one-line test. I use llama-3.3-70b here because it is a reliable general-purpose model for instruction following.
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.getenv("OXLO_API_KEY", "YOUR_OXLO_API_KEY"),
)

# Verify connectivity
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
Step 2: Define the research system prompt
The system prompt is the only magic in this agent. It forces the model to return a predictable structure that I can parse later.
SYSTEM_PROMPT = """You are a research assistant helping graduate students understand academic papers.
When given an abstract, produce exactly the following sections:
1. Plain-Language Summary: One paragraph, no jargon.
2. Key Findings: A bulleted list of the three most important results.
3. Methodology Critique: One paragraph noting strengths and one weakness.
4. Follow-Up Questions: Three Socratic questions the student should ask their advisor.
5. Anki Flashcards: Exactly five question-and-answer pairs formatted as:
Q: [question]
A: [answer]
Do not include any text outside these sections."""
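To illustrate what "parse later" could look like, here is a minimal sketch that splits the model's reply into the five named sections. It assumes the model writes each heading at the start of a line in the form "2. Key Findings:" with no extra markup; split_sections and SECTION_NAMES are hypothetical names, not part of the original script.

```python
import re

# Section names exactly as they appear in the system prompt.
SECTION_NAMES = [
    "Plain-Language Summary",
    "Key Findings",
    "Methodology Critique",
    "Follow-Up Questions",
    "Anki Flashcards",
]

# Matches a heading such as "2. Key Findings:" at the start of a line.
_HEADING = re.compile(
    r"^\s*\d+\.\s*(" + "|".join(re.escape(n) for n in SECTION_NAMES) + r")\s*:\s*",
    re.MULTILINE,
)


def split_sections(analysis: str) -> dict[str, str]:
    """Map each section name found in the text to the body that follows it."""
    matches = list(_HEADING.finditer(analysis))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(analysis)
        sections[match.group(1)] = analysis[match.end():end].strip()
    return sections
```

Sections the model omits are simply missing from the returned dict, so a caller can check for all five keys before trusting the reply.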
Step 3: Build the analysis function
Next, I wrap the API call in a function that accepts an abstract and returns the structured analysis. I keep the temperature low so the output stays consistent from run to run, although it is not fully deterministic.
def analyze_abstract(abstract: str) -> str:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Analyze this abstract:\n\n{abstract}"},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content
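Because every analysis is a network call, a transient failure can abort a run. The sketch below shows one generic way to retry with exponential backoff; call_with_retries is a hypothetical helper, and catching every exception is a simplifying assumption (in practice you would likely narrow it to connection and rate-limit errors).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on any exception with exponential backoff.

    The wait doubles after each failure: base_delay, 2 * base_delay, ...
    The sleep parameter exists so tests can run without real delays.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A call site could then look like call_with_retries(lambda: analyze_abstract(text)), which leaves analyze_abstract itself unchanged.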
Step 4: Generate structured flashcards with JSON mode
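I run a second pass over the same abstract with deepseek-v3.2 and JSON mode to get machine-readable flashcards. Oxlo.ai supports the response_format parameter, so I can request a JSON object that my script parses without regex. In the OpenAI API, the json_object type guarantees syntactically valid JSON rather than a particular schema, so the prompt spells out the expected shape.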
import csv
import json
FLASHCARD_PROMPT = """You are a flashcard generator. Given an academic abstract, emit exactly 5 Anki flashcards as a JSON object.
The JSON must match this schema:
{
  "flashcards": [
    {"front": "...", "back": "..."}
  ]
}
Keep each front under 120 characters and each back under 240 characters."""
def generate_flashcards(abstract: str) -> list[dict]:
    response = client.chat.completions.create(
        model="deepseek-v3.2",
        messages=[
            {"role": "system", "content": FLASHCARD_PROMPT},
            {"role": "user", "content": abstract},
        ],
        response_format={"type": "json_object"},
        temperature=0.2,
    )
    data = json.loads(response.choices[0].message.content)
    return data["flashcards"]

def save_flashcards(cards: list[dict], filename: str = "flashcards.csv"):
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Front", "Back"])
        for card in cards:
            writer.writerow([card["front"], card["back"]])
    print(f"Saved {len(cards)} flashcards to {filename}")
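Since the schema lives only in the prompt, the parsed JSON may still be missing fields or contain the wrong number of cards. A minimal validation sketch follows; validate_flashcards is a hypothetical helper, and the limits mirror the numbers stated in FLASHCARD_PROMPT (five cards, fronts under 120 characters, backs under 240).

```python
def validate_flashcards(
    data: object,
    expected: int = 5,
    max_front: int = 120,
    max_back: int = 240,
) -> list[dict]:
    """Return the card list if it matches the prompt's schema, else raise ValueError."""
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    cards = data.get("flashcards")
    if not isinstance(cards, list) or len(cards) != expected:
        raise ValueError(f"expected a list of exactly {expected} flashcards")
    for i, card in enumerate(cards):
        if not isinstance(card, dict):
            raise ValueError(f"card {i} is not an object")
        front, back = card.get("front"), card.get("back")
        if not isinstance(front, str) or not isinstance(back, str):
            raise ValueError(f"card {i} needs string 'front' and 'back' fields")
        if not front.strip() or not back.strip():
            raise ValueError(f"card {i} has an empty side")
        # The prompt asks for lengths strictly under the limits.
        if len(front) >= max_front or len(back) >= max_back:
            raise ValueError(f"card {i} exceeds the length limits")
    return cards
```

One option is to call it on the result of json.loads inside generate_flashcards and return its result, so a malformed reply surfaces as a ValueError instead of a KeyError later in save_flashcards.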
Step 5: Tie it together
I add a small main block that reads an example abstract, prints the analysis, and writes the CSV. You can swap in any abstract from arXiv or PubMed.
if __name__ == "__main__":
    sample_abstract = (
        "We introduce a novel transformer architecture that reduces attention complexity "
        "from quadratic to linear via kernelized positional sampling. Experiments on WikiText-103 "
        "and BookCorpus show a 2.3x speedup at 4096-token context lengths with no loss in perplexity. "
        "However, the method underperforms on tasks requiring fine-grained positional reasoning. "
        "This work suggests a new trade-off between efficiency and positional fidelity in large language models."
    )
    print("=== Analysis ===")
    analysis = analyze_abstract(sample_abstract)
    print(analysis)
    print("\n")
    print("=== Flashcards ===")
    cards = generate_flashcards(sample_abstract)
    for c in cards:
        print(f"Q: {c['front']}")
        print(f"A: {c['back']}")
        print()
    save_flashcards(cards)
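If you would rather swap in abstracts without editing the script, one approach is to keep each abstract in its own text file. The sketch below assumes a folder of UTF-8 .txt files, one abstract per file; iter_abstracts and that folder layout are hypothetical, not part of the original script.

```python
from pathlib import Path
from typing import Iterator


def iter_abstracts(folder: str) -> Iterator[tuple[str, str]]:
    """Yield (file stem, abstract text) for each non-empty .txt file in folder.

    Files are visited in sorted order so repeated runs process them
    in the same sequence.
    """
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        if text:
            yield path.stem, text
```

The main block could then loop over iter_abstracts("abstracts"), pass each text to analyze_abstract and generate_flashcards, and use the file stem to name the output CSV.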
Run it
Save the full script as literature_agent.py and run it with your Oxlo.ai key exported.
export OXLO_API_KEY="sk-oxlo.ai-..."
python literature_agent.py
When I run this against the sample abstract, I get output similar to the following. Your exact wording will vary slightly because of sampling.
=== Analysis ===
1. Plain-Language Summary: This paper presents a new way to make transformers faster by changing how they handle position information, cutting the computational cost while keeping text quality the same on long documents.
2. Key Findings:
- Attention complexity drops from quadratic to linear using kernelized positional sampling.
- A 2.3x speedup is measured at 4096-token contexts on WikiText-103 and BookCorpus.
- Perplexity remains unchanged compared to standard transformers.
3. Methodology Critique: The experiments are thorough on language modeling benchmarks, but the evaluation lacks downstream task testing on retrieval or reasoning benchmarks. One weakness is the reported drop in fine-grained positional reasoning tasks.
4. Follow-Up Questions:
- Why does kernelized sampling preserve perplexity yet hurt positional reasoning?
- What downstream tasks beyond language modeling were considered, and why were they omitted?
- How does this approach compare to sparse attention patterns such as Longformer?
5. Anki Flashcards:
Q: What is the main contribution of this paper?
A: A linear-complexity transformer using kernelized positional sampling.
Q: On which datasets was the model evaluated?
A: WikiText-103 and BookCorpus.
...
=== Flashcards ===
Q: What complexity does the proposed attention reduce?
A: From quadratic to linear.
Q: What is the measured speedup at 4096 tokens?
A: 2.3x.
...
Saved 5 flashcards to flashcards.csv
Wrap up
From here, you can extend the agent by adding a PDF extraction step with pypdf (the maintained successor to PyPDF2) or pymupdf so it ingests full papers instead of just abstracts. Another solid upgrade is swapping in kimi-k2.6 for the analysis step when you need deeper reasoning over multi-page methodology sections. Its 131K context window on Oxlo.ai handles entire articles under the same flat per-request price.