We are building a command-line test-preparation tutor that generates adaptive practice questions, evaluates answers, and tracks weak spots across topics. It is useful for developers building edtech tooling and for students who want a personalized study assistant they can extend. Oxlo.ai is a natural fit here because its request-based pricing does not scale with input length, so detailed system prompts and multi-turn evaluation loops stay affordable. See https://oxlo.ai/pricing for plan details.
What you'll need
- Python 3.10 or newer
- The OpenAI SDK: pip install openai
- An Oxlo.ai API key from https://portal.oxlo.ai
- A topic you want to study. I will use distributed systems as the example.
1. Set up the Oxlo.ai client
We initialize the OpenAI-compatible client and verify the connection with a lightweight request. I use llama-3.3-70b because Oxlo.ai serves it with no cold starts.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Say OK"},
    ],
)
print(response.choices[0].message.content)
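The Run section exports OXLO_API_KEY, so rather than pasting the key into source files, a small helper can read it from the environment and fail with a readable message when it is missing. This is a minimal sketch; `require_api_key` is a hypothetical helper, not part of the OpenAI SDK.

```python
import os


def require_api_key(var_name: str = "OXLO_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f'{var_name} is not set; run: export {var_name}="sk-..."')
    return key


# Hypothetical usage with the client from this step (needs the openai package):
# client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=require_api_key())
```

Failing early at startup is easier to debug than an authentication error surfacing on the first chat request.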
2. Lock in the system prompt
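The system prompt pins down the output format so questions and feedback can be parsed with simple checks rather than brittle, free-form regex. Models do not follow such rules perfectly every time, so the parsing code should still tolerate small deviations. I validate the prompt with a quick call to qwen-3-32b.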
SYSTEM_PROMPT = """You are an adaptive test-prep tutor. Your job is to help the user master a specific topic through targeted practice.
Rules:
- Generate exactly one multiple-choice question per call unless asked to evaluate.
- Provide four options labeled A, B, C, D.
- After the options, output an answer line exactly like this: Answer: X
- When evaluating a user answer, start with either "Correct" or "Incorrect".
- Then give a two-sentence explanation.
- If incorrect, end with a single-line JSON block: {"weak_area": "specific subtopic", "difficulty": 1-10}
- Keep explanations concise and actionable."""
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Confirm you understand the rules by saying 'Ready.'"},
    ],
)
print(response.choices[0].message.content)
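Because the later steps depend on this format, it can help to check a generated question before using it. A minimal sketch, assuming each option sits on its own line as `A.` or `A)` and the key appears as the exact `Answer: X` line the prompt requests; `follows_question_format` is a hypothetical helper name.

```python
import re

# One option per line, e.g. "A. Linearizability" (the "A)" variant is an assumption).
OPTION_RE = re.compile(r"^[ \t]*([ABCD])[.)][ \t]+\S", re.MULTILINE)
# The exact answer line the system prompt asks for: "Answer: X".
ANSWER_RE = re.compile(r"^[ \t]*Answer:[ \t]*([ABCD])[ \t]*$", re.MULTILINE)


def follows_question_format(text: str) -> bool:
    """True if the text has options A-D (each exactly once) and one Answer line."""
    labels = OPTION_RE.findall(text)
    answers = ANSWER_RE.findall(text)
    return sorted(labels) == ["A", "B", "C", "D"] and len(answers) == 1
```

If the check fails, the simplest recovery is to request a fresh question instead of trying to repair the malformed one.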
3. Generate adaptive questions
I add a helper that asks the model for a new question at a chosen difficulty level from 1 to 10. That difficulty parameter is what makes the tool adaptive: the study loop in step 6 sets it from the user's error rate.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
SYSTEM_PROMPT = """You are an adaptive test-prep tutor. Your job is to help the user master a specific topic through targeted practice.
Rules:
- Generate exactly one multiple-choice question per call unless asked to evaluate.
- Provide four options labeled A, B, C, D.
- After the options, output an answer line exactly like this: Answer: X
- When evaluating a user answer, start with either "Correct" or "Incorrect".
- Then give a two-sentence explanation.
- If incorrect, end with a single-line JSON block: {"weak_area": "specific subtopic", "difficulty": 1-10}
- Keep explanations concise and actionable."""
def generate_question(topic, difficulty):
    user_message = f"Topic: {topic}. Difficulty: {difficulty}/10. Generate one multiple-choice question."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
if __name__ == "__main__":
    print(generate_question("Distributed Systems", 5))
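The text returned here ends with the `Answer: X` line, so printing it as-is shows the user the key before they answer. A minimal sketch of separating the two, assuming the model emits the exact `Answer: X` line from the system prompt (a variant such as `**Answer:** D` would not match); `split_answer_key` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches the exact answer line the system prompt requests, e.g. "Answer: D".
ANSWER_LINE = re.compile(r"^[ \t]*Answer:[ \t]*([ABCD])[ \t]*$", re.MULTILINE)


def split_answer_key(question_text: str) -> Tuple[str, Optional[str]]:
    """Return (text that is safe to show the user, answer letter or None)."""
    match = ANSWER_LINE.search(question_text)
    answer = match.group(1) if match else None
    # Remove the answer line, then trim leading/trailing whitespace left behind.
    display = ANSWER_LINE.sub("", question_text).strip()
    return display, answer
```

Print `display` to the user, but keep the full `question_text` for the evaluation call so the evaluator still sees the key.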
4. Evaluate answers and explain gaps
When the user answers, I send their choice back along with the original question, including its answer line, so the evaluator knows the key. I switch to kimi-k2.6 here because diagnosing a conceptual misunderstanding benefits from a stronger reasoning model.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
SYSTEM_PROMPT = """You are an adaptive test-prep tutor. Your job is to help the user master a specific topic through targeted practice.
Rules:
- Generate exactly one multiple-choice question per call unless asked to evaluate.
- Provide four options labeled A, B, C, D.
- After the options, output an answer line exactly like this: Answer: X
- When evaluating a user answer, start with either "Correct" or "Incorrect".
- Then give a two-sentence explanation.
- If incorrect, end with a single-line JSON block: {"weak_area": "specific subtopic", "difficulty": 1-10}
- Keep explanations concise and actionable."""
def evaluate_answer(question_text, user_answer):
    user_message = (
        f"Question:\n{question_text}\n\n"
        f"User selected: {user_answer}\n\n"
        "Evaluate the answer. Follow the output rules in your system prompt."
    )
    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
if __name__ == "__main__":
    q = "Which consistency model guarantees that if one process reads a value, any subsequent read by that process will never return an older value?\n\nA. Linearizability\nB. Sequential consistency\nC. Read-your-writes consistency\nD. Monotonic reads\n\nAnswer: D"
    print(evaluate_answer(q, "A"))
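Downstream code needs two facts from this response: whether the answer was right and, if not, the weak-area record. A minimal sketch of extracting both, assuming the model puts the JSON block on its own line near the end, as the prompt requests; `parse_evaluation` is a hypothetical name.

```python
import json
import re
from typing import Optional, Tuple


def parse_evaluation(text: str) -> Tuple[bool, Optional[dict]]:
    """Split an evaluation into (is_correct, weak-area record or None)."""
    # Correctness comes from the leading word; \W* skips markdown such as "**".
    first_word = re.match(r"\W*(\w+)", text)
    is_correct = bool(first_word) and first_word.group(1).lower() == "correct"
    record = None
    # The JSON block only appears for incorrect answers; scan from the last line up.
    for line in reversed(text.strip().splitlines()):
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                record = None  # malformed JSON is treated as missing
            break
    return is_correct, record
```

Checking the leading word, rather than searching the whole response for "Incorrect", avoids misclassifying an explanation that merely mentions the word.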
5. Generate targeted review hints
When the user gets a question wrong, I do not just log it. I ask deepseek-v3.2 to generate a one-sentence review hint focused on the exact weak area. Saved next to each weak area in the progress file, these hints turn it into actionable study material.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
def generate_hint(weak_area):
    user_message = f"Give one concise study hint for the weak area: {weak_area}. One sentence only."
    response = client.chat.completions.create(
        model="deepseek-v3.2",
        messages=[
            {"role": "system", "content": "You are a concise engineering tutor."},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
if __name__ == "__main__":
    print(generate_hint("consistency models"))
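`generate_hint` is not called from the study loop in step 6. One way to connect it, sketched here under the assumption that you keep step 6's progress layout (`{area: {"attempts": ..., "errors": ...}}`) and add a hypothetical `hints` list to each area:

```python
def save_hint(progress: dict, weak_area: str, hint: str) -> dict:
    """Attach a review hint to a weak-area entry (hypothetical "hints" field).

    get_difficulty in step 6 only reads "attempts" and "errors",
    so the extra key does not affect difficulty selection.
    """
    entry = progress.setdefault(weak_area, {"attempts": 0, "errors": 0})
    hints = entry.setdefault("hints", [])
    if hint not in hints:  # avoid storing the same hint twice
        hints.append(hint)
    return progress


# Hypothetical call site in the loop, after an incorrect answer:
# progress = load_progress()
# save_hint(progress, area, generate_hint(area))
# save_progress(progress)
```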
6. Build the study session loop
Now I wire everything together. The loop picks difficulty based on error rate, asks a question, reads keyboard input, evaluates, and updates a local JSON progress file.
import json
import os
import re
from pathlib import Path
from openai import OpenAI
# Read the key exported in the Run section instead of hardcoding it.
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=os.environ["OXLO_API_KEY"])
TOPIC = "Distributed Systems"
PROGRESS_FILE = Path("progress.json")
SYSTEM_PROMPT = """You are an adaptive test-prep tutor. Your job is to help the user master a specific topic through targeted practice.
Rules:
- Generate exactly one multiple-choice question per call unless asked to evaluate.
- Provide four options labeled A, B, C, D.
- After the options, output an answer line exactly like this: Answer: X
- When evaluating a user answer, start with either "Correct" or "Incorrect".
- Then give a two-sentence explanation.
- If incorrect, end with a single-line JSON block: {"weak_area": "specific subtopic", "difficulty": 1-10}
- Keep explanations concise and actionable."""
def load_progress():
    if PROGRESS_FILE.exists():
        with open(PROGRESS_FILE, "r") as f:
            return json.load(f)
    return {}
def save_progress(progress):
    with open(PROGRESS_FILE, "w") as f:
        json.dump(progress, f, indent=2)
def get_difficulty():
    progress = load_progress()
    if not progress:
        return 5  # no history yet: start in the middle
    total_err = sum(v["errors"] for v in progress.values())
    total_att = sum(v["attempts"] for v in progress.values())
    rate = total_err / total_att if total_att else 0
    # Error rate 0.0 maps to 10 and 1.0 maps to 5: fewer mistakes, harder questions.
    return max(1, min(10, int(5 + (1 - rate) * 5)))
def generate_question(topic, difficulty):
    user_message = f"Topic: {topic}. Difficulty: {difficulty}/10. Generate one multiple-choice question."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
def evaluate_answer(question_text, user_answer):
    user_message = (
        f"Question:\n{question_text}\n\n"
        f"User selected: {user_answer}\n\n"
        "Evaluate the answer. Follow the output rules in your system prompt."
    )
    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
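def update_progress(evaluation_text):
    # The prompt only emits a JSON block for incorrect answers. Returning early when
    # it is missing would skip every correct answer, pinning the error rate at 1.0,
    # so every attempt is counted here; correct answers go into the "general" bucket.
    incorrect = re.match(r"\W*incorrect", evaluation_text, re.IGNORECASE) is not None
    area = "general"
    match = re.search(r"\{.*?\}", evaluation_text, re.DOTALL)
    if match:
        try:
            area = json.loads(match.group()).get("weak_area", "general")
        except json.JSONDecodeError:
            pass  # malformed JSON block: keep the "general" bucket
    progress = load_progress()
    entry = progress.setdefault(area, {"attempts": 0, "errors": 0})
    entry["attempts"] += 1
    if incorrect:
        entry["errors"] += 1
    save_progress(progress)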
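def main():
    print(f"Starting study session for: {TOPIC}")
    while True:
        diff = get_difficulty()
        print(f"\nDifficulty level: {diff}")
        q = generate_question(TOPIC, diff)
        # Hide the "Answer: X" line so the key is not shown before the user answers;
        # the full text, key included, is still sent to the evaluator.
        shown = re.sub(r"^[ \t]*Answer:.*$", "", q, flags=re.MULTILINE).strip()
        print("\n" + shown + "\n")
        ans = input("Your answer (A/B/C/D or q to quit): ").strip().upper()
        while ans not in {"A", "B", "C", "D", "Q"}:
            ans = input("Please enter A, B, C, D, or q: ").strip().upper()
        if ans == "Q":
            break
        feedback = evaluate_answer(q, ans)
        print("\n" + feedback + "\n")
        update_progress(feedback)
        print("Progress saved.")
if __name__ == "__main__":
    main()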
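`save_progress` opens `progress.json` with mode `"w"`, which truncates the file immediately; if the process dies mid-write, the file can be left empty or partial and `json.load` will fail on the next run. A minimal sketch of a safer drop-in, assuming a POSIX-style filesystem where `os.replace` within one directory is atomic; the name `save_progress_atomic` is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def save_progress_atomic(progress: dict, path: Path) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(progress, f, indent=2)
        # The target is either the old file or the complete new one, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

To use it, call `save_progress_atomic(progress, PROGRESS_FILE)` wherever the loop calls `save_progress(progress)`.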
Run it
Save the full script as tutor.py, export your key, and run:
export OXLO_API_KEY="sk-..."
python tutor.py
Example session output:
Starting study session for: Distributed Systems
Difficulty level: 5
Which consistency model guarantees that if one process reads a value, any subsequent read by that process will never return an older value?
A. Linearizability
B. Sequential consistency
C. Read-your-writes consistency
D. Monotonic reads
Your answer (A/B/C/D or q to quit): A
Incorrect. Monotonic reads ensure a process never sees older values after having seen a newer one, while linearizability is a stronger real-time guarantee. Review session consistency models to see how they form a hierarchy.
{"weak_area": "consistency models", "difficulty": 6}
Progress saved.
Next steps
Replace the JSON file with a SQLite database so you can track per-user history and compute actual spaced-repetition intervals. Or add a second Oxlo.ai call using deepseek-v3.2 to generate a targeted follow-up reading list based on the weak areas collected after each session.
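The SQLite suggestion might start like this: a minimal sketch using Python's built-in `sqlite3` module, assuming one row per answered question keyed by a hypothetical `user_id`. The table and function names are illustrative, and scheduling spaced-repetition intervals from the `answered_at` timestamps is left out.

```python
import sqlite3


def open_db(path: str = "progress.db") -> sqlite3.Connection:
    """Open (or create) a database with one row per answered question."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS attempts (
               id INTEGER PRIMARY KEY,
               user_id TEXT NOT NULL,
               area TEXT NOT NULL,
               correct INTEGER NOT NULL,
               answered_at TEXT NOT NULL DEFAULT (datetime('now'))
           )"""
    )
    return conn


def record_attempt(conn: sqlite3.Connection, user_id: str, area: str, correct: bool) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO attempts (user_id, area, correct) VALUES (?, ?, ?)",
            (user_id, area, int(correct)),
        )


def error_rates(conn: sqlite3.Connection, user_id: str) -> dict:
    """Per-area error rate for one user, worst areas first."""
    rows = conn.execute(
        """SELECT area, 1.0 - AVG(correct) AS err_rate
           FROM attempts WHERE user_id = ?
           GROUP BY area ORDER BY err_rate DESC, area""",
        (user_id,),
    ).fetchall()
    return {area: rate for area, rate in rows}
```

Keeping raw attempts instead of running totals means per-user history and any later scheduling logic can be derived with queries, without changing what gets written.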