DEV Community

shashank ms

Building Effective Test Preparation Tools with LLMs: Best Practices

We are building a command-line test preparation tutor that generates adaptive practice questions, evaluates answers, and tracks weak spots across topics. It is useful for anyone building edtech tooling and for students who want a personalized study assistant they can extend. Oxlo.ai is a natural fit here because its request-based pricing does not scale with input length, so detailed system prompts and multi-turn evaluation loops stay affordable. See https://oxlo.ai/pricing for plan details.

What you'll need

  • Python 3.10 or newer
  • The OpenAI SDK: pip install openai
  • An Oxlo.ai API key from https://portal.oxlo.ai
  • A topic you want to study. I will use distributed systems as the example.

1. Set up the Oxlo.ai client

We initialize the OpenAI-compatible client and verify the connection with a lightweight request. I use llama-3.3-70b because Oxlo.ai serves it with no cold starts.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Say OK"},
    ],
)
print(response.choices[0].message.content)
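
The snippets in this post hardcode a placeholder key, while the run instructions export OXLO_API_KEY. A minimal sketch of reading the key from the environment instead, assuming that variable name; `load_api_key` is a hypothetical helper, not part of the Oxlo.ai or OpenAI SDK:

```python
import os

def load_api_key(env_var="OXLO_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running the tutor.")
    return key
```

The result can be passed as `api_key=load_api_key()` when constructing the client, which keeps the key out of the source file.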

2. Lock in the system prompt

The system prompt pins down the output format, so questions and feedback can be parsed with one simple pattern instead of a pile of brittle regexes. I validate it with a quick call to qwen-3-32b.

SYSTEM_PROMPT = """You are an adaptive test-prep tutor. Your job is to help the user master a specific topic through targeted practice.

Rules:
- Generate exactly one multiple-choice question per call unless asked to evaluate.
- Provide four options labeled A, B, C, D.
- After the options, output an answer line exactly like this: Answer: X
- When evaluating a user answer, start with either "Correct" or "Incorrect".
- Then give a two-sentence explanation.
- If incorrect, end with a single-line JSON block: {"weak_area": "specific subtopic", "difficulty": 1-10}
- Keep explanations concise and actionable."""

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Confirm you understand the rules by saying 'Ready.'"},
    ],
)
print(response.choices[0].message.content)
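
Because the prompt fixes the layout (a question stem, four labeled options, then an `Answer: X` line), a response can be split into its parts with a small parser. This is a minimal sketch that assumes the model follows the format exactly; `parse_question` is a hypothetical helper and returns None when the layout does not match:

```python
import re

def parse_question(text):
    """Split a tutor response into stem, options, and answer key.

    Assumes options look like "A. text" and the key looks like "Answer: X".
    Returns None when the response does not match that layout.
    """
    options = dict(re.findall(r"^([A-D])[.)]\s*(.+)$", text, flags=re.MULTILINE))
    key = re.search(r"^Answer:\s*([A-D])\s*$", text, flags=re.MULTILINE)
    if len(options) != 4 or key is None:
        return None
    # Everything before the first option line is the question stem.
    first_option = re.search(r"^A[.)]", text, flags=re.MULTILINE)
    stem = text[: first_option.start()].strip()
    return {"stem": stem, "options": options, "answer": key.group(1)}
```

Returning None on a mismatch lets the caller simply request a new question when the model drifts from the format.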

3. Generate adaptive questions

I add a helper that asks the model for a new question at a chosen difficulty level. The difficulty slider is what makes the tool adaptive.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are an adaptive test-prep tutor. Your job is to help the user master a specific topic through targeted practice.

Rules:
- Generate exactly one multiple-choice question per call unless asked to evaluate.
- Provide four options labeled A, B, C, D.
- After the options, output an answer line exactly like this: Answer: X
- When evaluating a user answer, start with either "Correct" or "Incorrect".
- Then give a two-sentence explanation.
- If incorrect, end with a single-line JSON block: {"weak_area": "specific subtopic", "difficulty": 1-10}
- Keep explanations concise and actionable."""

def generate_question(topic, difficulty):
    user_message = f"Topic: {topic}. Difficulty: {difficulty}/10. Generate one multiple-choice question."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_question("Distributed Systems", 5))
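
The helper above trusts whatever difficulty it receives. A small sketch of guarding that input before it reaches the prompt, assuming difficulty should always be an integer from 1 to 10; `build_question_prompt` is a hypothetical name used only for illustration:

```python
def build_question_prompt(topic, difficulty):
    """Return the user message for one question, with difficulty clamped to 1..10."""
    level = max(1, min(10, int(difficulty)))
    return f"Topic: {topic}. Difficulty: {level}/10. Generate one multiple-choice question."
```

Clamping here means a bad value computed elsewhere cannot produce a prompt such as "Difficulty: 14/10".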

4. Evaluate answers and explain gaps

When the user answers, I send their choice back along with the original question. I switch to kimi-k2.6 here because its reasoning capabilities shine at diagnosing conceptual misunderstandings.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are an adaptive test-prep tutor. Your job is to help the user master a specific topic through targeted practice.

Rules:
- Generate exactly one multiple-choice question per call unless asked to evaluate.
- Provide four options labeled A, B, C, D.
- After the options, output an answer line exactly like this: Answer: X
- When evaluating a user answer, start with either "Correct" or "Incorrect".
- Then give a two-sentence explanation.
- If incorrect, end with a single-line JSON block: {"weak_area": "specific subtopic", "difficulty": 1-10}
- Keep explanations concise and actionable."""

def evaluate_answer(question_text, user_answer):
    user_message = (
        f"Question:\n{question_text}\n\n"
        f"User selected: {user_answer}\n\n"
        f"Evaluate the answer. Follow the output rules in your system prompt."
    )
    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    q = "Which consistency model guarantees that if one process reads a value, any subsequent read by that process will never return an older value?\n\nA. Linearizability\nB. Sequential consistency\nC. Read-your-writes consistency\nD. Monotonic reads\n\nAnswer: D"
    print(evaluate_answer(q, "A"))
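
The evaluation text is meant to be machine-readable: a verdict word first, and a single-line JSON block last when the answer is wrong. A minimal sketch of extracting both, assuming the model follows those rules; `parse_evaluation` is a hypothetical helper, and it tolerates invalid JSON by leaving the weak area unset:

```python
import json

def parse_evaluation(text):
    """Return (is_correct, weak_area) from a tutor evaluation.

    Assumes the reply starts with "Correct" or "Incorrect" and that any
    JSON block sits on a single line, as the system prompt requests.
    """
    is_correct = text.strip().lower().startswith("correct")
    weak_area = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            try:
                weak_area = json.loads(line).get("weak_area")
            except json.JSONDecodeError:
                pass  # Model produced invalid JSON; keep weak_area as None.
    return is_correct, weak_area
```

Checking the start of the reply, rather than searching for "Incorrect" anywhere, avoids misreading an explanation that happens to contain that word.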

5. Generate targeted review hints

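When the user gets a question wrong, logging it is not enough. I ask deepseek-v3.2 for a one-sentence review hint focused on the exact weak area, so each logged weak area can be paired with something concrete to study. The session loop in step 6 does not call this helper; run it on any weak area recorded in the progress file.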

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

def generate_hint(weak_area):
    user_message = f"Give one concise study hint for the weak area: {weak_area}. One sentence only."
    response = client.chat.completions.create(
        model="deepseek-v3.2",
        messages=[
            {"role": "system", "content": "You are a concise engineering tutor."},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_hint("consistency models"))
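
To decide which weak area deserves a hint, one option is to pick the area with the highest error rate. This is a sketch that assumes the progress data has the shape the session loop in step 6 writes, `{"area": {"attempts": int, "errors": int}}`; `weakest_area` is a hypothetical helper:

```python
def weakest_area(progress):
    """Return the area with the highest error rate, or None if there is no data."""
    rated = {
        area: stats["errors"] / stats["attempts"]
        for area, stats in progress.items()
        if stats["attempts"] > 0  # Skip empty entries to avoid dividing by zero.
    }
    if not rated:
        return None
    return max(rated, key=rated.get)
```

The returned name can be passed straight to `generate_hint`, after checking that it is not None.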

6. Build the study session loop

Now I wire everything together. The loop picks difficulty based on error rate, asks a question, reads keyboard input, evaluates, and updates a local JSON progress file.

import json
import os
import re
from pathlib import Path
from openai import OpenAI

# Read the key exported in the "Run it" step instead of hardcoding it.
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=os.environ["OXLO_API_KEY"])

TOPIC = "Distributed Systems"
PROGRESS_FILE = Path("progress.json")

SYSTEM_PROMPT = """You are an adaptive test-prep tutor. Your job is to help the user master a specific topic through targeted practice.

Rules:
- Generate exactly one multiple-choice question per call unless asked to evaluate.
- Provide four options labeled A, B, C, D.
- After the options, output an answer line exactly like this: Answer: X
- When evaluating a user answer, start with either "Correct" or "Incorrect".
- Then give a two-sentence explanation.
- If incorrect, end with a single-line JSON block: {"weak_area": "specific subtopic", "difficulty": 1-10}
- Keep explanations concise and actionable."""

def load_progress():
    if PROGRESS_FILE.exists():
        with open(PROGRESS_FILE, "r") as f:
            return json.load(f)
    return {}

def save_progress(progress):
    with open(PROGRESS_FILE, "w") as f:
        json.dump(progress, f, indent=2)

def get_difficulty():
    progress = load_progress()
    if not progress:
        return 5  # No history yet: start in the middle of the scale.
    total_err = sum(v["errors"] for v in progress.values())
    total_att = sum(v["attempts"] for v in progress.values())
    rate = total_err / total_att if total_att else 0
    # Map the error rate onto the scale: 0% errors gives 10, 100% errors gives 5.
    return max(1, min(10, int(5 + (1 - rate) * 5)))

def generate_question(topic, difficulty):
    user_message = f"Topic: {topic}. Difficulty: {difficulty}/10. Generate one multiple-choice question."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

def evaluate_answer(question_text, user_answer):
    user_message = (
        f"Question:\n{question_text}\n\n"
        f"User selected: {user_answer}\n\n"
        f"Evaluate the answer. Follow the output rules in your system prompt."
    )
    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

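def update_progress(evaluation_text):
    # The system prompt asks evaluations to start with "Correct" or "Incorrect".
    is_wrong = evaluation_text.strip().lower().startswith("incorrect")
    area = "general"
    # The JSON block is only expected after incorrect answers.
    match = re.search(r'\{.*?\}', evaluation_text, re.DOTALL)
    if match:
        try:
            area = json.loads(match.group()).get("weak_area", "general")
        except json.JSONDecodeError:
            pass  # Malformed JSON from the model: fall back to "general".
    progress = load_progress()
    entry = progress.setdefault(area, {"attempts": 0, "errors": 0})
    # Count correct answers too; otherwise the error rate can never drop.
    entry["attempts"] += 1
    if is_wrong:
        entry["errors"] += 1
    save_progress(progress)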

def main():
    print(f"Starting study session for: {TOPIC}")
    while True:
        diff = get_difficulty()
        print(f"\nDifficulty level: {diff}")
        q = generate_question(TOPIC, diff)
        print("\n" + q + "\n")
        ans = input("Your answer (A/B/C/D or q to quit): ").strip().upper()
        if ans == "Q":
            break
        feedback = evaluate_answer(q, ans)
        print("\n" + feedback + "\n")
        update_progress(feedback)
        print("Progress saved.")

if __name__ == "__main__":
    main()
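
One thing to watch: the loop prints the model's full response, which includes the `Answer: X` line, so the learner sees the key before answering. A minimal sketch of one way to hide it, assuming the key always appears on its own line in that exact form; `split_answer_key` is a hypothetical helper:

```python
import re

ANSWER_LINE = re.compile(r"^[ \t]*Answer:[ \t]*([A-D])[ \t]*$", re.MULTILINE)

def split_answer_key(question_text):
    """Return (visible_text, answer_key) with the "Answer: X" line removed.

    answer_key is None if no such line is found.
    """
    match = ANSWER_LINE.search(question_text)
    key = match.group(1) if match else None
    visible = ANSWER_LINE.sub("", question_text).strip()
    return visible, key
```

In `main()`, printing the visible text while still passing the full question to `evaluate_answer` would keep the grading call unchanged.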

Run it

Save the full script as tutor.py, export your key, and run:

export OXLO_API_KEY="sk-..."
python tutor.py

Example session output:

Starting study session for: Distributed Systems

Difficulty level: 5

Which consistency model guarantees that if one process reads a value, any subsequent read by that process will never return an older value?

A. Linearizability
B. Sequential consistency
C. Read-your-writes consistency
D. Monotonic reads

Answer: D

Your answer (A/B/C/D or q to quit): A

Incorrect. Monotonic reads ensure a process never sees older values after having seen a newer one, while linearizability is a stronger real-time guarantee. Review session consistency models to see how they form a hierarchy.

{"weak_area": "consistency models", "difficulty": 6}

Progress saved.

Next steps

Replace the JSON file with a SQLite database so you can track per-user history and compute real spaced-repetition intervals. Alternatively, add a second Oxlo.ai call using deepseek-v3.2 to generate a targeted follow-up reading list from the weak areas collected during each session.
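
As a starting point for the SQLite idea, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and helper functions are all assumptions made for illustration, and spaced-repetition scheduling is left out:

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the progress database and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS attempts (
               user_id TEXT NOT NULL,
               area TEXT NOT NULL,
               correct INTEGER NOT NULL,
               ts TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn

def record_attempt(conn, user_id, area, correct):
    """Store one answered question."""
    conn.execute(
        "INSERT INTO attempts (user_id, area, correct) VALUES (?, ?, ?)",
        (user_id, area, int(correct)),
    )
    conn.commit()

def error_rates(conn, user_id):
    """Return {area: error_rate} for one user."""
    rows = conn.execute(
        "SELECT area, AVG(1 - correct) FROM attempts WHERE user_id = ? GROUP BY area",
        (user_id,),
    )
    return dict(rows.fetchall())
```

Storing one row per attempt, rather than running totals, keeps the timestamps that an interval calculation would later need.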
