Demystifying LLMs: A Beginner's Guide


We are going to build a command-line tutor that uses a large language model to explain complex AI topics in plain English. If you are new to LLMs, this project will show you exactly how a system prompt and a few lines of Python shape the answers you get back. By the end you will have a working REPL (an interactive read-and-respond loop) that keeps conversation context and lets you compare models on Oxlo.ai.

What you'll need

An Oxlo.ai API key from https://portal.oxlo.ai. The free tier includes 60 requests per day and a 7-day full-access trial, which is plenty for this project.
Python 3.10 or newer.
The OpenAI SDK: pip install openai.

Step 1: Make your first call

Before we build the tutor, let us verify that the Oxlo.ai endpoint and your API key are working. The following script sends a single question to Llama 3.3 70B and prints the reply.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "What is a large language model?"},
    ],
)

print(response.choices[0].message.content)

Run the script. If you see a coherent paragraph explaining LLMs, your setup is ready.
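
The script above pastes the key straight into the source file, which is easy to leak if you share the code. A minimal sketch of one alternative is to read the key from an environment variable. The variable name OXLO_API_KEY and the helper load_api_key are assumptions made for this example, not names defined by Oxlo.ai.

```python
import os


def load_api_key(var_name: str = "OXLO_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The default variable name is a hypothetical choice for this tutorial.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the script."
        )
    return key
```

With that helper in place, the client line can become client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=load_api_key()).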

Step 2: Define the tutor's personality

The system prompt is the first message in the conversation, and LLMs weigh it heavily when deciding how to answer. We will use it to give our tutor a specific voice: patient, concise, and focused on analogies. Store the prompt in a constant so it is easy to tweak later.

SYSTEM_PROMPT = """You are a patient technical tutor explaining AI concepts to a beginner.
Follow these rules:
1. Answer in two short paragraphs or less.
2. Use an analogy from everyday life.
3. Avoid hype or vague buzzwords.
4. If the user asks a follow-up, build on your previous explanation."""

Now we wire that prompt into the request. Notice how the model stays in character compared with the generic answer in Step 1.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient technical tutor explaining AI concepts to a beginner.
Follow these rules:
1. Answer in two short paragraphs or less.
2. Use an analogy from everyday life.
3. Avoid hype or vague buzzwords.
4. If the user asks a follow-up, build on your previous explanation."""

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What is a large language model?"},
    ],
)

print(response.choices[0].message.content)

Step 3: Add conversation memory

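Real tutors remember what you asked two minutes ago. The model does not: each request is independent, and the model only sees the messages included in that request. We will therefore store the message history in a list, append each new exchange, and send the whole list every time so the model can reference earlier context.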

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient technical tutor explaining AI concepts to a beginner.
Follow these rules:
1. Answer in two short paragraphs or less.
2. Use an analogy from everyday life.
3. Avoid hype or vague buzzwords.
4. If the user asks a follow-up, build on your previous explanation."""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What is a large language model?"},
]

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=messages,
)

answer = response.choices[0].message.content
print(answer)

# Append the exchange so the next question has context
messages.append({"role": "assistant", "content": answer})
messages.append({"role": "user", "content": "How is it different from a search engine?"})

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=messages,
)

print(response.choices[0].message.content)
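
Because the whole list is sent with every request, a long session keeps growing the payload. One possible way to bound it, sketched here under assumptions, is to keep the system prompt plus only the most recent messages. The helper trim_history and its default limit are hypothetical choices for illustration, not part of the OpenAI SDK or Oxlo.ai.

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep every system message plus the most recent non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # rest[-0:] would return the whole list, so handle zero explicitly
    recent = rest[-max_messages:] if max_messages > 0 else []
    # Drop a leading assistant reply whose question was cut off by the trim
    while recent and recent[0]["role"] == "assistant":
        recent = recent[1:]
    return system + recent
```

Passing messages=trim_history(messages) to the create call would send the shortened list while leaving the full history untouched. The trade-off is that the tutor can no longer refer to anything that was trimmed away.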

Step 4: Build the interactive loop

Hard-coding questions is not practical. Let us wrap the logic in a small REPL that keeps the history alive until the user types exit or quit.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient technical tutor explaining AI concepts to a beginner.
Follow these rules:
1. Answer in two short paragraphs or less.
2. Use an analogy from everyday life.
3. Avoid hype or vague buzzwords.
4. If the user asks a follow-up, build on your previous explanation."""

messages = [{"role": "system", "content": SYSTEM_PROMPT}]

print("AI Tutor is ready. Ask anything about LLMs, or type 'exit' to quit.")

while True:
    user_input = input("\nYou: ").strip()
    if user_input.lower() in {"exit", "quit"}:
        break
    if not user_input:
        # Skip blank lines instead of spending a request on an empty message
        continue

    messages.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=messages,
    )

    reply = response.choices[0].message.content
    print(f"Tutor: {reply}")

    messages.append({"role": "assistant", "content": reply})
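
If a request fails inside the loop (a network error or an exhausted quota, for example), the question has already been appended to the history without an answer. The sketch below shows one way to keep the history consistent by moving the exchange into a function. The name ask is an assumption made for this example, and catching every exception is a deliberate simplification.

```python
def ask(client, messages: list[dict], question: str,
        model: str = "llama-3.3-70b") -> str:
    """Send one question, record the exchange in messages, return the reply."""
    messages.append({"role": "user", "content": question})
    try:
        response = client.chat.completions.create(model=model, messages=messages)
    except Exception:
        # Roll back so a failed request leaves no unanswered question behind
        messages.pop()
        raise
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply
```

Inside the loop, the append, create, and append steps would then collapse to reply = ask(client, messages, user_input).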

Step 5: Swap models with one line

One advantage of Oxlo.ai is the breadth of models available under the same endpoint. To see how a reasoning-oriented model handles the same prompt, change the model string; nothing else in the script needs to change. The code below uses DeepSeek V3.2, which is strong at coding and reasoning and is available on the free tier.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient technical tutor explaining AI concepts to a beginner.
Follow these rules:
1. Answer in two short paragraphs or less.
2. Use an analogy from everyday life.
3. Avoid hype or vague buzzwords.
4. If the user asks a follow-up, build on your previous explanation."""

messages = [{"role": "system", "content": SYSTEM_PROMPT}]

print("AI Tutor is ready. Ask anything about LLMs, or type 'exit' to quit.")

while True:
    user_input = input("\nYou: ").strip()
    if user_input.lower() in {"exit", "quit"}:
        break
    if not user_input:
        # Skip blank lines instead of spending a request on an empty message
        continue

    messages.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="deepseek-v3.2",
        messages=messages,
    )

    reply = response.choices[0].message.content
    print(f"Tutor: {reply}")

    messages.append({"role": "assistant", "content": reply})
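
Editing the model string and rerunning works, but a side-by-side view can make differences easier to spot. As a rough sketch, the hypothetical helper compare_models below sends the same question to each model in a list and collects the replies. It assumes a client built as in the earlier steps and uses only the create call already shown.

```python
def compare_models(client, system_prompt: str, question: str,
                   models: list[str]) -> dict[str, str]:
    """Ask every model the same question and return {model name: reply}."""
    replies = {}
    for model in models:
        # A fresh message list per model keeps the comparison independent
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        replies[model] = response.choices[0].message.content
    return replies
```

For example, compare_models(client, SYSTEM_PROMPT, "What is a transformer?", ["llama-3.3-70b", "deepseek-v3.2"]) would return both answers keyed by model name. Each model in the list costs one request, which matters on a daily quota.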

Run it

Save the Step 4 script as tutor.py and run it. Here is a sample session with Llama 3.3 70B, starting with a question about transformers.

$ python tutor.py
AI Tutor is ready. Ask anything about LLMs, or type 'exit' to quit.

You: What is a transformer?
Tutor: Think of a transformer as a very fast librarian who can read every book in a library at once, not just one shelf at a time. Instead of reading word by word in order, it looks at every word in a sentence simultaneously and decides which other words are most important for understanding the current one. This "attention" mechanism is why modern LLMs can keep context over long paragraphs and translate languages so fluently.

You: Why is attention important?
Tutor: Attention is important because it lets the model focus on the right clues no matter how far apart they are in a sentence. In our librarian analogy, it is like being able to instantly pull three related books from different floors because you know they share a common idea. Without attention, the model would forget the beginning of a long paragraph by the time it reached the end.

You: exit

Wrap-up

You now have a working conversational tutor that demonstrates how system prompts and message history control an LLM. Two concrete next steps: add streaming responses by setting stream=True in the create call so the tutor types back in real time, or experiment with JSON mode on Oxlo.ai by requesting a structured schema for quiz questions instead of plain text.
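
As a starting point for the streaming idea, here is a minimal sketch of how the reply could be printed piece by piece. With stream=True the OpenAI SDK returns an iterator of chunks, and each chunk carries a fragment of text in choices[0].delta.content. The function name stream_reply is an assumption for this example, and the sketch takes for granted that the chosen model on Oxlo.ai supports streaming.

```python
def stream_reply(client, messages: list[dict],
                 model: str = "llama-3.3-70b") -> str:
    """Print the reply as it arrives and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks carry metadata only
        piece = chunk.choices[0].delta.content
        if piece:  # the final chunk usually has no text
            print(piece, end="", flush=True)
            parts.append(piece)
    print()
    return "".join(parts)
```

The returned string can be appended to messages as the assistant reply, exactly as in Step 4.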