
David Mezzetti for NeuML

Originally published at neuml.hashnode.dev

Distilling Knowledge into Tiny LLMs

Large Language Models (LLMs) are the magic behind AI. These massive billion- and trillion-parameter models have been shown to generalize well when trained on enough data.

A big problem is that these models are hard to run and expensive. As a result, many simply call LLMs through hosted APIs such as OpenAI or Claude. Additionally, developers often spend a lot of time building complex prompt logic, hoping to cover all the edge cases, and believe they need a model large enough to handle all the rules.

If you truly want control over your business processes, running a local model is a better choice. And the good news is that it doesn't have to be a giant, expensive multi-billion parameter model. We can fine-tune an LLM to handle our specific business logic, which helps us take control and limit prompt complexity.

This article will show how we can distill knowledge into tiny LLMs.

Install dependencies

Install txtai and all dependencies.

pip install "txtai[pipeline-train]" datasets

The LLM

We'll use a 600M-parameter Qwen3 model for this example. Our target task will be translating user requests into Linux commands.

from txtai import LLM

llm = LLM("Qwen/Qwen3-0.6B")

Let's try a request with the base model as-is.

llm("""
Translate the following request into a linux command. Only print the command.

Find number of logged in users
""", maxlength=1024)
ps -e

As we can see, the model has a reasonable grasp of the task and at least prints a command. But it's not correct: ps -e lists all running processes, not logged-in users. Let's get to fine-tuning!

Fine-tuning the LLM with knowledge

Yes, 600M parameters is small and we can't possibly expect it to do well at everything. But the good news is that we can distill knowledge into this tiny LLM and make it better. We'll use the mecha-org/linux-command-dataset Linux commands dataset from the Hugging Face Hub, along with the HFTrainer training pipeline from txtai.
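Before building prompts, it helps to peek at the data. Here's a minimal sketch; the input and output field names match the mapping code below.

from datasets import load_dataset

# Quick look at the dataset before building training prompts
dataset = load_dataset("mecha-org/linux-command-dataset", split="train")

# Each row pairs a natural language request (input) with a command (output)
print(len(dataset))
print(dataset[0]["input"], "->", dataset[0]["output"])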

First, we'll create the training dataset. We'll use the same prompt strategy from above.

"""
Translate the following request into a linux command. Only print the command.

{user request}
"""
from datasets import load_dataset
from transformers import AutoTokenizer

# LLM path
path = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(path)

# Load the training dataset
dataset = load_dataset("mecha-org/linux-command-dataset", split="train")

def prompt(row):
    # Format each row with the model's chat template
    # enable_thinking=False disables Qwen3's thinking mode in the template
    text = tokenizer.apply_chat_template([
        {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
        {"role": "user", "content": row["input"]},
        {"role": "assistant", "content": row["output"]}
    ], tokenize=False, enable_thinking=False)

    return {"text": text}

# Map to training prompts
train = dataset.map(prompt, remove_columns=["input", "output"])
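Before kicking off training, it's worth printing one formatted record to confirm the chat template rendered as expected.

# Eyeball the first formatted training prompt
print(train[0]["text"])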
from txtai.pipeline import HFTrainer

# Load the training pipeline
trainer = HFTrainer()

# Train the model
# Set output_dir to save the model to disk; this example trains in memory
model = trainer(
    "Qwen/Qwen3-0.6B",
    train,
    task="language-generation",
    maxlength=512,
    bf16=True,
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=50,
)
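The call above keeps the trained model in memory. To keep it around, pass output_dir as noted in the comment. A sketch, assuming a hypothetical local path of qwen3-linux-commands:

# Same training call, but persists the fine-tuned model to disk
# "qwen3-linux-commands" is a hypothetical local output path
model = trainer(
    "Qwen/Qwen3-0.6B",
    train,
    task="language-generation",
    maxlength=512,
    bf16=True,
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=50,
    output_dir="qwen3-linux-commands",
)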
from txtai import LLM

llm = LLM(model)

llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "Find number of logged in users"}
])
who | wc -l
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "List the files in my home directory"}
])
ls ~/
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "Zip the data directory with all it's contents"}
])
zip -r data.zip data

It even works well without the system prompt.

llm("Calculate the total amount of disk space used for my home directory. Only print the total.")
du -sh ~
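To spot-check the fine-tuned model beyond one-off prompts, a small loop over sample requests works well. The requests below are made up for this sketch.

# Run a batch of spot checks against the fine-tuned model
requests = [
    "Show the 10 largest files in the current directory",
    "Count the number of lines in app.log",
    "Find all python files modified in the last day"
]

for request in requests:
    print(request, "->", llm([
        {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
        {"role": "user", "content": request}
    ]))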

Wrapping up

This article demonstrated how straightforward it is to distill knowledge into LLMs with txtai. Don't always reach for the giant LLM. Spend a little time fine-tuning a tiny LLM instead; it is well worth it!
