
David Mezzetti for NeuML

Originally published at neuml.hashnode.dev

Distilling Knowledge into Tiny LLMs

Large Language Models (LLMs) are the magic behind AI. These massive billion- and trillion-parameter models have been shown to generalize well when trained on enough data.

A big problem is that these models are hard to run and expensive. As a result, many simply call LLMs through hosted APIs such as OpenAI or Claude. Additionally, developers often spend a lot of time building complex prompt logic, hoping to cover all the edge cases, and believe they need a model large enough to handle all the rules.

If you truly want control over your business processes, running a local model is a better choice. And the good news is that it doesn't have to be a giant, expensive multi-billion parameter model. We can fine-tune an LLM to handle our specific business logic, which helps us take control and limit prompt complexity.

This article will show how we can distill knowledge into tiny LLMs.

Install dependencies

Install txtai and all dependencies.

pip install "txtai[pipeline-train]" datasets

The LLM

We'll use a 600M-parameter Qwen3 model for this example. Our target task will be translating user requests into Linux commands.

from txtai import LLM

llm = LLM("Qwen/Qwen3-0.6B")

Let's try a request with the base model as-is.

llm("""
Translate the following request into a linux command. Only print the command.

Find number of logged in users
""", maxlength=1024)
ps -e

As we can see, the model has a reasonable grasp of the task and at least prints a command. But it's not correct: ps -e lists all running processes, not logged-in users. Let's get to fine-tuning!

Fine-tuning the LLM with knowledge

Yes, 600M parameters is small and we can't possibly expect it to do well at everything. But the good news is that we can distill knowledge into this tiny LLM and make it better. We'll use the mecha-org/linux-command-dataset Linux commands dataset from the Hugging Face Hub, along with the HFTrainer training pipeline from txtai.
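Before building prompts, it helps to peek at the data. Here's a minimal sketch; the input and output field names match the mapping code below.

from datasets import load_dataset

# Quick look at the dataset before building training prompts
dataset = load_dataset("mecha-org/linux-command-dataset", split="train")

# Each row pairs a natural language request (input) with a command (output)
print(len(dataset))
print(dataset[0]["input"], "->", dataset[0]["output"])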

First, we'll create the training dataset. We'll use the same prompt strategy from above.

"""
Translate the following request into a linux command. Only print the command.

{user request}
"""
from datasets import load_dataset
from transformers import AutoTokenizer

# LLM path
path = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(path)

# Load the training dataset
dataset = load_dataset("mecha-org/linux-command-dataset", split="train")

def prompt(row):
    # Format each row with the model's chat template
    # enable_thinking=False disables Qwen3's thinking mode in the template
    text = tokenizer.apply_chat_template([
        {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
        {"role": "user", "content": row["input"]},
        {"role": "assistant", "content": row["output"]}
    ], tokenize=False, enable_thinking=False)

    return {"text": text}

# Map to training prompts
train = dataset.map(prompt, remove_columns=["input", "output"])
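Before kicking off training, it's worth printing one formatted record to confirm the chat template rendered as expected.

# Eyeball the first formatted training prompt
print(train[0]["text"])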
from txtai.pipeline import HFTrainer

# Load the training pipeline
trainer = HFTrainer()

# Train the model
# Set output_dir to save the model to disk; this example trains in memory
model = trainer(
    "Qwen/Qwen3-0.6B",
    train,
    task="language-generation",
    maxlength=512,
    bf16=True,
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=50,
)
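The call above keeps the trained model in memory. To keep it around, pass output_dir as noted in the comment. A sketch, assuming a hypothetical local path of qwen3-linux-commands:

# Same training call, but persists the fine-tuned model to disk
# "qwen3-linux-commands" is a hypothetical local output path
model = trainer(
    "Qwen/Qwen3-0.6B",
    train,
    task="language-generation",
    maxlength=512,
    bf16=True,
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=50,
    output_dir="qwen3-linux-commands",
)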
from txtai import LLM

llm = LLM(model)

llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "Find number of logged in users"}
])
who | wc -l
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "List the files in my home directory"}
])
ls ~/
llm([
    {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
    {"role": "user", "content": "Zip the data directory with all it's contents"}
])
zip -r data.zip data

It even works well without the system prompt.

llm("Calculate the total amount of disk space used for my home directory. Only print the total.")
du -sh ~
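To spot-check the fine-tuned model beyond one-off prompts, a small loop over sample requests works well. The requests below are made up for this sketch.

# Run a batch of spot checks against the fine-tuned model
requests = [
    "Show the 10 largest files in the current directory",
    "Count the number of lines in app.log",
    "Find all python files modified in the last day"
]

for request in requests:
    print(request, "->", llm([
        {"role": "system", "content": "Translate the following request into a linux command. Only print the command."},
        {"role": "user", "content": request}
    ]))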

Wrapping up

This article demonstrated how straightforward it is to distill knowledge into LLMs with txtai. Don't always reach for the giant LLM. Spend a little time fine-tuning a tiny LLM instead; it is well worth it!
