Shrestha Pandey

Posted on Jun 7

I Fine-Tuned Llama 3.2 on My Own Writing Style Using LoRA, Unsloth, and a Free Colab GPU

#vickybytes #ollama #ai #llm

For a long time, fine-tuning language models felt like something many people talked about but very few actually tried themselves.

Whenever I saw posts about fine-tuning, they mentioned massive datasets, GPU setups, and much more, which made it feel out of reach. Still, it went onto my list of things to explore.

Recently, while browsing Creator Labs on VickyBytes, I came across a lab that sounded simple yet practical for someone like me who wants to explore: take a small language model and fine-tune it on your own writing style. The main question was: could a model learn to write more like me?

As someone who spends a lot of time creating technical content, that question caught my attention.

I assumed fine-tuning still needed expensive hardware, large datasets, and a significant amount of machine learning knowledge. Instead, I found that a Google Colab notebook, LoRA, and a small dataset built from my own content lowered the barrier considerably.

This article documents the entire process from start to finish, including environment setup, dataset preparation, LoRA fine-tuning, evaluation, GGUF conversion, and the lessons I learned along the way.

What Does "Training on Your Writing Style" Actually Mean?

A language model doesn’t understand who you are. It learns the statistical patterns present in your writing.

For example:

Do you prefer short or long paragraphs?
Do you use analogies?
Do you use emojis in your writing?
Do you end posts with questions?
Do you write formally or conversationally?

When enough examples are provided, the model begins reproducing those patterns.

Thus, fine-tuning on writing style is less about teaching a model who you are and more about teaching it how you tend to communicate.
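To make "statistical patterns" a little more concrete, here is a minimal sketch that measures a few of the surface habits listed above for a single post. The function name `style_features`, the sample text, and the emoji ranges are my own illustrative assumptions, and these crude counts are far simpler than anything the model actually learns.

```python
import re

# Rough emoji check (an assumption): matches characters in a few common
# emoji code point ranges, not every possible emoji.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def style_features(text: str) -> dict:
    """Compute a few crude, surface-level style statistics for one post."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    words_per_paragraph = [len(p.split()) for p in paragraphs]
    average = sum(words_per_paragraph) / len(paragraphs) if paragraphs else 0.0
    return {
        "paragraphs": len(paragraphs),
        "avg_words_per_paragraph": average,
        "emoji_count": len(EMOJI_PATTERN.findall(text)),
        "ends_with_question": text.rstrip().endswith("?"),
    }


# Hypothetical sample post, used only to exercise the function.
sample = "Docker changed how we ship software.\n\nHave you tried it yet?"
features = style_features(sample)
print(features)
```

Running this over every post in a dataset gives a quick picture of the habits the model is likely to pick up.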

Architecture Overview

Choosing a Model

The lab suggested working with models between 1B and 7B parameters. Initially, I considered using one of the newer Qwen models. However, after exploring the available Unsloth notebooks, I decided to use:

Llama 3.2 3B Instruct

Three reasons for this decision were:

First, the model was small enough to fine-tune comfortably on free Colab resources. Second, Unsloth provides a mature training notebook for Llama models. Third, the resulting model can easily be exported to GGUF and run locally through Ollama.

At this point, I wanted a model that could actually learn from a relatively small dataset and allow me to complete the entire workflow on consumer-grade hardware.

Setting Up the Environment

I wanted the entire project to work on free resources. So instead of renting a GPU or using paid cloud infrastructure, I used Google Colab because it provides access to a free NVIDIA T4 GPU, which is sufficient for LoRA fine-tuning small language models such as Llama 3.2 3B.

The first step was enabling GPU acceleration.

From the Colab runtime settings, I selected the T4 GPU as the runtime type.

Once the runtime was ready, I opened the official Unsloth notebook for Llama 3.2 and executed the setup cells.

Unsloth handles much of the optimization automatically, which means there is very little configuration required from the user.

Building and Preparing the Dataset

This was the most important step in the entire project.

Most of us focus on discussing models rather than the data. In my experience, building the dataset was the more challenging part, because the results depend directly on the data we provide.

Since the goal was to teach the model my writing style, I built the dataset using content I had already written over time, including LinkedIn posts, Instagram captions, technical explanations, and educational content. I focused on examples that showed how I naturally write and explain technical concepts.

One challenge I encountered was preparing the data in the format expected by the training pipeline. So, I converted each example into an instruction-response pair.

Example:

{
  "instruction":"Write a LinkedIn post about Kubernetes",
  "response":"Kubernetes is one of those technologies..."
}

The notebook expected a JSONL dataset, where each example is stored as a separate JSON object. Initially, I spent some time on dataset loading errors before realizing the issue was with the structure of the data. To simplify, I stored the examples as a Python string inside the notebook and generated the dataset.jsonl file before loading it.
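As a rough illustration of that step, the sketch below writes a list of instruction-response pairs to dataset.jsonl and reads it back with a basic structure check. The helper names `write_jsonl` and `read_jsonl` and the placeholder examples are assumptions for illustration, not the exact code from my notebook.

```python
import json

# Placeholder examples (assumed); the real dataset used my own posts.
examples = [
    {"instruction": "Write a LinkedIn post about Kubernetes",
     "response": "Kubernetes is one of those technologies..."},
    {"instruction": "Write a LinkedIn post about Docker",
     "response": "placeholder response text"},
]


def write_jsonl(rows, path):
    """Write one JSON object per line, which is what JSONL means."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path, required_keys=("instruction", "response")):
    """Read a JSONL file and fail early on malformed or incomplete lines."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            row = json.loads(line)  # raises json.JSONDecodeError on bad JSON
            missing = [key for key in required_keys if key not in row]
            if missing:
                raise ValueError(f"line {line_number} is missing keys: {missing}")
            rows.append(row)
    return rows


write_jsonl(examples, "dataset.jsonl")
loaded = read_jsonl("dataset.jsonl")
print("Rows:", len(loaded))
```

Checking the file with plain Python before handing it to the training pipeline makes structural problems show up with a line number instead of a vague loading error.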

Once the dataset was formatted correctly, it loaded successfully and became the foundation for the rest of the fine-tuning process. It looked simple but it was one of the most important parts of the entire project because the model can only learn the patterns that exist in the data it receives.

Loading the Dataset

After correcting the JSONL structure, loading the dataset became simple.

from datasets import load_dataset

# Load the JSONL file as a single "train" split
dataset = load_dataset(
    "json",
    data_files="dataset.jsonl",
    split="train",
)

To verify everything loaded correctly:

print("Rows:",len(dataset))

Preparing the Dataset for Llama 3.2

The raw dataset still wasn’t ready for training because the Llama 3.2 Instruct model expects conversational data rather than bare instruction-response pairs.

To solve this, I used Unsloth’s chat template utilities.

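from unsloth.chat_templates import get_chat_template

# Attach the Llama 3.1 chat template to the tokenizer loaded earlier in the notebook
tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3.1",
)

This attaches the llama-3.1 chat template to the tokenizer. Applying that template to each example renders the dataset in the same format used by Llama during instruction tuning.

A single training example now looks something like this (simplified; the real template also includes tokens such as <|end_header_id|> and <|eot_id|>):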

<|start_header_id|>user
Write a LinkedIn post about Docker

<|start_header_id|>assistant
Docker revolutionized...

At this stage, the model finally has data in a format it understands.
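To show what that conversion involves, here is a hand-rolled sketch that turns one instruction-response pair into chat messages and renders them with the Llama 3 special tokens. The function names `to_messages` and `render_llama3` are hypothetical, and this is not Unsloth's implementation: in practice the tokenizer's chat template does the rendering, and the real template may add extra content such as a default system block.

```python
def to_messages(example: dict) -> list:
    """Map an instruction-response pair to a two-turn conversation."""
    return [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]


def render_llama3(messages: list) -> str:
    """Render messages with Llama 3 header and end-of-turn tokens."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    return text


example = {
    "instruction": "Write a LinkedIn post about Docker",
    "response": "Docker revolutionized...",
}
rendered = render_llama3(to_messages(example))
print(rendered)
```

The point of the sketch is only that every turn is wrapped in a role header and closed with an end-of-turn token, so the model can tell the prompt apart from the response it should learn to produce.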

Fine-Tuning with Unsloth

Once the dataset was completely ready, the training process became surprisingly easy.

Using the SFTTrainer set up in the Unsloth notebook, I configured LoRA training and launched the run.

trainer.train()

During training, the model repeatedly sees examples from the dataset and adjusts the LoRA adapter weights to better reproduce the expected responses. The original Llama weights remain frozen throughout the process.
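The idea of frozen weights plus a small trainable adapter can be sketched in plain Python. Everything below is an illustrative assumption rather than the notebook's code: the tiny matrices, the rank `r`, the scaling factor `alpha`, and the 3072 x 3072 layer size used for the parameter count are all made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, W, A, B, alpha, r):
    """Compute x W + (alpha / r) * (x A) B; W is frozen, only A and B are trained."""
    base = matmul(x, W)
    update = matmul(matmul(x, A), B)
    scale = alpha / r
    return [[b + scale * u for b, u in zip(rb, ru)] for rb, ru in zip(base, update)]


def param_counts(d_in, d_out, r):
    """Return (full fine-tuning parameters, LoRA adapter parameters) for one layer."""
    return d_in * d_out, r * (d_in + d_out)


x = [[1.0, 2.0, 3.0]]                       # one input row, d_in = 3
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # frozen base weights, 3 x 2
A = [[0.5], [0.5], [0.5]]                   # adapter down-projection, 3 x 1
B_init = [[0.0, 0.0]]                       # adapter up-projection starts at zero
B_trained = [[1.0, -1.0]]                   # pretend values after some training

before = lora_forward(x, W, A, B_init, alpha=2, r=1)
after = lora_forward(x, W, A, B_trained, alpha=2, r=1)
full, adapter = param_counts(3072, 3072, 16)
print(before, after, full, adapter)
```

Because one adapter matrix starts at zero, the adapted model initially behaves exactly like the base model, and training only moves the small A and B matrices. For the hypothetical layer above, that is 98,304 trainable values instead of 9,437,184.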

Did the Model Actually Learn My Writing Style?

Now, after training, the main question was whether the model learned anything useful. To test this, I intentionally used prompts that were not included in the dataset.

Examples included:

Prompt:
Explain LLMs in short

Generated Output:
Most AI today runs on a small change. Instead of building LLM itself, you give an AI model large amounts of text data and tell it to only remember things it decides on its own. This creates a system where the AI learns by responding to other people’s queries, and where it has to think for itself in order to provide an accurate answer. That’s a major pattern behind the AI today. Instead of fine-tuning a model that thinks like a dictionary and a research paper in one, you fine-tune a model that thinks like a person. And instead of giving it years of training data, you give it hundreds of thousands of your own pages that it will decide what to keep in memory on its own. That is the system most AI use today, and it has worked well for a lot of things.

The outputs were not identical to my writing. Identical outputs would actually be a bad sign, because they would suggest memorization. The generated posts were more conversational and more focused on helping readers understand concepts rather than simply describing them.

That was the behavior I hoped to achieve.

Avoiding Overfitting

One challenge with small datasets is overfitting.

With only twenty examples, there is always a risk that the model memorizes responses instead of learning general writing patterns.

To reduce that risk, I intentionally included multiple content formats and multiple technical topics. Using diverse topics helped push the model toward learning style patterns instead of topic-specific answers.
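One rough way to check for memorization, sketched below, is to measure how many word n-grams in a generated post also appear verbatim in the training responses. This check is my own illustrative addition rather than something from the lab; the function names, the choice of `n`, and the placeholder sentences are all assumptions.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(generated, training_texts, n=3):
    """Fraction of the generated text's n-grams that occur in any training text."""
    generated_ngrams = ngrams(generated, n)
    if not generated_ngrams:
        return 0.0
    seen = set()
    for text in training_texts:
        seen |= ngrams(text, n)
    return len(generated_ngrams & seen) / len(generated_ngrams)


# Placeholder sentences standing in for real training responses and outputs.
training = ["the quick brown fox jumps over the lazy dog"]
memorized = copied_fraction("the quick brown fox jumps over the lazy dog", training)
novel = copied_fraction("a slow green turtle walks under one busy bridge", training)
print(memorized, novel)
```

A value close to 1.0 suggests the model is replaying training text, while a low value on unseen prompts is at least consistent with it learning style rather than specific answers.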

Exporting the Model to GGUF

After training completed, I expected to see a complete model. But LoRA only trains adapters. Those adapters must be merged with the original model before deployment.

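The workflow is: train the LoRA adapters, merge them into the base model weights, quantize the merged model, and export it as a GGUF file.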

Using Unsloth’s export utilities, I generated a GGUF version of the model using Q4_K_M quantization. This format is widely used by tools such as Ollama and llama.cpp.

Running the Model Locally with Ollama

One of the coolest parts was realizing that the model no longer needed Colab. Once exported as GGUF, it could run locally.

Using Ollama, the fine-tuned model can be loaded directly from a laptop without relying on external APIs.

This means the writing-style model becomes self-hosted. The same personalized behavior learned during fine-tuning can now be accessed locally whenever needed.
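As a sketch of what loading the model into Ollama can look like, the snippet below generates a minimal Modelfile that points at the exported GGUF file. The file name `model-q4_k_m.gguf`, the model name `my-writing-style`, the system prompt, and the temperature are placeholders I chose for illustration, and depending on the export a chat template entry may also be needed.

```python
def build_modelfile(gguf_path: str, system_prompt: str = "", temperature: float = 0.7) -> str:
    """Build the text of a minimal Ollama Modelfile for a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        f"PARAMETER temperature {temperature}",
    ]
    if system_prompt:
        lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


# Placeholder path and prompt (assumptions, not the real export names).
modelfile = build_modelfile(
    "./model-q4_k_m.gguf",
    system_prompt="You write technical posts in a conversational style.",
)

with open("Modelfile", "w", encoding="utf-8") as f:
    f.write(modelfile)

print(modelfile)

# Then, in a terminal on the machine where Ollama is installed:
#   ollama create my-writing-style -f Modelfile
#   ollama run my-writing-style "Write a LinkedIn post about Docker"
```

After `ollama create` registers the model, every later `ollama run` call works offline against the local GGUF file.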

Ethical Considerations

Since this project involved training on personal content, it’s worth discussing privacy. I only used content that I personally wrote and was comfortable using for experimentation.

I avoided using private messages, confidential conversations, or any data that could create privacy concerns.

The ability to train models on personal data is powerful, but consent and security should always be considered before building such AI systems.

Lessons Learned

While working on this project, I learnt a few things.

The most challenging parts were:

Building a useful dataset
Correctly formatting JSONL files
Understanding chat templates
Understanding LoRA adapters
Evaluating whether the model genuinely learned anything

The actual training process ended up being the easiest step. Modern tools such as Unsloth have simplified fine-tuning workflows, so the real bottleneck is data quality rather than hardware.

Final Thoughts

Before starting this project, fine-tuning felt like something made only for machine learning engineers.

After completing it, I think every developer should try it once.

Something that surprised me was how strongly the dataset influenced the final results. Even with a small collection of examples, the model began reproducing many of the patterns that consistently appear in my writing.

And after spending several days building, debugging, training, evaluating, and exporting the model, I’m convinced that the quality of the examples matters a lot.

Would I use this model for real content creation? Probably not yet. The dataset is still too small, and the outputs occasionally differ from my writing style.

However, the experiment successfully demonstrated that modern fine-tuning tools have lowered the barrier significantly. Something that once felt like a machine learning research project can now be completed on a free Colab GPU within a week.

Resources

Fine-Tuning Framework: Unsloth
Model Notebooks: Unsloth Notebook
Running Models Locally: Ollama
GitHub Repository: Fine-tuning-LLM-to-write-in-your-own-style
Lab Inspiration: VickyBytes Creator Labs

DEV Community