Rajat Raina for KitOps

Posted with Gorkem Ercan • Originally published at jozu.com

Fine-tune your first large language model (LLM) with LoRA, llama.cpp, and KitOps in 5 easy steps

Getting started with LLMs can be intimidating. In this tutorial we will show you how to fine-tune a large language model using LoRA, facilitated by tools like llama.cpp and KitOps.

LoRA, or Low-Rank Adaptation, is a technique for efficiently adapting pre-trained models with minimal computational overhead. We will walk you through the setup, adaptation process, and deployment using easy-to-follow steps.
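Under the hood, LoRA freezes the pre-trained weight matrices and learns a small low-rank update instead: the adapted weights become W' = W + B·A, where B and A are narrow matrices of rank r, with r much smaller than the dimensions of W. Because only B and A are trained, fine-tuning needs a fraction of the memory and compute of a full update, and the result ships as a compact adapter file alongside the base model.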

0. Prerequisites

Before we get started, you’ll need to install the Kit CLI and download the finetune executable from the llama.cpp project.

Kit CLI installed on your system – Installation Guide
finetune executable from the llama.cpp project – Download here
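Once both are in place, a quick sanity check confirms they’re reachable from your shell. This is a minimal sketch; the exact name and location of the finetune binary depend on the llama.cpp release you downloaded:

# Confirm the Kit CLI is on your PATH
kit version

# Confirm the finetune executable runs (binary name may vary by platform/release)
llama.cpp\finetune.exe --help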

1. Environment setup

In step 1, we will set up your environment.

Create a project directory:
mkdir /lora_finetuning

Log in to the GitHub Container Registry using the Kit CLI. This will require your GitHub username and a personal access token. Alternatively, you can use any other registry that is compatible with OCI artifacts:

kit login ghcr.io
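If you’d rather not type credentials interactively, you may be able to pipe in a GitHub personal access token, assuming your version of kit login supports a Docker-style --password-stdin flag (verify with kit login --help):

# Hypothetical non-interactive login; --password-stdin is an assumption, check `kit login --help`
echo $GITHUB_TOKEN | kit login ghcr.io --username <your-github-username> --password-stdin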

2. Create your Kitfile and refer to your base model

In your project directory (/lora_finetuning), create a Kitfile with the following contents:

manifestVersion: "1.0"
package:
  name: llama3 fine-tuned
  version: 3.0.0
  authors: ["Jozu AI"]
model:
  name: llama3-8B-instruct-q4_0
  path: ghcr.io/jozu-ai/llama3:8B-instruct-q4_0
  description: Llama 3 8B instruct model
  license: Apache 2.0

Note: Notice how the path refers to another ModelKit? In this case, it's ghcr.io/jozu-ai/llama3:8B-instruct-q4_0.

Let's create our initial ModelKit by invoking:

kit pack /lora_finetuning -t fine-tuning:untuned

Now let's unpack our ModelKit to our work folder. Notice that the Kit CLI also unpacks the referenced model into your work folder so you can work with it locally.

kit unpack fine-tuning:untuned -d /lora_finetuning --overwrite
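After unpacking, your work folder should contain both the Kitfile and the referenced model weights, something like the following (exact file names depend on the ModelKit):

cd /lora_finetuning
ls
# Expected contents (illustrative):
#   Kitfile
#   llama3-8B-instruct-q4_0.gguf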

3. Create your LoRA adapter

It is recommended to use the same prompt format as the base model, but any text file can be used. Ideally, separate the examples with a common string that doesn't otherwise appear in your training data, e.g., '<s>':

<s> Example one text here.
<s> Example two text here.
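For instance, a hypothetical training-data.txt for an instruct-style model could look like this (the content is purely illustrative; use your own domain data and the base model's prompt format):

<s> ### Instruction: Summarize what a ModelKit is in one sentence.
### Response: A ModelKit is an OCI artifact that packages a model together with its code, data, and configuration.
<s> ### Instruction: What file describes a ModelKit's contents?
### Response: The Kitfile.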
Run the fine-tuning command from your project directory:
cd /lora_finetuning

llama.cpp\finetune.exe --model-base ./llama3-8B-instruct-q4_0.gguf --train-data ./training-data.txt --threads 8 --sample-start "<s>" --lora-out lora-adapter.gguf
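Before packaging, it's worth smoke-testing the adapter by loading it on top of the base model with llama.cpp's main executable, whose --lora flag applies a LoRA adapter at load time:

llama.cpp\main.exe -m ./llama3-8B-instruct-q4_0.gguf --lora ./lora-adapter.gguf -p "Write a one-line greeting." -n 64

Note that applying a LoRA adapter directly to a quantized base model can reduce quality; llama.cpp also offers a --lora-base flag to supply a higher-precision base for the merge.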

4. Update the Kitfile with your LoRA adapter

Next, we’re going to update the Kitfile to include the LoRA adapter:

manifestVersion: "1.0"
package:
  name: llama3 fine-tuned
  version: 3.0.0
  authors: ["Jozu AI"]
model:
  name: llama3-8B-instruct-q4_0
  path: ghcr.io/jozu-ai/llama3:8B-instruct-q4_0
  description: Llama 3 8B instruct model
  license: Apache 2.0
  parts:
    - path: ./lora-adapter.gguf
      type: lora-adapter
datasets:
  - name: fine-tune-data
    path: ./training-data.txt

As a best practice, we recommend including the training data in your ModelKits; it makes further iterations of training easier. However, it is not required for production deployments.

5. Pack and ship your adapter in a ModelKit

Package your tuned model:
kit pack /lora_finetuning -t fine-tuning:tuned

Tag your model for the remote repository:
kit tag fine-tuning:tuned ghcr.io/jozuuser/finetuned:latest

Push the ModelKit to the remote repository:
kit push ghcr.io/jozuuser/finetuned:latest
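Anyone with access to the repository can now pull and unpack the tuned ModelKit on another machine:

kit pull ghcr.io/jozuuser/finetuned:latest
kit unpack ghcr.io/jozuuser/finetuned:latest -d ./finetuned --overwrite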

Conclusion

Congratulations, your fine-tuned model is now packaged and ready for deployment. You've successfully updated the Kitfile, packed the model, and pushed it to a remote repository, making it accessible for implementation in various applications.

If you found this tutorial informative, we encourage you to join the KitOps Discord server and Star the KitOps GitHub repo to support the project.

Top comments (4)

Matija Sosic

Nice stuff! In which cases does it pay off to fine-tune an LLM vs. just using an off-the-shelf solution?

Brad Micklea

I'm sure the author will have a richer answer, but I think one of the main things is that fine-tuning can result in a model that's more specialized. Taking a general-purpose LLM and fine-tuning it with your company's internal data would, for example, allow it to outperform when answering customer questions that are specific to your product.

nida

Great question! Fine-tuning an LLM often pays off when you have specific requirements or niche data that off-the-shelf models don't handle well.

It allows you to tailor the model's behavior more closely to your specific needs, potentially leading to better performance on your tasks.

If general use cases are well-covered by existing models, sticking with those might be more cost-effective and less time-consuming.

Rajat Raina

Hey @matijasos, thanks for your query. Every domain or task has its own unique language patterns, terminologies, and contextual nuances, so if your LLM is not sufficiently pretrained on your domain or task, it's better to fine-tune a model than to use an off-the-shelf solution.