DEV Community

Jun Yamog
Jun Yamog

Posted on

Fine-tuning LLAMA 3 for Text Classification with Limited Resources

I recently needed to classify sentences for a particular use case at work. Remembering Jeremy Howard's Lesson 4: Getting started with NLP for absolute beginners, I first adapted his notebook to fine-tune DEBERTA.

It worked, but not to my satisfaction, so I was curious what would happen if I used a LLM like LLAMA 3. The problem? Limited GPU resources. I only had access to a Tesla/Nvidia T4 instance.

Research led me to QLORA. This tutorial on Fine tuning LLama 3 LLM for Text Classification of Stock Sentiment using QLoRA was particularly useful. To better understand the tutorial, I adapted Lesson 4 into the QLORA tutorial notebook.

QLORA uses two main techniques:

  1. Quantization: Reduces model precision, making it smaller.
  2. LORA (Low-Rank Adaptation): Adds small, trainable layers instead of fine-tuning the whole model.

This allowed me to train LLAMA 3 8B on a 16GB VRAM T4, using about 12GB of VRAM. The results were surprisingly good, with prediction accuracy over 90%.

Confusion Matrix:
[[83  4]
[ 4  9]]
Classification Report:
              precision    recall  f1-score   support
         0.0       0.95      0.95      0.95        87
         1.0       0.69      0.69      0.69        13
    accuracy                           0.92       100
   macro avg       0.82      0.82      0.82       100
weighted avg       0.92      0.92      0.92       100
Balanced Accuracy Score: 0.8231653404067196
Accuracy Score: 0.92
Enter fullscreen mode Exit fullscreen mode

Here's the iPython notebook detailing the process.

This approach shows it's possible to work with large language models on limited hardware. Working with constraints often leads to creative problem-solving and learning opportunities. In this case, the limitations pushed me to explore and implement more efficient fine-tuning techniques.

Top comments (0)