<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: koshin takeuchi</title>
    <description>The latest articles on DEV Community by koshin takeuchi (@koshin).</description>
    <link>https://dev.to/koshin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2006841%2F903a0cd7-fe8a-4f19-bca0-0d7a77645f55.jpeg</url>
      <title>DEV Community: koshin takeuchi</title>
      <link>https://dev.to/koshin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/koshin"/>
    <language>en</language>
    <item>
      <title>Optimizing LLM with Few Shot</title>
      <dc:creator>koshin takeuchi</dc:creator>
      <pubDate>Sat, 31 Aug 2024 10:45:48 +0000</pubDate>
      <link>https://dev.to/koshin/optimizing-llm-with-few-shot-5dhd</link>
      <guid>https://dev.to/koshin/optimizing-llm-with-few-shot-5dhd</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwu8tjvgu9d1zz4bm4vl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwu8tjvgu9d1zz4bm4vl.jpg" alt="An example of Few Shot learning using Chat GPT" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hello 👋&lt;br&gt;
I'd like to introduce Few-Shot learning, one of the most basic yet powerful prompting techniques for large language models!&lt;/p&gt;

&lt;h2&gt;What is Few-Shot?&lt;/h2&gt;

&lt;p&gt;Few-Shot learning is a technique in prompt engineering that allows you to optimize the responses generated by large language models (LLM) for specific tasks by adding a few examples within the context.&lt;/p&gt;
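&lt;p&gt;As a minimal sketch, the idea can be expressed in Python by packing a handful of (input, output) pairs into a chat-style message list. The task ("color and emoji for a food") and the message format are illustrative assumptions, not taken from any specific API in this article:&lt;/p&gt;

```python
# Few-Shot prompting sketch: the task is taught purely through
# in-context examples appended to the prompt; no weights are changed.

def build_few_shot_messages(examples, query):
    """Turn (input, output) example pairs plus a new query into a
    chat-style message list that demonstrates the task in context."""
    messages = [{"role": "system",
                 "content": "Answer in the same style as the examples."}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The new query comes last, so the model continues the pattern.
    messages.append({"role": "user", "content": query})
    return messages

examples = [
    ("banana", "yellow 🍌"),
    ("tomato", "red 🍅"),
]
messages = build_few_shot_messages(examples, "broccoli")
# 1 system message + 2 example pairs + 1 query = 6 messages in total.
```

&lt;p&gt;A list like this can be sent to any chat-completion-style LLM endpoint; the examples alone steer the model toward the desired output format.&lt;/p&gt;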

&lt;p&gt;In the first image, you can see how switching from Zero-Shot to Few-Shot optimizes the responses for specific tasks, such as “color and food type” and “introduction to emoji”.&lt;/p&gt;

&lt;p&gt;According to the paper “Language Models are Few-Shot Learners”&lt;sup id="fnref1"&gt;1&lt;/sup&gt;, the larger the model, the more effective Few-Shot learning becomes. Incidentally, that paper defines Few-Shot learning as giving 10 to 100 examples (shots), as many as fit into GPT-3's context window (the prompt fed to the LLM).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwxlcu4oku60uognod71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwxlcu4oku60uognod71.png" alt="This graph shows that the more extensive the LLM model, the more effective Few-Shot learning becomes" width="800" height="383"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;from "Language Models are Few-Shot Learners"&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Difference from fine-tuning&lt;/h2&gt;

&lt;p&gt;When you hear the phrase “optimizing AI models”, you may think of fine-tuning. In fact, fine-tuning often achieves higher performance on benchmark tests.&lt;br&gt;
However, it comes with a number of trade-offs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-tuning requires several thousand to several hundred thousand labeled training examples.&lt;/li&gt;
&lt;li&gt;It is expensive. With a fully managed service such as the OpenAI API, requests to a fine-tuned model are often priced higher than normal requests. And when fine-tuning a custom model yourself, a lot of GPU resources are needed, because the neural network's weights must be recalculated to suit the task.&lt;/li&gt;
&lt;li&gt;Compared with Few-Shot learning, the fine-tuned model is more likely to generalize poorly out of distribution or to exploit spurious features of the training data.&lt;/li&gt;
&lt;/ul&gt;
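&lt;p&gt;To make the contrast concrete, here is a hypothetical sketch in Python: the same labeled rows either become a JSONL training file for a fine-tuning run (which rewrites the model's weights), or are simply inlined into the prompt as Few-Shot examples (no training at all). The row format is an illustrative assumption:&lt;/p&gt;

```python
import json

# Hypothetical labeled data; fine-tuning typically needs thousands of
# such rows, while Few-Shot reuses just a handful inside the prompt.
labeled = [
    {"prompt": "banana", "completion": "yellow"},
    {"prompt": "tomato", "completion": "red"},
]

# Fine-tuning path: serialize everything into a JSONL training file
# that would be uploaded before launching an expensive training job.
jsonl = "\n".join(json.dumps(row) for row in labeled)

# Few-Shot path: the same rows become in-context examples; the model
# is used as-is and the "optimization" lives entirely in the prompt.
few_shot_prompt = "\n".join(
    f"Q: {row['prompt']}\nA: {row['completion']}" for row in labeled
)
few_shot_prompt += "\nQ: broccoli\nA:"
```

&lt;p&gt;The Few-Shot path can be iterated on instantly by editing the prompt, which is part of why it is usually tried first.&lt;/p&gt;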

&lt;p&gt;OpenAI's guide also recommends that when you want to optimize, you first consider few-shot learning and only then consider fine-tuning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fae63s25bdl9l37i8r66p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fae63s25bdl9l37i8r66p.png" alt="Steps to consider when optimizing a LLM" width="800" height="407"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;from &lt;a href="https://platform.openai.com/docs/guides/optimizing-llm-accuracy/llm-optimization-context" rel="noopener noreferrer"&gt;https://platform.openai.com/docs/guides/optimizing-llm-accuracy/llm-optimization-context&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;In this article, you learned how to optimize an LLM with &lt;strong&gt;Few-Shot learning&lt;/strong&gt; and how to decide when to use it.&lt;br&gt;
I hope this article will help you optimize your LLM.&lt;br&gt;
Thank you for reading!&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/abs/2005.14165" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2005.14165&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>promptengineering</category>
    </item>
  </channel>
</rss>
