Amal Shaji

Posted on Sep 22, 2023 • Originally published at amal.sh

Fine-tuning gpt: Building a better and cheaper assistant?

#gpt3 #chatgpt #openai #finetune

Fine-tuning lets you get more out of the models available through the API by providing:

Higher quality results than prompting

Ability to train on more examples than can fit in a prompt

Token savings due to shorter prompts

Lower latency requests

In this post, we build an assistant with and without fine-tuning and compare the results.

Code

https://github.com/amalshaji/finetune-gpt

Use-case

Let's build an assistant capable of generating template messages for engaging with our users. The assistant will generate messages based on the required tone and use pre-defined template tags wherever necessary.

e.g.:

Happy Birthday {{ first_name }}! We're thrilled to celebrate this special day with you. As a valued customer, your happiness is our top priority. Enjoy this day to the fullest, and here's to another year of great experiences with us. Cheers to you on your birthday!

Assume the following tags are valid: first_name, email, last_name
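Since the assistant should only use the tags listed above, it can help to check generated messages before sending them. Below is a minimal sketch of such a check; the helper name find_invalid_tags and the regular expression are illustrative assumptions, not part of the original code.

```python
import re

# Tags the article assumes are valid.
VALID_TAGS = {"first_name", "last_name", "email"}

# Matches {{ tag }} with optional whitespace inside the braces.
TAG_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_invalid_tags(message: str) -> set[str]:
    """Return the template tags in `message` that are not in VALID_TAGS."""
    return set(TAG_PATTERN.findall(message)) - VALID_TAGS


# A message that uses only known tags yields an empty set.
print(find_invalid_tags("Happy Birthday {{ first_name }}!"))
# A message with an unknown tag (hypothetical `phone`) is flagged.
print(find_invalid_tags("Hi {{ first_name }}, we have {{ phone }} on file."))
```

An empty result means every placeholder in the message can be filled in by the templating step.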

Without fine-tuning

Generate a message in less than 50 words using the following parameters:
    occasion: {occasion}
    tone: {tone}

Use the following template tags as placeholders wherever necessary
| tags | description |
|---|---|
| {{ first_name }} | User's first name |
| {{ last_name }} | User's last name |
| {{ email }} | User's email |

The program sets the occasion and tone before sending the prompt to the OpenAI endpoint.

import asyncio
import openai
import os
from string import Template

openai.api_key = os.getenv("OPENAI_API_KEY")

prompt = Template(
    """
Generate a message in less than 50 words using the following parameters:
    occasion: $occasion
    tone: $tone

Use the following template tags as placeholders wherever necessary
| tags | description |
|---|---|
| {{ first_name }} | User's first name |
| {{ last_name }} | User's last name |
| {{ email }} | User's email |
"""
)


async def run(occasion: str, tone: str):
    response = await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": prompt.substitute({"occasion": occasion, "tone": tone}),
            }
        ],
    )
    print(response)


asyncio.run(run(occasion="birthday", tone="texas accent"))

The prompt asks for a birthday message in a Texas accent.
The Chat Completions API is used to access the gpt-3.5-turbo model.
gpt-3.5-turbo is used because fine-tuning does not yet support gpt-4.

Output

{
  "id": "chatcmpl-817dhqQoNvTmBHlDMKDkiQIYezgRn",
  "object": "chat.completion",
  "created": 1695278385,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Happy birthday, {{ first_name }}! Hope y'all have a rootin' tootin' day filled with joy and good ol' Texas charm. Yeehaw! \ud83e\udd20"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 99,
    "completion_tokens": 39,
    "total_tokens": 138
  }
}

Time to complete: ~3s
Total cost for this execution: (99/1000 x 0.0015 + 39/1000x0.002) = $0.0002265.
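The cost arithmetic above can be reproduced from the usage block of the response. This is a small sketch; the function name request_cost is an assumption, and the rates are the per-1K-token prices quoted in the calculation above.

```python
def request_cost(usage: dict, input_rate: float, output_rate: float) -> float:
    """Cost in USD of one request; rates are USD per 1,000 tokens."""
    return (
        usage["prompt_tokens"] / 1000 * input_rate
        + usage["completion_tokens"] / 1000 * output_rate
    )


# Usage block copied from the response above.
usage = {"prompt_tokens": 99, "completion_tokens": 39, "total_tokens": 138}
cost = request_cost(usage, input_rate=0.0015, output_rate=0.002)
print(f"${cost:.7f}")
```

Keeping the rates as parameters makes it easy to rerun the numbers when pricing changes or when a different model is used.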

Fine-tuning

Preparing the dataset

OpenAI fine-tuning requires a dataset of at least ten examples. It accepts the input as a JSONL (JSON Lines) file, where each line is a valid JSON object, e.g.:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},{"role": "user", "content": "What's the largest planet in our solar system?"},{"role": "assistant", "content": "Jupiter, but don't worry, it's not like it's taking up half the solar system or anything."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},{"role": "user", "content": "Who discovered penicillin?"},{"role": "assistant", "content": "Alexander Fleming. But don't get too excited; it's just one of the most important discoveries in medical history."}]}

Here is the training data we will use to fine-tune our model.
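Before uploading, it may be worth checking the file against the format described above. The sketch below is one possible check, not OpenAI's own validator: the function name validate_jsonl and the set of allowed roles (taken from the examples shown) are assumptions.

```python
import json

MIN_EXAMPLES = 10  # minimum dataset size mentioned above
ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: roles seen in the examples


def validate_jsonl(lines: list[str]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    examples = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {number}: missing 'messages' list")
            continue
        for message in messages:
            well_formed = (
                isinstance(message, dict)
                and message.get("role") in ALLOWED_ROLES
                and isinstance(message.get("content"), str)
            )
            if not well_formed:
                problems.append(f"line {number}: malformed message {message!r}")
        examples += 1
    if examples < MIN_EXAMPLES:
        problems.append(f"found {examples} examples, need at least {MIN_EXAMPLES}")
    return problems


# In-memory demo; for a real file, pass open("data.jsonl").readlines() instead.
sample = json.dumps(
    {"messages": [{"role": "user", "content": "occasion: birthday, tone: formal"},
                  {"role": "assistant", "content": "Happy birthday, {{ first_name }}."}]}
)
print(validate_jsonl([sample] * 3))
```

The demo reports that three examples are too few, which is the kind of mistake that is cheaper to catch locally than after an upload.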

Create a new file

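async def create_training_file():
    # Open the dataset in a context manager so the handle is closed after upload.
    with open("data.jsonl", "rb") as training_data:
        file_create_output = await openai.File.acreate(
            file=training_data, purpose="fine-tune"
        )
    print(f"Created training file: {file_create_output}")
    return file_create_output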

The response should contain the file ID and status. If the status is not yet processed, keep checking it manually with await openai.File.aretrieve(file_id) until it is.
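The manual status check can be wrapped in a small polling loop. The sketch below is deliberately independent of the OpenAI client: wait_until_processed, its parameters, and the fake status source are all assumptions made for illustration, and the commented wiring to openai.File.aretrieve assumes the pre-1.0 openai package used in this post.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until_processed(
    fetch_status: Callable[[], Awaitable[str]],
    interval: float = 5.0,
    max_attempts: int = 60,
) -> str:
    """Poll `fetch_status` until it returns "processed" or attempts run out."""
    for _ in range(max_attempts):
        status = await fetch_status()
        if status == "processed":
            return status
        await asyncio.sleep(interval)
    raise TimeoutError(f"file not processed after {max_attempts} checks")


# Hypothetical wiring with the client used in this post:
#     async def fetch_status():
#         return (await openai.File.aretrieve(file_id))["status"]


async def demo() -> str:
    # Fake status source that reports "processed" on the third check.
    statuses = iter(["uploaded", "uploaded", "processed"])

    async def fake_fetch() -> str:
        return next(statuses)

    return await wait_until_processed(fake_fetch, interval=0.0)


print(asyncio.run(demo()))
```

Passing the status lookup in as a callable keeps the loop testable without network access.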

Fine-tune

We can schedule a fine-tuning job with OpenAI using the uploaded file ID.


async def start_finetune_job(training_file: str):
    ft_job = await openai.FineTuningJob.acreate(
        training_file=training_file, model="gpt-3.5-turbo"
    )
    print(f"Created fine tuning job: {ft_job}")
    return ft_job

Navigate to https://platform.openai.com/finetune to see the job status. Once complete, you'll see the relevant metrics, including the tokens used for training.

Tokens used for training: 12,350
Total cost: (12350 / 1000 x 0.0080) = $0.0988.

Working with the new model

We run something similar to the non-fine-tuned version, replacing the model with our custom model.

import timeit  # needed for the timing below


async def run(occasion: str, tone: str):
    start = timeit.default_timer()
    response = await openai.ChatCompletion.acreate(
        model="ft:gpt-3.5-turbo-0613:personal::819PHd1U",
        messages=[
            {
                "role": "system",
                "content": prompt,
            },
            {
                "role": "user",
                "content": f"occasion: {occasion}, tone: {tone}",
            },
        ],
    )
    print(response)
    print(f"Time taken: {timeit.default_timer() - start}s")


asyncio.run(run(occasion="birthday", tone="informal"))

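Imports and setup are otherwise the same as in the non-fine-tuned version. Note that prompt must be a plain string here, since a Template object cannot be sent as message content.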

{
  "id": "chatcmpl-81AzImR0c0f9h3Kud3sHeHo8rmRyl",
  "object": "chat.completion",
  "created": 1695291256,
  "model": "ft:gpt-3.5-turbo-0613:personal::819PHd1U",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Happy birthday, {{ first_name }}! Hope your special day is filled with lots of cake, presents, and non-stop fun."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 96,
    "completion_tokens": 26,
    "total_tokens": 122
  }
}

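The request uses slightly fewer tokens (122 vs. 138). At the base gpt-3.5-turbo rates used earlier, it would cost around $0.000196 (96/1000 x 0.0015 + 26/1000 x 0.002). Note, however, that at the time of writing OpenAI billed fine-tuned gpt-3.5-turbo usage at higher rates ($0.012 per 1K input tokens and $0.016 per 1K output tokens), which puts this request at about $0.001568 (96/1000 x 0.012 + 26/1000 x 0.016).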

Comparison

| | Non-fine-tuned | Fine-tuned |
|---|---|---|
| Cost | Higher if examples have to be passed in the system prompt with every request | Cheaper, as the model can be fine-tuned on the examples |
| Availability | Always available | After training, the API raised many server-overload exceptions before returning a correct response |
| Inference speed | Slow; ~3 seconds on average | Faster; most queries finished in under a second, though some outliers took more than 10 seconds |
| Ease of use | Straightforward: write a prompt and query | Prepare training and validation data, create files, start training, and wait for training to finish (all of which can be automated) |
| Limitations | No known limitations (used as the baseline for comparison) | gpt-4 and function calling are not supported yet (expected later this year) |
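One common way to cope with the server-overload errors mentioned under Availability is to retry with exponential backoff. The sketch below is a generic helper, not part of the original code: with_retries and FakeOverloaded are hypothetical names, and the exception classes to retry on are left as a parameter because they depend on the client version in use.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
) -> T:
    """Await `call`, retrying on `retry_on` with exponential backoff and jitter."""
    attempt = 0
    while True:
        try:
            return await call()
        except retry_on:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out retries from concurrent callers.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


class FakeOverloaded(Exception):
    """Stand-in for the client's overload error (hypothetical)."""


async def demo() -> tuple[str, int]:
    calls = {"count": 0}

    async def flaky() -> str:
        # Fails twice, then succeeds, to imitate a briefly overloaded server.
        calls["count"] += 1
        if calls["count"] < 3:
            raise FakeOverloaded("server overloaded")
        return "ok"

    result = await with_retries(flaky, retry_on=(FakeOverloaded,), base_delay=0.0)
    return result, calls["count"]


print(asyncio.run(demo()))
```

In the article's setup, call would wrap the ChatCompletion request, and retry_on would name whichever overload exception the installed client raises.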

Conclusion

In this experiment, we fine-tuned gpt-3.5 on a small dataset of example messages. To teach the non-fine-tuned version the same examples, you must pass them with each prompt. Imagine the cost of doing this for 100,000-1,000,000 API calls. Although this experiment did not show any difference in results (the task was easy for gpt), a complex task would likely have benefited from fine-tuning.
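To put a rough number on that, the sketch below estimates the extra spend from sending examples with every call. The 500-token size of the examples is a purely hypothetical figure, the function name is an assumption, and the input rate is the base gpt-3.5-turbo price used earlier in this post.

```python
def few_shot_overhead(calls: int, example_tokens: int, input_rate: float) -> float:
    """Extra USD spent sending `example_tokens` of examples on every call."""
    return calls * example_tokens / 1000 * input_rate


EXAMPLE_TOKENS = 500  # hypothetical size of the in-prompt examples
INPUT_RATE = 0.0015  # USD per 1K prompt tokens (base rate used earlier)

for calls in (100_000, 1_000_000):
    overhead = few_shot_overhead(calls, EXAMPLE_TOKENS, INPUT_RATE)
    print(f"{calls:>9,} calls: ${overhead:,.2f} extra")
```

This only estimates the overhead of the examples themselves; the fine-tuned model's own per-token rates and the one-off training cost would need to be set against it.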

Fine-tuning GPT-3.5 models can significantly improve their performance on complex tasks, making them well suited to building low-cost, efficient assistants. While fine-tuning requires some additional effort, the benefits in cost and speed can make it a worthwhile investment. As support for GPT-4 and function calling is added, the potential applications of fine-tuned models will continue to expand. However, it's essential to consider whether fine-tuning is the right approach for your specific use case, weighing the pros and cons before you commit.
