Integrating large language models into real-world product backends is the latest battleground for innovation. But as with all tech trends, the real winners aren’t those who rush to know it all. They’re the ones who pause, reflect, and make smart decisions.
Even though AI is more accessible than ever, thinking it's a trivial task is a big mistake. The field is still in its infancy, and almost everyone in tech and business is trying to figure out how to make sense of it. The internet is overflowing with both reliable information and misleading hype.
Believe it or not, you don’t need to jump on every piece of AI gossip you hear. Take a step back and approach it thoughtfully.
If you’re a PHP developer or work with a PHP codebase, AI might feel like an alien concept. Classes, interfaces, message queues, and frameworks seem worlds apart from NLP, fine-tuning, stochastic gradient descent, LoRA, RAG, and all that jargon. I get it. To systematically learn these concepts, like anything in software engineering, we need time and good practice.
But isn’t AI, machine learning, and data science the domain of Python or R programmers? You’re partially right. Most foundational machine learning is done in Python. And honestly, Python is a joy to learn—it’s versatile and can be applied to countless interesting projects. From building and training a language model to deploying it via an API, Python has you covered.
But let’s get back to PHP.
Don’t worry—you’re not about to become obsolete as a regular software engineer. Quite the opposite. As we’re increasingly expected to understand related fields like containerization, CI/CD, systems engineering, and cloud infrastructure, AI is quickly becoming another essential skill in our toolkit. There’s no better time than now to start learning.
That said, I don’t recommend diving headfirst into building neural networks from scratch (unless you really want to). It’s easy to get overwhelmed. Instead, let me show you a practical starting point for your AI experiments.
What Can You Expect?
Here’s what we’ll cover in this guide:
Asynchronous AI Workflows
I’ll show you how to implement an AI workflow using a message queue—whether it’s RabbitMQ, Amazon SQS, or your preferred broker.
Production-Ready Solution
You’ll see a real example of the solution deployed in a production system that solves a basic business requirement with AI.
Symfony Integration
The solution is fully implemented within the Symfony PHP framework.
OpenAI Tools
We’ll use the OpenAI PHP library and the latest OpenAI Assistants 2.0.
What Topics Will We Cover?
Dataset Structure
How to build a dataset for training and evaluating your AI model.
Fine-Tuning OpenAI Models
Learn to prepare a proper .jsonl file and fine-tune the GPT-4o-mini model (or another model from the GPT family) for your specific business use case.
Creating and Testing an OpenAI Assistant 2.0
Understand how to set up an OpenAI Assistant and test it in the OpenAI Playground.
Knowledge Bases
Dive into the concept of knowledge bases: why GPT doesn’t know everything and how to supply it with the right context to significantly boost accuracy.
PHP & Symfony Integration
Learn how to seamlessly connect your AI agent with your Symfony application.
Interested? Let's roll.
Prerequisites for OpenAI + PHP Projects
- OpenAI Account: You’ll need an account on openai.com.
- Weights & Biases Account (optional): Setting up an account on wandb.ai is optional but highly recommended. It’s an excellent tool for tracking AI experiments and visualizing results.
Defining the Problem
Let’s dive into the problem we’re solving.
At its core, we’re dealing with a block of text that represents something—an address, in our case. The goal is to classify its components into predefined groups.
So, when a user sends us an address, we aim to return a JSON structure that segments the address into its parts. For example:
{
"street": "Strada Grui",
"block": "Bloc 7",
"staircase": "Scara A",
"apartment": "Apartament 6",
"city": "Zărnești"
}
Why Does This Matter?
Well, these addresses are typed by people—our customers—and they’re often inconsistent. Here’s why we need structured, parsed data:
- Consistent Formatting: Ensures addresses follow a standard format for seamless processing.
- Address Validation (optional): Allows us to verify whether the address is valid or exists.
To achieve this, we dissect the address into predefined groups like "street," "house," "postcode," etc., and then reassemble it into the desired sequence.
Why Not Use Regular Expressions?
At first glance, it sounds straightforward. If we enforce formatting for new customers or know they typically write addresses in a specific way, regular expressions might seem like a reasonable solution.
However, let’s consider examples of Romanian addresses:
STRADA GRUI, BLOC 7, SCARA A, APARTAMENT 6, ZĂRNEŞTI
BD. 21 DECEMBRIE 1989 nr. 93 ap. 50, CLUJ, 400124
Romanian addresses are often complex, written in no particular order, and frequently omit elements like the postcode. Even the most sophisticated regular expressions struggle to handle such variability reliably.
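To make this concrete, here is a rough sketch of a regex-based parser tuned to the first format above; the pattern and field names are illustrative, and it immediately breaks on the second address:
<?php

// Illustrative only: a naive pattern built for the first address format.
$pattern = '/^STRADA\s+(?<street>.+?),\s*BLOC\s+(?<block>\S+),\s*SCARA\s+(?<staircase>\S+),'
    . '\s*APARTAMENT\s+(?<apartment>\S+),\s*(?<city>.+)$/iu';

$first  = 'STRADA GRUI, BLOC 7, SCARA A, APARTAMENT 6, ZĂRNEŞTI';
$second = 'BD. 21 DECEMBRIE 1989 nr. 93 ap. 50, CLUJ, 400124';

var_dump(preg_match($pattern, $first));  // int(1) - matches
var_dump(preg_match($pattern, $second)); // int(0) - different abbreviations, order, and fields
Every new writing habit (BD. instead of BLOC, a missing postcode, reordered parts) means another pattern to write and maintain, which is exactly the trap we want to avoid.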
This is where AI models like GPT-3.5 Turbo or GPT-4o-mini shine—they can manage inconsistencies and complexities far beyond what static rules like regex can handle.
A Smarter AI Workflow
Yes, we’re developing an AI workflow that significantly improves on the traditional approach.
Every machine learning project boils down to two essentials: data and model. And while models are important, data is far more critical.
Models come pre-packaged, tested, and can be swapped out to see which one performs better. But the true game-changer is the quality of the data we provide to the model.
The Role of Data in Machine Learning
Typically, we split our dataset into two or three parts (the optional third part being a validation set used to tune the model during training):
- Training Data: Used to teach the model what it should do.
- Testing Data: Used to evaluate how well the model performs.
For this project, we want to split an address into its components—making this a classification task. To get good results with a large language model (LLM) like GPT, we need at least 100 examples in our training dataset.
Structuring the Dataset
The raw format of our dataset doesn’t matter much, but I’ve chosen a structure that’s intuitive and easy to use:
INPUT: <what goes into the LLM>
OUTPUT: <what the LLM should produce>
Here’s an example:
input: STRADA EREMIA GRIGORESCU, NR.11 BL.45B,SC.B NR 11/38, 107065 PLOIESTI
output: {"street": "Eremia Grigorescu", "house_number": "11", "flat_number": "38", "block": "45B", "staircase": "B", "floor": "", "apartment": "", "landmark": "", "postcode": "107065", "county": "Prahova", "commune": "", "village": "", "city": "Ploiesti"}
As you can see, the goal is to produce a structured JSON response based on the input address.
Setting Up Your OpenAI API Account
First, you’ll need an OpenAI API account. It’s a straightforward process, and I recommend adding some initial funds—$10 or $20 is plenty to get started. OpenAI uses a convenient pre-paid subscription model, so you’ll have full control over your spending.
You can manage your billing in your OpenAI account settings.
Exploring the OpenAI Playground
Once your account is set up, head over to the OpenAI Playground.
Take a moment to familiarize yourself with the interface. When you’re ready, look for the "Assistants" section in the menu on the left.
This is where we’ll create our custom GPT instance.
Customizing Your GPT Assistant
We’ll tailor our GPT assistant to meet our specific needs in two stages:
1. Detailed System Instructions with Examples
This step helps us quickly test and validate whether the solution works. It’s the fastest way to see results.
2. Fine-Tuning with a Knowledge Base (simple RAG)
Once we’re satisfied with the initial results, we’ll fine-tune the model. This process reduces the need to provide extensive examples in every prompt, which, in turn, decreases inference time (how long it takes for the model to respond via the API).
Both steps combined give us the most accurate and efficient results.
So, let's begin.
Designing a System Prompt for Named Entity Recognition
For our Named Entity Recognition system, we need the model to output structured JSON responses consistently. Crafting a thoughtful system prompt is critical to achieving this.
Here’s what we’ll focus on:
- Structuring the initial "dirty" system prompt to produce clear results.
- Refining the prompt later to improve efficiency and minimize costs.
Starting with the Few-Shot Technique
The "few-shot" technique is a powerful approach for this task. It works by providing the model with a few examples of the desired input-output relationship. From these examples, the model can generalize and handle inputs it hasn’t seen before.
Key Elements of a Few-Shot Prompt:
The prompt consists of several parts, which should be thoughtfully structured for best results. While the exact order can vary, the following sections are essential to include:
1. Clear Objective
The first part of the prompt provides a general overview of what we expect the model to accomplish.
For example, in this use case, the input is a Romanian address, which may include spelling mistakes and incorrect formatting. This context is essential as it sets the stage for the model, explaining the kind of data it will process.
2. Formatting instruction
After defining the task, we guide the AI on how to format its output.
Here, the JSON structure is explained in detail, including how the model should derive each key-value pair from the input. Examples are included to clarify expectations. Additionally, any special characters are properly escaped to ensure valid JSON.
3. Examples
Examples are the backbone of the "few-shot" technique. The more relevant examples we provide, the better the model’s performance.
Thanks to GPT’s extensive context window (128K tokens for GPT-4o-mini), we can include a large number of examples.
To build the example set:
1. Start with a basic prompt and evaluate the model’s output manually.
2. If the output contains errors, correct them and add the fixed version to the example set.
This iterative process improves the assistant’s performance over time.
4. Corrective information
It’s inevitable—your assistant will make mistakes. Sometimes, it might repeatedly make the same error.
To address this, include clear and polite corrective information in your prompt. Clearly explain:
- What mistakes the assistant is making.
- What you expect from the output instead.
This feedback helps the model adjust its behavior and generate better results.
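To tie these four parts together, here is a condensed, illustrative sketch of such a system prompt; the wording, field list, and example values are placeholders rather than the production prompt:
You are a Named Entity Recognition system for Romanian postal addresses. The input may contain spelling mistakes and inconsistent formatting.
Return only a valid JSON object with the keys: street, house, block, staircase, floor, apartment, landmark, postcode, county, commune, village, city. Use an empty string for any component that is missing. Escape special characters so the JSON stays valid.

Example:
Input: STRADA GRUI, BLOC 7, SCARA A, APARTAMENT 6, ZĂRNEŞTI
Output: {"street": "Grui", "house": "", "block": "7", "staircase": "A", "floor": "", "apartment": "6", "landmark": "", "postcode": "", "county": "Brașov", "commune": "", "village": "", "city": "Zărnești"}

Correction: do not place the block or staircase inside the "street" field; keep every component in its own key.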
Configuring and Testing Your Assistant
Now that we’ve crafted the initial prompt for our proof-of-concept few-shot solution, it’s time to configure and test the Assistant.
Step 1: Open the Assistant Creator
- Head over to the OpenAI dashboard
- From the menu on the left, select "Assistants"
- You’ll be redirected to the Assistant creation page
Step 2: Configure Your Assistant
- Give Your Assistant a Name: Choose a descriptive name for your Assistant that reflects its purpose (e.g., "Address Parser").
- Paste the System Instructions: Copy your entire crafted prompt and paste it into the System Instructions input box. This defines your Assistant’s behavior and guides its outputs.
- Save Your Assistant: Once you’ve pasted the prompt, click Save to store the configuration.
Step 3: Choose the Right Model
For this project, I chose GPT-4o-mini because:
- It’s cheaper than GPT-3.5 Turbo
- It's more accurate :)
That said, you should always select your model based on a balance between accuracy and cost. Run or search for benchmarks to determine what works best for your specific task.
Defining the Output Schema
With the initial configuration in place, we can now specify the output schema directly in the Assistant creator.
Step 1: Using the "Generate" Option
Instead of creating the schema manually, I used the "Generate" option provided by the Assistant creator. Here’s how:
- Take the JSON output structure from your prompt (from one of the examples)
- Paste it into the "Generate" field.
The tool does a great job of automatically generating a schema that matches your structure.
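For reference, the generated schema for our address structure comes out roughly like this (trimmed to a handful of properties; the real one lists every key from your example output):
{
  "name": "address_components",
  "strict": true,
  "schema": {
    "type": "object",
    "properties": {
      "street": { "type": "string" },
      "house": { "type": "string" },
      "block": { "type": "string" },
      "postcode": { "type": "string" },
      "city": { "type": "string" }
    },
    "required": ["street", "house", "block", "postcode", "city"],
    "additionalProperties": false
  }
}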
--
To ensure consistent and predictable responses, set the temperature as low as possible.
What is Temperature?
Temperature controls the randomness of the model’s responses. A low temperature value means the model produces more predictable and deterministic outputs.
For our use case, this is exactly what we want. When given the same address as input, the model should always return the same, correct response. Consistency is key for reliable results.
Testing the Assistant
With all the parameters in place, head to the Playground to test your solution.
The Playground provides a console where you can put your Assistant to work. This is where the fun begins—you can rigorously test your bare-bones solution to uncover:
- Where the model is hallucinating (generating irrelevant or incorrect information).
- Potential shortcomings in your prompt or schema.
These findings will help you refine the Corrective Information section of your prompt, making your Assistant more robust.
Below is an example of a result from one of my manual tests:
INPUT: BD. 21 DECEMBRIE 1989 nr. 93 ap. 50, CLUJ, 400124
OUTPUT:
{
"street": "Bdul 21 Decembrie 1989",
"house": "93",
"flat": "50",
"block": "",
"staircase": "",
"floor": "",
"apartment": "",
"landmark": "",
"intercom": "",
"postcode": "400124",
"county": "Cluj",
"commune": "",
"village": "",
"city": "Cluj"
}
Why Manual Testing Matters
Old-school, hands-on testing is the foundation of building a reliable solution. By evaluating the model’s outputs manually, you’ll quickly spot issues and understand how to fix them. While automation will come later, manual testing is an invaluable step in creating a solid proof-of-concept.
Plugging the Assistant into Your PHP Symfony Application
Now it’s time to integrate everything into your PHP Symfony application. The setup is straightforward and follows a classic asynchronous architecture.
Here’s a breakdown of the flow:
1. Frontend Interaction
In general, what we have here is a classic asynchronous setup with a message queue. We have two instances of the Symfony application running inside Docker containers. The first one interacts with the frontend client.
For example, when a customer types:
STR. IORGA NICOLAE nr. 20 et 1, CONSTANŢA, 900622
The application processes the input address and packages it into a Message object.
The Message object is then wrapped in a Symfony Messenger Envelope instance. The message is serialized into JSON format, with additional metadata for processing.
Symfony Messenger is perfect for handling asynchronous tasks. It allows us to offload time-consuming operations, like calling the OpenAI API, to background processes.
This approach ensures:
- Responsiveness: The frontend remains fast and responsive.
- Scalability: Tasks can be distributed across multiple workers.
Below is the message class used for our system:
<?php
declare(strict_types=1);
namespace App\Messenger\Message\Customer;
use Symfony\Component\Validator\Constraints as Assert;
final class CustomerAddressAddedMessage
{
public function __construct(
#[Assert\NotBlank]
#[Assert\Type('int')]
public readonly int $customerId,
#[Assert\NotBlank]
#[Assert\Type('string')]
public readonly string $inputAddress,
) {
}
}
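For context, here is a minimal sketch of how such a message could be dispatched from a controller; the controller, route, and request field are assumptions for illustration, not part of the original codebase:
<?php

declare(strict_types=1);

namespace App\Controller;

use App\Messenger\Message\Customer\CustomerAddressAddedMessage;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\Messenger\MessageBusInterface;
use Symfony\Component\Routing\Annotation\Route;

final class CustomerAddressController
{
    public function __construct(private readonly MessageBusInterface $messageBus)
    {
    }

    #[Route('/customers/{customerId}/address', methods: ['POST'])]
    public function __invoke(int $customerId, Request $request): JsonResponse
    {
        // Queue the raw address for asynchronous normalization by the worker instance.
        $this->messageBus->dispatch(
            new CustomerAddressAddedMessage($customerId, (string) $request->request->get('address', '')),
        );

        // Respond immediately; the heavy lifting happens in the background.
        return new JsonResponse(['status' => 'queued'], 202);
    }
}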
The application connects to a RabbitMQ instance using the amqp PHP extension. To set this up, you need to define the transport and message binding in your messenger.yaml configuration file.
Refer to the official Symfony Messenger documentation for detailed guidance:
Symfony Messenger Transport Configuration
Documentation: https://symfony.com/doc/current/messenger.html#transport-configuration
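As a rough sketch only (the transport name, DSN, and exchange options are illustrative and depend on your broker), the relevant part of messenger.yaml could look like this:
framework:
    messenger:
        transports:
            async:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%' # e.g. amqp://guest:guest@rabbitmq:5672/%2f/messages
                options:
                    exchange:
                        name: customer_address
                        type: direct
        routing:
            App\Messenger\Message\Customer\CustomerAddressAddedMessage: async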
Once the message is pushed to the broker (e.g., RabbitMQ, AmazonMQ, or AWS SQS), it’s picked up by the second instance of the application. This instance runs the messenger daemon to consume messages, as marked 3 in the architecture schema.
The consuming process is handled by running:
bin/console messenger:consume
This command runs as a long-lived worker process.
The daemon picks up the message from the configured queue, deserializes it back into the corresponding Message class, and forwards it to the Message Handler for processing.
Here’s the Message Handler where the interaction with the OpenAI Assistant occurs:
<?php
declare(strict_types=1);
namespace App\Messenger\MessageHandler\Customer;
use App\Entity\CustomerAddressNormalized;
use App\Messenger\Message\Customer\CustomerAddressAddedMessage;
use App\Service\Address\Normalization\OpenAI\OpenAIAddressNormalizationService;
use Doctrine\ORM\EntityManagerInterface;
use Psr\Log\LoggerInterface;
use Symfony\Component\Messenger\Handler\MessageHandlerInterface;
final class CustomerAddressAddedHandler implements MessageHandlerInterface
{
public function __construct(
private readonly LoggerInterface $logger,
private readonly EntityManagerInterface $entityManager,
private readonly OpenAIAddressNormalizationService $addressNormalizationService,
) {
}
public function __invoke(CustomerAddressAddedMessage $customerAddressAddedMessage)
{
$this->logger->info(
'Start processing "{message}", customerId:{customerId}.',
[
'message' => \get_class($customerAddressAddedMessage),
'customerId' => $customerAddressAddedMessage->customerId,
],
);
$normalizationString = $this->addressNormalizationService->createAddressNormalization($customerAddressAddedMessage->inputAddress);
$addressNormalized = new CustomerAddressNormalized(
$customerAddressAddedMessage->customerId,
$customerAddressAddedMessage->inputAddress,
$normalizationString,
false,
);
$this->entityManager->persist($addressNormalized);
$this->entityManager->flush();
}
}
Key Points of the Handler
- Logging: Logs the start of processing for traceability and debugging.
- Normalization Service: The OpenAIAddressNormalizationService is called to process the input address via the Assistant.
- Persistence: The normalized address is saved in the database using Doctrine's EntityManager.
The message handler may seem like standard Symfony code, but the core is located in this line:
private readonly OpenAIAddressNormalizationService $addressNormalizationService,
This service is responsible for interacting with OpenAI via the designated PHP library.
Let’s dive into the implementation of the service using the OpenAI PHP client.
Here’s the full implementation of the OpenAIAddressNormalizationService:
<?php
declare(strict_types=1);
namespace App\Service\Address\Normalization\OpenAI;
use OpenAI;
use OpenAI\Client;
use OpenAI\Responses\Threads\Runs\ThreadRunResponse;
use Psr\Log\LoggerInterface;
use RuntimeException;
class OpenAIAddressNormalizationService
{
private Client $openAIClient;
public function __construct(
private readonly string $openAIApiKey,
private readonly string $openAIOrganizationID,
private readonly LoggerInterface $logger,
) {
$this->openAIClient = OpenAI::client($this->openAIApiKey, $this->openAIOrganizationID);
}
public function createAddressNormalization(string $inputAddress): string
{
$assistant = $this->openAIClient->assistants()->retrieve('asst_VKQLz1xaIY5cFP8dlk2SPZkg');
$threadRun = $this->openAIClient->threads()->createAndRun(
[
'assistant_id' => $assistant->id,
'thread' => [
'messages' => [
[
'role' => 'user',
'content' => $inputAddress,
],
],
],
],
);
$normalizationResponse = $this->loadAnswer($threadRun);
$this->logger->warning($normalizationResponse);
return json_encode($normalizationResponse, JSON_THROW_ON_ERROR);
}
private function loadAnswer(ThreadRunResponse $threadRun): string
{
while (in_array($threadRun->status, ['queued', 'in_progress'])) {
$threadRun = $this->openAIClient->threads()->runs()->retrieve(
threadId: $threadRun->threadId,
runId: $threadRun->id,
);
}
if ('completed' !== $threadRun->status) {
throw new RuntimeException('OpenAI request failed, please try again.');
}
$messageList = $this->openAIClient->threads()->messages()->list(threadId: $threadRun->threadId);
return $messageList->data[0]->content[0]->text->value;
}
}
The workflow for getting an inference (or completion)—essentially the response from GPT—using the openai-php/client library follows three main stages:
1. Initialization of the OpenAI Client and Assistant retrieval
The first step is setting up the OpenAI client and retrieving the configured Assistant:
$this->openAIClient = OpenAI::client($this->openAIApiKey, $this->openAIOrganizationID);
$assistant = $this->openAIClient->assistants()->retrieve('asst_VKQLz1xaIY5cFP8dlk2SPZkg');
- Client Initialization: The OpenAI::client() method creates a new client using your API key and organization ID.
- Assistant Retrieval: The retrieve() method connects to the specific Assistant instance configured for your task, identified by its unique ID.
2. Creating and Running a Thread
Once the client and Assistant are initialized, you can create and run a thread to begin the interaction. A thread acts as the communication pipeline, handling the exchange of messages between the user and the Assistant.
Here’s how the thread is initiated:
$threadRun = $this->openAIClient->threads()->createAndRun([
'assistant_id' => $assistant->id,
'thread' => [
'messages' => [
[
'role' => 'user',
'content' => $inputAddress,
],
],
],
]);
- Assistant ID: The ID of the Assistant determines which model instance will process the input.
- Messages: Each thread includes a sequence of messages. In this case, each message has a Role, indicating whether it comes from the user (input) or the assistant (response), and a Content field, which contains the actual input text (e.g., an address).
3. Handling the Thread Response
After initiating the thread, the application handles the response. Since OpenAI processes may not complete instantly, you need to poll the thread’s status until it’s marked as 'completed':
while (in_array($threadRun->status, ['queued', 'in_progress'])) {
$threadRun = $this->openAIClient->threads()->runs()->retrieve(
threadId: $threadRun->threadId,
runId: $threadRun->id,
);
}
Polling: The application repeatedly checks the thread’s status (queued or in_progress). Once the status changes to 'completed', the thread contains the final output.
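One practical note: the loop above re-queries the run status as fast as PHP can issue HTTP requests. In a long-running worker it is usually worth adding a short pause between checks; a minimal variation (the 200 ms value is an arbitrary choice):
while (in_array($threadRun->status, ['queued', 'in_progress'], true)) {
    // Wait briefly before polling the run status again to avoid hammering the API.
    usleep(200_000);

    $threadRun = $this->openAIClient->threads()->runs()->retrieve(
        threadId: $threadRun->threadId,
        runId: $threadRun->id,
    );
}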
Once the thread is complete, the application retrieves the response messages to extract the normalized address:
$messageList = $this->openAIClient->threads()->messages()->list(threadId: $threadRun->threadId);
return $messageList->data[0]->content[0]->text->value;
The normalized address can now be returned to the frontend client for immediate use or stored in the database as a structured entity, like CustomerAddressNormalized, for future processing.
When you run this setup, you should be able to extract and save structured output from OpenAI for Named Entity Recognition and other classification or generation tasks.
Make the Assistant Blazingly Accurate with Fine-Tuning and a Knowledge Base (Retrieval-Augmented Generation)
In some cases, basic AI solutions aren’t enough. When regulatory compliance and business requirements demand high accuracy, we need to go the extra mile to ensure the output is factual and reliable.
For example, when generating a JSON structure, we need to guarantee that its contents align with reality. A common risk is the model hallucinating information—like inventing a place based on a provided postal code. This can cause serious issues, particularly in high-stakes environments.
Ground Truth with Knowledge Base
To eliminate hallucinations and ensure factual accuracy, we need to supply the Assistant with a ground truth knowledge base. This acts as the definitive reference for the model, ensuring it uses accurate information during inference.
My Approach: A Postal Code Knowledge Base
I created a simple (but quite large—around 12 MB) JSON file containing the full demarcation of all postal codes in Romania. This dictionary-like structure provides:
- Postal Code: The input value.
- Validated Information: Facts like the corresponding city, county, and overall place name.
This knowledge base serves as a reference point for the Assistant when performing Named Entity Recognition.
Structure of the Knowledge Base
Here’s an example snippet of the knowledge base JSON structure:
{
"ro_places": [
{
"place_mark": "Râmnicu Vâlcea, Vâlcea, 240454, RO",
"country_code": "RO",
"place_name": "Râmnicu Vâlcea",
"county": "Vâlcea",
"postal_code": "240454",
"admin_code": "39",
"coordinates": {
"lat": 45.1,
"lon": 24.3667
}
},
{
"place_mark": "Râmnicu Vâlcea, Vâlcea, 240493, RO",
"country_code": "RO",
"place_name": "Râmnicu Vâlcea",
"county": "Vâlcea",
"postal_code": "240493",
"admin_code": "39",
"coordinates": {
"lat": 45.1,
"lon": 24.3667
}
},
    ..... // around 40,000 postal codes in total
  ]
}
With the knowledge base ready, it’s time to instruct the Assistant to use it effectively. While you could embed this directly into the prompt, this approach increases token usage and costs. This makes it an excellent moment to introduce fine-tuning.
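If you go the retrieval route instead of (or alongside) fine-tuning, the knowledge base file first needs to be uploaded to OpenAI. With the same openai-php client that could look roughly like this; the file path is a placeholder, and attaching the uploaded file to the Assistant's File Search tool can then be done from the dashboard:
$file = $this->openAIClient->files()->upload([
    'purpose' => 'assistants', // marks the file for use with Assistants / File Search
    'file' => fopen('var/data/ro_postal_codes.json', 'r'),
]);

$this->logger->info('Knowledge base uploaded, file id: ' . $file->id);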
What is Fine-Tuning?
Fine-tuning continues training a pre-trained model on task-specific examples, updating its weights to adapt it for a specific task. In our case, Named Entity Recognition (NER) is a perfect candidate.
A fine-tuned model:
- Requires Smaller Prompts: Reducing the need for lengthy instructions or embedded examples.
- Cuts Costs: Shorter prompts and fewer examples lower token consumption.
- Improves Inference Time: Faster responses mean better real-time performance.
The main goal is to align the model more closely with the nuances of the real-world data it will process. Training on domain-specific data makes the model better at understanding and generating appropriate responses. It also reduces the need for post-processing or error correction by the application.
Preparing the Fine-Tuning Dataset
To fine-tune the model, we need a properly formatted dataset in .jsonl (JSON Lines) format, as required by OpenAI. Each entry in the dataset includes:
- System Prompt: The initial instruction for the Assistant.
- User Input: The raw input data (e.g., a messy address).
- Expected Output: The desired structured response.
This dataset provides the Assistant with domain-specific examples, enabling it to learn how to respond to similar prompts in the future.
Here’s an example of how to structure a fine-tuning entry:
{
"messages":[
{
"role":"system",
"content":"You are a system that <does something>. Use the knowledge base provided for accurate results."
},
{
"role":"user",
"content":"TOOLS \n-------\n\n\n\nRESPONSE FORMAT INSTRUCTIONS\n--------------------\n\n\n--------------------\nUSER'S INPUT\n--------------------\n"
},
{
"role":"assistant",
"content":"{\"key\": \"value\"}"
}
]
}
Let’s break down the structure of a fine-tuning entry in the .jsonl file, using our Romanian address processing case as an example:
Each entry is designed to simulate a real interaction between the user and the Assistant, encapsulating:
- The Assistant’s purpose and scope.
- The user’s input scenario.
- The expected structured output.
This approach helps the model learn the desired behavior in real-world contexts.
1. System Role Message
Describes the assistant’s capabilities and the scope of its functionality, setting expectations for the types of entities it should recognize and extract, such as street names, house numbers, and postal codes.
Example: The system explains that the assistant is designed to function as a Named Entity Recognition model for Romanian addresses, detailing the components it should extract and classify.
2. User Role Message
- Provides a detailed scenario or query in which the user gives a specific address input. This part of the data entry is crucial as it directly influences how the model will learn to respond to similar inputs in operational settings.
3. Assistant Role Message
- Contains the expected response from the assistant, formatted in JSON. This response is crucial as it trains the model on the desired output format and precision.
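Putting the three roles together for our case, a single illustrative entry could look like the line below (one JSON object per line in the .jsonl file; the system text and field list are abbreviated placeholders):
{"messages": [{"role": "system", "content": "You are a Named Entity Recognition system for Romanian addresses. Return only a JSON object with the agreed keys."}, {"role": "user", "content": "BD. 21 DECEMBRIE 1989 nr. 93 ap. 50, CLUJ, 400124"}, {"role": "assistant", "content": "{\"street\": \"Bdul 21 Decembrie 1989\", \"house\": \"93\", \"flat\": \"50\", \"postcode\": \"400124\", \"county\": \"Cluj\", \"city\": \"Cluj\"}"}]}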
Creating a Validation File for Fine-Tuning
Once you’ve created the training file, the next step is to prepare a validation file. This file evaluates the accuracy of the fine-tuned Assistant on real data. The structure is similar to the training file, which makes automating the creation of both files more straightforward.
The validation file is designed to test the model’s generalization capabilities. It ensures that the fine-tuned Assistant can handle new inputs and perform consistently, even when faced with unfamiliar or challenging examples.
Structure of the Validation File
System Message:
To maintain consistency with the training process, the system message should be identical to the one used in the training file.
User Message:
Introduces a new input (e.g., an address) that was not included in the training file. The new example should be realistic and domain-specific, as well as challenge the model by including minor errors or complexities.
Assistant Message:
Provides the expected structured response for the new user input.
This serves as the gold standard for validating the model’s accuracy.
To streamline the creation of both training and validation files, automation scripts can be used. These scripts generate properly formatted .jsonl files based on input datasets.
Visit the repository containing scripts to generate training and validation files:
Automated .jsonl generation scripts
It's a Python version of the fine-tuning automation; a PHP version is coming soon.
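Until then, a rough PHP sketch of the same idea (turning the INPUT/OUTPUT pairs shown earlier into .jsonl entries) might look like this; the file names and the system prompt constant are placeholders:
<?php

declare(strict_types=1);

// Placeholder prompt and paths: adapt them to your own dataset layout.
const SYSTEM_PROMPT = 'You are a Named Entity Recognition system for Romanian addresses. Return only a JSON object.';

$lines = file('dataset.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$handle = fopen('training.jsonl', 'wb');

// Expects alternating "input: ..." and "output: ..." lines, as in the raw dataset format above.
for ($i = 0; $i < count($lines) - 1; $i += 2) {
    $input = trim(substr($lines[$i], strlen('input:')));
    $output = trim(substr($lines[$i + 1], strlen('output:')));

    $entry = [
        'messages' => [
            ['role' => 'system', 'content' => SYSTEM_PROMPT],
            ['role' => 'user', 'content' => $input],
            ['role' => 'assistant', 'content' => $output],
        ],
    ];

    // One JSON object per line, keeping Romanian diacritics readable.
    fwrite($handle, json_encode($entry, JSON_UNESCAPED_UNICODE) . PHP_EOL);
}

fclose($handle);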
Why Validation Matters
- Accuracy Testing: Measures how well the fine-tuned model performs on unseen data.
- Generalization: Validates the model’s ability to handle new, complex, or error-prone inputs.
- Feedback Loop: Helps identify areas where additional fine-tuning or data is needed.
Fine-Tuning Your Assistant Manually
If you prefer, you can fine-tune the Assistant manually using OpenAI's graphical user interface (GUI). Once you’ve prepared both the training and validation files, follow these steps to get started:
Step 1: Access the Fine-Tuning Wizard
- Go to the Dashboard and click on "Fine-Tuning" in the menu on the left-hand side.
- Click the green "Create" button to open the fine-tuning menu.
Step 2: Configure Fine-Tuning
In the fine-tuning menu, update the following settings:
- Base Model: At the time of writing, the most cost-effective and high-performing base model is gpt-4o-mini-2024-07-18. Choose the base model that best aligns with your performance and budget requirements.
- Training and Validation Data: Upload the training and validation data files you created earlier.
- Number of Epochs: Set the number of epochs (iterations of the training process over the entire dataset). I recommend starting with 3 epochs, but you can experiment with more if your budget allows.
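If you prefer to start the job from code instead of the GUI, the openai-php client also exposes the fine-tuning endpoint. A minimal sketch, assuming your .jsonl files are already prepared (paths and variable names are placeholders; the model snapshot is the one discussed above):
<?php

// $apiKey is assumed to be loaded from your environment/configuration.
$client = OpenAI::client($apiKey);

// Upload the training and validation files prepared earlier.
$trainingFile = $client->files()->upload([
    'purpose' => 'fine-tune',
    'file' => fopen('training.jsonl', 'r'),
]);

$validationFile = $client->files()->upload([
    'purpose' => 'fine-tune',
    'file' => fopen('validation.jsonl', 'r'),
]);

// Kick off the fine-tuning job on the chosen base model.
$job = $client->fineTuning()->createJob([
    'model' => 'gpt-4o-mini-2024-07-18',
    'training_file' => $trainingFile->id,
    'validation_file' => $validationFile->id,
    'hyperparameters' => [
        'n_epochs' => 3,
    ],
]);

echo $job->id . PHP_EOL; // track this id in the Fine-Tuning Dashboard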
Step 3: Monitor the Fine-Tuning Process
Once the fine-tuning process starts, you can track its progress in the Fine-Tuning Dashboard:
The dashboard provides real-time updates and displays several metrics to help you monitor and evaluate the training process.
Key Metrics to Understand
- Training Loss: Measures how well the model is fitting the training data. A lower training loss indicates the model is effectively learning patterns within the dataset.
- Full Validation Loss: Indicates performance on unseen data from the validation dataset. A lower validation loss suggests the model will generalize well to new inputs.
- Steps and Time: Training steps are the number of iterations in which model weights are updated based on batches of data. The timestamp indicates when each step was processed, which helps monitor the duration of training and the intervals between evaluations.
Interpreting the Metrics
Monitoring the metrics below helps you determine whether the fine-tuning process is converging correctly or requires adjustments.
- Decreasing Loss: Both training and validation losses should ideally decrease over time, eventually stabilizing.
- Overfitting: Occurs when training loss keeps decreasing while validation loss starts to increase or fluctuate. This indicates the model is over-specialized to the training data and performs poorly on unseen data.
- Underfitting: Happens when both losses remain high, showing the model isn’t learning effectively.
Plug in the Model & Evaluate
With your Assistant trained and fine-tuned, it’s time to integrate it into your application and start evaluating its performance.
Start Simple.
Begin with manual tests, either in the Assistants Playground or directly within your application. Avoid overcomplicating your evaluation at this stage; focus on ensuring the basics work as expected.
For example, you can use a simple tool like Google Sheets to manually compare inputs and outputs.
This is the first step.
AI and ML solutions require ongoing evaluation to ensure performance remains consistent. For tasks like Named Entity Recognition, automated tools such as the PromptFlow extension for Visual Studio Code can help streamline testing and validation.
References
OpenAI Fine-Tuning Documentation
Symfony Messenger Documentation
OpenAI PHP Client Library
PromptFlow Extension for VS Code
Thank You!
Thank you for taking the time to explore this guide! I hope it provides you with a solid foundation to build and fine-tune your AI-powered solutions. If you have any questions or suggestions, feel free to reach out or leave a comment.
Happy coding and best of luck with your AI projects! 🚀