Andres Urdaneta

Posted on Feb 24, 2025

Classify & Label Customer Service Chats with Python, OpenAI and Langchain

#ai #webdev #langchain #python

LLMs with Structured Outputs make classifying and labeling text incredibly easy. Today, we’ll write a script to do exactly that—classifying a customer service chat.

We’ll extract and label key details from the chat, including the user’s name, sentiment, language, and the topic of their conversation with a customer service representative.

Before jumping into the code, let’s review the tools and requirements:

Python installed locally
OpenAI API Key
A text file containing the conversation between the representative and the client

Create The Project Directory

Now, let’s go ahead and create our project directory and cd into it.

mkdir text_classification && cd text_classification

Next, let’s create the project’s entry point file in the root directory:

touch main.py

Now, create a virtual environment to keep our dependencies contained so they don’t mess with system-wide Python packages, and then activate it.

python -m venv .venv
source .venv/bin/activate

Installing Dependencies

With our virtual environment in place, let’s go ahead and install our dependencies. We’ll need three packages:

pydantic, a widely used Python validation library. It will let us declare the schema of the structured output response we expect from the LLM.
langchain, a framework and package that makes it easier to work with LLMs in Python.
langchain-openai, a Langchain package that provides seamless integration with OpenAI’s models.

I’m using pip to install these dependencies, so I’ll run the following command in my project terminal:

pip install pydantic langchain langchain-openai

ENVs & Chat Text File

Great! Now that we've installed our dependencies, let's create a .env file to store our OpenAI API key.

OPENAI_API_KEY=sk….

And before finally getting our hands dirty, let’s not forget about the chat data!

I exported a conversation between a customer service representative and a customer from WhatsApp into a text file.

You can use my sample chat file or bring your own.

I’ll drop the text file in the project’s root directory, so we can later pass the contents of that file to the LLM.

Getting Started

Now that everything’s set up, it’s time to crack open main.py and get coding.

First things first, let’s make sure the OpenAI API key is actually set—otherwise we won’t get LLM responses.

import os

if not os.getenv("OPENAI_API_KEY"):
   raise ValueError("OPENAI_API_KEY is not set")

Then, let’s run the following command in our terminal:

python main.py

If no errors pop up, you're good to go.

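But in case you get an error, keep in mind that Python doesn’t read the .env file on its own: os.getenv only sees variables that already exist in the process environment, and re-opening the terminal won’t change that. Either export the key in your shell session before running the script, or load the .env file from within main.py (the python-dotenv package is one common way to do that).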
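Since os.getenv only reads the process environment, one way to make the key in .env visible is to load the file from the script itself. Here’s a minimal sketch that does it with the standard library only. Note that load_env_file is a hypothetical helper written for this post, not part of Langchain or OpenAI, and it assumes simple KEY=VALUE lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env-style file into os.environ.

    Hypothetical helper for illustration only. Variables that are
    already set in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.exists():
        return {}

    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        if key and key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded


# Call this before the os.getenv("OPENAI_API_KEY") check in main.py.
load_env_file()
```

The helper returns the variables it actually set, which is handy for a quick sanity check while debugging.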

Next, let’s import the other modules we’re going to need for this script:

from pydantic import BaseModel, Field
from langchain.chat_models import init_chat_model

The Classification Schema

Now let’s see the meat and potatoes of this classification and labeling script.

We’ll use Pydantic to define a Classification class that serves as the schema we’ll pass later on to the LLM so it knows what information to extract and label from the chat.

class Classification(BaseModel):
    # Free-text field: the model extracts whatever name appears in the chat
    name: str = Field(description="The name of the user")
    # The remaining fields list their allowed labels through enum
    sentiment: str = Field(
        description="The sentiment of the user",
        enum=["positive", "negative", "neutral"],
    )
    language: str = Field(
        description="The language of the user",
        enum=["spanish", "english"],
    )
    issue: str = Field(
        description="The issue of the user",
        enum=["technical", "billing", "account", "other"],
    )

As you can see, the schema’s name field only has a description, which tells the model to extract the client’s name. The other fields also include an enum.

The reason for adding enum to some fields is to make sure the model classifies only within predefined categories, reducing ambiguity and making the data easier to store and analyze.
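As a side note, the same restriction can also be written with typing.Literal, so that Pydantic itself rejects a label outside the allowed set. The sketch below is an alternative, not the schema used in the rest of this post: StrictClassification and the sample values are made up for illustration. Depending on your Pydantic version, passing enum= as an extra Field argument may also emit a deprecation warning, which this variant avoids.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class StrictClassification(BaseModel):
    """Hypothetical variant of the schema that uses Literal types."""

    name: str = Field(description="The name of the user")
    sentiment: Literal["positive", "negative", "neutral"] = Field(
        description="The sentiment of the user"
    )
    language: Literal["spanish", "english"] = Field(
        description="The language of the user"
    )
    issue: Literal["technical", "billing", "account", "other"] = Field(
        description="The issue of the user"
    )


# A label outside the allowed set is rejected when the model is built.
try:
    StrictClassification(
        name="Ana", sentiment="angry", language="spanish", issue="billing"
    )
except ValidationError as exc:
    print("rejected:", exc.errors()[0]["loc"])
```

The trade-off is mostly about where mistakes surface: with Literal, an unexpected label raises a validation error in your script instead of slipping into your data.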

Structured Outputs

Now, let’s create a chat model. We’ll use the with_structured_output method to pass our Classification schema.

Under the hood, Langchain’s with_structured_output method asks the LLM to return output that matches the schema and hands it back to us as a Classification instance.

llm = init_chat_model("gpt-4o-mini", model_provider="openai").with_structured_output(Classification)

Next, we’ll read the chat contents from chat.txt and create a full prompt for the LLM.

# Read the exported chat; an explicit encoding keeps accented characters intact
with open("chat.txt", "r", encoding="utf-8") as f:
    chat_text = f.read()

prompt = """Extract the desired information from the following chat.
Only extract the properties mentioned in the 'Classification' function.
Conversation:\n"""

Finally, let’s invoke the LLM with our prompt and the chat contents, and print the result:

response = llm.invoke(prompt + chat_text)
print(response)

In our terminal, let’s run the script:

> python main.py
name='Andres Urdaneta' sentiment='neutral' language='spanish' issue='other'
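Printing the result is fine for a demo, but you’ll probably want to store the labels somewhere. Here’s a rough sketch that writes one classification as a CSV row. To keep it runnable without an API call, it uses a plain dict as a stand-in for the response (with Pydantic v2, response.model_dump() gives you such a dict). The append_row helper and the labels.csv filename are assumptions for illustration.

```python
import csv
import io

# Stand-in for the model's answer; the values mirror the output shown above.
result = {
    "name": "Andres Urdaneta",
    "sentiment": "neutral",
    "language": "spanish",
    "issue": "other",
}


def append_row(stream, row, write_header=False):
    """Write one classification as a CSV line to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=list(row))
    if write_header:
        writer.writeheader()
    writer.writerow(row)


# An in-memory buffer keeps the example self-contained; on disk you could
# use open("labels.csv", "a", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
append_row(buffer, result, write_header=True)
print(buffer.getvalue())
```

Because every field except name is limited to a fixed set of labels, the resulting columns are easy to group and count later on.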

Wrap-up & Next Steps

And there you have it: a fully functional script that classifies and labels customer service chats using Python, the OpenAI API, and Langchain!

With just a few lines of code, we structured an unorganized conversation into clear, actionable data.

This setup can serve as the foundation for automating customer insights, building smarter chatbots, or even integrating AI-driven analytics into your workflow.

Try tweaking the classification schema, adding more categories, or even chaining multiple prompts together, and have fun!

If you have any questions or would like to connect, hit me up on X or LinkedIn

Read this article on my website

DEV Community