DEV Community: Daniele

Advanced prompting techniques: Code Interpreter

Daniele — Wed, 03 Jan 2024 08:00:00 +0000

My company operates in an industry where turnarounds are pretty quick. Sometimes, listings can be gone in a matter of minutes! It's in our best interest to optimize our listing process so that items can be listed on the marketplace very quickly.

In theory, this should be straightforward – just offer an intuitive flow to sellers, and voila! Well… That doesn't work for us, and here's why:

We work with wholesale quantities, not single items. This means listings can contain tens of thousands of different SKUs. The time it takes to upload a listing increases linearly with the number of items in the listing.
The industry in which we operate does not have specific standards or well established best practices. No two catalogs we receive are the same. Sometimes the same seller can give us two catalogs in two different formats.
Most of our sellers do not have the means or the time to convert their format to ours. This is an opportunity for us – our white glove approach is one of reasons sellers like to work with us.
Even if we had the perfect catalog in the perfect format, our taxonomy needs to align with the seller. Sometimes we receive items in the "Shoes" category, which we'll need to translate to "Footwear", which is our equivalent in our taxonomy. And we'll have to repeat this across hundreds of categories and tens of thousands of lines.

So far, our team has resorted to a combination of traditional prompting techniques and process automation to speed up our listing process. The results were promising but we didn't significantly increase our speed. Besides, these initial attempt was heavy on the code, which didn't make our solution suitable for people with limited coding skills.

Thankfully, OpenAI introduced Code Interpreter earlier in 2023. In short, Code Interpreter allows you to define assistants that can write their own code, and execute it in a sandboxed, firewalled environment. It works beautifully in our context:

It removes the need to install Python environment and to deal with the command line.
There is little coding involved, so most of the iteration is at the prompt level.
We can still pass inputs (such as our taxonomy) into the prompt, keeping our prompt up to date.
With a detailed prompt, we can automate our team's operation end-to-end, removing the need to run a piecemeal process (which in turn improves our listing turnaround time)

Prompting for Code Interpreter

Prompting for Code Interpreter is not different than prompting for other contexts. In fact, the only difference is in how we explicitly create an Assistant for that purpose:

file = client.files.create(
  file=open('listing.csv', 'rb'),
  purpose='assistants'
)

instructions="""
You are a data entry operator who is tasked to reorganize the information of a CSV file provided by the user. Do your best to reorganize the information based on the user input, but do not ask for the user's help.
"""

assistant = client.beta.assistants.create(
  instructions=instructions,
  model="gpt-4-1106-preview",
  tools=[{"type": "code_interpreter"}],
  file_ids=[file.id]
)

Here we first uploaded a file containing our listing data, so it can be referenced by the Assistant. We then created a new Assistant, and we gave it code_interpreter tool it can use. This tool comes with no argument specification; since Code Interpreter is built into the model, the Assistant will already know how to invoke it.

As far as prompting goes, here's how we're going to proceed:

We'll first get our updated taxonomy before each run, and feed it to the prompt.
We'll give the prompt precise instructions about the relationship about items in our taxonomy (so that, for example, the Assistant does not misgender sizes or categories)
We specify the output columns we want and the output format (an Excel file in this case)

Here's a simplified version of the initial version of our prompt:

prompt = f"""I have a file I'll need to reorganize in a different format. The input file will contains product information, such as:
{columns}

The file can contain additional information that can be relevant to the task, such as:
{file_specific_info}

Give then input file, create an output file with the following columns:
{output_columns}

Do the following for each row:

- Check if each cell in the row contains the above information. If so, start extracting this information and add them in the right column. For example, if the product description contains the item's brand, move the brand in the Brand column for that item.

- Check if the product description or any other information in the row contains brand information. For example, the product description may contain a brand name. If you find a brand in that row, add it to the Brand Column.

- Check if a URL is present in the row, and check is formatted in a way that seems to be pointing to an image. If that's the case, add it to the Image URL column for that row. If not, leave blank.

- Read the SKU Title or product description carefully, then infer the value for Category Group from the information available in that same row, then choose one of these output values:
{taxonomy.category_groups}

If you do not have enough information to determine a Category Group for an item, leave its output value empty.

- For each row, infer the value for Department from the information available in that same row, then choose one of these output values:
{taxonomy.departments}

Leave Department empty if Category Group is Apparel or Footwear. Also leave empty if you do not have enough information determine its output value.

- If sizes are provided, translate each size to one of the following values, or leave empty if you cannot determine the right sizing:
{taxonomy.sizes}

Note that the input file may list multiple sizes for the same item. If that's the case, separate each size into its own row for the same item.

- If two rows end up having the same combination of SKU Title, Size, Price, and MSRP, append one or more additional information (such as Style or SKU) to the Style column so that it creates an unique value.

Some other suggestions for you:
Try to understand what type of data each input column contains, then map it to the relevant output column. When necessary, make assumptions using both conventional knowledge and specific knowledge of the retail industry.

If some values are not available, or if you cannot determine them with confidence, simply leave them blank.

The output should be an Excel file."""

Few shots, less flaws

Our basic prompt structure only provides task context, inputs, and the immediate task description. Even so, the prompt is fairly accurate at determining input and output data, as well as applying some heuristics to infer missing information like categories and style descriptions, when missing.

In our tests, a few-shots variant of the prompt performs significantly better, both in terms of accuracy and latency.

It takes time to gain time

It may sound counterintuitive, but it takes time to shave time off our processes. And that's okay: in our case, prompt engineering is a fixed time cost that pays off dividends in a short amount of time. And with the time we saved in making our processes more efficient, we were able to scale our operations, which in turn freed up more bandwidth to optimize our prompts.

Advanced prompting techniques: function calling

Daniele — Tue, 26 Dec 2023 17:48:42 +0000

LLMs cannot rely on real-time knowledge. For example, an AI can't tell you what time is it right now, or change their prediction of a winning team given the live score of a football game. Is there a way to overcome this limitation?

As a user, prompting is probably the most important aspect when dealing with AIs. Writing the right prompt not only can make an LLM more accurate, but it will confer superpowers it can't provide out of the box.

One of these superpowers is to feed external data to in the prompt. This gives the assistant additional context to provide a more accurate answer. As the model gathers information, we can include these details in the prompt itself (a technique called prompt chaining), so that the model's context grows as it thinks through an accurate answer.

These techniques work well when we provide data to the prompt at the beginning of the conversation. For example, the model can help us categorize data based on criteria we add to the prompt at the beginning of the conversation. But what if results change during the conversation? And what if the model needs to dynamically access data we couldn't provide?

We can use a variation of those techniques to let LLMs know they can use tools, so that they won't have to think for themselves when they cannot provide an accurate answer. This is a technique on itself, and it's called function calling (or tool use).

In tool use, we first tell the model that there are tools available. We describe what are those tools and how they can be invoked. We encourage the model to call those tools whenever it cannot think of an accurate answer, or in situations when it needs access to data it doesn't have. The model will decide what function it needs, and it will output a function call when needed. On the client side, the code will detect a function call, execute it, and return its result in the next prompt. The model can then use those results and decide what to do next (including making another function call).

Here, I'll use Claude from Anthropic. I choose it for a few reasons:

Anthropic takes AI safety seriously, which makes it easier to deal with harmful responses. Claude is trained using a combination of RLHF and Constitutional AI, which makes it quite resilient out-of-the-box from harmful prompts.
Anthropic just released a new Messages API (in beta at the time of writing), and I wanted to try it out!

What's the weather like?

Now, suppose we want to get the weather in a specific location. The model will first need to convert the user-provided location to a set of coordinates, then get the weather for those coordinates. We will code two functions:

get_lat_long: converts a location name to a set of coordinates (latitude and longitude)
get_weather: gets the weather for that particular set of coordinates.

We'll need to define how those functions will work. In the code, we'll add a specification to describe what the function does, along with its name and arguments. Here's how the specification for get_weather looks like:

get_weather_description_json = {
  "name": "get_weather",
  "description": "Returns weather data for a given latitude and longitude.",
  "parameters": [{
      "name": "latitude",
      "type": "string",
      "description": "The latitude coordinate as a string"
    },
    {
      "name": "longitude",
      "type": "string",
      "description": "The longitude coordinate as a string"
    }
  ]
}

We'll likely have a specification like this for each one of the functions we want Claude to use. We will then chain all these descriptions together and encourage Claude to use them. This is done in the prompt:

In this environment you have access to a set of tools you can use to answer the user's question. You may call them like this:
<function_calls>
{
  "function_calls": [
    {
      "tool_name": "$TOOL_NAME",
      "parameters": {
        "$PARAMETER_NAME": "$PARAMETER_VALUE"
      }
    }
  ]
}
</function_calls>


Only invoke one function at a time and wait for the results before invoking another function:

<functions>
{
  "function_calls": [
    {
      "name": "get_weather",
      "description": "Returns weather data for a given latitude and longitude.",
      "parameters": [{
        "name": "latitude",
        "type": "string",
        "description": "The latitude coordinate as a string"
      },
      {
        "name": "longitude",
        "type": "string",
        "description": "The longitude coordinate as a string"
      }
    ]
  },
  {
    "name": "get_lat_long",
    "description": "Returns the latitude and longitude for a given place name.",
    "parameters": [{
      "name": "place",
      "type": "string",
      "description": "The place name to geocode and get coordinates for."
  }]
]}
</functions>

The user will ask something like this:

What's the weather like in San Francisco?

JSON meets XML

Anthropic recommends enclosing instruction in XML tags, but one hidden gem about Claude is that it can understand JSON quite simply too. In my test, I tried writing specifications completely in XML (which is Anthropic usually does), and when I converted it into JSON, I found the model was equally good at understanding instructions and making calls. The XML/JSON combination also has the advantage of simplifying the parsing logic in the client, which is nice!

AI calls, function responds

When we run this prompt, Claude will output something like this:

<function_calls>
{"function_calls": [{"tool_name": "get_lat_long", "parameters": {"place": "San Francisco"}}]}
</function_calls>
To get the weather for San Francisco, I first needed to find the latitude and longitude coordinates. I used the get_lat_long tool to geocode "San Francisco" and get its geographic coordinates.
Let's check the response to see if we got the coordinates:

Claude correctly understood how to call a function, and it is in fact asking to call one. We see JSON wrapped in XML tags, but we can't parse it as is. We'll need to get rid of the unwanted output, and then isolate and parse the JSON string. To get rid of the more talkative part, we configure a stop sequence in the API request. Stop sequences will stop any output starting at the location the sequence is encountered. Configuring a stop sequence of </function_calls> will automatically skip any output that starts with </function_calls>.

The output will then look like this:

<function_calls>
{"function_calls": [{"tool_name": "get_lat_long", "parameters": {"place": "San Francisco"}}]}

Now we just have to remove the initial tag and parse the JSON. We can do something like this:

tag = "<function_calls>"
tools_string_index = content.find(tag)

tools_string = content[tools_string_index + len(tag):]

tools_to_invoke = json.loads(tools_string)

function_call = tools_to_invoke[0]
# function call will be {"tool_name": "get_lat_long", "parameters": {"place": "San Francisco"}

call_function(**function_call)

Now we have a dictionary with the name of the function and its parameters. In our code, we have a receiver function that will accept Claude's arguments and return a result:

def call_function(tool_name, parameters):
    func = getattr(tools, tool_name)
    output = func(**parameters)
    return output

We take the result, serialize it back into the prompt, and feed it back to Claude.

Claude will then choose to invoke the get_weather function with the latitude and longitude provided by get_lat_long (which is now part of the prompt). This will return the correct answer:

The weather data shows it is currently 12.3°C in San Francisco, with 5.2km/h winds from the west-southwest direction. The weather code indicates cloudy skies.

Let me know if you need any other weather details for San Francisco!

Imagine the possibilities

Function calling is a very powerful technique, and it can be used to go far beyond making API calls.

We could describe a SQL schema, and Claude could perform SQL queries to obtain data from a database
The model could ask for a specific file in the local filesystem
The model could execute OS commands
The model could come up with some search keywords, and the client can search the internet for those. This technique is usually employed as Retrieval Augmented Generation.

With the right prompt, your model can put all its knowledge to great use, and expand the realm of usefulness you can derive from it.