DEV Community

M Sea Bass

OpenAI Realtime API: Points to keep in mind when executing function calling

I ran into a small issue while executing functions with the Realtime API, so I'm leaving this note as a record of the solution. The full code is available in the GitHub repository below for reference.

Voice Chat Repository

The repository code is based on an Azure sample. I have also written an explanation of the Azure sample code, so feel free to read it if you're interested.

Flow of Function Execution

1. Preparing the Function

First, define the tool that the model can call. The following code defines a function called get_your_info that takes a single required string parameter, query.

TOOLS = [
    {
        "type": "function",
        "name": "get_your_info",
        "description": "Get the information of your own.(e.g. name, age, hobby, favorite food, favorite color, favorite music, favorite movie, favorite game, favorite sport, favorite team, favorite player, favorite programming language)",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The query to get the information",
                },
            },
            "required": ["query"],
        },
    }
]
TOOL_CHOICE = "auto"
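The tool definition only describes the function to the model; the Python function itself and the TOOL_MAP lookup used later in call_tool are not shown in this post. A minimal sketch of what they might look like follows. The PROFILE data and the keyword-matching logic are made up for illustration, and the real repository may do this differently.

```python
# Hypothetical profile data, used only for illustration.
PROFILE = {
    "name": "Example User",
    "favorite food": "sushi",
    "favorite programming language": "Python",
}


def get_your_info(query: str) -> str:
    """Return the profile entries whose key appears in the query (case-insensitive)."""
    query_lower = query.lower()
    hits = [f"{key}: {value}" for key, value in PROFILE.items() if key in query_lower]
    return "\n".join(hits) if hits else "No matching information found."


# Maps the tool name declared in TOOLS to the Python callable that implements it.
TOOL_MAP = {"get_your_info": get_your_info}
```

Returning a plain string keeps the result ready to send back as the function output without further conversion.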

2. Setting Up the Session and Preparing the Function Call

Next, update the session so the model knows about the function. Pass the tool definitions (TOOLS) and set tool_choice to "auto" (TOOL_CHOICE), which lets the model decide when to call a function.

await client.send(
    SessionUpdateMessage(
        session=SessionUpdateParams(
            model=model,
            modalities=["text", "audio"],
            input_audio_format="pcm16",
            output_audio_format="pcm16",
            turn_detection=ServerVAD(type="server_vad", threshold=0.5, prefix_padding_ms=200, silence_duration_ms=200),
            input_audio_transcription=InputAudioTranscription(model="whisper-1"),
            voice=VOICE_TYPE,
            instructions=INSTRUCTIONS,
            temperature=TEMPERATURE,
            max_response_output_tokens=MAX_RESPONSE_OUTPUT_TOKENS,
            tools=TOOLS,
            tool_choice=TOOL_CHOICE,
        )
    )
)
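For readers who are not using the client library's message classes, the same settings can be sent as a raw session.update event over the WebSocket. The sketch below only builds the JSON payload; the helper name build_session_update and the shortened tool definition are assumptions for illustration, and the other session fields from the code above are omitted.

```python
import json


def build_session_update(tools: list[dict], tool_choice: str = "auto") -> str:
    """Serialize a session.update event that registers function tools."""
    event = {
        "type": "session.update",
        "session": {"tools": tools, "tool_choice": tool_choice},
    }
    return json.dumps(event)


# Shortened version of the tool definition from step 1.
tools = [
    {
        "type": "function",
        "name": "get_your_info",
        "description": "Get the information of your own.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]
payload = build_session_update(tools)
```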

3. Executing the Function and Retrieving Arguments

When the model calls the function, the arguments arrive in a response.function_call_arguments.done message as a JSON string. The code below parses them and passes them to call_tool.

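import json

message = await client.recv()
...
match message.type:
    case "response.function_call_arguments.done":
        print("Response Function Call Arguments Done Message")
        print(f"  Response Id: {message.response_id}")
        print(f"  Arguments: {message.arguments}")
        try:
            # The arguments arrive as a JSON string, so decode them first.
            arguments = json.loads(message.arguments)
            await call_tool(client, message.item_id, message.call_id, message.name, arguments)
        except json.JSONDecodeError as e:
            # Handler added so the snippet is complete; adjust as needed.
            print(f"  Failed to parse arguments: {e}")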
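Because the model generates both the function name and the argument string, it can help to guard the dispatch step. The following is a sketch under my own assumptions rather than the repository's code: run_tool is a hypothetical helper that parses the arguments, looks up the tool, and always returns a string, reporting problems as a small JSON error object that could be sent back to the model.

```python
import json
from typing import Callable


def run_tool(tool_map: dict[str, Callable], name: str, raw_arguments: str) -> str:
    """Parse the JSON arguments, run the named tool, and return a string result."""
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid arguments: {exc.msg}"})
    if not isinstance(arguments, dict):
        return json.dumps({"error": "arguments must be a JSON object"})

    tool = tool_map.get(name)
    if tool is None:
        return json.dumps({"error": f"unknown tool: {name}"})

    result = tool(**arguments)
    # The function output is sent as a string, so serialize anything else.
    return result if isinstance(result, str) else json.dumps(result)


# Usage with a stand-in tool:
demo_map = {"echo": lambda query: query.upper()}
demo_output = run_tool(demo_map, "echo", '{"query": "hi"}')
```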

4. Sending the Function Result

The function result is sent with conversation.item.create as a function call output item. However, sending the result alone does not make the model respond, so you need to explicitly trigger response generation with response.create.

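async def call_tool(client: RTLowLevelClient, previous_item_id: str, call_id: str, tool_name: str, arguments: dict):
    # Look up and run the Python function registered for this tool name.
    tool_func = TOOL_MAP[tool_name]
    tool_output = tool_func(**arguments)
    print(f"tool_output: {tool_output}")
    # Send the result back (conversation.item.create); output should be a string.
    await client.send(
        ItemCreateMessage(
            item=FunctionCallOutputItem(
                call_id=call_id,
                output=tool_output,
            ),
            previous_item_id=previous_item_id,
        )
    )
    # Explicitly ask the model to respond (response.create); empty params are fine.
    await client.send(
        ResponseCreateMessage(
            response=ResponseCreateParams()
        )
    )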

The response parameters passed to response.create can be empty. Sending the message is what prompts the model to generate a response that uses the function result.
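At the wire level, these two messages correspond to two JSON events sent in order. The sketch below builds them without any client library; the helper name build_function_result_events and the sample call_id are hypothetical, and the optional previous_item_id field is left out for brevity.

```python
import json


def build_function_result_events(call_id: str, output: str) -> list[str]:
    """Return the two events to send, in order, after a function call finishes."""
    item_event = {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,  # must match the call_id from the function call
            "output": output,    # the function result as a string
        },
    }
    # Without this second event, the model does not respond to the result.
    response_event = {"type": "response.create"}
    return [json.dumps(item_event), json.dumps(response_event)]


events = build_function_result_events("call_123", "favorite food: sushi")
```

Each string in events would be sent as a separate WebSocket text message.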

Summary

The key point is that after sending the function result, you must trigger response generation with response.create. If you skip this step, the model will not produce a response from the function result. It's a simple fix, but I hope it helps anyone facing the same issue.
