
Summary of LLM Function Calling


What I’m discussing here is how Wrtn Technologies is building open source software.

I’ll explain in several parts how our team is preparing for the AI era.

If you’re not a developer, you can safely skip over the code details without missing the main ideas.

If you’re curious about our technology, please check out our open source repository at https://github.com/wrtnlabs/agentica.

Introduction

‘AI’ is the hottest keyword right now.

We often hear hopeful, and sometimes daunting, stories of liberation from labor through AI,

while others say they never want to go back to a time without it.

However, discussions about AI tend to be so entangled with complex metrics that it becomes difficult to understand what is really being said.

So, from a pure backend developer’s perspective, let’s talk about how AI is transforming our lives, particularly the field of development.

I’ve tried to write this as simply as possible so that even those with no development background can grasp some of the insights.

The Past of Backend Development, and Function Calling

Backend development is essentially server development, and server development is all about designing contracts.

A server is a collection of promises like “if you give me A, I will give you B”, which is why I often liken it to a vending machine.

Imagine a vending machine built from a series of promises about how much money to insert, which button to press, and what drink will come out.

However, with the advent of AI, one element has changed: the entity responsible for calling APIs.

Until now, API calls were triggered by user actions on the pages that frontend developers built.

A user would click a button, and just like pressing a button on a vending machine to get a drink, the process was straightforward.

But the emergence of AI has introduced a scenario where the function can be “called” without any user clicking a button or scrolling.

We call this concept “Function Calling”.

Defining Function Calling

Enable models to fetch data and take actions.

*Function calling provides a powerful and flexible way for OpenAI models to interface with your code or external services, and has two primary use cases:*

  • Fetching Data
  • Taking Action

According to OpenAI’s definition of “function calling,” it enables models to fetch data or perform actions.

In web development, fetching data is essentially an HTTP GET request and taking action is a POST (or similar) request, so both use cases boil down to the same thing: making an API call.

This definition means that at the moment when needed, the model will make a GET or POST request, effectively “firing” an API call.

So, how exactly does the model autonomously “fire” an API call?

OpenAI SDK and the Principle of Function Calling

To explain this, let’s first discuss how we interact with APIs through LLMs.

Calling an API via an LLM gives the impression of having a “conversation.”

But for those who understand the underlying structure, it’s really just asking “What would you say in this situation?” at every moment.

Consider this pseudo-code:

  • The function’s name is “What would you say in this situation?”
  • The function’s argument is the entire conversation history:
    • This includes what the user said, what the AI said, and any additional contextual information.

That’s it.

Thus, a conversation through an LLM simply accumulates the conversation history as parameters,

and the impression of dialogue is merely the result of continuously appending previous messages.
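To make that concrete, here is a minimal sketch in TypeScript. The `talk` helper and the example messages are my own illustration, not part of any SDK; the only real API used is `openai.chat.completions.create`:

import { OpenAI } from "openai";

const openai = new OpenAI();

// The whole "conversation" is nothing more than this array, growing turn by turn.
const histories: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [];

// Hypothetical helper: one turn of dialogue is one "What would you say in this situation?" call.
async function talk(userMessage: string): Promise<string> {
  histories.push({ role: "user", content: userMessage });

  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: histories, // hand the model everything said so far
  });

  const answer = completion.choices[0].message.content ?? "";
  histories.push({ role: "assistant", content: answer }); // remember the reply for the next turn
  return answer;
}

await talk("What should I eat today?");
await talk("Something near Gangnam Station, please."); // the "memory" exists only because we resend the history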

It might sound trivial, but this very mechanism allows us to talk about function calling.

Since you can manipulate the conversation history by asking “What would you say in this situation?”,

isn’t it possible for the API call result to simply be inserted into the conversation as if the AI had made the call?

For example:

  • User: “What should I eat today?”
  • LLM: “I received the result *** from your request.”
  • LLM: “Based on a search through a mapping app, the best restaurant around Gangnam Station is A!”

Function Calling at the Code Level

import { OpenAI } from "openai";

const openai = new OpenAI();

const tools = [{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City and country e.g. Bogotá, Colombia"
        }
      },
      "required": [
        "location"
      ],
      "additionalProperties": false
    },
    "strict": true
  }
}];

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is the weather like in Paris today?" }],
  tools, // When you pass tools along with the prompt, GPT can choose from these tools!
  store: true,
});

The LLM itself isn’t executing the API call.

So, while it may seem like the API is being fired, in reality, the LLM is simply choosing a tool and filling in the parameters for an API request.

Please pay close attention to the commented part in the code!

💡 When you provide tools while prompting GPT, it may select one of those tools!

The essence of GPT’s Function Calling is that the AI, based solely on the conversation, can only fill in the arguments for the chosen tool—it does not actually call the function itself.
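Concretely, what comes back from the `create` call above is a completion whose message carries a `tool_calls` array, not an executed result. A minimal sketch of reading it (the logged values are illustrative):

// Continuing from the completion above: the model has not executed anything.
// It only returned "which tool, with which arguments" as structured data.
const message = completion.choices[0].message;

if (message.tool_calls && message.tool_calls.length > 0) {
  const toolCall = message.tool_calls[0];
  console.log(toolCall.function.name);      // e.g. "get_weather"
  console.log(toolCall.function.arguments); // e.g. '{"location":"Paris, France"}' (a JSON string)

  const args = JSON.parse(toolCall.function.arguments); // { location: "Paris, France" }
  // Actually executing get_weather(args.location) is our code's job, not the LLM's.
}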

So what remains is: if GPT’s output appears to be a conversational response, it is shown directly to the user;

if GPT’s output is the selection of a tool with the filled-in parameters, then that tool should be called instead,

and the LLM is made to appear as though it has performed the call.

Below is the complete pseudo-code written in TypeScript.

TypeScript Pseudo-code for Function Calling

const histories = [{ role: "user", content: "What is the weather like in Paris today?" }];

// 1. Provide the conversation history to the LLM and ask for a response.
const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: histories, // Provide context to GPT by including the conversation history.
  tools, // Pass tools to GPT so that it might choose one!
  store: true,
});

// 2-1. If the output indicates that a tool was selected:
if (isTool(completion)) {
  // extract the parameters,
  const { params, query, body } = getParameters(completion);

  // call the function on behalf of the LLM,
  const called = await functionCall(url, params, query, body);

  // and add the result of the function call to the conversation history.
  histories.push(getContents(called));

  // Then ask again, so the LLM can phrase the raw result as an answer.
  const followUp = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: histories, // Provide GPT with the updated conversation including the API call result.
    store: true,
  });

  histories.push(getContents(followUp));
} else {
  // 2-2. If a tool wasn’t chosen, simply show the conversational response to the user and continue the dialogue.
  histories.push(getContents(completion));
}

In summary, function calling hinges on having a client or server make the function call on behalf of the LLM.
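For reference, here is roughly what the abstract `getContents(called)` step and the second `create` call look like against the actual Chat Completions API. This sketch continues the `get_weather` example (`message`, `toolCall`, and `histories` come from the snippets above), and the weather payload is made up:

// The assistant message that requested the tool goes back into the history first...
histories.push(message);

// ...followed by a "tool" message carrying the result of the call we made ourselves.
histories.push({
  role: "tool",
  tool_call_id: toolCall.id, // ties the result to the exact call it answers
  content: JSON.stringify({ temperature: 14, unit: "celsius" }), // hypothetical result of our own API request
});

// One more "What would you say in this situation?" round trip turns the raw result
// into the natural-language answer the user actually sees.
const finalAnswer = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: histories,
});

histories.push(finalAnswer.choices[0].message);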

The Future Paradigm of Backend Development

Until now, backend developers designed servers, wrote documentation, and then passed it to frontend developers.

Frontend developers would interpret that documentation, coordinate with designers, and connect the functionality to the UI.

But now that we can let an LLM make API calls using just a set of tools, much can be accomplished through simple chat.

Imagine being able to say:

“Send an email to kakasoo!”

and having the email sent without navigating through a separate email service.

Similarly, in domains like commerce or advertising, even a user without domain knowledge can get tasks done with the LLM’s assistance.

A single chat interface can provide the necessary UI to achieve the desired results.

This leads to a question: what will the role of backend developers be?

I don’t believe backend developers will end up hand-writing “tool” definitions like the one below just to hand them off to others.

// Creating tools that match the API is such a tedious task...
// Isn’t there a way to improve this?
const tools = [{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City and country e.g. Bogotá, Colombia"
        }
      },
      "required": [
        "location"
      ],
      "additionalProperties": false
    },
    "strict": true
  }
}];

Manually creating these tools is impractical, and any changes would be a nightmare to manage.

This led us to think: what if we could automate even this process?

Then, wouldn’t it be possible to automatically integrate the API every time it’s created?

At Wrtn Technologies, our team has researched and developed this automation, and we’ve released it as an open source project.

We call this library Agentica, which provides tools to automate agent development.

import { Agentica, createHttpLlmApplication } from "@agentica/core";
import { OpenApi } from "@samchon/openapi";

const document: OpenApi.IDocument; // This represents a JSON-formatted Swagger document.
const tools = createHttpLlmApplication({ model: "chatgpt", document });
const agent: Agentica = new Agentica({
  controllers: [
    {
      name: "shopping",
      application: tools, // Convert Swagger documentation to tools!
      connection: { host: "http://localhost:3000" },
    },
  ],
  ...
});
await agent.conversate("I wanna buy MacBook Pro");

We’ve made the code so simple that any TypeScript developer can understand and use it,

resulting in an interface that connects LLMs immediately with your API through Swagger!

Conclusion

From working on this open source project, I’ve drawn several key insights:

  1. People who think in terms of domain knowledge and service-centric design are more likely to thrive.
  2. The ability to design systems that align with business needs may become more important than merely writing efficient code.
  3. Unlike frontend developers, LLMs don’t tolerate errors in documentation: if the docs are inaccurate, the calls they produce will be wrong, so accurate documentation becomes crucial.

For example, an advertising developer might need to delve into a marketing textbook and then translate that into a workable design.

So what will become of frontend developers?

And what happens if the Swagger documentation is incorrect?

I have some thoughts on these questions as well, which I plan to share next time if you found this post interesting!
