OpenAI vs Gemini : Function Calling & Autonomous Agent

Function calling and building autonomous agents are crucial aspects of AI application development. While most articles and research in this area use Python, this article explores these topics using Java. Additionally, I will share my experience building LLM agents with both Gemini and OpenAI in Java.
This article does not aim to compare features between the platforms, as both excel in their own capacity. Instead, it reflects my experiences and learnings from contributing to the open-source project Tools4AI.

What is Function Calling

Function calling with AI, particularly in the context of large language models like ChatGPT and Gemini, refers to the ability of these AI models to interact with and execute code within a controlled environment. This capability allows the AI to perform specific tasks, such as solving mathematical problems, generating visual content, making predictions based on data, or interacting with other software interfaces in a sandboxed or secure manner. The AI effectively calls functions—blocks of code designed to perform particular tasks—based on user requests or its own decision-making processes.


This feature extends the utility of AI beyond generating and understanding text, enabling it to:

Execute Code: Run snippets of code in various programming languages to solve problems, explain concepts, or demonstrate programming techniques.

Interact with Tools and APIs: Use integrated tools or interfaces to perform tasks like browsing the internet (in a controlled way), generating images, or accessing databases, all within the constraints of privacy and security policies.

Process and Analyze Data: Perform data analysis, make calculations, and generate visualizations to support decision-making or explain complex data-driven insights.

Simulate Conversations with Systems: Simulate interactions with software systems, APIs, or hypothetical scenarios, demonstrating how systems might respond to certain inputs or actions.

These capabilities make AI models not just sources of information or conversation partners but also powerful tools for exploring a wide range of subjects and performing tasks that require interaction with systems or processing of data.

Gemini
Google Gemini is a family of multimodal large language models developed by Google DeepMind; it also powers a chatbot of the same name. Being multimodal means these models can understand and generate content across different types of data, not just text.
Gemini has out-of-the-box Java support, allowing developers to utilize the Gemini API within Java applications. This support enables Java developers to integrate Gemini's AI capabilities, such as generating text responses and processing image inputs, directly into their Java-based projects. By leveraging the Gemini API, developers can enhance their applications with advanced AI features provided by Google's multimodal large language models. For details on how to use Gemini with Java, please take a look at this article.

Function calling is also supported by the Java API; you can read about how to do function calling in Java with Gemini here.
Using Gemini's API here, I am adding two functions:

FunctionDeclaration functionDeclaration = FunctionDeclaration.newBuilder()
                    .setName("getRecipeTaste")
                    .setDescription("provide the taste of recipe based on name")
                    .setParameters(
                            Schema.newBuilder()
                                    .setType(Type.OBJECT)
                                    .putProperties("recipe", Schema.newBuilder()
                                            .setType(Type.STRING)
                                            .setDescription("recipe")
                                            .build()
                                    )
                                    .addRequired("recipe")
                                    .build()
                    )
                    .build();

and the second function is here:

FunctionDeclaration functionTaste = FunctionDeclaration.newBuilder()
        .setName("getHealthyDiet")
        .setDescription("provide the taste of recipe based on name")
        .setParameters(
                Schema.newBuilder()
                        .setType(Type.OBJECT)
                        .putProperties("taste", Schema.newBuilder()
                                .setType(Type.STRING)
                                .setDescription("taste")
                                .build()
                        )
                        .addRequired("taste")
                        .build()
        )
        .build();
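Both declarations then have to be wrapped in a Tool and attached to the model. Below is a minimal sketch using the Vertex AI Java SDK; the model name and the vertexAi client variable are illustrative, and the exact builder methods may differ slightly between SDK versions.

Tool tool = Tool.newBuilder()
        .addFunctionDeclarations(functionDeclaration)
        .addFunctionDeclarations(functionTaste)
        .build();

// 'vertexAi' is an already initialized com.google.cloud.vertexai.VertexAI client
GenerativeModel model = new GenerativeModel("gemini-1.0-pro", vertexAi)
        .withTools(Arrays.asList(tool));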

The responses from both functions are passed back in this way:

Content content =
        ContentMaker.fromMultiModalData(
                PartMaker.fromFunctionResponse(
                        "getRecipeTaste",
                        IndianFoodRecipes.getRecipe()),
                PartMaker.fromFunctionResponse(
                        "getHealthyDiet",
                        IndianFoodRecipes.getHealthy()));

When you call Gemini with a prompt like this:

String promptText = "whats the taste of paneer butter masala? and is it healthy option for someone who is on diet?";
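Gemini does not answer this prompt directly; it first replies with function calls for getRecipeTaste and getHealthyDiet. You execute those functions locally and send the results back (the content object built above with ContentMaker and PartMaker) so the model can compose the final answer. A rough sketch of that round trip using a ChatSession, where helper names such as ResponseHandler may vary by SDK version:

ChatSession chat = model.startChat();

// First turn: the model replies with functionCall parts instead of text
GenerateContentResponse response = chat.sendMessage(promptText);
System.out.println(ResponseHandler.getContent(response));

// Second turn: send back the function responses built above
GenerateContentResponse finalResponse = chat.sendMessage(content);
System.out.println(ResponseHandler.getText(finalResponse));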

Tools4AI with Gemini
With Tools4AI you can perform function calling in a much simpler way.
Write the action class in Java:

public class SimpleAction 

Annotate the class with the @Predict annotation:

@Predict

Implement a method and annotate it with @Action:

@Action
public String whatFoodDoesThisPersonLike(String name) {
    if("vishal".equalsIgnoreCase(name))
        return "Paneer Butter Masala";
    else if ("vinod".equalsIgnoreCase(name)) {
        return "aloo kofta";
    }else
        return "something yummy";
}
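Putting the pieces together, the whole action class is just the snippets above assembled into one file:

@Predict
public class SimpleAction {

    @Action
    public String whatFoodDoesThisPersonLike(String name) {
        if ("vishal".equalsIgnoreCase(name))
            return "Paneer Butter Masala";
        else if ("vinod".equalsIgnoreCase(name))
            return "aloo kofta";
        else
            return "something yummy";
    }
}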

That's it, it's that simple.

String prompt = "My friends name is Vishal ,I dont know what to cook for him today.";
GeminiActionProcessor processor = new GeminiActionProcessor();
String result = (String) processor.processSingleAction(prompt);

This code will automatically detect which action to take and call it with the required argument value (in this case, "vishal").
The complete class is here.

OpenAI
OpenAI itself does not provide an official Java SDK for directly interacting with its API. However, the OpenAI API is a RESTful service, which means it can be accessed from any programming language capable of making HTTP requests, including Java. This allows developers to integrate OpenAI's capabilities into Java applications by constructing HTTP requests to communicate with the API.
Some of the libraries for Java interaction with OpenAI are:

  • langchain4j/langchain4j
  • TheoKanning/openai-java
  • Azure/azure-sdk-for-java
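Even without any of these libraries, a plain chat completion request needs nothing more than the JDK's built-in HTTP client. This is only a sketch: the model name is illustrative, and the raw JSON response would normally be parsed with a JSON library.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OpenAIRestCall {
    public static void main(String[] args) throws Exception {
        String body = """
                {"model": "gpt-3.5-turbo",
                 "messages": [{"role": "user", "content": "What is function calling?"}]}
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/chat/completions"))
                .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // raw JSON containing the model's reply
    }
}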

Tools4AI with OpenAI
Function calling with Tools4AI for OpenAI is similar to the Gemini example, but with a different processor:

String prompt = "My friends name is Vishal ,I dont know what to cook for him today.";
GeminiActionProcessor processor = new GeminiActionProcessor ()
String result = (String)processor.processSingleAction(prompt)

The complete code for OpenAI function calling with Tools4AI is here.
Similar code for Gemini function calling is here.

Tools4AI
Tools4AI harnesses an innovative approach utilizing annotations and reflection to dynamically associate user prompts with a wide array of actions. This system smartly infers the required action from the prompt's details, such as the action's name, group, and descriptive elements, without necessitating explicit instructions from the user. It supports a versatile range of operations, including HTTP REST calls, interactions with REST APIs via Swagger URLs, execution of Java methods, operations on Plain Old Java Objects (POJOs), and the running of both shell and Python scripts. It can work with Spring Boot based applications as well as standalone Java applications. This design not only simplifies the process of triggering complex functions but also enhances the flexibility and intelligence of application workflows.
