Aravind Putrevu for Portkey


How to create LLM fallback from Gemini Flash to GPT-4o?

Generative AI has been the hottest technology trend of the past year, from enterprises to startups. Almost every brand is incorporating GenAI and Large Language Models (LLMs) into its solutions.

However, an underexplored part of Generative AI is managing resiliency. It is easy to build on an API provided by an LLM vendor like OpenAI, but it is hard to keep your application running when that vendor has a service disruption.

In this blog, we will take a look at how you can create a resilient Generative AI application that falls back from Gemini Flash to GPT-4o using the open-source AI Gateway's fallback feature.

Before that...

What is a fallback?

In a scenario involving APIs, a fallback strategy for high availability means configuring both active and standby endpoints behind a load balancer. When the active endpoint or server goes down, one of the configured standby endpoints takes over and continues to serve the incoming traffic.

Why do we need fallbacks?

Simply put, fallbacks ensure application resiliency in disaster scenarios and aid quick recovery.

Note: In many cases, some loss of incoming traffic (such as HTTP requests) during recovery is a common phenomenon.

Why fallbacks in LLMs?

In the context of Generative AI, having a fallback strategy is crucial for managing resiliency, and the idea is no different from a traditional server resiliency scenario: if the active LLM becomes unavailable, one of the configured secondary LLMs takes over and continues to serve incoming requests, maintaining an uninterrupted experience for users.

Challenges in creating fallbacks for LLMs

While fallbacks for LLMs look conceptually similar to managing server resiliency, in reality the growing ecosystem, multiple standards, and the many levers that change model outputs make it harder to simply switch over and get similar output quality and experience.

Moreover, the amount of custom logic and effort needed to build this yourself, against a constantly changing landscape of LLMs and LLM providers, is hard to justify for a team whose core business is not managing LLMs.
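To make that effort concrete, here is a minimal sketch of what hand-rolled fallback logic might look like. The callGemini and callOpenAI helpers are hypothetical wrappers (they are not part of this tutorial), and every additional provider, error type, or parameter difference adds more branches like these:

// Naive hand-rolled fallback: try the primary provider, and on any failure
// retry the same prompt against a secondary provider with its own settings.
// callGemini and callOpenAI are hypothetical provider wrappers.
async function generateWithFallback(prompt) {
  try {
    return await callGemini({ prompt, model: "gemini-1.5-flash-latest" });
  } catch (primaryError) {
    console.error("Primary LLM failed, falling back:", primaryError.message);
    // Different provider, different request shape and model name.
    return await callOpenAI({ prompt, model: "gpt-4o" });
  }
}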

Using open-source AI Gateway to implement fallbacks

To demonstrate the fallback feature, we'll build a sample Node.js application and integrate Google's Gemini. We'll use the OpenAI SDK and Portkey's open-source AI Gateway to demonstrate the fallback to GPT-4o.

If you are new to AI Gateway, you can refer to our previous post to learn about the features of the open-source AI Gateway.


Creating Node.js Project

To start, we need to set up a Node.js environment, so let's create a Node.js project. The command below initializes a new one.

npm init

Install Dependencies

Let's install the required dependencies of our project.

npm install express body-parser portkey-ai openai dotenv

This will install the following packages:

  • express: a popular web framework for Node.js

  • body-parser: middleware for parsing request bodies

  • portkey-ai: Portkey's package, which lets us access multiple AI models through the gateway

  • openai: the OpenAI SDK, which we'll point at the gateway

  • dotenv: loads environment variables from a .env file

Setting Environment Variables

Next, we'll create a .env file to securely store sensitive information such as API credentials.

//.env
GEMINI_API_KEY=YOUR_API_KEY
PORT=3000

Get API Key

Before using Gemini, we need to set up API credentials from the Google Developers Console. For that, we need to sign in with our Google account and create an API key.

Once signed in, go to Google AI Studio.

Click on the Create API key button. It will generate a unique API Key that we'll use to authenticate requests to the Google Generative AI API.

After getting the API key, we'll update the .env file with it.

Create Express Server

Let's create an index.js file in the root directory and set up a basic Express server.

const express = require("express");
const dotenv = require("dotenv");

dotenv.config();

const app = express();
const port = process.env.PORT;

app.get("/", (req, res) => {
  res.send("Hello World");
});

app.listen(port, () => {
  console.log(`Server running on port ${port}`);
});

Here, we're using the dotenv package to access the PORT number from the .env file.

At the top of the file, we load the environment variables using dotenv.config() to make them accessible throughout the file.

Executing the project

In this step, we'll add a start script to the package.json file to easily run our project.

Add the following script to the package.json file.

"scripts": {
  "start": "node index.js"
}

The package.json file should look like below:
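Roughly, with illustrative values (the project name is made up for this sketch, and your dependency versions will differ), it might be:

{
  "name": "llm-fallback-demo",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "body-parser": "^1.20.0",
    "dotenv": "^16.0.0",
    "express": "^4.19.0",
    "openai": "^4.0.0",
    "portkey-ai": "^1.0.0"
  }
}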

Let's run the project using the following command:

npm run start

The above command will start the Express server. Now, if we go to http://localhost:3000, we'll get this:

Hello World

The project setup is now done. Next, we'll add Gemini to our project.

Adding Google Gemini

Set up Route

To add Gemini to our project, we'll create a /generate route where we'll communicate with the Gemini AI.

For that, add the following code to the index.js file.

const bodyParser = require("body-parser");
const { generateResponse } = require("./controllers/index.js");

//middleware to parse the body content to JSON
app.use(bodyParser.json());

app.post("/generate", generateResponse);

Here, we're using the body-parser middleware to parse the request body into JSON.
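As a side note, Express 4.16 and newer ships its own JSON parser, so the following one-liner is an optional alternative to body-parser for this route (the rest of this tutorial sticks with body-parser):

// Built-in alternative to bodyParser.json() in Express 4.16+
app.use(express.json());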

Configure OpenAI Client with Portkey Gateway

Let's create a controllers folder and create an index.js file within it.

Here, we will create a controller function to handle the /generate route declared in the code above.

First, we'll import the required packages and API keys that we'll be using (with CommonJS require, to match the rest of the project).

Note: Portkey adheres to OpenAI API compatibility. Using Portkey further enables you to communicate with any LLM through our universal API feature.

const OpenAI = require("openai");
const dotenv = require("dotenv");
const { createHeaders } = require("portkey-ai");

dotenv.config();
const GEMINIKEY = process.env.GEMINI_API_KEY;

Then, we'll instantiate our OpenAI client and pass the relevant provider details.

const gateway = new OpenAI({
  apiKey: GEMINIKEY,
  baseURL: "http://localhost:8787/v1",
  defaultHeaders: createHeaders({
    provider: "google",
  })
})

Note: To integrate the Portkey gateway with OpenAI, we have:

  • Set the baseURL to the Portkey Gateway URL

  • Included Portkey-specific headers such as provider and others (a raw-header equivalent is sketched below).
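For reference, createHeaders is only a convenience helper: it builds Portkey's x-portkey-* request headers for us. A rough hand-written equivalent, assuming the gateway's standard header names, would look like this:

// Equivalent client without createHeaders; the helper saves us from
// writing the x-portkey-* headers by hand.
const gateway = new OpenAI({
  apiKey: GEMINIKEY,
  baseURL: "http://localhost:8787/v1",
  defaultHeaders: {
    "x-portkey-provider": "google",
  },
});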

Implement Controller Function

Now, we'll write a controller function, generateResponse, to handle the generation route (/generate) and respond to user requests.

const generateResponse = async (req, res) => {
  try {
    // Take the prompt from the request body
    const { prompt } = req.body;

    // Send the prompt to Gemini Flash through the gateway
    const completion = await gateway.chat.completions.create({
      messages: [{ role: "user", content: prompt }],
      model: "gemini-1.5-flash-latest",
    });

    const text = completion.choices[0].message.content;

    res.send({ response: text });
  } catch (err) {
    console.error(err);
    res.status(500).json({ message: "Internal server error" });
  }
};

module.exports = { generateResponse };

Here, we take the prompt from the request body and generate a response based on it using the gateway.chat.completions.create method.

Run Gateway Locally

To run the gateway locally, run the following command in your terminal:

npx @portkey-ai/gateway

This will spin up the gateway locally, running at http://localhost:8787/

Run the project

Now, let's check whether our app is working correctly!

Let's run our project using:

npm run start

Validating Gemini's Response

Next, we'll make a POST request using Postman to validate our controller function.

We'll send a POST request to http://localhost:3000/generate with the following JSON payload:

{
  "prompt": "Are you an OpenAI model?"
}
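If you prefer the terminal over Postman, the same request can be sent with curl:

curl -X POST http://localhost:3000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Are you an OpenAI model?"}'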


And we got our response:

{
    "response": "I am a large language model, trained by Google. \n"
}

Great! Our Gemini integration is working as expected!

Adding Fallback using AI Gateway

So far, the project is working as expected. But what if Gemini's API doesn't respond?

As discussed earlier, a resilient app yields better customer experience.

That's where Portkey's AI Gateway shines. It has a fallback feature that seamlessly switches between LLMs based on their performance or availability.

If the primary LLM fails to respond or encounters an error, AI Gateway will automatically fall back to the next LLM in the list, ensuring our application's robustness and reliability.

Now, let's add the fallback feature to our project!

Create Portkey Configs

First, we'll create a Portkey configuration to define routing rules for all the requests coming to our gateway. For that, add the following code to controllers/index.js, and add an OPENAI_API_KEY entry to the .env file:

const OpenAIKEY = process.env.OPENAI_API_KEY; // OpenAI API key from the .env file

const configObj = {
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "google",
      "api_key": GEMINIKEY // Your Gemini API key
    },
    {
      "provider": "openai",
      "api_key": OpenAIKEY, // Your OpenAI API key
      "override_params": {
        "model": "gpt-4o"
      }
    }
  ]
}

This config will fall back to OpenAI's gpt-4o if Google's gemini-1.5-flash-latest fails.
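Optionally, the fallback strategy can also be told when to trigger. The sketch below uses the gateway's on_status_codes option (an assumption based on Portkey's documented fallback settings, not something this tutorial relies on) to fall back only on rate limits and server errors instead of on every failure:

// Assumed per Portkey's fallback options: only fall back on these
// status codes (rate limits and server errors), not on every error.
const configObj = {
  "strategy": {
    "mode": "fallback",
    "on_status_codes": [429, 500, 502, 503, 504]
  },
  "targets": [
    { "provider": "google", "api_key": GEMINIKEY },
    {
      "provider": "openai",
      "api_key": OpenAIKEY,
      "override_params": { "model": "gpt-4o" }
    }
  ]
}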

Update OpenAI Client

To add the Portkey config to our OpenAI client, we'll simply pass the config object to the defaultHeaders via createHeaders.

const gateway = new OpenAI({
  apiKey: GEMINIKEY,
  baseURL: "http://localhost:8787/v1",
  defaultHeaders: createHeaders({
    provider: "google",
    config: configObj
  })
})


Note: If we want to attach the configuration to only a few requests instead of modifying the client, we can send it in the request headers instead. For example:

let reqHeaders = createHeaders({config: configObj});
openai.chat.completions.create({
  messages: [{role: "user", content: "Say this is a test"}],
  model: "gpt-3.5-turbo"
}, {headers: reqHeaders})

Also, if you have a default configuration set in the client but include a configuration in a specific request, the request-specific configuration will take precedence and replace the default config for that particular request.

That's it! Our setup is done.

Testing the Fallback

To see whether our fallback feature is working, we'll remove the Gemini API key from the .env file. Then we'll send a POST request to http://localhost:3000/generate with the following JSON payload:

{
  "prompt": "Are you an OpenAI model?"
}


And we'll get this response:

{
    "response": "Yes, I am powered by the OpenAI text generation model known as GPT-4o."
}

Awesome! This means our fallback feature is working perfectly!

Because we deleted the Gemini API key, the first request failed; Portkey automatically detected this and fell back to the next LLM in the list, which is OpenAI's gpt-4o.

Conclusion

In this article, we explored how to integrate Gemini into our Node.js application and how to leverage AI Gateway's fallback feature when Gemini is not available.

If you want to know more about Portkey's AI Gateway, give us a star and join our LLMs in Production Discord to hear more about what other AI Engineers are building.

Happy Building!
