Failing Over Between Gemini Models

In a recent article I showed how to use Google's Gemini LLM for tagging content, and I also pointed out that the free tier has a daily quota you need to stay under. Each model you connect to has its own quota, so with some simple error trapping we can switch between models and make the most of our daily free tier.

ℹ️ NOTE: This article breaks down the pieces used for falling back, but if you'd prefer, you can see the full code on GitHub.

Defining Fallback Rules

In my case, I knew that certain models could handle the tagging task well enough and had a large enough daily quota to be useful. I wanted to prioritize the models with more requests per day and more requests per minute. The following fallback function returns the next best model to use for this task, given the model that just ran out:

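// Gemini model IDs we can fall back between
const model20lite = "gemini-2.0-flash-lite";
const model20flash = "gemini-2.0-flash";
const model15pro = "gemini-1.5-pro";
const model15flash = "gemini-1.5-flash";
// Sentinel value meaning every model has run out of quota for the day
const modelUnavailable = "model-unavailable";
// The model to use for the next request; updated whenever we fail over
let selectedModel = model20lite;

// Fallback order: 2.0 Flash-Lite -> 2.0 Flash -> 1.5 Flash -> 1.5 Pro -> unavailable
function getNextModelId(model: string): string {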
    switch (model) {
        case model20lite:
            return model20flash;
        case model20flash:
            return model15flash;
        case model15flash:
            return model15pro;
        default:
            return modelUnavailable;
    }
}
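If you expect to add or reorder models later, one option is to keep the fallback order in an array instead of a switch statement. This is a minimal sketch that assumes the same four model IDs and the same "model-unavailable" sentinel; the `MODEL_CHAIN` and `MODEL_UNAVAILABLE` names are hypothetical.

```typescript
// Hypothetical data-driven alternative to the switch statement above.
// Models are listed in the order they should be tried.
const MODEL_CHAIN: readonly string[] = [
    "gemini-2.0-flash-lite",
    "gemini-2.0-flash",
    "gemini-1.5-flash",
    "gemini-1.5-pro",
];
const MODEL_UNAVAILABLE = "model-unavailable";

// Returns the model that follows `model` in MODEL_CHAIN, or the sentinel
// when `model` is the last entry or isn't in the chain at all.
function getNextModelId(model: string): string {
    const index = MODEL_CHAIN.indexOf(model);
    if (index === -1 || index === MODEL_CHAIN.length - 1) {
        return MODEL_UNAVAILABLE;
    }
    return MODEL_CHAIN[index + 1];
}
```

With this layout, changing the priority is a matter of reordering the array rather than rewiring each `case`.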

Stopping Tagging When No Models Available

You'll notice that getNextModelId returns 'modelUnavailable' once it reaches the end of the list, which tells us that every model has run out of quota for the day. When that happens, we need to stop making requests.

First up, we'll enhance the checkApplicableTags_Gemini function to check which model has been requested:

if (modelId === modelUnavailable) {
    console.error('No more models available. Cannot tag product.');
    throw new Error('No more models available');
}

Throwing an error here stops not only this call but, once the error propagates, all future processing as well. At the top level, where we first invoke checkApplicableTags_Gemini, we catch this error and stop processing:

if (error instanceof Error && error.message === 'No more models available') {
    console.log('Stopping processing due to model unavailability.');
    return; // Stop processing further products
}
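Comparing `error.message` against a string works, but it breaks silently if the message text ever changes. Here is a sketch of an alternative, assuming you control both the throw and the catch: a dedicated error subclass, plus a top-level loop that stops when it sees it. `ModelsExhaustedError`, `tagAll` and the `tagOne` callback are hypothetical names, not part of the original code.

```typescript
// Hypothetical error type thrown when every model in the chain is out of quota.
class ModelsExhaustedError extends Error {
    constructor() {
        super("No more models available");
        this.name = "ModelsExhaustedError";
        // Keeps `instanceof` working when compiling to ES5 targets.
        Object.setPrototypeOf(this, new.target.prototype);
    }
}

// Tags items one at a time and stops as soon as the models are exhausted.
// `tagOne` stands in for a call such as checkApplicableTags_Gemini.
async function tagAll<T>(
    items: T[],
    tagOne: (item: T) => Promise<string[]>,
): Promise<Map<T, string[]>> {
    const results = new Map<T, string[]>();
    for (const item of items) {
        try {
            results.set(item, await tagOne(item));
        } catch (error) {
            if (error instanceof ModelsExhaustedError) {
                console.log("Stopping processing due to model unavailability.");
                break; // Unprocessed items can be picked up once quotas reset
            }
            throw error; // Anything else is unexpected, so surface it
        }
    }
    return results;
}
```

In this version, checkApplicableTags_Gemini would throw `new ModelsExhaustedError()` instead of a plain `Error`, and the caller no longer depends on the exact message text.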

Handling Gemini Quota Messages

Many errors can happen when invoking a hosted service, but the ones we care most about are 429 "Too Many Requests" errors. Google Gemini returns this status code for every kind of quota violation, so the status code alone isn't specific enough for our purposes. We need to know when we've run out of quota for the current minute, so that we can wait, and when we've run out of quota for the day, so that we can try another model.

Google Gemini can also return a 404, which means the requested model either doesn't exist or doesn't support the API method we are calling. There are many Gemini models, and many of them can't handle a request like the one our tagging function makes.
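Because the 429 status alone doesn't say which quota was hit, it can help to pull the classification into a small pure function that inspects the error message. This sketch assumes, as the catch block in this article does, that the quota IDs and the HTTP status code appear somewhere in `error.message`; the `classifyGeminiError` name and its category labels are hypothetical.

```typescript
// Hypothetical categories for the Gemini errors discussed in this article.
type GeminiErrorKind = "daily-quota" | "per-minute-quota" | "model-not-found" | "other";

// Classifies an error by looking for the quota IDs and status code in its message.
function classifyGeminiError(error: unknown): GeminiErrorKind {
    if (!(error instanceof Error)) {
        return "other";
    }
    const message = error.message;
    if (message.includes("GenerateRequestsPerDayPerProjectPerModel-FreeTier")) {
        return "daily-quota"; // Out of requests for today: switch models
    }
    if (message.includes("GenerateRequestsPerMinutePerProjectPerModel-FreeTier")) {
        return "per-minute-quota"; // Too many requests this minute: wait and retry
    }
    if (message.includes("404")) {
        return "model-not-found"; // Model can't serve this call: switch models
    }
    return "other";
}
```

Keeping the string matching in one place makes it easy to adjust if Google changes the wording of these errors, and the function can be tested without calling the API.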

In our function, we'll wrap the Gemini API call in a try/catch block and check the error message for these specific cases:

} catch (error) {
    // If we're above the daily quota for this model, try another model
    if (error instanceof Error && error.message.includes('GenerateRequestsPerDayPerProjectPerModel-FreeTier')) {
        const nextModelId = getNextModelId(modelId);
        selectedModel = nextModelId;
        console.log(`Quota exceeded for model ${modelId}. Switching to model ${nextModelId}...`);
        return await checkApplicableTags_Gemini(title, content, targetTags, nextModelId);
    }
    // If we're going above the rate limit for this model, sleep for 2 seconds and try again
    if (error instanceof Error && error.message.includes('GenerateRequestsPerMinutePerProjectPerModel-FreeTier')) {
        console.log('Rate limit exceeded for current model. Sleeping for 2 seconds and trying again...');
        await new Promise(resolve => setTimeout(resolve, 2000));
        return await checkApplicableTags_Gemini(title, content, targetTags, modelId);
    }
    // If the model is unavailable, try another model
    if (error instanceof Error && error.message.includes('404')) {
        const nextModelId = getNextModelId(modelId);
        selectedModel = nextModelId;
        console.log(`Model ${modelId} is unavailable. Switching to model ${nextModelId}...`);
        return await checkApplicableTags_Gemini(title, content, targetTags, nextModelId);
    }

    console.error('Error checking applicable tags from Gemini:', error);
    return [];
}

This gives us four cases to handle:

"GenerateRequestsPerDayPerProjectPerModel-FreeTier": If the error message contains this "Per Day" quota ID, we are out of requests for the day and can't continue with this model. We look up the next model and try again with a recursive call using the new model ID. I should note that the Gemini API is a little 'fuzzy' about when it throws this error: sometimes it throws it while you still have a few requests left, and sometimes it lets you go a few requests over the quota. It's close enough, though!
"GenerateRequestsPerMinutePerProjectPerModel-FreeTier": If the error message contains this "Per Minute" quota ID, we are out of requests for this minute. We may still have requests left for the day, but we've made too many calls too quickly. Each model allows a different number of requests per minute. By default, I put a two-second sleep between requests because the fastest model supports 30 requests per minute (60 seconds / 30 requests = one request every 2 seconds). When this error occurs, we retry every two seconds until the call either succeeds or fails with a different error.
"404": If we get a 404 when calling a model, that model doesn't offer a valid endpoint for this call, so we try again with the next model.
"Error": For any other error, we log that something went wrong and return an empty array so that processing can continue, if needed.
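As written, the per-minute branch retries indefinitely. That's fine while the quota resets every minute, but it could loop forever if something else keeps producing that error. One variation you might consider, sketched under the assumption that you can recognize a rate-limit error (for example, with the same `message.includes(...)` check used in the catch block), is to cap the number of attempts. The 2000 ms default follows from the arithmetic above (60 seconds / 30 requests per minute); `withRateLimitRetry` and its parameters are hypothetical, not part of the original code.

```typescript
// Resolves after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: calls `fn`, and while it fails with a rate-limit error,
// waits `delayMs` and tries again, giving up after `maxAttempts` calls in total.
async function withRateLimitRetry<T>(
    fn: () => Promise<T>,
    isRateLimitError: (error: unknown) => boolean,
    maxAttempts = 5,
    delayMs = 60_000 / 30, // 30 requests per minute -> one request every 2 seconds
): Promise<T> {
    for (let attempt = 1; ; attempt++) {
        try {
            return await fn();
        } catch (error) {
            if (!isRateLimitError(error) || attempt >= maxAttempts) {
                throw error; // Not a rate limit, or out of attempts
            }
            console.log(`Rate limited (attempt ${attempt}). Retrying in ${delayMs} ms...`);
            await sleep(delayMs);
        }
    }
}
```

Errors that escape the helper can then fall through to the daily-quota, 404, or catch-all handling, so a stuck rate limit ends with a logged error instead of an endless loop.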

The Full checkApplicableTags_Gemini Function

Bringing it all together, here is a function that finds the tags that apply to an article and cycles through Gemini models as each one runs out of quota.

import { GoogleGenerativeAI } from "@google/generative-ai";

// Gemini client configuration
const gemini = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model20lite = "gemini-2.0-flash-lite";
const model20flash = "gemini-2.0-flash";
const model15pro = "gemini-1.5-pro";
const model15flash = "gemini-1.5-flash";
const modelUnavailable = "model-unavailable";
let selectedModel = model20lite;

function getNextModelId(model: string): string {
    switch (model) {
        case model20lite:
            return model20flash;
        case model20flash:
            return model15flash;
        case model15flash:
            return model15pro;
        default:
            return modelUnavailable;
    }
}

// Asks Gemini which of targetTags apply to the article, failing over between models on quota errors
async function checkApplicableTags_Gemini(title: string | null, content: string | null, targetTags: string[], modelId: string): Promise<string[]> {
    if (modelId === modelUnavailable) {
        console.error('No more models available. Cannot tag product.');
        throw new Error('No more models available');
    }

    try {
        const model = gemini.getGenerativeModel({
            model: modelId,
            generationConfig: {
                temperature: 0.3,
                maxOutputTokens: 100,
            },
            systemInstruction: "You are a helpful AI assistant that specializes in content analysis. Your task is to determine which of the specified tags are applicable to the provided article content. If none are applicable, return an empty array. Do not force tags that are not applicable."
        });

        // Use default values if the title or content is null
        const safeTitle = title || "Untitled Article";
        const safeContent = content || "";

        const prompt = `Given the following article:\n\nTitle: ${safeTitle}\n\nContent: ${safeContent}\n\nPlease analyze the content and determine which of the following tags are applicable: ${targetTags.join(', ')}. Provide only the applicable tags in a comma-separated list with no additional text or explanation.`;
        const result = await model.generateContent(prompt);

        // Extract applicable tags from the response
        const applicableTags = result.response.text() || '';

        return applicableTags.split(',').map(tag => tag.trim()).filter(tag => tag.length > 0);
    } catch (error) {
        // If we're above the daily quota for this model, try another model
        if (error instanceof Error && error.message.includes('GenerateRequestsPerDayPerProjectPerModel-FreeTier')) {
            const nextModelId = getNextModelId(modelId);
            selectedModel = nextModelId;
            console.log(`Quota exceeded for model ${modelId}. Switching to model ${nextModelId}...`);
            return await checkApplicableTags_Gemini(title, content, targetTags, nextModelId);
        }
        // If we're going above the rate limit for this model, sleep for 2 seconds and try again
        if (error instanceof Error && error.message.includes('GenerateRequestsPerMinutePerProjectPerModel-FreeTier')) {
            console.log('Rate limit exceeded for current model. Sleeping for 2 seconds and trying again...');
            await new Promise(resolve => setTimeout(resolve, 2000));
            return await checkApplicableTags_Gemini(title, content, targetTags, modelId);
        }
        // If the model is unavailable, try another model
        if (error instanceof Error && error.message.includes('404')) {
            const nextModelId = getNextModelId(modelId);
            selectedModel = nextModelId;
            console.log(`Model ${modelId} is unavailable. Switching to model ${nextModelId}...`);
            return await checkApplicableTags_Gemini(title, content, targetTags, nextModelId);
        }

        console.error('Error checking applicable tags from Gemini:', error);
        return [];
    }
}