Prompt chaining is revolutionizing how we interact with large language models (LLMs). By linking multiple prompts together, we can create complex, dynamic conversations and tackle intricate tasks. But this power comes at a price — literally. Each API call to an LLM service like Google’s Gemini adds to your bill.
Many LLM providers offer a solution: batch processing. Send multiple prompts in a single request and enjoy significant discounts (often around 50%!). However, implementing batching within a prompt chain workflow can quickly turn into a coding nightmare.
The Challenges of Batch Prompt Chaining
Imagine you’re building a chatbot with a multi-step dialogue. With traditional prompt chaining, you’d send each user message and wait for the model’s response before formulating the next prompt. But to leverage batch discounts, you need to:
- Accumulate prompts: Instead of sending immediately, gather prompts into batches.
- Embrace asynchronicity: Send the batch and wait (potentially much longer than a single call) for the job to complete.
- Map responses: Once the batch result arrives, correctly match each response to its corresponding prompt in the chain.
On top of this, you need to handle rate limits, errors, and retries. The result is convoluted code that’s hard to read, debug, and maintain; the sketch below gives a feel for the bookkeeping involved.
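For comparison, here is a rough sketch of the plumbing you would otherwise write yourself. The `submit_batch_job` and `poll_job` callables are hypothetical placeholders standing in for a provider's batch API, not GemBatch or the Gemini SDK:

```python
import time
import uuid

# Prompts are buffered instead of sent immediately; each buffered prompt
# remembers which step of the chain is waiting for its response.
pending = []      # prompts queued for the next batch request
callbacks = {}    # request id -> callback for the next step in the chain


def queue_prompt(prompt: dict, on_response) -> None:
    """Buffer a prompt rather than calling the model right away."""
    request_id = str(uuid.uuid4())
    pending.append({"id": request_id, "prompt": prompt})
    callbacks[request_id] = on_response


def flush_batch(submit_batch_job, poll_job) -> None:
    """Send all buffered prompts as one batch job and dispatch the results."""
    job = submit_batch_job(list(pending))    # one discounted request for the whole batch
    pending.clear()

    result = poll_job(job)
    while not result["done"]:                # batch jobs can take a long time
        time.sleep(60)
        result = poll_job(job)

    for item in result["responses"]:         # map each response back to its chain step
        callbacks.pop(item["id"])(item["response"])
```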
Enter GemBatch: Your Prompt Chaining Hero
GemBatch is a Python framework designed to simplify batch prompt chaining with Google’s Gemini. It seamlessly integrates with Firebase, providing a familiar and scalable environment for your LLM applications.
Here’s how GemBatch makes your life easier:
- Effortless batching: GemBatch handles the complexities of batching prompts behind the scenes. You define your prompt chains sequentially, just as you would with traditional methods.
- Asynchronous handling: GemBatch manages asynchronous batch jobs, ensuring your prompt chains execute smoothly without blocking your application.
- Simplified response mapping: GemBatch automatically routes responses back to the correct points in your prompt chains.
- Cost savings: By leveraging Gemini’s batch discounts, GemBatch can significantly reduce your LLM API costs.
A Quick Example
```python
import gembatch
from vertexai import generative_models


# Define a simple prompt chain
def task_a_prompt1():
    gembatch.submit(
        {
            "contents": [
                {
                    "role": "user",
                    "parts": [{"text": "What is the capital of France?"}],
                }
            ],
        },  # prompt 1
        "publishers/google/models/gemini-1.5-pro-002",
        task_a_prompt2,
    )


def task_a_prompt2(response: generative_models.GenerationResponse):
    gembatch.submit(
        {
            "contents": [
                {
                    "role": "model",
                    "parts": [{"text": response.text}],
                },
                {
                    "role": "user",
                    "parts": [{"text": f"And what is the population of {response.text}?"}],
                },
            ],
        },  # prompt 2
        "publishers/google/models/gemini-1.5-pro-002",
        task_a_output,
    )


def task_a_output(response: generative_models.GenerationResponse):
    print(response.text)


# Start the prompt chain
task_a_prompt1()
```
This simple example demonstrates how GemBatch allows you to define a prompt chain with `gembatch.submit()`. GemBatch takes care of batching the requests to Gemini and managing the asynchronous responses.
Get Started with GemBatch
Ready to unlock the power of cost-effective prompt chaining? Check out the GemBatch repository on GitHub:
https://github.com/blueworrybear/gembatch
We welcome feedback and suggestions!