Sarthak Malhotra

Gemini 50% cheaper with Batch API in Curator

Generating synthetic data at scale can be expensive. To help, several LLM API providers, including Google, offer 50%–70% discounts through batch mode, which processes large batches of requests asynchronously. However, the Gemini batch API is notoriously tricky to use because of the many steps involved and the scattered documentation.

The challenge with the Gemini batch API

Let’s go over the steps required for a simple Gemini batch processing run (when not using Curator); a rough sketch of this flow follows the list:

  • Create request files in JSONL format (they must follow Gemini’s request structure!).
  • Upload each file to a GCP bucket and get its Cloud Storage URL (and keep track of it).
  • Create a batch prediction job on Vertex AI pointing at that Cloud Storage URL.
  • Split inputs exceeding 150k requests, repeating the first two steps for each batch.
  • Manually poll the job status from Vertex AI using batch IDs (this gets complicated when multiple batch files are uploaded).
  • Persist responses yourself to get even basic caching.
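
To make that friction concrete, here is a minimal, untested sketch of the manual flow using the Google Cloud SDKs directly. The project ID, bucket name, paths, and model name are placeholders, and the request schema shown is only illustrative; check the Vertex AI docs for the exact format your model expects.

# Manual Gemini batch flow (no Curator). Assumes google-cloud-storage and
# google-cloud-aiplatform (vertexai) are installed and a GCP project + bucket exist.
import json
import time

import vertexai
from google.cloud import storage
from vertexai.batch_prediction import BatchPredictionJob

PROJECT = "<project-id>"
BUCKET = "<bucket-name>"
vertexai.init(project=PROJECT, location="us-central1")

# 1. Write requests to a JSONL file in Gemini's request structure (illustrative schema).
prompts = ["What is the capital of Montana?", "What is the chemical symbol for gold?"]
with open("requests.jsonl", "w") as f:
    for p in prompts:
        f.write(json.dumps({"request": {"contents": [{"role": "user", "parts": [{"text": p}]}]}}) + "\n")

# 2. Upload the file to a GCS bucket and keep track of its URL.
storage.Client(project=PROJECT).bucket(BUCKET).blob("batch/requests.jsonl").upload_from_filename("requests.jsonl")
input_uri = f"gs://{BUCKET}/batch/requests.jsonl"

# 3. Create a batch prediction job on Vertex AI pointing at that URL.
job = BatchPredictionJob.submit(
    source_model="gemini-1.5-flash-001",
    input_dataset=input_uri,
    output_uri_prefix=f"gs://{BUCKET}/batch/output/",
)

# 4. Poll the job status until it finishes, then fetch and persist results from GCS yourself.
while not job.has_ended:
    time.sleep(60)
    job.refresh()
print(job.state, job.output_location)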

These steps add a lot of friction, causing many users to stick to online processing and miss out on significant cost savings. Curator solves this by making Gemini’s batch APIs easy to use!

Curator Gemini batch mode: 50% cheaper and infinitely easier

No manual polling, no file management, just cost-efficient batch processing in a few lines of code.

import os

from bespokelabs import curator

os.environ["HOSTED_CURATOR_VIEWER"] = "1"            # optional: stream results to the hosted Curator viewer
os.environ["GOOGLE_CLOUD_PROJECT"] = "<project-id>"  # your GCP project
os.environ["GEMINI_BUCKET_NAME"] = "<bucket-name>"   # GCS bucket Curator uses for batch files
os.environ["GOOGLE_CLOUD_REGION"] = "us-central1"    # us-central1 is the default

# batch=True switches Curator to Gemini's batch (Vertex AI) processing.
llm = curator.LLM(model_name="gemini-1.5-flash-001", backend="gemini", batch=True)
questions = [
    {"prompt": "What is the capital of Montana?"},
    {"prompt": "Who wrote the novel 'Pride and Prejudice'?"},
    {"prompt": "What is the largest planet in our solar system?"},
    {"prompt": "In what year did World War II end?"},
    {"prompt": "What is the chemical symbol for gold?"},
]
ds = llm(questions)
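
Curator handles the file creation, uploads, splitting, polling, and response caching behind the scenes. Assuming ds behaves like a Hugging Face Dataset, as in Curator's other examples, you can inspect the results with something like:

# Assumes ds is a Hugging Face Dataset of prompts and responses.
print(ds.to_pandas())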

Read more about the batch processing Curator offers for other APIs, including OpenAI and Anthropic, here.
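
Switching providers is mostly a matter of changing the model name. For example, a sketch of OpenAI batch mode (the model name is an assumption; Curator's documented examples infer the backend from it, and OPENAI_API_KEY must be set):

# Sketch: same questions, OpenAI batch mode instead of Gemini.
llm = curator.LLM(model_name="gpt-4o-mini", batch=True)
ds = llm(questions)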

Please give Curator feedback and show your support by starring it on GitHub!

