DEV Community

Hassann

Posted on • Originally published at apidog.com
Get Free Unlimited Gemini API

Google’s Gemini family is a cost-effective frontier model line for high-volume workloads, but token costs can still grow quickly when a public app, side project, or hackathon demo gets real traffic. Puter.js changes the billing model: it exposes Gemini models such as 2.5 Pro, 2.5 Flash, 2.0 Flash, 3 Flash Preview, and Gemma models without requiring your Google API key. The end user signs in with Puter and covers usage from their account, while your app calls the model from the browser.

TL;DR

  • Puter.js gives browser apps access to Gemini and Gemma models without a Google API key, Google Cloud project, or backend.
  • Supported Gemini models include 2.5 Pro, 2.5 Flash, 2.5 Flash Lite, 2.0 Flash, 2.0 Flash Lite, 3 Flash Preview, plus dated previews.
  • Supported Gemma models include Gemma 2, 3, and 4 in multiple sizes.
  • Setup can be as small as one <script> tag and one puter.ai.chat() call.
  • Streaming, image input, and temperature control work from the browser.
  • Usage is charged to the signed-in Puter user, not your developer account.
  • Use Apidog to compare a Puter prototype with the official Gemini API before migrating.

How “free unlimited” works

Puter.js inverts the usual LLM billing flow.

Instead of your app holding a Google AI Studio key and paying for every token, the user signs in to Puter. Calls are made on behalf of that signed-in user and usage is charged against their Puter balance. New Puter accounts receive starter credit, and users can top up if they need more.

For developers, this means:

  • No Google Cloud project
  • No AI Studio API key
  • No server-side token proxy
  • No key rotation
  • No billing exposure from public traffic

The trade-off: Puter.js is browser-first. It assumes a user session, so it is not a clean fit for backend-only jobs such as cron tasks, batch processors, or webhooks.

Step 1: Install Puter.js

For a static page, add the CDN script:

```html
<script src="https://js.puter.com/v2/"></script>
```

That is enough to call Gemini from the browser.

For a bundled app, install the package:

```shell
npm install @heyputer/puter.js
```

Then import it:

```javascript
import { puter } from '@heyputer/puter.js';
```

Step 2: Pick a Gemini or Gemma model

Choose the smallest model that handles your task well.

| Model ID | When to use |
| --- | --- |
| google/gemini-2.5-pro | Hard reasoning, complex analysis, long-context tasks |
| google/gemini-2.5-flash | Default choice for most app features |
| google/gemini-2.5-flash-lite | High-volume classification, tagging, simple Q&A |
| google/gemini-2.0-flash | Stable baseline with well-understood behavior |
| google/gemini-3-flash-preview | Latest preview model |
| google/gemma-3-27b-it | Open Gemma, instruction-tuned workflows |
| google/gemma-4-31b-it | Larger open Gemma option |

For most apps, start with:

```
google/gemini-2.5-flash
```

Use google/gemini-2.5-pro only when the prompt needs stronger reasoning. Use Lite variants for high-volume, low-complexity tasks like classification or tagging.
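
To keep this choice in one place, a small routing helper works well. The sketch below is our own convention, not part of Puter's API; the tier names are illustrative:

```javascript
// Map rough task tiers to Gemini model IDs (tier names are our own convention).
const MODEL_BY_TIER = {
  bulk: "google/gemini-2.5-flash-lite",  // classification, tagging, simple Q&A
  default: "google/gemini-2.5-flash",    // most app features
  reasoning: "google/gemini-2.5-pro",    // hard analysis, long context
};

// Fall back to the default tier for unknown inputs.
function pickModel(tier = "default") {
  return MODEL_BY_TIER[tier] ?? MODEL_BY_TIER.default;
}

console.log(pickModel("reasoning")); // "google/gemini-2.5-pro"
```

You can then call `puter.ai.chat(prompt, { model: pickModel("bulk") })` and change tiers later without touching every call site.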

Step 3: Make your first Gemini call

Create an HTML file:

```html
<!DOCTYPE html>
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>

  <script>
    puter.ai.chat(
      "Explain machine learning in three sentences",
      {
        model: "google/gemini-2.5-flash"
      }
    ).then(response => {
      puter.print(response);
    });
  </script>
</body>
</html>
```

Open the file in a browser.

On first use, Puter handles authentication. The user signs in or creates a free Puter account, then the response is printed to the page.

No API key. No .env file. No backend route.

Step 4: Stream the response

For chat UIs, stream tokens as they arrive:

```javascript
const response = await puter.ai.chat(
  "Explain photosynthesis in detail",
  {
    model: "google/gemini-2.5-flash",
    stream: true,
  }
);

for await (const part of response) {
  if (part?.text) {
    outputDiv.innerHTML += part.text;
  }
}
```

A simple UI target could look like this:

```html
<div id="output"></div>

<script>
  const outputDiv = document.getElementById("output");
</script>
```

Each part.text contains a response chunk. Append it to your UI so the user sees the answer appear progressively.
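
Note that `innerHTML +=` re-parses the accumulated markup on every chunk and can mangle model output containing `<` or `&`. A safer pattern is to accumulate into a string and assign it via `textContent`. Here is the accumulation logic alone, with a plain array standing in for the stream (no Puter call):

```javascript
// Concatenate streamed chunks; parts without a .text field (e.g. metadata events) are skipped.
function accumulateChunks(parts) {
  let text = "";
  for (const part of parts) {
    if (part?.text) text += part.text;
  }
  return text;
}

console.log(accumulateChunks([{ text: "Photo" }, {}, { text: "synthesis" }])); // "Photosynthesis"
```

In the real loop you would append inside `for await` and set `outputDiv.textContent = text` after each chunk.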

Step 5: Send image input to Gemini

Gemini supports multimodal prompts. With Puter.js, pass the prompt first, then the image URL:

```javascript
puter.ai.chat(
  "What do you see in this image? Describe colors, objects, and mood.",
  "https://assets.puter.site/doge.jpeg",
  {
    model: "google/gemini-2.5-flash"
  }
).then(response => {
  puter.print(response);
});
```

Practical use cases include:

  • Alt-text generation
  • Visual question answering
  • Screenshot analysis
  • OCR-style extraction
  • Accessibility tooling
  • Product image tagging
  • Diagram explanation

Step 6: Tune temperature

Pass model parameters in the options object:

```javascript
const response = await puter.ai.chat(
  "Write a creative short story about a robot chef",
  {
    model: "google/gemini-2.5-flash",
    temperature: 0.2, // low value for demonstration; raise it for creative prompts like this one
  }
);

console.log(response);
```

Use lower values for deterministic output:

```
temperature: 0.0
```

Good for:

  • JSON generation
  • Classification
  • Extraction
  • Factual answers
  • Structured summaries

Use higher values for more variation:

```
temperature: 0.8
```

Good for:

  • Brainstorming
  • Creative writing
  • Marketing copy
  • Ideation

Step 7: Build multi-turn conversations

Pass an array of messages instead of a single string:

```javascript
const messages = [
  {
    role: "user",
    content: "I am building a Next.js app with Postgres."
  },
  {
    role: "assistant",
    content: "Got it. What do you need help with?"
  },
  {
    role: "user",
    content: "How should I structure migrations?"
  },
];

const response = await puter.ai.chat(messages, {
  model: "google/gemini-2.5-pro",
});

console.log(response);
```

For an actual chat UI, keep updating the message array:

```javascript
messages.push({
  role: "user",
  content: userInput,
});

const response = await puter.ai.chat(messages, {
  model: "google/gemini-2.5-flash",
});

messages.push({
  role: "assistant",
  content: response,
});
```

Gemini receives the full conversation history on each call.
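
Since the full history is resent on every call, long chats grow the user's token cost each turn. A simple cap keeps requests bounded; the limit below is an arbitrary example, and a production app might count tokens rather than messages:

```javascript
// Keep only the most recent messages so request size stays bounded.
function trimHistory(messages, maxMessages = 20) {
  return messages.length <= maxMessages
    ? messages
    : messages.slice(messages.length - maxMessages);
}

const history = Array.from({ length: 25 }, (_, i) => ({ role: "user", content: `msg ${i}` }));
console.log(trimHistory(history).length); // 20
```

Call `trimHistory(messages)` right before `puter.ai.chat(...)`. Dropping the oldest turns loses early context, so keep any system-style instructions out of the trimmed portion if you depend on them.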

Compare Gemini with other models

Puter exposes multiple model providers through one interface. You can benchmark the same prompt across models by changing only the model string:

```javascript
const models = [
  "google/gemini-2.5-flash",
  "claude-sonnet-4-6",
  "gpt-5.5",
  "x-ai/grok-4.3",
];

const prompt = "Refactor this React component to use hooks: ...";

for (const model of models) {
  const start = performance.now();

  const response = await puter.ai.chat(prompt, { model });

  const elapsed = performance.now() - start;

  console.log(`${model}: ${elapsed.toFixed(0)}ms`);
  console.log(response);
  console.log("---");
}
```

Use this pattern to compare:

  • Latency
  • Output quality
  • Formatting consistency
  • Instruction following
  • Coding accuracy
  • Cost profile for the user

For many apps, Gemini Flash is a strong default when latency matters. For harder prompts, benchmark against other models before choosing a production default.

What Puter.js gives you

You get:

  • Gemini 2.5, 2.0, and 3 Flash variants
  • Gemini 2.5 Pro
  • Gemma 2, 3, and 4 models
  • Multi-turn conversations
  • Streaming responses
  • Image URL input
  • Temperature control
  • max_tokens
  • System prompts
  • Browser-based production usage

You may not get, depending on the current Puter version:

  • Native Gemini function calling
  • Code execution tools
  • Google Search grounding
  • Gemini’s full 2M-token context ceiling on every model
  • Server-side use without a browser session
  • Direct Google rate-limit visibility

For agentic workflows that require code execution, grounding, or strict server-side control, the official Google Gemini API is usually the better fit. For browser-based chat, Q&A, content generation, and vision tasks, Puter.js is often enough.

When to use Puter.js vs the official Gemini API

Use Puter.js when:

  • You are building a free public app.
  • You do not want token costs attached to your developer account.
  • You are prototyping quickly.
  • You do not want to configure Google Cloud.
  • You are building a static site, hackathon app, or browser extension.
  • Your users can sign in to Puter.

Use the official Gemini API when:

  • You need backend calls.
  • You need cron jobs, batch jobs, or webhooks.
  • You need code execution.
  • You need Search grounding.
  • You need the full Gemini Pro long-context ceiling.
  • You need a direct compliance or billing relationship with Google.
  • You need fine-tuning on your own dataset.
  • Your users will not accept a Puter sign-in step.

For a standalone Gemini 3 Flash walkthrough, see How to use the Gemini 3 Flash Preview API.

Test the integration with Apidog

Puter calls happen in the browser, so you cannot test them exactly like a backend REST API. A practical workflow is:

  1. Build a small static Puter page.
  2. Accept a prompt through a query parameter.
  3. Use that page for browser-based prototype testing.
  4. Use Apidog to validate the official Gemini API surface for a future migration.
  5. Keep Puter and Gemini API configs as separate environments.
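
Step 2 above (accepting a prompt through a query parameter) can be sketched with `URLSearchParams`; the `prompt` parameter name and the fallback text are our own choices:

```javascript
// Read the prompt from a page URL such as https://example.test/?prompt=Hello
// In a browser you would pass window.location.search.
function promptFromQuery(search, fallback = "Say hello") {
  const params = new URLSearchParams(search);
  return params.get("prompt") ?? fallback;
}

console.log(promptFromQuery("?prompt=Summarize+this+page")); // "Summarize this page"
```

The static page then feeds the result into `puter.ai.chat(...)`, which lets you drive the prototype from a URL during testing.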

Example environment split:

| Environment | Base URL |
| --- | --- |
| puter-prototype | Your localhost/static page URL |
| gemini-prod | https://generativelanguage.googleapis.com/v1 |

You can download Apidog, create both environments, and keep the same prompt payloads documented in one collection.

For more API testing patterns, see API testing tool for QA engineers.

Other free LLM paths through Puter

The same user-pays model works across other providers.

The implementation pattern is the same: keep the Puter script and switch the model value.

```javascript
const response = await puter.ai.chat(
  "Summarize this issue for a developer changelog.",
  {
    model: "google/gemini-2.5-flash"
  }
);
```

FAQ

Is this truly unlimited?

Unlimited from the developer’s side, yes. Your app does not pay per token from your own Google account. The signed-in Puter user has whatever balance is available in their Puter account.

Do I need a Google account or Google Cloud project?

No. Puter handles the upstream relationship. Your app does not need a Google API key.

Can I use this in production?

Yes, for browser-based apps. The main product decision is whether your users are willing to sign in with Puter.

Does Gemini through Puter behave like the official API?

Puter calls Google’s API on the user’s behalf. Model behavior should be aligned with the underlying model. Latency may differ because Puter adds another layer between your browser app and the upstream model.

What about Gemini’s 2M-token context window?

Puter may not expose the full 2M-token ceiling for every model variant. If your app depends on extremely long context, use the official Google Gemini API.

Can I use Puter Gemini in a Discord bot or backend service?

Not cleanly. Puter.js is browser-first and assumes a logged-in user session. Backend services should use the official Gemini API directly.

What model should I default to?

Start with:

```
google/gemini-2.5-flash
```

Move to:

```
google/gemini-2.5-pro
```

for difficult reasoning tasks.

Use:

```
google/gemini-2.5-flash-lite
```

for high-volume classification or tagging.

Is Imagen image generation supported?

Puter exposes image generation through OpenAI image models such as gpt-image-2 and DALL-E variants today, not Imagen. See Get free unlimited GPT-5.5 API for that path.

Wrapping up

Puter.js is a practical way to add Gemini to browser-based apps without managing Google Cloud, API keys, or developer-side token billing.

The basic implementation is:

```html
<script src="https://js.puter.com/v2/"></script>
```

```javascript
const response = await puter.ai.chat(
  "Explain this code snippet.",
  {
    model: "google/gemini-2.5-flash"
  }
);
```

Use Puter.js for prototypes, hackathon builds, free public apps, static sites, and browser extensions. Use the official Gemini API when you need backend execution, fine-tuning, code tools, Search grounding, or maximum long-context support.

Build the request once in Apidog, compare Puter with the official API, and choose the path that matches your app.
