Google’s Gemini family is a cost-effective frontier model line for high-volume workloads, but token costs can still grow quickly when a public app, side project, or hackathon demo gets real traffic. Puter.js changes the billing model: it exposes Gemini models such as 2.5 Pro, 2.5 Flash, 2.0 Flash, 3 Flash Preview, and Gemma models without requiring your Google API key. The end user signs in with Puter and covers usage from their account, while your app calls the model from the browser.
TL;DR
- Puter.js gives browser apps access to Gemini and Gemma models without a Google API key, Google Cloud project, or backend.
- Supported Gemini models include 2.5 Pro, 2.5 Flash, 2.5 Flash Lite, 2.0 Flash, 2.0 Flash Lite, 3 Flash Preview, plus dated previews.
- Supported Gemma models include Gemma 2, 3, and 4 in multiple sizes.
- Setup can be as small as one `<script>` tag and one `puter.ai.chat()` call.
- Streaming, image input, and temperature control work from the browser.
- Usage is charged to the signed-in Puter user, not your developer account.
- Use Apidog to compare a Puter prototype with the official Gemini API before migrating.
How “free unlimited” works
Puter.js inverts the usual LLM billing flow.
Instead of your app holding a Google AI Studio key and paying for every token, the user signs in to Puter. Calls are made on behalf of that signed-in user and usage is charged against their Puter balance. New Puter accounts receive starter credit, and users can top up if they need more.
For developers, this means:
- No Google Cloud project
- No AI Studio API key
- No server-side token proxy
- No key rotation
- No billing exposure from public traffic
The trade-off: Puter.js is browser-first. It assumes a user session, so it is not a clean fit for backend-only jobs such as cron tasks, batch processors, or webhooks.
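If you share modules between a site and a server, that browser-first assumption is worth encoding as an explicit guard. This is a sketch under that assumption; `canUsePuter` and `assertBrowser` are hypothetical helpers, not part of the Puter SDK:

```javascript
// Hypothetical helper: Puter.js needs a browser session, so refuse to run
// anywhere that lacks window/document (Node, cron contexts, workers).
function canUsePuter() {
  return typeof window !== "undefined" && typeof document !== "undefined";
}

// Fail fast instead of letting a server-side import crash mid-request.
function assertBrowser() {
  if (!canUsePuter()) {
    throw new Error(
      "Puter.js requires a browser session; use the official Gemini API server-side."
    );
  }
}
```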
Step 1: Install Puter.js
For a static page, add the CDN script:
```html
<script src="https://js.puter.com/v2/"></script>
```
That is enough to call Gemini from the browser.
For a bundled app, install the package:
```bash
npm install @heyputer/puter.js
```
Then import it:
```javascript
import { puter } from '@heyputer/puter.js';
```
Step 2: Pick a Gemini or Gemma model
Choose the smallest model that handles your task well.
| Model ID | When to use |
|---|---|
| `google/gemini-2.5-pro` | Hard reasoning, complex analysis, long-context tasks |
| `google/gemini-2.5-flash` | Default choice for most app features |
| `google/gemini-2.5-flash-lite` | High-volume classification, tagging, simple Q&A |
| `google/gemini-2.0-flash` | Stable baseline with well-understood behavior |
| `google/gemini-3-flash-preview` | Latest preview model |
| `google/gemma-3-27b-it` | Open Gemma, instruction-tuned workflows |
| `google/gemma-4-31b-it` | Larger open Gemma option |
For most apps, start with `google/gemini-2.5-flash`. Use `google/gemini-2.5-pro` only when the prompt needs stronger reasoning, and use the Lite variants for high-volume, low-complexity tasks like classification or tagging.
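That guidance can be centralized in one small routing helper so model choices do not scatter across the codebase. `pickModel` and its task labels are made up for illustration:

```javascript
// Hypothetical router: map a rough task type to the cheapest suitable model.
function pickModel(task) {
  switch (task) {
    case "classification":
    case "tagging":
      return "google/gemini-2.5-flash-lite"; // high volume, low complexity
    case "hard-reasoning":
    case "long-context":
      return "google/gemini-2.5-pro"; // only when Flash falls short
    default:
      return "google/gemini-2.5-flash"; // sensible default for most features
  }
}
```

Then `puter.ai.chat(prompt, { model: pickModel("tagging") })` keeps the decision in one place.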
Step 3: Make your first Gemini call
Create an HTML file:
```html
<!DOCTYPE html>
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    puter.ai.chat(
      "Explain machine learning in three sentences",
      {
        model: "google/gemini-2.5-flash"
      }
    ).then(response => {
      puter.print(response);
    });
  </script>
</body>
</html>
```
Open the file in a browser.
On first use, Puter handles authentication. The user signs in or creates a free Puter account, then the response is printed to the page.
No API key. No .env file. No backend route.
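Since the first call may also be the user's first sign-in, basic error handling is worth adding early. In this sketch the chat function is injected so the logic can be exercised without a browser; `safeChat` and its fallback message are made-up names, not Puter APIs:

```javascript
// Hypothetical wrapper: run a chat call with a fallback message so a failed
// request (network error, sign-in cancelled) does not leave the UI blank.
async function safeChat(chatFn, prompt, options) {
  try {
    return await chatFn(prompt, options);
  } catch (err) {
    console.error("Chat call failed:", err);
    return "Sorry, the model is unavailable right now.";
  }
}

// In the browser you would inject the real call:
// const reply = await safeChat(
//   (p, o) => puter.ai.chat(p, o),
//   "Explain machine learning in three sentences",
//   { model: "google/gemini-2.5-flash" }
// );
```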
Step 4: Stream the response
For chat UIs, stream tokens as they arrive:
```javascript
const response = await puter.ai.chat(
  "Explain photosynthesis in detail",
  {
    model: "google/gemini-2.5-flash",
    stream: true,
  }
);

for await (const part of response) {
  if (part?.text) {
    outputDiv.innerHTML += part.text;
  }
}
```
A simple UI target could look like this:
```html
<div id="output"></div>
<script>
  const outputDiv = document.getElementById("output");
</script>
```
Each `part.text` contains a response chunk. Append it to your UI so the user sees the answer appear progressively.
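If you also need the complete text afterwards (for chat history or a copy button), you can accumulate chunks while rendering. This sketch works on any async iterable of `{ text }` parts, which is the shape the loop above consumes; `collectStream` is a made-up helper name:

```javascript
// Hypothetical helper: drain an async iterable of { text } parts, invoking
// onChunk for progressive rendering and returning the full text at the end.
async function collectStream(stream, onChunk) {
  let full = "";
  for await (const part of stream) {
    if (part?.text) {
      full += part.text;
      if (onChunk) onChunk(part.text);
    }
  }
  return full;
}

// Browser usage with the streamed response above:
// const full = await collectStream(response, (t) => { outputDiv.innerHTML += t; });
```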
Step 5: Send image input to Gemini
Gemini supports multimodal prompts. With Puter.js, pass the prompt first, then the image URL:
```javascript
puter.ai.chat(
  "What do you see in this image? Describe colors, objects, and mood.",
  "https://assets.puter.site/doge.jpeg",
  {
    model: "google/gemini-2.5-flash"
  }
).then(response => {
  puter.print(response);
});
```
Practical use cases include:
- Alt-text generation
- Visual question answering
- Screenshot analysis
- OCR-style extraction
- Accessibility tooling
- Product image tagging
- Diagram explanation
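An alt-text generator, for example, is mostly a prompt convention around the same call. Everything below except the call shape is illustrative: the function names, the word limit, and the injected chat function (used so the prompt logic stays testable) are all assumptions, not Puter APIs:

```javascript
// Hypothetical alt-text helper: build a constrained prompt for an image URL.
function buildAltTextPrompt(maxWords) {
  return `Write concise alt text for this image in at most ${maxWords} words. ` +
         `Describe the key subject and action only; no preamble.`;
}

// chatFn is injected; in the browser, pass a wrapper around puter.ai.chat.
async function generateAltText(chatFn, imageUrl, maxWords = 15) {
  return chatFn(buildAltTextPrompt(maxWords), imageUrl, {
    model: "google/gemini-2.5-flash",
  });
}

// Browser usage:
// const alt = await generateAltText(
//   (p, img, o) => puter.ai.chat(p, img, o),
//   "https://assets.puter.site/doge.jpeg"
// );
```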
Step 6: Tune temperature
Pass model parameters in the options object:
```javascript
const response = await puter.ai.chat(
  "Write a creative short story about a robot chef",
  {
    model: "google/gemini-2.5-flash",
    temperature: 0.2,
  }
);

console.log(response);
```
Use lower values for deterministic output:
`temperature: 0.0`
Good for:
- JSON generation
- Classification
- Extraction
- Factual answers
- Structured summaries
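For JSON generation in particular, low temperature helps, but models sometimes still wrap output in a Markdown code fence, so a tolerant parser is useful on the client. `parseModelJson` is a sketch, not a Puter utility:

```javascript
// Hypothetical parser: models sometimes wrap JSON in a Markdown code fence
// even at temperature 0, so strip an optional fence before JSON.parse.
function parseModelJson(text) {
  const cleaned = text
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "") // opening fence, e.g. triple backticks + "json"
    .replace(/`{3}\s*$/, "")           // closing fence
    .trim();
  return JSON.parse(cleaned); // throws if the model returned non-JSON text
}
```

Pair it with `temperature: 0` and a prompt that demands JSON only.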
Use higher values for more variation:
`temperature: 0.8`
Good for:
- Brainstorming
- Creative writing
- Marketing copy
- Ideation
Step 7: Build multi-turn conversations
Pass an array of messages instead of a single string:
```javascript
const messages = [
  {
    role: "user",
    content: "I am building a Next.js app with Postgres."
  },
  {
    role: "assistant",
    content: "Got it. What do you need help with?"
  },
  {
    role: "user",
    content: "How should I structure migrations?"
  },
];

const response = await puter.ai.chat(messages, {
  model: "google/gemini-2.5-pro",
});

console.log(response);
```
For an actual chat UI, keep updating the message array:
```javascript
messages.push({
  role: "user",
  content: userInput,
});

const response = await puter.ai.chat(messages, {
  model: "google/gemini-2.5-flash",
});

messages.push({
  role: "assistant",
  content: response,
});
```
Gemini receives the full conversation history on each call.
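Because the full history is resent on every turn, long chats steadily grow the signed-in user's token spend. A simple cap that keeps the first message plus the most recent turns is one way to bound this; `trimHistory` is illustrative, not a Puter feature:

```javascript
// Hypothetical trimmer: keep the first message (often system-style context)
// plus the last maxTurns user/assistant pairs, dropping the middle of the chat.
function trimHistory(messages, maxTurns = 10) {
  const keep = maxTurns * 2; // each turn is one user + one assistant message
  if (messages.length <= keep + 1) return messages;
  return [messages[0], ...messages.slice(-keep)];
}
```

Call it right before each `puter.ai.chat(trimHistory(messages), { model })` so the array you render can stay complete while the request stays small.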
Compare Gemini with other models
Puter exposes multiple model providers through one interface. You can benchmark the same prompt across models by changing only the model string:
```javascript
const models = [
  "google/gemini-2.5-flash",
  "claude-sonnet-4-6",
  "gpt-5.5",
  "x-ai/grok-4.3",
];

const prompt = "Refactor this React component to use hooks: ...";

for (const model of models) {
  const start = performance.now();
  const response = await puter.ai.chat(prompt, { model });
  const elapsed = performance.now() - start;

  console.log(`${model}: ${elapsed.toFixed(0)}ms`);
  console.log(response);
  console.log("---");
}
```
Use this pattern to compare:
- Latency
- Output quality
- Formatting consistency
- Instruction following
- Coding accuracy
- Cost profile for the user
For many apps, Gemini Flash is a strong default when latency matters. For harder prompts, benchmark against other models before choosing a production default.
What Puter.js gives you
You get:
- Gemini 2.5, 2.0, and 3 Flash variants
- Gemini 2.5 Pro
- Gemma 2, 3, and 4 models
- Multi-turn conversations
- Streaming responses
- Image URL input
- Temperature control
- `max_tokens`
- System prompts
- Browser-based production usage
You may not get, depending on the current Puter version:
- Native Gemini function calling
- Code execution tools
- Google Search grounding
- Gemini’s full 2M-token context ceiling on every model
- Server-side use without a browser session
- Direct Google rate-limit visibility
For agentic workflows that require code execution, grounding, or strict server-side control, the official Google Gemini API is usually the better fit. For browser-based chat, Q&A, content generation, and vision tasks, Puter.js is often enough.
When to use Puter.js vs the official Gemini API
Use Puter.js when:
- You are building a free public app.
- You do not want token costs attached to your developer account.
- You are prototyping quickly.
- You do not want to configure Google Cloud.
- You are building a static site, hackathon app, or browser extension.
- Your users can sign in to Puter.
Use the official Gemini API when:
- You need backend calls.
- You need cron jobs, batch jobs, or webhooks.
- You need code execution.
- You need Search grounding.
- You need the full Gemini Pro long-context ceiling.
- You need a direct compliance or billing relationship with Google.
- You need fine-tuning on your own dataset.
- Your users will not accept a Puter sign-in step.
For a standalone Gemini 3 Flash walkthrough, see How to use the Gemini 3 Flash Preview API.
Test the integration with Apidog
Puter calls happen in the browser, so you cannot test them exactly like a backend REST API. A practical workflow is:
- Build a small static Puter page.
- Accept a prompt through a query parameter.
- Use that page for browser-based prototype testing.
- Use Apidog to validate the official Gemini API surface for a future migration.
- Keep Puter and Gemini API configs as separate environments.
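The query-parameter step of that workflow needs only standard browser APIs. The helper below is the pure part, using the built-in `URLSearchParams`; the function name and fallback prompt are arbitrary choices for this sketch:

```javascript
// Pure helper: pull a "prompt" query parameter, with a fallback for ad-hoc runs.
function getPromptFromQuery(search, fallback = "Say hello") {
  const params = new URLSearchParams(search);
  return params.get("prompt") || fallback;
}

// Browser usage, e.g. test.html?prompt=Summarize%20this%20issue:
// const prompt = getPromptFromQuery(window.location.search);
// puter.ai.chat(prompt, { model: "google/gemini-2.5-flash" })
//   .then(response => puter.print(response));
```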
Example environment split:
| Environment | Base URL |
|---|---|
| `puter-prototype` | Your localhost/static page URL |
| `gemini-prod` | `https://generativelanguage.googleapis.com/v1` |
You can download Apidog, create both environments, and keep the same prompt payloads documented in one collection.
For more API testing patterns, see API testing tool for QA engineers.
Other free LLM paths through Puter
The same user-pays model works across other providers:
- Get free unlimited Claude API
- Get free unlimited GPT-5.5 API
- How to use Grok 4.3 for free
- Get free unlimited DeepSeek API
The implementation pattern is the same: keep the Puter script and switch the model value.
```javascript
const response = await puter.ai.chat(
  "Summarize this issue for a developer changelog.",
  {
    model: "google/gemini-2.5-flash"
  }
);
```
FAQ
Is this truly unlimited?
Unlimited from the developer’s side, yes. Your app does not pay per token from your own Google account. The signed-in Puter user has whatever balance is available in their Puter account.
Do I need a Google account or Google Cloud project?
No. Puter handles the upstream relationship. Your app does not need a Google API key.
Can I use this in production?
Yes, for browser-based apps. The main product decision is whether your users are willing to sign in with Puter.
Does Gemini through Puter behave like the official API?
Puter calls Google’s API on the user’s behalf. Model behavior should be aligned with the underlying model. Latency may differ because Puter adds another layer between your browser app and the upstream model.
What about Gemini’s 2M-token context window?
Puter may not expose the full 2M-token ceiling for every model variant. If your app depends on extremely long context, use the official Google Gemini API.
Can I use Puter Gemini in a Discord bot or backend service?
Not cleanly. Puter.js is browser-first and assumes a logged-in user session. Backend services should use the official Gemini API directly.
What model should I default to?
Start with `google/gemini-2.5-flash`. Move to `google/gemini-2.5-pro` for difficult reasoning tasks, and use `google/gemini-2.5-flash-lite` for high-volume classification or tagging.
Is Imagen image generation supported?
Puter exposes image generation through OpenAI image models such as gpt-image-2 and DALL-E variants today, not Imagen. See Get free unlimited GPT-5.5 API for that path.
Wrapping up
Puter.js is a practical way to add Gemini to browser-based apps without managing Google Cloud, API keys, or developer-side token billing.
The basic implementation is:
```html
<script src="https://js.puter.com/v2/"></script>
```

```javascript
const response = await puter.ai.chat(
  "Explain this code snippet.",
  {
    model: "google/gemini-2.5-flash"
  }
);
```
Use Puter.js for prototypes, hackathon builds, free public apps, static sites, and browser extensions. Use the official Gemini API when you need backend execution, fine-tuning, code tools, Search grounding, or maximum long-context support.
Build the request once in Apidog, compare Puter with the official API, and choose the path that matches your app.