Google’s Gemini family is a cost-effective frontier model line for high-volume workloads, but token costs can still grow quickly when a public app, side project, or hackathon demo gets real traffic. Puter.js changes the billing model: it exposes Gemini models such as 2.5 Pro, 2.5 Flash, 2.0 Flash, 3 Flash Preview, and Gemma models without requiring your Google API key. The end user signs in with Puter and covers usage from their account, while your app calls the model from the browser.
TL;DR
- Puter.js gives browser apps access to Gemini and Gemma models without a Google API key, Google Cloud project, or backend.
- Supported Gemini models include 2.5 Pro, 2.5 Flash, 2.5 Flash Lite, 2.0 Flash, 2.0 Flash Lite, 3 Flash Preview, plus dated previews.
- Supported Gemma models include Gemma 2, 3, and 4 in multiple sizes.
- Setup can be as small as one `<script>` tag and one `puter.ai.chat()` call.
- Streaming, image input, and temperature control work from the browser.
- Usage is charged to the signed-in Puter user, not your developer account.
- Use Apidog to compare a Puter prototype with the official Gemini API before migrating.
How “free unlimited” works
Puter.js inverts the usual LLM billing flow.
Instead of your app holding a Google AI Studio key and paying for every token, the user signs in to Puter. Calls are made on behalf of that signed-in user and usage is charged against their Puter balance. New Puter accounts receive starter credit, and users can top up if they need more.
For developers, this means:
- No Google Cloud project
- No AI Studio API key
- No server-side token proxy
- No key rotation
- No billing exposure from public traffic
The trade-off: Puter.js is browser-first. It assumes a user session, so it is not a clean fit for backend-only jobs such as cron tasks, batch processors, or webhooks.
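If you share modules between a site and a server, that browser-first assumption is worth encoding as an explicit guard. This is a sketch under that assumption; `canUsePuter` and `assertBrowser` are hypothetical helpers, not part of the Puter SDK:

```javascript
// Hypothetical helper: Puter.js needs a browser session, so refuse to run
// anywhere that lacks window/document (Node, cron contexts, workers).
function canUsePuter() {
  return typeof window !== "undefined" && typeof document !== "undefined";
}

// Fail fast instead of letting a server-side import crash mid-request.
function assertBrowser() {
  if (!canUsePuter()) {
    throw new Error(
      "Puter.js requires a browser session; use the official Gemini API server-side."
    );
  }
}
```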
Step 1: Install Puter.js
For a static page, add the CDN script:
```html
<script src="https://js.puter.com/v2/"></script>
```
That is enough to call Gemini from the browser.
For a bundled app, install the package:
```bash
npm install @heyputer/puter.js
```
Then import it:
```javascript
import { puter } from '@heyputer/puter.js';
```
Step 2: Pick a Gemini or Gemma model
Choose the smallest model that handles your task well.
| Model ID | When to use |
|---|---|
| `google/gemini-2.5-pro` | Hard reasoning, complex analysis, long-context tasks |
| `google/gemini-2.5-flash` | Default choice for most app features |
| `google/gemini-2.5-flash-lite` | High-volume classification, tagging, simple Q&A |
| `google/gemini-2.0-flash` | Stable baseline with well-understood behavior |
| `google/gemini-3-flash-preview` | Latest preview model |
| `google/gemma-3-27b-it` | Open Gemma, instruction-tuned workflows |
| `google/gemma-4-31b-it` | Larger open Gemma option |
For most apps, start with `google/gemini-2.5-flash`. Use `google/gemini-2.5-pro` only when the prompt needs stronger reasoning, and use the Lite variants for high-volume, low-complexity tasks like classification or tagging.
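That guidance can be centralized in one small routing helper so model choices do not scatter across the codebase. `pickModel` and its task labels are made up for illustration:

```javascript
// Hypothetical router: map a rough task type to the cheapest suitable model.
function pickModel(task) {
  switch (task) {
    case "classification":
    case "tagging":
      return "google/gemini-2.5-flash-lite"; // high volume, low complexity
    case "hard-reasoning":
    case "long-context":
      return "google/gemini-2.5-pro"; // only when Flash falls short
    default:
      return "google/gemini-2.5-flash"; // sensible default for most features
  }
}
```

Then `puter.ai.chat(prompt, { model: pickModel("tagging") })` keeps the decision in one place.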
Step 3: Make your first Gemini call
Create an HTML file:
```html
<!DOCTYPE html>
<html>
<body>
  <script src="https://js.puter.com/v2/"></script>
  <script>
    puter.ai.chat(
      "Explain machine learning in three sentences",
      {
        model: "google/gemini-2.5-flash"
      }
    ).then(response => {
      puter.print(response);
    });
  </script>
</body>
</html>
```
Open the file in a browser.
On first use, Puter handles authentication. The user signs in or creates a free Puter account, then the response is printed to the page.
No API key. No .env file. No backend route.
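Since the first call may also be the user's first sign-in, basic error handling is worth adding early. In this sketch the chat function is injected so the logic can be exercised without a browser; `safeChat` and its fallback message are made-up names, not Puter APIs:

```javascript
// Hypothetical wrapper: run a chat call with a fallback message so a failed
// request (network error, sign-in cancelled) does not leave the UI blank.
async function safeChat(chatFn, prompt, options) {
  try {
    return await chatFn(prompt, options);
  } catch (err) {
    console.error("Chat call failed:", err);
    return "Sorry, the model is unavailable right now.";
  }
}

// In the browser you would inject the real call:
// const reply = await safeChat(
//   (p, o) => puter.ai.chat(p, o),
//   "Explain machine learning in three sentences",
//   { model: "google/gemini-2.5-flash" }
// );
```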
Step 4: Stream the response
For chat UIs, stream tokens as they arrive:
```javascript
const response = await puter.ai.chat(
  "Explain photosynthesis in detail",
  {
    model: "google/gemini-2.5-flash",
    stream: true,
  }
);

for await (const part of response) {
  if (part?.text) {
    outputDiv.innerHTML += part.text;
  }
}
```
A simple UI target could look like this:
```html
<div id="output"></div>
<script>
  const outputDiv = document.getElementById("output");
</script>
```
Each `part.text` contains a response chunk. Append it to your UI so the user sees the answer appear progressively.
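If you also need the complete text afterwards (for chat history or a copy button), you can accumulate chunks while rendering. This sketch works on any async iterable of `{ text }` parts, which is the shape the loop above consumes; `collectStream` is a made-up helper name:

```javascript
// Hypothetical helper: drain an async iterable of { text } parts, invoking
// onChunk for progressive rendering and returning the full text at the end.
async function collectStream(stream, onChunk) {
  let full = "";
  for await (const part of stream) {
    if (part?.text) {
      full += part.text;
      if (onChunk) onChunk(part.text);
    }
  }
  return full;
}

// Browser usage with the streamed response above:
// const full = await collectStream(response, (t) => { outputDiv.innerHTML += t; });
```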
Step 5: Send image input to Gemini
Gemini supports multimodal prompts. With Puter.js, pass the prompt first, then the image URL:
```javascript
puter.ai.chat(
  "What do you see in this image? Describe colors, objects, and mood.",
  "https://assets.puter.site/doge.jpeg",
  {
    model: "google/gemini-2.5-flash"
  }
).then(response => {
  puter.print(response);
});
```
Practical use cases include:
- Alt-text generation
- Visual question answering
- Screenshot analysis
- OCR-style extraction
- Accessibility tooling
- Product image tagging
- Diagram explanation
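An alt-text generator, for example, is mostly a prompt convention around the same call. Everything below except the call shape is illustrative: the function names, the word limit, and the injected chat function (used so the prompt logic stays testable) are all assumptions, not Puter APIs:

```javascript
// Hypothetical alt-text helper: build a constrained prompt for an image URL.
function buildAltTextPrompt(maxWords) {
  return `Write concise alt text for this image in at most ${maxWords} words. ` +
         `Describe the key subject and action only; no preamble.`;
}

// chatFn is injected; in the browser, pass a wrapper around puter.ai.chat.
async function generateAltText(chatFn, imageUrl, maxWords = 15) {
  return chatFn(buildAltTextPrompt(maxWords), imageUrl, {
    model: "google/gemini-2.5-flash",
  });
}

// Browser usage:
// const alt = await generateAltText(
//   (p, img, o) => puter.ai.chat(p, img, o),
//   "https://assets.puter.site/doge.jpeg"
// );
```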
Step 6: Tune temperature
Pass model parameters in the options object:
```javascript
const response = await puter.ai.chat(
  "Write a creative short story about a robot chef",
  {
    model: "google/gemini-2.5-flash",
    temperature: 0.2,
  }
);

console.log(response);
```
Use lower values for deterministic output:
`temperature: 0.0`
Good for:
- JSON generation
- Classification
- Extraction
- Factual answers
- Structured summaries
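For JSON generation in particular, low temperature helps, but models sometimes still wrap output in a Markdown code fence, so a tolerant parser is useful on the client. `parseModelJson` is a sketch, not a Puter utility:

```javascript
// Hypothetical parser: models sometimes wrap JSON in a Markdown code fence
// even at temperature 0, so strip an optional fence before JSON.parse.
function parseModelJson(text) {
  const cleaned = text
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "") // opening fence, e.g. triple backticks + "json"
    .replace(/`{3}\s*$/, "")           // closing fence
    .trim();
  return JSON.parse(cleaned); // throws if the model returned non-JSON text
}
```

Pair it with `temperature: 0` and a prompt that demands JSON only.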
Use higher values for more variation:
`temperature: 0.8`
Good for:
- Brainstorming
- Creative writing
- Marketing copy
- Ideation
Step 7: Build multi-turn conversations
Pass an array of messages instead of a single string:
```javascript
const messages = [
  {
    role: "user",
    content: "I am building a Next.js app with Postgres."
  },
  {
    role: "assistant",
    content: "Got it. What do you need help with?"
  },
  {
    role: "user",
    content: "How should I structure migrations?"
  },
];

const response = await puter.ai.chat(messages, {
  model: "google/gemini-2.5-pro",
});

console.log(response);
```
For an actual chat UI, keep updating the message array:
```javascript
messages.push({
  role: "user",
  content: userInput,
});

const response = await puter.ai.chat(messages, {
  model: "google/gemini-2.5-flash",
});

messages.push({
  role: "assistant",
  content: response,
});
```
Gemini receives the full conversation history on each call.
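Because the full history is resent on every turn, long chats steadily grow the signed-in user's token spend. A simple cap that keeps the first message plus the most recent turns is one way to bound this; `trimHistory` is illustrative, not a Puter feature:

```javascript
// Hypothetical trimmer: keep the first message (often system-style context)
// plus the last maxTurns user/assistant pairs, dropping the middle of the chat.
function trimHistory(messages, maxTurns = 10) {
  const keep = maxTurns * 2; // each turn is one user + one assistant message
  if (messages.length <= keep + 1) return messages;
  return [messages[0], ...messages.slice(-keep)];
}
```

Call it right before each `puter.ai.chat(trimHistory(messages), { model })` so the array you render can stay complete while the request stays small.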
Compare Gemini with other models
Puter exposes multiple model providers through one interface. You can benchmark the same prompt across models by changing only the model string:
```javascript
const models = [
  "google/gemini-2.5-flash",
  "claude-sonnet-4-6",
  "gpt-5.5",
  "x-ai/grok-4.3",
];

const prompt = "Refactor this React component to use hooks: ...";

for (const model of models) {
  const start = performance.now();
  const response = await puter.ai.chat(prompt, { model });
  const elapsed = performance.now() - start;

  console.log(`${model}: ${elapsed.toFixed(0)}ms`);
  console.log(response);
  console.log("---");
}
```
Use this pattern to compare:
- Latency
- Output quality
- Formatting consistency
- Instruction following
- Coding accuracy
- Cost profile for the user
For many apps, Gemini Flash is a strong default when latency matters. For harder prompts, benchmark against other models before choosing a production default.
What Puter.js gives you
You get:
- Gemini 2.5, 2.0, and 3 Flash variants
- Gemini 2.5 Pro
- Gemma 2, 3, and 4 models
- Multi-turn conversations
- Streaming responses
- Image URL input
- Temperature control
- `max_tokens`
- System prompts
- Browser-based production usage
You may not get, depending on the current Puter version:
- Native Gemini function calling
- Code execution tools
- Google Search grounding
- Gemini’s full 2M-token context ceiling on every model
- Server-side use without a browser session
- Direct Google rate-limit visibility
For agentic workflows that require code execution, grounding, or strict server-side control, the official Google Gemini API is usually the better fit. For browser-based chat, Q&A, content generation, and vision tasks, Puter.js is often enough.
When to use Puter.js vs the official Gemini API
Use Puter.js when:
- You are building a free public app.
- You do not want token costs attached to your developer account.
- You are prototyping quickly.
- You do not want to configure Google Cloud.
- You are building a static site, hackathon app, or browser extension.
- Your users can sign in to Puter.
Use the official Gemini API when:
- You need backend calls.
- You need cron jobs, batch jobs, or webhooks.
- You need code execution.
- You need Search grounding.
- You need the full Gemini Pro long-context ceiling.
- You need a direct compliance or billing relationship with Google.
- You need fine-tuning on your own dataset.
- Your users will not accept a Puter sign-in step.
For a standalone Gemini 3 Flash walkthrough, see How to use the Gemini 3 Flash Preview API.
Test the integration with Apidog
Puter calls happen in the browser, so you cannot test them exactly like a backend REST API. A practical workflow is:
- Build a small static Puter page.
- Accept a prompt through a query parameter.
- Use that page for browser-based prototype testing.
- Use Apidog to validate the official Gemini API surface for a future migration.
- Keep Puter and Gemini API configs as separate environments.
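The query-parameter step of that workflow needs only standard browser APIs. The helper below is the pure part, using the built-in `URLSearchParams`; the function name and fallback prompt are arbitrary choices for this sketch:

```javascript
// Pure helper: pull a "prompt" query parameter, with a fallback for ad-hoc runs.
function getPromptFromQuery(search, fallback = "Say hello") {
  const params = new URLSearchParams(search);
  return params.get("prompt") || fallback;
}

// Browser usage, e.g. test.html?prompt=Summarize%20this%20issue:
// const prompt = getPromptFromQuery(window.location.search);
// puter.ai.chat(prompt, { model: "google/gemini-2.5-flash" })
//   .then(response => puter.print(response));
```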
Example environment split:
| Environment | Base URL |
|---|---|
| `puter-prototype` | Your localhost/static page URL |
| `gemini-prod` | `https://generativelanguage.googleapis.com/v1` |
You can download Apidog, create both environments, and keep the same prompt payloads documented in one collection.
For more API testing patterns, see API testing tool for QA engineers.
Other free LLM paths through Puter
The same user-pays model works across other providers:
- Get free unlimited Claude API
- Get free unlimited GPT-5.5 API
- How to use Grok 4.3 for free
- Get free unlimited DeepSeek API
The implementation pattern is the same: keep the Puter script and switch the model value.
```javascript
const response = await puter.ai.chat(
  "Summarize this issue for a developer changelog.",
  {
    model: "google/gemini-2.5-flash"
  }
);
```
FAQ
Is this truly unlimited?
Unlimited from the developer’s side, yes. Your app does not pay per token from your own Google account. The signed-in Puter user has whatever balance is available in their Puter account.
Do I need a Google account or Google Cloud project?
No. Puter handles the upstream relationship. Your app does not need a Google API key.
Can I use this in production?
Yes, for browser-based apps. The main product decision is whether your users are willing to sign in with Puter.
Does Gemini through Puter behave like the official API?
Puter calls Google’s API on the user’s behalf. Model behavior should be aligned with the underlying model. Latency may differ because Puter adds another layer between your browser app and the upstream model.
What about Gemini’s 2M-token context window?
Puter may not expose the full 2M-token ceiling for every model variant. If your app depends on extremely long context, use the official Google Gemini API.
Can I use Puter Gemini in a Discord bot or backend service?
Not cleanly. Puter.js is browser-first and assumes a logged-in user session. Backend services should use the official Gemini API directly.
What model should I default to?
Start with `google/gemini-2.5-flash`. Move to `google/gemini-2.5-pro` for difficult reasoning tasks, and use `google/gemini-2.5-flash-lite` for high-volume classification or tagging.
Is Imagen image generation supported?
Puter exposes image generation through OpenAI image models such as gpt-image-2 and DALL-E variants today, not Imagen. See Get free unlimited GPT-5.5 API for that path.
Wrapping up
Puter.js is a practical way to add Gemini to browser-based apps without managing Google Cloud, API keys, or developer-side token billing.
The basic implementation is:
```html
<script src="https://js.puter.com/v2/"></script>
```

```javascript
const response = await puter.ai.chat(
  "Explain this code snippet.",
  {
    model: "google/gemini-2.5-flash"
  }
);
```
Use Puter.js for prototypes, hackathon builds, free public apps, static sites, and browser extensions. Use the official Gemini API when you need backend execution, fine-tuning, code tools, Search grounding, or maximum long-context support.
Build the request once in Apidog, compare Puter with the official API, and choose the path that matches your app.