We’re in an incredibly fun era for building. The friction between "I have a weird idea" and "I have a working prototype" is basically zero, especially with the release of Gemma 4, which is now available via the Gemini API and Google AI Studio.
Whether you want to deeply inspect model reasoning or you're just trying to build a pipeline to auto-caption an archive of historical web comics and obscure wiki trivia, you can now hit open-weights models directly from your code without needing to provision a massive GPU rig first.
Here’s a look at the architecture, how to use it, and how to go from the UI to production code in one click.
The Models: Apache 2.0, MoE, and 256K Context
Before we look at the API, the biggest detail about Gemma 4 is the license: it's released under Apache 2.0. This means total developer flexibility and commercial permissiveness. You can prototype with the Gemini API, and eventually run it anywhere from a local rig to your own cloud infrastructure.
The benchmarks are also genuinely impressive. The 31B model is currently sitting at #3 on the Arena AI text leaderboard, out-competing models many times its size.
When you drop into Google AI Studio, you'll see two primary models in the picker:
- Gemma 4 31B IT: The flagship dense model. It has a massive 256K context window — perfect for dumping in entire codebases, massive log files, or huge JSON datasets.
- Gemma 4 26B A4B IT: A Mixture-of-Experts (MoE) architecture. It's highly efficient, activating only roughly 4 billion parameters per token. High throughput, lower cost.
(Note: There are also E2B and E4B "Edge" models meant for local on-device deployment that feature native audio input, but we're focusing on the AI Studio API today. I recommend that you go download and test the smaller models locally, though!)
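To make that tradeoff concrete, here's a minimal sketch of routing work between the two models based on input size. Only `gemma-4-31b-it` is confirmed by the SDK snippet later in this post; the MoE model ID and the 120K threshold are my assumptions, patterned on the naming above.

```typescript
// Hypothetical helper: route requests to the dense flagship or the
// cheaper MoE model based on how much context the job actually needs.
// Model ID for the MoE variant is an assumption based on its name.
const DENSE_MODEL = 'gemma-4-31b-it';   // 256K context, strongest quality
const MOE_MODEL = 'gemma-4-26b-a4b-it'; // ~4B active params, high throughput

function pickModel(promptTokens: number): string {
  // Reserve the big context window for genuinely huge inputs
  // (codebases, log dumps); send everything else to the MoE model.
  return promptTokens > 120_000 ? DENSE_MODEL : MOE_MODEL;
}
```

The point isn't the exact cutoff; it's that an Apache 2.0 license plus two sizes means cost routing is a one-line decision you own, not a pricing tier you're locked into.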
Multimodal Inputs + Chain of Thought
Text is great, but Gemma 4 is natively multimodal. Let's say you want to build a pipeline to reverse-engineer prompts from a folder of distinct images.
In AI Studio, you can drop images directly into the playground alongside your prompt.
The Prompt:
"Generate descriptions of each of these images, and a prompt that I could give to an image generation model to replicate each one."
Because the Gemma models support advanced reasoning, after you click Run, you can click the Thoughts toggle to literally step through the model's chain-of-thought process before it generates its final output.
If you love understanding the "why" behind model logic, or you're trying to debug why an agent went off the rails, this level of transparency is incredibly useful.
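If you want that same transparency in code rather than the UI, here's a sketch of separating reasoning from the final answer. It assumes the response follows the Gemini API's part structure, where reasoning parts carry a `thought: true` flag when thought output is enabled; treat the field names as assumptions if Gemma 4's responses differ.

```typescript
// Sketch: split a response's parts into chain-of-thought and final answer,
// assuming reasoning parts are flagged with `thought: true` (the Gemini
// API convention when thought output is included).
interface Part {
  text?: string;
  thought?: boolean;
}

function splitThoughts(parts: Part[]): { thoughts: string[]; answer: string } {
  const thoughts = parts
    .filter((p) => p.thought && p.text)
    .map((p) => p.text as string);
  const answer = parts
    .filter((p) => !p.thought && p.text)
    .map((p) => p.text as string)
    .join('');
  return { thoughts, answer };
}
```

Logging the `thoughts` array alongside the answer is exactly the kind of breadcrumb trail you want when an agent goes off the rails.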
Shipping the code
The bridge between "playing around in a UI" and "writing a script" should be exactly one click. Once you have your prompt, your images, and your reasoning configuration dialed in perfectly, click the Get Code button in the top right corner.
You can grab the exact payload required for TypeScript, Python, Go, or standard cURL. Best of all, if you toggle "Include prompt/history", it automatically handles the base64 encoding of your images and explicitly sets the thinkingConfig parameters in the code for you.
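Under the hood, that generated code is just base64-encoding each file into an inline-data part. Here's a minimal sketch of building the payload by hand, assuming the Gemini API's `inlineData` part shape; the image bytes, MIME type, and prompt text are placeholders.

```typescript
// Sketch: hand-building the multimodal payload that "Get Code" emits,
// assuming the Gemini API's inlineData part shape. The bytes below are
// a stand-in; in practice you'd read a real file, e.g. with fs.readFileSync.
const imageBytes = Buffer.from('fake-image-bytes'); // placeholder image data

const contents = [
  {
    role: 'user',
    parts: [
      {
        inlineData: {
          mimeType: 'image/png', // placeholder; match your actual file type
          data: imageBytes.toString('base64'), // the encoding AI Studio does for you
        },
      },
      {
        text: 'Generate a description of this image, and a prompt that could replicate it.',
      },
    ],
  },
];
```

Knowing the payload shape matters the moment you graduate from one image in the playground to a loop over a whole folder.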
Here's what the TypeScript output looks like when you want to use Gemma 4's reasoning capabilities via the SDK:
import { GoogleGenAI } from '@google/genai';

// Initialize the client
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Configure Gemma 4 reasoning logic
const config = {
  thinkingConfig: {
    thinkingLevel: 'HIGH',
  },
};

const response = await ai.models.generateContent({
  model: 'gemma-4-31b-it',
  contents: 'Tell me a fascinating, obscure story from internet history.',
  config: config,
});

console.log(response.text);
Go build open-source things!
Having Apache 2.0 open-weights models accessible via a fast API completely changes the calculus for weekend projects. Whether you're building a script to summarize deeply technical whitepapers, analyze visual data natively, or wire up autonomous multi-step code generation agents—the friction is basically gone.
I can't wait to see what you build! Let me know in the comments what rabbit hole you're pointing Gemma at first. Happy hacking this weekend. :)



