DEV Community

Anup Karanjkar
Posted on • Originally published at wowhow.cloud

Nano Banana Pro (Gemini 3 Pro Image): Developer Guide & API 2026

Google shipped Nano Banana Pro to general availability in June 2026 and nobody made a big deal of it. The I/O keynote spotlight went to Gemini Omni and Managed Agents. But for anyone building an app that generates or edits images, the model formerly known as Gemini 3 Pro Image is now the most capable reasoning-driven image model with a public API — at $0.134 per 1K or 2K image, $0.24 for 4K.

The name is a Google internal codename that leaked and stuck. Nano Banana 2 (Gemini 3.1 Flash Image) is the cheaper, faster sibling. Nano Banana Pro is the high-quality lane. Both are now generally available in the Gemini API.

What Nano Banana Pro Actually Does

Most image generation models work the same way: you send a text prompt, they return pixels. Nano Banana Pro adds a layer that matters if you build anything beyond basic generation: native image editing through a joint reasoning-generation process. You don't patch pixels externally. You send the original image plus an instruction in natural language, and the model applies changes while preserving everything you didn't ask it to touch.

That sounds incremental. The specific thing it does better than the alternatives is text rendering. Accurate text inside generated images — product labels, UI mockups, infographic callouts, signage — has been an industry failure mode since the original Stable Diffusion era. Nano Banana Pro is the first model where "add the text 'Sale' in bold white on the product" reliably produces readable text rather than decorative gibberish.

Google grounds its image generation in Search data, which means when you ask for "the Eiffel Tower at sunset, autumn 2026" you get factual geometry and verified lighting, not an impressionist interpretation. For factual data visualizations and product mockups, this grounding is genuinely useful. For surreal or stylized output, it's a constraint — Imagen 4 Ultra performs better there.

Model Variants and When to Use Each

| Model | API ID | Speed | Best For | Price/image |
| --- | --- | --- | --- | --- |
| Nano Banana Pro | gemini-3-pro-image-preview | 2–5s | Text rendering, editing, complex scenes | $0.134 (2K) |
| Nano Banana 2 | gemini-3-1-flash-image | <2s | High-volume, quick iterations | $0.02–$0.04 |
| Imagen 4 Ultra | imagen-4.0-ultra-generate-001 | 15–30s | Photorealism, portraits, product photography | $0.06 |

The speed gap is the real story. Nano Banana Pro generates in 2–5 seconds. Imagen 4 Ultra takes 15–30 seconds. A designer exploring 20–30 creative directions with Nano Banana Pro generates all of them in the time Imagen 4 Ultra takes to produce 3. For iterative workflows — agency mockups, A/B variant generation, UI wireframe illustration — that throughput difference compounds quickly.
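
To put rough numbers on that comparison, here is a minimal sketch. It assumes calls run strictly one after another and uses only the latency ranges quoted above; network overhead, rate limits, and parallel requests are ignored. The function name is hypothetical.

```python
def sequential_seconds(n_images: int, latency_range: tuple[float, float]) -> tuple[float, float]:
    """Best- and worst-case wall time for n_images generated one after another."""
    fastest, slowest = latency_range
    return (n_images * fastest, n_images * slowest)


# Seconds per image, taken from the ranges quoted in this guide
NANO_BANANA_PRO = (2, 5)
IMAGEN_4_ULTRA = (15, 30)

for n in (20, 30):
    print(n, "Nano Banana Pro images:", sequential_seconds(n, NANO_BANANA_PRO))
print(3, "Imagen 4 Ultra images:", sequential_seconds(3, IMAGEN_4_ULTRA))
```

Under these assumptions, 20 images take 40 to 100 seconds on Nano Banana Pro, against 45 to 90 seconds for 3 images on Imagen 4 Ultra, so the comparison holds best toward the fast end of the quoted range.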

The quality trade-off is real too. In independent user testing from June 2026, 78% of participants preferred Imagen 4 Ultra for portrait photography (skin texture, eye detail), and 73% chose it for product shots (material accuracy, lighting). But 54% preferred Nano Banana Pro for stylized and creative output. The honest read: if you need photographic realism for headshots or luxury product shots, Imagen 4 Ultra wins. If you need volume, text accuracy, or editing control, Nano Banana Pro wins.

API Setup and Generation Code

You need Python SDK version 1.52+ or the JavaScript/TypeScript SDK version 1.30+. The generation call is synchronous — unlike Veo 3.1's async video generation, images come back directly:

from google import genai
from google.genai import types

# With no arguments, the client reads the API key from the environment
# (GEMINI_API_KEY or GOOGLE_API_KEY)
client = genai.Client()

response = client.models.generate_images(
    model='gemini-3-pro-image-preview',
    prompt='A close-up product shot of a matte black coffee mug with the text "FOCUS" in minimalist serif font, white background, studio lighting',
    config=types.GenerateImagesConfig(
        number_of_images=1,
        output_mime_type='image/png',
        aspect_ratio='1:1',
    )
)

# Save the image
for i, image in enumerate(response.generated_images):
    with open(f'output_{i}.png', 'wb') as f:
        f.write(image.image.image_bytes)

The aspect_ratio parameter accepts '1:1', '16:9', '9:16', '4:3', and '3:4'. For 4K output, set output_image_config={'width': 4096, 'height': 4096} — billing jumps to $0.24 per image at 4K.
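
Since only five ratios are accepted, a layout with arbitrary dimensions has to be mapped onto one of them before the call. A minimal sketch of one way to do that, assuming the list above is complete; the helper name and the log-distance heuristic are illustrative choices, not part of the SDK:

```python
import math

# Ratios listed in this guide as accepted by aspect_ratio
SUPPORTED_ASPECT_RATIOS = ("1:1", "16:9", "9:16", "4:3", "3:4")


def nearest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio closest to width:height.

    Distance is measured on a log scale so that landscape and portrait
    mismatches of the same proportion are treated symmetrically.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = ratio.split(":")
        return abs(math.log(int(w) / int(h)) - target)

    return min(SUPPORTED_ASPECT_RATIOS, key=distance)


print(nearest_aspect_ratio(1920, 1080))
print(nearest_aspect_ratio(1080, 1920))
```

The returned string can be passed as the aspect_ratio value; the generated image then needs cropping or padding if the target slot is not exactly that shape.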

Image Editing: The Part Nobody Talks About

The editing model uses a separate endpoint ID: gemini-3-pro-image-preview-edit. You pass the original image bytes alongside the instruction. The model preserves everything you didn't explicitly ask to change, which makes it genuinely useful for iterative design work:

from google import genai
from google.genai import types

client = genai.Client()

# Load the existing image as raw bytes (no base64 round trip is needed,
# since types.Image takes bytes directly)
with open('product_shot.png', 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_images(
    model='gemini-3-pro-image-preview-edit',
    prompt='Change the background to a warm wooden kitchen countertop, keep the mug identical',
    config=types.GenerateImagesConfig(
        # The image the edit instruction is applied to
        reference_images=[
            types.ReferenceImage(
                reference_image=types.Image(
                    image_bytes=image_bytes,
                    mime_type='image/png'
                )
            )
        ],
        number_of_images=1,
    )
)

for i, image in enumerate(response.generated_images):
    with open(f'edited_{i}.png', 'wb') as f:
        f.write(image.image.image_bytes)

The catch: complex inpainting (editing a specific masked region while leaving the rest untouched) still behaves inconsistently if the instruction is ambiguous. "Change the background to wood" works well because the foreground subject is unambiguous. "Make the shadow slightly softer" is less reliable — the model occasionally interprets it as "change the entire lighting setup." Be literal with editing instructions. If you want targeted changes, describe exactly what you want and what should stay the same.
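
One way to enforce that discipline is to build every instruction from two explicit parts: the change, and the list of things that must not move. The helper below is a hypothetical sketch of that idea; the "Keep unchanged:" wording is an assumption, not a documented prompt format.

```python
def build_edit_instruction(change: str, keep: list[str]) -> str:
    """Compose a literal edit prompt: one change, plus what must stay as-is."""
    change = change.strip().rstrip(".")
    if not change:
        raise ValueError("describe the change you want")
    parts = [f"{change}."]
    # Drop empty entries so the prompt never ends in a dangling separator
    kept = [item.strip() for item in keep if item.strip()]
    if kept:
        parts.append("Keep unchanged: " + "; ".join(kept) + ".")
    return " ".join(parts)


prompt = build_edit_instruction(
    "Soften only the shadow under the mug",
    ["the mug", "the background", "the overall lighting direction"],
)
print(prompt)
```

The resulting string goes in as the prompt of the editing call. Forcing a keep list makes it harder to send an instruction like "make the shadow softer" with no stated boundary.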

Vertex AI vs Gemini API: Which Path

Two API surfaces exist. The Gemini API (ai.google.dev) is simpler: one API key, no project configuration. The Vertex AI path requires GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, and GOOGLE_GENAI_USE_VERTEXAI=True. Vertex adds enterprise features — VPC Service Controls, data residency, CMEK — plus access to the Batch/Flex route pricing.
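
The SDK client picks up these environment variables on its own, but making the choice explicit lets a service fail early with a clear message. A minimal sketch, assuming the API key lives in GEMINI_API_KEY or GOOGLE_API_KEY (names not mentioned above) and using a hypothetical helper, client_kwargs:

```python
import os
from typing import Mapping


def client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Choose Vertex AI or Gemini API settings from environment variables."""
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower() in {"1", "true"}
    if use_vertex:
        required = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")
        missing = [name for name in required if not env.get(name)]
        if missing:
            raise RuntimeError("Vertex AI selected but missing: " + ", ".join(missing))
        return {
            "vertexai": True,
            "project": env["GOOGLE_CLOUD_PROJECT"],
            "location": env["GOOGLE_CLOUD_LOCATION"],
        }
    # Assumption: the Gemini API key is stored under one of these two names
    api_key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("No Gemini API key found in the environment")
    return {"api_key": api_key}
```

The returned dictionary is meant to be unpacked into the client constructor, as in genai.Client(**client_kwargs()).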

If you're building a prototype or internal tool: use the Gemini API. If you're building a production app with >500 image generations per day, run the numbers on Vertex Batch mode first. Batch/Flex pricing cuts standard rates in half, to $0.067 per 2K image instead of $0.134, at the cost of async delivery. For non-realtime workflows (nightly product image refresh, bulk content generation), the savings stack up fast. 1,000 images per day at standard pricing costs $134/day. At Batch pricing: $67/day. That's about $2,010/month in savings (over 30 days) on a modest workload.
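
To run those numbers for your own volume, the arithmetic is short enough to script. This sketch uses the $0.134 standard rate and the 50% Batch discount stated above; the 30-day month and the function names are assumptions.

```python
from decimal import Decimal

STANDARD_PER_IMAGE = Decimal("0.134")  # USD per 1K/2K image, standard rate
BATCH_DISCOUNT = Decimal("0.5")        # Batch/Flex is half the standard rate


def daily_cost(images_per_day: int, batch: bool = False) -> Decimal:
    """Daily spend in USD at the standard or Batch rate."""
    rate = STANDARD_PER_IMAGE * BATCH_DISCOUNT if batch else STANDARD_PER_IMAGE
    return rate * images_per_day


def monthly_savings(images_per_day: int, days: int = 30) -> Decimal:
    """Difference between standard and Batch spend over a month."""
    saved_per_day = daily_cost(images_per_day) - daily_cost(images_per_day, batch=True)
    return saved_per_day * days


print(daily_cost(1000), daily_cost(1000, batch=True), monthly_savings(1000))
```

Decimal is used instead of float so the per-image rate does not pick up binary rounding noise when multiplied across large volumes.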

SynthID: The Watermark You Can't See

Every image generated by Nano Banana Pro ships with an invisible SynthID watermark embedded in the pixel data — no visible mark, no impact on image quality, but detectable by Google's verification tools. This is non-optional. You cannot generate without the watermark.

For most use cases, this is a feature: you can verify your own AI-generated assets, comply with emerging disclosure requirements, and trace misuse. The one scenario where it matters negatively: if a client explicitly requires undetectable AI image generation for contractual or competitive reasons, Nano Banana Pro is not the right tool. Alternatives like Midjourney v8 or Flux Pro don't embed detectable watermarks in the same way.

Google's SynthID verification API is also public, so third-party tools can detect Nano Banana Pro output. Factor that into workflows where the AI-generated nature of images needs to stay undisclosed.

Pricing Reality Check

The per-image pricing hides some complexity. $0.134 per image applies at 1K and 2K resolution. That's because both consume approximately 1,120 output tokens in Google's billing model, and image output is priced at $120.00 per million tokens: 1,120 × $120 / 1,000,000 ≈ $0.134. 4K images consume around 2,000 tokens, which at the same rate comes to the published $0.24.

The token-based billing matters if you're mixing image and text generation in a single session. Input tokens (your prompt + any reference images) bill at $2.00 per million. Complex editing prompts that include high-resolution reference images can add meaningful token cost on top of the per-image rate. For a batch pipeline: benchmark your average session token count before committing to volume pricing tiers.
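
A per-request estimate follows directly from the token counts. The per-image prices quoted above ($0.134 for about 1,120 tokens, $0.24 for about 2,000) work out to roughly $120 per million output tokens, so that is the rate this sketch assumes. The token counts are the approximate figures from this guide, and the function name is hypothetical.

```python
from decimal import Decimal

OUTPUT_USD_PER_MILLION = Decimal("120")  # assumption: implied by $0.134 / 1,120 tokens
INPUT_USD_PER_MILLION = Decimal("2")     # input rate quoted in this guide
OUTPUT_TOKENS = {"1K": 1120, "2K": 1120, "4K": 2000}  # approximate, per image


def request_cost(resolution: str, input_tokens: int = 0, images: int = 1) -> Decimal:
    """Estimated USD cost of one request: image output plus prompt/reference input."""
    output_tokens = OUTPUT_TOKENS[resolution] * images
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / 1_000_000
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / 1_000_000
    return output_cost + input_cost


print(request_cost("2K"))
print(request_cost("4K"))
print(request_cost("2K", input_tokens=1500))
```

Feeding in measured input token counts from real sessions shows how much reference images add on top of the flat per-image figure.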

Where Nano Banana Pro Fits in a Real Workflow

Three scenarios where it's clearly the right choice right now.

UI and product mockups at scale. If you're generating dozens of marketing variants, social media assets, or app screenshots, the 2–5 second generation time and reliable text rendering make Nano Banana Pro the only reasonable option. Imagen 4 is too slow for iteration; DALL-E 4 still struggles with text in most configurations.

Content production pipelines. Blogs, newsletters, and content sites that need custom illustrations for every article can automate thumbnail and header image generation. At $0.134 per image and 3 seconds per call, a site publishing 10 articles per day spends $1.34/day on image generation — effectively replacing stock photo subscriptions.

Product image variation. E-commerce teams can generate background variants, seasonal styling, and locale-specific adaptations from a single hero product shot. The editing model preserves product identity across variations with reasonable consistency.

Where it's not the right choice: photorealistic human portraits (Imagen 4 Ultra), anything requiring the surreal aesthetic typical of Midjourney v8, or use cases where SynthID detectability is a deal-breaker. The model also has no video output capability — that's Veo 3.1's lane, and the two models are separate API calls with no native chaining.

The Actual Decision Point

Nano Banana Pro is generally available today. The API is stable, pricing is published, and the editing endpoint works in production. It is not the highest-quality image model available — Imagen 4 Ultra beats it on photorealism, and Midjourney v8 beats it on artistic range. What it is: the fastest, most controllable, best-at-text-rendering model with a Gemini API key and no waitlist.

If your use case requires text in generated images, or needs volume throughput above 100 images per day at acceptable quality, start here. Run it against your specific prompts before committing. The 200 free images per day on the Google AI Studio free tier give you enough runway to evaluate it before your first invoice.
