Yesterday, while trying to draw an illustration of the kind I usually insert into my note articles, I suddenly came across the words "Nano Banana 2." Huh? Wasn't it called Nano Banana Pro? When did it become "2," and why?
Upon further investigation, I discovered that on February 26, 2026, Google had made a surprise announcement of its latest image-generation AI model. I only noticed it the next day, and was blown away by how quickly it had been released…!
That model is Nano Banana 2. I tried it out right away and was simply blown away by its generation speed and how far it has evolved. The Nano Banana Pro I've been using until now is good, but the "2" more than holds its own.
The cost of generating each image has been significantly reduced to about half that of Nano Banana Pro, and resolutions up to 4K are supported. There have also been improvements in practical aspects, including more accurate text rendering and greater character consistency.
So, let me give you a quick demo of the live app to show you how everything works.
Check the video:
I'll start with the sidebar. There are three main settings. First, Resolution controls the size of the generated image. Higher resolution gives you better quality, but it also makes the API calls slower and more expensive.
Second, Text Context decides whether the full extracted text of the PDF gets added to the prompt. When this option is on, the model can read the entire document and better understand the content before making edits.
In Edit Mode, you choose the pages you want to change and write a prompt for each page. You can add as many page–prompt pairs as you want. If you add the same page more than once, the agent automatically merges the prompts into a single instruction.
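That merge step is simple to sketch. The exact join format the agent uses isn't shown in the UI, so the helper below (and its "Also:" separator) is just my illustration of the idea:

```python
from collections import defaultdict

def merge_page_prompts(pairs):
    """Group (page, prompt) pairs and merge duplicates into one instruction per page."""
    merged = defaultdict(list)
    for page, prompt in pairs:
        merged[page].append(prompt.strip())
    # Join multiple prompts for the same page into a single combined instruction
    return {page: " Also: ".join(prompts) for page, prompts in merged.items()}

pairs = [(3, "Change title"), (3, "Make background blue"), (5, "Fix typo")]
print(merge_page_prompts(pairs))
# {3: 'Change title Also: Make background blue', 5: 'Fix typo'}
```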
You can also select style reference pages before running the edits. These are pages from the same PDF that Gemini uses as a visual guide. This helps the edited slides keep the same fonts, colors, and layout as the rest of the document.
When you click Run, the agent converts each selected page into a high-resolution image using a tool called Poppler. Then it sends all page edits to Google Gemini at the same time in parallel. That means editing five pages usually takes about the same time as editing just one.
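The fan-out can be sketched with a thread pool. `edit_page_stub` below is a placeholder for the real per-page Gemini call, so this is only an illustration of the pattern, not the app's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def edit_page_stub(page_num, prompt):
    # Placeholder for the real per-page Gemini call
    return (page_num, f"edited with: {prompt}")

def run_edits_in_parallel(jobs):
    """Dispatch every page edit at once; wall time is roughly one call, not N."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(edit_page_stub, page, prompt) for page, prompt in jobs]
        # Results come back in submission order
        return [f.result() for f in futures]
```

Because the work is I/O-bound (waiting on the API), threads are enough here; there is no need for multiprocessing.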
Gemini receives the page image, the style reference images, your prompt, and optionally the full document text. It processes all of this information and generates a new image of the slide with your requested changes. Sometimes it also returns a short note explaining what it modified.
Once Gemini returns the updated image, the agent runs Tesseract OCR. Tesseract scans the image and embeds a hidden text layer behind it. This turns the image back into a real PDF page, so you can still search, highlight, and copy text from it.
As each page finishes, the agent shows a side-by-side preview in the UI. You can immediately compare the original page with the edited version and see exactly what changed before downloading anything.
After all pages are processed, the agent rebuilds the full PDF. It goes through every page of the original document and replaces only the edited ones. Each replacement keeps the same dimensions as the original page, so the layout stays perfectly aligned.
In Add Mode, instead of editing a page, you create a brand-new slide. You choose where to insert it and describe what you want it to look like. The system then generates the slide from scratch using your style references as a visual guide. If you don't select any style references, the system automatically uses page 1 of the document.
The generated slide follows the same workflow. Tesseract adds a searchable text layer, the agent inserts the slide into the correct position in the PDF, and you get a preview before downloading.
This code will be available on my Patreon because it took me a lot of time and effort. If you enjoy what I create and want to see more projects like this, supporting me on Patreon helps me keep making high-quality content. I would truly appreciate your support
Why pair Claude with Nano Banana?
Claude is an excellent text and code generation AI, but it cannot generate images by itself. On the other hand, Nano Banana is good at image generation but has limitations in managing complex contexts and iterative improvement instructions.
Combining the two:
Claude understands your intent and generates the optimal prompt → Nano Banana outputs an image
Claude evaluates the generated results and identifies problems → Autonomously regenerates and corrects them
Claude maintains context during long sessions → maintains consistent workflow
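Neither Anthropic nor Google publishes this orchestration, so the loop below is purely a sketch of the pattern, with stub functions standing in for the Claude and Nano Banana API calls (and a toy "quality" score standing in for a real evaluation):

```python
def generate_image(prompt):
    # Stub for a Nano Banana call; "quality" is a toy stand-in for a real score
    return {"prompt": prompt, "quality": len(prompt)}

def critique(image):
    # Stub for Claude's evaluation step; a real system would call the Claude API
    return "add more detail" if image["quality"] < 30 else None

def generate_with_review(prompt, max_rounds=3):
    """Generate, let the critic evaluate, fold feedback into the prompt, retry."""
    image = generate_image(prompt)
    for _ in range(max_rounds):
        feedback = critique(image)
        if feedback is None:
            break
        prompt = f"{prompt}, {feedback}"
        image = generate_image(prompt)
    return image
```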
In fact, when developers tried it, they were able to complete a project that involved repeatedly generating over 100 app icons for around $45.
How Nano Banana 2 works
Nano Banana 2 uses a Multimodal Diffusion Transformer (MMDiT) architecture with a parameter scale of approximately 1.8 billion (1.8B) and Dynamic Quantisation-Aware Training (DQAT) to minimise memory footprint while maintaining high output quality.
Grouped-Query Attention (GQA) is introduced to speed up inference.
GQA is a technology that significantly reduces the amount of data movement during inference by sharing key-value pairs across groups. This allows it to run continuously without thermal throttling, even on the NPU of a mobile device.
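Nano Banana 2's internals aren't public, but the GQA idea itself is easy to show. In this minimal numpy sketch, several query heads attend against one shared key/value head, so the KV cache shrinks by the group factor:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """
    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d), n_kv_heads divides n_q_heads.
    Each group of query heads shares one K/V head, shrinking the KV cache.
    """
    n_q, n_kv = q.shape[0], k.shape[0]
    group = n_q // n_kv
    k = np.repeat(k, group, axis=0)  # broadcast each shared K/V head to its group
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ v
```

With 8 query heads and 2 KV heads, only a quarter of the keys and values ever need to be stored or moved, which is where the inference speed-up comes from.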
Furthermore, instead of the simple pattern matching of the original Nano Banana, Nano Banana 2 uses a multi-stage "Plan → Evaluate → Improve" loop. First, it analyses the prompt's intent and creates a generation plan.
Next, it performs character-by-character verification of the text and checks the consistency of spatial placement. If there are any problems, they are improved before proceeding to finalize the pixels.
This loop enables complex multi-object scenes and accurate text rendering.
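The Evaluate step for text can be pictured as a character-level diff. This tiny helper is my own illustration, not Google's implementation:

```python
def verify_characters(target: str, rendered: str):
    """Return (index, expected, got) for every position where the glyphs differ."""
    return [(i, want, got)
            for i, (want, got) in enumerate(zip(target, rendered))
            if want != got]

# A rendered '0' that came out as the letter 'O' gets flagged for the Improve pass
print(verify_characters("SALE 50%", "SALE 5O%"))  # [(6, '0', 'O')]
```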
What has changed the most? Three points
- The biggest improvement is the ability to reference web information in real time. Through a new feature called "World Knowledge," Gemini performs web searches and folds up-to-date information into the images it generates.
This feature, not available in Nano Banana Pro, allows for more accurate depictions of real-world places, people, and products. It seems to work particularly well with infographics and illustrations.
- Further improvements in text rendering: Text rendering was already quite good in Nano Banana Pro, but Nano Banana 2 introduces a new system that verifies each character in a three-step loop: "Plan → Evaluate → Improve."
Even when Chinese characters and numbers are mixed together, the text no longer breaks down, and the improvement is noticeable the moment you try it.
- 4K Support Exceeds the Pro Limit. While the Nano Banana Pro's maximum resolution was 2K, the Nano Banana 2 now supports 4K. The number of aspect ratios has also increased to 14 (including 9:16 and 21:9), making it suitable for everything from social media posts to cinematic banners.
🤔 So which one should I use?
After using both, my conclusion looks something like this.
When to use Nano Banana 2
I want to create AI illustrations for posting on social media and note articles.
I want to generate high-quality images for free.
I need 4K resolution.
I want to create accurate illustrations and infographics that reference web information.
I want to generate a large amount of data quickly.
Situations where Nano Banana Pro is recommended
Highest quality photorealism required
Complex commercial creative production
Tasks that require professional-level precision
It seems like the Nano Banana 2 will be able to handle most of my everyday creative and AI illustration needs. I think the Pro is more of a trump card for when I really need it!
Let's start coding:
I create an extract_full_text function that reads a PDF file and pulls out all the text inside it. First, it runs a fast external tool that converts the PDF into plain text while keeping the page layout as close as possible to the original slides.
After that, the text is split into separate pages using a special page-break marker. The function then goes through each page one by one and skips any pages that are empty.
Next, it cleans the text by removing extra spaces at the beginning and end. If a page has more than 2000 characters, the text is cut down and marked as truncated so it stays shorter.
```python
import subprocess

def extract_full_text(pdf_path: str) -> str:
    """Extracts the full text from a PDF using pdftotext (via subprocess for speed/layout)."""
    try:
        # -layout preserves some spatial structure, which is good for slides
        result = subprocess.run(
            ['pdftotext', '-layout', pdf_path, '-'],
            capture_output=True,
            text=True,
            check=True
        )
        raw_text = result.stdout
        # Split by form feed to get pages
        pages = raw_text.split('\f')
        formatted_pages = []
        for i, page_text in enumerate(pages):
            # Skip empty pages at the end if any
            if not page_text.strip():
                continue
            # Strip whitespace
            clean_text = page_text.strip()
            # Truncate to 2000 chars
            if len(clean_text) > 2000:
                clean_text = clean_text[:2000] + "...[truncated]"
            # Wrap in page tags (1-indexed)
            formatted_pages.append(f"<page-{i+1}>\n{clean_text}\n</page-{i+1}>")
        return "<document_context>\n" + "\n".join(formatted_pages) + "\n</document_context>"
    except subprocess.CalledProcessError as e:
        print(f"Error extracting text: {e}")
        return ""
```
After that, I made a function that converts an image into a single-page PDF. It uses an OCR tool to read the text in the image and creates a PDF that includes a hidden text layer. This hidden text makes the PDF searchable and easier to process later. It then saves the generated PDF to the location you provide.
```python
import pytesseract
from PIL import Image

def rehydrate_image_to_pdf(image: Image.Image, output_pdf_path: str):
    """
    Converts an image to a single-page PDF with a hidden text layer using Tesseract.
    This is the 'State Preservation' step.
    """
    pdf_bytes = pytesseract.image_to_pdf_or_hocr(image, extension='pdf')
    with open(output_pdf_path, 'wb') as f:
        f.write(pdf_bytes)
```
Next, I create a function that replaces specific pages in a PDF while keeping the rest of the document the same. First, it opens the original PDF and prepares a new file where the final version will be saved. Then it goes through each page in the document one by one.
If a page number appears in the replacement list, the function loads the new page that should replace it. It checks the size of the original page and resizes the new page so both pages match in width and height.
After that, the new page is added to the output document instead of the old one. If the page does not need replacement, the original page is simply copied to the new file.
```python
from pypdf import PdfReader, PdfWriter

def batch_replace_pages(original_pdf_path: str, replacements: dict[int, str], output_pdf_path: str):
    """
    Replaces multiple pages in the original PDF.
    replacements: dict mapping page_number (1-indexed) -> path_to_new_single_page_pdf
    """
    reader = PdfReader(original_pdf_path)
    writer = PdfWriter()
    for i in range(len(reader.pages)):
        page_num = i + 1
        if page_num in replacements:
            # This page needs replacement
            original_page = reader.pages[i]
            original_width = original_page.mediabox.width
            original_height = original_page.mediabox.height
            new_pdf_path = replacements[page_num]
            new_reader = PdfReader(new_pdf_path)
            new_page = new_reader.pages[0]
            # Resize new page to match original dimensions
            new_page.scale_to(width=float(original_width), height=float(original_height))
            writer.add_page(new_page)
        else:
            # Keep original page
            writer.add_page(reader.pages[i])
    with open(output_pdf_path, 'wb') as f:
        writer.write(f)
```
Next, I made a function that adds a new page to an existing PDF at a specific position. First, it opens the original PDF and prepares a new document where the final version will be saved.
It then checks the size of the first page so the new page can match the same width and height. After that, the function loads the new page and resizes it to match the document's page size. If the position is set to 0, the new page is inserted at the beginning of the document.
Otherwise, the function goes through each page and inserts the new page right after the chosen page number. Finally, the updated PDF with the inserted page is saved to the output file.
```python
from pypdf import PdfReader, PdfWriter

def insert_page(original_pdf_path: str, new_page_pdf_path: str, after_page: int, output_pdf_path: str):
    """
    Inserts a new page into the PDF after the specified page number.
    after_page: 0 to insert at the beginning, or page number (1-indexed) to insert after.
    """
    reader = PdfReader(original_pdf_path)
    writer = PdfWriter()
    # Get dimensions from the first page as reference
    reference_page = reader.pages[0]
    ref_width = reference_page.mediabox.width
    ref_height = reference_page.mediabox.height
    # Load the new page and match the document's page size
    new_reader = PdfReader(new_page_pdf_path)
    new_page = new_reader.pages[0]
    new_page.scale_to(width=float(ref_width), height=float(ref_height))
    # Insert at beginning
    if after_page == 0:
        writer.add_page(new_page)
    # Add all original pages, inserting the new one at the right position
    for i in range(len(reader.pages)):
        writer.add_page(reader.pages[i])
        # Insert after this page if it matches
        if i + 1 == after_page:
            writer.add_page(new_page)
    with open(output_pdf_path, 'wb') as f:
        writer.write(f)
```
Finally, I made a function that generates a new slide image from a user prompt and optional style references. It sends the instructions to an AI model, which creates the image and optional text. The function then extracts the generated image and text and returns them.
```python
from io import BytesIO
from typing import List, Optional, Tuple

from PIL import Image
from google.genai import types

def generate_new_slide(
    style_reference_images: List[Image.Image],
    user_prompt: str,
    full_text_context: str = "",
    resolution: str = "4K",
    enable_search: bool = False
) -> Tuple[Image.Image, Optional[str]]:
    """
    Generates a completely new slide based on style references and a prompt.
    Returns a tuple of (generated PIL Image, optional text response).
    """
    client = get_client()  # helper defined elsewhere; returns a genai.Client
    # Construct the prompt
    prompt_parts = []
    prompt_parts.append(user_prompt)
    if style_reference_images:
        prompt_parts.append("Match the visual style (fonts, colors, layout) of these reference images:")
        for img in style_reference_images:
            prompt_parts.append(img)
    if full_text_context:
        prompt_parts.append(f"DOCUMENT CONTEXT:\n{full_text_context}\n")
    # Build config - allow both text and image output
    config = types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            image_size=resolution
        )
    )
    if enable_search:
        config.tools = [{"google_search": {}}]
    # Call the model
    try:
        response = client.models.generate_content(
            model='gemini-3-pro-image-preview',
            contents=prompt_parts,
            config=config
        )
    except Exception as e:
        error_msg = str(e).lower()
        if "quota" in error_msg or "billing" in error_msg or "payment" in error_msg:
            raise RuntimeError(
                "Gemini API Error: This tool requires a PAID API key with billing enabled.\n"
                "Free tier keys do not support image generation. Please:\n"
                "1. Visit https://aistudio.google.com/api-keys\n"
                "2. Enable billing on your Google Cloud project\n"
                f"Original error: {e}"
            )
        elif "api key" in error_msg or "authentication" in error_msg or "unauthorized" in error_msg:
            raise RuntimeError(
                "Gemini API Error: Invalid API key.\n"
                "Please check that your GEMINI_API_KEY environment variable is set correctly.\n"
                f"Original error: {e}"
            )
        else:
            raise RuntimeError(f"Gemini API Error: {e}")
    # Extract image and text from the response
    generated_image = None
    response_text = None
    if response.candidates and response.candidates[0].content.parts:
        for part in response.candidates[0].content.parts:
            if part.inline_data:
                # Convert bytes to PIL Image
                generated_image = Image.open(BytesIO(part.inline_data.data))
            elif part.text:
                response_text = part.text
    if not generated_image:
        raise RuntimeError("No image generated by the model.")
    return generated_image, response_text
```
What I thought after a morning of use
To be honest, I thought "Pro is enough, isn't it?", but when I actually tried using them side by side, the difference was greater than I expected.
In particular, 4K support and real-time web reference are strengths unique to Nano Banana 2 that Pro doesn't offer. Text comprehension has also improved.
In fact, today's thumbnail and illustration were both created with Nano Banana 2, each generated in one go.
The fact that the cost has been halved is also a welcome update, especially for AI generation users who like to experiment a lot.
Gemini has an advantage in terms of generation speed compared to Midjourney, and the fact that it's free to use is one of its strengths.
I would highly appreciate it if you:
❣ Join my Patreon: https://www.patreon.com/GaoDalie_AI
Book an Appointment with me: https://topmate.io/gaodalie_ai
Support the Content (every Dollar goes back into the video): https://buymeacoffee.com/gaodalie98d
Subscribe to the Newsletter for free: https://substack.com/@gaodalie