Over the past week, I built gemini-imagen — a comprehensive CLI and Python library for Google Gemini's image generation capabilities. But here's the twist: I didn't build it alone. I built it with Claude Code, Anthropic's AI-powered coding assistant.
This isn't a story about AI replacing developers. It's about AI amplifying developer productivity in ways I couldn't have imagined. Let me show you exactly how Claude Code and I worked together to ship a production-ready open-source library in record time.
Day 1: From Zero to Basic Library (October 30)
8:20 AM - The First Commit
I started with a vision: create a better developer experience for Google's Gemini image generation API. I asked Claude Code:
"Help me create a Python wrapper for Google Gemini's image generation capabilities"
Within 30 minutes, we had:
- Project structure with `pyproject.toml`
- Basic `GeminiImageGenerator` class
- Type-safe models with Pydantic
- Initial test suite
The Magic Moment: Claude Code didn't just write code — it suggested using Pydantic v2 for validation, structured the project following Python best practices, and immediately set up proper testing with pytest.
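For flavor, here is the shape of a Pydantic v2 request model like the ones we used (the field names here are illustrative, not the library's exact schema):

```python
from pydantic import BaseModel, Field

class GenerationRequest(BaseModel):
    """Illustrative request model; the real library's fields may differ."""

    prompt: str = Field(..., min_length=1, description="Text prompt for the image")
    num_images: int = Field(1, ge=1, le=8, description="How many images to generate")
    aspect_ratio: str | None = Field(None, description='e.g. "16:9" or "1:1"')

# Pydantic validates at construction time and raises a clear error on bad input:
request = GenerationRequest(prompt="a serene landscape", num_images=2)
```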
10:29 AM - CI/CD from the Start
Most developers put off CI/CD. Not with Claude Code. I said:
"Let's set up proper CI/CD"
Claude Code created:
- GitHub Actions workflows for linting, testing, and building
 - Pre-commit hooks with ruff and mypy
 - Dependabot configuration for automated dependency updates
 - Coverage reporting with codecov
 
All properly configured and working on the first push.
12:27 PM - Modern Python Tooling
I mentioned I wanted to use modern Python tooling. Claude Code immediately:
- Configured uv as the package manager (faster than pip)
 - Set up ruff for both linting and formatting (replacing 5+ tools)
 - Added mypy with proper configuration
- Created a `Makefile` for common tasks
Lesson Learned: Claude Code knows the current best practices. It suggested using uv before I even knew about it.
Afternoon - The SDK Migration
Google had just released their new google-genai SDK. The old examples I found online used google-generativeai. I asked:
"Should we use the new google-genai SDK?"
Claude Code not only said yes, but:
- Explained the benefits (unified API, better typing)
 - Migrated all the code
 - Updated all tests
 - Fixed breaking changes
 - Verified everything worked
 
Three commits later, we were on the latest SDK with full test coverage.
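For comparison, the new SDK funnels everything through a single client object. A minimal sketch of the call shape (the model name is illustrative; check Google's docs for current ones):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GOOGLE_API_KEY")

# Unified entry point: client.models.<operation>(...)
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # illustrative model name
    contents="a serene landscape",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)
# Image bytes come back as inline data on the response parts.
for part in response.candidates[0].content.parts:
    if part.inline_data:
        with open("output.png", "wb") as f:
            f.write(part.inline_data.data)
```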
Day 2-3: Adding Real-World Features (October 31 - November 1)
The S3 Integration Challenge
I wanted S3 support for storing generated images. This is where Claude Code really shone:
Me: "Add S3 integration with async support"
Claude Code:
- Added `boto3` and `aiobotocore` as optional dependencies
- Created `s3_utils.py` with async upload/download
- Made S3 URIs first-class citizens (anything starting with `s3://`)
- Added HTTP URL generation for public access
 - Wrote comprehensive tests with mocked S3 clients
 - Updated documentation with examples
 
But here's where it got interesting: Claude Code ran into a dependency conflict with aioboto3. Without me asking, it:
- Researched the issue
- Found that `aiobotocore` was the better choice
- Refactored the code
 - Updated tests
 - Verified everything worked
 
Three version bumps in one afternoon as we iterated toward the right solution.
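For reference, the aiobotocore pattern that won out looks roughly like this (bucket and key are illustrative; credentials come from the usual AWS environment variables):

```python
import asyncio
from aiobotocore.session import get_session

async def upload_bytes(data: bytes, bucket: str, key: str) -> str:
    """Upload raw bytes to S3 and return the s3:// URI."""
    session = get_session()
    # create_client returns an async context manager in aiobotocore
    async with session.create_client("s3") as s3:
        await s3.put_object(Bucket=bucket, Key=key, Body=data)
    return f"s3://{bucket}/{key}"

# asyncio.run(upload_bytes(png_bytes, "my-bucket", "images/output.png"))
```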
Async Parallelization Breakthrough
I mentioned that loading/saving multiple images could be slow. Claude Code suggested:
"We should parallelize I/O operations with asyncio.gather"
It then:
- Refactored image loading to use parallel async
 - Refactored image saving to use parallel async
 - Created a benchmark script to measure improvements
 - Wrote documentation showing a 5x speedup
 - Added visual diagrams explaining the performance gains
 
I didn't ask for the benchmarks or diagrams — Claude Code proactively created them to demonstrate the value.
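The core of the change is a standard asyncio pattern: fan out the independent I/O operations and await them together. A simplified sketch:

```python
import asyncio

async def save_one(path: str, data: bytes) -> None:
    # Real code does async file or S3 writes; sleep stands in for that I/O.
    await asyncio.sleep(0.1)

async def save_all(images: dict[str, bytes]) -> None:
    # Sequential saves cost 0.1s per image; gathered, roughly 0.1s total.
    await asyncio.gather(*(save_one(path, data) for path, data in images.items()))

asyncio.run(save_all({f"img_{i}.png": b"" for i in range(5)}))
```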
Day 4-5: Safety and Observability (November 2)
Safety Settings Deep Dive
Google's safety filtering is complex. I asked:
"How do we control sensitivity settings?"
Claude Code:
- Read the Google Gemini documentation
 - Explained the safety categories and thresholds
 - Implemented a type-safe SafetySetting API
 - Added preset configurations (strict, default, relaxed, none)
- Created comprehensive documentation in `docs/SAFETY_FILTERING.md`
- Wrote integration tests
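The presets wrap Google's per-category thresholds. Under the hood, the raw google-genai types look like this (a sketch of the SDK shape, not gemini-imagen's wrapper API):

```python
from google.genai import types

# One SafetySetting per harm category; preset names like "strict" or
# "relaxed" map to thresholds such as BLOCK_MEDIUM_AND_ABOVE or BLOCK_ONLY_HIGH.
config = types.GenerateContentConfig(
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
        ),
    ],
)
```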
 
But here's the kicker: The safety tests were flaky (Google's API sometimes blocks, sometimes doesn't). Claude Code:
- Identified the flakiness
- Added the `pytest-rerunfailures` plugin
- Marked flaky tests appropriately
 - Ensured CI wouldn't randomly fail
 
It debugged and fixed the tests before I even noticed the problem.
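The fix leans on the plugin's `flaky` marker, which reruns a failing test before declaring it failed:

```python
import pytest

# Provided by pytest-rerunfailures: retry up to 3 times, pausing 2 seconds
# between attempts, before the test counts as a failure in CI.
@pytest.mark.flaky(reruns=3, reruns_delay=2)
def test_safety_filter_blocks_disallowed_prompt():
    ...
```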
LangSmith Tracing Integration
For observability, I wanted LangSmith integration. I said:
"Add LangSmith tracing for debugging"
Claude Code not only integrated LangSmith, but added:
- Automatic image URL logging
 - Safety filter information logging
 - Comprehensive metadata support
 - Error tracking
 - Custom tags and run names
 
Then it created working examples showing exactly how to use it.
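LangSmith's Python SDK centers on the `traceable` decorator. A hypothetical wrapper (for illustration only, not the library's actual internals, and assuming `LANGSMITH_API_KEY` is set) looks like:

```python
from langsmith import traceable

# Hypothetical wrapper function; tags and run names show up in the LangSmith UI.
@traceable(run_type="chain", name="generate_image", tags=["imagen"])
def generate_image(prompt: str) -> str:
    # ... call Gemini, upload the result, return its URL ...
    return "https://example.com/output.png"
```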
Day 6-7: The CLI Explosion (November 3)
Building a Production CLI
This is where things got wild. I said:
"Let's build a comprehensive CLI"
What I expected: A basic command-line interface with a few commands.
What Claude Code built:
8,940 lines of code added in one pull request, including:
- Complete CLI with Click framework:
  - `imagen generate` - Text-to-image generation
  - `imagen analyze` - Image analysis
  - `imagen edit` - Image editing
  - `imagen upload` / `imagen download` - S3 operations
  - `imagen config` - Configuration management
  - `imagen keys` - API key management
  - `imagen models` - Model listing
  - `imagen template` - Template management
- Advanced Template System:
  - Save reusable job templates
  - Load configuration from JSON files
  - Variable substitution with `{variable}` syntax
  - Multiple keys file support
  - CLI override precedence
  - `--dump-job` for inspection
  - `--dry-run` for testing
- LangSmith Export Feature:
  - Fetch traces from LangSmith API
  - Extract parameters automatically
  - Convert to templates and keys
  - Save directly from traces
- 203 New Tests covering:
  - Variable substitution (28 tests)
  - Deep merge utilities (24 tests)
  - Template storage (23 tests)
  - CLI commands (multiple test files)
  - Integration workflows (13 tests)
  - Safety settings (extensive tests)
This wasn't just code generation — it was architectural design.
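For readers unfamiliar with Click, the skeleton of a command group like this one looks roughly as follows (a simplified sketch, not the library's actual source):

```python
import click

@click.group()
def imagen():
    """gemini-imagen command-line interface (simplified skeleton)."""

@imagen.command()
@click.argument("prompt")
@click.option("-o", "--output", default="output.png", help="Where to save the image")
@click.option("--dry-run", is_flag=True, help="Show what would run without calling the API")
def generate(prompt: str, output: str, dry_run: bool):
    """Text-to-image generation."""
    if dry_run:
        click.echo(f"Would generate {output!r} from prompt {prompt!r}")
        return
    # ... call the library here ...

if __name__ == "__main__":
    imagen()
```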
The Demo Script
To demonstrate the CLI, Claude Code created `demo.sh`, a comprehensive 335-line demo showing:
- Every CLI command
 - Template workflows
 - Configuration management
 - Error handling
 - Best practices
 
I never asked for this. Claude Code created it proactively to help users understand the tool.
The Documentation Sprint
On the final day, I asked Claude Code to reorganize documentation. I said:
"Update README with features, move advanced use cases to another file, consolidate everything"
Claude Code:
- Created LIBRARY.md (695 lines) - Complete Python API reference
 - Created ADVANCED_USAGE.md (511 lines) - Advanced features and scripting
 - Updated CONTRIBUTING.md (284 lines) - Development guidelines
 - Rewrote README.md - Focused on CLI quick start
 - Cross-referenced all documents
 - Deleted redundant files
 
All documentation was consistent, comprehensive, and well-organized.
Then I said:
"Write a LinkedIn post for this library"
Claude Code wrote two versions (short and long) with proper hashtags and formatting.
"Write a Medium article about building this"
It wrote a technical deep-dive article explaining architectural decisions.
The Numbers
Let's look at what we accomplished:
Commits: 88 commits in 5 days
Lines of Code: ~10,000+ lines across library, CLI, tests, docs
Test Coverage: 95%+ with both unit and integration tests
Documentation: 4 comprehensive markdown files + inline docs
Features:
- Complete Python library with async support
 - Full-featured CLI with 10+ commands
 - S3 integration
 - LangSmith tracing
 - Safety settings control
 - Template system
 - Configuration management
 - Aspect ratio control
 - Multiple input/output images
 
CI/CD: Fully automated with GitHub Actions
Package: Published to PyPI, installable with `pip install gemini-imagen`
How We Actually Worked Together
My Role:
- Vision and Direction - "I want S3 support"
 - High-level Decisions - "Use the new google-genai SDK"
 - Requirements - "Add safety settings control"
 - Testing and Validation - Running integration tests, trying the CLI
 - User Experience - "This error message could be clearer"
 
Claude Code's Role:
- Implementation - Writing all the code
 - Best Practices - Suggesting modern tools and patterns
 - Testing - Creating comprehensive test suites
 - Documentation - Writing clear, complete docs
 - Problem Solving - Debugging issues, researching solutions
 - Proactive Improvements - Adding features I didn't think of
 
The Workflow
A typical interaction looked like this:
Me: "Add S3 support"
Claude Code:
I'll add S3 integration with async support. This will involve:
1. Adding boto3 as optional dependency
2. Creating s3_utils module
3. Updating GeminiImageGenerator to handle S3 URIs
4. Adding tests
5. Updating documentation
Let me start with...
Then it would:
- Show me the code it's about to write
 - Explain the design decisions
 - Ask for clarification if needed
 - Implement everything
 - Run tests
 - Update docs
 - Commit with a good commit message
 
The key: Claude Code was proactive but never autonomous. It explained what it was doing and why.
What Surprised Me
1. Architectural Thinking
Claude Code didn't just write code — it designed systems. When I asked for a template system, it:
- Designed a variable substitution module
 - Created a deep merge utility for configuration precedence
 - Built a template storage system
 - Integrated it all into the CLI
 - Added comprehensive tests
 
This wasn't just "write this function" — it was end-to-end feature design.
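To give a flavor of those pieces, here are minimal sketches of the two core utilities (simplified; the real modules handle nesting, missing keys, and errors more carefully):

```python
import re

def substitute(text: str, variables: dict[str, str]) -> str:
    """Replace {name} placeholders with values from `variables`."""
    return re.sub(r"\{(\w+)\}", lambda m: variables[m.group(1)], text)

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Precedence: template defaults < config file < CLI flags, e.g.
# final = deep_merge(deep_merge(template, config_file), cli_overrides)
prompt = substitute("a {style} landscape", {"style": "serene"})
```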
2. Documentation First
Claude Code consistently created documentation alongside code:
- Docstrings for every function
 - README updates for every feature
 - Separate documentation files for complex topics
 - Examples showing real usage
 
3. Testing Discipline
Every feature came with tests:
- Unit tests with mocks
 - Integration tests with real APIs
 - Edge case coverage
 - Error handling tests
 
95%+ test coverage wasn't a goal — it was automatic.
4. Modern Best Practices
Claude Code knew tools I hadn't heard of:
- uv for fast package management
 - ruff for unified linting/formatting
 - aiobotocore instead of aioboto3
 - pytest-rerunfailures for flaky tests
 
It kept me on the cutting edge without me having to research.
5. Iteration Speed
When something didn't work:
- Claude Code would identify the issue
 - Suggest a fix
 - Implement it
 - Verify it worked
 - Move on
 
No Stack Overflow. No debugging sessions. Just iterate.
What Didn't Work
Let's be honest about limitations:
1. Occasional Hallucinations
Sometimes Claude Code would suggest an API that didn't exist:
- "Use this parameter in google-genai" (parameter didn't exist)
 - "This function does X" (it did Y)
 
Solution: Always verify against official docs. Claude Code caught most of its own mistakes when I asked it to verify.
2. Context Limits
With large files, Claude Code would sometimes miss details. We worked around this by:
- Breaking features into smaller chunks
 - Asking it to re-read specific files
 - Using focused prompts
 
3. Over-Engineering Risk
Sometimes Claude Code would suggest elaborate solutions when simple would do. I had to rein it in:
Claude Code: "We should add a plugin system for extensibility"
Me: "Let's just make it work first"
The Skills I Still Needed
Claude Code is powerful, but I still needed:
1. Product Sense
- What features matter?
 - What should the API look like?
 - How should the CLI feel?
 
2. User Experience
- Is this error message helpful?
 - Is this workflow intuitive?
 - What will users struggle with?
 
3. Technical Judgment
- When is "good enough" good enough?
 - Should we optimize this now or later?
 - What's the right trade-off?
 
4. Domain Knowledge
- How does Google's API actually work?
 - What are the common use cases?
 - What problems will users face?
 
Lessons for Working with AI Coding Assistants
1. Start with Clear Intent
❌ "Make it better"
✅ "Add S3 support with async upload/download and automatic URI detection"
2. Iterate in Small Steps
Build one feature at a time. Don't ask for everything at once.
3. Verify and Test
Always run the code. AI can hallucinate. Tests catch mistakes.
4. Explain Your Reasoning
When you disagree with a suggestion, explain why. The AI learns your preferences.
5. Use It for Boilerplate
AI excels at:
- Test boilerplate
 - Documentation
 - CI/CD configuration
 - Type definitions
 - Error handling
 
6. Keep Humans for Judgment
AI suggests. Humans decide. Don't abdicate architectural decisions.
What This Means for Software Development
The Good News
- Productivity Multiplier: I built in 1 week what would normally take a month
 - Quality Improves: Better tests, better docs, better error handling
 - Learning Accelerates: Claude Code teaches you best practices
 - Focus on Value: Spend time on product, not boilerplate
 
The Reality Check
- Not Autonomous: You still need to drive
 - Not Perfect: Verification is essential
 - Not a Replacement: Judgment, creativity, and vision are human
 
The Future
This is just the beginning. Imagine:
- AI that understands your entire codebase
 - AI that reviews your PRs
 - AI that writes integration tests from requirements
 - AI that refactors legacy code
 
We're not there yet, but we're close.
Try It Yourself
The entire project is open source:
GitHub: https://github.com/aviadr1/gemini-imagen
PyPI: https://pypi.org/project/gemini-imagen/
Install it:
```bash
pip install gemini-imagen
export GOOGLE_API_KEY="your-key"
imagen generate "a serene landscape" -o output.png
```
Every line of code was written with Claude Code's help. The git history is there — you can see the evolution.
Conclusion
Building with Claude Code felt like having an expert pair programmer who:
- Never gets tired
 - Knows best practices
 - Writes tests automatically
 - Documents everything
 - Iterates instantly
 
But it still needs:
- Your vision
 - Your judgment
 - Your creativity
 - Your understanding of users
 
This is the future of software development: Humans focus on what and why, AI handles how.
The question isn't "Will AI replace developers?"
The question is "Are you using AI to 10x your productivity yet?"
If not, you're already behind.
Aviad Rozenhek is a software engineer exploring the intersection of AI and developer productivity. Try gemini-imagen at github.com/aviadr1/gemini-imagen. Built with Claude Code.
Appendix: The Commit History
Here's a sample of our actual commit messages (all co-authored with Claude Code):
Initial commit: gemini-imagen v0.1.0
feat: Add comprehensive CI/CD and development best practices
docs: Update project to use uv as primary package manager
test: Add comprehensive LangSmith integration tests
feat: Add aspect ratio and image size control
Add configurable safety settings support
Implement comprehensive CLI with template system for fast prompt iteration
Separate CLI and library documentation
Bump version to 0.6.0
Each commit represents a collaboration between human vision and AI implementation.
This is how software gets built in 2025.