Over the past week, I built gemini-imagen — a comprehensive CLI and Python library for Google Gemini's image generation capabilities. But here's the twist: I didn't build it alone. I built it with Claude Code, Anthropic's AI-powered coding assistant.
This isn't a story about AI replacing developers. It's about AI amplifying developer productivity in ways I couldn't have imagined. Let me show you exactly how Claude Code and I worked together to ship a production-ready open-source library in record time.
Day 1: From Zero to Basic Library (October 30)
8:20 AM - The First Commit
I started with a vision: create a better developer experience for Google's Gemini image generation API. I asked Claude Code:
"Help me create a Python wrapper for Google Gemini's image generation capabilities"
Within 30 minutes, we had:
- Project structure with `pyproject.toml`
- Basic `GeminiImageGenerator` class
- Type-safe models with Pydantic
- Initial test suite
The Magic Moment: Claude Code didn't just write code — it suggested using Pydantic v2 for validation, structured the project following Python best practices, and immediately set up proper testing with pytest.
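For flavor, here is the shape of a Pydantic v2 request model like the ones we used (the field names here are illustrative, not the library's exact schema):

```python
from pydantic import BaseModel, Field

class GenerationRequest(BaseModel):
    """Illustrative request model; the real library's fields may differ."""

    prompt: str = Field(..., min_length=1, description="Text prompt for the image")
    num_images: int = Field(1, ge=1, le=8, description="How many images to generate")
    aspect_ratio: str | None = Field(None, description='e.g. "16:9" or "1:1"')

# Pydantic validates at construction time and raises a clear error on bad input:
request = GenerationRequest(prompt="a serene landscape", num_images=2)
```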
10:29 AM - CI/CD from the Start
Most developers put off CI/CD. Not with Claude Code. I said:
"Let's set up proper CI/CD"
Claude Code created:
- GitHub Actions workflows for linting, testing, and building
 - Pre-commit hooks with ruff and mypy
 - Dependabot configuration for automated dependency updates
 - Coverage reporting with codecov
 
All properly configured and working on the first push.
12:27 PM - Modern Python Tooling
I mentioned I wanted to use modern Python tooling. Claude Code immediately:
- Configured uv as the package manager (faster than pip)
 - Set up ruff for both linting and formatting (replacing 5+ tools)
 - Added mypy with proper configuration
- Created a `Makefile` for common tasks
Lesson Learned: Claude Code knows the current best practices. It suggested using uv before I even knew about it.
Afternoon - The SDK Migration
Google had just released their new google-genai SDK. The old examples I found online used google-generativeai. I asked:
"Should we use the new google-genai SDK?"
Claude Code not only said yes, but:
- Explained the benefits (unified API, better typing)
 - Migrated all the code
 - Updated all tests
 - Fixed breaking changes
 - Verified everything worked
 
Three commits later, we were on the latest SDK with full test coverage.
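For comparison, the new SDK funnels everything through a single client object. A minimal sketch of the call shape (the model name is illustrative; check Google's docs for current ones):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GOOGLE_API_KEY")

# Unified entry point: client.models.<operation>(...)
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # illustrative model name
    contents="a serene landscape",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)
# Image bytes come back as inline data on the response parts.
for part in response.candidates[0].content.parts:
    if part.inline_data:
        with open("output.png", "wb") as f:
            f.write(part.inline_data.data)
```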
Day 2-3: Adding Real-World Features (October 31 - November 1)
The S3 Integration Challenge
I wanted S3 support for storing generated images. This is where Claude Code really shone:
Me: "Add S3 integration with async support"
Claude Code:
- Added `boto3` and `aiobotocore` as optional dependencies
- Created `s3_utils.py` with async upload/download
- Made S3 URIs first-class citizens (anything starting with `s3://`)
- Added HTTP URL generation for public access
 - Wrote comprehensive tests with mocked S3 clients
 - Updated documentation with examples
 
But here's where it got interesting: Claude Code ran into a dependency conflict with aioboto3. Without me asking, it:
- Researched the issue
- Found that `aiobotocore` was the better choice
- Refactored the code
 - Updated tests
 - Verified everything worked
 
Three version bumps in one afternoon as we iterated toward the right solution.
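For reference, the aiobotocore pattern that won out looks roughly like this (bucket and key are illustrative; credentials come from the usual AWS environment variables):

```python
import asyncio
from aiobotocore.session import get_session

async def upload_bytes(data: bytes, bucket: str, key: str) -> str:
    """Upload raw bytes to S3 and return the s3:// URI."""
    session = get_session()
    # create_client returns an async context manager in aiobotocore
    async with session.create_client("s3") as s3:
        await s3.put_object(Bucket=bucket, Key=key, Body=data)
    return f"s3://{bucket}/{key}"

# asyncio.run(upload_bytes(png_bytes, "my-bucket", "images/output.png"))
```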
Async Parallelization Breakthrough
I mentioned that loading/saving multiple images could be slow. Claude Code suggested:
"We should parallelize I/O operations with asyncio.gather"
It then:
- Refactored image loading to use parallel async
 - Refactored image saving to use parallel async
 - Created a benchmark script to measure improvements
 - Wrote documentation showing a 5x speedup
 - Added visual diagrams explaining the performance gains
 
I didn't ask for the benchmarks or diagrams — Claude Code proactively created them to demonstrate the value.
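The core of the change is a standard asyncio pattern: fan out the independent I/O operations and await them together. A simplified sketch:

```python
import asyncio

async def save_one(path: str, data: bytes) -> None:
    # Real code does async file or S3 writes; sleep stands in for that I/O.
    await asyncio.sleep(0.1)

async def save_all(images: dict[str, bytes]) -> None:
    # Sequential saves cost 0.1s per image; gathered, roughly 0.1s total.
    await asyncio.gather(*(save_one(path, data) for path, data in images.items()))

asyncio.run(save_all({f"img_{i}.png": b"" for i in range(5)}))
```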
Day 4-5: Safety and Observability (November 2)
Safety Settings Deep Dive
Google's safety filtering is complex. I asked:
"How do we control sensitivity settings?"
Claude Code:
- Read the Google Gemini documentation
 - Explained the safety categories and thresholds
 - Implemented a type-safe SafetySetting API
 - Added preset configurations (strict, default, relaxed, none)
- Created comprehensive documentation in `docs/SAFETY_FILTERING.md`
- Wrote integration tests
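The presets wrap Google's per-category thresholds. Under the hood, the raw google-genai types look like this (a sketch of the SDK shape, not gemini-imagen's wrapper API):

```python
from google.genai import types

# One SafetySetting per harm category; preset names like "strict" or
# "relaxed" map to thresholds such as BLOCK_MEDIUM_AND_ABOVE or BLOCK_ONLY_HIGH.
config = types.GenerateContentConfig(
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
        ),
    ],
)
```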
 
But here's the kicker: The safety tests were flaky (Google's API sometimes blocks, sometimes doesn't). Claude Code:
- Identified the flakiness
- Added the `pytest-rerunfailures` plugin
- Marked flaky tests appropriately
 - Ensured CI wouldn't randomly fail
 
It debugged and fixed the tests before I even noticed the problem.
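The fix leans on the plugin's `flaky` marker, which reruns a failing test before declaring it failed:

```python
import pytest

# Provided by pytest-rerunfailures: retry up to 3 times, pausing 2 seconds
# between attempts, before the test counts as a failure in CI.
@pytest.mark.flaky(reruns=3, reruns_delay=2)
def test_safety_filter_blocks_disallowed_prompt():
    ...
```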
LangSmith Tracing Integration
For observability, I wanted LangSmith integration. I said:
"Add LangSmith tracing for debugging"
Claude Code not only integrated LangSmith, but added:
- Automatic image URL logging
 - Safety filter information logging
 - Comprehensive metadata support
 - Error tracking
 - Custom tags and run names
 
Then it created working examples showing exactly how to use it.
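LangSmith's Python SDK centers on the `traceable` decorator. A hypothetical wrapper (for illustration only, not the library's actual internals, and assuming `LANGSMITH_API_KEY` is set) looks like:

```python
from langsmith import traceable

# Hypothetical wrapper function; tags and run names show up in the LangSmith UI.
@traceable(run_type="chain", name="generate_image", tags=["imagen"])
def generate_image(prompt: str) -> str:
    # ... call Gemini, upload the result, return its URL ...
    return "https://example.com/output.png"
```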
Day 6-7: The CLI Explosion (November 3)
Building a Production CLI
This is where things got wild. I said:
"Let's build a comprehensive CLI"
What I expected: A basic command-line interface with a few commands.
What Claude Code built:
8,940 lines of code added in one pull request, including:
- Complete CLI with Click framework:
  - `imagen generate` - Text-to-image generation
  - `imagen analyze` - Image analysis
  - `imagen edit` - Image editing
  - `imagen upload` / `imagen download` - S3 operations
  - `imagen config` - Configuration management
  - `imagen keys` - API key management
  - `imagen models` - Model listing
  - `imagen template` - Template management
- Advanced Template System:
  - Save reusable job templates
  - Load configuration from JSON files
  - Variable substitution with `{variable}` syntax
  - Multiple keys file support
  - CLI override precedence
  - `--dump-job` for inspection
  - `--dry-run` for testing
- LangSmith Export Feature:
  - Fetch traces from LangSmith API
  - Extract parameters automatically
  - Convert to templates and keys
  - Save directly from traces
- 203 New Tests covering:
  - Variable substitution (28 tests)
  - Deep merge utilities (24 tests)
  - Template storage (23 tests)
  - CLI commands (multiple test files)
  - Integration workflows (13 tests)
  - Safety settings (extensive tests)
This wasn't just code generation — it was architectural design.
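For readers unfamiliar with Click, the skeleton of a command group like this one looks roughly as follows (a simplified sketch, not the library's actual source):

```python
import click

@click.group()
def imagen():
    """gemini-imagen command-line interface (simplified skeleton)."""

@imagen.command()
@click.argument("prompt")
@click.option("-o", "--output", default="output.png", help="Where to save the image")
@click.option("--dry-run", is_flag=True, help="Show what would run without calling the API")
def generate(prompt: str, output: str, dry_run: bool):
    """Text-to-image generation."""
    if dry_run:
        click.echo(f"Would generate {output!r} from prompt {prompt!r}")
        return
    # ... call the library here ...

if __name__ == "__main__":
    imagen()
```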
The Demo Script
To demonstrate the CLI, Claude Code created `demo.sh`, a comprehensive 335-line demo showing:
- Every CLI command
 - Template workflows
 - Configuration management
 - Error handling
 - Best practices
 
I never asked for this. Claude Code created it proactively to help users understand the tool.
The Documentation Sprint
On the final day, I asked Claude Code to reorganize documentation. I said:
"Update README with features, move advanced use cases to another file, consolidate everything"
Claude Code:
- Created LIBRARY.md (695 lines) - Complete Python API reference
 - Created ADVANCED_USAGE.md (511 lines) - Advanced features and scripting
 - Updated CONTRIBUTING.md (284 lines) - Development guidelines
 - Rewrote README.md - Focused on CLI quick start
 - Cross-referenced all documents
 - Deleted redundant files
 
All documentation was consistent, comprehensive, and well-organized.
Then I said:
"Write a LinkedIn post for this library"
Claude Code wrote two versions (short and long) with proper hashtags and formatting.
"Write a Medium article about building this"
It wrote a technical deep-dive article explaining architectural decisions.
The Numbers
Let's look at what we accomplished:
Commits: 88 commits in 5 days
Lines of Code: ~10,000+ lines across library, CLI, tests, docs
Test Coverage: 95%+ with both unit and integration tests
Documentation: 4 comprehensive markdown files + inline docs
Features:
- Complete Python library with async support
 - Full-featured CLI with 10+ commands
 - S3 integration
 - LangSmith tracing
 - Safety settings control
 - Template system
 - Configuration management
 - Aspect ratio control
 - Multiple input/output images
 
CI/CD: Fully automated with GitHub Actions
Package: Published to PyPI, installable with `pip install gemini-imagen`
How We Actually Worked Together
My Role:
- Vision and Direction - "I want S3 support"
 - High-level Decisions - "Use the new google-genai SDK"
 - Requirements - "Add safety settings control"
 - Testing and Validation - Running integration tests, trying the CLI
 - User Experience - "This error message could be clearer"
 
Claude Code's Role:
- Implementation - Writing all the code
 - Best Practices - Suggesting modern tools and patterns
 - Testing - Creating comprehensive test suites
 - Documentation - Writing clear, complete docs
 - Problem Solving - Debugging issues, researching solutions
 - Proactive Improvements - Adding features I didn't think of
 
The Workflow
A typical interaction looked like this:
Me: "Add S3 support"
Claude Code:
I'll add S3 integration with async support. This will involve:
1. Adding boto3 as optional dependency
2. Creating s3_utils module
3. Updating GeminiImageGenerator to handle S3 URIs
4. Adding tests
5. Updating documentation
Let me start with...
Then it would:
- Show me the code it's about to write
 - Explain the design decisions
 - Ask for clarification if needed
 - Implement everything
 - Run tests
 - Update docs
 - Commit with a good commit message
 
The key: Claude Code was proactive but never autonomous. It explained what it was doing and why.
What Surprised Me
1. Architectural Thinking
Claude Code didn't just write code — it designed systems. When I asked for a template system, it:
- Designed a variable substitution module
 - Created a deep merge utility for configuration precedence
 - Built a template storage system
 - Integrated it all into the CLI
 - Added comprehensive tests
 
This wasn't just "write this function" — it was end-to-end feature design.
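To give a flavor of those pieces, here are minimal sketches of the two core utilities (simplified; the real modules handle nesting, missing keys, and errors more carefully):

```python
import re

def substitute(text: str, variables: dict[str, str]) -> str:
    """Replace {name} placeholders with values from `variables`."""
    return re.sub(r"\{(\w+)\}", lambda m: variables[m.group(1)], text)

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Precedence: template defaults < config file < CLI flags, e.g.
# final = deep_merge(deep_merge(template, config_file), cli_overrides)
prompt = substitute("a {style} landscape", {"style": "serene"})
```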
2. Documentation First
Claude Code consistently created documentation alongside code:
- Docstrings for every function
 - README updates for every feature
 - Separate documentation files for complex topics
 - Examples showing real usage
 
3. Testing Discipline
Every feature came with tests:
- Unit tests with mocks
 - Integration tests with real APIs
 - Edge case coverage
 - Error handling tests
 
95%+ test coverage wasn't a goal — it was automatic.
4. Modern Best Practices
Claude Code knew tools I hadn't heard of:
- uv for fast package management
 - ruff for unified linting/formatting
 - aiobotocore instead of aioboto3
 - pytest-rerunfailures for flaky tests
 
It kept me on the cutting edge without me having to research.
5. Iteration Speed
When something didn't work:
- Claude Code would identify the issue
 - Suggest a fix
 - Implement it
 - Verify it worked
 - Move on
 
No Stack Overflow. No debugging sessions. Just iterate.
What Didn't Work
Let's be honest about limitations:
1. Occasional Hallucinations
Sometimes Claude Code would suggest an API that didn't exist:
- "Use this parameter in google-genai" (parameter didn't exist)
 - "This function does X" (it did Y)
 
Solution: Always verify against official docs. Claude Code caught most of its own mistakes when I asked it to verify.
2. Context Limits
With large files, Claude Code would sometimes miss details. We worked around this by:
- Breaking features into smaller chunks
 - Asking it to re-read specific files
 - Using focused prompts
 
3. Over-Engineering Risk
Sometimes Claude Code would suggest elaborate solutions when simple would do. I had to rein it in:
Claude Code: "We should add a plugin system for extensibility"
Me: "Let's just make it work first"
The Skills I Still Needed
Claude Code is powerful, but I still needed:
1. Product Sense
- What features matter?
 - What should the API look like?
 - How should the CLI feel?
 
2. User Experience
- Is this error message helpful?
 - Is this workflow intuitive?
 - What will users struggle with?
 
3. Technical Judgment
- When is "good enough" good enough?
 - Should we optimize this now or later?
 - What's the right trade-off?
 
4. Domain Knowledge
- How does Google's API actually work?
 - What are the common use cases?
 - What problems will users face?
 
Lessons for Working with AI Coding Assistants
1. Start with Clear Intent
❌ "Make it better"
✅ "Add S3 support with async upload/download and automatic URI detection"
2. Iterate in Small Steps
Build one feature at a time. Don't ask for everything at once.
3. Verify and Test
Always run the code. AI can hallucinate. Tests catch mistakes.
4. Explain Your Reasoning
When you disagree with a suggestion, explain why. The AI learns your preferences.
5. Use It for Boilerplate
AI excels at:
- Test boilerplate
 - Documentation
 - CI/CD configuration
 - Type definitions
 - Error handling
 
6. Keep Humans for Judgment
AI suggests. Humans decide. Don't abdicate architectural decisions.
What This Means for Software Development
The Good News
- Productivity Multiplier: I built in 1 week what would normally take a month
 - Quality Improves: Better tests, better docs, better error handling
 - Learning Accelerates: Claude Code teaches you best practices
 - Focus on Value: Spend time on product, not boilerplate
 
The Reality Check
- Not Autonomous: You still need to drive
 - Not Perfect: Verification is essential
 - Not a Replacement: Judgment, creativity, and vision are human
 
The Future
This is just the beginning. Imagine:
- AI that understands your entire codebase
 - AI that reviews your PRs
 - AI that writes integration tests from requirements
 - AI that refactors legacy code
 
We're not there yet, but we're close.
Try It Yourself
The entire project is open source:
GitHub: https://github.com/aviadr1/gemini-imagen
PyPI: https://pypi.org/project/gemini-imagen/
Install it:
```bash
pip install gemini-imagen
export GOOGLE_API_KEY="your-key"
imagen generate "a serene landscape" -o output.png
```
Every line of code was written with Claude Code's help. The git history is there — you can see the evolution.
Conclusion
Building with Claude Code felt like having an expert pair programmer who:
- Never gets tired
 - Knows best practices
 - Writes tests automatically
 - Documents everything
 - Iterates instantly
 
But it still needs:
- Your vision
 - Your judgment
 - Your creativity
 - Your understanding of users
 
This is the future of software development: Humans focus on what and why, AI handles how.
The question isn't "Will AI replace developers?"
The question is "Are you using AI to 10x your productivity yet?"
If not, you're already behind.
Aviad Rozenhek is a software engineer exploring the intersection of AI and developer productivity. Try gemini-imagen at github.com/aviadr1/gemini-imagen. Built with Claude Code.
Appendix: The Commit History
Here's a sample of our actual commit messages (all co-authored with Claude Code):
Initial commit: gemini-imagen v0.1.0
feat: Add comprehensive CI/CD and development best practices
docs: Update project to use uv as primary package manager
test: Add comprehensive LangSmith integration tests
feat: Add aspect ratio and image size control
Add configurable safety settings support
Implement comprehensive CLI with template system for fast prompt iteration
Separate CLI and library documentation
Bump version to 0.6.0
Each commit represents a collaboration between human vision and AI implementation.
This is how software gets built in 2025.