
Aviad Rozenhek


Launching gemini-imagen: A CLI for Google Gemini Image Generation

Introduction

When Google released their Gemini 2.0 models with image generation capabilities, I was excited to try them out. But as a developer who lives in the terminal, I wanted something more than just copying code snippets from documentation. I wanted a tool that felt natural, powerful, and ready for real-world use.

So I built gemini-imagen — a comprehensive Python library and CLI for Google Gemini's image generation capabilities. In this article, I'll walk you through the journey of building this tool, the technical decisions I made, and the lessons I learned along the way.

The Vision

I had three main goals:

  1. Developer-friendly CLI — Something you can use directly from the terminal without writing any code
  2. Production-ready library — A Python API that's type-safe, async-first, and ready for integration
  3. Real-world features — S3 storage, observability with LangSmith, proper error handling, and configuration management

Starting Simple: The Core API

I started with the Python library first. The Google GenAI SDK provides the foundation, but I wanted to add a layer that made common tasks trivial.

Here's what the simplest use case looks like:

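import asyncio

from gemini_imagen import GeminiImageGenerator


async def main() -> None:
    generator = GeminiImageGenerator()

    # generate() is a coroutine, so it has to be awaited inside an async function
    result = await generator.generate(
        prompt="A serene Japanese garden with cherry blossoms",
        output_images=["garden.png"],
    )
    print(f"Image saved to: {result.image_location}")


asyncio.run(main())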

Type Safety with Pydantic

One of my early decisions was to use Pydantic v2 for all data validation. This gives users immediate feedback if they pass invalid parameters, and enables excellent IDE autocomplete.

from typing import List, Optional

from pydantic import BaseModel, Field

class GenerationResult(BaseModel):
    """Result from image generation."""

    text: Optional[str] = Field(None, description="Generated text response")
    image_locations: List[str] = Field(default_factory=list, description="Local file paths")
    image_s3_uris: List[str] = Field(default_factory=list, description="S3 URIs")
    image_http_urls: List[str] = Field(default_factory=list, description="Public HTTP URLs")
    model: str = Field(description="Model used for generation")

Type hints everywhere meant fewer bugs and better developer experience.

Async from Day One

Image generation involves I/O-heavy operations (downloading images, uploading to S3), which is exactly where async Python pays off. Making everything async from the start was crucial:

async def generate(
    self,
    prompt: str,
    input_images: Optional[List[ImageSource]] = None,
    output_images: Optional[List[OutputImageSpec]] = None,
    **kwargs
) -> GenerationResult:
    # Load input images in parallel
    if input_images:
        loaded_images = await asyncio.gather(*[
            self._load_image(img) for img in input_images
        ])

    # Generate
    result = await self._call_api(...)

    # Save output images in parallel
    if output_images:
        await asyncio.gather(*[
            self._save_image(img, path) for img, path in zip(...)
        ])

This parallelization made the library 5x faster when working with multiple images.
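
To make the effect of asyncio.gather concrete, here is a minimal, self-contained sketch. It does not use the library: fake_load is a hypothetical stand-in that sleeps instead of downloading an image, and the actual speedup in practice depends on how many images are involved and how I/O-bound each step is.

```python
import asyncio
import time


async def fake_load(name: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for an I/O-bound step such as downloading one image."""
    await asyncio.sleep(delay)
    return f"loaded:{name}"


async def load_sequentially(names: list[str]) -> list[str]:
    # Each await finishes before the next one starts.
    return [await fake_load(name) for name in names]


async def load_concurrently(names: list[str]) -> list[str]:
    # gather() runs all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_load(name) for name in names))


async def timed(coro) -> tuple[list[str], float]:
    """Await a coroutine and return its result with the elapsed wall-clock time."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


names = ["a.png", "b.png", "c.png", "d.png"]
seq_result, seq_time = asyncio.run(timed(load_sequentially(names)))
con_result, con_time = asyncio.run(timed(load_concurrently(names)))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With four simulated loads, the sequential version waits for each delay in turn, while the concurrent version waits roughly as long as the slowest single load.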

Building the CLI with Click

For the CLI, I chose Click — it's the gold standard for Python command-line tools. The goal was to make the CLI feel intuitive while exposing the full power of the library.

Command Structure

I organized commands by use case:

imagen generate "a sunset"        # Text-to-image
imagen analyze photo.jpg          # Image analysis
imagen edit "make it brighter"    # Image editing
imagen upload local.png s3://...  # S3 integration
imagen config set temperature 0.8 # Configuration

Piping Support

One feature I'm particularly proud of is stdin support. You can pipe prompts directly:

echo "a robot reading a book" | imagen generate -o robot.png

# Read one prompt per line; quoting keeps prompts with spaces or globs intact
while IFS= read -r prompt; do
  imagen generate "$prompt" -o "$(printf '%s' "$prompt" | tr ' ' '_').png"
done < prompts.txt

This makes batch processing trivial and follows Unix philosophy.
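
One way to get this behavior is to fall back to stdin only when no prompt argument was passed. The sketch below is an assumption about how such a fallback could look, not the tool's actual code: resolve_prompt is a hypothetical helper, and it uses only the standard library rather than Click.

```python
import io
import sys
from typing import Optional, TextIO


def resolve_prompt(argument: Optional[str], stdin: TextIO = sys.stdin) -> str:
    """Return the prompt from the argument, or from piped stdin as a fallback."""
    if argument:
        return argument
    # An interactive terminal has nothing piped in, so fail fast instead of blocking.
    if stdin.isatty():
        raise ValueError("no prompt given: pass an argument or pipe text on stdin")
    prompt = stdin.read().strip()
    if not prompt:
        raise ValueError("stdin was empty")
    return prompt


# Simulates: echo "a robot reading a book" | imagen generate -o robot.png
piped = io.StringIO("a robot reading a book\n")
print(resolve_prompt(None, stdin=piped))
```

Checking isatty() first matters: without it, running the command with no argument in an interactive shell would appear to hang while it waits for input.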

JSON Output for Scripting

For automation, I added a --json flag to every command:

result=$(imagen generate "a landscape" -o output.png --json)
s3_uri=$(echo "$result" | jq -r '.s3_uri')
http_url=$(echo "$result" | jq -r '.http_url')

This makes it easy to integrate with shell scripts, CI/CD pipelines, and other tools.
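
As a rough illustration of the idea, a single rendering function can serve both humans and scripts. This is a sketch under assumptions: render_result is a hypothetical helper, and the only field names used are the s3_uri and http_url keys from the jq example above.

```python
import json


def render_result(result: dict, as_json: bool) -> str:
    """Format a result for humans by default, or as one JSON document for scripts."""
    if as_json:
        # sort_keys keeps the output stable, which helps when diffing CI logs.
        return json.dumps(result, sort_keys=True)
    return "\n".join(f"{key}: {value}" for key, value in result.items())


result = {
    "s3_uri": "s3://bucket/output.png",
    "http_url": "https://bucket.s3.amazonaws.com/output.png",
}
print(render_result(result, as_json=True))
```

Emitting exactly one JSON document on stdout, with everything else going to stderr, is what lets tools like jq consume the output without extra filtering.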

Real-World Features

S3 Integration

Cloud storage is essential for production use. I integrated boto3 with async support (aiobotocore) to make S3 a first-class citizen:

# Input from S3, output to S3
imagen generate "enhance this" \
  -i s3://bucket/input.jpg \
  -o s3://bucket/output.jpg

# The result includes both S3 URI and public HTTP URL
# S3 URI: s3://bucket/output.jpg
# HTTP URL: https://bucket.s3.amazonaws.com/output.jpg

The library automatically detects S3 URIs (anything starting with s3://) and handles authentication using standard AWS credential chains.
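
The detection step and the URI-to-URL mapping shown in the comments above can be sketched with the standard library alone. The function names here are hypothetical and not taken from the library; the URL format follows the https://bucket.s3.amazonaws.com/... pattern from the example.

```python
from typing import Tuple
from urllib.parse import urlparse


def is_s3_uri(location: str) -> bool:
    """Treat anything starting with s3:// as an S3 location."""
    return location.startswith("s3://")


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split s3://bucket/key into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def to_http_url(uri: str) -> str:
    """Build the virtual-hosted-style URL for an S3 URI.

    Keys with special characters would also need URL-quoting; omitted here.
    """
    bucket, key = split_s3_uri(uri)
    return f"https://{bucket}.s3.amazonaws.com/{key}"


print(to_http_url("s3://bucket/output.jpg"))
```

Note that the HTTP URL only resolves for anonymous readers if the object's permissions allow public access.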

LangSmith Tracing

Observability is crucial for debugging and monitoring. I integrated LangSmith to automatically log:

  • Input prompts and parameters
  • Generated images (as S3 URLs)
  • Model used and execution time
  • Safety filtering information
  • Custom metadata and tags

import os

# Enable tracing
os.environ["LANGSMITH_TRACING"] = "true"

result = await generator.generate(
    prompt="A robot in a library",
    output_images=["robot.png"],
    metadata={"user_id": "demo", "session": "example"},
    tags=["demo", "robot"]
)

# View traces at https://smith.langchain.com/

This made debugging API issues and monitoring production usage much easier.

Safety Settings

Google's safety filters can sometimes be restrictive. I added comprehensive control over safety thresholds:

# Use a preset
imagen generate "artistic photo" -o output.png --safety-setting preset:relaxed

# Fine-grained control
imagen generate "portrait" -o output.png \
  --safety-setting SEXUALLY_EXPLICIT:BLOCK_ONLY_HIGH \
  --safety-setting DANGEROUS_CONTENT:BLOCK_LOW_AND_ABOVE

You can also set global defaults:

imagen config set safety_settings relaxed
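
For readers curious how repeated NAME:VALUE flags like these can be turned into a mapping, here is a small sketch. parse_safety_settings is a hypothetical helper, not the tool's parser, and it deliberately does not expand presets, since what a preset such as relaxed contains is not covered here.

```python
from typing import Dict, List


def parse_safety_settings(values: List[str]) -> Dict[str, str]:
    """Parse repeated NAME:VALUE flags into a dict."""
    settings: Dict[str, str] = {}
    for value in values:
        name, sep, threshold = value.partition(":")
        if not sep or not name or not threshold:
            raise ValueError(f"expected NAME:VALUE, got {value!r}")
        # A later flag for the same name overrides an earlier one.
        settings[name] = threshold
    return settings


print(parse_safety_settings([
    "SEXUALLY_EXPLICIT:BLOCK_ONLY_HIGH",
    "DANGEROUS_CONTENT:BLOCK_LOW_AND_ABOVE",
]))
```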

Configuration Management

I wanted configuration to be flexible but opinionated. The precedence is:

  1. Command-line flags (highest priority)
  2. Environment variables
  3. Config file (~/.config/imagen/config.yaml)
  4. Default values (lowest priority)

This means you can:

# Set global defaults
imagen config set default_model gemini-2.0-flash-exp
imagen config set temperature 0.8
imagen config set aspect_ratio 16:9

# Override per-command
imagen generate "test" -o out.png --temperature 0.4
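
The four-level precedence maps naturally onto collections.ChainMap, which looks keys up in each layer in order. The following is a sketch under assumptions rather than the tool's implementation: resolve_config, the IMAGEN_ environment prefix, and the values in DEFAULTS are all hypothetical.

```python
from collections import ChainMap
from typing import Any, Mapping

# Illustrative values only, not the tool's real defaults.
DEFAULTS = {"temperature": 1.0, "aspect_ratio": "1:1"}


def resolve_config(
    cli_flags: Mapping[str, Any],
    environ: Mapping[str, str],
    config_file: Mapping[str, Any],
    env_prefix: str = "IMAGEN_",  # hypothetical prefix
) -> ChainMap:
    """Layer settings so that flags > env vars > config file > defaults."""
    # Drop flags the user did not pass so they don't mask lower layers.
    flags = {key: value for key, value in cli_flags.items() if value is not None}
    # Map e.g. IMAGEN_ASPECT_RATIO to aspect_ratio (values stay strings here).
    env = {
        key[len(env_prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(env_prefix)
    }
    return ChainMap(flags, env, dict(config_file), DEFAULTS)


config = resolve_config(
    cli_flags={"temperature": 0.4, "aspect_ratio": None},
    environ={"IMAGEN_ASPECT_RATIO": "4:3", "PATH": "/usr/bin"},
    config_file={"temperature": 0.8, "aspect_ratio": "16:9",
                 "default_model": "gemini-2.0-flash-exp"},
)
print(config["temperature"], config["aspect_ratio"], config["default_model"])
```

The detail that is easy to get wrong is the unset flag: a flag parsed as None must be filtered out, otherwise it would shadow the environment variable and config file beneath it.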

Development Experience

Modern Python Tooling

I used uv for package management. It's incredibly fast and handles dependency resolution better than pip. I combined it with:

  • ruff for linting and formatting (replaces flake8, black, isort)
  • mypy for type checking
  • pytest with async support for testing
  • pre-commit hooks for automatic code quality

The development experience was smooth and fast.

Testing Strategy

I separated tests into two categories:

  1. Unit tests — Mocked, no API keys required
  2. Integration tests — Real API calls, marked with @pytest.mark.integration

@pytest.mark.integration
async def test_real_generation():
    """Test actual image generation with real API."""
    generator = GeminiImageGenerator()
    result = await generator.generate(
        prompt="A simple test image",
        output_images=["test_output.png"]
    )
    assert result.image_location.exists()

Integration tests automatically skip if API keys aren't configured, making CI/CD easier.
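
The skip condition itself is simple. With pytest, pytest.mark.skipif takes a condition and a reason; the sketch below shows the same idea using the standard library's unittest.skipUnless so it stays dependency-free. has_api_key and the test class are hypothetical, and GOOGLE_API_KEY is the variable from the install instructions.

```python
import os
import unittest
from typing import Mapping


def has_api_key(env: Mapping[str, str] = os.environ) -> bool:
    """True when a non-empty GOOGLE_API_KEY is present."""
    return bool(env.get("GOOGLE_API_KEY", "").strip())


@unittest.skipUnless(has_api_key(), "GOOGLE_API_KEY is not set")
class RealGenerationTest(unittest.TestCase):
    def test_key_is_available(self) -> None:
        # A real integration test would call the API here.
        self.assertTrue(has_api_key())
```

Because the condition is evaluated at import time, a contributor without credentials sees the tests reported as skipped with a reason instead of as failures.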

CI/CD with GitHub Actions

The CI pipeline runs:

  1. Lint job — ruff and mypy on all code
  2. Test job — pytest on Python 3.12 and 3.13
  3. Integration tests — Only on main branch with secrets
  4. Build job — Package building and verification

Plus automated dependency updates with Dependabot.

Documentation Strategy

Good documentation is crucial for adoption. I organized it into four files:

  1. README.md — CLI quick start and features (for casual users)
  2. LIBRARY.md — Python API reference (for developers integrating the library)
  3. ADVANCED_USAGE.md — S3, LangSmith, scripting, automation (for power users)
  4. CONTRIBUTING.md — Development setup and guidelines (for contributors)

Each file cross-references the others, making it easy to navigate.

Lessons Learned

1. Start with Types

Using Pydantic from day one caught so many bugs early. The upfront cost of defining models pays off immediately.

2. Async is Worth It

Making everything async added complexity, but the performance gains (especially with multiple images) made it worthwhile.

3. CLI UX Matters

Small touches went a long way:

  • Piping support
  • JSON output for scripting
  • Helpful error messages with suggestions
  • Progress indicators

These made the CLI feel professional and polished.

4. Configuration is Hard

Getting the precedence right (CLI flags > env vars > config file > defaults) took several iterations. But once done, it made the tool much more flexible.

5. Documentation Drives Design

Writing documentation early forced me to think about the user experience. If something was hard to explain, it was probably badly designed.

What's Next?

Some features I'm considering:

  • Batch processing — Built-in support for processing multiple prompts
  • Template system — Reusable prompt templates with variables
  • Webhook support — Async notifications when generation completes
  • Local model support — Fallback to local models when API is unavailable
  • More output formats — WebP, AVIF support

Try It Yourself

Getting started is simple:

pip install gemini-imagen
export GOOGLE_API_KEY="your-key-here"
imagen generate "a serene landscape" -o output.png

The code is open source on GitHub: https://github.com/aviadr1/gemini-imagen

Conclusion

Building gemini-imagen was a journey in creating developer tools that feel right. It's not just about wrapping an API — it's about understanding how developers work and building something that fits naturally into their workflow.

Whether you're generating images in production, experimenting with AI art, or just need a quick CLI tool for image generation, I hope gemini-imagen makes your life easier.

Feel free to try it out, open issues, or contribute! I'd love to hear how you use it.


Aviad Rozenhek is a software engineer passionate about developer tools and AI. Connect on GitHub.
