Manthan Ankolekar

Building an AI-Powered Image Generator with Google's Gemini API

Introduction

AI-powered image generation lets users create visuals from simple text prompts. In this post, I walk through how I built the Gemini Image Generator, a Node.js application that uses Google’s Generative AI (Gemini API) to create images from user input.

Project Overview

The Gemini Image Generator is a lightweight REST API that allows users to send text prompts and receive AI-generated images. It is built using Node.js, Express, and Google’s Generative AI SDK.

Key Features:

✅ Accepts user text prompts to generate images.

✅ Uses Google Gemini API for AI-based image generation.

✅ Saves generated images on the server.

✅ REST API endpoints for easy integration with other applications.

Tech Stack

The project is built with:

  • Node.js - Backend runtime.
  • Express.js - Lightweight web framework.
  • Google Generative AI SDK (@google/generative-ai) - Client for the Gemini API.
  • dotenv - Environment variable management.
  • cors - Cross-origin support.

Getting Started

1. Clone the Repository

git clone https://github.com/manthanank/gemini-image-generator.git
cd gemini-image-generator

2. Install Dependencies

npm install

3. Configure Environment Variables

Create a .env file in the root directory and add:

GEMINI_API_KEY=your_google_gemini_api_key
PORT=5000
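The server code later in this post imports port and geminiApiKey from config/env, a module that isn't shown here. As a minimal sketch of what such a module might look like: loadConfig is a hypothetical name, and the sketch reads from a plain object so it runs without dotenv installed. In the real project, require("dotenv").config() would run first so the values from .env land in process.env.

```javascript
// Hypothetical config/env.js; the post imports `port` and `geminiApiKey`
// from this module but does not show it, so the details here are assumptions.
function loadConfig(env = process.env) {
  const geminiApiKey = env.GEMINI_API_KEY;
  if (!geminiApiKey) {
    // Fail at startup rather than on the first image request.
    throw new Error("GEMINI_API_KEY is not set; add it to your .env file");
  }

  // Fall back to 5000, the port used in the example .env above.
  const port = Number.parseInt(env.PORT ?? "5000", 10);
  if (Number.isNaN(port)) {
    throw new Error(`PORT must be a number, got "${env.PORT}"`);
  }

  return { port, geminiApiKey };
}

// Example with an explicit object instead of the real environment.
const config = loadConfig({
  GEMINI_API_KEY: "your_google_gemini_api_key",
  PORT: "5000",
});
console.log(config.port); // 5000
```

A module like this would typically end with module.exports = loadConfig(), so that a missing key stops the server immediately with a readable message.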

4. Start the Server

npm start

Your server will run at http://localhost:5000 🚀


API Endpoints

Generate an Image

📌 Endpoint: POST /api/image/generate

📌 Request Body:

{
  "prompt": "a futuristic cityscape with neon lights"
}

📌 Response:

{
  "message": "Image generated successfully",
  "imagePath": "temp/generated_image.png"
}
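To try the endpoint from another Node.js script, something like the following should work. This is a sketch, not part of the repo: buildGenerateRequest and requestImage are hypothetical helper names, and it assumes Node.js 18+ (for the built-in fetch) and the default http://localhost:5000 address from the setup steps.

```javascript
// Hypothetical client helpers; assumes Node.js 18+ for the built-in fetch
// and the default server address from the setup steps.
function buildGenerateRequest(prompt, baseUrl = "http://localhost:5000") {
  return {
    url: `${baseUrl}/api/image/generate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt }),
    },
  };
}

async function requestImage(prompt, baseUrl) {
  const { url, options } = buildGenerateRequest(prompt, baseUrl);
  const res = await fetch(url, options);
  const data = await res.json();
  if (!res.ok) {
    // The controller reports failures as { error: "..." }.
    throw new Error(data.error || `Request failed with status ${res.status}`);
  }
  return data.imagePath;
}

// Inspect the request without contacting the server.
const example = buildGenerateRequest("a futuristic cityscape with neon lights");
console.log(example.url);
```

With the server running, requestImage("a futuristic cityscape with neon lights") resolves to the imagePath from the response, or rejects with the error message the API sent back.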

Project Structure

gemini-image-generator/
├── controllers/       # Business logic
├── routes/            # API routes
├── services/          # Google Gemini AI logic
├── temp/              # Generated images
├── server.js          # Entry point
├── package.json       # Dependencies
└── .env               # Environment variables

Core Implementation

1. Setting Up the Express Server

The server.js file initializes the app and listens for requests:

const app = require("./app");
const { port } = require("./config/env");

app.listen(port, () => {
  console.log(`Server running on http://localhost:${port}`);
});
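server.js requires ./app, which isn't shown in this post. As a rough sketch of how the POST /api/image/generate route could be wired up: registerImageRoutes and createFakeRouter are hypothetical names, and the router is passed in as an argument so the snippet runs without Express installed.

```javascript
// Hypothetical route wiring; the post does not show app.js or the routes/
// files, so this is an assumption about their shape.
function registerImageRoutes(router, generateImageController) {
  // POST /generate becomes POST /api/image/generate once the router is
  // mounted at /api/image.
  router.post("/generate", generateImageController);
  return router;
}

// Minimal stand-in that records routes instead of serving them.
function createFakeRouter() {
  const routes = [];
  return {
    routes,
    post(path, handler) {
      routes.push({ method: "POST", path, handler });
    },
  };
}

const fakeRouter = registerImageRoutes(createFakeRouter(), () => {});
console.log(fakeRouter.routes.map((route) => `${route.method} ${route.path}`));
```

In an actual Express app, you would pass express.Router() instead of the stand-in and mount the result with app.use("/api/image", router), after app.use(cors()) and app.use(express.json()) so that req.body is parsed before the controller runs.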

2. Handling Image Generation Requests

The imageController.js file manages requests:

const { generateImage } = require("../services/geminiService");

async function generateImageController(req, res) {
  const { prompt } = req.body;

  if (!prompt) {
    return res.status(400).json({ error: "Prompt is required" });
  }

  try {
    const imagePath = await generateImage(prompt);
    res.status(200).json({ message: "Image generated successfully", imagePath });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
}

module.exports = { generateImageController };
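The controller has three outcomes: 400 for a missing prompt, 200 on success, and 500 when the service throws. One way to check all three without a Gemini API key is to inject the service function and call the handler with stand-in request and response objects. This is a variation on the controller above, not the repo's code; makeGenerateImageController and createFakeRes are hypothetical names.

```javascript
// Variation on the controller above: the image service is injected, so the
// handler can be exercised without Express or a Gemini API key.
function makeGenerateImageController(generateImage) {
  return async function generateImageController(req, res) {
    const { prompt } = req.body || {};
    if (!prompt) {
      return res.status(400).json({ error: "Prompt is required" });
    }
    try {
      const imagePath = await generateImage(prompt);
      return res.status(200).json({ message: "Image generated successfully", imagePath });
    } catch (error) {
      return res.status(500).json({ error: error.message });
    }
  };
}

// Minimal stand-in for the Express response object (status and json only).
function createFakeRes() {
  return {
    statusCode: null,
    body: null,
    status(code) { this.statusCode = code; return this; },
    json(payload) { this.body = payload; return this; },
  };
}

async function demo() {
  const ok = makeGenerateImageController(async () => "temp/generated_image.png");
  const failing = makeGenerateImageController(async () => {
    throw new Error("Failed to generate image");
  });

  const missing = await ok({ body: {} }, createFakeRes());
  const success = await ok({ body: { prompt: "a neon city" } }, createFakeRes());
  const failure = await failing({ body: { prompt: "a neon city" } }, createFakeRes());
  return [missing.statusCode, success.statusCode, failure.statusCode];
}

demo().then((codes) => console.log(codes)); // [ 400, 200, 500 ]
```

Injecting the service is a design choice for testability; the controller in the repo imports generateImage directly, which is simpler when you don't need to swap it out.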

3. Integrating the Gemini API

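The geminiService.js file calls the Gemini API, decodes the base64 image from the response, and writes it to temp/generated_image.png. Note that generateImage returns an absolute path, so the imagePath in a real response is the full server path rather than the relative one in the sample response above: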

const { GoogleGenerativeAI } = require("@google/generative-ai");
const fs = require("fs");
const path = require("path");
const { geminiApiKey } = require("../config/env");

const genAI = new GoogleGenerativeAI(geminiApiKey);

async function generateImage(prompt) {
  // Request both text and image parts; the image arrives as base64 inline data.
  const model = genAI.getGenerativeModel({
    model: "gemini-2.0-flash-exp-image-generation",
    generationConfig: { responseModalities: ['Text', 'Image'] }
  });

  try {
    const response = await model.generateContent(prompt);
    const parts = response.response.candidates?.[0]?.content?.parts ?? [];
    for (const part of parts) {
      if (part.inlineData) {
        const buffer = Buffer.from(part.inlineData.data, 'base64');
        const filePath = path.join(__dirname, '../temp/generated_image.png');
        // Create temp/ if it is missing so the write does not fail on a fresh clone.
        fs.mkdirSync(path.dirname(filePath), { recursive: true });
        fs.writeFileSync(filePath, buffer);
        return filePath;
      }
    }
    // No inline image came back (for example, a text-only reply), so fail
    // instead of returning undefined to the controller.
    throw new Error("No image returned by the model");
  } catch (error) {
    console.error("Error generating image:", error);
    throw new Error("Failed to generate image");
  }
}

module.exports = { generateImage };
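Because the service always writes to generated_image.png, each new request overwrites the previous image. One possible refinement, sketched here under my own assumptions, is to separate "find the image in the response" from "pick a file name" and give every image a unique name. extractInlineImage and uniqueImagePath are hypothetical helpers, and the response object below is hand-built to mirror the shape the service reads, not captured from the API.

```javascript
const crypto = require("node:crypto");
const path = require("node:path");

// Return the base64 payload of the first image part, or null if the reply
// contains no inline image (for example, a text-only response).
function extractInlineImage(result) {
  const parts = result?.response?.candidates?.[0]?.content?.parts ?? [];
  const imagePart = parts.find((part) => part.inlineData);
  return imagePart ? imagePart.inlineData.data : null;
}

// Build a unique file name so requests do not overwrite each other's output.
// `dir` is a hypothetical parameter for the target directory.
function uniqueImagePath(dir) {
  return path.join(dir, `generated_${crypto.randomUUID()}.png`);
}

// Hand-built object standing in for a real API response.
const fakeResult = {
  response: {
    candidates: [
      {
        content: {
          parts: [
            { text: "Here is your image" },
            { inlineData: { mimeType: "image/png", data: "aGVsbG8=" } },
          ],
        },
      },
    ],
  },
};

console.log(extractInlineImage(fakeResult)); // aGVsbG8=
console.log(extractInlineImage({ response: { candidates: [] } })); // null
console.log(path.basename(uniqueImagePath("temp")));
```

Inside generateImage, the loop could then become a single call to extractInlineImage followed by a write to uniqueImagePath(path.join(__dirname, "../temp")).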

Conclusion

By integrating Google’s Gemini API with Node.js, we’ve built an AI-powered image generation API that can transform text into creative visuals. This project can be expanded to support image style selection, real-time previews, and cloud storage integration.

If you found this useful, ⭐ star the repo and feel free to contribute! 🚀

👉 GitHub Repository: https://github.com/manthanank/gemini-image-generator
