Manthan Ankolekar

Posted on Mar 18

Building an AI-Powered Image Generator with Google's Gemini API

#node #webdev #javascript #programming

Introduction

AI-powered image generation lets users create visuals from simple text prompts. In this post, I walk through how I built the Gemini Image Generator, a Node.js application that uses Google’s Generative AI (Gemini API) to create images from user input.

Project Overview

The Gemini Image Generator is a lightweight REST API that allows users to send text prompts and receive AI-generated images. It is built using Node.js, Express, and Google’s Generative AI SDK.

Key Features:

✅ Accepts user text prompts to generate images.

✅ Uses the Google Gemini API for AI-based image generation.

✅ Saves generated images on the server.

✅ REST API endpoints for easy integration with other applications.

Tech Stack

The project is built with:

Node.js - Backend runtime.
Express.js - Lightweight web framework.
Google Generative AI SDK - AI-powered image generation.
dotenv - Environment variable management.
cors - Cross-origin support.

Getting Started

1. Clone the Repository

git clone https://github.com/manthanank/gemini-image-generator.git
cd gemini-image-generator

2. Install Dependencies

npm install

3. Configure Environment Variables

Create a .env file in the root directory and add:

GEMINI_API_KEY=your_google_gemini_api_key
PORT=5000
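
The code later in this post imports `port` and `geminiApiKey` from a `config/env` module that is not shown. As a rough idea of what such a module might do, here is a minimal sketch that reads and validates the two variables. The function name `loadEnv` and the validation rules are assumptions for illustration, not the repository's actual code.

```javascript
// Hypothetical sketch of config/env.js; the repository's actual file may differ.
// In the real project, require("dotenv").config() would run first so that
// values from .env are copied into process.env.

function loadEnv(env = process.env) {
  const geminiApiKey = env.GEMINI_API_KEY;
  if (!geminiApiKey) {
    // Fail fast at startup rather than on the first image request.
    throw new Error("GEMINI_API_KEY is not set");
  }

  // Fall back to 5000, the port used in this post, when PORT is not defined.
  const port = Number.parseInt(env.PORT ?? "5000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return { port, geminiApiKey };
}

module.exports = { loadEnv };
```

Validating at load time means a missing key shows up as a clear startup error instead of a failed API call later.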

4. Start the Server

npm start

Your server will run at http://localhost:5000 🚀

API Endpoints

Generate an Image

📌 Endpoint: POST /api/image/generate

📌 Request Body:

{
  "prompt": "a futuristic cityscape with neon lights"
}

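📌 Response (imagePath is shortened here; the service shown later in this post returns the full path on the server's file system):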

{
  "message": "Image generated successfully",
  "imagePath": "temp/generated_image.png"
}
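
To call the endpoint from code, a client might look like the sketch below. It assumes Node.js 18 or later (for the built-in `fetch`) and the server from this post running on port 5000. `requestImage` and its `fetchImpl` parameter are illustrative names and not part of the project.

```javascript
// Illustrative client for POST /api/image/generate (not part of the repository).
// fetchImpl is injectable so the function can be exercised without a live server.
async function requestImage(prompt, baseUrl = "http://localhost:5000", fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/image/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  const data = await res.json();
  if (!res.ok) {
    // The controller in this post returns { error: "..." } on failure.
    throw new Error(data.error || `Request failed with status ${res.status}`);
  }
  return data.imagePath;
}

// Usage (with the server running):
// requestImage("a futuristic cityscape with neon lights").then(console.log);
```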

Project Structure

gemini-image-generator/
├── controllers/       # Business logic
├── routes/            # API routes
├── services/          # Google Gemini AI logic
├── temp/              # Generated images
├── server.js          # Entry point
├── package.json       # Dependencies
└── .env               # Environment variables

Core Implementation

1. Setting Up the Express Server

The server.js file imports the Express app from ./app and the port from ./config/env (neither file is shown here), then starts listening for requests:

const app = require("./app");
const { port } = require("./config/env");

app.listen(port, () => {
  console.log(`Server running on http://localhost:${port}`);
});

2. Handling Image Generation Requests

The imageController.js file validates the prompt, calls the image service, and maps the outcome to an HTTP response:

const { generateImage } = require("../services/geminiService");

async function generateImageController(req, res) {
  const { prompt } = req.body;

  if (!prompt) {
    return res.status(400).json({ error: "Prompt is required" });
  }

  try {
    const imagePath = await generateImage(prompt);
    res.status(200).json({ message: "Image generated successfully", imagePath });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
}

module.exports = { generateImageController };
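
Because the controller only touches `req.body`, `res.status`, and `res.json`, its three outcomes (400, 200, 500) can be checked without starting Express or calling the Gemini API. The sketch below restates the same logic as a factory so a stub can stand in for `generateImage`. The names `makeController`, `fakeRes`, and `demo` are hypothetical and exist only for this illustration.

```javascript
// Sketch: the same controller logic, with generateImage passed in so it can be stubbed.
function makeController(generateImage) {
  return async function generateImageController(req, res) {
    const { prompt } = req.body || {};
    if (!prompt) {
      return res.status(400).json({ error: "Prompt is required" });
    }
    try {
      const imagePath = await generateImage(prompt);
      return res.status(200).json({ message: "Image generated successfully", imagePath });
    } catch (error) {
      return res.status(500).json({ error: error.message });
    }
  };
}

// Minimal stand-in for Express's res object: records the status code and JSON body.
function fakeRes() {
  const res = { statusCode: null, body: null };
  res.status = (code) => { res.statusCode = code; return res; };
  res.json = (body) => { res.body = body; return res; };
  return res;
}

async function demo() {
  // Stubbed service: resolves with a fixed path instead of calling the API.
  const controller = makeController(async () => "temp/generated_image.png");

  const missing = fakeRes();
  await controller({ body: {} }, missing);
  console.log(missing.statusCode, missing.body);

  const ok = fakeRes();
  await controller({ body: { prompt: "a red bicycle" } }, ok);
  console.log(ok.statusCode, ok.body);
}

demo();
```

The first call records a 400 because the prompt is missing, and the second records a 200 with the stubbed path.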

3. Integrating the Gemini API

The geminiService.js file calls the Gemini API, decodes the base64 image from the response, and writes it to the temp/ folder:

const { GoogleGenerativeAI } = require("@google/generative-ai");
const fs = require("fs");
const path = require("path");
const { geminiApiKey } = require("../config/env");

const genAI = new GoogleGenerativeAI(geminiApiKey);

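async function generateImage(prompt) {
  // Experimental model that can return image parts alongside text.
  const model = genAI.getGenerativeModel({
    model: "gemini-2.0-flash-exp-image-generation",
    generationConfig: { responseModalities: ['Text', 'Image'] }
  });

  try {
    const result = await model.generateContent(prompt);
    const parts = result.response.candidates?.[0]?.content?.parts ?? [];
    // The first part carrying inline data holds the base64-encoded image.
    const imagePart = parts.find((part) => part.inlineData);
    if (!imagePart) {
      // Without this check the function would silently return undefined.
      throw new Error("Response contained no image data");
    }

    const buffer = Buffer.from(imagePart.inlineData.data, 'base64');
    const outputDir = path.join(__dirname, '../temp');
    fs.mkdirSync(outputDir, { recursive: true }); // Make sure temp/ exists
    const filePath = path.join(outputDir, 'generated_image.png');
    fs.writeFileSync(filePath, buffer);
    return filePath;
  } catch (error) {
    console.error("Error generating image:", error);
    throw new Error("Failed to generate image");
  }
}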

module.exports = { generateImage };
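
Note that the service writes every result to the same generated_image.png, so each request overwrites the previous image. One possible variation, sketched below with only Node's built-in modules, gives each image a unique file name. `saveInlineImage` is a hypothetical helper, not part of the repository. It takes the `parts` array from the response shape used above, and it assumes each inline part carries a `mimeType` field next to `data`.

```javascript
const fs = require("fs");
const path = require("path");
const crypto = require("crypto");

// Hypothetical helper (not from the repository): saves the first inline image
// found in `parts` under a unique file name and returns the full path.
function saveInlineImage(parts, outputDir) {
  const imagePart = parts.find(
    (part) => part.inlineData && String(part.inlineData.mimeType || "").startsWith("image/")
  );
  if (!imagePart) {
    throw new Error("No image part found in response");
  }

  // Create the output directory if it is missing; a no-op when it already exists.
  fs.mkdirSync(outputDir, { recursive: true });

  // A random UUID keeps concurrent requests from overwriting each other.
  // The .png extension mirrors the original code; the real format depends on mimeType.
  const fileName = `generated_${crypto.randomUUID()}.png`;
  const filePath = path.join(outputDir, fileName);
  fs.writeFileSync(filePath, Buffer.from(imagePart.inlineData.data, "base64"));
  return filePath;
}

module.exports = { saveInlineImage };
```

Inside `generateImage`, the loop over parts could then be replaced by a single call such as `saveInlineImage(parts, path.join(__dirname, "../temp"))`. Unique names also mean old files accumulate, so some cleanup policy would be needed.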

Conclusion

By integrating Google’s Gemini API with Node.js, we’ve built an AI-powered image generation API that can transform text into creative visuals. This project can be expanded to support image style selection, real-time previews, and cloud storage integration.

If you found this useful, ⭐ star the repo and feel free to contribute! 🚀

👉 GitHub Repository: https://github.com/manthanank/gemini-image-generator
