
Connie Leung

Getting started with Gemini API using NestJS

Introduction

In this blog post, I build several Generative AI examples using NestJS and the Gemini API. The examples generate text from 1) a text prompt, 2) a prompt and an image, and 3) a prompt and two images to analyze and compare them. The Google team provided the examples in NodeJS, and I ported several of them to NestJS, my favorite framework, which builds on top of the popular Express framework.

Generate Gemini API Key

Go to https://aistudio.google.com/app/apikey to generate an API key on a new or an existing Google Cloud project.

Create a new NestJS Project

nest new nestjs-gemini-api-demo

Install dependencies

npm i --save-exact @google/generative-ai @nestjs/swagger class-transformer class-validator dotenv compression

npm i --save-exact --save-dev @types/multer

Generate a Gemini Module

nest g mo gemini
nest g co gemini/presenters/http/gemini --flat
nest g s gemini/application/gemini --flat

These commands generate a Gemini module, a controller, and a service for the API.

Define Gemini environment variables

.env.example contains environment variables for the Gemini API key, the Gemini Pro model, the Gemini Pro Vision model, and the port number.

// .env.example

GEMINI_API_KEY=<google_gemini_api_key>
GEMINI_PRO_MODEL=gemini-pro
GEMINI_PRO_VISION_MODEL=gemini-pro-vision
PORT=3000

Copy .env.example to .env and replace the placeholder of GEMINI_API_KEY with the real API Key.

Add .env to the .gitignore file to ensure we don't accidentally commit the Gemini API key to the GitHub repo.

// .gitignore

.env

Add configuration files

The project has three configuration files. validate.config.ts registers a validation pipe that rejects invalid payloads before any request reaches the controller.

// validate.config.ts

import { ValidationPipe } from '@nestjs/common';

export const validateConfig = new ValidationPipe({
  whitelist: true,
  stopAtFirstError: true,
});

env.config.ts extracts the environment variables from process.env and stores the values in the env object.

// env.config.ts

import dotenv from 'dotenv';

dotenv.config();

export const env = {
  PORT: parseInt(process.env.PORT || '3000'),
  GEMINI: {
    KEY: process.env.GEMINI_API_KEY || '',
    PRO_MODEL: process.env.GEMINI_PRO_MODEL || 'gemini-pro',
    PRO_VISION_MODEL: process.env.GEMINI_PRO_VISION_MODEL || 'gemini-pro-vision',
  },
};

gemini.config.ts defines the generation and safety options for the Gemini API.

// gemini.config.ts

import { GenerationConfig, HarmBlockThreshold, HarmCategory, SafetySetting } from '@google/generative-ai';

export const GENERATION_CONFIG: GenerationConfig = { maxOutputTokens: 1024, temperature: 1, topK: 32, topP: 1 };

export const SAFETY_SETTINGS: SafetySetting[] = [
  {
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
];

Bootstrap the application

// main.ts

import { NestFactory } from '@nestjs/core';
import { NestExpressApplication } from '@nestjs/platform-express';
import { DocumentBuilder, SwaggerModule } from '@nestjs/swagger';
import compression from 'compression';
import express from 'express';
import { AppModule } from './app.module';
import { env } from '~configs/env.config';
import { validateConfig } from '~configs/validate.config';

function setupSwagger(app: NestExpressApplication) {
  const config = new DocumentBuilder()
    .setTitle('Gemini example')
    .setDescription('The Gemini API description')
    .setVersion('1.0')
    .addTag('google gemini')
    .build();
  const document = SwaggerModule.createDocument(app, config);
  SwaggerModule.setup('api', app, document);
}

async function bootstrap() {
  const app = await NestFactory.create<NestExpressApplication>(AppModule);
  app.enableCors();
  app.useGlobalPipes(validateConfig);
  app.use(express.json({ limit: '1000kb' }));
  app.use(express.urlencoded({ extended: false }));
  app.use(compression());
  setupSwagger(app);
  await app.listen(env.PORT);
}
bootstrap();

The bootstrap function registers middleware to the application, sets up Swagger documentation, and uses a global pipe to validate payloads.

I have laid down the groundwork, and the next step is to add routes that receive Generative AI inputs and generate text.

Example 1: Generate text from a prompt

// generate-text.dto.ts

import { ApiProperty } from '@nestjs/swagger';
import { IsNotEmpty, IsString } from 'class-validator';

export class GenerateTextDto {
  @ApiProperty({
    name: 'prompt',
    description: 'prompt of the question',
    type: 'string',
    required: true,
  })
  @IsNotEmpty()
  @IsString()
  prompt: string;
}

The DTO accepts a text prompt to generate text.

// gemini.constant.ts

export const GEMINI_PRO_MODEL = 'GEMINI_PRO_MODEL';
export const GEMINI_PRO_VISION_MODEL = 'GEMINI_PRO_VISION_MODEL';

// gemini.provider.ts

import { GenerativeModel, GoogleGenerativeAI } from '@google/generative-ai';
import { Provider } from '@nestjs/common';
import { env } from '~configs/env.config';
import { GENERATION_CONFIG, SAFETY_SETTINGS } from '~configs/gemini.config';
import { GEMINI_PRO_MODEL, GEMINI_PRO_VISION_MODEL } from './gemini.constant';

export const GeminiProModelProvider: Provider<GenerativeModel> = {
  provide: GEMINI_PRO_MODEL,
  useFactory: () => {
    const genAI = new GoogleGenerativeAI(env.GEMINI.KEY);
    return genAI.getGenerativeModel({
      model: env.GEMINI.PRO_MODEL,
      generationConfig: GENERATION_CONFIG,
      safetySettings: SAFETY_SETTINGS,
    });
  },
};

export const GeminiProVisionModelProvider: Provider<GenerativeModel> = {
  provide: GEMINI_PRO_VISION_MODEL,
  useFactory: () => {
    const genAI = new GoogleGenerativeAI(env.GEMINI.KEY);
    return genAI.getGenerativeModel({
      model: env.GEMINI.PRO_VISION_MODEL,
      generationConfig: GENERATION_CONFIG,
      safetySettings: SAFETY_SETTINGS,
    });
  },
};

I define two providers to provide the Gemini Pro model and the Gemini Pro Vision model respectively. Then I can inject these providers into the Gemini service.
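For the injection to work, the providers, service, and controller must be registered in the Gemini module. The wiring below is a sketch that assumes the file layout produced by the nest g commands above and that gemini.provider.ts sits next to the service; adjust the import paths to match your project.

```typescript
// gemini.module.ts (sketch; import paths are assumptions)

import { Module } from '@nestjs/common';
import { GeminiProModelProvider, GeminiProVisionModelProvider } from './application/gemini.provider';
import { GeminiService } from './application/gemini.service';
import { GeminiController } from './presenters/http/gemini.controller';

@Module({
  controllers: [GeminiController],
  // Register the service and both model providers so Nest can resolve
  // the GEMINI_PRO_MODEL and GEMINI_PRO_VISION_MODEL injection tokens
  providers: [GeminiService, GeminiProModelProvider, GeminiProVisionModelProvider],
})
export class GeminiModule {}
```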

// content.helper.ts

import { Content, Part } from '@google/generative-ai';

export function createContent(text: string, ...images: Express.Multer.File[]): Content[] {
  const imageParts: Part[] = images.map((image) => {
    return {
      inlineData: {
        mimeType: image.mimetype,
        data: image.buffer.toString('base64'),
      },
    };
  });

  return [
    {
      role: 'user',
      parts: [
        ...imageParts,
        {
          text,
        },
      ],
    },
  ];
}

createContent is a helper function that builds the request content for the model: each image is converted to an inline base64 part and placed ahead of the text prompt.
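To see the shape it produces, here is a standalone sketch of the same transformation, with the SDK's Content/Part types and Multer's file shape replaced by minimal local interfaces, and a fake in-memory "image" standing in for a real upload:

```typescript
// Minimal stand-ins for the SDK's Content/Part types and Multer's file shape
interface Part { text?: string; inlineData?: { mimeType: string; data: string } }
interface Content { role: string; parts: Part[] }
interface UploadedFile { mimetype: string; buffer: Buffer }

function createContent(text: string, ...images: UploadedFile[]): Content[] {
  const imageParts: Part[] = images.map((image) => ({
    inlineData: {
      mimeType: image.mimetype,
      data: image.buffer.toString('base64'), // inline images must be base64-encoded
    },
  }));
  // A single user turn: image parts first, then the text prompt
  return [{ role: 'user', parts: [...imageParts, { text }] }];
}

// Usage: one prompt plus one fake "image"
const fakeImage: UploadedFile = { mimetype: 'image/png', buffer: Buffer.from('not-a-real-png') };
const contents = createContent('Describe this image', fakeImage);
console.log(contents[0].parts.length); // 2: the image part followed by the text part
```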

// gemini.service.ts

// ... omit the import statements to save space

@Injectable()
export class GeminiService {

  constructor(
    @Inject(GEMINI_PRO_MODEL) private readonly proModel: GenerativeModel,
    @Inject(GEMINI_PRO_VISION_MODEL) private readonly proVisionModel: GenerativeModel,
  ) {}

  async generateText(prompt: string): Promise<GenAiResponse> {
    const contents = createContent(prompt);

    const { totalTokens } = await this.proModel.countTokens({ contents });
    const result = await this.proModel.generateContent({ contents });
    const response = await result.response;
    const text = response.text();

    return { totalTokens, text };
  }

  ...
}

The generateText method accepts a prompt and calls the Gemini API to generate text. It returns the total token count and the generated text to the controller.
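The GenAiResponse type is not shown in this post; a minimal shape consistent with how the service and controller use it might look like this:

```typescript
// response.model.ts (assumed shape; not shown in the original post)
export interface GenAiResponse {
  totalTokens: number; // token count reported by countTokens
  text: string;        // generated text returned by the model
}

// Example value matching what generateText returns
const sample: GenAiResponse = { totalTokens: 42, text: 'Hello from Gemini' };
console.log(sample.text);
```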

// gemini.controller.ts

// omit the import statements to save space

@ApiTags('Gemini')
@Controller('gemini')
export class GeminiController {
  constructor(private service: GeminiService) {}

  @ApiBody({
    description: 'Prompt',
    required: true,
    type: GenerateTextDto,
  })
  @Post('text')
  generateText(@Body() dto: GenerateTextDto): Promise<GenAiResponse> {
    return this.service.generateText(dto.prompt);
  }

  ... other routes....
}

Example 2: Generate text from a prompt and an image

This example needs both the prompt and the image file.

// gemini.service.ts

// ... omit the import statements to save space

@Injectable()
export class GeminiService {

  constructor(
    @Inject(GEMINI_PRO_MODEL) private readonly proModel: GenerativeModel,
    @Inject(GEMINI_PRO_VISION_MODEL) private readonly proVisionModel: GenerativeModel,
  ) {}

 ... other methods ...

async generateTextFromMultiModal(prompt: string, file: Express.Multer.File): Promise<GenAiResponse> {
    try {
      const contents = createContent(prompt, file);

      const { totalTokens } = await this.proVisionModel.countTokens({ contents });
      const result = await this.proVisionModel.generateContent({ contents });
      const response = await result.response;
      const text = response.text();

      return { totalTokens, text };
    } catch (err) {
      if (err instanceof Error) {
        throw new InternalServerErrorException(err.message, err.stack);
      }
      throw err;
    }
  }
}

// file-validator.pipe.ts

import { FileTypeValidator, MaxFileSizeValidator, ParseFilePipe } from '@nestjs/common';

export const fileValidatorPipe = new ParseFilePipe({
  validators: [
    new MaxFileSizeValidator({ maxSize: 1 * 1024 * 1024 }),
    // Alternation, not a character class: [jpeg|png] would match any single
    // character in the set, letting types such as image/gif slip through
    new FileTypeValidator({ fileType: /image\/(jpeg|png)/ }),
  ],
});

Define fileValidatorPipe to validate that the uploaded file is either a JPEG or a PNG file, and that the file does not exceed 1MB.
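Note that a character class like `[jpeg|png]` matches any single character from the set, so the alternation form `image/(jpeg|png)` is the pattern that actually restricts uploads to JPEG and PNG. A quick standalone check (assuming, as Nest does, that the pattern is tested against the file's MIME type):

```typescript
// FileTypeValidator tests its pattern against the upload's MIME type (unanchored)
const fileType = /image\/(jpeg|png)/;

const accepted = ['image/jpeg', 'image/png'].every((mime) => fileType.test(mime));
const rejected = !fileType.test('image/gif');
console.log(accepted && rejected); // true
```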

// gemini.controller.ts

  @ApiConsumes('multipart/form-data')
  @ApiBody({
    schema: {
      type: 'object',
      properties: {
        prompt: {
          type: 'string',
          description: 'Prompt',
        },
        file: {
          type: 'string',
          format: 'binary',
          description: 'Binary file',
        },
      },
    },
  })
  @Post('text-and-image')
  @UseInterceptors(FileInterceptor('file'))
  async generateTextFromMultiModal(
    @Body() dto: GenerateTextDto,
    @UploadedFile(fileValidatorPipe)
    file: Express.Multer.File,
  ): Promise<GenAiResponse> {
    return this.service.generateTextFromMultiModal(dto.prompt, file);
  }

file is the key that provides the binary file in the form data.

Example 3: Analyze two images

This example is similar to example 2, except it needs a prompt and two images to compare and contrast.

// gemini.service.ts

async analyzeImages({ prompt, firstImage, secondImage }: AnalyzeImage): Promise<GenAiResponse> {
    try {
      const contents = createContent(prompt, firstImage, secondImage);

      const { totalTokens } = await this.proVisionModel.countTokens({ contents });
      const result = await this.proVisionModel.generateContent({ contents });
      const response = await result.response;
      const text = response.text();

      return { totalTokens, text };
    } catch (err) {
      if (err instanceof Error) {
        throw new InternalServerErrorException(err.message, err.stack);
      }
      throw err;
    }
  }

// gemini.controller.ts

@ApiConsumes('multipart/form-data')
  @ApiBody({
    schema: {
      type: 'object',
      properties: {
        prompt: {
          type: 'string',
          description: 'Prompt',
        },
        first: {
          type: 'string',
          format: 'binary',
          description: 'Binary file',
        },
        second: {
          type: 'string',
          format: 'binary',
          description: 'Binary file',
        },
      },
    },
  })
  @Post('analyse-the-images')
  @UseInterceptors(
    FileFieldsInterceptor([
      { name: 'first', maxCount: 1 },
      { name: 'second', maxCount: 1 },
    ]),
  )
  async analyseImages(
    @Body() dto: GenerateTextDto,
    @UploadedFiles()
    files: {
      first?: Express.Multer.File[];
      second?: Express.Multer.File[];
    },
  ): Promise<GenAiResponse> {
    if (!files.first?.length) {
      throw new BadRequestException('The first image is missing');
    }

    if (!files.second?.length) {
      throw new BadRequestException('The second image is missing');
    }
    return this.service.analyzeImages({ prompt: dto.prompt, firstImage: files.first[0], secondImage: files.second[0] });
  }

first is the key that provides the first binary file in the form data and second is another key that provides the second binary file.

Test the endpoints

I can test the endpoints with Postman or the Swagger documentation after starting the application.

npm run start:dev

The URL of the Swagger documentation is http://localhost:3000/api

(Bonus) Deploy to Google Cloud Run

Install the gcloud CLI on the machine according to the official documentation. On my machine, the installation path is ~/google-cloud-sdk.

Then, I open a new terminal, change to the root of the project, and deploy, passing the environment variables on the command line:

$ ~/google-cloud-sdk/bin/gcloud run deploy \
  --update-env-vars GEMINI_API_KEY=<replace with your own key>,GEMINI_PRO_MODEL=gemini-pro,GEMINI_PRO_VISION_MODEL=gemini-pro-vision

If the deployment is successful, the NestJS application will run on Google Cloud Run.

This is the end of the blog post on getting started with the Gemini API in NestJS. I hope you like the content and continue to follow my learning experience in Angular, NestJS, and other technologies.
