Introduction
In this blog post, I build several Generative AI examples using NestJS and the Gemini API. The examples generate text from 1) a text prompt, 2) a prompt and an image, and 3) a prompt and two images to analyze. The Google team provided the examples in NodeJS, and I ported several of them to NestJS, my favorite framework, which builds on top of the popular Express framework.
Generate Gemini API Key
Go to https://aistudio.google.com/app/apikey to generate an API key on a new or an existing Google Cloud project.
Create a new NestJS Project
```shell
nest new nestjs-gemini-api-demo
```
Install dependencies
```shell
npm i --save-exact @google/generative-ai @nestjs/swagger class-transformer class-validator dotenv compression
npm i --save-exact --save-dev @types/multer
```
Generate a Gemini Module
```shell
nest g mo gemini
nest g co gemini/presenters/http/gemini --flat
nest g s gemini/application/gemini --flat
```

These commands create a Gemini module, a controller, and a service for the API.
Define Gemini environment variables
In `.env.example`, define the environment variables for the Gemini API key, the Gemini Pro model, the Gemini Pro Vision model, and the port number.
```shell
// .env.example
GEMINI_API_KEY=<google_gemini_api_key>
GEMINI_PRO_MODEL=gemini-pro
GEMINI_PRO_VISION_MODEL=gemini-pro-vision
PORT=3000
```
Copy `.env.example` to `.env` and replace the placeholder of `GEMINI_API_KEY` with the real API key.
Add `.env` to the `.gitignore` file to ensure we don't accidentally commit the Gemini API key to the GitHub repo.
```shell
// .gitignore
.env
```
Add configuration files
The project has three configuration files. `validate.config.ts` validates that the payload is valid before any request is routed to the controller.
```typescript
// validate.config.ts
import { ValidationPipe } from '@nestjs/common';

export const validateConfig = new ValidationPipe({
  whitelist: true,
  stopAtFirstError: true,
});
```
`env.config.ts` extracts the environment variables from `process.env` and stores the values in the `env` object.
```typescript
// env.config.ts
import dotenv from 'dotenv';

dotenv.config();

export const env = {
  PORT: parseInt(process.env.PORT || '3000'),
  GEMINI: {
    KEY: process.env.GEMINI_API_KEY || '',
    PRO_MODEL: process.env.GEMINI_PRO_MODEL || 'gemini-pro',
    PRO_VISION_MODEL: process.env.GEMINI_PRO_VISION_MODEL || 'gemini-pro-vision',
  },
};
```
`gemini.config.ts` defines the generation and safety options for the Gemini API.
```typescript
// gemini.config.ts
import {
  GenerationConfig,
  HarmBlockThreshold,
  HarmCategory,
  SafetySetting,
} from '@google/generative-ai';

export const GENERATION_CONFIG: GenerationConfig = {
  maxOutputTokens: 1024,
  temperature: 1,
  topK: 32,
  topP: 1,
};

export const SAFETY_SETTINGS: SafetySetting[] = [
  {
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
];
```
Bootstrap the application
```typescript
// main.ts
function setupSwagger(app: NestExpressApplication) {
  const config = new DocumentBuilder()
    .setTitle('Gemini example')
    .setDescription('The Gemini API description')
    .setVersion('1.0')
    .addTag('google gemini')
    .build();
  const document = SwaggerModule.createDocument(app, config);
  SwaggerModule.setup('api', app, document);
}

async function bootstrap() {
  const app = await NestFactory.create<NestExpressApplication>(AppModule);
  app.enableCors();
  app.useGlobalPipes(validateConfig);
  app.use(express.json({ limit: '1000kb' }));
  app.use(express.urlencoded({ extended: false }));
  app.use(compression());
  setupSwagger(app);
  await app.listen(env.PORT);
}
bootstrap();
```
The `bootstrap` function registers middleware on the application, sets up the Swagger documentation, and applies a global pipe to validate payloads.
I have laid down the groundwork; the next step is to add routes that receive Generative AI inputs and generate text.
Example 1: Generate text from a prompt
```typescript
// generate-text.dto.ts
import { ApiProperty } from '@nestjs/swagger';
import { IsNotEmpty, IsString } from 'class-validator';

export class GenerateTextDto {
  @ApiProperty({
    name: 'prompt',
    description: 'prompt of the question',
    type: 'string',
    required: true,
  })
  @IsNotEmpty()
  @IsString()
  prompt: string;
}
```

The DTO accepts a text prompt that is used to generate text.
```typescript
// gemini.constant.ts
export const GEMINI_PRO_MODEL = 'GEMINI_PRO_MODEL';
export const GEMINI_PRO_VISION_MODEL = 'GEMINI_PRO_VISION_MODEL';
```
```typescript
// gemini.provider.ts
import { GenerativeModel, GoogleGenerativeAI } from '@google/generative-ai';
import { Provider } from '@nestjs/common';
import { env } from '~configs/env.config';
import { GENERATION_CONFIG, SAFETY_SETTINGS } from '~configs/gemini.config';
import { GEMINI_PRO_MODEL, GEMINI_PRO_VISION_MODEL } from './gemini.constant';

export const GeminiProModelProvider: Provider<GenerativeModel> = {
  provide: GEMINI_PRO_MODEL,
  useFactory: () => {
    const genAI = new GoogleGenerativeAI(env.GEMINI.KEY);
    return genAI.getGenerativeModel({
      model: env.GEMINI.PRO_MODEL,
      generationConfig: GENERATION_CONFIG,
      safetySettings: SAFETY_SETTINGS,
    });
  },
};

export const GeminiProVisionModelProvider: Provider<GenerativeModel> = {
  provide: GEMINI_PRO_VISION_MODEL,
  useFactory: () => {
    const genAI = new GoogleGenerativeAI(env.GEMINI.KEY);
    return genAI.getGenerativeModel({
      model: env.GEMINI.PRO_VISION_MODEL,
      generationConfig: GENERATION_CONFIG,
      safetySettings: SAFETY_SETTINGS,
    });
  },
};
```
I define two providers that supply the Gemini Pro model and the Gemini Pro Vision model respectively, so that I can inject them into the Gemini service.
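For the injection to work, the providers must also be registered in the Gemini module. The article omits this file, so the following is a minimal sketch; the file paths assume the directory layout produced by the `nest g` commands earlier and may differ in your project:

```typescript
// gemini.module.ts (a sketch; adjust import paths to your project layout)
import { Module } from '@nestjs/common';
import { GeminiController } from './presenters/http/gemini.controller';
import { GeminiService } from './application/gemini.service';
import {
  GeminiProModelProvider,
  GeminiProVisionModelProvider,
} from './application/gemini.provider';

@Module({
  controllers: [GeminiController],
  // Registering the model providers makes them available for @Inject()
  providers: [GeminiService, GeminiProModelProvider, GeminiProVisionModelProvider],
})
export class GeminiModule {}
```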
```typescript
// content.helper.ts
import { Content, Part } from '@google/generative-ai';

export function createContent(text: string, ...images: Express.Multer.File[]): Content[] {
  const imageParts: Part[] = images.map((image) => {
    return {
      inlineData: {
        mimeType: image.mimetype,
        data: image.buffer.toString('base64'),
      },
    };
  });

  return [
    {
      role: 'user',
      parts: [
        ...imageParts,
        {
          text,
        },
      ],
    },
  ];
}
```
`createContent` is a helper function that creates the content for the model.
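To see the shape the helper produces, here is a small standalone sketch. The `FakeFile` interface is illustrative only; it stands in for the two fields of `Express.Multer.File` that the helper actually reads, and the helper body is inlined so the snippet runs outside the project:

```typescript
// FakeFile mimics the fields of Express.Multer.File used by the helper.
interface FakeFile {
  mimetype: string;
  buffer: Buffer;
}

function createContent(text: string, ...images: FakeFile[]) {
  // Each image becomes an inlineData part carrying base64-encoded bytes.
  const imageParts = images.map((image) => ({
    inlineData: {
      mimeType: image.mimetype,
      data: image.buffer.toString('base64'),
    },
  }));
  // Image parts come first; the text prompt is the last part.
  return [{ role: 'user', parts: [...imageParts, { text }] }];
}

const file: FakeFile = { mimetype: 'image/png', buffer: Buffer.from('png-bytes') };
const [content] = createContent('Describe this image', file);
console.log(content.parts.length); // 2 (one image part, one text part)
```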
```typescript
// gemini.service.ts
// ... omit the import statements to save space

@Injectable()
export class GeminiService {
  constructor(
    @Inject(GEMINI_PRO_MODEL) private readonly proModel: GenerativeModel,
    @Inject(GEMINI_PRO_VISION_MODEL) private readonly proVisionModel: GenerativeModel,
  ) {}

  async generateText(prompt: string): Promise<GenAiResponse> {
    const contents = createContent(prompt);
    const { totalTokens } = await this.proModel.countTokens({ contents });
    const result = await this.proModel.generateContent({ contents });
    const response = await result.response;
    const text = response.text();
    return { totalTokens, text };
  }

  // ... other methods ...
}
```
The `generateText` method accepts a prompt and calls the Gemini API to generate text. The method returns the total number of tokens and the generated text to the controller.
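The call sequence (count tokens, generate content, then read the text) can be sketched in isolation with a stubbed model. The `StubModel` interface and its canned responses below are illustrative only; they mirror the shape of the calls the service makes, not the real `GenerativeModel` API surface:

```typescript
// A stub that mimics the two model calls the service relies on.
interface StubModel {
  countTokens(req: { contents: unknown }): Promise<{ totalTokens: number }>;
  generateContent(req: { contents: unknown }): Promise<{ response: { text(): string } }>;
}

const stub: StubModel = {
  async countTokens() {
    return { totalTokens: 5 }; // canned token count
  },
  async generateContent() {
    return { response: { text: () => 'Hello from Gemini' } }; // canned reply
  },
};

// Same flow as GeminiService.generateText, minus the NestJS wiring.
async function generateText(model: StubModel, prompt: string) {
  const contents = [{ role: 'user', parts: [{ text: prompt }] }];
  const { totalTokens } = await model.countTokens({ contents });
  const result = await model.generateContent({ contents });
  const text = result.response.text();
  return { totalTokens, text };
}

generateText(stub, 'Say hello').then((r) =>
  console.log(`${r.totalTokens} tokens: ${r.text}`), // 5 tokens: Hello from Gemini
);
```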
```typescript
// gemini.controller.ts
// ... omit the import statements to save space

@ApiTags('Gemini')
@Controller('gemini')
export class GeminiController {
  constructor(private service: GeminiService) {}

  @ApiBody({
    description: 'Prompt',
    required: true,
    type: GenerateTextDto,
  })
  @Post('text')
  generateText(@Body() dto: GenerateTextDto): Promise<GenAiResponse> {
    return this.service.generateText(dto.prompt);
  }

  // ... other routes ...
}
```
Example 2: Generate text from a prompt and an image
This example needs both the prompt and an image file.
```typescript
// gemini.service.ts
// ... omit the import statements to save space

@Injectable()
export class GeminiService {
  constructor(
    @Inject(GEMINI_PRO_MODEL) private readonly proModel: GenerativeModel,
    @Inject(GEMINI_PRO_VISION_MODEL) private readonly proVisionModel: GenerativeModel,
  ) {}

  // ... other methods ...

  async generateTextFromMultiModal(
    prompt: string,
    file: Express.Multer.File,
  ): Promise<GenAiResponse> {
    try {
      const contents = createContent(prompt, file);
      const { totalTokens } = await this.proVisionModel.countTokens({ contents });
      const result = await this.proVisionModel.generateContent({ contents });
      const response = await result.response;
      const text = response.text();
      return { totalTokens, text };
    } catch (err) {
      if (err instanceof Error) {
        throw new InternalServerErrorException(err.message, err.stack);
      }
      throw err;
    }
  }
}
```
```typescript
// file-validator.pipe.ts
import { FileTypeValidator, MaxFileSizeValidator, ParseFilePipe } from '@nestjs/common';

export const fileValidatorPipe = new ParseFilePipe({
  validators: [
    new MaxFileSizeValidator({ maxSize: 1 * 1024 * 1024 }),
    // Note: (jpeg|png) is an alternation; [jpeg|png] would be a character class
    new FileTypeValidator({ fileType: /image\/(jpeg|png)/ }),
  ],
});
```
Define `fileValidatorPipe` to validate that the uploaded file is either a JPEG or a PNG file, and that the file does not exceed 1MB.
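A note on the file-type pattern: in a regular expression, `[jpeg|png]` is a character class that matches single characters, not the two MIME subtypes, so the validator needs the alternation form `(jpeg|png)`. A quick standalone check of the intended pattern:

```typescript
// The alternation form matches whole subtype names.
const allowed = /image\/(jpeg|png)/;

console.log(allowed.test('image/png'));  // true
console.log(allowed.test('image/jpeg')); // true
console.log(allowed.test('image/gif'));  // false
```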
```typescript
// gemini.controller.ts
@ApiConsumes('multipart/form-data')
@ApiBody({
  schema: {
    type: 'object',
    properties: {
      prompt: {
        type: 'string',
        description: 'Prompt',
      },
      file: {
        type: 'string',
        format: 'binary',
        description: 'Binary file',
      },
    },
  },
})
@Post('text-and-image')
@UseInterceptors(FileInterceptor('file'))
async generateTextFromMultiModal(
  @Body() dto: GenerateTextDto,
  @UploadedFile(fileValidatorPipe) file: Express.Multer.File,
): Promise<GenAiResponse> {
  return this.service.generateTextFromMultiModal(dto.prompt, file);
}
```
`file` is the key that provides the binary file in the form data.
Example 3: Analyze two images
This example is similar to example 2, except that it needs a prompt and two images to compare and contrast.
```typescript
// gemini.service.ts
async analyzeImages({ prompt, firstImage, secondImage }: AnalyzeImage): Promise<GenAiResponse> {
  try {
    const contents = createContent(prompt, firstImage, secondImage);
    const { totalTokens } = await this.proVisionModel.countTokens({ contents });
    const result = await this.proVisionModel.generateContent({ contents });
    const response = await result.response;
    const text = response.text();
    return { totalTokens, text };
  } catch (err) {
    if (err instanceof Error) {
      throw new InternalServerErrorException(err.message, err.stack);
    }
    throw err;
  }
}
```
```typescript
// gemini.controller.ts
@ApiConsumes('multipart/form-data')
@ApiBody({
  schema: {
    type: 'object',
    properties: {
      prompt: {
        type: 'string',
        description: 'Prompt',
      },
      first: {
        type: 'string',
        format: 'binary',
        description: 'Binary file',
      },
      second: {
        type: 'string',
        format: 'binary',
        description: 'Binary file',
      },
    },
  },
})
@Post('analyse-the-images')
@UseInterceptors(
  FileFieldsInterceptor([
    { name: 'first', maxCount: 1 },
    { name: 'second', maxCount: 1 },
  ]),
)
async analyseImages(
  @Body() dto: GenerateTextDto,
  @UploadedFiles()
  files: {
    first?: Express.Multer.File[];
    second?: Express.Multer.File[];
  },
): Promise<GenAiResponse> {
  if (!files.first?.length) {
    throw new BadRequestException('The first image is missing');
  }
  if (!files.second?.length) {
    throw new BadRequestException('The second image is missing');
  }
  return this.service.analyzeImages({
    prompt: dto.prompt,
    firstImage: files.first[0],
    secondImage: files.second[0],
  });
}
```
`first` is the key that provides the first binary file in the form data, and `second` is the key that provides the second binary file.
Test the endpoints
I can test the endpoints with Postman or the Swagger documentation after starting the application:

```shell
npm run start:dev
```

The URL of the Swagger documentation is http://localhost:3000/api.
(Bonus) Deploy to Google Cloud Run
Install the gcloud CLI on the machine according to the official documentation. On my machine, the installation path is ~/google-cloud-sdk.
Then, I open a new terminal and change to the root of the project. On the command line, I update the environment variables before the deployment:

```shell
~/google-cloud-sdk/bin/gcloud run deploy \
  --update-env-vars GEMINI_API_KEY=<replace with your own key>,GEMINI_PRO_MODEL=gemini-pro,GEMINI_PRO_VISION_MODEL=gemini-pro-vision
```
If the deployment is successful, the NestJS application will run on Google Cloud Run.
This is the end of the blog post on building Generative AI examples with NestJS and the Gemini API. I hope you like the content and continue to follow my learning experience in Angular, NestJS, and other technologies.
Resources:
1. Github Repo: https://github.com/railsstudent/nestjs-gemini-api-demo
2. NodeJS Gemini tutorials: https://ai.google.dev/tutorials/node_quickstart
3. Cloud Run deploy documentation: https://cloud.google.com/run/docs/deploying-source-code