Connie Leung

Building an AI-Powered Alt Text Generator with Angular, Firebase AI Logic, and Gemini 3

In this project, I stepped out of my comfort zone to upgrade from Gemini 2.5 to Gemini 3 in Vertex AI. The goal was to build an intelligent Image Alt Text Generator that goes beyond simple description and hashtags.

By leveraging Gemini 3.0 (Pro Preview), this application analyzes an image to generate alternative text, hashtags, and recommendations, and even finds a surprising or obscure fact using the built-in Google Search tool. Moreover, the thinking process is included so users can understand the AI model's rationale in tackling the task.

The Tech Stack

  • Frontend: Angular v21
  • Backend: Firebase AI Logic (Vertex AI)
  • Model: Gemini 3.0 Pro Preview
  • Cloud: Firebase App Check, Firebase Remote Config
  • Features: JSON Structured Output, Thinking Mode, Google Search Grounding

Key Capabilities

  1. Structured Output: The model returns a strictly formatted JSON object, making it easy to display data in the UI.
  2. Thinking Mode: Gemini 3.0 enables "thinking" by default. We can track the token usage for thoughts, inputs, and outputs to understand the model's reasoning cost.
  3. Tool calling: The application uses the built-in Google Search tool to look up facts; the response provides grounding chunks, Google Search suggestions, and web search queries in the grounding metadata.

1. Bootstrap Firebase

Before the Angular application starts, the provideAppInitializer function loads the model configuration from Firebase Remote Config.

function createRemoteConfig(firebaseApp: FirebaseApp) {
  const remoteConfig = getRemoteConfig(firebaseApp);

  // Set default values for Remote Config parameters.
  remoteConfig.defaultConfig = {
    'geminiModelName': 'gemini-3-pro-preview',
    'vertexAILocation': 'global',
    'includeThoughts': true,
    'thinkingBudget': 512,
  };

  return remoteConfig;
}

The createRemoteConfig function instantiates a remote config and sets the default values for the parameters.

| Parameter | Description | Default Value |
| --- | --- | --- |
| geminiModelName | Gemini model name | gemini-3-pro-preview |
| vertexAILocation | Vertex AI location | global |
| includeThoughts | Whether the model includes its thinking process | true |
| thinkingBudget | Thinking budget in tokens | 512 |

When the application fails to load the configuration values, it falls back to the values in remoteConfig.defaultConfig.
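
For example, reading a parameter with getValue yields the in-code default when no fetched value has been activated (a minimal sketch):

const thinkingBudget = getValue(remoteConfig, 'thinkingBudget').asNumber();
// Returns 512 from remoteConfig.defaultConfig when the remote fetch fails.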

export async function bootstrapFirebase() {
    try {
      const configService = inject(ConfigService);
      const firebaseApp = initializeApp(firebaseConfig.app);

      initializeAppCheck(firebaseApp, {
        provider: new ReCaptchaEnterpriseProvider(firebaseConfig.recaptchaEnterpriseSiteKey),
        isTokenAutoRefreshEnabled: true,
      });

      const remoteConfig = createRemoteConfig(firebaseApp);
      await fetchAndActivate(remoteConfig);

      configService.loadConfig(firebaseApp, remoteConfig);
    } catch (err) {
      console.error('Remote Config fetch failed', err);
      throw err;
    }
}

The bootstrapFirebase function does the following:

  1. Injects the ConfigService
  2. Initializes the Firebase app
  3. Initializes Firebase App Check so that Firebase AI Logic does not get abused
  4. Creates the Firebase Remote Config instance, then fetches and activates the configuration
  5. Assigns firebaseApp and remoteConfig to the ConfigService (a minimal sketch of the service follows this list)
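
The ConfigService is not shown in the post; here is a sketch, assuming it simply stores the initialized handles:

import { Injectable } from '@angular/core';
import { FirebaseApp } from 'firebase/app';
import { RemoteConfig } from 'firebase/remote-config';

@Injectable({ providedIn: 'root' })
export class ConfigService {
  firebaseApp?: FirebaseApp;
  remoteConfig?: RemoteConfig;

  // Store the initialized handles so that providers can read them later.
  loadConfig(firebaseApp: FirebaseApp, remoteConfig: RemoteConfig) {
    this.firebaseApp = firebaseApp;
    this.remoteConfig = remoteConfig;
  }
}

With the service in place, appConfig registers the initializer:
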
export const appConfig: ApplicationConfig = {
  providers: [
    provideAppInitializer(async () => bootstrapFirebase()),
  ]
};

bootstrapApplication(App, appConfig)
  .catch((err) => console.error(err));

The provideAppInitializer executes bootstrapFirebase to initialize Firebase.

2. Inject Generative AI Model

We load the configuration from Firebase Remote Config to construct a generative AI model. The model specifies a response MIME type and a response schema to ensure Gemini 3 gives us a JSON object. Moreover, we enable thinking mode and Google Search by passing the thinkingConfig and tools options.

The JSON Schema

First, we define what our data should look like. We want tags, alternative text, recommendations, and a surprising fact.

export const ImageAnalysisSchema = Schema.object({
    properties: {
        tags: Schema.array({
            items: Schema.string(),
        }),
        alternativeText: Schema.string(),
        recommendations: Schema.array({
          items: Schema.object({
            properties: {
              id: Schema.integer(),
              text: Schema.string(),
              reason: Schema.string()
            }
          })
        }),
        fact: Schema.string(),
     }
});
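
With this schema, the model returns a JSON object shaped like the following (a hypothetical example; the actual values depend on the uploaded image):

{
  "alternativeText": "A red panda resting on a mossy tree branch in daylight",
  "tags": ["red panda", "wildlife", "nature"],
  "recommendations": [
    { "id": 1, "text": "Shoot during golden hour", "reason": "Warm light adds depth and contrast" }
  ],
  "fact": "Red pandas were described by scientists decades before giant pandas."
}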

Create the Generative AI Model

function getGenerativeAIModel(firebaseApp: FirebaseApp, remoteConfig: RemoteConfig) {
    const model = getValue(remoteConfig, 'geminiModelName').asString();
    const vertexAILocation = getValue(remoteConfig, 'vertexAILocation').asString();
    const includeThoughts = getValue(remoteConfig, 'includeThoughts').asBoolean();
    const thinkingBudget = getValue(remoteConfig, 'thinkingBudget').asNumber();

    const ai = getAI(firebaseApp, { backend: new VertexAIBackend(vertexAILocation) });

    return getGenerativeModel(ai, {
        model,
        generationConfig: {
          responseMimeType: 'application/json',
          responseSchema: ImageAnalysisSchema,
          thinkingConfig: {
            includeThoughts,
            thinkingBudget,
          }
        },
        tools: [{
          googleSearch: {}
        }]
    });
}

The getValue function loads the Gemini model name, Vertex AI location, thinking flag, and thinking budget from Firebase Remote Config.

The getAI function returns an AI instance that uses the Vertex AI backend in the configured location.

The getGenerativeModel function instantiates a Gemini 3 Pro Preview model that returns structured output with thinking and tool calling enabled. The tools array turns on Google Search, and the thinkingConfig sets a token budget for thoughts.

The Provider Logic

export function provideFirebase() {
    return makeEnvironmentProviders([
        {
            provide: AI_MODEL,
            useFactory: () => {
              const configService = inject(ConfigService);

              if (!configService.remoteConfig) {
                throw new Error('Remote config does not exist.');
              }

              if (!configService.firebaseApp) {
                throw new Error('Firebase App does not exist');
              }

              return getGenerativeAIModel(configService.firebaseApp, configService.remoteConfig);
            }
        }
    ]);
}

The provider function validates that the remote config and the Firebase app exist before creating the AI model.

export const appConfig: ApplicationConfig = {
  providers: [
    ... provideAppInitializer ...
    provideFirebase(),
  ]
};

We include provideFirebase in appConfig so that we can use the AI_MODEL injection token to inject the Gemini 3 Pro Preview model into services and components.
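
The AI_MODEL token itself is not shown above; a minimal sketch, assuming a plain InjectionToken typed to the SDK's GenerativeModel:

import { InjectionToken } from '@angular/core';
import { GenerativeModel } from 'firebase/ai';

// Injection token that exposes the configured Gemini model to the application.
export const AI_MODEL = new InjectionToken<GenerativeModel>('AI_MODEL');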

3. The Prompt Engineering

In the FirebaseService, we convert the uploaded image to a format Vertex AI accepts and construct a multi-step prompt. This ensures the model performs each task explicitly, such as generating hashtags or finding an obscure fact.

private aiModel = inject(AI_MODEL);

The inject function injects the Gemini 3 Pro Preview model into the service.
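
The fileToGenerativePart helper converts the uploaded File into the inline-data part that the SDK accepts. It is not shown in the post; a minimal sketch following the pattern in the Firebase AI Logic documentation:

async function fileToGenerativePart(file: File) {
  // Read the file as a data URL and strip the "data:<mime>;base64," prefix.
  const base64EncodedData = await new Promise<string>((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve((reader.result as string).split(',')[1]);
    reader.readAsDataURL(file);
  });

  return { inlineData: { data: base64EncodedData, mimeType: file.type } };
}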

async generateAltText(image: File): Promise<ImageAnalysisResponse> {
  const imagePart = await fileToGenerativePart(image);

  const prompt = `
    You are asked to perform four tasks:
    Task 1: Generate an alternative text for the image provided, max 125 characters.
    Task 2: Generate at least 3 tags to describe the image.
    Task 3: Based on the alternative text and tags, provide suggestions to make the image more interesting.
    Task 4: Search for a surprising or obscure fact that interconnects the following tags. If a direct link doesn't exist, find a conceptual link.
  `;

  const result = await this.aiModel.generateContent({
    contents: [{ role: 'user', parts: [{ text: prompt }, imagePart] }]
  });

  if (result?.response) {
      const response = result.response;
      const thought = response.thoughtSummary() || '';
      const text = response.text().replace(/```json\n?|```/g, '');
      const parsed: ImageAnalysis = JSON.parse(text);
      const tokenUsage = this.getTokenUsage(response.usageMetadata);
      const citations = this.constructCitations(response.candidates?.[0]?.groundingMetadata);

      return {
        parsed,
        thought,
        tokenUsage,
        metadata: citations,
      };
   }
   throw new Error('No text generated.');
}

After receiving the response, we extract the thought summary, the text, the usage metadata, and the grounding metadata.
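
The ImageAnalysisResponse wrapper type is not defined in the post; a sketch consistent with the fields returned above:

export type ImageAnalysisResponse = {
  parsed: ImageAnalysis;   // the structured JSON output
  thought: string;         // the thought summary
  tokenUsage: { input: number; output: number; thought: number; total: number };
  metadata: string[];      // citation URIs extracted from the grounding metadata
};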

4. Parse Response Text

The text is a string representation of a JSON object; therefore, it is parsed to a JSON object of type ImageAnalysis.

export type ImageAnalysis = {
    alternativeText: string;
    tags: string[];
    recommendations: Recommendation[];
    fact: string;
}

const text = response.text().replace(/```json\n?|```/g, '');
const parsed: ImageAnalysis = JSON.parse(text);

5. Calculate Token Usage

Gemini 3 enables thinking by default; therefore, the total number of tokens is the sum of input tokens, output tokens, and thought tokens.

private getTokenUsage(usageMetadata?: UsageMetadata) {
    const tokenUsage = {
      input: usageMetadata?.promptTokenCount || 0,
      output: usageMetadata?.candidatesTokenCount || 0,
      thought: usageMetadata?.thoughtsTokenCount || 0,
      total: usageMetadata?.totalTokenCount || 0,
    };
    return tokenUsage;
}
  • Input tokens = usageMetadata?.promptTokenCount
  • Output tokens = usageMetadata?.candidatesTokenCount
  • Thought tokens = usageMetadata?.thoughtsTokenCount
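
For example (illustrative numbers), a request that consumes 1,024 prompt tokens, 256 output tokens, and 400 thought tokens reports a totalTokenCount of 1,680.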

6. Handling Citations (Grounding)

One of the coolest features of using Gemini 3 is grounding. When the model searches for that "obscure fact," it returns metadata containing the sources.

We need a helper function to parse this metadata and construct citations for the UI:

private constructCitations(groundingMetadata?: GroundingMetadata) {
  const citations: string[] = [];
  const supports = groundingMetadata?.groundingSupports;
  const chunks = groundingMetadata?.groundingChunks;
  if (supports) {
    for (const support of supports) {
      const indices = support.groundingChunkIndices || [];
      for (const index of indices) {
        const chunk = chunks?.[index];
        if (chunk?.web) {
          citations.push(chunk.web.uri);
        }
      }
    }
  }
  return citations;
}

We iterate over support.groundingChunkIndices to obtain each chunk index. Then we use the index to look up the grounding chunk. When the chunk has a web URI, the URI is appended to the citations array, which the function finally returns.
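
For reference, the grounding metadata looks roughly like this (a hypothetical, trimmed example):

{
  "webSearchQueries": ["red panda classification history"],
  "groundingChunks": [
    { "web": { "uri": "https://example.com/red-pandas", "title": "example.com" } }
  ],
  "groundingSupports": [
    { "segment": { "startIndex": 0, "endIndex": 118 }, "groundingChunkIndices": [0] }
  ]
}

Each support entry points into groundingChunks by index, which is why the helper resolves every groundingChunkIndices entry before reading the web URI.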

7. The Angular App

In the App component, the FirebaseService generates the alternative text when the generate button is clicked, and the result is written to the analysis signal.
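
The component state lives in signals (a minimal sketch; the post does not show these declarations):

// signal is imported from '@angular/core'.
selectedFile = signal<File | undefined>(undefined);
analysis = signal<ImageAnalysisResponse | undefined>(undefined);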

firebaseAiService = inject(FirebaseService);

async handleGenerateClick() {
    const file = this.selectedFile();
    if (!file) { 
        return;
    }

    this.analysis.set(undefined);

    const results = await this.firebaseAiService.generateAltText(file);
    this.analysis.set(results);
}

In the template, the app-tags-display component displays hashtags. The app-google-search-suggestions component displays a surprising fact, citations, web search queries, and Google Search suggestions. The app-thought-summary component displays the thought summary and token usage.

@let imageAnalysis = analysis();
<app-tags-display [tags]="imageAnalysis?.parsed?.tags || []" />
<app-google-search-suggestions
    [interestingFact]="imageAnalysis?.parsed?.fact"
    [metadata]="imageAnalysis?.metadata"
/>
<app-thought-summary [thought]="imageAnalysis?.thought || ''" [tokenUsage]="imageAnalysis?.tokenUsage" />

Conclusion

Migrating to Gemini 3 in Vertex AI unlocked powerful features for this Angular application. The combination of Thinking Mode for transparency and Grounding for factual accuracy makes Gemini 3 an incredibly robust tool for analyzing complex visual media.

Resources

Note: Google Cloud credits are provided for this project.
