In this project, I stepped out of my comfort zone to upgrade from Gemini 2.5 to Gemini 3 in Vertex AI. The goal was to build an intelligent Image Alt Text Generator that goes beyond simple description and hashtags.
By leveraging Gemini 3 Pro Preview, this application analyzes an image to generate alternative text, hashtags, recommendations, and even a surprising or obscure fact found with the built-in Google Search tool. Moreover, the thinking process is included so users can understand the model's rationale in tackling the task.
The Tech Stack
- Frontend: Angular v21
- Backend: Firebase AI Logic (Vertex AI)
- Model: Gemini 3.0 Pro Preview
- Cloud: Firebase App Check, Firebase Remote Config
- Features: JSON Structured Output, Thinking Mode, Google Search Grounding
Key Capabilities
- Structured Output: The model returns a strictly formatted JSON object, making it easy to display data in the UI.
- Thinking Mode: Gemini 3.0 enables "thinking" by default. We can track the token usage for thoughts, inputs, and outputs to understand the model's reasoning cost.
- Tool calling: The application uses the built-in Google Search tool to look up facts; the grounding metadata returned with the response contains grounding chunks, Google Search suggestions, and web search queries.
1. Bootstrap Firebase
Before starting the Angular application, the provideAppInitializer function loads the model configurations from Firebase Remote Config.
function createRemoteConfig(firebaseApp: FirebaseApp) {
const remoteConfig = getRemoteConfig(firebaseApp);
// Set default values for Remote Config parameters.
remoteConfig.defaultConfig = {
'geminiModelName': 'gemini-3-pro-preview',
'vertexAILocation': 'global',
'includeThoughts': true,
'thinkingBudget': 512,
};
return remoteConfig;
}
The createRemoteConfig function instantiates a remote config and sets the default values for the parameters.
| Parameter | Description | Default Value |
|---|---|---|
| geminiModelName | Gemini Model | gemini-3-pro-preview |
| vertexAILocation | VertexAI Location | global |
| includeThoughts | Whether or not the model generates thinking processes | true |
| thinkingBudget | Thinking budget in tokens | 512 |
When the application fails to load the configuration values, it falls back to the values in remoteConfig.defaultConfig.
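For instance, after fetchAndActivate, getValue resolves against remoteConfig.defaultConfig whenever no remote value is available. A minimal sketch (the parameter name matches the table above):
// Sketch: reading a parameter after fetchAndActivate.
// If the fetch failed, the value comes from remoteConfig.defaultConfig.
const modelName = getValue(remoteConfig, 'geminiModelName');
console.log(modelName.asString());  // 'gemini-3-pro-preview' unless overridden remotely
console.log(modelName.getSource()); // 'remote' | 'default' | 'static'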
export async function bootstrapFirebase() {
try {
const configService = inject(ConfigService);
const firebaseApp = initializeApp(firebaseConfig.app);
initializeAppCheck(firebaseApp, {
provider: new ReCaptchaEnterpriseProvider(firebaseConfig.recaptchaEnterpriseSiteKey),
isTokenAutoRefreshEnabled: true,
});
const remoteConfig = createRemoteConfig(firebaseApp);
await fetchAndActivate(remoteConfig);
configService.loadConfig(firebaseApp, remoteConfig);
} catch (err) {
console.error('Remote Config fetch failed', err);
throw err;
}
}
The bootstrapFirebase function does the following:
- Injects a ConfigService
- Initializes a Firebase app
- Initializes Firebase App Check so that Firebase AI Logic does not get abused
- Creates the Remote Config instance and fetches and activates the configurations
- Assigns firebaseApp and remoteConfig to the configService via loadConfig
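The ConfigService itself is not shown in this post; a minimal sketch that satisfies the calls above might look like this:
import { Injectable } from '@angular/core';
import { FirebaseApp } from 'firebase/app';
import { RemoteConfig } from 'firebase/remote-config';

// Hypothetical sketch: holds the Firebase app and Remote Config instances
// so that provideFirebase() can read them later.
@Injectable({ providedIn: 'root' })
export class ConfigService {
  firebaseApp?: FirebaseApp;
  remoteConfig?: RemoteConfig;

  loadConfig(firebaseApp: FirebaseApp, remoteConfig: RemoteConfig) {
    this.firebaseApp = firebaseApp;
    this.remoteConfig = remoteConfig;
  }
}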
export const appConfig: ApplicationConfig = {
providers: [
provideAppInitializer(async () => bootstrapFirebase()),
]
};
bootstrapApplication(App, appConfig)
.catch((err) => console.error(err));
The provideAppInitializer function executes bootstrapFirebase to initialize Firebase before the application starts.
2. Inject Generative AI Model
We load the configurations from Firebase Remote Config to construct a Generative AI model. The model specifies a response mime type and response schema to ensure Gemini 3 gives us a JSON object. Moreover, we enable thinking mode and Google Search by passing the thinkingConfig and tools.
The JSON Schema
First, we define what our data should look like. We want tags, alternative text, recommendations, and a surprising fact.
export const ImageAnalysisSchema = Schema.object({
properties: {
tags: Schema.array({
items: Schema.string(),
}),
alternativeText: Schema.string(),
recommendations: Schema.array({
items: Schema.object({
properties: {
id: Schema.integer(),
text: Schema.string(),
reason: Schema.string()
}
})
}),
fact: Schema.string(),
}
});
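Although the exact output depends on the image, a response conforming to this schema might look like the following (hypothetical values):
{
  "alternativeText": "A corgi wearing sunglasses lounges on a sandy beach at sunset.",
  "tags": ["corgi", "beach", "sunset"],
  "recommendations": [
    { "id": 1, "text": "Crop closer to the dog", "reason": "Draws attention to the subject" }
  ],
  "fact": "In Welsh folklore, corgis were said to be the steeds of fairies."
}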
Create the Generative AI Model
function getGenerativeAIModel(firebaseApp: FirebaseApp, remoteConfig: RemoteConfig) {
const model = getValue(remoteConfig, 'geminiModelName').asString();
const vertexAILocation = getValue(remoteConfig, 'vertexAILocation').asString();
const includeThoughts = getValue(remoteConfig, 'includeThoughts').asBoolean();
const thinkingBudget = getValue(remoteConfig, 'thinkingBudget').asNumber();
const ai = getAI(firebaseApp, { backend: new VertexAIBackend(vertexAILocation) });
return getGenerativeModel(ai, {
model,
generationConfig: {
responseMimeType: 'application/json',
responseSchema: ImageAnalysisSchema,
thinkingConfig: {
includeThoughts,
thinkingBudget,
}
},
tools: [{
googleSearch: {}
}]
});
}
The getValue function loads the Gemini model name, VertexAI location, thinking mode, and thinking budget from the Firebase Remote Config.
The getAI function creates an AI instance backed by Vertex AI in the configured location.
The getGenerativeModel function instantiates a Gemini 3 Pro Preview model that returns structured output and enables thinking and tool calling. The tools array enables Google Search, and the thinkingConfig sets a token budget.
The Provider Logic
export function provideFirebase() {
return makeEnvironmentProviders([
{
provide: AI_MODEL,
useFactory: () => {
const configService = inject(ConfigService);
if (!configService.remoteConfig) {
throw new Error('Remote config does not exist.');
}
if (!configService.firebaseApp) {
throw new Error('Firebase App does not exist');
}
return getGenerativeAIModel(configService.firebaseApp, configService.remoteConfig);
}
}
]);
}
The provider function validates that the Remote Config and Firebase app instances exist before creating the AI model.
export const appConfig: ApplicationConfig = {
providers: [
... provideAppInitializer ...
provideFirebase(),
]
};
We include provideFirebase in appConfig so that we can use the injection token, AI_MODEL, to inject the Gemini 3 Pro Preview model into services and components.
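The AI_MODEL token itself is not listed above; a minimal definition, assuming the GenerativeModel type exported by firebase/ai, could be:
import { InjectionToken } from '@angular/core';
import { GenerativeModel } from 'firebase/ai';

// Hypothetical sketch: a typed injection token for the Gemini model.
export const AI_MODEL = new InjectionToken<GenerativeModel>('AI_MODEL');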
3. The Prompt Engineering
In the FirebaseService, we convert the uploaded image to a format Vertex AI accepts and construct a multi-step prompt. This ensures the model performs specific tasks, like generating specific hashtags or finding obscure facts.
private aiModel = inject(AI_MODEL);
The inject function injects the Gemini 3 Pro Preview model into the service.
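The fileToGenerativePart helper used in generateAltText below is not listed in the post; a common sketch, following the pattern in the Firebase AI Logic documentation, converts the File into an inline base64 data part:
// Sketch: convert a File into an inline-data part the model accepts.
async function fileToGenerativePart(file: File) {
  const base64 = await new Promise<string>((resolve, reject) => {
    const reader = new FileReader();
    // reader.result is a data URL such as "data:image/png;base64,...."
    reader.onloadend = () => resolve((reader.result as string).split(',')[1]);
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
  return { inlineData: { data: base64, mimeType: file.type } };
}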
async generateAltText(image: File): Promise<ImageAnalysisResponse> {
const imagePart = await fileToGenerativePart(image);
const prompt = `
You are asked to perform four tasks:
Task 1: Generate an alternative text for the image provided, max 125 characters.
Task 2: Generate at least 3 tags to describe the image.
Task 3: Based on the alternative text and tags, provide suggestions to make the image more interesting.
Task 4: Search for a surprising or obscure fact that interconnects the tags generated in Task 2. If a direct link doesn't exist, find a conceptual link.
`;
const result = await this.aiModel.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }, imagePart] }]
});
if (result?.response) {
const response = result.response;
const thought = response.thoughtSummary() || '';
const text = response.text().replace(/```json\n?|```/g, '');
const parsed: ImageAnalysis = JSON.parse(text);
const tokenUsage = this.getTokenUsage(response.usageMetadata);
const citations = this.constructCitations(response.candidates?.[0]?.groundingMetadata);
return {
parsed,
thought,
tokenUsage,
metadata: citations,
};
}
throw Error('No text generated.');
}
After receiving the response, we extract the thought summary, text, usageMetadata and grounding metadata.
4. Parse Response Text
The text is a string representation of a JSON object; therefore, it is parsed to a JSON object of type ImageAnalysis.
export type ImageAnalysis = {
alternativeText: string;
tags: string[];
recommendations: Recommendation[]
fact: string;
}
const text = response.text().replace(/```json\n?|```/g, '');
const parsed: ImageAnalysis = JSON.parse(text);
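The Recommendation type is not shown in the post; based on the schema defined earlier (id, text, reason), it presumably looks like this:
export type Recommendation = {
  id: number;
  text: string;
  reason: string;
}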
5. Calculate Token Usage
Gemini 3 enables thinking by default; therefore, the total number of tokens is the sum of input tokens, output tokens, and thought tokens.
private getTokenUsage(usageMetadata?: UsageMetadata) {
const tokenUsage = {
input: usageMetadata?.promptTokenCount || 0,
output: usageMetadata?.candidatesTokenCount || 0,
thought: usageMetadata?.thoughtsTokenCount || 0,
total: usageMetadata?.totalTokenCount || 0,
};
return tokenUsage;
}
- Input tokens = usageMetadata?.promptTokenCount
- Output tokens = usageMetadata?.candidatesTokenCount
- Thought tokens = usageMetadata?.thoughtsTokenCount
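As a hypothetical illustration (inside the same service), a request with 300 prompt tokens, 200 output tokens, and 150 thought tokens reports a totalTokenCount of 650:
// Hypothetical numbers for illustration only.
const usage = this.getTokenUsage({
  promptTokenCount: 300,
  candidatesTokenCount: 200,
  thoughtsTokenCount: 150,
  totalTokenCount: 650,
});
// usage => { input: 300, output: 200, thought: 150, total: 650 }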
6. Handling Citations (Grounding)
One of the coolest features of using Gemini 3 is grounding. When the model searches for that "obscure fact," it returns metadata containing the sources.
We need a helper function to parse this metadata and construct citations for the UI:
private constructCitations(groundingMetadata?: GroundingMetadata) {
const citations: string[] = [];
const supports = groundingMetadata?.groundingSupports;
const chunks = groundingMetadata?.groundingChunks;
if (supports && chunks) {
for (const support of supports) {
const indices = support.groundingChunkIndices || [];
for (const index of indices) {
const chunk = chunks[index];
if (chunk?.web) {
citations.push(chunk.web.uri);
}
}
}
}
return citations;
}
We iterate over support.groundingChunkIndices to obtain each chunk index. Then, we use the index to look up the grounding chunk. When the chunk has a web URI, the URI is appended to the citations array. Finally, the function returns the citations array.
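For example, grounding metadata of the following (hypothetical) shape would yield a single citation:
// Hypothetical grounding metadata for illustration.
const groundingMetadata = {
  groundingChunks: [
    { web: { uri: 'https://example.com/corgi-folklore', title: 'Corgi folklore' } },
  ],
  groundingSupports: [
    { groundingChunkIndices: [0] },
  ],
};
// constructCitations(groundingMetadata) => ['https://example.com/corgi-folklore']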
7. The Angular App
In the App component, the FirebaseService generates the alternative text when the generate button is clicked. The result is written to the analysis signal.
firebaseAiService = inject(FirebaseService);
async handleGenerateClick() {
const file = this.selectedFile();
if (!file) {
return;
}
this.analysis.set(undefined);
const results = await this.firebaseAiService.generateAltText(file);
this.analysis.set(results);
}
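The selectedFile and analysis signals referenced above are not shown in the snippet; a plausible declaration inside the component, using signal from @angular/core, is:
// Hypothetical sketch of the component state used above.
selectedFile = signal<File | undefined>(undefined);
analysis = signal<ImageAnalysisResponse | undefined>(undefined);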
In the template, the app-tags-display component displays hashtags. The app-google-search-suggestions component displays a surprising fact, citations, web search queries, and Google Search suggestions. The app-thought-summary component displays the thought summary and token usage.
@let imageAnalysis = analysis();
<app-tags-display [tags]="imageAnalysis?.parsed?.tags || []" />
<app-google-search-suggestions
[interestingFact]="imageAnalysis?.parsed?.fact"
[metadata]="imageAnalysis?.metadata"
/>
<app-thought-summary [thought]="imageAnalysis?.thought || ''" [tokenUsage]="imageAnalysis?.tokenUsage" />
Conclusion
Migrating to Gemini 3 in Vertex AI unlocked powerful features for this Angular application. The combination of Thinking Mode for transparency and Grounding for factual accuracy makes Gemini 3 an incredibly robust tool for analyzing complex visual media.
Resources
- Enable Thought Summaries: https://firebase.google.com/docs/ai-logic/thinking?api=vertex#enable-thought-summaries
- Grounding: https://firebase.google.com/docs/ai-logic/grounding-google-search?api=vertex#ground-the-model
- GitHub Repo: https://github.com/railsstudent/firebase-ai-hybrid-demo
Note: Google Cloud credits are provided for this project.