In this project, I stepped out of my comfort zone to upgrade from Gemini 2.5 to Gemini 3 in Vertex AI. The goal was to build an intelligent Image Alt Text Generator that goes beyond simple description and hashtags.
By leveraging Gemini 3 Pro Preview, this application analyzes an image to generate alternative text, hashtags, recommendations, and even a surprising or obscure fact found with the built-in Google Search tool. Moreover, the thinking process is included so users can understand the model's rationale in tackling the task.
The Tech Stack
- Frontend: Angular v21
- Backend: Firebase AI Logic (Vertex AI)
- Model: Gemini 3.0 Pro Preview
- Cloud: Firebase App Check, Firebase Remote Config
- Features: JSON Structured Output, Thinking Mode, Google Search Grounding
Key Capabilities
- Structured Output: The model returns a strictly formatted JSON object, making it easy to display data in the UI.
- Thinking Mode: Gemini 3.0 enables "thinking" by default. We can track the token usage for thoughts, inputs, and outputs to understand the model's reasoning cost.
- Tool calling: The application uses the built-in Google Search tool to look up facts; the grounding metadata returned with the response contains grounding chunks, Google Search suggestions, and web search queries.
1. Bootstrap Firebase
Before starting the Angular application, the provideAppInitializer function loads the model configurations from Firebase Remote Config.
function createRemoteConfig(firebaseApp: FirebaseApp) {
const remoteConfig = getRemoteConfig(firebaseApp);
// Set default values for Remote Config parameters.
remoteConfig.defaultConfig = {
'geminiModelName': 'gemini-3-pro-preview',
'vertexAILocation': 'global',
'includeThoughts': true,
'thinkingBudget': 512,
};
return remoteConfig;
}
The createRemoteConfig function instantiates a remote config and sets the default values for the parameters.
| Parameter | Description | Default Value |
|---|---|---|
| geminiModelName | Gemini Model | gemini-3-pro-preview |
| vertexAILocation | VertexAI Location | global |
| includeThoughts | Whether or not the model generates thinking processes | true |
| thinkingBudget | Thinking budget in tokens | 512 |
When the application fails to load the configuration values, it falls back to the values in remoteConfig.defaultConfig.
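For instance, after fetchAndActivate, getValue resolves against remoteConfig.defaultConfig whenever no remote value is available. A minimal sketch (the parameter name matches the table above):
// Sketch: reading a parameter after fetchAndActivate.
// If the fetch failed, the value comes from remoteConfig.defaultConfig.
const modelName = getValue(remoteConfig, 'geminiModelName');
console.log(modelName.asString());  // 'gemini-3-pro-preview' unless overridden remotely
console.log(modelName.getSource()); // 'remote' | 'default' | 'static'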
export async function bootstrapFirebase() {
try {
const configService = inject(ConfigService);
const firebaseApp = initializeApp(firebaseConfig.app);
initializeAppCheck(firebaseApp, {
provider: new ReCaptchaEnterpriseProvider(firebaseConfig.recaptchaEnterpriseSiteKey),
isTokenAutoRefreshEnabled: true,
});
const remoteConfig = createRemoteConfig(firebaseApp);
await fetchAndActivate(remoteConfig);
configService.loadConfig(firebaseApp, remoteConfig);
} catch (err) {
console.error('Remote Config fetch failed', err);
throw err;
}
}
The bootstrapFirebase function does the following:
- Injects a ConfigService
- Initializes a Firebase app
- Initializes Firebase App Check so that Firebase AI Logic does not get abused
- Creates the Remote Config instance and fetches and activates the configurations
- Assigns firebaseApp and remoteConfig to the configService via loadConfig
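The ConfigService itself is not shown in this post; a minimal sketch that satisfies the calls above might look like this:
import { Injectable } from '@angular/core';
import { FirebaseApp } from 'firebase/app';
import { RemoteConfig } from 'firebase/remote-config';

// Hypothetical sketch: holds the Firebase app and Remote Config instances
// so that provideFirebase() can read them later.
@Injectable({ providedIn: 'root' })
export class ConfigService {
  firebaseApp?: FirebaseApp;
  remoteConfig?: RemoteConfig;

  loadConfig(firebaseApp: FirebaseApp, remoteConfig: RemoteConfig) {
    this.firebaseApp = firebaseApp;
    this.remoteConfig = remoteConfig;
  }
}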
export const appConfig: ApplicationConfig = {
providers: [
provideAppInitializer(async () => bootstrapFirebase()),
]
};
bootstrapApplication(App, appConfig)
.catch((err) => console.error(err));
The provideAppInitializer function executes bootstrapFirebase to initialize Firebase before the application starts.
2. Inject Generative AI Model
We load the configurations from Firebase Remote Config to construct a Generative AI model. The model specifies a response mime type and response schema to ensure Gemini 3 gives us a JSON object. Moreover, we enable thinking mode and Google Search by passing the thinkingConfig and tools.
The JSON Schema
First, we define what our data should look like. We want tags, alternative text, recommendations, and a surprising fact.
export const ImageAnalysisSchema = Schema.object({
properties: {
tags: Schema.array({
items: Schema.string(),
}),
alternativeText: Schema.string(),
recommendations: Schema.array({
items: Schema.object({
properties: {
id: Schema.integer(),
text: Schema.string(),
reason: Schema.string()
}
})
}),
fact: Schema.string(),
}
});
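Although the exact output depends on the image, a response conforming to this schema might look like the following (hypothetical values):
{
  "alternativeText": "A corgi wearing sunglasses lounges on a sandy beach at sunset.",
  "tags": ["corgi", "beach", "sunset"],
  "recommendations": [
    { "id": 1, "text": "Crop closer to the dog", "reason": "Draws attention to the subject" }
  ],
  "fact": "In Welsh folklore, corgis were said to be the steeds of fairies."
}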
Create the Generative AI Model
function getGenerativeAIModel(firebaseApp: FirebaseApp, remoteConfig: RemoteConfig) {
const model = getValue(remoteConfig, 'geminiModelName').asString();
const vertexAILocation = getValue(remoteConfig, 'vertexAILocation').asString();
const includeThoughts = getValue(remoteConfig, 'includeThoughts').asBoolean();
const thinkingBudget = getValue(remoteConfig, 'thinkingBudget').asNumber();
const ai = getAI(firebaseApp, { backend: new VertexAIBackend(vertexAILocation) });
return getGenerativeModel(ai, {
model,
generationConfig: {
responseMimeType: 'application/json',
responseSchema: ImageAnalysisSchema,
thinkingConfig: {
includeThoughts,
thinkingBudget,
}
},
tools: [{
googleSearch: {}
}]
});
}
The getValue function loads the Gemini model name, VertexAI location, thinking mode, and thinking budget from the Firebase Remote Config.
The getAI function creates an AI instance backed by Vertex AI in the configured location.
The getGenerativeModel function instantiates a Gemini 3 Pro Preview model that returns structured output and enables thinking and tool calling. The tools array enables Google Search, and the thinkingConfig sets a token budget.
The Provider Logic
export function provideFirebase() {
return makeEnvironmentProviders([
{
provide: AI_MODEL,
useFactory: () => {
const configService = inject(ConfigService);
if (!configService.remoteConfig) {
throw new Error('Remote config does not exist.');
}
if (!configService.firebaseApp) {
throw new Error('Firebase App does not exist');
}
return getGenerativeAIModel(configService.firebaseApp, configService.remoteConfig);
}
}
]);
}
The provider function validates that the Remote Config and Firebase app instances exist before creating the AI model.
export const appConfig: ApplicationConfig = {
providers: [
... provideAppInitializer ...
provideFirebase(),
]
};
We include provideFirebase in appConfig so that we can use the injection token, AI_MODEL, to inject the Gemini 3 Pro Preview model into services and components.
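The AI_MODEL token itself is not listed above; a minimal definition, assuming the GenerativeModel type exported by firebase/ai, could be:
import { InjectionToken } from '@angular/core';
import { GenerativeModel } from 'firebase/ai';

// Hypothetical sketch: a typed injection token for the Gemini model.
export const AI_MODEL = new InjectionToken<GenerativeModel>('AI_MODEL');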
3. The Prompt Engineering
In the FirebaseService, we convert the uploaded image to a format Vertex AI accepts and construct a multi-step prompt. This ensures the model performs specific tasks, like generating specific hashtags or finding obscure facts.
private aiModel = inject(AI_MODEL);
The inject function injects the Gemini 3 Pro Preview model into the service.
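The fileToGenerativePart helper used in generateAltText below is not listed in the post; a common sketch, following the pattern in the Firebase AI Logic documentation, converts the File into an inline base64 data part:
// Sketch: convert a File into an inline-data part the model accepts.
async function fileToGenerativePart(file: File) {
  const base64 = await new Promise<string>((resolve, reject) => {
    const reader = new FileReader();
    // reader.result is a data URL such as "data:image/png;base64,...."
    reader.onloadend = () => resolve((reader.result as string).split(',')[1]);
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
  return { inlineData: { data: base64, mimeType: file.type } };
}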
async generateAltText(image: File): Promise<ImageAnalysisResponse> {
const imagePart = await fileToGenerativePart(image);
const prompt = `
You are asked to perform four tasks:
Task 1: Generate an alternative text for the image provided, max 125 characters.
Task 2: Generate at least 3 tags to describe the image.
Task 3: Based on the alternative text and tags, provide suggestions to make the image more interesting.
Task 4: Search for a surprising or obscure fact that interconnects the tags generated in Task 2. If a direct link doesn't exist, find a conceptual link.
`;
const result = await this.aiModel.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }, imagePart] }]
});
if (result?.response) {
const response = result.response;
const thought = response.thoughtSummary() || '';
const text = response.text().replace(/```json\n?|```/g, '');
const parsed: ImageAnalysis = JSON.parse(text);
const tokenUsage = this.getTokenUsage(response.usageMetadata);
const citations = this.constructCitations(response.candidates?.[0]?.groundingMetadata);
return {
parsed,
thought,
tokenUsage,
metadata: citations,
};
}
throw Error('No text generated.');
}
After receiving the response, we extract the thought summary, text, usageMetadata and grounding metadata.
4. Parse Response Text
The text is a string representation of a JSON object; therefore, it is parsed to a JSON object of type ImageAnalysis.
export type ImageAnalysis = {
alternativeText: string;
tags: string[];
recommendations: Recommendation[]
fact: string;
}
const text = response.text().replace(/```json\n?|```/g, '');
const parsed: ImageAnalysis = JSON.parse(text);
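The Recommendation type is not shown in the post; based on the schema defined earlier (id, text, reason), it presumably looks like this:
export type Recommendation = {
  id: number;
  text: string;
  reason: string;
}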
5. Calculate Token Usage
Gemini 3 enables thinking by default; therefore, the total number of tokens is the sum of input tokens, output tokens, and thought tokens.
private getTokenUsage(usageMetadata?: UsageMetadata) {
const tokenUsage = {
input: usageMetadata?.promptTokenCount || 0,
output: usageMetadata?.candidatesTokenCount || 0,
thought: usageMetadata?.thoughtsTokenCount || 0,
total: usageMetadata?.totalTokenCount || 0,
};
return tokenUsage;
}
- Input tokens = usageMetadata?.promptTokenCount
- Output tokens = usageMetadata?.candidatesTokenCount
- Thought tokens = usageMetadata?.thoughtsTokenCount
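As a hypothetical illustration (inside the same service), a request with 300 prompt tokens, 200 output tokens, and 150 thought tokens reports a totalTokenCount of 650:
// Hypothetical numbers for illustration only.
const usage = this.getTokenUsage({
  promptTokenCount: 300,
  candidatesTokenCount: 200,
  thoughtsTokenCount: 150,
  totalTokenCount: 650,
});
// usage => { input: 300, output: 200, thought: 150, total: 650 }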
6. Handling Citations (Grounding)
One of the coolest features of using Gemini 3 is grounding. When the model searches for that "obscure fact," it returns metadata containing the sources.
We need a helper function to parse this metadata and construct citations for the UI:
private constructCitations(groundingMetadata?: GroundingMetadata) {
const citations: string[] = [];
const supports = groundingMetadata?.groundingSupports;
const chunks = groundingMetadata?.groundingChunks;
if (supports && chunks) {
for (const support of supports) {
const indices = support.groundingChunkIndices || [];
for (const index of indices) {
const chunk = chunks[index];
if (chunk?.web) {
citations.push(chunk.web.uri);
}
}
}
}
return citations;
}
We iterate over support.groundingChunkIndices to obtain each chunk index. Then, we use the index to look up the grounding chunk. When the chunk has a web URI, the URI is appended to the citations array. Finally, the function returns the citations array.
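For example, grounding metadata of the following (hypothetical) shape would yield a single citation:
// Hypothetical grounding metadata for illustration.
const groundingMetadata = {
  groundingChunks: [
    { web: { uri: 'https://example.com/corgi-folklore', title: 'Corgi folklore' } },
  ],
  groundingSupports: [
    { groundingChunkIndices: [0] },
  ],
};
// constructCitations(groundingMetadata) => ['https://example.com/corgi-folklore']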
7. The Angular App
In the App component, the FirebaseService generates the alternative text when the generate button is clicked. The result is written to the analysis signal.
firebaseAiService = inject(FirebaseService);
async handleGenerateClick() {
const file = this.selectedFile();
if (!file) {
return;
}
this.analysis.set(undefined);
const results = await this.firebaseAiService.generateAltText(file);
this.analysis.set(results);
}
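The selectedFile and analysis signals referenced above are not shown in the snippet; a plausible declaration inside the component, using signal from @angular/core, is:
// Hypothetical sketch of the component state used above.
selectedFile = signal<File | undefined>(undefined);
analysis = signal<ImageAnalysisResponse | undefined>(undefined);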
In the template, the app-tags-display component displays hashtags. The app-google-search-suggestions component displays a surprising fact, citations, web search queries, and Google Search suggestions. The app-thought-summary component displays the thought summary and token usage.
@let imageAnalysis = analysis();
<app-tags-display [tags]="imageAnalysis?.parsed?.tags || []" />
<app-google-search-suggestions
[interestingFact]="imageAnalysis?.parsed?.fact"
[metadata]="imageAnalysis?.metadata"
/>
<app-thought-summary [thought]="imageAnalysis?.thought || ''" [tokenUsage]="imageAnalysis?.tokenUsage" />
Conclusion
Migrating to Gemini 3 in Vertex AI unlocked powerful features for this Angular application. The combination of Thinking Mode for transparency and Grounding for factual accuracy makes Gemini 3 an incredibly robust tool for analyzing complex visual media.
Resources
- Enable Thought Summaries: https://firebase.google.com/docs/ai-logic/thinking?api=vertex#enable-thought-summaries
- Grounding: https://firebase.google.com/docs/ai-logic/grounding-google-search?api=vertex#ground-the-model
- GitHub Repo: https://github.com/railsstudent/firebase-ai-hybrid-demo
Note: Google Cloud credits are provided for this project.