DEV Community

Jon BW

From Photo to Footprint: Building a Carbon Analyzer with Gemini AI

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

Have you ever stood in a store, holding a product, and wondered about its real environmental cost? What did it take to create this? From the raw materials pulled from the earth to the energy used in the factory, every item has an invisible "embodied carbon" footprint. But this information is almost impossible to find.

I wanted to change that. My goal was to build a tool that could make this complex data instantly accessible and understandable. The result is Carbon Scan, a web app that uses Google's Gemini AI to estimate the cradle-to-gate carbon footprint of everyday objects from a simple photo or text description.

The Solution: Instant Carbon Insights

Carbon Scan is a simple, fast, and mobile-friendly tool. You can either use your device's camera to scan an object in front of you or type a description like "a wool sweater" or "a 13-inch laptop." In seconds, you get a detailed analysis.

Demo

GIF demo
Check Out The Live App Here

The app provides not just a total estimate, but also:

  • A breakdown of the footprint across materials, production, transport, and packaging.
  • Suggestions for lower-carbon alternatives with actionable tips.
  • A list of data sources the model referenced for its analysis.
  • The ability to ask follow-up questions for comparative analysis.

How I Used Google AI Studio

Of course, the magic behind Carbon Scan is the Gemini API, specifically the gemini-2.5-flash model. I chose it for three key reasons: its powerful multimodal capabilities, its reliable JSON output mode, and its speed.

1. The Prompt: Giving Gemini Its Mission
I engineered the prompt to be a multi-step process, ensuring I get all the necessary information in a structured way. Here is the core prompt sent to the API:

Analyze the user's request to estimate the "cradle-to-gate" embodied carbon footprint. Strictly follow these instructions:

  1. Assess if the main object is clear and identifiable from the image or text. If not, state why (e.g., "Too blurry", "Ambiguous description").
  2. If identifiable, provide: the object's name, a brief explanation of the carbon estimate, the total carbon estimate in kg CO₂e, and a numerical breakdown for {materials, production, transport, packaging} that sums to the total.
  3. Suggest 2-3 lower-carbon alternatives. For each alternative, provide its name, a brief reason why it's lower-carbon, and one actionable tip for the user.
  4. Provide 2-3 primary data sources you are referencing, including the title and a real, valid URL.
  5. If an image is provided, also provide a normalized bounding box [x, y, width, height] for the main object. Omit this for text-only requests.

This structured approach turns Gemini into a specialized analysis tool. To handle follow-up questions, a second, context-aware prompt is generated:

function createFollowUpPrompt(prevResult, userQuery) {
    return `The user is following up on a previous analysis.
    PREVIOUS ANALYSIS CONTEXT:
    - Object: ${prevResult.objectName}
    - Estimated Footprint: ${prevResult.carbonEstimateKgCO2e} kg CO₂e
    USER'S FOLLOW-UP QUESTION: "${userQuery}"

    Based on the original image and this new context, provide an answer.
    ...
    Strictly follow all original instructions and the JSON output schema.`;
}
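To make the flow concrete, here is a quick usage sketch with made-up analysis data (the `prevResult` values and the follow-up question are purely illustrative, not from a real run):

```javascript
// Illustrative usage of createFollowUpPrompt with made-up previous-analysis data.
function createFollowUpPrompt(prevResult, userQuery) {
    return `The user is following up on a previous analysis.
    PREVIOUS ANALYSIS CONTEXT:
    - Object: ${prevResult.objectName}
    - Estimated Footprint: ${prevResult.carbonEstimateKgCO2e} kg CO₂e
    USER'S FOLLOW-UP QUESTION: "${userQuery}"

    Based on the original image and this new context, provide an answer.
    Strictly follow all original instructions and the JSON output schema.`;
}

const prompt = createFollowUpPrompt(
    { objectName: 'Wool sweater', carbonEstimateKgCO2e: 18.4 },
    'What if it were made from recycled polyester?'
);
// The prior object and estimate are embedded directly in the new prompt,
// so the model can answer comparatively without re-analyzing from scratch.
console.log(prompt);
```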

2. The JSON Schema: Demanding structured data
Just asking an AI for JSON can be unreliable. It might forget a field, use the wrong data type, or generate invalid syntax. Gemini's JSON mode, enforced with a responseSchema, solves this completely.

By defining the exact shape of the data I expect on the server, I guarantee that the API response will be a perfectly formatted, parseable JSON object every single time. This eliminates a huge amount of client-side validation and error handling.

Here’s a snippet of the schema I defined in server.js:

const RESPONSE_SCHEMA = {
  type: 'OBJECT',
  properties: {
    isIdentifiable: { type: 'BOOLEAN' },
    objectName: { type: 'STRING' },
    carbonEstimateKgCO2e: { type: 'NUMBER' },
    breakdown: {
      type: 'OBJECT',
      properties: {
        materials: { type: 'NUMBER' },
        production: { type: 'NUMBER' },
        // ... and so on
      },
      required: ['materials', 'production', 'transport', 'packaging'],
    },
    // ... schema for alternatives, sources, etc.
  },
};
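One thing worth noting: the schema can guarantee that every field exists and has the right type, but it can't express semantic constraints like "the breakdown must sum to the total." A tiny client-side sanity check (a hypothetical helper, not part of the app's actual code) can catch that kind of drift:

```javascript
// Hypothetical sanity check: responseSchema guarantees the fields exist and
// are numbers, but it cannot enforce that the breakdown sums to the total.
function breakdownMatchesTotal(result, toleranceKg = 0.1) {
    const { materials, production, transport, packaging } = result.breakdown;
    const sum = materials + production + transport + packaging;
    return Math.abs(sum - result.carbonEstimateKgCO2e) <= toleranceKg;
}

// Example response shape (values are made up for illustration).
const sample = {
    carbonEstimateKgCO2e: 12.0,
    breakdown: { materials: 6.5, production: 3.5, transport: 1.2, packaging: 0.8 },
};
console.log(breakdownMatchesTotal(sample)); // true: 6.5 + 3.5 + 1.2 + 0.8 = 12.0
```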

3. The Secure Backend: Protecting the API Key
A crucial best practice is to never expose your API key in client-side code. To handle this, I built a simple Node.js Express server that acts as a secure backend-for-frontend (BFF).

The browser sends the image and text to the server, which then calls the Gemini API, adding the API key on the backend where it's safe. In production, the key isn't even stored in an environment variable; it's fetched securely from Google Cloud Secret Manager at server startup.
This snippet from server.js shows how the key is fetched:

// server.js
const { SecretManagerServiceClient } = require('@google-cloud/secret-manager');
let geminiApiKey;

async function accessGeminiApiKey() {
    if (geminiApiKey) return;

    // This name points to the secret stored securely in Google Cloud
    const secretName = 'projects/PROJECT_ID/secrets/CarbonScan/versions/latest';

    try {
        const client = new SecretManagerServiceClient();
        const [version] = await client.accessSecretVersion({ name: secretName });
        geminiApiKey = version.payload.data.toString('utf8');
        console.log('Successfully fetched API key from Secret Manager.');
    } catch (error) {
        console.error('Failed to access secret from Secret Manager.', error);
        throw new Error('Could not load API key.');
    }
}

Multimodal Features

The core of Carbon Scan's user experience is its multimodal functionality, allowing users to interact with the world in the most natural way possible. The app accepts two distinct types of input to perform the same complex analysis, all powered by the same underlying AI model.

  • Visual input via camera: Users can simply point their device's camera at a physical object and capture an image. This is the most intuitive way to analyze something you have right in front of you. The Gemini model processes the image directly, identifying the object, its likely materials, and its context to provide a nuanced carbon footprint estimate. This bridges the gap between the physical world and digital analysis.
  • Text input: Sometimes you don't have the object on hand, or you want to analyze a more generic item (e.g., "a 13-inch laptop" or "a pair of leather boots"). The text input allows users to describe any item they can imagine. The AI uses this text to generate the same detailed analysis, drawing on its vast knowledge base. Similarly, the user can give text inputs to explore alternative scenarios, such as if the product was made with a different material or in a different location.
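Both modes can feed the same request shape. As a hedged sketch (field names follow the public Gemini REST API's `generateContent` format; the `buildRequestBody` helper and the prompt string are illustrative, not the app's actual code), an image simply rides along as a base64 `inline_data` part next to the text prompt:

```javascript
// Sketch: one request-body builder serving both input modes. A captured image
// (base64-encoded JPEG) is appended as an inline_data part; text-only requests
// just omit it. The model processes whatever parts are present.
function buildRequestBody(analysisPrompt, base64Jpeg = null) {
    const parts = [{ text: analysisPrompt }];
    if (base64Jpeg) {
        parts.push({ inline_data: { mime_type: 'image/jpeg', data: base64Jpeg } });
    }
    return { contents: [{ role: 'user', parts }] };
}

const textOnly = buildRequestBody('Estimate the footprint of a wool sweater.');
const withImage = buildRequestBody('Estimate the footprint of this object.', '/9j/4AAQ...');
console.log(textOnly.contents[0].parts.length);  // 1 (text part only)
console.log(withImage.contents[0].parts.length); // 2 (text + image parts)
```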

Challenges & Learnings

  • Securing the API Key: My first instinct was a pure frontend app, but I quickly realized exposing the API key was a non-starter. Building the simple Express proxy was a good learning experience and seems to be the right way to build production-ready AI apps. It also allowed me to integrate with Secret Manager, a professional best practice.
  • Handling "Bad" Scans: What happens if a user scans a blurry photo or the sky? The AI needs a way to fail gracefully. Adding the isIdentifiable boolean flag and reasonIfNot string to my JSON schema was the solution. It allows the AI to tell me why it couldn't perform an analysis, which I can then show to the user (e.g., "The image is too blurry to identify an object.").
  • Deployment: Taking a local dev project to a live, scalable URL on Google Cloud Run was a journey. I learned how to containerize the application with Docker, set up a build pipeline with Cloud Build, and configure IAM permissions for Secret Manager. It's a nice setup that goes from a git push to a live deployment automatically.
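The graceful-failure path from the second point might look like this on the client (a hypothetical rendering helper, assuming the `isIdentifiable` / `reasonIfNot` fields from the schema above):

```javascript
// Hypothetical client-side handling of the isIdentifiable flag: surface the
// model's own reason instead of a generic error when analysis isn't possible.
function renderResultMessage(result) {
    if (!result.isIdentifiable) {
        return `Could not analyze: ${result.reasonIfNot || 'Unknown reason.'}`;
    }
    return `${result.objectName}: ~${result.carbonEstimateKgCO2e} kg CO₂e (cradle-to-gate)`;
}

console.log(renderResultMessage({
    isIdentifiable: false,
    reasonIfNot: 'The image is too blurry to identify an object.',
}));
// → "Could not analyze: The image is too blurry to identify an object."
```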

Conclusion

Building Carbon Scan has been a fun exploration of what's possible with modern AI. Gemini's multimodal and structured data capabilities allowed me to create a tool that feels both impressive and genuinely useful. While the results are AI-generated estimates for educational purposes, they represent a step toward making complex environmental data more transparent and accessible to everyone.

I'm curious to see how others might use and build upon this concept. Though, I'd admittedly just be happy if anyone finds it useful or interesting at all, lol.

For future improvements, I'm mainly considering adding environmental impacts beyond GHG emissions, such as blue water consumption. Folks often don't realize just how much water is embedded in products, with huge amounts consumed in everything from food production to electricity generation. I also haven't fully built out the "Personal Footprint Tracker" aspect, which would let users save their generated estimates and track their personal GHG emissions over time.

Check Out The Live App Here
Github repo here

What would you scan first? Let me know in the comments below. Thanks for reading about and checking out Carbon Scan!


Important Disclaimer

The carbon footprint values provided by this application are estimates generated by an AI model. They are for informational and educational purposes only and have not been validated by a third-party expert. These results should not be used for commercial purposes, marketing claims, or official carbon accounting.

FAQ

What does "cradle-to-gate" mean?
"Cradle-to-gate" refers to the carbon footprint of a product from the extraction of raw materials (the "cradle") up to the point it leaves the factory gate. It includes emissions from materials, manufacturing, and transport, but excludes distribution, consumer use, and end-of-life disposal.
What is a Product Carbon Footprint (PCF)?
A Product Carbon Footprint is the total amount of greenhouse gas (GHG) emissions generated throughout a product's life cycle. It's a key metric for understanding a product's environmental impact. This app provides a "cradle-to-gate" estimate of the PCF.
Why is ISO-conformance important?
The International Organization for Standardization (ISO) provides standards (like ISO 14067) that create a consistent, internationally recognized methodology for calculating PCFs. Conformance ensures that the calculations are transparent, credible, and follow best practices.
Can I compare results from different studies?
It's often very difficult. Even ISO-compliant studies can differ in their assumptions, system boundaries, and data sources. Comparing results requires deep expertise in Life Cycle Assessment (LCA) to ensure the comparison is fair. Treat each PCF value as specific to its own study.
What’s the difference between CO₂, CO₂e, and GHG?
CO₂ (Carbon Dioxide): A single greenhouse gas.
GHG (Greenhouse Gases): A broad category of gases that trap heat, including CO₂, methane (CH₄), and nitrous oxide (N₂O).
CO₂e (Carbon Dioxide Equivalent): A standard unit for measuring carbon footprints. It converts the impact of different GHGs into the equivalent amount of CO₂.
