Your team just shipped an ML inference endpoint. Stakeholders are asking uncomfortable questions about environmental impact. Someone forwards a scary headline about AI boiling the oceans. Your PM wants numbers by Friday.
Sound familiar? I've been there twice in the last year alone. And here's what I've learned: most of the numbers people throw around about AI resource consumption are either wildly out of context or completely made up. The real problem isn't that AI uses resources — everything does — it's that most teams have zero visibility into their specific footprint.
Let's fix that.
## The actual problem: you're flying blind
When someone asks "how much water does our AI stuff use?" most developers shrug. Fair enough — your cloud provider abstracts away the physical infrastructure. But that abstraction is exactly what makes the conversation so frustrating. You end up arguing about industry-wide estimates instead of your actual workload.
Recent analysis from researchers at UC Davis (published on the California Water Blog) suggests that AI's water footprint, while real, is a small fraction of what sectors like agriculture consume. Data centers in general — not just AI — account for a relatively modest share of total water use. But "relatively modest at industry scale" doesn't answer your PM's question about your deployment.
The root cause of the panic is a measurement gap. So let's close it.
## Step 1: Understand what actually consumes resources
Before measuring anything, you need a mental model of where the water and energy go.
Data centers use water primarily for cooling. The metrics you care about are PUE (Power Usage Effectiveness) and WUE (Water Usage Effectiveness).
```python
# PUE = Total facility power / IT equipment power
# A PUE of 1.0 means perfect efficiency (impossible).
# Modern hyperscale data centers typically hit 1.1-1.2;
# older facilities can be 1.5-2.0.
#
# WUE = Annual water usage (liters) / IT equipment energy (kWh)
# Lower is better. Some facilities use air cooling and hit near-zero WUE.

def estimate_workload_water(gpu_hours, tdp_watts, pue, wue_liters_per_kwh):
    """Rough estimate of water consumption for a GPU workload."""
    # Total energy including facility overhead
    energy_kwh = (gpu_hours * tdp_watts / 1000) * pue
    # Water used for cooling
    water_liters = energy_kwh * wue_liters_per_kwh
    return {
        "energy_kwh": round(energy_kwh, 2),
        "water_liters": round(water_liters, 2),
        "water_gallons": round(water_liters * 0.264172, 2),
    }

# Example: 100 GPU-hours on an A100 (300W TDP)
# at a modern facility (PUE 1.1, WUE 1.8 L/kWh)
result = estimate_workload_water(
    gpu_hours=100,
    tdp_watts=300,
    pue=1.1,
    wue_liters_per_kwh=1.8,  # varies enormously by facility and climate
)
print(result)
# {'energy_kwh': 33.0, 'water_liters': 59.4, 'water_gallons': 15.69}
```
That's 100 GPU-hours consuming roughly 60 liters of water. For context, a single load of laundry uses about 50-80 liters. This isn't nothing, but it's not the apocalypse either.
The catch: WUE varies massively by location and season. A data center in Phoenix using evaporative cooling in summer will have a very different WUE than one in Sweden using outside air. Some newer facilities in cool climates achieve near-zero direct water usage.
## Step 2: Pull real numbers from your cloud provider
Stop guessing. The major cloud providers now expose carbon and energy data, and you can derive water estimates from these.
```bash
# Google Cloud: Carbon Footprint reporting is available in the
# console (Billing > Carbon Footprint) and can be exported to
# BigQuery via the BigQuery Data Transfer Service for querying.

# AWS: check the Customer Carbon Footprint Tool, available in the
# billing console since 2022.

# For a more granular, provider-agnostic approach, use the
# Cloud Carbon Footprint open-source tool:
git clone https://github.com/cloud-carbon-footprint/cloud-carbon-footprint.git
cd cloud-carbon-footprint
yarn install
yarn start  # spins up a dashboard connected to your cloud billing data
```
The open-source Cloud Carbon Footprint project is genuinely useful here. It pulls billing data from AWS, GCP, or Azure and estimates energy consumption and carbon emissions. Water isn't directly tracked by most tools yet, but once you have energy numbers, you can estimate water using published WUE values for your provider's regions.
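To sketch that last step: once you have energy per region, multiply by a per-region WUE to get a rough water figure. The WUE values below are placeholder assumptions for illustration, not published provider numbers.

```python
# Placeholder WUE assumptions (L/kWh); substitute published values
# for your provider's regions where available.
REGION_WUE_L_PER_KWH = {
    "us-east-1": 1.8,   # assumption: warmer climate, evaporative cooling
    "eu-north-1": 0.2,  # assumption: cool climate, mostly air cooling
}

def water_from_energy(energy_kwh_by_region):
    """Estimate liters of cooling water from per-region energy (kWh)."""
    return sum(kwh * REGION_WUE_L_PER_KWH.get(region, 1.8)
               for region, kwh in energy_kwh_by_region.items())

# e.g. monthly energy estimates parsed from a Cloud Carbon Footprint export
print(round(water_from_energy({"us-east-1": 120.0, "eu-north-1": 45.0}), 1))
# 225.0
```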
## Step 3: Optimize the workload itself
The most effective way to reduce your footprint isn't buying carbon offsets or switching providers. It's using fewer resources.
### Pick the right model size
This is the biggest lever. A distilled 7B parameter model running inference can be 10-50x cheaper in compute than a 70B model. If your use case doesn't need the larger model, you're wasting everything — money, energy, and water.
```python
from transformers import pipeline

# Before: a massive general-purpose model for simple classification
# GPU memory: 40GB+, inference time: ~800ms
# response = large_model.predict(text)  # overkill for sentiment analysis

# After: a fine-tuned smaller model for the specific task
# GPU memory: ~8GB or less, inference time: ~45ms
# 90% of the accuracy, 5% of the cost. One concrete example:
sentiment = pipeline("sentiment-analysis",
                     model="distilbert-base-uncased-finetuned-sst-2-english")
print(sentiment("Great release, fast and stable"))
```
### Batch your inference
If your workload allows it, batching requests reduces per-query energy consumption significantly because the GPU spends less time idle between requests.
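As a sketch of what that can look like in a serving layer, here's a minimal dynamic-batching loop. Everything here is illustrative; in particular, `model.predict_batch` is a hypothetical batched-inference call, not a specific library's API.

```python
import asyncio

BATCH_SIZE = 32
MAX_WAIT_S = 0.05  # trade a little latency for fuller batches

request_queue: asyncio.Queue = asyncio.Queue()

async def batch_worker(model):
    """Drain the queue into batches so the GPU does one pass per batch."""
    loop = asyncio.get_running_loop()
    while True:
        batch = [await request_queue.get()]  # block until the first request
        deadline = loop.time() + MAX_WAIT_S
        while len(batch) < BATCH_SIZE:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(request_queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        outputs = model.predict_batch([req["text"] for req in batch])
        for req, out in zip(batch, outputs):
            req["future"].set_result(out)  # hand results back to callers
```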
### Choose your region deliberately
This is underrated. Running your workload in a region with a cooler climate and a cleaner energy grid reduces both your carbon and your water footprint. Northern Europe and Canada tend to score well here. If latency requirements allow it, this is a free optimization.
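One rough way to see the difference is to re-run the Step 1 estimator (`estimate_workload_water`) with different regional assumptions. These PUE/WUE numbers are illustrative placeholders, not published figures for these regions.

```python
# Illustrative assumptions only; real PUE/WUE vary by facility and season.
REGION_ASSUMPTIONS = {
    "us-central1":   {"pue": 1.1, "wue": 1.9},  # assumed evaporative cooling
    "europe-north1": {"pue": 1.1, "wue": 0.2},  # assumed free-air cooling
}

for region, a in REGION_ASSUMPTIONS.items():
    est = estimate_workload_water(gpu_hours=100, tdp_watts=300,
                                  pue=a["pue"], wue_liters_per_kwh=a["wue"])
    print(f"{region}: {est['water_liters']} L")
```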
### Use spot/preemptible instances for training
Beyond cost savings, spot instances tend to fill capacity that would otherwise sit idle (and still consume baseline power for cooling). You're using resources that are already being cooled.
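On AWS, for example, requesting spot capacity for a training run can be as small as this boto3 sketch; the AMI ID and instance type are placeholders, not recommendations.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-north-1")

# Request a one-off spot instance for a training run.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: your training AMI
    InstanceType="g5.xlarge",         # example GPU instance type
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={"MarketType": "spot"},
)
print(resp["Instances"][0]["InstanceId"])
```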
## Step 4: Build a dashboard your PM can read
The goal isn't just to measure — it's to communicate. Here's a simple approach:
```python
import json
from datetime import datetime

def generate_sustainability_report(workloads, facility_pue=1.1, facility_wue=1.8):
    """Generate a simple sustainability report for AI workloads."""
    report = {
        "generated": datetime.now().isoformat(),
        "facility_assumptions": {"pue": facility_pue, "wue_l_per_kwh": facility_wue},
        "workloads": [],
    }
    total_water = 0
    for w in workloads:
        energy = (w["gpu_hours"] * w["tdp_watts"] / 1000) * facility_pue
        water = energy * facility_wue
        total_water += water
        report["workloads"].append({
            "name": w["name"],
            "energy_kwh": round(energy, 1),
            "water_liters": round(water, 1),
        })
    # Add a comparison that makes sense to non-engineers
    avg_shower_liters = 65  # ~8 min shower
    report["total_water_liters"] = round(total_water, 1)
    report["equivalent_showers"] = round(total_water / avg_shower_liters, 1)
    return report

workloads = [
    {"name": "recommendation-model-training", "gpu_hours": 500, "tdp_watts": 300},
    {"name": "inference-endpoint-daily", "gpu_hours": 24, "tdp_watts": 300},
]

print(json.dumps(generate_sustainability_report(workloads), indent=2))
```
Relatable comparisons are key. "Our monthly AI inference uses the same water as 5 showers" lands better than "297 liters" in a stakeholder meeting.
## Prevention: making this a habit
Don't let this be a one-time fire drill. A few things that have worked for my teams:
- Add resource estimates to your model evaluation process. When comparing model A vs model B, include compute cost alongside accuracy metrics. Make it part of the decision.
- Set up monthly reporting. Cloud Carbon Footprint can export data. Pipe it into whatever dashboard your team already watches.
- Track efficiency, not just totals. Total water use going up because you're serving 10x more users is fine. Water-per-request going up means something is wrong. (A minimal sketch of such a metric follows this list.)
- Document your region choices. When someone asks why you're running in `europe-north1` instead of `us-central1`, you want that reasoning written down.
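To make the efficiency bullet concrete, here's a minimal sketch of a water-per-request metric. The helper name and the numbers are illustrative assumptions, not output from a real deployment.

```python
def water_per_kilo_requests(water_liters: float, request_count: int) -> float:
    """Liters of estimated cooling water per 1,000 requests served."""
    return water_liters / (request_count / 1000)

# Hypothetical month: ~14.3 L of estimated water across 2.1M requests
print(round(water_per_kilo_requests(14.3, 2_100_000), 4))  # ~0.0068
```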
## The bigger picture
Here's my honest take after digging into this: AI's resource footprint is real and worth measuring, but the discourse around it has been driven more by headline anxiety than by data. Agriculture dwarfs data center water consumption by orders of magnitude. That doesn't mean we shouldn't optimize — we absolutely should — but context matters.
The best thing you can do as a developer is replace hand-waving with actual numbers. Measure your workload. Optimize what you can. Report clearly. And when someone forwards you a panic headline, you'll have real data to respond with instead of vibes.
That's a much better position to be in than arguing about industry-wide estimates that may or may not apply to your 8-GPU inference cluster.