DEV Community

Ganugapati Sai Sowmya

A 3rd year CS student's attempt to reduce AI's water footprint — EcoCache (A Python Library)

Did you know that every ~20 questions you ask an AI chatbot consumes
roughly a 500ml bottle of water for data centre cooling?

As AI scales, so does its thirst. A huge chunk of this is pure
waste — because we ask LLMs the same things over and over. Every
redundant query is a real, physical cost.

I'm a 3rd year CS engineering student and I built EcoCache to
reduce and measure that waste.

What it does

EcoCache sits in front of your LLM API calls. Before hitting the
model, it checks whether a semantically similar question was already
answered. If yes — it returns the cached answer instantly. If no —
it calls the API and stores the result for next time.

It's not exact string matching. "What is TCP?" and "Can you explain
TCP protocols?" are recognised as the same question using vector
embeddings and cosine similarity.
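As a rough illustration of the idea (with toy 3-dimensional vectors standing in for real 384-dimensional sentence embeddings, so the numbers here are made up), cosine similarity scores paraphrases close to 1 and unrelated questions much lower:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; in EcoCache these come from a sentence-transformers model.
v_tcp_a = [0.9, 0.1, 0.2]   # "What is TCP?"
v_tcp_b = [0.8, 0.2, 0.1]   # "Can you explain TCP protocols?"
v_other = [0.1, 0.9, 0.3]   # an unrelated question

print(cosine_similarity(v_tcp_a, v_tcp_b))  # high: treated as the same question
print(cosine_similarity(v_tcp_a, v_other))  # low: treated as a new question
```

Because the comparison happens in embedding space, the surface wording of the query stops mattering; only the meaning does.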

See it in action

from ecocache.client import EcoCacheClient

client = EcoCacheClient()  # add your Gemini API key to .env

# First call — hits the API
r1 = client.chat("What is the difference between TCP and UDP?")
print(r1["source"])   # → "api"

# Similar question — served from cache, no API call made
r2 = client.chat("Can you explain TCP vs UDP protocols?")
print(r2["source"])      # → "cache"
print(r2["savings"])     # → water and carbon saved so far

The dashboard

It comes with a live dashboard that tracks savings in real time:

EcoCache dashboard showing cache hit rate, water saved, and recent queries

I measured a ~50% cache hit rate in my tests. Every cache hit means one
fewer LLM inference, which I estimate at ~5mL of water and ~4g of CO2
saved. Small numbers individually. Meaningful at scale.
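Using those per-hit figures (which are the estimates above, not audited measurements), the cumulative savings the dashboard reports are simple arithmetic:

```python
# Per-cache-hit savings, as estimated in this post (assumptions, not measurements).
WATER_ML_PER_HIT = 5   # ~5 mL of cooling water per avoided inference
CO2_G_PER_HIT = 4      # ~4 g of CO2 per avoided inference

def savings(cache_hits: int) -> dict:
    """Cumulative water (litres) and carbon (kg) saved for a given hit count."""
    return {
        "water_l": cache_hits * WATER_ML_PER_HIT / 1000,
        "co2_kg": cache_hits * CO2_G_PER_HIT / 1000,
    }

print(savings(10_000))  # → {'water_l': 50.0, 'co2_kg': 40.0}
```

At 10,000 cache hits that is 50 litres of water and 40 kg of CO2, which is where "meaningful at scale" comes from.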

How it works under the hood

  1. Query comes in
  2. Sentence-transformers converts it to a 384-dimensional vector
  3. FAISS searches for the nearest vector in the cache
  4. If similarity > 0.85 — return cached response
  5. If not — call the LLM, store the result

Try it

git clone https://github.com/GanugapatiSaiSowmya/ecocache
cd ecocache
python3.11 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

GitHub: https://github.com/GanugapatiSaiSowmya/ecocache

This is v0.1; rough edges exist. I'm actively working on it and
I appreciate your feedback, issues, or contributions.

If you think responsible AI development matters, a star would mean
a lot to a broke college student trying to make a dent ⭐

#python #ai #sustainability #opensource #climatetech

Top comments (1)

Apex Stack

The semantic similarity approach with FAISS + sentence-transformers is a really clever architecture for this. I run a local Llama 3 instance to generate SEO content for a financial data site — thousands of stock analysis pages across multiple languages — and the redundancy problem you're describing is something I deal with constantly. Stocks in the same sector often generate nearly identical analytical patterns, so a huge portion of my LLM calls are essentially asking the same structural question with slightly different ticker data.

The 0.85 cosine similarity threshold is an interesting design choice. Have you experimented with how that number affects the tradeoff between cache hit rate and answer quality? In my experience with financial data, even small contextual differences (like asking about the same metric for two companies in the same sector) can produce meaningfully different answers, so I'd be curious how EcoCache handles that boundary between "similar enough to cache" and "different enough to re-query."

Also worth noting — the environmental angle could be a strong positioning strategy if you ever want to grow this beyond a library. Companies are increasingly reporting on their AI compute footprint, and having a concrete dashboard showing water/carbon savings per cached query is exactly the kind of metric sustainability teams want to see in audits.