DEV Community

Ganugapati Sai Sowmya

A 3rd year CS student's attempt to reduce AI's water footprint — EcoCache (A Python Library)

Did you know that every ~20 questions you ask an AI chatbot consumes
roughly a 500ml bottle of water for data centre cooling?

As AI scales, so does its thirst. A huge chunk of this is pure
waste — because we ask LLMs the same things over and over. Every
redundant query is a real, physical cost.

I'm a 3rd year CS engineering student and I built EcoCache to
reduce and measure that waste.

What it does

EcoCache sits in front of your LLM API calls. Before hitting the
model, it checks whether a semantically similar question was already
answered. If yes — it returns the cached answer instantly. If no —
it calls the API and stores the result for next time.

It's not exact string matching. "What is TCP?" and "Can you explain
TCP protocols?" are recognised as the same question using vector
embeddings and cosine similarity.
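As a rough illustration of the idea (with toy 3-dimensional vectors standing in for real 384-dimensional sentence embeddings, so the numbers here are made up), cosine similarity scores paraphrases close to 1 and unrelated questions much lower:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; in EcoCache these come from a sentence-transformers model.
v_tcp_a = [0.9, 0.1, 0.2]   # "What is TCP?"
v_tcp_b = [0.8, 0.2, 0.1]   # "Can you explain TCP protocols?"
v_other = [0.1, 0.9, 0.3]   # an unrelated question

print(cosine_similarity(v_tcp_a, v_tcp_b))  # high: treated as the same question
print(cosine_similarity(v_tcp_a, v_other))  # low: treated as a new question
```

Because the comparison happens in embedding space, the surface wording of the query stops mattering; only the meaning does.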

See it in action

from ecocache.client import EcoCacheClient

client = EcoCacheClient()  # add your Gemini API key to .env

# First call — hits the API
r1 = client.chat("What is the difference between TCP and UDP?")
print(r1["source"])   # → "api"

# Similar question — served from cache, no API call made
r2 = client.chat("Can you explain TCP vs UDP protocols?")
print(r2["source"])      # → "cache"
print(r2["savings"])     # → water and carbon saved so far

The dashboard

It comes with a live dashboard that tracks savings in real time:

EcoCache dashboard showing cache hit rate, water saved, and recent queries

I measured a ~50% cache hit rate in my tests. Every cache hit means one
fewer LLM inference, which I estimate at ~5mL of water and ~4g of CO2
saved. Small numbers individually. Meaningful at scale.
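Using those per-hit figures (which are the estimates above, not audited measurements), the cumulative savings the dashboard reports are simple arithmetic:

```python
# Per-cache-hit savings, as estimated in this post (assumptions, not measurements).
WATER_ML_PER_HIT = 5   # ~5 mL of cooling water per avoided inference
CO2_G_PER_HIT = 4      # ~4 g of CO2 per avoided inference

def savings(cache_hits: int) -> dict:
    """Cumulative water (litres) and carbon (kg) saved for a given hit count."""
    return {
        "water_l": cache_hits * WATER_ML_PER_HIT / 1000,
        "co2_kg": cache_hits * CO2_G_PER_HIT / 1000,
    }

print(savings(10_000))  # → {'water_l': 50.0, 'co2_kg': 40.0}
```

At 10,000 cache hits that is 50 litres of water and 40 kg of CO2, which is where "meaningful at scale" comes from.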

How it works under the hood

  1. Query comes in
  2. Sentence-transformers converts it to a 384-dimensional vector
  3. FAISS searches for the nearest vector in the cache
  4. If similarity > 0.85 — return cached response
  5. If not — call the LLM, store the result

Try it

git clone https://github.com/GanugapatiSaiSowmya/ecocache
cd ecocache
python3.11 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

GitHub: https://github.com/GanugapatiSaiSowmya/ecocache

This is v0.1; rough edges exist. I'm actively working on it and
I appreciate your feedback, issues, or contributions.

If you think responsible AI development matters, a star would mean
a lot to a broke college student trying to make a dent ⭐

#python #ai #sustainability #opensource #climatetech

Top comments (1)

Apex Stack

The semantic similarity approach with FAISS + sentence-transformers is a really clever architecture for this. I run a local Llama 3 instance to generate SEO content for a financial data site — thousands of stock analysis pages across multiple languages — and the redundancy problem you're describing is something I deal with constantly. Stocks in the same sector often generate nearly identical analytical patterns, so a huge portion of my LLM calls are essentially asking the same structural question with slightly different ticker data.

The 0.85 cosine similarity threshold is an interesting design choice. Have you experimented with how that number affects the tradeoff between cache hit rate and answer quality? In my experience with financial data, even small contextual differences (like asking about the same metric for two companies in the same sector) can produce meaningfully different answers, so I'd be curious how EcoCache handles that boundary between "similar enough to cache" and "different enough to re-query."

Also worth noting — the environmental angle could be a strong positioning strategy if you ever want to grow this beyond a library. Companies are increasingly reporting on their AI compute footprint, and having a concrete dashboard showing water/carbon savings per cached query is exactly the kind of metric sustainability teams want to see in audits.