Greenwashing is everywhere. Companies claim to be "eco-friendly," "sustainable," and "carbon-neutral" without evidence to back those claims up. We built Greenwashing Checker to automatically analyze environmental claims and flag potential greenwashing.
The technical challenge: teaching a system to distinguish genuine sustainability commitments from marketing fluff.
What Makes a Claim Greenwashing?
Before writing any code, we needed a taxonomy. Based on research from the European Commission and the FTC Green Guides, we identified seven patterns of greenwashing:
- Vague claims: "eco-friendly" with no specifics
- Irrelevant claims: Technically true but meaningless ("CFC-free" when CFCs are banned)
- Hidden tradeoffs: Highlighting one green attribute while ignoring larger impacts
- No proof: Claims without third-party certification or data
- Lesser of two evils: "Green" cigarettes, "sustainable" fast fashion
- Fake labels: Made-up certifications that look official
- Outright lies: Fabricated data or false certifications
Our system focuses on patterns 1, 2, 4, and 6 — the ones most amenable to automated detection.
The Analysis Pipeline
When a user submits a URL or text for analysis, it goes through four stages:
Input Text → Claim Extraction → Pattern Matching → Evidence Check → Score
Stage 1: Claim Extraction
First, we need to identify which sentences are actually making environmental claims. Not every sentence on a sustainability page is a claim — many are just filler.
import re

GREEN_KEYWORDS = [
    "sustainable", "eco-friendly", "green", "carbon neutral",
    "carbon negative", "net zero", "renewable", "biodegradable",
    "recyclable", "organic", "natural", "clean energy",
    "zero waste", "climate positive", "ethically sourced",
    "fair trade", "cruelty free", "plant based", "compostable",
]

def extract_claims(text):
    # Naive sentence splitting; good enough for marketing copy,
    # though it will break on abbreviations like "approx." or "Inc."
    sentences = text.split(".")
    claims = []
    for sentence in sentences:
        sentence = sentence.strip().lower()
        # Substring matching is deliberately loose: "green" also fires
        # on "greenhouse", so expect some false positives here.
        keyword_matches = [kw for kw in GREEN_KEYWORDS if kw in sentence]
        if keyword_matches:
            claims.append({
                "text": sentence,
                "keywords": keyword_matches,
                # Numbers attached to % or weight units are a first signal of specificity
                "has_quantifier": bool(re.search(r"\d+%|\d+ tons?|\d+ tonnes?", sentence)),
            })
    return claims
Stage 2: Vagueness Detection
This is where the NLP gets interesting. We score each claim on a specificity scale:
def vagueness_score(claim):
    score = 0
    text = claim["text"]

    # Vague qualifiers increase the score
    vague_patterns = [
        (r"\b(some|many|various|several)\b", 0.2),
        (r"\b(striving|working towards|committed to)\b", 0.3),
        (r"\b(eco-friendly|green|sustainable)\b", 0.15),  # without specifics
        (r"\b(better for|good for|helps?)\b", 0.2),
    ]
    for pattern, weight in vague_patterns:
        if re.search(pattern, text):
            score += weight

    # Specific quantifiers reduce the score
    if claim["has_quantifier"]:
        score -= 0.3

    # Named certifications reduce the score
    certifications = [
        "iso 14001", "b corp", "leed", "energy star",
        "fsc", "rainforest alliance", "cradle to cradle",
    ]
    for cert in certifications:
        if cert in text:
            score -= 0.4

    # Clamp to [0, 1]
    return max(0, min(1, score))
A claim like "We are committed to being more sustainable" scores high on vagueness. A claim like "We reduced Scope 1 emissions by 23% between 2023-2025, verified by ISO 14064" scores low.
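A condensed, runnable version of the scorer makes the contrast concrete. (This is a demo sketch with a trimmed certification list, not the production scorer; note that "ISO 14064" is not in the list above, so the specific claim's low score comes from its quantifier alone.)

```python
import re

VAGUE_PATTERNS = [
    (r"\b(some|many|various|several)\b", 0.2),
    (r"\b(striving|working towards|committed to)\b", 0.3),
    (r"\b(eco-friendly|green|sustainable)\b", 0.15),
    (r"\b(better for|good for|helps?)\b", 0.2),
]
CERTIFICATIONS = ["iso 14001", "b corp", "leed", "energy star"]

def vagueness(text):
    text = text.lower()
    score = sum(w for p, w in VAGUE_PATTERNS if re.search(p, text))
    if re.search(r"\d+%|\d+ tons?|\d+ tonnes?", text):
        score -= 0.3  # quantified claims are more specific
    if any(cert in text for cert in CERTIFICATIONS):
        score -= 0.4  # named certifications add credibility
    return max(0.0, min(1.0, score))

vague = vagueness("We are committed to being more sustainable")
specific = vagueness("We reduced Scope 1 emissions by 23% between 2023-2025, verified by ISO 14064")
print(vague, specific)  # the "committed to ... sustainable" claim scores far higher
```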
Stage 3: Certification Verification
We maintain a database of legitimate environmental certifications and their visual identifiers:
CREATE TABLE certifications (
    id INT PRIMARY KEY,
    name VARCHAR(200),
    issuing_body VARCHAR(200),
    verification_url VARCHAR(500),
    is_legitimate BOOLEAN,
    category VARCHAR(100)
);

-- Known fake/misleading certifications
INSERT INTO certifications (id, name, is_legitimate) VALUES
    (1, 'Green Approved', false),
    (2, 'Eco Safe', false),
    (3, 'Nature Certified', false),
    (4, '100% Green', false);

-- Legitimate certifications
INSERT INTO certifications (id, name, is_legitimate, issuing_body) VALUES
    (5, 'B Corporation', true, 'B Lab'),
    (6, 'ISO 14001', true, 'ISO'),
    (7, 'Energy Star', true, 'EPA'),
    (8, 'FSC Certified', true, 'Forest Stewardship Council');
When our system detects a certification mentioned in text, it cross-references this database. Fake or self-awarded certifications are flagged immediately.
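A minimal sketch of that lookup, using Python's built-in sqlite3 with an in-memory table in place of the production database (only the columns the check needs, names normalized to lowercase for matching):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE certifications (name TEXT PRIMARY KEY, is_legitimate INTEGER)")
conn.executemany(
    "INSERT INTO certifications VALUES (?, ?)",
    [("green approved", 0), ("eco safe", 0),
     ("b corporation", 1), ("iso 14001", 1), ("energy star", 1)],
)

def check_certification(name):
    """Return 'legitimate', 'fake', or 'unknown' for a certification name."""
    row = conn.execute(
        "SELECT is_legitimate FROM certifications WHERE name = ?",
        (name.lower(),),
    ).fetchone()
    if row is None:
        return "unknown"
    return "legitimate" if row[0] else "fake"

print(check_certification("Green Approved"))  # fake
print(check_certification("ISO 14001"))       # legitimate
```

Unknown certifications are worth flagging too: a name that matches neither list is often a self-awarded label.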
Stage 4: AI-Powered Deep Analysis
For nuanced cases that rule-based patterns miss, we use an LLM as a second opinion:
import json

def ai_deep_analysis(claims, company_context):
    # call_ai_api is our thin wrapper around the LLM provider's API
    prompt = f"""Analyze these environmental claims for potential greenwashing.

Company: {company_context}
Claims: {json.dumps(claims)}

For each claim, evaluate:
1. Specificity (1-10): How specific and measurable is this claim?
2. Verifiability (1-10): Could a third party verify this?
3. Materiality (1-10): Does this address the company's actual environmental impact?
4. Red flags: Any classic greenwashing patterns?

Return structured JSON."""
    return call_ai_api(prompt)
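One detail the snippet glosses over: the model's reply is not guaranteed to be valid JSON, so it has to be parsed defensively. A sketch of that step (the field names mirror the prompt above; `parse_ai_response` is a hypothetical helper, not our production code):

```python
import json

def parse_ai_response(raw):
    """Parse the model's JSON reply; a malformed response degrades to an
    empty result instead of crashing the pipeline."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    results = []
    for item in data:
        try:
            results.append({
                "specificity": min(10, max(1, int(item["specificity"]))),
                "verifiability": min(10, max(1, int(item["verifiability"]))),
                "materiality": min(10, max(1, int(item["materiality"]))),
                "red_flags": list(item.get("red_flags", [])),
            })
        except (KeyError, TypeError, ValueError):
            continue  # skip entries missing required fields
    return results
```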
The AI layer catches subtle patterns, such as a fast-fashion brand highlighting its use of recycled hangers while ignoring the environmental impact of producing millions of garments.
The Scoring System
We output a "Greenwashing Risk Score" from 0 to 100:
- 0-25: Low risk. Claims are specific, verified, and material.
- 26-50: Moderate risk. Some vague claims but generally substantiated.
- 51-75: High risk. Multiple vague or unsubstantiated claims.
- 76-100: Very high risk. Classic greenwashing patterns detected.
The score is a weighted combination of all four analysis stages. We deliberately avoid binary "greenwashing yes/no" labels because the reality is always a spectrum.
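As an illustrative sketch of that combination (the weights below are made-up examples, not our production values):

```python
# Illustrative weights only; the tuned production values differ.
WEIGHTS = {"vagueness": 0.35, "certification": 0.25, "evidence": 0.20, "ai": 0.20}

def risk_score(stage_scores):
    """Combine per-stage scores (each in [0, 1]) into a 0-100 risk score."""
    total = sum(weight * stage_scores.get(stage, 0.0)
                for stage, weight in WEIGHTS.items())
    return round(total * 100)

def risk_band(score):
    if score <= 25:
        return "Low risk"
    if score <= 50:
        return "Moderate risk"
    if score <= 75:
        return "High risk"
    return "Very high risk"
```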
Technical Challenges We Solved
Multi-language support: Greenwashing is not an English-only problem. We support French, German, Spanish, and Italian claim detection. Each language has its own set of vague qualifiers and green buzzwords.
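Structurally, that just means swapping in a per-language keyword table. A sketch with abbreviated, illustrative lists (the production lexicons are much larger):

```python
# Abbreviated example lists; real lexicons include many more terms per language.
GREEN_KEYWORDS_BY_LANG = {
    "en": ["sustainable", "carbon neutral", "eco-friendly"],
    "fr": ["durable", "neutre en carbone", "écoresponsable"],
    "de": ["nachhaltig", "klimaneutral", "umweltfreundlich"],
    "es": ["sostenible", "carbono neutral", "ecológico"],
    "it": ["sostenibile", "a impatto zero", "ecologico"],
}

def has_green_claim(sentence, lang="en"):
    sentence = sentence.lower()
    return any(kw in sentence for kw in GREEN_KEYWORDS_BY_LANG.get(lang, []))
```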
PDF parsing: Many sustainability reports are published as PDFs. We use a combination of pdftotext and custom layout parsing to extract meaningful text from formatted reports.
Rate limiting the AI layer: To keep costs manageable, we cache AI analyses per domain and only re-analyze when the page content changes significantly, i.e. when its similarity to the cached version drops below 85%.
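The cache check can be sketched like this, with `difflib.SequenceMatcher` standing in for the production similarity measure:

```python
import difflib

_cache = {}  # domain -> last analyzed page content

def needs_reanalysis(domain, content, threshold=0.85):
    """Re-run the pipeline only when similarity to the cached version
    drops below the threshold, i.e. the page changed significantly."""
    cached = _cache.get(domain)
    if cached is None:
        return True  # never seen this domain before
    return difflib.SequenceMatcher(None, cached, content).ratio() < threshold

_cache["example.com"] = "We are committed to net zero by 2040."
print(needs_reanalysis("example.com", "We are committed to net zero by 2040!"))  # False: minor edit
```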
Accuracy and Limitations
We are transparent about what the tool can and cannot do. It excels at:
- Detecting vague, unsubstantiated claims
- Identifying fake certifications
- Flagging missing evidence
It struggles with:
- Evaluating whether specific numbers are accurate (we cannot audit a company)
- Detecting hidden tradeoffs without industry-specific context
- Analyzing claims in images or videos
We clearly state these limitations on every analysis result page at greenwashing-checker.com.
Why Open Methodology Matters
We publish our scoring methodology in full. If a company disagrees with their score, they can see exactly which claims triggered which flags. This transparency has led to several companies actually improving their sustainability communications after seeing their analysis — which might be the best outcome we could hope for.
Analyze any company sustainability page at greenwashing-checker.com — the methodology is fully transparent.