
Alessandro Bahgat


Scopewise: Submission for Weekend Challenge: Earth Day Edition

DEV Weekend Challenge: Earth Day

This is a submission for Weekend Challenge: Earth Day Edition

What I Built

ScopeWise is an agentic Scope 3 Category 6 (business travel) control center for a fictional company. The landing view isn't an annual report — it's an Opportunities panel that names the four or five policy levers a sustainability lead should pull next, each ranked by estimated tCO₂e recovered. Below that: a filter strip, the headline KPIs against an SBTi trajectory, a baseline-comparison table that benchmarks every route against a cabin-agnostic typical, and the heaviest-routes ledger with a "vs typical" ratio column flagged wherever a route is running >1.2× baseline. A Gemini 2.5 Flash drawer lets the operator ask natural-language follow-ups that call the same tools the UI reads from.

The twist is synthetic trips, real emissions, two TIM products. I curated a catalog of 365 real scheduled flights (42 carriers, 55 airports) from publicly published airline timetables, then generated 2,000 synthetic corporate trips on top of that catalog. Each trip is submitted to Google's Travel Impact Model via computeScope3FlightEmissions — the enrichment pipeline keeps only rows where TIM returned source: TIM_EMISSIONS (real per-flight model output, not a fallback to a typical or modeled baseline). Of 2,000 candidate trips, TIM had specific data for 810 across 115 routes; the other 1,190 — discontinued services, wet-leased routes, regional carriers outside the catalog — were dropped rather than filled with an estimate. The v2 move: run computeTypicalFlightEmissions over every unique surviving route in one batch call, land that cabin-agnostic baseline alongside the specific number, and turn the gap between them into the signal the control center hangs off. The specific-flight claim stays load-bearing; typical is a separate data product used only for comparison.

Demo

Live URL: https://scopewise-demo.vercel.app

Hero image

Code

Repository: github.com/abahgat/scopewise

Mermaid chart

How I Built It

1. The data-integrity problem with synthetic demos

Most "Scope 3 dashboard" demos are built on fabricated emissions values — a constant kg-per-km multiplied by a distance estimate. Judges can't verify the numbers, and a careful reader doesn't have to: the same dashboard with different constants would tell a different story. I wanted every figure on the screen to be defensible.

The move was to curate a catalog of real (carrier, flight_number, origin, destination) tuples from public airline schedules, then build 2,000 synthetic trips on top of that catalog. The enrichment script submits each trip to TIM's Scope 3 endpoint and enforces the specific-flight check at build time:

// scripts/enrich-emissions.ts
// r is one TIM response row; keep only specific per-flight model output
if (r.source !== 'TIM_EMISSIONS') {
  droppedFallback++;
  continue;
}

TIM_EMISSIONS is the marker TIM stamps when it has real per-flight model output — actual aircraft, load factor, seat configuration. Anything else (TYPICAL_FLIGHT_EMISSIONS, modeled defaults) means TIM fell back from the per-flight model for that market/date, so we drop the row instead of landing an estimate in its place. Of 2,000 candidate trips, 810 survive and 1,190 are dropped (discontinued services, wet-leased routes, regional carriers outside the catalog). Every surviving row carries a tim_model_version stamp and a defensible per-flight number; the 1,190 we can't enrich don't quietly turn into distance-formula guesses.
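The drop rule can be sketched end to end as a pure filter over TIM responses. This is a minimal sketch, not the real pipeline: the `TimResult` shape and field names are assumptions based on the excerpt above.

```typescript
// Sketch of the enrichment invariant: keep only rows where TIM returned
// real per-flight output; count (not fill) everything else.
// Field names and the union of fallback sources are assumptions.
type TimResult = {
  source: 'TIM_EMISSIONS' | 'TYPICAL_FLIGHT_EMISSIONS' | 'MODELED';
  emissionsKgCo2e?: number;
  timModelVersion?: string;
};

function filterSpecificOnly(results: TimResult[]) {
  const kept: TimResult[] = [];
  let droppedFallback = 0;
  for (const r of results) {
    // Same check as the excerpt above: never land an estimate.
    if (r.source !== 'TIM_EMISSIONS') {
      droppedFallback++;
      continue;
    }
    kept.push(r);
  }
  return { kept, droppedFallback };
}
```

The point of keeping this as a build-time filter rather than a runtime default is that the 810/1,190 split is fixed once, stamped, and auditable.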

2. Baseline comparison via computeTypicalFlightEmissions

An annual report tells you how much you emitted. A control center tells you what to do about it. The move that turns one into the other is the second TIM endpoint: computeTypicalFlightEmissions. It takes a markets[] array of {origin, destination} pairs (up to 1,000 per batch) and returns a cabin-agnostic baseline in gCO₂/pax. For the 115 unique routes remaining after the specific-flight check, that's a single API call at enrichment time. The pipeline lands the baseline into the same row as the specific number, and every aggregation downstream carries an actual / typical ratio.
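Collecting the markets for that batch call is a dedupe-then-chunk step. A hedged sketch, with `Trip` as an assumed shape (the real pipeline's types may differ):

```typescript
// Collect unique (origin, destination) markets from the surviving trips
// and chunk them for computeTypicalFlightEmissions, which accepts up to
// 1,000 markets per batch. At 115 unique routes this is a single call.
type Trip = { origin: string; destination: string };
type Market = { origin: string; destination: string };

function toMarketBatches(trips: Trip[], batchSize = 1000): Market[][] {
  const seen = new Set<string>();
  const markets: Market[] = [];
  for (const t of trips) {
    const key = `${t.origin}-${t.destination}`;
    if (seen.has(key)) continue; // one typical lookup per route, not per trip
    seen.add(key);
    markets.push({ origin: t.origin, destination: t.destination });
  }
  const batches: Market[][] = [];
  for (let i = 0; i < markets.length; i += batchSize) {
    batches.push(markets.slice(i, i + batchSize));
  }
  return batches;
}
```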

Opportunities

That ratio is the signal the Opportunities panel reads off of. Four insight categories fall out naturally:

  • HIGH_RATIO_ROUTE — routes with ≥3 flights averaging ≥1.8× typical (cabin-mix story)
  • DEPT_BUSINESS_OVERUSE — departments where BUSINESS+FIRST is >50% of their emissions
  • SHORT_HAUL_UPGRADE_PATTERN — BUSINESS/FIRST on routes where typical is small (proxy for <3h legs)
  • UNDERUSED_ECONOMY_OPPORTUNITY — (dept, route) pairs that are 100% non-ECONOMY on a route that sees ECONOMY elsewhere

The impact estimate per insight is the obvious one: sum_actual − sum_actual / avg_ratio, divided by 1,000, annualized. It's not an SBTi-grade abatement curve — it's a reasonable first cut that orders interventions by size and gives a number a human can argue with.
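As arithmetic, the estimate is the tonnes recovered if the flagged flights dropped to the typical baseline. The annualization factor here is an assumption (dataset window scaled to 12 months); the formula itself is the one stated above.

```typescript
// sum_actual − sum_actual / avg_ratio, in kg, converted to tonnes and
// annualized. avg_ratio is the mean actual/typical ratio for the insight.
function estimateImpactTonnes(
  sumActualKg: number,
  avgRatio: number,
  annualizationFactor = 1 // assumption: window already covers a year
): number {
  const recoveredKg = sumActualKg - sumActualKg / avgRatio;
  return (recoveredKg / 1000) * annualizationFactor;
}
```

For example, a route group emitting 180,000 kg at 2× typical would recover 90 tCO₂e if brought down to baseline.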

Baseline

The honest caveat, stated explicitly under the baseline-comparison table in the UI: typical is cabin-agnostic, specific is cabin-aware. A FIRST-cabin trip's 4× ratio is dominated by cabin choice, not airline or aircraft. We lean into that rather than hiding it — the intervention framing ("if this dept flew BUSINESS on <8h routes instead of FIRST, you'd save X tCO₂e") is exactly what a policy lever looks like.

One UX consequence: filters (department, cabin, date range) live in client-side context and re-derive every aggregation from a single trip snapshot rather than re-fetching. At 810-row scale the whole page recomputes in well under a frame, so changing a dept pill updates the chart, the timeline, the baseline table and the Opportunities impact numbers instantly. That's the affordance that makes this feel like a console instead of a static report.
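The snapshot-and-re-derive pattern is simple enough to sketch. This is a minimal illustration under assumed row and filter shapes, not the app's real context code:

```typescript
// Every aggregation is re-derived from one in-memory trip snapshot
// rather than re-fetched; a dept-pill change is just a new Filters value.
type TripRow = { dept: string; cabin: string; emissionsKg: number };
type Filters = { dept?: string; cabin?: string };

function deriveTotals(snapshot: TripRow[], f: Filters) {
  const rows = snapshot.filter(
    (t) => (!f.dept || t.dept === f.dept) && (!f.cabin || t.cabin === f.cabin)
  );
  const totalKg = rows.reduce((s, t) => s + t.emissionsKg, 0);
  return { flights: rows.length, totalKg };
}
```

At 810 rows, a linear pass like this is microseconds of work, which is why every widget can recompute on each filter change without any caching layer.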

3. Snowflake as the single source of truth

The warehouse holds one fact table (ENRICHED_TRIPS), two dimensions (EMPLOYEES, AIRPORTS), and five analytical views that feed both the REST endpoints and the agent's SQL tool. V_TOP_ROUTES is the most interesting of them because it precomputes the share-of-total column the dashboard needs:

CREATE OR REPLACE VIEW V_TOP_ROUTES AS
WITH total AS (
    SELECT SUM(emissions_kg_co2e) AS company_total FROM ENRICHED_TRIPS
)
SELECT
    e.origin_iata      AS origin,
    e.destination_iata AS destination,
    COUNT(*)           AS flights,
    SUM(e.emissions_kg_co2e) AS total_kg,
    100.0 * SUM(e.emissions_kg_co2e) / NULLIF(t.company_total, 0) AS pct_of_total
FROM ENRICHED_TRIPS e
CROSS JOIN total t
GROUP BY e.origin_iata, e.destination_iata, t.company_total
ORDER BY total_kg DESC;

The other views (V_EXEC_SUMMARY, V_DEPT_EMISSIONS_MONTHLY, V_CLASS_BREAKDOWN_MONTHLY, V_ROUTE_DELTAS) handle the KPI tiles, the department rollup, the stacked-area chart by class of service, and the actual-vs-typical ratio. Because both the dashboard and the agent read through the same views, adding a new metric means one view change — not two code paths drifting out of sync. Auth is key-pair to a service user, private key stored base64-encoded in Vercel env vars.
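The key-pair wiring deserves one concrete sketch, because it's the part that makes Snowflake reachable from a serverless runtime. This is an assumption-heavy sketch: the env var names are invented for illustration, and the returned object is the shape snowflake-sdk's createConnection() expects for JWT key-pair auth.

```typescript
// Decode the base64-encoded private key from a Vercel env var and build
// connection options for snowflake-sdk's createConnection().
// Env var names (SNOWFLAKE_*) are illustrative, not the app's real names.
function snowflakeConnectionOptions(env: Record<string, string>) {
  return {
    account: env.SNOWFLAKE_ACCOUNT,
    username: env.SNOWFLAKE_USER,
    authenticator: 'SNOWFLAKE_JWT' as const,
    privateKey: Buffer.from(env.SNOWFLAKE_PRIVATE_KEY_B64, 'base64').toString('utf8'),
  };
}
```

Base64-encoding the PEM sidesteps the newline-mangling that multi-line secrets tend to suffer in env var UIs.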

4. Gemini with grounded tools

Chat with Gemini

The chat endpoint is Vercel AI SDK v4 + @ai-sdk/google pointed at gemini-2.5-flash. Six tools are exposed, all typed with Zod:

  • get_top_routes({ limit, dept? }) — pre-baked route ranking
  • get_department_breakdown({ period }) — per-dept kg/employee for all, ytd, or a YYYY-MM
  • compare_to_benchmark({ dept }) — dept vs company average with signed delta %
  • get_insights({ category? }) — the v2 tool: returns the same Opportunities the UI is ranking, so the agent can say "this route shows up on your Opportunities panel as HIGH_RATIO_ROUTE" instead of re-deriving it
  • explain_chart({ chart_id }) — returns the exact slice behind each dashboard chart so the agent narrates what the user is actually looking at
  • run_sql({ query }) — the escape hatch

The escape hatch is the interesting one. run_sql is SELECT-only, enforced by this regex:

const SQL_WRITE_GUARD =
  /\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|MERGE|TRUNCATE|GRANT|REVOKE|USE|SET)\b/i;

Any query matching the guard is rejected before it reaches Snowflake, results are capped at 100 rows, and the Snowflake role itself is read-only as a second line of defense. The system prompt caps responses at four sentences, instructs the agent to cite "Google TIM (specific-flight model)" on every numeric answer, and forbids speculation.
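Put together, the gate is a few lines. The wrapper function here is illustrative (the real route handler does more), but the accept/reject behavior is exactly the guard above:

```typescript
// Fail-closed gate for the run_sql tool: must start with SELECT and must
// not contain any write/DDL keyword anywhere in the statement.
const SQL_WRITE_GUARD =
  /\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|MERGE|TRUNCATE|GRANT|REVOKE|USE|SET)\b/i;

function isQueryAllowed(query: string): boolean {
  const q = query.trim();
  return /^SELECT\b/i.test(q) && !SQL_WRITE_GUARD.test(q);
}
```

Note the guard also catches piggybacked statements like `SELECT 1; DELETE ...`, since it scans the whole string rather than just the first token.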

An example interaction (representative, not a transcript):

User: Which department has the highest per-employee emissions, and how does it compare to the company average?

tool call: compare_to_benchmark({ dept: "SALES" })
result: { dept_avg: 4821.3, company_avg: 2940.7, delta_pct: 64.0 }

Agent: Sales leads at 4,821.3 kg CO2e per employee — 64.0% above the company average of 2,940.7 kg. Source: Google TIM (specific-flight model).
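The signed delta in that result is plain arithmetic over the two averages. A sketch with the rounding convention assumed to be one decimal place (the tool's actual rounding isn't shown in the post):

```typescript
// Percentage gap between a department's per-employee average and the
// company average, signed, rounded to one decimal place (assumed).
function signedDeltaPct(deptAvg: number, companyAvg: number): number {
  return Math.round((deptAvg / companyAvg - 1) * 1000) / 10;
}
```

With the figures from the example, 4,821.3 vs 2,940.7 comes out to +64.0%, matching the tool result above.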

5. Why a TIM MCP server matters (forward-looking)

I also built timcp, a standalone MCP server wrapping TIM's three endpoints. For the bulk enrichment here I used the REST API directly — the whole dataset fits in two batches of 1,000, so a tool-discovery layer adds latency without buying anything. But for ad-hoc questions like "how does this flight compare to the typical emissions for this route?", exposing computeTypicalFlightEmissions as an MCP tool at agent runtime is the right shape. That's the natural next step: the agent gets a live benchmarking capability that doesn't require precomputing typical values into Snowflake.

6. Architectural decisions

  • Vercel AI SDK over Genkit — faster to ship on Next.js with native streaming; one runtime instead of two
  • Snowflake Cortex skipped for v1 — Gemini 3 Pro now runs inside Cortex (Jan 2026), which is a killer Snowflake+Gemini combo. Trial-edition permissions and the 36-hour clock argued against it; noted as future direction
  • Parallel subagent build — three workstreams (data, backend, frontend) ran concurrently in isolated git worktrees, gated by a frozen src/lib/types.ts contract and a mock fixture so backend and frontend could develop without waiting on Snowflake
  • Recharts, not Tremor charts — Tremor's chart package has a React 19 peer conflict; I kept Tremor cards and dropped to Recharts directly for the bar and stacked-area charts
  • Real flight catalog committed to the repo — 365 hand-curated tuples from public carrier schedules. It's reviewable, reproducible, and the "all emissions are TIM-specific" claim is auditable against a file anyone can read

Prize Categories

Snowflake. Business travel is roughly 1% of global CO₂ and almost entirely reported as Scope 3 Category 6 — a meaningful Earth Day narrative that isn't already saturated. Technical execution: five analytical views (V_EXEC_SUMMARY, V_DEPT_EMISSIONS_MONTHLY, V_TOP_ROUTES, V_CLASS_BREAKDOWN_MONTHLY, V_ROUTE_DELTAS), key-pair auth from a serverless runtime, a PUT + COPY INTO bootstrap pipeline, and one fact table whose invariant (every row has a TIM_EMISSIONS-sourced emissions number, fallbacks dropped at enrichment time) is enforced before load. V_ROUTE_DELTAS joins the cabin-agnostic typical_emissions_kg_co2e column against the specific actual to compute the ratio the Opportunities panel ranks off of — one view change, two consumers (UI and agent) stay in sync. Creativity: the "synthetic trips, real numbers" design neutralizes the usual credibility gap of demo dashboards — the warehouse is doing real analytical work on numbers an auditor could trace back to Google.

Unfortunately, I did not get to deploy this on Snowflake by the end of the weekend: I tried to sign up for a 30-day trial but my domain kept getting rejected :(

Gemini. The agent never hallucinates a figure because it can't reach one without a tool call, and every tool result carries the TIM provenance string forward into the cited answer. The six-tool surface (five pre-baked + run_sql escape hatch) keeps the model's job narrow: pick the right tool, read the result, narrate in at most four sentences. The v2 tool get_insights({ category? }) returns the same ranked Insight[] the Opportunities panel is reading, so the agent can defer to the policy-lever framing the UI already surfaces instead of re-deriving it from raw SQL. Streaming is wired through the Vercel AI SDK's toDataStreamResponse() so tool-call chips and token-by-token output land in the drawer together. The SELECT-only regex guard on run_sql is a pattern I think is worth stealing — it's one line, it fails closed, and it converts "give an LLM a SQL console" from a scary idea into a reasonable one.
