DEV Community

AlbanoSanchez

CHE MCP — Building Argentina's First National MCP Ecosystem: 5-Stage Classifier, WMA Online Learning, 748 Datasets

Argentina just got its first national MCP ecosystem — and it was built from Bahía Blanca.

CHE MCP is an intelligent gateway that connects any AI agent with real-time Argentine data. Dollar exchange rates, weather, football, tax compliance (ARCA), inflation, public transit — 80+ official data sources through a SINGLE MCP server.

Why does this matter? Because right now, if you want your AI to answer "¿cuánto está el dólar blue?" ("what is the blue dollar rate?"), you either Google it yourself or install 80 different MCP servers. CHE MCP solves that with a gateway that understands natural language in Spanish and routes queries automatically.


How It Works — 5-Stage Intelligent Gateway

Query: "dolar blue hoy"
      │
┌─────▼─────┐  Stage 1: Keyword matching
│  Keyword  │  3,000+ keywords across 182 classified domains
└─────┬─────┘
      │
┌─────▼─────┐  Stage 2: WMA weighted routing
│    WMA    │  Weighted Majority Algorithm: learns from every query
└─────┬─────┘
      │
┌─────▼─────┐  Stage 3: Semantic embeddings
│ Embedding │  384-dim vectors (all-MiniLM-L6-v2) with Jaccard fallback
└─────┬─────┘
      │
┌─────▼─────┐  Stage 4: Data Node search
│ Data Node │  DuckDB SQL over 748 Parquet datasets + NL-to-SQL
└─────┬─────┘
      │
┌─────▼─────┐  Stage 5: LLM fallback
│    LLM    │  External endpoint (optional, configurable)
└─────┬─────┘
      │
┌─────▼─────┐
│ Response  │  "Dólar blue: $1,245 / $1,265 compra/venta"
└───────────┘
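
To make the cascade idea concrete, here is a minimal sketch of two of the five stages wired together, assuming each stage either proposes a domain with a score or abstains, and the gateway stops at the first stage that clears a threshold. The `Stage` type, the 0.5 threshold, the tiny keyword table, and the token-level Jaccard stage are illustrative assumptions, not CHE MCP's actual code.

```typescript
// Hypothetical shapes: a stage proposes a domain with a score in [0, 1], or abstains with null.
type Match = { domain: string; score: number; stage: string };
type Stage = { name: string; run: (query: string) => Match | null };

// Lowercase, strip accents (so "dólar" matches "dolar"), and split into a token set.
const tokenize = (text: string): Set<string> =>
  new Set(
    text
      .toLowerCase()
      .normalize("NFD")
      .replace(/[\u0300-\u036f]/g, "")
      .split(/[^a-z0-9]+/)
      .filter((t) => t.length > 0),
  );

// Jaccard similarity: |A ∩ B| / |A ∪ B| over token sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// Illustrative keyword table; the post reports 3,000+ keywords in the real gateway.
const KEYWORDS: Array<[string, string[]]> = [
  ["exchange_rates", ["dolar", "blue", "cotizacion"]],
  ["weather", ["clima", "temperatura", "lluvia"]],
];

const keywordStage: Stage = {
  name: "keyword",
  run: (query) => {
    const tokens = tokenize(query);
    for (const [domain, words] of KEYWORDS) {
      if (words.some((w) => tokens.has(w))) return { domain, score: 1, stage: "keyword" };
    }
    return null;
  },
};

// Stand-in for the lexical fallback: one short description per domain (made up here).
const DESCRIPTIONS: Array<[string, string]> = [
  ["exchange_rates", "precio de moneda extranjera compra venta"],
  ["weather", "pronostico del tiempo para hoy"],
];

const jaccardStage: Stage = {
  name: "jaccard",
  run: (query) => {
    const tokens = tokenize(query);
    let best: Match | null = null;
    for (const [domain, text] of DESCRIPTIONS) {
      const score = jaccard(tokens, tokenize(text));
      if (score > 0 && (best === null || score > best.score)) {
        best = { domain, score, stage: "jaccard" };
      }
    }
    return best;
  },
};

// Run stages in order and stop at the first match that clears the threshold.
function classify(query: string, stages: Stage[], threshold = 0.5): Match | null {
  for (const stage of stages) {
    const match = stage.run(query);
    if (match !== null && match.score >= threshold) return match;
  }
  return null; // every stage abstained; a full gateway would continue to its later stages
}
```

Calling `classify("dolar blue hoy", [keywordStage, jaccardStage])` resolves in the keyword stage, while a query with no keyword hit falls through to the Jaccard comparison. Ordering cheap stages first keeps the common case fast and reserves the expensive stages for queries the earlier ones cannot place.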

The WMA Router — A Classifier That Learns

The Weighted Majority Algorithm (WMA) is an online learning system embedded directly in the router. Every domain starts with equal weight (1.0). When a query succeeds, the winning domain gets reinforced (+0.1). When it fails, the domain gets penalized (−0.1). Weights are bounded at [0.1, 5.0] and persisted to disk — the router starts warm and improves with every query.
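
The update rule described above can be sketched as follows. Note that the rule as described is additive (plus or minus 0.1, then clamped), whereas the textbook Weighted Majority Algorithm multiplies the weights of wrong experts by a factor below 1; this sketch follows the post's description. The class name, method names, the one-decimal rounding, and the JSON persistence format are assumptions made for illustration.

```typescript
// Constants taken from the description above; every name here is hypothetical.
const INITIAL_WEIGHT = 1.0;
const STEP = 0.1;
const MIN_WEIGHT = 0.1;
const MAX_WEIGHT = 5.0;

class DomainWeights {
  private weights = new Map<string, number>();

  // Domains that have never been updated start at the initial weight.
  get(domain: string): number {
    return this.weights.get(domain) ?? INITIAL_WEIGHT;
  }

  // Reinforce on success, penalize on failure, then clamp to [MIN_WEIGHT, MAX_WEIGHT].
  update(domain: string, success: boolean): number {
    const raw = this.get(domain) + (success ? STEP : -STEP);
    // Assumption: round to one decimal so repeated 0.1 steps do not accumulate float drift.
    const rounded = Math.round(raw * 10) / 10;
    const next = Math.min(MAX_WEIGHT, Math.max(MIN_WEIGHT, rounded));
    this.weights.set(domain, next);
    return next;
  }

  // Pick the candidate whose base score multiplied by its learned weight is highest.
  pick(candidates: Array<{ domain: string; score: number }>): string | null {
    let best: string | null = null;
    let bestValue = -Infinity;
    for (const c of candidates) {
      const value = c.score * this.get(c.domain);
      if (value > bestValue) {
        best = c.domain;
        bestValue = value;
      }
    }
    return best;
  }

  // Serialize to a JSON string so the weights can be written to disk and reloaded.
  serialize(): string {
    return JSON.stringify(Array.from(this.weights.entries()));
  }

  static deserialize(json: string): DomainWeights {
    const restored = new DomainWeights();
    for (const [domain, weight] of JSON.parse(json) as Array<[string, number]>) {
      restored.weights.set(domain, weight);
    }
    return restored;
  }
}
```

With this shape, a domain that keeps answering correctly can outrank one with a higher raw match score, and reloading the serialized weights at startup is what lets the router start warm.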

Benchmark: 95.45% Top-First-Score accuracy on MCPAgentBench (66 diverse queries).

Data Node — SQL, But Natural

748 Parquet datasets from datos.gob.ar (Argentina's open data portal), compressed 9.92× with Zstd (404 MB vs 3.92 GB CSV). The Data Node converts natural language to SQL:

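User: "¿Cuánto aumentó la inflación en 2024?" ("How much did inflation rise in 2024?")
→ Generated SQL, run by DuckDB (illustrative; assumes valor holds the monthly index level):
SELECT (arg_max(valor, fecha) / arg_min(valor, fecha) - 1) * 100 AS variacion_pct
FROM indice_precios_consumidor
WHERE fecha BETWEEN '2023-12-01' AND '2024-12-31'
→ Result: 117.8% annual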

Every generated query runs behind SQL injection guardrails, read-only enforcement, a 5-second timeout, and a 1,000-row result limit.
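
A minimal sketch of what such guardrails could look like in front of the database, assuming generated SQL arrives as a plain string. The function names, the keyword list, and the wrapping strategy are hypothetical; a string filter like this is only a coarse first line of defense, and opening the database in read-only mode would still be the stronger guarantee.

```typescript
// Keywords a read-only analytics query should never need (an illustrative, conservative list).
const FORBIDDEN =
  /\b(insert|update|delete|drop|alter|create|attach|detach|copy|install|load|pragma|set|export|import|call)\b/i;

// Validate a generated query and wrap it so the row cap always applies.
function guardSql(sql: string, maxRows = 1000): string {
  const trimmed = sql.trim().replace(/;\s*$/, ""); // tolerate a single trailing semicolon
  if (trimmed.includes(";")) throw new Error("multiple statements are not allowed");
  if (trimmed.includes("--") || trimmed.includes("/*")) throw new Error("comments are not allowed");
  if (!/^(select|with)\b/i.test(trimmed)) throw new Error("only SELECT queries are allowed");
  if (FORBIDDEN.test(trimmed)) throw new Error("statement contains a forbidden keyword");
  // Wrapping caps the result size even when the inner query carries its own LIMIT.
  return `SELECT * FROM (${trimmed}) AS guarded LIMIT ${maxRows}`;
}

// Reject if the work does not settle within the deadline.
// Note: this only stops waiting; it does not cancel the query inside the database.
function withTimeout<T>(work: Promise<T>, ms = 5000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`query exceeded ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (error) => {
      clearTimeout(timer);
      throw error;
    },
  );
}
```

A caller would pass the output of `guardSql` to the query runner and wrap that call in `withTimeout`. The filter fails closed: a legitimate query with a column named `update` is rejected, which is the safer trade-off for machine-generated SQL.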

Resilience Patterns

| Pattern | Implementation |
|---|---|
| 3-tier cache | In-memory LRU (200 entries) → disk (atomic writes) → live CKAN |
| Circuit breaker | Per-dataset, 3-failure threshold, 60s cooldown, serves stale data |
| Request collapsing | Concurrent identical queries share a single upstream fetch |
| Predictive pre-fetch | Top-10 hot datasets refresh every 15 minutes |
| Rate limiting | Token bucket per API key, 100 req/min, noisy neighbor isolation |
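
Three of these patterns are small enough to sketch directly. The following is a rough illustration using the thresholds listed above (3 failures, 60 s cooldown, 100 requests per minute); the class names, the injectable clock, and the half-open behavior after the cooldown are assumptions, not the project's actual implementation.

```typescript
type Clock = () => number; // injectable so the timing logic can be exercised without waiting

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: Clock;

  constructor(threshold = 3, cooldownMs = 60_000, now: Clock = () => Date.now()) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True while the breaker is open and the cooldown has not yet elapsed.
  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      this.openedAt = null;
      this.failures = this.threshold - 1; // half-open: one more failure reopens the breaker
      return false;
    }
    return true;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}

// Request collapsing: concurrent callers for the same key share one in-flight promise.
const inFlight = new Map<string, Promise<unknown>>();

function collapse<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing !== undefined) return existing as Promise<T>;
  const pending = fetcher().then(
    (value) => {
      inFlight.delete(key);
      return value;
    },
    (error) => {
      inFlight.delete(key);
      throw error;
    },
  );
  inFlight.set(key, pending);
  return pending;
}

// Token bucket: up to `capacity` requests per window, refilled continuously.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly windowMs: number;
  private readonly now: Clock;

  constructor(capacity = 100, windowMs = 60_000, now: Clock = () => Date.now()) {
    this.capacity = capacity;
    this.windowMs = windowMs;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Refill based on elapsed time, then spend one token if one is available.
  tryTake(): boolean {
    const t = this.now();
    const refill = ((t - this.lastRefill) * this.capacity) / this.windowMs;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.lastRefill = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

In use, a gateway would keep one `CircuitBreaker` per dataset and one `TokenBucket` per API key, for example in two `Map`s. When `isOpen()` returns true, the caller serves the cached copy instead of calling upstream, and keeping buckets separate per key is what stops one noisy client from exhausting another's quota.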

Built for the Next MCP Standard

The Model Context Protocol is undergoing its biggest architectural update in July 2026: mandatory Streamable HTTP transport and a stateless architecture. CHE MCP was architected for this from day one:

  • ✅ Streamable HTTP transport
  • ✅ MCP SDK (@modelcontextprotocol/sdk) v1.29.0
  • ✅ JWT + API key auth with scope validation
  • ✅ OpenTelemetry distributed tracing
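
As an illustration of the scope-validation item, here is a minimal sketch of a scope check that would run after a JWT or API key has already been verified (signature and expiry checks are out of scope here). The `Principal` shape, the scope names, and the `prefix:*` wildcard convention are hypothetical; the post does not describe CHE MCP's actual scope model.

```typescript
// Hypothetical result of verifying a JWT or API key upstream of this check.
type Principal = { subject: string; scopes: string[] };

// A granted scope matches exactly, or via a "prefix:*" wildcard (an assumed convention).
function hasScope(principal: Principal, required: string): boolean {
  return principal.scopes.some((granted) => {
    if (granted === required) return true;
    // "datasets:*" covers "datasets:read", "datasets:query", and so on.
    return granted.endsWith(":*") && required.startsWith(granted.slice(0, -1));
  });
}

// Throw with the list of missing scopes so the caller can report what is lacking.
function authorize(principal: Principal, requiredScopes: string[]): void {
  const missing = requiredScopes.filter((scope) => !hasScope(principal, scope));
  if (missing.length > 0) {
    throw new Error(`missing scopes: ${missing.join(", ")}`);
  }
}
```

A tool handler would call `authorize(principal, ["datasets:read"])` before doing any work, so a key issued only for exchange-rate lookups cannot reach the SQL-backed Data Node.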

Tech Stack

  • TypeScript 5.4 + Node.js 24
  • DuckDB (columnar, embeddable)
  • all-MiniLM-L6-v2 via @xenova/transformers
  • Zod validation, Vitest (280+ tests)
  • MCP SDK v1.29.0 (server.registerTool API)

Built from Bahía Blanca, Argentina 🇦🇷 with Gentle AI's SDD orchestration + Engram persistent memory.

Full technical documentation: github.com/Albano-schz/che-mcp-docs


What questions do you have about building MCP ecosystems at national scale?
