When people talk about AI benchmarks, they often talk about numbers: scores, leaderboards, rankings. But “AI performance” has largely been defined in English. The datasets, the tasks, even the expectations of what a “good answer” looks like have all been shaped far from the cultural, linguistic, and economic realities of our region. Here in Latin America, the story we’re trying to write isn’t about who’s first or best; it’s about what matters.
So we built LatamBoard: a community-driven initiative to create transparent, task-oriented evaluation standards for Spanish and Portuguese use cases, and, more importantly, for the kinds of work and problems that actually move Latin America forward.
🌎 Why this matters
Latin America has incredible AI talent. Researchers, startups, and practitioners across the region are doing brilliant work, but when it comes to measuring success, we’re stuck with tools that don’t understand our context.
An English benchmark can’t tell you if your transcription model understands a regional accent. It won’t know that “factura” could mean an invoice or a pastry, or that “boleto” means a ticket in Spanish but a payment method in Brazil.
If we want AI to transform productivity in Latin America, across agriculture, health, education, logistics, and public services, we need benchmarks that measure the right things.
Not just linguistic correctness, but task completion.
Not just fluency, but understanding.
Not just performance, but impact.
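
To make “task completion” concrete, here is a minimal sketch of what a context-aware evaluation item could look like. The item schema and the `evaluate_item` helper are illustrative assumptions, not LatamBoard’s actual format.

```python
# Hypothetical example of a pass/fail task-completion check.
# The item schema and helper below are illustrative assumptions,
# not LatamBoard's actual format.

item = {
    "task": "structured_extraction",
    "locale": "es-AR",
    "input": "Adjunto la factura N° 0001-00004532 por los servicios de marzo.",
    # The model must resolve "factura" as an invoice (not a pastry)
    # and pull out the document number.
    "expected": {"document_type": "invoice", "document_number": "0001-00004532"},
}

def evaluate_item(item: dict, model_output: dict) -> bool:
    """Task completion: did the model extract every expected field correctly?"""
    return all(
        model_output.get(field) == value
        for field, value in item["expected"].items()
    )

# A fluent but wrong answer still fails; only the completed task passes.
print(evaluate_item(item, {"document_type": "receipt", "document_number": "0001-00004532"}))  # False
print(evaluate_item(item, {"document_type": "invoice", "document_number": "0001-00004532"}))  # True
```

Scoring on fields like these rewards models for getting the job done in our context, not for sounding plausible in English.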
🤝 Built by a community
LatamBoard is a community effort. We’re researchers, product managers, engineers, and enthusiasts who believe that evaluation is the missing layer between AI hype and real-world progress.
We organize our work in phases:
Phase 1 (Live): Language understanding baselines for Spanish and Portuguese.
Phase 2 (Current): Real-world task evaluation: translation, transcription, summarization, and structured extraction.
Phase 3: Community-contributed tasks and datasets, with feedback formats that can actually train better models.
We’re building open infrastructure, open datasets, and open standards. Our tools, like Benchy, are already open source. Anyone can contribute new tasks, datasets, or evaluation methods.
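
As a rough sketch of what contributing a task might involve (a hypothetical layout, not Benchy’s actual interface), a new task could bundle a dataset, a prompt template, and a scoring function:

```python
# Hypothetical sketch of a community-contributed task definition.
# Names and structure are assumptions for illustration; check the Benchy
# repository for the real contribution format.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    language: str                        # e.g. "es" or "pt-BR"
    dataset_path: str                    # JSONL rows of {"input": ..., "expected": ...}
    prompt_template: str                 # how each input is shown to the model
    score: Callable[[str, str], float]   # (model_output, expected) -> 0.0..1.0

def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

summarization_pt = Task(
    name="summarization-pt-news",
    language="pt-BR",
    dataset_path="data/news_pt.jsonl",
    prompt_template="Resuma o texto a seguir em uma frase:\n\n{input}",
    score=exact_match,  # a real summarization task would need a more forgiving metric
)
```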
🚀 How to join
If you work in AI in Latin America, this is your invitation.
🎓 Universities & Researchers: Partner with us to develop new benchmarks and contribute specialized datasets. Help establish academic standards for LATAM AI evaluation.
🏢 Companies: Share your use cases and evaluation needs. Help us understand what performance metrics matter most for regional applications.
💻 Developers: Contribute to our open-source evaluation infrastructure. Submit your models and help improve our methodologies.
📊 Data Scientists: Help us build comprehensive evaluation datasets that reflect the diversity of Latin American markets.
We deserve AI solutions optimized for our specific needs. By building transparent, community-driven evaluation standards, we're helping LATAM developers, researchers, and companies make better decisions about which AI tools actually work for their use cases.
The goal isn’t just better benchmarks; it’s helping establish Latin America as a place where serious AI evaluation and innovation happen.
Interested in contributing?