DEV Community

B Nyalang

Khasibert: A Region-First Language Model for Khasi NLP

Most language models overlook low-resource languages. Khasibert is built to change that: it is the first open-source Khasi language model designed for translation, summarization, and civic NLP tasks in Northeast India.


What Is Khasibert?

  • A compact transformer-based LLM trained on Khasi-language corpora
  • Optimized for low-resource deployment and real-world usability
  • Built by MWire Labs to support inclusive, culturally aware AI

Why It Matters

  • Khasi is spoken by over a million people, yet it remains underrepresented in mainstream NLP
  • Khasibert enables language technology research, civic applications, and education tools
  • It’s part of a broader mission to democratize AI for Northeast India

What’s Under the Hood

  • Pretrained on cleaned, deduplicated Khasi text
  • Fine-tuned for translation, summarization, and semantic understanding
  • Benchmarked for responsiveness in resource-constrained environments
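
To make "cleaned, deduplicated" concrete, here is a minimal sketch of what such a preprocessing step could look like. It is an illustration under assumptions, not Khasibert's actual pipeline: the function names `normalize` and `clean_and_deduplicate` are hypothetical, the sample strings are placeholders, and the choice of NFC normalization with case-insensitive exact-match deduplication is one common option among several.

```python
import hashlib
import re
import unicodedata


def normalize(line: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    line = unicodedata.normalize("NFC", line)
    return re.sub(r"\s+", " ", line).strip()


def clean_and_deduplicate(lines):
    """Return normalized, non-empty lines with exact duplicates removed.

    Assumption: duplicates are matched case-insensitively and the first
    occurrence is kept. The real Khasibert pipeline may differ.
    """
    seen = set()
    kept = []
    for raw in lines:
        text = normalize(raw)
        if not text:
            continue  # drop lines that are empty after cleaning
        # Hash the case-folded text so the seen-set stays small on large corpora.
        key = hashlib.sha1(text.casefold().encode("utf-8")).hexdigest()
        if key in seen:
            continue  # exact duplicate of an earlier line
        seen.add(key)
        kept.append(text)
    return kept


if __name__ == "__main__":
    # Placeholder sample lines, not taken from the training corpus.
    sample = ["Khublei  shibun", "khublei shibun", "", "  Shillong  "]
    print(clean_and_deduplicate(sample))
```

Exact-match hashing only catches identical lines. Web-scraped corpora often also need near-duplicate detection (for example MinHash), which this sketch leaves out.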
