Open Vernacular AI Kit

open-vernacular-ai-kit is an open-source SDK + CLI for cleaning up Indian vernacular-English code-mixed text. This release is India-first with Sarvam AI integrations, and is designed to expand globally in future updates through community-contributed language and provider adapters. It targets messy WhatsApp-style inputs where vernacular text might appear in:

Native script (example: ગુજરાતી)
Romanized vernacular text (example: Gujlish)
Mixed script in the same sentence
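
The three input forms above can be told apart, at least roughly, by looking at which Unicode ranges each token uses. The following is a minimal sketch of that idea and is not taken from the kit's code: the names token_script and sentence_scripts are hypothetical, and only Gujarati (Unicode block U+0A80 to U+0AFF) and basic Latin letters are checked. A script check alone cannot distinguish Romanized Gujarati from English, since both are written in Latin letters; that step would need a lexicon or a model.

```python
import re

# Gujarati Unicode block is U+0A80..U+0AFF; Latin is limited to ASCII letters here.
GUJARATI = re.compile(r"[\u0A80-\u0AFF]")
LATIN = re.compile(r"[A-Za-z]")


def token_script(token: str) -> str:
    """Classify one token as 'gujarati', 'latin', 'mixed', or 'other'.

    Hypothetical helper for illustration; not part of the kit's public API.
    """
    has_gujarati = bool(GUJARATI.search(token))
    has_latin = bool(LATIN.search(token))
    if has_gujarati and has_latin:
        return "mixed"
    if has_gujarati:
        return "gujarati"
    if has_latin:
        return "latin"
    return "other"  # digits, punctuation, emoji, other scripts


def sentence_scripts(text: str) -> set:
    """Return the set of scripts seen across whitespace-separated tokens."""
    return {token_script(tok) for tok in text.split()} - {"other"}


if __name__ == "__main__":
    print(sentence_scripts("મારું business plan ready છે"))
    print(sentence_scripts("maru business plan ready chhe!!!"))
```

A sentence that yields both 'gujarati' and 'latin' is mixed-script; one that yields only 'latin' may be English, Romanized vernacular, or both, and needs further analysis.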

The goal is to normalize text before sending it to downstream models (Sarvam-M / Mayura / Sarvam-Translate), and to provide a reusable open-source foundation for vernacular AI workflows. Global language/provider expansion is planned and PR-friendly.

This repo is alpha-quality but SDK-first: the public API centers on CodeMixConfig + CodeMixPipeline.

Quick example:

gck codemix "maru business plan ready chhe!!!"
# -> મારું business plan ready છે!!

What We Solve

This project is a production-oriented normalization layer for India-focused AI…


Building a Vernacular AI Preprocessing Layer for Indian Code-Mixed Text

SudhirGadhvi / open-vernacular-ai-kit

Clean Indian code-mixed text before it reaches your LLM.
