<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: BARI ANKIT VINOD</title>
    <description>The latest articles on DEV Community by BARI ANKIT VINOD (@onlycr7).</description>
    <link>https://dev.to/onlycr7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1246423%2F4705b57c-532b-4482-a8b9-d8af604e98b8.jpeg</url>
      <title>DEV Community: BARI ANKIT VINOD</title>
      <link>https://dev.to/onlycr7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/onlycr7"/>
    <language>en</language>
    <item>
      <title>GuWiki: Building a Gujarati AI Wikipedia from Scratch 🇮🇳</title>
      <dc:creator>BARI ANKIT VINOD</dc:creator>
      <pubDate>Fri, 05 Jun 2026 16:31:04 +0000</pubDate>
      <link>https://dev.to/onlycr7/guwiki-building-a-gujarati-ai-wikipedia-from-scratch-12c6</link>
      <guid>https://dev.to/onlycr7/guwiki-building-a-gujarati-ai-wikipedia-from-scratch-12c6</guid>
      <description>&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;GuWiki is an AI-powered Wikipedia-style platform built specifically for the Gujarati language.&lt;/p&gt;

&lt;p&gt;Gujarati is spoken by more than &lt;strong&gt;60 million people worldwide&lt;/strong&gt;, yet high-quality AI tools, language models, and speech technologies for Gujarati remain limited compared to English and other major languages.&lt;/p&gt;

&lt;p&gt;I wanted to help close that gap.&lt;/p&gt;

&lt;p&gt;Instead of relying on existing foundation models, I built the core AI components from scratch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Gujarati Large Language Model (LLM)&lt;/li&gt;
&lt;li&gt;A Gujarati Automatic Speech Recognition (ASR) model&lt;/li&gt;
&lt;li&gt;A complete data engineering pipeline&lt;/li&gt;
&lt;li&gt;A Wikipedia-style knowledge platform powered by these models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users can search, read, and interact with Gujarati knowledge using AI that understands the language natively.&lt;/p&gt;

&lt;h3&gt;AI Models&lt;/h3&gt;

&lt;h4&gt;Gujarati NanoGPT (LLM)&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/aijadugar/gujarati-nanogpt" rel="noopener noreferrer"&gt;https://huggingface.co/aijadugar/gujarati-nanogpt&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A language model trained from scratch on Gujarati text to learn the language's vocabulary, grammar, and usage patterns.&lt;/p&gt;

&lt;h4&gt;Gujarati ASR Model&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/aijadugar/gujarati-asr" rel="noopener noreferrer"&gt;https://huggingface.co/aijadugar/gujarati-asr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A speech-to-text model trained on Gujarati audio, enabling voice-based interaction and making the platform more accessible.&lt;/p&gt;

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;h3&gt;Live Application&lt;/h3&gt;

&lt;p&gt;👉 &lt;a href="https://gu-wiki.vercel.app/" rel="noopener noreferrer"&gt;https://gu-wiki.vercel.app/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Video Walkthrough&lt;/h3&gt;

&lt;p&gt;🎥 &lt;a href="https://www.loom.com/share/b5e278e624724d8f975e95b0fc3c6297" rel="noopener noreferrer"&gt;https://www.loom.com/share/b5e278e624724d8f975e95b0fc3c6297&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Key Features&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;AI-powered Gujarati knowledge search&lt;/li&gt;
&lt;li&gt;Native Gujarati language understanding&lt;/li&gt;
&lt;li&gt;Speech-to-text capabilities&lt;/li&gt;
&lt;li&gt;Custom-trained Gujarati LLM&lt;/li&gt;
&lt;li&gt;Custom-trained Gujarati ASR&lt;/li&gt;
&lt;li&gt;End-to-end data engineering pipeline&lt;/li&gt;
&lt;li&gt;Fast and responsive web interface&lt;/li&gt;
&lt;li&gt;Open-source repository for community contribution&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The Comeback Story&lt;/h2&gt;

&lt;p&gt;This project started as an ambitious idea:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Can I build AI infrastructure for Gujarati instead of simply consuming models built for English?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Answering it turned out to be much harder than I expected.&lt;/p&gt;

&lt;p&gt;The biggest challenge wasn't building the website.&lt;/p&gt;

&lt;p&gt;It was the data.&lt;/p&gt;

&lt;p&gt;Gujarati lacks the abundance of high-quality datasets available for English. I spent significant time collecting, cleaning, validating, and preparing Gujarati text and speech data before any model training could even begin.&lt;/p&gt;

&lt;p&gt;The project went through multiple iterations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rebuilding data pipelines&lt;/li&gt;
&lt;li&gt;Cleaning noisy datasets&lt;/li&gt;
&lt;li&gt;Experimenting with tokenization strategies&lt;/li&gt;
&lt;li&gt;Training and retraining language models&lt;/li&gt;
&lt;li&gt;Improving speech recognition quality&lt;/li&gt;
&lt;li&gt;Optimizing inference performance&lt;/li&gt;
&lt;li&gt;Connecting everything into a single user experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the Finish-Up-A-Thon, I focused on turning these individual research efforts into a complete, usable product.&lt;/p&gt;

&lt;p&gt;I improved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model integration&lt;/li&gt;
&lt;li&gt;Frontend experience&lt;/li&gt;
&lt;li&gt;Deployment workflows&lt;/li&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;li&gt;Performance&lt;/li&gt;
&lt;li&gt;Repository structure&lt;/li&gt;
&lt;li&gt;End-to-end user experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is GuWiki: a fully working AI-powered knowledge platform designed for Gujarati speakers.&lt;/p&gt;

&lt;h2&gt;My Experience with GitHub Copilot&lt;/h2&gt;

&lt;p&gt;GitHub Copilot played a significant role throughout development.&lt;/p&gt;

&lt;p&gt;While building GuWiki, I worked across multiple domains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data engineering&lt;/li&gt;
&lt;li&gt;Machine learning&lt;/li&gt;
&lt;li&gt;Deep learning&lt;/li&gt;
&lt;li&gt;Model training&lt;/li&gt;
&lt;li&gt;API development&lt;/li&gt;
&lt;li&gt;Frontend development&lt;/li&gt;
&lt;li&gt;Deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Switching contexts constantly can slow development, and Copilot helped reduce that friction.&lt;/p&gt;

&lt;p&gt;Some of the ways it helped include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generating boilerplate code&lt;/li&gt;
&lt;li&gt;Accelerating API development&lt;/li&gt;
&lt;li&gt;Creating data processing utilities&lt;/li&gt;
&lt;li&gt;Writing training scripts faster&lt;/li&gt;
&lt;li&gt;Refactoring repetitive code&lt;/li&gt;
&lt;li&gt;Generating documentation&lt;/li&gt;
&lt;li&gt;Suggesting fixes during debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I appreciated most was that Copilot allowed me to stay focused on solving the actual AI and language challenges instead of spending time on repetitive implementation details.&lt;/p&gt;

&lt;p&gt;It felt less like autocomplete and more like a development companion that helped maintain momentum throughout the project.&lt;/p&gt;

&lt;h2&gt;Why This Project Matters&lt;/h2&gt;

&lt;p&gt;Most AI development today is concentrated on a small number of major languages.&lt;/p&gt;

&lt;p&gt;Millions of people speak Gujarati every day, but the ecosystem of open-source AI tools for the language is still developing.&lt;/p&gt;

&lt;p&gt;GuWiki is my contribution toward making AI more accessible for Gujarati speakers by building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open-source language models&lt;/li&gt;
&lt;li&gt;Open-source speech models&lt;/li&gt;
&lt;li&gt;Public datasets and pipelines&lt;/li&gt;
&lt;li&gt;Real-world applications powered by those models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My goal is not only to build a product but also to help strengthen the Gujarati AI ecosystem so that future developers and researchers can build on top of it.&lt;/p&gt;

&lt;p&gt;If even a small part of this work helps expand access to knowledge and AI for Gujarati speakers, then the project has already succeeded.&lt;/p&gt;

&lt;h2&gt;Repository &amp;amp; Resources&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/aijadugar/GuWiki" rel="noopener noreferrer"&gt;https://github.com/aijadugar/GuWiki&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Live App: &lt;a href="https://gu-wiki.vercel.app/" rel="noopener noreferrer"&gt;https://gu-wiki.vercel.app/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Demo: &lt;a href="https://www.loom.com/share/b5e278e624724d8f975e95b0fc3c6297" rel="noopener noreferrer"&gt;https://www.loom.com/share/b5e278e624724d8f975e95b0fc3c6297&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Gujarati LLM: &lt;a href="https://huggingface.co/aijadugar/gujarati-nanogpt" rel="noopener noreferrer"&gt;https://huggingface.co/aijadugar/gujarati-nanogpt&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Gujarati ASR: &lt;a href="https://huggingface.co/aijadugar/gujarati-asr" rel="noopener noreferrer"&gt;https://huggingface.co/aijadugar/gujarati-asr&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⭐ If you find the project interesting, feel free to explore the repository and share feedback.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>githubcopilot</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
