DEV Community

semi

How I Built an Indonesian NLP Parser That Understands Warung Owners, Then Abandoned It

GitHub “Finish-Up-A-Thon” Challenge Submission

This is a submission for the GitHub Finish-Up-A-Thon Challenge.

What I Built

Warung MiMo is an AI-powered assistant for small warungs (Indonesian street shops). It lets shop owners manage inventory, track debts, and log sales in everyday Indonesian, using voice, text, or receipt scanning.

The project started from a simple question: what if a tiny shop owner could talk to software the same way they talk to their helper? Not with menus, not with spreadsheets, just with the way people actually speak in a warung.

I built the first version during the MiMo Orbit 100T Token Grant. The UI was there. The tech stack was solid. But the core engine, the Indonesian NLP parser, was not finished. The project sat abandoned until the GitHub Finish-Up-A-Thon Challenge gave me the push to revive it.

Tech stack: Next.js 16, React 19, shadcn/ui, Tailwind CSS 4, TypeScript. Deployed on Vercel.

Demo

Live: https://warung-mimo.vercel.app

Source: https://github.com/iyop666/warung-mimo

Try these inputs on the assistant page:

  • "Indomie tinggal lima bungkus"
  • "Aqua habis, gula sisa setengah kg"
  • "Bu Sari utang empat puluh dua ribu"
  • "bayar utang Bu Sari"
  • "Catat utang Pak Budi 50rb, terus telur tinggal tiga"
  • "Teh Botol kosong, kopi sisa dua belas sachet"

The assistant parses each sentence, identifies the products mentioned, extracts quantities and amounts (even when they are written out as Indonesian words), and generates structured actions such as stock updates or debt records.
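To make "structured actions" concrete, here is a minimal sketch of what the parser's output could look like, assuming a TypeScript discriminated union. The `WarungAction` type, its fields, the product name "Telur", and the `summarize` helper are all hypothetical and are not taken from the Warung MiMo source.

```typescript
// Hypothetical action shape; the real Warung MiMo types may differ.
type WarungAction =
  | { kind: "stock_update"; product: string; quantity: number }
  | { kind: "debt_add"; customer: string; amount: number }
  | { kind: "debt_settle"; customer: string };

// Render an action as a short summary, e.g. for a confirmation screen.
function summarize(action: WarungAction): string {
  switch (action.kind) {
    case "stock_update":
      return `Set ${action.product} stock to ${action.quantity}`;
    case "debt_add":
      return `Record debt of Rp${action.amount} for ${action.customer}`;
    case "debt_settle":
      return `Mark ${action.customer}'s debt as paid`;
  }
}

// "Catat utang Pak Budi 50rb, terus telur tinggal tiga" could yield two actions:
const actions: WarungAction[] = [
  { kind: "debt_add", customer: "Pak Budi", amount: 50000 },
  { kind: "stock_update", product: "Telur", quantity: 3 },
];

for (const a of actions) console.log(summarize(a));
```

A discriminated union lets the UI switch on `kind` and have the compiler check that every action type is handled.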

The Comeback Story

Before:
The original Warung MiMo had a working UI, a product catalog, and a basic input field, but the NLP engine was shallow. It could handle simple commands like "Indomie 5" but failed on real Indonesian phrasing such as "empat puluh dua ribu" (forty-two thousand), "setengah dus" (half a box), or "sisa tiga" (three left). The project was stuck in that painful zone where the repo exists but the product does not feel finished enough to trust.

What I changed, fixed, and added:

  1. Indonesian Number Parser. Built from scratch with 30+ number words, compound logic ("empat puluh dua" = 42), and colloquial forms such as "42rb" (42 ribu, i.e. 42,000) and the fractions "setengah" (1/2) and "seperempat" (1/4).

  2. Product Matching. 8 core warung products with 30+ aliases: "Aqua" maps to "Aqua 600ml", and "Mie goreng" maps to "Indomie Goreng". Matching the longest keyword first prevents false positives when a short alias is contained in a longer one.

  3. Stock Context Parsing. 5 regex patterns cover one concept, how much stock is left: "habis" (sold out), "kosong" (empty), "tinggal N" and "sisa N" (N left), and "stok N" (N in stock). In real Indonesian, there are at least five ways to say how much is left on the shelf.

  4. Multi-Action Splitting. One sentence can contain a debt, a stock update, and a restock order. The parser splits it on commas and on the connectors "dan" (and), "terus" and "lalu" (then), and "juga" (also), then processes each segment independently.

  5. Debt Tracking. 4 regex patterns for recording debt and 4 for settling it. Each handles a different way Indonesians talk about money owed: "bu sari utang 25 ribu" (Bu Sari owes 25,000), "catat utang bu sari 25000" (record a 25,000 debt for Bu Sari), "pak budi ngutang 15 ribu" (Pak Budi buys 15,000 on credit).

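  6. Weekly Insights Engine. Generates contextual business suggestions from sales data, such as: "Minggu ini cuaca panas, penjualan minuman naik signifikan (+25%). Fokus restok minuman." ("The weather is hot this week and beverage sales rose significantly (+25%). Focus on restocking beverages.")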
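The compound number logic from item 1 can be sketched as a single left-to-right pass. This is a minimal illustration, not the project's parser: `parseIndonesianNumber` and its word table are hypothetical and cover fewer than the 30+ words mentioned above. Multipliers work as follows: "belas" adds ten, "puluh" turns the trailing digit into tens, "ratus" into hundreds, and "ribu"/"juta" close a thousand or million group.

```typescript
// Hypothetical word table: words that add a fixed value to the current group.
const ADDITIVE = new Map<string, number>([
  ["nol", 0], ["satu", 1], ["dua", 2], ["tiga", 3], ["empat", 4],
  ["lima", 5], ["enam", 6], ["tujuh", 7], ["delapan", 8], ["sembilan", 9],
  ["sepuluh", 10], ["sebelas", 11], ["seratus", 100],
  ["setengah", 0.5], ["seperempat", 0.25],
]);

function parseIndonesianNumber(phrase: string): number | null {
  const words = phrase.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return null;

  let total = 0;   // value of completed ribu/juta groups
  let current = 0; // group below one thousand still being built

  for (const w of words) {
    const additive = ADDITIVE.get(w);
    if (additive !== undefined) {
      current += additive;
    } else if (w === "belas") {
      current += 10; // "dua belas" = 2 + 10
    } else if (w === "puluh") {
      current += (current % 10) * 9; // trailing digit d becomes d * 10
    } else if (w === "ratus") {
      current = (current || 1) * 100; // "dua ratus" = 200
    } else if (w === "ribu" || w === "juta") {
      const scale = w === "ribu" ? 1000 : 1000000;
      total += (current || 1) * scale; // "empat puluh dua ribu" = 42 * 1000
      current = 0;
    } else if (w === "seribu") {
      total += 1000;
    } else if (w === "sejuta") {
      total += 1000000;
    } else {
      return null; // not a number word
    }
  }
  return total + current;
}
```

For example, "seratus dua puluh lima ribu" builds 100, 102, 120, 125 and then closes the thousands group to give 125000. Digit forms such as "42rb" are left to a separate fallback in this sketch.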
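Longest-keyword product matching from item 2 can be sketched by sorting all aliases by length and returning the first whole-word hit. The catalog below is hypothetical. In particular, "Teh Celup" is an invented product added only to show why "teh botol" has to be tried before "teh".

```typescript
// Hypothetical catalog entries; the real app has 8 products with 30+ aliases.
interface ProductAlias {
  product: string;
  alias: string;
}

const ALIASES: ProductAlias[] = [
  { product: "Indomie Goreng", alias: "indomie goreng" },
  { product: "Indomie Goreng", alias: "mie goreng" },
  { product: "Indomie Goreng", alias: "indomie" },
  { product: "Aqua 600ml", alias: "aqua" },
  { product: "Teh Botol", alias: "teh botol" },
  { product: "Teh Celup", alias: "teh" }, // invented product sharing a prefix
  { product: "Gula", alias: "gula" },
];

// Try longer aliases first so "teh botol" wins over "teh".
const BY_LENGTH = ALIASES.slice().sort((a, b) => b.alias.length - a.alias.length);

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function matchProduct(text: string): ProductAlias | null {
  const lower = text.toLowerCase();
  for (const entry of BY_LENGTH) {
    // Whole-word match so an alias does not fire inside an unrelated word.
    const re = new RegExp(`\\b${escapeRegExp(entry.alias)}\\b`);
    if (re.test(lower)) return entry;
  }
  return null;
}
```

Sorting once up front keeps each lookup a simple linear scan, which is plenty for a catalog of a few dozen aliases.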
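The five stock-context patterns from item 3 might look roughly like this. It is a sketch under assumptions: the pattern list, the number-word vocabulary, and the `StockState` shape are hypothetical. Quantities are returned as raw text, so a number parser can convert them afterwards.

```typescript
// Hypothetical vocabulary of number words that may follow "tinggal"/"sisa"/"stok".
const NUMBER_WORDS = [
  "nol", "satu", "dua", "tiga", "empat", "lima", "enam", "tujuh", "delapan",
  "sembilan", "sepuluh", "sebelas", "belas", "puluh", "seratus", "ratus",
  "setengah", "seperempat",
];
const WORD = `(?:${NUMBER_WORDS.join("|")})\\b`;
// Either digits ("5", "2,5") or a run of number words ("dua belas").
const AMOUNT = `(\\d+(?:[.,]\\d+)?|${WORD}(?:\\s+${WORD})*)`;

type StockState =
  | { status: "empty"; keyword: string }
  | { status: "remaining"; keyword: string; amountText: string };

const STOCK_PATTERNS: RegExp[] = [
  /\b(habis)\b/,                                // sold out
  /\b(kosong)\b/,                               // empty
  new RegExp(`\\b(tinggal)\\s+${AMOUNT}`),      // N left
  new RegExp(`\\b(sisa)\\s+${AMOUNT}`),         // N remaining
  new RegExp(`\\b(stok)\\s+${AMOUNT}`),         // N in stock
];

function parseStockContext(segment: string): StockState | null {
  const text = segment.toLowerCase();
  for (const re of STOCK_PATTERNS) {
    const m = re.exec(text);
    if (!m) continue;
    // The "habis"/"kosong" patterns have no amount group.
    if (m[2] === undefined) return { status: "empty", keyword: m[1] };
    return { status: "remaining", keyword: m[1], amountText: m[2] };
  }
  return null;
}
```

Keeping each phrasing as its own pattern makes it easy to add a sixth way of saying "what's left" without touching the others.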
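The segment splitting in item 4 can be approximated with one regular expression that treats any run of commas and connector words as a single separator. `splitSegments` is a hypothetical name, and this is only a sketch of the idea.

```typescript
// Hypothetical segmenter: commas and the connectors "dan" (and),
// "terus"/"lalu" (then), "juga" (also). A run such as ", terus " counts as one separator.
const SEPARATOR = /(?:\s*,\s*|\s*\b(?:dan|terus|lalu|juga)\b\s*)+/i;

function splitSegments(sentence: string): string[] {
  return sentence
    .split(SEPARATOR)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

console.log(splitSegments("Catat utang Pak Budi 50rb, terus telur tinggal tiga"));
```

A naive split like this one can over-split phrases where "juga" modifies a single clause (e.g. "gula juga habis"), so each segment still needs to be checked by the downstream parsers.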

After:
~2,500 lines of TypeScript across 15 React components. 50+ regex patterns in the NLP engine. 30+ number word mappings. 8 products with 3-4 aliases each. Deployed and live at warung-mimo.vercel.app.

My Experience with GitHub Copilot

GitHub Copilot did not build the project for me. But it helped me move faster in three specific areas:

  1. Regex iteration. When building the debt patterns and stock context parser, I would type a comment like // match 'bayar utang bu sari' and Copilot would suggest the regex pattern. I still verified and adjusted, but it cut iteration time in half.

  2. Edge case handling. When testing "42rb" vs "empat puluh dua ribu" vs "42.000", Copilot suggested the fallback digit parser that handles all three formats.

  3. Template generation. For the weekly insights engine, I described the goal and Copilot suggested the template system that produces contextual Indonesian business suggestions.
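As an illustration of the kind of regex that the comment-driven workflow in item 1 produces, here is a hypothetical subset of the debt patterns: two for recording and two for settling. The patterns, the honorific list, and `parseDebt` are assumptions, not the project's actual code. Amounts are returned as raw text so either number parser can handle them.

```typescript
type DebtCommand =
  | { kind: "record"; customer: string; amountText: string }
  | { kind: "settle"; customer: string };

// Honorific plus one-word name, e.g. "Bu Sari", "pak budi" (assumed format).
const NAME = "((?:bu|pak|mas|mbak)\\s+[a-z]+)";

const RECORD_PATTERNS: RegExp[] = [
  // "bu sari utang 25 ribu", "pak budi ngutang 15 ribu"
  new RegExp(`^${NAME}\\s+(?:ngutang|utang|hutang)\\s+(.+)$`, "i"),
  // "catat utang bu sari 25000"
  new RegExp(`^catat\\s+(?:utang|hutang)\\s+${NAME}\\s+(.+)$`, "i"),
];

const SETTLE_PATTERNS: RegExp[] = [
  // "bayar utang Bu Sari"
  new RegExp(`^(?:bayar|lunasi)\\s+(?:utang|hutang)\\s+${NAME}$`, "i"),
  // "bu sari lunas", "pak budi sudah bayar"
  new RegExp(`^${NAME}\\s+(?:sudah\\s+)?(?:lunas|bayar)$`, "i"),
];

function parseDebt(segment: string): DebtCommand | null {
  const text = segment.trim();
  for (const re of RECORD_PATTERNS) {
    const m = re.exec(text);
    if (m) return { kind: "record", customer: m[1], amountText: m[2] };
  }
  for (const re of SETTLE_PATTERNS) {
    const m = re.exec(text);
    if (m) return { kind: "settle", customer: m[1] };
  }
  return null;
}
```

Anchoring each pattern with `^` and `$` keeps a debt phrase from matching inside a longer segment that the splitter failed to separate.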
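The post does not show the fallback digit parser that Copilot suggested. Here is a hypothetical sketch of one way to accept "42rb", "42.000", and plain "42". It assumes Indonesian notation, where "." groups thousands and "," marks decimals, and it assumes the suffixes "rb"/"ribu"/"k" and "jt"/"juta".

```typescript
// Hypothetical suffix table for colloquial amount shorthands.
const SUFFIX_MULTIPLIER: Record<string, number> = {
  rb: 1000, ribu: 1000, k: 1000, jt: 1000000, juta: 1000000,
};

function parseDigitAmount(input: string): number | null {
  // Integer part: "42.000"-style grouped thousands, or plain digits.
  // Optional ",5"-style decimal part, then an optional suffix.
  const m = /^(\d{1,3}(?:\.\d{3})+|\d+)(?:,(\d+))?\s*(rb|ribu|k|jt|juta)?$/i.exec(input.trim());
  if (!m) return null;
  const integerPart = m[1].replace(/\./g, ""); // "42.000" -> "42000"
  const value = Number(m[2] ? `${integerPart}.${m[2]}` : integerPart);
  const suffix = m[3]?.toLowerCase();
  return value * (suffix ? SUFFIX_MULTIPLIER[suffix] : 1);
}
```

Returning `null` for anything that is not digit-based lets a caller try the digit fallback first and hand word phrases such as "empat puluh dua ribu" to a word parser.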
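A template system for the weekly insights could be as small as the sketch below. The input shape (`CategorySales`), the 20% threshold, the optional `weather` string, and the function names are all assumptions made for illustration. The real engine's inputs and rules are not described in the post.

```typescript
// Hypothetical input: sales totals per category for two consecutive weeks.
interface CategorySales {
  category: string;
  lastWeek: number;
  thisWeek: number;
}

// Whole-number percentage change; 0 when there is no baseline.
function percentChange(previous: number, current: number): number {
  return previous === 0 ? 0 : Math.round(((current - previous) / previous) * 100);
}

// Fill an Indonesian sentence template for the category with the largest rise.
function weeklyInsight(sales: CategorySales[], weather?: string, threshold = 20): string | null {
  let top: { category: string; change: number } | null = null;
  for (const s of sales) {
    const change = percentChange(s.lastWeek, s.thisWeek);
    if (change >= threshold && (top === null || change > top.change)) {
      top = { category: s.category, change };
    }
  }
  if (top === null) return null; // nothing rose enough to mention
  const opening = weather ? `Minggu ini cuaca ${weather}, ` : "Minggu ini ";
  return `${opening}penjualan ${top.category} naik signifikan (+${top.change}%). Fokus restok ${top.category}.`;
}
```

With made-up figures where "minuman" goes from 400000 to 500000 and `weather` is "panas", this returns exactly the example sentence quoted in item 6 of the earlier list.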

The biggest help was reducing the friction between "I know what I want" and "I have written the code." I still had to think hard about the language logic. Copilot just made the typing part faster.
