<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tushar Jaju</title>
    <description>The latest articles on DEV Community by Tushar Jaju (@tushar9802).</description>
    <link>https://dev.to/tushar9802</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3924950%2F83310de1-c7c7-4ca5-8394-2f1ec6f00afc.jpeg</url>
      <title>DEV Community: Tushar Jaju</title>
      <link>https://dev.to/tushar9802</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tushar9802"/>
    <language>en</language>
    <item>
      <title>I almost added an em-dash remover to my LLM library. Then I tested whether local models even produce em-dashes.</title>
      <dc:creator>Tushar Jaju</dc:creator>
      <pubDate>Sun, 21 Jun 2026 08:04:34 +0000</pubDate>
      <link>https://dev.to/tushar9802/i-almost-added-an-em-dash-remover-to-my-llm-library-then-i-tested-whether-local-models-even-3eln</link>
      <guid>https://dev.to/tushar9802/i-almost-added-an-em-dash-remover-to-my-llm-library-then-i-tested-whether-local-models-even-3eln</guid>
      <description>&lt;p&gt;&lt;a href="https://pypi.org/project/llmclean/" rel="noopener noreferrer"&gt;llmclean&lt;/a&gt; is a tiny zero-dependency library I maintain for cleaning the noise out of raw LLM output. v0.2.0 was a "what production traffic taught me" release — every fix came from a real break in one of my own pipelines.&lt;/p&gt;

&lt;p&gt;0.3.0 is a different kind of release. This time I had a list of features I was fairly sure I needed, sourced from what people keep complaining about and re-implementing by hand: strip the &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; reasoning blocks, kill the em-dashes and smart quotes, remove the zero-width characters, flatten the markdown for text-to-speech.&lt;/p&gt;

&lt;p&gt;Before writing any of it, I did something I should have done the first time: I checked whether the models I care about actually produce that mess. I ran eight generative prompts across five local models — Llama 3.1, Gemma 4, Qwen 2.5, DeepSeek-R1, Mistral, all 7–8B instruct — and measured what came out. Forty generations, one diagnostic pass each.&lt;/p&gt;

&lt;p&gt;Three of my assumptions were wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Local models barely produce the typography mess at all
&lt;/h2&gt;

&lt;p&gt;The em-dash thing is a real phenomenon. It got loud enough that OpenAI shipped a setting to suppress em-dashes in ChatGPT. There are standalone libraries that do nothing but replace fancy punctuation with ASCII. So I assumed I'd see plenty of it.&lt;/p&gt;

&lt;p&gt;Across 40 generations from local models, I saw &lt;strong&gt;zero&lt;/strong&gt; smart quotes, zero ellipsis characters, zero non-breaking spaces, zero ligatures, zero zero-width characters. I even wrote a prompt that explicitly asked the model to quote someone saying "hello", use a dash for emphasis, and trail off with an ellipsis. The models gave me straight quotes and three literal dots: &lt;code&gt;...&lt;/code&gt;, not &lt;code&gt;…&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The typography mess that everyone writes cleanup code for is, as far as I can measure, a frontier cloud-model trait. ChatGPT, Claude, and Gemini emit it. A 7B instruct model running on your laptop mostly doesn't.&lt;/p&gt;

&lt;p&gt;That doesn't mean the feature is useless: people paste cloud output into pipelines constantly, and that's exactly where this stuff lands. But it changed how I built and tested it. &lt;code&gt;normalize_typography&lt;/code&gt; and &lt;code&gt;strip_invisibles&lt;/code&gt; are scoped, in their docstrings and tests, as tools for &lt;em&gt;pasted cloud output&lt;/em&gt;, and they're tested against synthetic fixtures shaped like ChatGPT output rather than against my local models, because in this sweep my local models never produced those inputs.&lt;/p&gt;
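
&lt;p&gt;To make the kind of cleanup concrete, here is a minimal sketch of punctuation folding built on &lt;code&gt;str.translate&lt;/code&gt;. It is not llmclean's implementation: the name &lt;code&gt;fold_typography&lt;/code&gt; and the character table are assumptions chosen for illustration.&lt;/p&gt;

```python
# Illustrative table only (an assumption); a real normalizer would likely cover more.
_FOLD = str.maketrans({
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
})


def fold_typography(text: str) -> str:
    """Replace common 'smart' punctuation with plain ASCII equivalents."""
    return text.translate(_FOLD)


# A fixture shaped like pasted cloud output, written with escapes so it stays visible.
print(fold_typography("\u201cIt\u2019s fine\u201d\u2014really\u2026"))
```

&lt;p&gt;Writing the fixture with &lt;code&gt;\u&lt;/code&gt; escapes keeps the test input readable even when an editor silently "fixes" the punctuation.&lt;/p&gt;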

&lt;h2&gt;
  
  
  2. The fullwidth-punctuation idea was backwards
&lt;/h2&gt;

&lt;p&gt;I had a note to myself that Qwen, with its Chinese-heavy training, would emit fullwidth punctuation — &lt;code&gt;，：；（）&lt;/code&gt; — and that I'd need to normalize it inside JSON. A whole class of cleanup the library didn't cover.&lt;/p&gt;

&lt;p&gt;When I actually prompted Qwen (and the others) with Chinese text, here's what happened: the Chinese &lt;em&gt;content&lt;/em&gt; came through fine, sitting inside JSON string values with completely normal ASCII &lt;code&gt;:&lt;/code&gt; and &lt;code&gt;"&lt;/code&gt; structure. Fullwidth punctuation showed up only when I asked for Chinese &lt;em&gt;prose&lt;/em&gt; — &lt;code&gt;北京是中国的首都，拥有丰富的历史文化遗产&lt;/code&gt; — where the &lt;code&gt;，&lt;/code&gt; and &lt;code&gt;。&lt;/code&gt; are correct, not noise.&lt;/p&gt;

&lt;p&gt;So fullwidth normalization isn't a JSON-repair problem at all. It's a prose-normalization problem, and a niche one. It ended up as an opt-in, off-by-default category on &lt;code&gt;normalize_typography&lt;/code&gt;, not a JSON strategy. The synthetic case where fullwidth punctuation breaks JSON parsing? &lt;code&gt;enforce_json&lt;/code&gt; does have that gap — but no model I have actually emits it, so I didn't build for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. On Ollama, the &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags never leak
&lt;/h2&gt;

&lt;p&gt;This was the one that nearly sent me building against the wrong input. DeepSeek-R1 is a reasoning model; it thinks in a &lt;code&gt;&amp;lt;think&amp;gt;...&amp;lt;/think&amp;gt;&lt;/code&gt; block before answering. The obvious cleanup is to strip that block.&lt;/p&gt;

&lt;p&gt;Except when I ran DeepSeek-R1 through Ollama and looked at the response, there were no tags. The reasoning was just gone from the text. Ollama (current versions) parses the &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; block server-side and hands it back in a &lt;em&gt;separate&lt;/em&gt; &lt;code&gt;thinking&lt;/code&gt; field — on both the native API and the OpenAI-compatible one. A consumer using Ollama never sees the tags inline, so &lt;code&gt;strip_reasoning_trace&lt;/code&gt; would be a no-op for them.&lt;/p&gt;

&lt;p&gt;The tags are real, though. They leak on llama.cpp directly, on vLLM unless you pass &lt;code&gt;--reasoning-parser&lt;/code&gt;, on raw &lt;code&gt;transformers&lt;/code&gt;, on LM Studio, and on most hosted aggregators. So I validated the stripper a different way: I captured a genuine DeepSeek-R1 reasoning trace out of Ollama's &lt;code&gt;thinking&lt;/code&gt; field, re-wrapped it in the inline &lt;code&gt;&amp;lt;think&amp;gt;...&amp;lt;/think&amp;gt;&lt;/code&gt; format those other backends emit, and confirmed the function recovers the answer exactly — including the DeepSeek quirk where the opening tag lives in the chat template and only a trailing &lt;code&gt;&amp;lt;/think&amp;gt;&lt;/code&gt; comes back.&lt;/p&gt;

&lt;h2&gt;
  
  
  What shipped
&lt;/h2&gt;

&lt;p&gt;Five new functions, all pure standard library, all scoped by what the experiment showed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmclean&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;strip_reasoning_trace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strip_preamble&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmclean&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;strip_invisibles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize_typography&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strip_markdown&lt;/span&gt;

&lt;span class="nf"&gt;strip_reasoning_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;think&amp;gt;let me work it out&amp;lt;/think&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Paris.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# → 'Paris.'
&lt;/span&gt;&lt;span class="nf"&gt;strip_preamble&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sure! Here is the answer: 42&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                       &lt;span class="c1"&gt;# → '42'
&lt;/span&gt;&lt;span class="nf"&gt;strip_invisibles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello﻿&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                               &lt;span class="c1"&gt;# → 'hello'
&lt;/span&gt;&lt;span class="nf"&gt;normalize_typography&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;“It’s fine”—really…&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# → '"It\'s fine"-really...'
&lt;/span&gt;&lt;span class="nf"&gt;strip_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;# Title&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;- **bold** point&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                        &lt;span class="c1"&gt;# → 'Title\n\nbold point'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;strip_markdown&lt;/code&gt; and the fence handling are validated against real local captures, because markdown is the one thing every model emits constantly — it showed up on every "explain this with headers and bullets", every code answer, every table.&lt;/p&gt;

&lt;p&gt;There's also a correctness fix in this release that has nothing to do with the sweep. The old Python-literal repair in &lt;code&gt;enforce_json&lt;/code&gt; did a blind find-and-replace of &lt;code&gt;True&lt;/code&gt;/&lt;code&gt;False&lt;/code&gt;/&lt;code&gt;None&lt;/code&gt;, which meant &lt;code&gt;{"note": "set the flag to True"}&lt;/code&gt; came out as &lt;code&gt;{"note": "set the flag to true"}&lt;/code&gt; — it corrupted the words inside string values, and inside string keys too. A regex can't tell a bare &lt;code&gt;True&lt;/code&gt; token from the letters &lt;code&gt;True&lt;/code&gt; inside a quote. The fix is a single pass that tracks whether it's inside a string and only rewrites literals outside one.&lt;/p&gt;
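
&lt;p&gt;A string-aware pass of that kind can be sketched as follows. This is an illustration of the technique, not the code that ships in &lt;code&gt;enforce_json&lt;/code&gt;; the function name &lt;code&gt;fix_python_literals&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def fix_python_literals(text: str) -> str:
    """Rewrite bare True/False/None to JSON literals, leaving quoted text alone."""
    mapping = {"True": "true", "False": "false", "None": "null"}
    out = []
    word = []          # identifier characters collected while outside a string
    in_string = False  # are we between an opening and closing double quote?
    escaped = False    # was the previous in-string character a backslash?

    def flush_word():
        # Emit the pending bare token, translated only if it is exactly a literal.
        if word:
            token = "".join(word)
            out.append(mapping.get(token, token))
            word.clear()

    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch.isalnum() or ch == "_":
            word.append(ch)
        else:
            flush_word()
            out.append(ch)
            if ch == '"':
                in_string = True
    flush_word()
    return "".join(out)


print(fix_python_literals('{"note": "set the flag to True", "ok": True, "n": None}'))
```

&lt;p&gt;Because tokens are collected whole before lookup, an identifier such as &lt;code&gt;TrueValue&lt;/code&gt; is left untouched as well.&lt;/p&gt;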

&lt;h2&gt;
  
  
  The actual lesson
&lt;/h2&gt;

&lt;p&gt;I write cleanup code for a living, more or less, and I still almost built three features against an imagined version of model output instead of the real one. The sweep took an afternoon. It killed one feature's premise, demoted another to opt-in, and re-scoped a third — and it left me able to document when each function actually helps, instead of implying it helps everywhere.&lt;/p&gt;

&lt;p&gt;If you're post-processing LLM output, it's worth running the cheap experiment: a handful of prompts across the models you actually deploy, and a look at what literally comes out. The mess you're cleaning may not be the mess you think it is.&lt;/p&gt;

&lt;p&gt;llmclean 0.3.0 is on &lt;a href="https://pypi.org/project/llmclean/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt; and &lt;a href="https://github.com/Tushar-9802/llmclean" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Eight functions, zero dependencies, still fits in your head.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building Sakhi: Hindi Voice-to-Form for India's ASHA Workers, Solo in Six Weeks</title>
      <dc:creator>Tushar Jaju</dc:creator>
      <pubDate>Tue, 19 May 2026 14:27:01 +0000</pubDate>
      <link>https://dev.to/tushar9802/building-sakhi-hindi-voice-to-form-for-indias-asha-workers-solo-in-six-weeks-2685</link>
      <guid>https://dev.to/tushar9802/building-sakhi-hindi-voice-to-form-for-indias-asha-workers-solo-in-six-weeks-2685</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — Six-week solo build of a Hindi voice-to-form pipeline for India's ~1 million community health workers. Two deployment modes: a workstation path with Whisper + Gemma 4 E4B on Ollama, and a fully offline on-device path running Gemma 4 E2B INT4 on the Cactus SDK on Android. Submitted to Kaggle's Gemma 4 Good Hackathon. Source on GitHub, fine-tune on Ollama.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;India's 1 million Accredited Social Health Activists (ASHAs) handle the last clinical mile for maternal and child health. They conduct 50+ million home visits a year — vitals, symptoms, counselling, danger-sign assessment. Every visit still ends with a paper form filled from memory and physically carried to the Primary Health Center on the next clinic day.&lt;/p&gt;

&lt;p&gt;Danger signs that &lt;em&gt;were&lt;/em&gt; observed — preeclampsia, postpartum hemorrhage, neonatal distress — sometimes never reach the clinical system in time for intervention.&lt;/p&gt;

&lt;p&gt;Two compounding constraints make this hard to fix with conventional tooling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hindi voice, often in regional dialects.&lt;/strong&gt; Cloud STT is unreliable on rural-clinical Hindi (published benchmarks: 27–70%+ WER, deletion-dominant — numbers and symptoms silently drop).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connectivity is intermittent.&lt;/strong&gt; Airplane-mode operation cannot be a fallback. It must be the default.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;Two deployment modes for how ASHAs actually work — a workstation in the health center, and the phone in the field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Workstation path (PHC, GPU):
[Hindi Audio] → Whisper-Large CT2 → Hindi Normalization → Gemma 4 E4B (function calling)
                                                            ├── extract_form()
                                                            ├── flag_danger_sign()
                                                            └── issue_referral()

On-device path (Android, no network):
[Hindi Text] → Hindi Normalization → Visit-type detect → Gemma 4 E2B INT4 on Cactus
                                                          ├── extract_form
                                                          └── detect_danger
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Workstation mode handles voice: a phone uploads audio to a shared PC at the sub-centre, Whisper-Large-V2 Hindi via CTranslate2 transcribes, Gemma 4 E4B Q4_K_M on Ollama extracts the structured form with native function calling. End-to-end &lt;strong&gt;15–25 seconds&lt;/strong&gt; on an RTX 5070 Ti.&lt;/p&gt;

&lt;p&gt;Field mode runs the full pipeline (normalize → detect visit type → extract form → flag danger signs) entirely on-device. End-to-end &lt;strong&gt;320.7s&lt;/strong&gt; on a OnePlus 11R (Snapdragon 8+ Gen 1), zero network. The on-device LLM does Hindi text → form; voice routes to the workstation when WiFi returns (more on why below).&lt;/p&gt;

&lt;h2&gt;
  
  
  The hardest engineering call: leaving on-device voice OUT
&lt;/h2&gt;

&lt;p&gt;I wanted on-device voice-to-form. A phone, no laptop, no network — that's the cleanest pitch. I pulled it from the build instead.&lt;/p&gt;

&lt;p&gt;Cactus SDK ships multilingual Whisper INT4 for transcription — no Hindi-specific checkpoint. The published numbers are bad:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;27% WER best-case on rural Hindi&lt;/li&gt;
&lt;li&gt;70%+ on clinical content&lt;/li&gt;
&lt;li&gt;Error profile is &lt;strong&gt;deletion-dominant&lt;/strong&gt; — numbers and symptoms silently drop while filler words survive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A missed BP reading is a missed referral. A demo where Sakhi says "BP normal" because the actual &lt;code&gt;155/100&lt;/code&gt; was deleted during transcription is exactly the failure mode an ASHA cannot catch in the field.&lt;/p&gt;

&lt;p&gt;So voice routes to the workstation where Whisper-Large-V2 Hindi runs. The on-device LLM handles Hindi text → form for the case where an ASHA types a quick note offline. Field mode also captures raw audio offline and syncs to the workstation when WiFi returns.&lt;/p&gt;

&lt;p&gt;This was the most uncomfortable call of the build. The submission video shows raw on-device JSON output from text input instead of faking voice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anti-hallucination: model extracts, Python decides
&lt;/h2&gt;

&lt;p&gt;The hardest problem isn't getting Gemma to talk about a transcript. It's getting it to stop &lt;em&gt;inventing&lt;/em&gt;. Early prototypes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hallucinated patient names from generic forms of address (&lt;code&gt;दीदी&lt;/code&gt; / &lt;code&gt;बहन&lt;/code&gt; — Hindi for "elder sister" / "sister", used informally for any woman regardless of relation).&lt;/li&gt;
&lt;li&gt;Invented BP readings on routine visits that never mentioned vitals.&lt;/li&gt;
&lt;li&gt;Turned counselling utterances ("eat iron-rich food, drink plenty of water") into "danger signs."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern that stuck: &lt;strong&gt;Gemma proposes evidence; Python decides what counts.&lt;/strong&gt; The LLM extracts only what was &lt;em&gt;said&lt;/em&gt; — verbatim utterances, structured under the schema. Validation, range-checks, deduplication, blocklist filtering: none of that runs inside the prompt. It runs in code, against the transcript, after extraction.&lt;/p&gt;

&lt;p&gt;Six layers of validation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Evidence length filter&lt;/strong&gt; — danger signs with under 10-character evidence are dropped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generic ASHA phrase blocklist&lt;/strong&gt; — boilerplate (&lt;code&gt;कोई तकलीफ़ हो तो फ़ोन कर दीजिए&lt;/code&gt; / "call me if there's any problem") filtered.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normal-value filter&lt;/strong&gt; — signs citing benign values (&lt;code&gt;110/70&lt;/code&gt;, &lt;code&gt;बिल्कुल ठीक&lt;/code&gt; / "totally fine", &lt;code&gt;सामान्य&lt;/code&gt; / "normal") stripped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transcript grounding&lt;/strong&gt; — evidence must appear verbatim in the transcript.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deduplication&lt;/strong&gt; across overlapping danger signs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Form validation&lt;/strong&gt; — strips invented patient names (दीदी/बहन patterns), default ages, phantom lab results; range checks on BP (60–250 / 30–150), Hb (3–20), weight (1–200), gestational weeks (1–45).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;False-alarm rate on routine visits: &lt;strong&gt;0&lt;/strong&gt;.&lt;/p&gt;
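
&lt;p&gt;Two of those layers, the evidence length filter and transcript grounding, plus the BP range check, can be sketched in a few lines. The function names and the sample transcript are made up for illustration and are not taken from the Sakhi repo; only the thresholds come from the list above.&lt;/p&gt;

```python
def keep_danger_sign(evidence: str, transcript: str, min_chars: int = 10) -> bool:
    """Keep a proposed sign only if its evidence is long enough and verbatim."""
    evidence = evidence.strip()
    return len(evidence) >= min_chars and evidence in transcript


def bp_in_range(systolic: int, diastolic: int) -> bool:
    """Range check using the BP bounds quoted above (60-250 / 30-150)."""
    return 250 >= systolic >= 60 and 150 >= diastolic >= 30


# Made-up transcript and model proposals, for illustration only.
transcript = "aaj BP 155/100 aaya, sir mein dard bhi hai"
proposed = ["BP 155/100 aaya", "dard", "bahut zyada khoon beh raha hai"]

kept = [e for e in proposed if keep_danger_sign(e, transcript)]
print(kept)
print(bp_in_range(155, 100), bp_in_range(400, 100))
```

&lt;p&gt;The point of the split is that every rule here is a plain predicate over the transcript, so it can be unit-tested without a model in the loop.&lt;/p&gt;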

&lt;h2&gt;
  
  
  Demographics never go through the LLM
&lt;/h2&gt;

&lt;p&gt;Early prototypes asked Gemma to extract patient name, age, and household composition from the audio. It hallucinated names from &lt;code&gt;दीदी&lt;/code&gt; and &lt;code&gt;बहन&lt;/code&gt;, defaulted ages on under-specified utterances, invented household members.&lt;/p&gt;

&lt;p&gt;The fix wasn't prompt-tuning. It was structural: demographics enter as a typed header — the way every clinical EMR works. The LLM never sees the question. It only extracts what was &lt;em&gt;said&lt;/em&gt; during the visit.&lt;/p&gt;

&lt;p&gt;This pattern generalizes — any LLM-based structured extraction where the field is known-and-typed should not be in the prompt at all.&lt;/p&gt;
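
&lt;p&gt;One way to enforce that separation in code, as a sketch under assumptions: the &lt;code&gt;VisitHeader&lt;/code&gt; fields, the key names, and &lt;code&gt;build_record&lt;/code&gt; are hypothetical and are not Sakhi's actual schema.&lt;/p&gt;

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VisitHeader:
    """Demographics typed in by the worker; never sent to the model."""
    patient_name: str
    age_years: int


# Hypothetical set of keys the model is not allowed to contribute.
DEMOGRAPHIC_KEYS = {"patient_name", "age_years"}


def build_record(header: VisitHeader, extracted: dict) -> dict:
    """Merge model output with the typed header, discarding model demographics."""
    clinical = {k: v for k, v in extracted.items() if k not in DEMOGRAPHIC_KEYS}
    return {**clinical, **asdict(header)}


# Made-up values: the model volunteered a name it should never have produced.
record = build_record(
    VisitHeader(patient_name="Sunita", age_years=24),
    {"patient_name": "Didi", "bp": "155/100"},
)
print(record)
```

&lt;p&gt;Dropping the keys on the way out is a backstop; the stronger guarantee is that the prompt never asks for them in the first place.&lt;/p&gt;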

&lt;h2&gt;
  
  
  The Blackwell + Windows + Unsloth dead end
&lt;/h2&gt;

&lt;p&gt;Unsloth's bundled &lt;code&gt;save_pretrained_gguf&lt;/code&gt; mmap-fails on Blackwell + Windows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RuntimeError: unable to mmap ... [WinError 8] Not enough memory resources
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WSL was out (CUDA passthrough for Whisper was already finicky in this setup). Linux dual-boot would have eaten two days I didn't have.&lt;/p&gt;

&lt;p&gt;I wrote &lt;code&gt;scripts/export_merge.py&lt;/code&gt; — manual LoRA-into-base delta-merge in PyTorch — then handed the merged FP16 model to &lt;code&gt;llama.cpp/convert_hf_to_gguf.py&lt;/code&gt; + &lt;code&gt;llama-quantize Q4_K_M&lt;/code&gt;. The fine-tune ships on the Ollama registry through that workaround:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull tusharbrisingr9802/sakhi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A/B vs base on the eval rubric: &lt;strong&gt;14/15 fine-tune vs 15/15 base&lt;/strong&gt;. Base is the production path. The fine-tune is published for deployments that prefer English schema-label normalization (&lt;code&gt;दस्त&lt;/code&gt; → &lt;code&gt;Diarrhea&lt;/code&gt;, &lt;code&gt;चक्कर&lt;/code&gt; → &lt;code&gt;dizziness&lt;/code&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Reproduce it locally
&lt;/h2&gt;

&lt;p&gt;The workstation stack is the primary path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Tushar-9802/Sakhi
&lt;span class="nb"&gt;cd &lt;/span&gt;Sakhi
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements-runtime.txt
ollama pull gemma4:e4b-it-q4_K_M
&lt;span class="nb"&gt;cd &lt;/span&gt;frontend &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm run build &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; ..
python api.py
&lt;span class="c"&gt;# Browser: http://localhost:8000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Requires ~10 GB VRAM (E4B Q4_K_M is roughly 9 GB resident). Running this stack exercises function calling, normalization, the 6-layer validation, and schema correctness end-to-end. Voice-to-form, text-to-form, and queue-and-sync all run on it.&lt;/p&gt;

&lt;p&gt;For the on-device Android path see the GitHub Release — prebuilt APK plus in-app SAF zip-import of the Cactus model. Cactus's &lt;code&gt;gemma-4-E2B-it&lt;/code&gt; INT4 build is gated on HuggingFace, so it isn't redistributed; the import flow keeps the no-adb path open for reviewers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's not in this submission
&lt;/h2&gt;

&lt;p&gt;Full root-cause walkthroughs live in &lt;code&gt;FAILURES.md&lt;/code&gt; in the repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No on-device voice&lt;/strong&gt; — covered above. On-device LLM does Hindi text → form; voice routes to the workstation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No real ASHA endorsement.&lt;/strong&gt; Outreach didn't land inside the deadline. Real-voice testing came from family help in Bareilly — Hindi-native readers on a real phone mic, three of four role-play scripts. Not a corpus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthetic training data.&lt;/strong&gt; 1,154 fine-tune examples and the 15-case automated eval are LLM-generated Hindi with gTTS audio.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regional dialect coverage.&lt;/strong&gt; Tested on standard Hindi from Bareilly + role-play scripts. Bhojpuri, Awadhi, Magahi, code-switched Marwari/Bhili are not validated.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Partner with an ASHA training institute to collect 100+ hours of real ASHA home-visit audio under field conditions.&lt;/li&gt;
&lt;li&gt;Fine-tune an IndicWhisper variant on that real audio for the on-device voice-in path that is not in this submission.&lt;/li&gt;
&lt;li&gt;Harden integration with the official MCTS API so forms post directly into the NHM system instead of being exported as JSON/CSV.&lt;/li&gt;
&lt;li&gt;Pilot with 10–20 ASHA workers in one rural block with before/after time-and-accuracy measurement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3-min demo video&lt;/strong&gt; — &lt;a href="https://youtu.be/n-u7J1lljUg" rel="noopener noreferrer"&gt;https://youtu.be/n-u7J1lljUg&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub repository&lt;/strong&gt; — &lt;a href="https://github.com/Tushar-9802/Sakhi" rel="noopener noreferrer"&gt;https://github.com/Tushar-9802/Sakhi&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama fine-tune&lt;/strong&gt; — &lt;code&gt;ollama pull tusharbrisingr9802/sakhi&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kaggle writeup&lt;/strong&gt; — &lt;a href="https://www.kaggle.com/competitions/gemma-4-good-hackathon/writeups/sakhi-voice-to-form-for-asha-workers" rel="noopener noreferrer"&gt;https://www.kaggle.com/competitions/gemma-4-good-hackathon/writeups/sakhi-voice-to-form-for-asha-workers&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If any of the patterns above are useful in your own LLM extraction pipelines — the model-extracts/Python-decides separation, demographics-as-typed-header, or the Whisper-INT4-WER receipts argument for not shipping fake on-device voice — drop a note in the comments. I'm &lt;a href="https://github.com/Tushar-9802" rel="noopener noreferrer"&gt;@Tushar-9802&lt;/a&gt; on GitHub.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>healthtech</category>
      <category>hindi</category>
    </item>
    <item>
      <title>I kept rewriting the same regex passes against LLM output. So I made a library.</title>
      <dc:creator>Tushar Jaju</dc:creator>
      <pubDate>Mon, 11 May 2026 12:28:29 +0000</pubDate>
      <link>https://dev.to/tushar9802/i-kept-rewriting-the-same-regex-passes-against-llm-output-so-i-made-a-library-539</link>
      <guid>https://dev.to/tushar9802/i-kept-rewriting-the-same-regex-passes-against-llm-output-so-i-made-a-library-539</guid>
      <description>&lt;p&gt;I've been working on a few LLM-based projects over the last year. &lt;a href="https://github.com/Tushar-9802/Sakhi" rel="noopener noreferrer"&gt;Sakhi&lt;/a&gt;, a Hindi voice-to-form pipeline for community health workers in India. A &lt;a href="https://github.com/Tushar-9802/Resume-parser" rel="noopener noreferrer"&gt;resume parser&lt;/a&gt; for engineering candidates. A couple of smaller things. Different domains, different models, different prompts.&lt;/p&gt;

&lt;p&gt;But there's a pattern: at the bottom of every pipeline, right before the model's output became "data we trust," I'd find the same kind of code.&lt;/p&gt;

&lt;p&gt;Strip markdown fences. Repair half-broken JSON. Trim runaway repetitions. Normalize Python &lt;code&gt;True&lt;/code&gt;/&lt;code&gt;False&lt;/code&gt;/&lt;code&gt;None&lt;/code&gt; to JSON booleans. Cut off the trailing "I hope this helps!" the model added after the actual answer.&lt;/p&gt;

&lt;p&gt;Every project had its own ad-hoc version of these. Slightly different regex, slightly different edge cases. The third time I copy-pasted a "strip &lt;code&gt;```json ... ```&lt;/code&gt;" cleaner across projects, I gave up and made it a library.&lt;/p&gt;

&lt;p&gt;That's &lt;code&gt;llmclean&lt;/code&gt;. Zero dependencies, pure standard library, three small utilities. v0.1.0 was on PyPI a couple of months ago. v0.2.0 just shipped, and it's the one I want to talk about — because what changed in this release is the part that makes the case for a separate library at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  What v0.1.0 did
&lt;/h2&gt;

&lt;p&gt;Three functions, total. That's the entire public API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llmclean&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;strip_fences&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;enforce_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trim_repetition&lt;/span&gt;

&lt;span class="nf"&gt;strip_fences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;```json&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;```&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → '{"name": "Alice"}'
&lt;/span&gt;
&lt;span class="nf"&gt;enforce_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Here you go: {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: True, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [1,2,3,]}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → '{\n  "ok": true,\n  "items": [1, 2, 3]\n}'
&lt;/span&gt;
&lt;span class="nf"&gt;trim_repetition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The answer is 42. This is final. This is final.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → 'The answer is 42. This is final.'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each function returns the original input on failure (never raises), so it composes safely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;enforce_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;trim_repetition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;strip_fences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_output&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stuck it on PyPI in March, copy-pasted the usage into Sakhi and the resume parser, moved on. Standard "I wrote a thing, hope it doesn't bite me" energy.&lt;/p&gt;

&lt;h2&gt;
  
  
  What production traffic taught me
&lt;/h2&gt;

&lt;p&gt;Then I went back to those two projects and kept building. And the library quietly broke in three different ways across the next two months, each one from real data I was feeding into it. Every one of those breaks became a v0.2.0 fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. CRLF on Windows silently inverted fence detection
&lt;/h3&gt;

&lt;p&gt;Output from Ollama running on my Windows machine came back with &lt;code&gt;\r\n&lt;/code&gt; line endings. The fence regex used &lt;code&gt;[ \t]*$&lt;/code&gt; as the trailing anchor. In Python's &lt;code&gt;re.MULTILINE&lt;/code&gt; mode, &lt;code&gt;$&lt;/code&gt; matches the position immediately before &lt;code&gt;\n&lt;/code&gt; — not before &lt;code&gt;\r\n&lt;/code&gt;. So the &lt;code&gt;\r&lt;/code&gt; sat between my whitespace class and the newline, and the regex silently failed to match the fence line.&lt;/p&gt;

&lt;p&gt;The nasty part: it failed in an &lt;em&gt;inverted&lt;/em&gt; way. The closing fence line (with no &lt;code&gt;\r\n&lt;/code&gt; after it) still matched the regex, so the function read it as an &lt;em&gt;unclosed opening fence&lt;/em&gt; and stripped it. Meanwhile the actual opening line survived as content. Output looked like garbled JSON wrapped in a leftover code fence.&lt;/p&gt;

&lt;p&gt;Fix: &lt;code&gt;[ \t]*\r?$&lt;/code&gt;. Three regexes, one optional &lt;code&gt;\r&lt;/code&gt; each.&lt;/p&gt;
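
&lt;p&gt;A minimal sketch of the failure and the fix, assuming a simplified fence pattern. The names &lt;code&gt;before_fix&lt;/code&gt; and &lt;code&gt;after_fix&lt;/code&gt; are hypothetical and the patterns are not llmclean's actual regexes; only the trailing anchor is taken from the text above.&lt;/p&gt;

```python
import re

# Build the fence marker at runtime so this snippet can itself sit inside a fenced block.
FENCE = "`" * 3

# Hypothetical fence-line patterns for illustration; llmclean's real regexes may differ.
before_fix = re.compile("^" + FENCE + r"[a-zA-Z]*[ \t]*$", re.MULTILINE)
after_fix = re.compile("^" + FENCE + r"[a-zA-Z]*[ \t]*\r?$", re.MULTILINE)

# Model output with Windows line endings; the closing fence has no trailing newline.
crlf_output = FENCE + 'json\r\n{"a": 1}\r\n' + FENCE

before_hits = [m.group(0) for m in before_fix.finditer(crlf_output)]
after_hits = [m.group(0) for m in after_fix.finditer(crlf_output)]

print(len(before_hits))  # 1: only the closing fence matches, the opening line is missed
print(len(after_hits))   # 2: both fence lines match once \r is optional
```

&lt;p&gt;With the old anchor only the closing fence is found, which is exactly the inverted behaviour described above.&lt;/p&gt;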

&lt;h3&gt;
  
  
  2. BOM at position 0 broke &lt;code&gt;json.loads&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Some Windows file-IO round-trips and LLM client SDKs prepend a Byte Order Mark (&lt;code&gt;U+FEFF&lt;/code&gt;). Sakhi started hitting this when Whisper transcripts went through Windows file IO and came back with one at position 0. &lt;code&gt;json.loads&lt;/code&gt; rejects a string that starts with a BOM and raises immediately, before any of llmclean's strategy pipeline gets a chance to fix anything.&lt;/p&gt;

&lt;p&gt;Fix: &lt;code&gt;lstrip("\ufeff")&lt;/code&gt; at the entry point of both &lt;code&gt;strip_fences&lt;/code&gt; and &lt;code&gt;enforce_json&lt;/code&gt;.&lt;/p&gt;
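
&lt;p&gt;The break is easy to reproduce with the standard library alone. This sketch uses a made-up payload and shows only the stripping step, not llmclean's entry-point code:&lt;/p&gt;

```python
import json

BOM = "\ufeff"

# Hypothetical payload: valid JSON with a BOM stuck on the front.
raw = BOM + '{"speaker": "A"}'

try:
    json.loads(raw)
    parsed_with_bom = True
except json.JSONDecodeError:
    # json.loads refuses a str that starts with U+FEFF.
    parsed_with_bom = False

# Strip the leading BOM first, then parse as usual.
cleaned = raw.lstrip(BOM)
data = json.loads(cleaned)

print(parsed_with_bom)  # False
print(data)             # {'speaker': 'A'}
```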

&lt;h3&gt;
  
  
  3. Doubled-quote overruns when escape sequences leak
&lt;/h3&gt;

&lt;p&gt;Occasionally I'd see model output like &lt;code&gt;{"key": ""value""}&lt;/code&gt;. Doubled quotes on both sides of a string, usually because an upstream stage involved Python triple-quoted f-strings, or an escape got applied twice somewhere.&lt;/p&gt;

&lt;p&gt;Sakhi's own pipeline has three regexes for this kind of overrun, but two of them have an edge case: they can corrupt legitimate empty-string values (&lt;code&gt;{"k": ""}&lt;/code&gt;) because the regex can't tell "overrun" from "intentional empty" without parser-level context. So in llmclean I only included the safe one — the form that &lt;em&gt;requires&lt;/em&gt; non-empty content between the doubled quotes. That handles the common case (&lt;code&gt;""text""&lt;/code&gt; → &lt;code&gt;"text"&lt;/code&gt;) and never touches legitimate empties.&lt;/p&gt;
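
&lt;p&gt;Roughly, the "requires non-empty content" idea can be sketched like this. The pattern and the name &lt;code&gt;collapse_doubled_quotes&lt;/code&gt; are my illustration for this post, not the regex that ships in llmclean. This version also refuses to start a match on a character that normally follows a legitimate empty string (whitespace, comma, colon, closing bracket or brace), which is an extra assumption on my part.&lt;/p&gt;

```python
import re

# Hypothetical pattern: doubled quotes around NON-EMPTY content.
# The first content character may not be whitespace, a comma, a colon,
# or a closing bracket/brace, so an empty string followed by more JSON is left alone.
DOUBLED_QUOTES = re.compile(r'""([^"\s,:\]}][^"]*)""')


def collapse_doubled_quotes(text):
    """Rewrite ""text"" as "text"; leave empty strings untouched."""
    return DOUBLED_QUOTES.sub(r'"\1"', text)


print(collapse_doubled_quotes('{"key": ""value""}'))  # {"key": "value"}
print(collapse_doubled_quotes('{"k": ""}'))           # {"k": ""}
print(collapse_doubled_quotes('["", ""]'))            # ["", ""]
```

&lt;p&gt;A regex like this still has no parser-level context, so treat it as a best-effort repair for the common case rather than a guarantee.&lt;/p&gt;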

&lt;p&gt;This kind of careful subtraction is the part I'm most happy about. It's less code than Sakhi has, but more correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shape of the thing
&lt;/h2&gt;

&lt;p&gt;llmclean lives in a small gap between bigger tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For schema validation: use &lt;code&gt;jsonschema&lt;/code&gt; or &lt;code&gt;pydantic&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For re-prompting the model when output is bad: use &lt;code&gt;instructor&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For constraining the model at generation time so it can't produce broken output: use &lt;code&gt;outlines&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;llmclean is the post-hoc cleanup pass. The thing you run &lt;em&gt;after&lt;/em&gt; the model has emitted text and &lt;em&gt;before&lt;/em&gt; you try to parse it. It composes with all of the above — it's not competing with them.&lt;/p&gt;

&lt;p&gt;What I'm trying to keep true to while iterating:&lt;/p&gt;

&lt;p&gt;Functions never raise. Every public function returns the original input on failure, so it composes safely in pipelines that can't afford an exception path.&lt;/p&gt;

&lt;p&gt;Zero runtime dependencies. The standard library is enough for what this needs to do, and pulling in a dependency would force every downstream user to deal with version conflicts they didn't sign up for.&lt;/p&gt;

&lt;p&gt;Predictable behaviour. Same input, same output. No external state, no model calls, no fuzzy heuristics that change semantics silently between versions.&lt;/p&gt;
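
&lt;p&gt;The "never raise" rule is the easiest of the three to show in code. One way to get that behaviour is a small decorator. This is a sketch of the pattern, not how llmclean implements it internally, and both &lt;code&gt;never_raise&lt;/code&gt; and &lt;code&gt;minify_json&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import functools
import json


def never_raise(func):
    """Wrap a cleanup function so any failure returns the original input."""
    @functools.wraps(func)
    def wrapper(text, *args, **kwargs):
        try:
            return func(text, *args, **kwargs)
        except Exception:
            # Hand the caller back exactly what they passed in.
            return text
    return wrapper


@never_raise
def minify_json(text):
    # Hypothetical cleanup step: raises on anything that is not valid JSON.
    return json.dumps(json.loads(text), separators=(",", ":"))


print(minify_json('{"a": 1}'))  # {"a":1}
print(minify_json("not json"))  # not json
```

&lt;p&gt;Because every stage falls back to its input, a chain of these degrades to a no-op on bad data instead of taking the pipeline down.&lt;/p&gt;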

&lt;h2&gt;
  
  
  Try it, tell me where it breaks
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;llmclean
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What I'd find genuinely useful:&lt;/p&gt;

&lt;p&gt;If you try it on output from a model I haven't tested against and it fails, file an issue with the raw input. Real failure cases are what improvements come from — every fix in v0.2.0 came from one.&lt;/p&gt;

&lt;p&gt;If your project has its own LLM-output cleanup logic, I'd love to know what your edge cases are. The whole library exists because three of my projects had different ad-hoc versions of the same thing. There's probably a fourth and fifth class of failure I haven't seen.&lt;/p&gt;

&lt;p&gt;If you've solved this with &lt;code&gt;instructor&lt;/code&gt; or &lt;code&gt;guardrails&lt;/code&gt; or some other tool and want to argue I should have just used that — also welcome. Comparative honesty is more useful than marketing.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Tushar-9802/llmclean" rel="noopener noreferrer"&gt;Tushar-9802/llmclean&lt;/a&gt;&lt;br&gt;
PyPI: &lt;a href="https://pypi.org/project/llmclean/" rel="noopener noreferrer"&gt;llmclean on PyPI&lt;/a&gt;&lt;br&gt;
Changelog: &lt;a href="https://github.com/Tushar-9802/llmclean/blob/main/CHANGELOG.md" rel="noopener noreferrer"&gt;CHANGELOG.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next version probably picks up a few more patterns I noted while inspecting MedScribe (a SOAP-note extraction project of mine): prompt-leakage stripping when the model echoes back parts of its own prompt, and section-level repetition truncation. Those are in the queue, currently driven by the same process — find them in real work first, port to the library second.&lt;/p&gt;

&lt;p&gt;If you've got a use case where llmclean would help, or one where it's already broken on you, the issue tracker is open.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
