<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Christopher Pribadi</title>
    <description>The latest articles on DEV Community by Christopher Pribadi (@christopher_pribadi_344bc).</description>
    <link>https://dev.to/christopher_pribadi_344bc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3934206%2Fca1fe152-e472-4156-afce-8e9156c72c8a.jpg</url>
      <title>DEV Community: Christopher Pribadi</title>
      <link>https://dev.to/christopher_pribadi_344bc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/christopher_pribadi_344bc"/>
    <language>en</language>
    <item>
      <title>The Hidden Cost of Bad Meeting Transcription (It's Not What You Think)</title>
      <dc:creator>Christopher Pribadi</dc:creator>
      <pubDate>Tue, 19 May 2026 04:42:18 +0000</pubDate>
      <link>https://dev.to/christopher_pribadi_344bc/the-hidden-cost-of-bad-meeting-transcription-its-not-what-you-think-4gj7</link>
      <guid>https://dev.to/christopher_pribadi_344bc/the-hidden-cost-of-bad-meeting-transcription-its-not-what-you-think-4gj7</guid>
      <description>&lt;p&gt;&lt;strong&gt;You probably think the cost of bad meeting transcription is time spent fixing transcripts.&lt;/strong&gt;&lt;br&gt;
It's not.&lt;br&gt;
That's only the visible cost. The real cost is what you're not seeing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Visible Cost&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You record a 60-minute meeting. The transcription accuracy is 60-70%. Someone spends 20-30 minutes fixing it.&lt;br&gt;
For a team of 20, that adds up to 8+ hours of cleanup a day, or about $103k/year in wasted time.&lt;/p&gt;

&lt;p&gt;That's measurable. That's what CFOs see.&lt;br&gt;
But it's the smallest cost.&lt;/p&gt;
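
&lt;p&gt;To see where a figure like $103k can come from, here is a back-of-envelope sketch. It assumes 250 working days a year and a loaded labor cost of $51.875/hour. Neither number appears above: the rate is simply the one that reproduces $103,750 from 8 hours of cleanup per day. Swap in your own team size, days, and rate.&lt;/p&gt;

```python
# Back-of-envelope check of the visible cleanup cost.
# ASSUMPTIONS (not from the article): 250 working days a year and a
# loaded labor cost of $51.875/hour, chosen to reproduce $103,750.

TEAM_SIZE = 20                # team size used in the example above
CLEANUP_HOURS_PER_DAY = 8     # "8+ hours/day of cleanup"
WORKING_DAYS_PER_YEAR = 250   # assumption
LOADED_HOURLY_RATE = 51.875   # assumption, USD per hour


def annual_cleanup_cost(hours_per_day, days_per_year, hourly_rate):
    """Yearly cost of the time spent fixing transcripts."""
    return hours_per_day * days_per_year * hourly_rate


cost = annual_cleanup_cost(CLEANUP_HOURS_PER_DAY, WORKING_DAYS_PER_YEAR,
                           LOADED_HOURLY_RATE)
minutes_per_person = CLEANUP_HOURS_PER_DAY * 60 / TEAM_SIZE

print(f"Annual cleanup cost: ${cost:,.0f}")                          # $103,750
print(f"Cleanup per person per day: {minutes_per_person:.0f} min")   # 24 min
```

&lt;p&gt;Eight hours a day spread over 20 people is 24 minutes each, or roughly one cleaned-up meeting per person per day at the 20-30 minute rate above.&lt;/p&gt;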

&lt;p&gt;&lt;strong&gt;The Hidden Costs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Knowledge Loss ($250k/year): When transcripts are useless, people stop using them. They use personal notes instead. Now three people have three different records of what was said. Rework, slow onboarding, missed decisions.&lt;/p&gt;

&lt;p&gt;Decision Latency ($150k/year): By the time someone reads bad notes, they've already made decisions based on what they remember, not what was said. Clarification calls, deal delays, lost revenue.&lt;/p&gt;

&lt;p&gt;Compliance Risk ($50k/year amortized): Bad transcription means unreliable records. If you ever need to prove what was discussed, you can't. In SE Asia, PDPA violations can mean fines plus legal fees.&lt;/p&gt;

&lt;p&gt;Team Friction ($100k/year): "I thought we agreed to X" / "No, we said Y" / "Check the notes!" / "The notes don't make sense." Bad transcription becomes a source of conflict.&lt;/p&gt;

&lt;p&gt;Scaling Friction ($200k/year): New hires can't learn from written records. Everything requires 1-on-1 explanation. You can't scale beyond founder involvement.&lt;/p&gt;

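&lt;p&gt;&lt;strong&gt;&lt;em&gt;The Math&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
For a 50-person regional team:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Cost&lt;/th&gt;&lt;th&gt;Annual (USD)&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Fixing transcripts&lt;/td&gt;&lt;td&gt;$103,750&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Knowledge loss&lt;/td&gt;&lt;td&gt;$250,000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Decision latency&lt;/td&gt;&lt;td&gt;$150,000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Compliance risk&lt;/td&gt;&lt;td&gt;$50,000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Team friction&lt;/td&gt;&lt;td&gt;$100,000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Scaling friction&lt;/td&gt;&lt;td&gt;$200,000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;TOTAL&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;$853,750&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;That's about $17k per employee per year.&lt;br&gt;
A good transcription tool costs $200-300/person/year.&lt;br&gt;
ROI: roughly 57x-85x.&lt;br&gt;
If you're using a tool that doesn't work for your team's language patterns, you're not saving money. You're burning it.&lt;/p&gt;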
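
&lt;p&gt;A quick way to check the totals is to recompute them. The sketch below reads "ROI" as the hidden cost per employee divided by the tool's yearly price per person. That reading is an assumption about how the ratio is meant; all dollar figures come from the table.&lt;/p&gt;

```python
# Recompute the totals and the cost-to-price ratio from the table above.
# All dollar figures are the article's estimates.

ANNUAL_COSTS = {
    "Fixing transcripts": 103_750,
    "Knowledge loss": 250_000,
    "Decision latency": 150_000,
    "Compliance risk": 50_000,
    "Team friction": 100_000,
    "Scaling friction": 200_000,
}
TEAM_SIZE = 50
TOOL_PRICES = (200, 300)  # USD per person per year, from the article

total = sum(ANNUAL_COSTS.values())
per_employee = total / TEAM_SIZE

print(f"Total: ${total:,}")                    # $853,750
print(f"Per employee: ${per_employee:,.0f}")   # $17,075
for price in TOOL_PRICES:
    # "ROI" here means avoided cost per employee divided by tool price.
    print(f"At ${price}/person/year: {per_employee / price:.0f}x")
```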

&lt;p&gt;That's why we created a multilingual meeting notetaker:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.bysik.app"&gt;www.bysik.app&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How Code-Switching Breaks AI (And Why That Matters for Southeast Asia)</title>
      <dc:creator>Christopher Pribadi</dc:creator>
      <pubDate>Sat, 16 May 2026 05:42:42 +0000</pubDate>
      <link>https://dev.to/christopher_pribadi_344bc/how-code-switching-breaks-ai-and-why-that-matters-for-southeast-asia-lb2</link>
      <guid>https://dev.to/christopher_pribadi_344bc/how-code-switching-breaks-ai-and-why-that-matters-for-southeast-asia-lb2</guid>
      <description>&lt;p&gt;Your Singapore team is on a call. Someone says: "Eh, so we need to deploy this feature lah, but the database query very slow lor. How we optimize? Boleh ask the backend team?"&lt;/p&gt;

&lt;p&gt;That sentence has &lt;strong&gt;English, Malay, and Singlish&lt;/strong&gt; grammar patterns all mixed together.&lt;/p&gt;

&lt;p&gt;Try to transcribe it with Otter.ai or Google Meet's built-in captions. You'll get something like: "Eh, so we need to deploy this feature la, but the database query very slow or how we optimize..."&lt;br&gt;
The words are garbled. The meaning is lost. And if you're using that transcript as your meeting notes, you've got a mess.&lt;br&gt;
This is code-switching. And it's breaking almost every AI transcription tool on the market.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Is Code-Switching?&lt;/strong&gt;&lt;br&gt;
Code-switching is when a bilingual or multilingual speaker mixes two or more languages in a single conversation, often within the same sentence.&lt;br&gt;
It's not broken English. It's not bad grammar. It's actually a sign of linguistic sophistication. Bilingual speakers code-switch because it's the most efficient way to communicate with other bilingual people in their community.&lt;br&gt;
It's normal in:&lt;/p&gt;

&lt;p&gt;Singapore: Singlish (English + Malay + Mandarin + Tamil grammar patterns)&lt;br&gt;
Malaysia: Bahasa Rojak (English + Malay + Cantonese)&lt;br&gt;
Philippines: Taglish (Tagalog + English)&lt;br&gt;
Thailand: Tinglish (English + Thai)&lt;br&gt;
Indonesia: Bahasa Campur (Indonesian + English + regional languages)&lt;br&gt;
Vietnam: Vietglish (Vietnamese + English)&lt;/p&gt;

&lt;p&gt;Every Southeast Asian professional does this. It's how we communicate.&lt;br&gt;
But here's the problem: AI transcription models were trained mostly on monolingual speech. They've seen very little of this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why AI Breaks on Code-Switching&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Simplified, a speech-to-text system works in four steps:&lt;/p&gt;

&lt;p&gt;1. Acoustic modeling: breaking audio into phonemes (individual sounds)&lt;br&gt;
2. Language identification: figuring out which language is being spoken&lt;br&gt;
3. Pattern matching: matching sound patterns to known words&lt;br&gt;
4. Grammar correction: using a language model to fix errors&lt;/p&gt;
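
&lt;p&gt;Stages 2 and 3 can be sketched as a toy program. This is not how any particular product works: real systems operate on audio, and many modern models fold these stages into a single network. Here words stand in for recognized sounds, and &lt;code&gt;LEXICONS&lt;/code&gt;, &lt;code&gt;identify_language&lt;/code&gt;, &lt;code&gt;match_words&lt;/code&gt;, and &lt;code&gt;tag_per_token&lt;/code&gt; are hypothetical names. The point is what happens when stage 2 must pick one language for a mixed sentence.&lt;/p&gt;

```python
# Toy model of stages 2 and 3 of the pipeline above. Words stand in for
# recognized sound units, and both lexicons are tiny hypothetical examples.

LEXICONS = {
    "en": {"ask", "the", "backend", "team", "cannot", "server", "go",
           "down", "already"},
    "ms": {"boleh", "lah", "tak", "sudah"},
}


def identify_language(tokens):
    """Stage 2: choose ONE language for the whole utterance (majority vote)."""
    scores = {lang: sum(t in lex for t in tokens)
              for lang, lex in LEXICONS.items()}
    return max(scores, key=scores.get)


def match_words(tokens, lang):
    """Stage 3: keep only words the chosen language knows; mark the rest."""
    return [t if t in LEXICONS[lang] else "[?]" for t in tokens]


def tag_per_token(tokens):
    """Alternative: a language label per word instead of per utterance."""
    return [next((lang for lang, lex in LEXICONS.items() if t in lex), "unk")
            for t in tokens]


utterance = "boleh ask the backend team".split()
lang = identify_language(utterance)
print(lang)                             # en
print(match_words(utterance, lang))     # ['[?]', 'ask', 'the', 'backend', 'team']
print(tag_per_token(utterance))         # ['ms', 'en', 'en', 'en', 'en']
```

&lt;p&gt;Because the utterance is mostly English, the Malay word "boleh" falls outside the chosen vocabulary and is lost. Labeling language per word keeps it; that is one common idea in research on code-switched speech, shown here only as an illustration.&lt;/p&gt;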

&lt;p&gt;Code-switching breaks at step 2.&lt;br&gt;
When you say a sentence in Singlish, the AI hears English words and Malay grammar patterns simultaneously. The language identification model gets confused. Is this English or Malay? The model has to pick one, and whichever it picks, part of the sentence no longer fits. Everything downstream breaks.&lt;br&gt;
Here's a real example:&lt;br&gt;
What was said: "Eh, cannot lah, server go down already."&lt;br&gt;
What Google Transcribe hears: English (because the root words are English)&lt;br&gt;
What Google Transcribe outputs: "Eh, cannot la server go down already" (it flattens the particle "lah" to "la" and drops the pause after it, because neither fits English grammar)&lt;br&gt;
What the speaker meant: "No, we can't do that right now, because the server has crashed."&lt;br&gt;
The meaning was there. But the model didn't preserve it.&lt;/p&gt;
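
&lt;p&gt;The particle problem can be reproduced with a toy clean-up step that snaps unknown words to the closest entry in an English-only vocabulary. It uses Python's standard &lt;code&gt;difflib.get_close_matches&lt;/code&gt;; the vocabulary and the function names are hypothetical, and no real transcription service is claimed to work this way.&lt;/p&gt;

```python
import difflib

# Hypothetical English-only vocabulary used by a monolingual clean-up step.
ENGLISH_VOCAB = ["eh", "cannot", "la", "server", "go", "down", "already"]


def normalize_to_english(tokens, vocab=ENGLISH_VOCAB):
    """Snap each out-of-vocabulary token to its closest English spelling,
    or drop it if nothing is similar enough ("correcting" the speech)."""
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
            continue
        match = difflib.get_close_matches(tok, vocab, n=1, cutoff=0.6)
        if match:
            out.append(match[0])
        # Otherwise the token is silently dropped.
    return out


def preserve(tokens):
    """Keep the tokens exactly as spoken."""
    return list(tokens)


spoken = ["eh", "cannot", "lah", "server", "go", "down", "already"]
print(" ".join(normalize_to_english(spoken)))  # eh cannot la server go down already
print(" ".join(preserve(spoken)))              # eh cannot lah server go down already
```

&lt;p&gt;A particle with no close English neighbor, such as "lor", would not even survive as a misspelling: the same function drops it entirely.&lt;/p&gt;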

&lt;p&gt;&lt;strong&gt;The Training Data Problem&lt;/strong&gt;&lt;br&gt;
Why does this happen?&lt;br&gt;
Because the datasets used to train these models contain very little code-switched speech.&lt;br&gt;
Google, OpenAI, and other major AI labs trained their speech models on:&lt;/p&gt;

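&lt;p&gt;English audio (by far the largest share; OpenAI's Whisper, for example, was trained on about 680,000 hours of audio, roughly two-thirds of it English)&lt;br&gt;
Mandarin audio (a large share)&lt;br&gt;
Spanish, French, German, etc. (substantial amounts each)&lt;/p&gt;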

&lt;p&gt;But Singlish? Taglish? Bahasa Campur?&lt;br&gt;
&lt;strong&gt;There's almost no training data.&lt;/strong&gt;&lt;br&gt;
Why? Because code-switched speech is:&lt;/p&gt;

&lt;p&gt;Hard to label (Is this English or Malay? Both? The labeler has to make a judgment call)&lt;br&gt;
Not standardized (Singlish spoken in Singapore sounds different from Singlish in Malaysia)&lt;br&gt;
Seen as "low prestige" by researchers (academic datasets tend to focus on formal, monolingual speech)&lt;br&gt;
Computationally expensive to include (mixed-language models are harder to train)&lt;/p&gt;
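
&lt;p&gt;Labeling code-switched speech usually means tagging every word with a language rather than the whole utterance. Once words are tagged, you can also measure how mixed a sentence is. The sketch below uses the Code-Mixing Index (CMI) from Das and Gambäck's work on code-mixed text; the example labels are hypothetical, and real annotation schemes differ in how they treat names, particles, and words shared between languages.&lt;/p&gt;

```python
from collections import Counter

# One hypothetical way to label code-switched speech: a language tag per word.
# "univ" marks language-neutral tokens such as punctuation.
labeled = [("boleh", "ms"), ("ask", "en"), ("the", "en"),
           ("backend", "en"), ("team", "en"), ("?", "univ")]


def code_mixing_index(tags, neutral="univ"):
    """CMI = 100 * (1 - max_i(w_i) / (n - u)); 0 if every token is neutral.

    n: total tokens, u: language-neutral tokens,
    w_i: number of tokens tagged with language i.
    """
    n = len(tags)
    u = sum(1 for t in tags if t == neutral)
    if n == u:
        return 0.0
    counts = Counter(t for t in tags if t != neutral)
    return 100 * (1 - max(counts.values()) / (n - u))


tags = [tag for _, tag in labeled]
print(round(code_mixing_index(tags), 1))                         # 20.0
print(round(code_mixing_index(["en", "en", "en", "univ"]), 1))   # 0.0
```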

&lt;p&gt;So the models just ignore it. And when they encounter it, they fail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters for SE Asian Teams&lt;/strong&gt;&lt;br&gt;
Imagine you're a regional PM. You record a standup with your Singapore, Bangkok, and Manila teams. Everyone code-switches naturally — it's how they communicate best.&lt;br&gt;
You use Otter.ai to transcribe. You get:&lt;/p&gt;

&lt;p&gt;60% accuracy on the English parts&lt;br&gt;
40% accuracy on the code-switched parts&lt;br&gt;
Completely butchered grammar and meaning in the mixed sentences&lt;br&gt;
Useless meeting notes&lt;/p&gt;
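
&lt;p&gt;The article doesn't say how these accuracy figures were measured. A common convention is word accuracy = 1 - WER (word error rate), where WER counts the substituted, deleted, and inserted words needed to turn a transcript into a reference, divided by the reference length. A minimal sketch, assuming a non-empty reference split on whitespace:&lt;/p&gt;

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Assumes `reference` contains at least one word.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # Before row i, prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # deletion
                          curr[j - 1] + 1,          # insertion
                          prev[j - 1] + (r != h))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "eh cannot lah server go down already"
hypothesis = "eh cannot la server go down already"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.3f}, accuracy: {1 - wer:.1%}")   # WER: 0.143, accuracy: 85.7%
```

&lt;p&gt;One misheard particle in a seven-word sentence already costs about 14 points, which is why particle-heavy speech drags these scores down quickly.&lt;/p&gt;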

&lt;p&gt;So you spend 20 minutes manually fixing the transcript. Or you just don't use it.&lt;br&gt;
Either way, you've lost the main benefit of transcription: saving time.&lt;br&gt;
Scale this across a team. You're losing hours every week to bad transcription.&lt;br&gt;
For a team of 10, that's 500 hours a year of wasted time. For a 50-person team, it's 2,500 hours.&lt;br&gt;
That's real money.&lt;/p&gt;
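
&lt;p&gt;The 500- and 2,500-hour figures follow if each person fixes about three transcripts a week for 50 weeks a year. Those two inputs are assumptions chosen to match the totals above; only the 20-minute figure comes from the text.&lt;/p&gt;

```python
# Reconstructing the hours-lost estimate. Only the 20-minute figure comes
# from the text; the fix count and working weeks are assumptions.

MINUTES_PER_FIX = 20            # from the article
FIXES_PER_PERSON_PER_WEEK = 3   # assumption
WORKING_WEEKS_PER_YEAR = 50     # assumption


def annual_hours_lost(team_size,
                      minutes_per_fix=MINUTES_PER_FIX,
                      fixes_per_week=FIXES_PER_PERSON_PER_WEEK,
                      weeks=WORKING_WEEKS_PER_YEAR):
    """Hours per year the whole team spends fixing transcripts."""
    return team_size * minutes_per_fix * fixes_per_week * weeks / 60


for size in (10, 50):
    print(f"{size} people: {annual_hours_lost(size):,.0f} hours/year")
# 10 people: 500 hours/year
# 50 people: 2,500 hours/year
```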

&lt;p&gt;&lt;strong&gt;The Current Solutions (And Why They Don't Work)&lt;/strong&gt;&lt;br&gt;
Option 1: Use a tool built for your specific language&lt;br&gt;
There are some tools built for Singlish or Taglish specifically. But they only work if you speak one code-switched language. If your team spans multiple countries, you're out of luck.&lt;/p&gt;

&lt;p&gt;Option 2: Record separate videos in each language&lt;br&gt;
Some teams do this. One person records the English parts, someone else records the Malay parts. This is absurd and doesn't reflect how people actually talk.&lt;/p&gt;

&lt;p&gt;Option 3: Use Google Meet or Zoom's built-in captions&lt;br&gt;
They're improving, but still 50-60% accurate on code-switched speech. Better than nothing, but still not usable for meeting notes.&lt;/p&gt;

&lt;p&gt;Option 4: Hire someone to manually transcribe&lt;br&gt;
Expensive and slow. But it works because humans can understand code-switching. A human transcriber gets the meaning right even if the grammar is mixed.&lt;/p&gt;

&lt;p&gt;Option 5: Just don't transcribe&lt;br&gt;
Most teams do this. They record meetings but don't transcribe them because the tools are so bad at code-switching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How We're Solving This at BYSIK&lt;/strong&gt;&lt;br&gt;
When we started building BYSIK, we noticed this problem immediately.&lt;br&gt;
Our first user was a Singapore startup with a team across SG, MY, and ID. They said: "Every transcription tool fails on our meetings because we speak Singlish. We just stopped using transcription."&lt;br&gt;
That's when we realized: the existing solutions aren't built for Southeast Asia.&lt;br&gt;
So we did something different:&lt;br&gt;
&lt;strong&gt;1. We trained on code-switched speech&lt;/strong&gt;&lt;br&gt;
We built a dataset of actual code-switched audio from SE Asian professionals. Singlish, Taglish, Bahasa Campur, mixed Thai-English, all of it. We labeled it carefully and trained our speech-to-text model on this specific data.&lt;br&gt;
&lt;strong&gt;2. We use language-agnostic embedding models&lt;/strong&gt;&lt;br&gt;
Instead of deciding "Is this English or Malay?" upfront, we use embedding models that represent words in a shared semantic space. Words that play the same role in context end up close together, whichever language they come from (the Malay "boleh" and the English "can," for example). The model learns this.&lt;br&gt;
&lt;strong&gt;3. We added dialect and accent handling&lt;/strong&gt;&lt;br&gt;
Singlish from Singapore sounds different from Singlish from Malaysia. Bangkok Thai-English is different from Northern Thai-English. We built models that handle these variations.&lt;br&gt;
&lt;strong&gt;4. We preserve the original speech patterns&lt;/strong&gt;&lt;br&gt;
Instead of "correcting" code-switched speech into monolingual grammar, we keep it as spoken. If you said "cannot lah," the transcript says "cannot lah," not "cannot." The meaning is preserved.&lt;br&gt;
The result? 85%+ accuracy on code-switched speech, compared to 40-50% on existing tools.&lt;br&gt;
Is it perfect? No. Code-switching is inherently ambiguous sometimes — even humans disagree on what was said. But it's accurate enough to be useful for meeting notes, which is the point.&lt;/p&gt;
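
&lt;p&gt;The idea of a shared semantic space can be illustrated with a toy nearest-neighbor lookup. The vectors are made up for illustration, as are the names &lt;code&gt;EMBEDDINGS&lt;/code&gt; and &lt;code&gt;nearest&lt;/code&gt;; this is not BYSIK's model. The point is that the lookup uses only vector similarity, so a Malay word can land next to its English counterpart without any upfront language decision.&lt;/p&gt;

```python
import math

# Hypothetical 3-dimensional embeddings keyed by (word, language).
# Real models learn vectors with hundreds of dimensions from data.
EMBEDDINGS = {
    ("boleh", "ms"): (0.90, 0.10, 0.05),   # Malay for "can"
    ("can", "en"): (0.88, 0.12, 0.02),
    ("tak", "ms"): (0.05, 0.92, 0.10),     # colloquial Malay for "not"
    ("not", "en"): (0.07, 0.90, 0.12),
    ("server", "en"): (0.10, 0.05, 0.95),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Most similar other entry; the language tag is never consulted."""
    query = next(vec for (w, _), vec in EMBEDDINGS.items() if w == word)
    others = [key for key in EMBEDDINGS if key[0] != word]
    return max(others, key=lambda key: cosine(query, EMBEDDINGS[key]))


print(nearest("boleh"))   # ('can', 'en')
print(nearest("tak"))     # ('not', 'en')
```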

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The Bigger Picture: Why SE Asia Keeps Getting Left Behind&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
This code-switching problem is a microcosm of a bigger issue.&lt;br&gt;
AI is built in the US and China. The training data is English, Mandarin, and a few other "high-resource" languages.&lt;br&gt;
Everything else — including all of Southeast Asia's languages and dialects — gets treated as an edge case.&lt;br&gt;
So we have tools that work great for monolingual English speakers in San Francisco. They're okay for Mandarin speakers in Shanghai. But for a multilingual team in Singapore? For developers in Manila who code-switch naturally? For regional teams that communicate across languages?&lt;br&gt;
The tools fail.&lt;br&gt;
And the assumption is: "That's an edge case. Most of the world speaks monolingual English or Mandarin anyway."&lt;br&gt;
But Southeast Asia is 650 million people. It's not an edge case. It's a massive market that's been ignored because building for multilingual, code-switched speech is harder than building for monolingual English.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Needs to Change&lt;/strong&gt;&lt;br&gt;
For researchers: Start collecting and publishing datasets of code-switched speech. It's harder to collect than monolingual data, but it's important. Southeast Asia's languages matter.&lt;br&gt;
For AI companies: Stop pretending code-switching is an edge case. Train your models on it. Your users in SE Asia deserve tools that work.&lt;br&gt;
For regional companies: If existing tools don't work for you, you don't have to accept it. Build your own, or support tools (like BYSIK) that are built for your market.&lt;br&gt;
For SE Asian founders: This is an opportunity. The entire region is using transcription tools that don't work for how we actually speak. That's a problem worth solving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Practical Takeaway&lt;/strong&gt;&lt;br&gt;
If you're running a team in Southeast Asia and you've been frustrated with transcription accuracy, now you know why.&lt;br&gt;
It's not your audio quality. It's not your accent. It's not that you're speaking "wrong."&lt;br&gt;
It's that the tools were built for a different market. They were trained on monolingual speech. Code-switching breaks them.&lt;br&gt;
There are solutions now. Tools are getting better at handling multilingual and code-switched speech. If you've given up on transcription because it didn't work, try again. The tech has caught up.&lt;br&gt;
And if you find a tool that actually understands how your team talks — that preserves meaning instead of "correcting" your language — stick with it. You've found something rare.&lt;/p&gt;

&lt;p&gt;P.S. — If you want to geek out about the linguistics of code-switching, there's a whole field of research on it. Start here: Poplack's "The Bilingual's Linguistic System: Evidence for Asymmetric Competence." It's fascinating stuff.&lt;br&gt;
And if you're building tools for Southeast Asia, feel free to reach out. I'm always interested in talking to founders who are solving regional problems instead of just copying the US.&lt;/p&gt;

&lt;p&gt;Full disclosure: I founded BYSIK AI because of this exact problem. We're solving code-switching for transcription. But even if you use a different tool, I hope this helped you understand why transcription has been hard in SE Asia and why it's getting better.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
