<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sidharth K</title>
    <description>The latest articles on DEV Community by Sidharth K (@sidharthk).</description>
    <link>https://dev.to/sidharthk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3861354%2F883c0d68-96a8-495c-a36b-b2675d56ef73.jpeg</url>
      <title>DEV Community: Sidharth K</title>
      <link>https://dev.to/sidharthk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sidharthk"/>
    <language>en</language>
    <item>
      <title>I built a "Bring Your Own Key" Chrome extension to kill my $22/month dictation subscription</title>
      <dc:creator>Sidharth K</dc:creator>
      <pubDate>Sat, 04 Apr 2026 18:04:24 +0000</pubDate>
      <link>https://dev.to/sidharthk/i-built-a-bring-your-own-key-chrome-extension-to-kill-my-22month-dictation-subscription-54km</link>
      <guid>https://dev.to/sidharthk/i-built-a-bring-your-own-key-chrome-extension-to-kill-my-22month-dictation-subscription-54km</guid>
      <description>&lt;p&gt;Last year I crossed a line with subscription fatigue.&lt;/p&gt;

&lt;p&gt;I was mid-sprint, voice-drafting a long client email, and my dictation tool flashed a notification: &lt;em&gt;"You've used 80% of your monthly minutes."&lt;/em&gt; I had a week left in the billing cycle. I could upgrade to the next plan tier — $38/month — or just… start typing again.&lt;/p&gt;

&lt;p&gt;I closed the notification and started building Aurai instead.&lt;/p&gt;

&lt;h2&gt;What Aurai does&lt;/h2&gt;

&lt;p&gt;Aurai is a Chrome extension that turns your voice into clean, polished text and pastes it directly into whatever field you were typing in: Gmail, Slack, Notion, or any textarea or contenteditable on the web.&lt;/p&gt;

&lt;p&gt;But it's not just speech-to-text. Raw STT output is messy: run-on sentences, missing punctuation, and homophones or mishearings that fool the model ("their/there", "tortoise invoice" instead of "total invoice"). Aurai passes your transcript through a Gemini refinement layer that fixes these in context and applies proper formatting before anything hits your text box.&lt;/p&gt;

&lt;p&gt;There's also a tone-shifting feature: you can flag your dictation as needing a Professional, Excited, or Persuasive rewrite. Useful when you're thinking out loud in casual language but need polished output.&lt;/p&gt;

&lt;h2&gt;The BYOK architecture&lt;/h2&gt;

&lt;p&gt;The core design decision — and the one I'd want other builders to think about — is &lt;strong&gt;Bring Your Own Key&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most AI-powered tools work like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → Your App (backend) → AI API → Your App → User
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You pay the API bill, mark it up, and charge a subscription. It's a valid business model, but it creates a few problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You're paying API bills before you have any paying users&lt;/li&gt;
&lt;li&gt;Your users' data flows through your servers&lt;/li&gt;
&lt;li&gt;You need rate limiting, abuse prevention, key rotation...&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Aurai flips it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User (with their own API key) → AI API directly → User
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The extension runs entirely in the browser. The user's API key (stored in &lt;code&gt;chrome.storage.sync&lt;/code&gt;, never transmitted to me) is used to call the Gemini API directly from the extension. My server involvement: zero.&lt;/p&gt;
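&lt;p&gt;To make the direct-call flow concrete, here is a minimal sketch of what such a request could look like. It is not Aurai's actual source: the helper names, the &lt;code&gt;geminiApiKey&lt;/code&gt; storage key, and the default model name are assumptions made for illustration. The endpoint shape and the &lt;code&gt;x-goog-api-key&lt;/code&gt; header follow the public Gemini REST API.&lt;/p&gt;

```javascript
// Hypothetical helper: builds the fetch arguments for a direct Gemini call.
// The default model name is an assumption; swap in whichever model you use.
function buildGeminiRequest(apiKey, prompt, model = "gemini-2.0-flash") {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": apiKey, // sent straight to Google, never to a middle server
      },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  };
}

// Joins the text parts of the first candidate in a generateContent response.
function extractText(response) {
  const parts = response?.candidates?.[0]?.content?.parts ?? [];
  return parts.map(function (part) { return part.text ?? ""; }).join("");
}

// Extension-only part: needs the "storage" permission.
// The "geminiApiKey" storage key is an assumed name.
async function refine(prompt) {
  const { geminiApiKey } = await chrome.storage.sync.get("geminiApiKey");
  const { url, options } = buildGeminiRequest(geminiApiKey, prompt);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Gemini request failed: ${res.status}`);
  return extractText(await res.json());
}
```

&lt;p&gt;Keeping the request builder separate from the &lt;code&gt;chrome.storage&lt;/code&gt; and &lt;code&gt;fetch&lt;/code&gt; calls means the pure part can be unit-tested outside the browser.&lt;/p&gt;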

&lt;p&gt;&lt;strong&gt;What this means in practice:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No backend to maintain&lt;/li&gt;
&lt;li&gt;No API costs to absorb&lt;/li&gt;
&lt;li&gt;No data retention liability&lt;/li&gt;
&lt;li&gt;No usage caps on my side: the only limit is the user's own API quota&lt;/li&gt;
&lt;li&gt;Free Google AI Studio keys cover most individual usage comfortably&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The technical challenge: injecting text into modern web apps&lt;/h2&gt;

&lt;p&gt;This was the part that took the longest.&lt;/p&gt;

&lt;p&gt;Clicking into a Gmail compose box and programmatically inserting text sounds trivial. In practice, modern web frameworks maintain their own virtual DOM and state, so if you just set &lt;code&gt;element.value = "..."&lt;/code&gt; or &lt;code&gt;element.textContent = "..."&lt;/code&gt;, the framework doesn't know the value changed — and the text may vanish on the next re-render or fail to trigger form validation.&lt;/p&gt;

&lt;p&gt;Here's the approach that ended up working across the most surfaces:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;injectText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;focus&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inputEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;InputEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;input&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;inputType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;insertText&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;bubbles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;cancelable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tagName&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;INPUT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tagName&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TEXTAREA&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Use the setter from the element's own prototype: the HTMLInputElement setter throws on a textarea&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;proto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tagName&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;INPUT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HTMLInputElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prototype&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HTMLTextAreaElement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prototype&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nativeInputValueSetter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOwnPropertyDescriptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;proto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;value&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?.&lt;/span&gt;&lt;span class="kd"&gt;set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;nativeInputValueSetter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;nativeInputValueSetter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dispatchEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputEvent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// For contenteditable divs (Gmail, Notion, Slack); execCommand is deprecated but still widely supported&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;insertText&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edge cases that still bite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CodeMirror / Monaco editors&lt;/strong&gt;: they keep their own document model and handle input themselves, so &lt;code&gt;execCommand&lt;/code&gt; doesn't reliably work.&lt;/li&gt;
&lt;li&gt;
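&lt;strong&gt;iframes&lt;/strong&gt;: a content script in the top frame can't reach into a cross-origin iframe, so the script has to be injected into that frame as well (for example, with &lt;code&gt;"all_frames": true&lt;/code&gt; in the manifest).&lt;/li&gt;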
&lt;li&gt;
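&lt;strong&gt;Shadow DOM&lt;/strong&gt;: some apps use shadow roots, so &lt;code&gt;document.activeElement&lt;/code&gt; only returns the shadow host and you need to walk the shadow tree to find the real target.&lt;/li&gt;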
&lt;/ul&gt;
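&lt;p&gt;For the Shadow DOM case, one way to walk the tree is to follow &lt;code&gt;activeElement&lt;/code&gt; through each nested shadow root. This is a sketch under assumptions, not Aurai's code: &lt;code&gt;deepActiveElement&lt;/code&gt; is a hypothetical helper name, and it only sees open shadow roots, because a closed root is not exposed through &lt;code&gt;element.shadowRoot&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical helper: follow activeElement through nested open shadow roots
// until reaching the element that actually has focus.
function deepActiveElement(root) {
  let active = root.activeElement ?? null;
  // shadowRoot is null for elements without a shadow tree (or with a closed one),
  // so the optional chain ends the loop at the innermost reachable element.
  while (active?.shadowRoot?.activeElement) {
    active = active.shadowRoot.activeElement;
  }
  return active;
}
```

&lt;p&gt;In a content script this would be called as &lt;code&gt;injectText(deepActiveElement(document), text)&lt;/code&gt;, so the text lands in the focused field rather than on its shadow host.&lt;/p&gt;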

&lt;h2&gt;The Gemini prompting strategy&lt;/h2&gt;

&lt;p&gt;The refinement prompt matters a lot. Here's the rough structure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;You are a text editor assistant. You will receive raw voice dictation text.
Your job is to:
&lt;span class="p"&gt;1.&lt;/span&gt; Fix transcription errors (homophones, phonetic mistakes, misheard words)
&lt;span class="p"&gt;2.&lt;/span&gt; Add appropriate punctuation and sentence structure
&lt;span class="p"&gt;3.&lt;/span&gt; Preserve the speaker's intent and vocabulary
&lt;span class="p"&gt;4.&lt;/span&gt; Output ONLY the corrected text — no commentary, no explanation

&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;If tone shift requested&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="sx"&gt;Additionally,&lt;/span&gt; rewrite the text in a [Professional/Excited/Persuasive] tone
while preserving the core content.

Raw dictation:
"""
{transcript}
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key lessons from iterating on this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Output ONLY the corrected text"&lt;/strong&gt; is critical: without it, Gemini sometimes prefaces its answer with "Here is the corrected version:", which then gets pasted into the user's text box&lt;/li&gt;
&lt;li&gt;Short prompts perform better than elaborate ones for this task&lt;/li&gt;
&lt;li&gt;Tone shifting works much better as an extra instruction appended to the base prompt than as a second API call&lt;/li&gt;
&lt;/ul&gt;
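&lt;p&gt;The "append the tone instruction to the base prompt" idea can be sketched as a single builder function. The function and constant names here are hypothetical, and the instruction wording only approximates the structure shown above, so treat it as an illustration rather than the exact prompt Aurai ships.&lt;/p&gt;

```javascript
// Tones named in the post; the constant and function names are assumptions.
const TONES = ["Professional", "Excited", "Persuasive"];

// Builds one prompt: base instructions, optional tone instruction, then the transcript.
function buildRefinementPrompt(transcript, tone = null) {
  const lines = [
    "You are a text editor assistant. You will receive raw voice dictation text.",
    "Your job is to:",
    "1. Fix transcription errors (homophones, phonetic mistakes, misheard words)",
    "2. Add appropriate punctuation and sentence structure",
    "3. Preserve the speaker's intent and vocabulary",
    "4. Output ONLY the corrected text: no commentary, no explanation",
  ];
  if (tone !== null) {
    if (!TONES.includes(tone)) throw new Error(`Unknown tone: ${tone}`);
    // Appended to the same prompt, so a tone shift costs no extra API call.
    lines.push(
      "",
      `Additionally, rewrite the text in a ${tone} tone`,
      "while preserving the core content."
    );
  }
  lines.push("", "Raw dictation:", '"""', transcript.trim(), '"""');
  return lines.join("\n");
}
```

&lt;p&gt;Calling &lt;code&gt;buildRefinementPrompt(transcript)&lt;/code&gt; yields the plain clean-up prompt, while &lt;code&gt;buildRefinementPrompt(transcript, "Professional")&lt;/code&gt; adds the tone instruction to the same request.&lt;/p&gt;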

&lt;h2&gt;What I'd do differently&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;User onboarding for BYOK is harder than you think.&lt;/strong&gt; "Get a free API key from Google AI Studio" sounds easy, but for non-technical users, it's a friction point. I added a direct link inside the extension that opens the exact page, with a tooltip explaining what to copy. Still, it's the number-one support question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I should have built the tone-shifting UI before launch.&lt;/strong&gt; It was a late addition and the UX is a bit rough. Users want to set their default tone and forget it. Flagging this for v2.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;Aurai is free and available on the Chrome Web Store.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://chromewebstore.google.com/detail/aurai/aiglaeddgdgfbcoieieojceeffggdkhc" rel="noopener noreferrer"&gt;→ Install Aurai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've built BYOK products before, I'd love to hear how you handled onboarding — it's the biggest UX challenge in this model and I'm still iterating.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Sid. If Aurai saves you time, there's a support option via Gumroad inside the app.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
    </item>
  </channel>
</rss>
