<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Simangaliso Vilakazi</title>
    <description>The latest articles on DEV Community by Simangaliso Vilakazi (@smngvlkz).</description>
    <link>https://dev.to/smngvlkz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1080878%2Fda94027c-c16f-4fc8-9f3b-46a5f916d696.png</url>
      <title>DEV Community: Simangaliso Vilakazi</title>
      <link>https://dev.to/smngvlkz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/smngvlkz"/>
    <language>en</language>
    <item>
      <title>How I Replaced Gemini with a Self-Hosted LLM for Two Production Apps</title>
      <dc:creator>Simangaliso Vilakazi</dc:creator>
      <pubDate>Sat, 27 Jun 2026 13:56:38 +0000</pubDate>
      <link>https://dev.to/smngvlkz/how-i-replaced-gemini-with-a-self-hosted-llm-for-two-production-apps-3069</link>
      <guid>https://dev.to/smngvlkz/how-i-replaced-gemini-with-a-self-hosted-llm-for-two-production-apps-3069</guid>
      <description>&lt;p&gt;A while back I wrote about &lt;a href="https://dev.to/smngvlkz/a-calm-terminal-inspired-portfolio-focused-on-shipped-products-ga8"&gt;my terminal-inspired portfolio&lt;/a&gt; and the products it indexes. Two of those products lean on a language model: the portfolio terminal at &lt;a href="https://smngvlkz.com" rel="noopener noreferrer"&gt;smngvlkz.com&lt;/a&gt; that you can ask questions, and &lt;a href="https://paychasers.com" rel="noopener noreferrer"&gt;PayChasers&lt;/a&gt;, which generates optional payment follow-up emails. Both started on Google's Gemini 3 Flash. Both now run on a model I host myself, with a fallback chain that keeps them alive when my hardware is not.&lt;/p&gt;

&lt;p&gt;This is the story of that move: the experiment that started it, why I committed to it, what the architecture looks like, the night it broke, and the parts I still have not solved.&lt;/p&gt;

&lt;h2&gt;
  
  
  It started as an experiment
&lt;/h2&gt;

&lt;p&gt;When Qwen 3.5 was announced, it made me curious about how far open models have actually come. Instead of reading benchmarks, I tested it the way I like to learn things, by running it.&lt;/p&gt;

&lt;p&gt;It began as a small experiment on my base Mac mini. I pulled Qwen through &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; just to see how capable the model would be running directly on a local machine. The results were far better than I expected. Good enough that I stopped thinking of it as a toy and started thinking about production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why move off Gemini at all
&lt;/h2&gt;

&lt;p&gt;Gemini 3 Flash worked. The integration was a few lines and the quality was good. So this was not a "the API is bad" story. It was three smaller pulls that added up.&lt;/p&gt;

&lt;p&gt;The first was cost shape. PayChasers generates optional email drafts on demand, and every preview is a few thousand tokens of system prompt plus output. That is fine at zero users and a slow leak at volume. The marginal cost of an inference I run on a machine I already own is electricity.&lt;/p&gt;

&lt;p&gt;The second was control and privacy. I wanted to choose the model, pin it, and change the prompt contract without a provider deprecating something underneath me. I also did not love sending client names and payment context to a third party when I did not have to.&lt;/p&gt;

&lt;p&gt;The third was the economics of treating AI as infrastructure rather than a metered API. Once the model runs on hardware I control, it stops being a per-call expense and becomes shared infrastructure that multiple applications can use. The same inference server now powers two different products. That reframing is the whole point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting it to production was the hard part
&lt;/h2&gt;

&lt;p&gt;The original plan was to host the model on Oracle Cloud using one of their free Ampere ARM instances in the Johannesburg region. If you have ever tried to get one, you know the struggle. Free tier ARM capacity is brutally limited, and after more than 200 automated retry attempts across two days, I still could not get one.&lt;/p&gt;

&lt;p&gt;So I pivoted. I wrote a lightweight reverse proxy, set up a Cloudflare Tunnel on one of my domains, and routed production traffic to the model running on my Mac at home. No ports opened on my home network, no static IP, just a tunnel from Cloudflare's edge to the machine on my desk.&lt;/p&gt;

&lt;p&gt;There is an honest tension here worth naming. Privacy was one of my reasons for leaving Gemini, and routing client data to a box on my desk is its own tradeoff. The tunnel keeps inbound ports closed and lets Cloudflare terminate TLS at the edge, and the reverse proxy sits in front of Ollama rather than exposing it directly. But this is a single machine I own, not a vendor's hardened, multi-tenant platform, and tightening access control on that endpoint, a service token rather than relying on an obscure hostname, is firmly on the list. Self-hosting moves the privacy boundary onto you, it does not remove it.&lt;/p&gt;

&lt;p&gt;It was meant to be temporary. The Oracle instance eventually did come through, but by then the home setup was working well, so I did not throw it away. Instead I kept the Mac mini as the primary and gave Oracle a different job, the always-on backup. More on that in a moment.&lt;/p&gt;

&lt;p&gt;This was a small full-circle moment. The Linux and infrastructure fundamentals I picked up during my bootcamp days and years of self-teaching showed up in a real production context. Provisioning tunnels, configuring DNS, writing a proxy service, setting up persistent services. All of it coming together for something real.&lt;/p&gt;

&lt;p&gt;One deliberate decision was to keep the infrastructure simple. There are a lot of frameworks and agent systems appearing in the space right now. I focused on straightforward tooling that solved the problems I actually had.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shape of the system
&lt;/h2&gt;

&lt;p&gt;The Mac mini, exposed through a Cloudflare Tunnel, is the &lt;strong&gt;primary&lt;/strong&gt;. It is fast but it is not always on, because it is a machine in my home. The Oracle Cloud VM is the &lt;strong&gt;fallback&lt;/strong&gt;. It is slower and smaller, but it stays up around the clock.&lt;/p&gt;

&lt;p&gt;Every app talks to a thin client that knows about both, tries the fast one first, and silently falls back to the reliable one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Vercel app
   |
   v
[ primary: Mac mini via Cloudflare tunnel ]  --fail/timeout--&amp;gt;  [ fallback: Oracle Cloud VM ]
        fast, not always on                                          slow, always on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The failover client
&lt;/h2&gt;

&lt;p&gt;This is the whole idea in one function. Hit the primary with a timeout. If anything goes wrong, whether a bad status, a timeout, or a dropped tunnel, fall through to the fallback.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PRIMARY_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OLLAMA_PRIMARY_URL&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;FALLBACK_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OLLAMA_FALLBACK_URL&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;PRIMARY_URL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchWithFallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;PRIMARY_URL&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AbortSignal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Primary failed (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;FALLBACK_URL&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unknown error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Ollama request failed (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;): &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few small choices that matter more than they look:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The primary gets a &lt;strong&gt;15 second timeout&lt;/strong&gt;, the fallback does not. The thinking was that the fallback's job is to answer at all, so I let it take its time. In practice that means an unbounded &lt;code&gt;fetch&lt;/code&gt;, which can hang if Oracle is reachable but wedged. A long timeout would be the more defensible version of the same idea, and I have not added one yet.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;catch&lt;/code&gt; swallows why the primary failed, no log, no signal. Fine for failing over, bad for diagnosing, and something I would tighten before I called this production hardened.&lt;/li&gt;
&lt;li&gt;The fallback URL &lt;strong&gt;defaults to the primary&lt;/strong&gt;, so the same code runs locally with one Ollama instance and no special config.&lt;/li&gt;
&lt;li&gt;Failover is &lt;strong&gt;transparent&lt;/strong&gt;. The caller never knows which machine answered.&lt;/li&gt;
&lt;/ul&gt;
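
&lt;p&gt;The first two points could be tightened along these lines. This is a minimal sketch under stated assumptions, not the code running in production: the 60 second fallback timeout, the &lt;code&gt;postJson&lt;/code&gt; helper, and the injectable &lt;code&gt;fetcher&lt;/code&gt; and &lt;code&gt;log&lt;/code&gt; parameters are all hypothetical, added so the behaviour can be exercised without a real Ollama instance.&lt;/p&gt;

```typescript
// Sketch only: names and timeout values below are illustrative assumptions.
const PRIMARY_TIMEOUT_MS = 15000;
const FALLBACK_TIMEOUT_MS = 60000; // hypothetical; tune to the slower machine

// POST a JSON body with a hard deadline and treat non-2xx as a failure.
async function postJson(fetcher: typeof fetch, url: string, body: object, timeoutMs: number) {
    const res = await fetcher(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
        signal: AbortSignal.timeout(timeoutMs),
    });
    if (!res.ok) throw new Error(`Request failed (${res.status})`);
    return res;
}

async function fetchWithBoundedFallback(
    primaryUrl: string,
    fallbackUrl: string,
    path: string,
    body: object,
    fetcher: typeof fetch = fetch,
    log: (message: string) => void = console.warn,
) {
    try {
        return await postJson(fetcher, `${primaryUrl}${path}`, body, PRIMARY_TIMEOUT_MS);
    } catch (err) {
        // Record why the primary failed before failing over.
        const reason = err instanceof Error ? err.message : String(err);
        log(`Primary failed, using fallback: ${reason}`);
        // The fallback is still allowed to be slow, but no longer unbounded.
        return postJson(fetcher, `${fallbackUrl}${path}`, body, FALLBACK_TIMEOUT_MS);
    }
}
```

&lt;p&gt;Failover stays transparent to the caller. The only visible differences are a log line when the primary is skipped and an error, rather than a hang, if the fallback is wedged.&lt;/p&gt;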

&lt;h2&gt;
  
  
  Load-aware model selection
&lt;/h2&gt;

&lt;p&gt;Running your own models means you also get to decide which model serves which request. I do a very simple version of routing based on how many requests are in flight.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;activeRequests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;selectModel&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1 request: best quality. 2+: lighter model that handles concurrency.&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;activeRequests&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;FALLBACK_MODEL&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PRIMARY_MODEL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The intent is that a single visitor gets &lt;code&gt;qwen3.5:latest&lt;/code&gt;, the better model, and the moment requests overlap, new ones drop to &lt;code&gt;qwen2.5-coder:7b&lt;/code&gt;, which is lighter under concurrency. It is one counter and a ternary, the cost and quality trade-off in miniature.&lt;/p&gt;

&lt;p&gt;I will be honest about how well it actually works, because it is more idea than guarantee. &lt;code&gt;activeRequests&lt;/code&gt; is a module-level counter, so on serverless it only sees concurrency inside a single warm instance, not across the fleet. Worse, in the streaming path it is decremented in a &lt;code&gt;finally&lt;/code&gt; that runs when the function returns the &lt;code&gt;Response&lt;/code&gt;, which is before the stream has finished generating. So for the streaming features, which is most of what these apps do, the counter is near zero almost all the time and the downgrade rarely fires. It works on the non-streaming path, where the count wraps the full call. Right now it is more a hook I reached for early than a load balancer that earns the name.&lt;/p&gt;
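
&lt;p&gt;One possible fix for the streaming path, sketched here as an assumption rather than something already shipped, is to tie the decrement to the end of the stream instead of the return of the handler. The &lt;code&gt;trackStream&lt;/code&gt; and &lt;code&gt;streamingResponse&lt;/code&gt; helpers below are hypothetical names. This does nothing for the cross-instance problem on serverless, where each warm instance still keeps its own counter.&lt;/p&gt;

```typescript
let activeRequests = 0;

// Hypothetical helper: wraps a body so `onDone` fires exactly once,
// when the stream ends, errors, or is cancelled by the client.
function trackStream(body: ReadableStream, onDone: () => void): ReadableStream {
    const reader = body.getReader();
    let finished = false;
    const finish = () => {
        if (finished) return;
        finished = true;
        onDone();
    };
    return new ReadableStream({
        async pull(controller) {
            try {
                const { done, value } = await reader.read();
                if (done) {
                    finish();
                    controller.close();
                } else {
                    controller.enqueue(value);
                }
            } catch (err) {
                finish();
                controller.error(err);
            }
        },
        async cancel(reason) {
            finish();
            await reader.cancel(reason);
        },
    });
}

// The counter now covers the whole generation, not just the handler call.
function streamingResponse(upstream: Response): Response {
    if (upstream.body === null) return upstream;
    activeRequests++;
    const tracked = trackStream(upstream.body, () => {
        activeRequests--;
    });
    return new Response(tracked, { headers: { "Content-Type": "text/event-stream" } });
}
```

&lt;p&gt;With this shape, a second request that arrives while the first is still generating would see the counter at 1 rather than 0.&lt;/p&gt;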

&lt;p&gt;I also pass two Ollama options that earn their keep:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;keep_alive: -1&lt;/code&gt; keeps the model resident in memory so the next request does not pay the cold load.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;think: false&lt;/code&gt; turns off the reasoning tokens, because for a portfolio terminal and an email draft I want the answer, not the monologue.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Not everything should hit the model
&lt;/h2&gt;

&lt;p&gt;The cheapest inference is the one you never run. Previously my portfolio terminal used Gemini 3 Flash for natural language queries while common commands were handled locally without AI. I kept that split when I moved the natural language layer onto my own infrastructure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lowerQuery&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lowerQuery&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;help&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* return static command list */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lowerQuery&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;list all&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* return products + systems from data */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lowerQuery&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;show activity&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* return GitHub/GitLab stats */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;showMatch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;lowerQuery&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^show&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;([\w&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)\s&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;(\w&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;showMatch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* answer straight from structured data */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// only open-ended natural language falls through to the model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;help&lt;/code&gt;, &lt;code&gt;list&lt;/code&gt;, &lt;code&gt;show&lt;/code&gt;, and &lt;code&gt;explain&lt;/code&gt; are answered straight from the typed data. Only genuinely open-ended questions stream from the model. It is faster, it is free, and it is more reliable than asking a 7B model to format a list it could get wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming the answer
&lt;/h2&gt;

&lt;p&gt;For the open-ended path, the portfolio streams tokens over server-sent events. Ollama returns newline-delimited JSON, so the route reads the body, splits on newlines, and re-emits each token as an SSE frame.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;decoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TextDecoder&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;decoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s2"&gt;`data: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;token&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="p"&gt;})}&lt;/span&gt;&lt;span class="s2"&gt;\n\n`&lt;/span&gt;
            &lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
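
&lt;p&gt;The buffering step is the part that is easy to get wrong, because a network chunk can end in the middle of a JSON line. As a rough sketch, the same logic can be pulled into a pure function that is testable without a network. The &lt;code&gt;drainLines&lt;/code&gt; name is hypothetical, and skipping malformed lines instead of throwing is an assumption about the desired behaviour, not what the loop above does.&lt;/p&gt;

```typescript
// Hypothetical helper: takes the leftover buffer plus newly decoded text,
// returns the tokens from every complete line and the new leftover.
function drainLines(buffer: string, incoming: string) {
    const parts = (buffer + incoming).split("\n");
    // The last element is either "" or an incomplete line; keep it for next time.
    const rest = parts.pop() ?? "";
    const tokens: string[] = [];
    for (const line of parts) {
        if (!line.trim()) continue;
        try {
            const chunk = JSON.parse(line);
            const token = chunk?.message?.content;
            if (typeof token !== "string" || token === "") continue;
            tokens.push(token);
        } catch {
            // Assumption: one malformed line should not kill the whole stream.
        }
    }
    return { rest, tokens };
}

// A JSON object split across two network chunks is reassembled correctly.
const first = drainLines("", '{"message":{"content":"Hel"}}\n{"message":{"con');
const second = drainLines(first.rest, 'tent":"lo"}}\n');
console.log(first.tokens.concat(second.tokens).join(""));
```

&lt;p&gt;If the upstream closes without a trailing newline, calling &lt;code&gt;drainLines(rest, "\n")&lt;/code&gt; once after the loop flushes the final line.&lt;/p&gt;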



&lt;p&gt;Both products stream responses token by token and run entirely on infrastructure I control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Constraining the model so the output is usable
&lt;/h2&gt;

&lt;p&gt;PayChasers is where the prompt work actually lives, because the output is not a chat bubble, it is an email that gets sent to someone's client. Two things make a self-hosted 7B model reliable enough for that.&lt;/p&gt;

&lt;p&gt;First, the model never writes real values. It writes placeholders, and the app fills them in. This keeps the model from hallucinating an amount or a name.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CRITICAL: You MUST use these exact placeholder variables instead of real values:
- {clientName} for the recipient's name
- {dueDate} for the due date
- {amount} for the amount owed
- {daysOverdue} for the number of days overdue

For example: "Hey {clientName}," NOT "Hey John,".
Return ONLY valid JSON.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
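
&lt;p&gt;The fill-in step on the app side can be very small. The following is a minimal sketch, assuming the model's draft arrives as a plain string. The &lt;code&gt;fillPlaceholders&lt;/code&gt; name and the sample values are hypothetical.&lt;/p&gt;

```typescript
// Values come from the app's own records, never from the model.
type ReminderValues = { [placeholder: string]: string | number };

// Replaces {name} tokens with known values; unknown tokens are left
// untouched so they are easy to spot in review instead of vanishing.
function fillPlaceholders(template: string, values: ReminderValues): string {
    return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
        Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
    );
}

const draft = "Hey {clientName}, {amount} was due on {dueDate} ({daysOverdue} days ago).";
console.log(
    fillPlaceholders(draft, {
        clientName: "Thandi",
        dueDate: "1 June",
        amount: "R1,500",
        daysOverdue: 9,
    })
);
```

&lt;p&gt;Because substitution happens after generation, every amount and name in the final email comes from the app's data rather than from the model.&lt;/p&gt;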



&lt;p&gt;Second, the tone escalates with how late the payment is, decided in code, not left to the model's mood.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;determineTone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;daysOverdue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;daysOverdue&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;urgent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;daysOverdue&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;firm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;friendly&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And because a local model will occasionally wrap its JSON in a code fence or a stray &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; block no matter how firmly you ask, the parser is defensive rather than trusting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;extractJson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&amp;lt;think&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;[\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;think&amp;gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/```&lt;/span&gt;&lt;span class="se"&gt;(?:&lt;/span&gt;&lt;span class="sr"&gt;json&lt;/span&gt;&lt;span class="se"&gt;)?\s&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;([\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;?)&lt;/span&gt;&lt;span class="sr"&gt;```/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;first&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;indexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;last&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lastIndexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;first&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;last&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;first&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;first&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;last&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Self-hosting a smaller model means trading some of the provider's polish for parsing code you maintain yourself. That is a fair trade when the upside is control and cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  The night the power went out
&lt;/h2&gt;

&lt;p&gt;Then I learned the lesson that every self-hoster learns eventually.&lt;/p&gt;

&lt;p&gt;There was a small power outage one night around 20:00. The Mac mini, my primary inference node, switched off, and it never came back on. I only realised the next morning.&lt;/p&gt;

&lt;p&gt;PayChasers failed over to the Oracle backup automatically, exactly as it should have. But the floating terminal in my portfolio had no failover, so it just sat there dead all night. Anyone who was bored enough to try and poke at my portfolio that night got nothing.&lt;/p&gt;

&lt;p&gt;Two lessons came out of that morning:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Every service that needs inference needs a failover.&lt;/strong&gt; Not just the ones I remembered to set it up for. The portfolio terminal got the same &lt;code&gt;fetchWithFallback&lt;/code&gt; client that PayChasers already had.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A 12-hour outage I did not even notice is a monitoring problem,&lt;/strong&gt; not just me being forgetful. Mostly. Partly forgetful.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Self-hosting your own AI is great until you are the one on call at 8am on a Saturday, and there is no one else to escalate to, because it is your own thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Knowing when the homelab is down
&lt;/h2&gt;

&lt;p&gt;So I built the monitoring I should have had first. PayChasers runs a small cron that health-checks both Ollama endpoints and emails me, but only on &lt;strong&gt;state transition&lt;/strong&gt;, up to down or down to up. It keeps the last known state in Upstash Redis so it does not spam me every 5 minutes while the mini is asleep.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ENDPOINTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;primary-mac&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OLLAMA_PRIMARY_URL&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fallback-oracle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OLLAMA_FALLBACK_URL&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="c1"&gt;// hit /api/tags on each, compare to stored state in Redis,&lt;/span&gt;
&lt;span class="c1"&gt;// send a Resend email only when ok flips. Auth via CRON_SECRET.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
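
&lt;p&gt;The "only when ok flips" part is the piece worth getting right, so here is a minimal sketch of that comparison on its own. It assumes the health results have already been fetched and the previous state has already been read from the store; &lt;code&gt;findTransitions&lt;/code&gt; and its types are hypothetical names. It also assumes an endpoint with no stored state was previously up, so a first run only alerts on something that is down.&lt;/p&gt;

```typescript
// Hypothetical types: previous state may be missing on the first run.
type StoredState = { [name: string]: boolean | undefined };
type CurrentState = { [name: string]: boolean };
type Transition = { name: string; from: "up" | "down"; to: "up" | "down" };

// Compare the latest health results with the stored state and return
// only the endpoints whose status changed since the last run.
function findTransitions(previous: StoredState, current: CurrentState): Transition[] {
    const transitions: Transition[] = [];
    for (const name of Object.keys(current)) {
        const was = previous[name] ?? true; // assumption: unknown means "was up"
        const now = current[name];
        if (was === now) continue; // steady state: stay quiet
        transitions.push({
            name,
            from: was ? "up" : "down",
            to: now ? "up" : "down",
        });
    }
    return transitions;
}

const stored = { "primary-mac": true, "fallback-oracle": true };
const latest = { "primary-mac": false, "fallback-oracle": true };
const changes = findTransitions(stored, latest);
console.log(changes); // one entry: primary-mac went from up to down
```

&lt;p&gt;Once &lt;code&gt;latest&lt;/code&gt; has been written back as the stored state, the next run with the mini still down returns an empty list, which is what keeps the inbox quiet.&lt;/p&gt;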



&lt;p&gt;Now when the mini goes offline, traffic quietly shifts to Oracle and I get exactly one email telling me so. That is the entire operations story, and that is the amount of operations story I want for a side project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I have not solved
&lt;/h2&gt;

&lt;p&gt;I want to be honest about the edges, because the architecture above is the easy part.&lt;/p&gt;

&lt;p&gt;My evaluation is still vibes. I read the generated emails, decide they look good, and ship. I do not have an eval harness scoring tone, placeholder correctness, or JSON validity across a fixed set of cases. I should. When I claim qwen3.5 is "better" than qwen2.5-coder for a request, that is intuition, not a benchmark.&lt;/p&gt;

&lt;p&gt;The irony is that the plumbing is already there. PayChasers runs PostHog for the product funnel: signups, chases created, upgrades. Capturing AI events would be trivial. A &lt;code&gt;draft_generated&lt;/code&gt;, &lt;code&gt;draft_accepted&lt;/code&gt;, &lt;code&gt;draft_edited&lt;/code&gt;, &lt;code&gt;draft_regenerated&lt;/code&gt; funnel would tell me, with real users, how often a generated email ships untouched versus gets rewritten. That acceptance rate is a real quality signal, and it is the cheapest first step from vibes towards measurement. I just have not wired it yet.&lt;/p&gt;
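
&lt;p&gt;For what it is worth, the metric itself is a single division. This is a sketch under the assumption that the four event counts can be exported for a given period; the &lt;code&gt;DraftCounts&lt;/code&gt; type and &lt;code&gt;acceptanceRate&lt;/code&gt; function are hypothetical, and the numbers in the usage lines are made up.&lt;/p&gt;

```typescript
// Hypothetical event counts for one period, keyed by event name.
type DraftCounts = {
    draft_generated: number;
    draft_accepted: number;
    draft_edited: number;
    draft_regenerated: number;
};

// Share of generated drafts that shipped untouched, as a fraction from 0 to 1.
// Returns null when nothing was generated, to avoid dividing by zero.
function acceptanceRate(counts: DraftCounts): number | null {
    if (counts.draft_generated === 0) return null;
    return counts.draft_accepted / counts.draft_generated;
}

// Made-up numbers: 30 of 40 drafts accepted as-is.
const rate = acceptanceRate({
    draft_generated: 40,
    draft_accepted: 30,
    draft_edited: 8,
    draft_regenerated: 2,
});
console.log(rate); // 0.75
```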

&lt;p&gt;My model selection is instinct, not measurement. I picked these Qwen models because they ran well on my hardware and read well in practice. A systematic version would measure latency, quality, and cost per model and route on data.&lt;/p&gt;

&lt;p&gt;And I have not touched retrieval. Both apps stuff their full context into the system prompt, which is fine at this size and would fall apart the moment the data outgrew the window. There is no RAG here, and I have not yet had to reach for it.&lt;/p&gt;

&lt;p&gt;I am pointing at these on purpose. The move off Gemini taught me serving, the cost and reliability tradeoff, basic routing, and prompt constraining by doing them. The next layer, real evaluation and measured model choice, is the part I am learning now.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it adds up to
&lt;/h2&gt;

&lt;p&gt;Open models have come a long way. It is becoming genuinely practical to run useful AI systems on relatively small infrastructure. No GPU cluster required. What started as a small experiment on a base mini is now live for real users across two products, on infrastructure I own.&lt;/p&gt;

&lt;p&gt;This is not a finished system. It is a snapshot of how I run a model I control today, and a map of what I am building next.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>selfhosted</category>
      <category>nextjs</category>
      <category>ollama</category>
    </item>
    <item>
      <title>I built a Zapier integration for my solo SaaS. Here's every decision I made</title>
      <dc:creator>Simangaliso Vilakazi</dc:creator>
      <pubDate>Mon, 06 Apr 2026 11:05:33 +0000</pubDate>
      <link>https://dev.to/smngvlkz/i-built-a-zapier-integration-for-my-solo-saas-heres-every-decision-i-made-2fm1</link>
      <guid>https://dev.to/smngvlkz/i-built-a-zapier-integration-for-my-solo-saas-heres-every-decision-i-made-2fm1</guid>
      <description>&lt;p&gt;I am a full-stack software engineer in Durban, South Africa. I work  full-time and build SaaS products after hours. PayChasers is one of them. It's a payment chasing tool for anyone who's owed money and tired of asking twice.&lt;/p&gt;

&lt;p&gt;Today I submitted it to Zapier's partner program. Here's exactly how it went and what I would do differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  What PayChasers does
&lt;/h2&gt;

&lt;p&gt;PayChasers chases payments for invoices, rent, deposits, failed payments, and personal debts: anything where someone owes you money and you'd rather not send awkward reminders yourself.&lt;/p&gt;

&lt;p&gt;You add a chase. You choose how to handle it. Send a follow-up manually when you're ready, or turn on Auto Chase and let PayChasers handle the escalation for you: friendly nudge, due today, overdue, firm. Email or WhatsApp. The tone sharpens as the payment ages. You stay in control of every message. Nothing goes out without your templates, your tone, your timing.&lt;/p&gt;

&lt;p&gt;It works. But the product lives in isolation. If someone tracks invoices in Google Sheets, manages deals in HubSpot, or processes payments through Stripe, they have to manually create chases in PayChasers. That's friction. Friction kills adoption.&lt;/p&gt;

&lt;p&gt;So I built a Zapier integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Zapier and not direct integrations
&lt;/h2&gt;

&lt;p&gt;I considered building native integrations. A Stripe webhook listener, a Google Sheets sync, a HubSpot connector. Each one would take days. Each one adds a maintenance surface. Each one serves one app.&lt;/p&gt;

&lt;p&gt;Zapier serves 7000+ apps with one integration.&lt;/p&gt;

&lt;p&gt;I built one API surface and let Zapier be the bridge. A user who wants "HubSpot deal closes -&amp;gt; create a chase in PayChasers" can build that in 2 minutes without me writing a single line of HubSpot code.&lt;/p&gt;

&lt;p&gt;The trade-off is control. I can't fine-tune each experience. But for a solo engineer, the leverage is hard to beat.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the integration exposes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Triggers (things that happen in PayChasers)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;New Chase&lt;/strong&gt; - fires when a user creates a chase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chase Overdue&lt;/strong&gt; - fires when a chase passes its due date&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chase Paid&lt;/strong&gt; - fires when a chase is marked as paid&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Follow-up Sent&lt;/strong&gt; - fires when an automated follow-up email goes out&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Actions (things Zapier can do in PayChasers)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Create Chase&lt;/strong&gt; - create a new chase with client details, amount, due date, currency, and chase type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Send Follow-up&lt;/strong&gt; - trigger a follow-up on an existing chase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mark as Paid&lt;/strong&gt; - mark a chase as paid&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Auth
&lt;/h3&gt;

&lt;p&gt;API key authentication. Users generate a key in Settings, paste it into Zapier. Simple, no OAuth dance. The key is hashed with SHA-256 before storage. I never see the raw key after it's generated.&lt;/p&gt;
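
&lt;p&gt;A minimal sketch of that flow with Node's built-in &lt;code&gt;crypto&lt;/code&gt; module follows. The function names and the &lt;code&gt;pc_&lt;/code&gt; key prefix are assumptions for illustration, not the actual PayChasers implementation; only the SHA-256 hashing step comes from the description above.&lt;/p&gt;

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Generate a key to show the user exactly once. The "pc_" prefix is made up.
function generateApiKey(): string {
    return "pc_" + randomBytes(24).toString("hex");
}

// Only this hex digest is stored; the raw key is never persisted.
function hashApiKey(rawKey: string): string {
    return createHash("sha256").update(rawKey).digest("hex");
}

// Hash the presented key and compare the digests in constant time.
function verifyApiKey(presentedKey: string, storedHash: string): boolean {
    const presented = Buffer.from(hashApiKey(presentedKey), "hex");
    const stored = Buffer.from(storedHash, "hex");
    // timingSafeEqual throws on unequal lengths, so guard first.
    if (presented.length !== stored.length) return false;
    return timingSafeEqual(presented, stored);
}

const rawKey = generateApiKey();      // shown to the user once
const storedHash = hashApiKey(rawKey); // what goes in the database
console.log(verifyApiKey(rawKey, storedHash));
```

&lt;p&gt;In a real request handler the hash would more likely be used as the lookup value in the database query, which makes the explicit comparison unnecessary. It is included here to keep the sketch self-contained.&lt;/p&gt;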

&lt;h2&gt;
  
  
  The API layer
&lt;/h2&gt;

&lt;p&gt;I built a &lt;code&gt;/api/v1/&lt;/code&gt; REST API specifically for this. It's separate from the internal app routes. Endpoints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET    /api/v1/me                   — verify auth
GET    /api/v1/chases               — list chases
POST   /api/v1/chases               — create a chase
POST   /api/v1/chases/:id/paid      — mark paid
POST   /api/v1/chases/:id/follow-up — send follow-up
GET    /api/v1/clients              — list clients
GET    /api/v1/triggers/*           — polling triggers for Zapier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The triggers use Zapier's polling model. Zapier hits the trigger endpoint every few minutes, and I return items sorted by most recent. Zapier deduplicates by ID.&lt;/p&gt;
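
&lt;p&gt;The shape of a polling response can be sketched as below. The &lt;code&gt;Chase&lt;/code&gt; fields and the &lt;code&gt;toTriggerPayload&lt;/code&gt; helper are hypothetical, so treat this as an illustration of "newest first, every item has a stable &lt;code&gt;id&lt;/code&gt;" and not as the real endpoint code.&lt;/p&gt;

```typescript
// Hypothetical chase shape; the real API fields may differ.
type Chase = { id: string; clientName: string; createdAt: string };

// Newest first and capped, with a stable unique id on every item so the
// polling client can skip anything it has already seen.
function toTriggerPayload(chases: Chase[], limit: number): Chase[] {
    const sorted = chases.slice().sort(function (a: Chase, b: Chase) {
        return Date.parse(b.createdAt) - Date.parse(a.createdAt);
    });
    return sorted.slice(0, limit);
}

const payload = toTriggerPayload(
    [
        { id: "chase_1", clientName: "Acme", createdAt: "2026-04-01T09:00:00Z" },
        { id: "chase_3", clientName: "Initech", createdAt: "2026-04-03T09:00:00Z" },
        { id: "chase_2", clientName: "Globex", createdAt: "2026-04-02T09:00:00Z" },
    ],
    2,
);
console.log(payload.map(function (c: Chase) { return c.id; })); // [ 'chase_3', 'chase_2' ]
```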

&lt;p&gt;I considered webhooks (Zapier's REST Hook model) but polling was simpler to implement and doesn't require me to manage subscription state. For a product at my scale, polling every few minutes is fine and I don't see any issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  The embed strategy
&lt;/h2&gt;

&lt;p&gt;Zapier offers a "Full Zapier Experience" embed. A JavaScript widget that lets users discover and build "Zaps" directly inside your app. I built the page for it but designed it with two states:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before approval:&lt;/strong&gt; An informational page explaining what's coming. No dead UI, no broken embed. Just a clear explanation of what Zapier automation enables and what it will look like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After approval:&lt;/strong&gt; Set one environment variable (&lt;code&gt;NEXT_PUBLIC_ZAPIER_FZE_CLIENT_ID&lt;/code&gt;), redeploy, and the embed loads. Users can browse Zap templates and build automations without leaving PayChasers.&lt;/p&gt;

&lt;p&gt;Zero code changes needed when the switch flips. One env var.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ref-based signup attribution
&lt;/h2&gt;

&lt;p&gt;This is the part I almost skipped and I am glad I didn't.&lt;/p&gt;

&lt;p&gt;When someone clicks through from a Zap template to sign up, I wanted to know &lt;em&gt;which&lt;/em&gt; template drove them. So the signup page accepts a &lt;code&gt;?ref=&lt;/code&gt; parameter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/signup?ref=zapier-stripe&lt;/code&gt; → "Chase failed Stripe payments automatically."&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/signup?ref=zapier-sheets&lt;/code&gt; → "Turn your spreadsheet into a payment chasing machine."&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/signup?ref=zapier-hubspot&lt;/code&gt; → "Close the deal. Chase the payment. Automatically."&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/signup?ref=zapier-forms&lt;/code&gt; → "From form submission to follow-up in seconds."&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/signup?ref=zapier&lt;/code&gt; → "Automate your payment follow-ups."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each ref swaps the hero headline and subtitle on the signup page. The person sees copy that matches their intent.&lt;/p&gt;
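
&lt;p&gt;One way to sketch the lookup is a plain map with a fallback. The headlines are copied from the list above; the &lt;code&gt;resolveSignupRef&lt;/code&gt; function, the default headline, and the choice to discard unknown refs are assumptions made for this example.&lt;/p&gt;

```typescript
// Headlines copied from the list above.
const HEADLINES: { [ref: string]: string } = {
    "zapier-stripe": "Chase failed Stripe payments automatically.",
    "zapier-sheets": "Turn your spreadsheet into a payment chasing machine.",
    "zapier-hubspot": "Close the deal. Chase the payment. Automatically.",
    "zapier-forms": "From form submission to follow-up in seconds.",
    "zapier": "Automate your payment follow-ups.",
};

// Made-up default copy for visitors who arrive without a known ref.
const DEFAULT_HEADLINE = "Stop chasing payments by hand.";

// Unknown or missing refs fall back to the default. Only known refs are
// returned for storage, so arbitrary query strings never reach the database.
function resolveSignupRef(search: string): { ref: string | null; headline: string } {
    const raw = new URLSearchParams(search).get("ref");
    if (raw === null || !Object.prototype.hasOwnProperty.call(HEADLINES, raw)) {
        return { ref: null, headline: DEFAULT_HEADLINE };
    }
    return { ref: raw, headline: HEADLINES[raw] };
}

console.log(resolveSignupRef("?ref=zapier-stripe").headline);
console.log(resolveSignupRef("?ref=something-else").ref); // null
```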

&lt;p&gt;The ref is also stored on the user record in the database as &lt;code&gt;signupRef&lt;/code&gt;. After 30 days of being live I can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;signupRef&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"User"&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;signupRef&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;signupRef&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That tells me which integration is driving signups, which use case resonates, and where to double down in listing copy. Without it, I would know signups are happening but not why.&lt;/p&gt;

&lt;p&gt;One column. Massive signal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The partner review process
&lt;/h2&gt;

&lt;p&gt;Submitting to Zapier's partner program is straightforward but not instant. Here's how it went:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build the integration&lt;/strong&gt; on Zapier's developer platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Submit for review.&lt;/strong&gt; Their team assigns a reviewer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance check.&lt;/strong&gt; The reviewer checked my homepage, terms of service, and privacy policy to confirm the country of operation. He couldn't find it. Fair, it wasn't there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix and respond.&lt;/strong&gt; I added "South Africa" to my Terms of Service, Privacy Policy, and Contact page within the hour. Updated the Governing Law clause to reference South African law.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wait for Beta.&lt;/strong&gt; After passing review, the integration moves to Beta in the Zapier app directory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The compliance ask caught me off guard but makes sense. They're publishing your app in their directory, so they need to know where you operate.&lt;/p&gt;

&lt;p&gt;If you're building a Zapier integration, put your country and legal jurisdiction on your site before you submit. Save yourself a round trip.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with the API earlier.&lt;/strong&gt; The &lt;code&gt;/api/v1/&lt;/code&gt; layer is clean and useful beyond Zapier. It could serve a mobile app, a CLI, or any other client. I should have built it as a first-class citizen from day one instead of bolting it on for Zapier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't overthink trigger architecture.&lt;/strong&gt; I spent time debating polling vs webhooks. Polling is fine. Ship it. You can add webhooks later if scale demands it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add signup attribution from the start.&lt;/strong&gt; The &lt;code&gt;signupRef&lt;/code&gt; field took 10 minutes to add. It should have been there from the first signup form.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Put your legal jurisdiction on your site from day one.&lt;/strong&gt; Not just for Zapier, for any partnership, integration, or compliance review. It took me an hour to fix. It should have been there from the start.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current status
&lt;/h2&gt;

&lt;p&gt;PayChasers has paying users across multiple countries. We are early, but the signal is real.&lt;/p&gt;

&lt;p&gt;The integration has been submitted and is in review. The embed page is built and waiting. The attribution tracking is live. When the green light comes, it's one env var and a deploy.&lt;/p&gt;

&lt;p&gt;If you're an indie hacker thinking about integrations, Zapier is high leverage. One integration surface, thousands of connections. Build the API, submit the integration, track your refs, and let the platform do the distribution.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I am Simangaliso, building PayChasers from South Africa. If you've ever sent a "just following up" email manually, PayChasers was built for you. &lt;a href="https://paychasers.com" rel="noopener noreferrer"&gt;paychasers.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>indiehacker</category>
      <category>saas</category>
      <category>zapier</category>
      <category>webdev</category>
    </item>
    <item>
      <title>A Terminal-Inspired Portfolio of Shipped and Researched Products (2026)</title>
      <dc:creator>Simangaliso Vilakazi</dc:creator>
      <pubDate>Fri, 02 Jan 2026 11:47:38 +0000</pubDate>
      <link>https://dev.to/smngvlkz/a-calm-terminal-inspired-portfolio-focused-on-shipped-products-ga8</link>
      <guid>https://dev.to/smngvlkz/a-calm-terminal-inspired-portfolio-focused-on-shipped-products-ga8</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/new-year-new-you-google-ai-2025-12-31"&gt;New Year, New You Portfolio Challenge Presented by Google AI&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  About Me
&lt;/h2&gt;

&lt;p&gt;I am a full-stack software engineer based in South Africa, focused on building durable software that ships and lasts.&lt;br&gt;
I care about clear system boundaries, low-maintenance architectures, and products that solve real problems without unnecessary complexity. Most of my work lives in production: SaaS tools, automation systems, payment flows, and community platforms.&lt;br&gt;
This portfolio is meant to reflect how I actually work. Pragmatic, intentional, and biased toward execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Portfolio
&lt;/h2&gt;

&lt;p&gt;You can visit it at: &lt;a href="https://smngvlkz.com" rel="noopener noreferrer"&gt;https://smngvlkz.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This portfolio is structured like internal system documentation rather than a traditional marketing site. It highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live and archived products I have shipped&lt;/li&gt;
&lt;li&gt;Clear product intent, status, and scope&lt;/li&gt;
&lt;li&gt;Ongoing research and exploratory work&lt;/li&gt;
&lt;li&gt;SaaS, automation, payments, and platform work&lt;/li&gt;
&lt;li&gt;Open-source and community contributions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AI Integration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Design principle:&lt;/strong&gt;&lt;br&gt;
AI is used only where it improves inspectability or reduces cognitive load. All core functionality remains deterministic and debuggable without AI.&lt;/p&gt;

&lt;p&gt;This portfolio includes an interactive terminal (&lt;strong&gt;SYSTEM.QUERY&lt;/strong&gt;) powered by &lt;strong&gt;Google Gemini 3 Flash.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Users can query the &lt;strong&gt;entire&lt;/strong&gt; portfolio data directly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured commands (handled locally for speed):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;help&lt;/strong&gt; - list available commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;list all&lt;/strong&gt; - show all products and contributions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;show activity&lt;/strong&gt; - show GitHub &amp;amp; GitLab stats, streak, sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;show [id] [field]&lt;/strong&gt; - show specific data for any item in the portfolio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;explain [id]&lt;/strong&gt; - show full breakdown for any product or contribution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;list fields&lt;/strong&gt; - show queryable fields&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Natural language queries (powered by Gemini) - examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"what tech does paychasers use?"&lt;/li&gt;
&lt;li&gt;"compare the infrastructure of all products."&lt;/li&gt;
&lt;li&gt;"which product uses blockchain?"&lt;/li&gt;
&lt;li&gt;"what's the Cape Community Blog built with?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system prompt constrains Gemini to portfolio data only, with terminal-style output formatting. Common queries are handled locally without an API call. Complex or natural language queries fall back to Gemini.&lt;/p&gt;
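
&lt;p&gt;The local-first routing can be sketched roughly like this. The command names come from the list of structured commands above; the &lt;code&gt;routeQuery&lt;/code&gt; function, the &lt;code&gt;Route&lt;/code&gt; type, and the exact-versus-prefix split are assumptions for illustration, not the portfolio's actual code.&lt;/p&gt;

```typescript
// Commands that must match exactly, and commands that take arguments.
const EXACT_COMMANDS = ["help", "list all", "show activity", "list fields"];
const PREFIX_COMMANDS = ["show", "explain"];

type Route =
    | { handler: "local"; command: string }
    | { handler: "model"; prompt: string };

// Try the cheap deterministic path first; anything that does not look
// like a structured command is handed to the language model.
function routeQuery(input: string): Route {
    const query = input.trim().toLowerCase();
    if (EXACT_COMMANDS.includes(query)) {
        return { handler: "local", command: query };
    }
    for (const prefix of PREFIX_COMMANDS) {
        if (query.startsWith(prefix + " ")) {
            return { handler: "local", command: prefix };
        }
    }
    return { handler: "model", prompt: input.trim() };
}

console.log(routeQuery("show paychasers stack"));
console.log(routeQuery("what tech does paychasers use?"));
```

&lt;p&gt;A phrase like "show me everything" would still land on the local path in this sketch, so a local handler that cannot parse its arguments would need to hand the query on to the model.&lt;/p&gt;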

&lt;p&gt;This demonstrates practical AI integration: fast where possible, intelligent when needed, without over-engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Live Activity Data (GitHub &amp;amp; GitLab)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SYSTEM.ACTIVITY&lt;/strong&gt; pulls live data from GitHub and GitLab using authenticated API requests.&lt;/p&gt;

&lt;p&gt;A server-side route aggregates commits, repositories, contribution streaks, and session metadata into a unified activity model. This model is used in two places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Rendered as live system activity stats in the interface, including a 111-day contribution heatmap that visualizes real contributions with interactive tooltips (showing source breakdown: GitHub/GitLab commits, date, and day of week). Only days with actual contributions are inspectable on hover.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Exposed to &lt;strong&gt;SYSTEM.QUERY&lt;/strong&gt;, allowing users to inspect and query the same data via the terminal.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps activity up to date without manual updates and demonstrates real external API integration alongside AI-powered querying, with a single source of truth shared across UI and terminal.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Framework:&lt;/strong&gt; Next.js&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; TypeScript, JavaScript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Styling:&lt;/strong&gt; Tailwind CSS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI:&lt;/strong&gt; Google Gemini 3 Flash&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; Google Cloud Run (fully managed, container-based)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I used AI-assisted development to iterate on structure, copy clarity, and information hierarchy, while keeping the final implementation intentionally simple and deterministic.&lt;/p&gt;

&lt;p&gt;The site favors static rendering, fast load times, and a small surface area for long-term maintainability. The terminal feature adds interactivity without compromising the minimal aesthetic.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Most Proud Of
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The restraint in the design. Nothing exists without a reason&lt;/li&gt;
&lt;li&gt;Clear labeling of product status (live, inactive, archived)&lt;/li&gt;
&lt;li&gt;Showing real shipped work instead of demo projects&lt;/li&gt;
&lt;li&gt;AI integration that fits the product's identity (terminal queries, not chatbot fluff)&lt;/li&gt;
&lt;li&gt;A portfolio that reflects engineering maturity, not trend-chasing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a highlight reel. It is a snapshot of how I build software today.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>portfolio</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
