<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chris Kechagias</title>
    <description>The latest articles on DEV Community by Chris Kechagias (@kris_k).</description>
    <link>https://dev.to/kris_k</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3792553%2F33824df4-686a-420f-8be9-8127043b51e2.jpeg</url>
      <title>DEV Community: Chris Kechagias</title>
      <link>https://dev.to/kris_k</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kris_k"/>
    <language>en</language>
    <item>
      <title>Building a Chatbot API From Scratch — Part 2: Streaming, Prompt Engineering and Docker</title>
      <dc:creator>Chris Kechagias</dc:creator>
      <pubDate>Mon, 06 Apr 2026 05:55:08 +0000</pubDate>
      <link>https://dev.to/kris_k/building-a-chatbot-api-from-scratch-part-2-streaming-prompt-engineering-and-docker-3cn4</link>
      <guid>https://dev.to/kris_k/building-a-chatbot-api-from-scratch-part-2-streaming-prompt-engineering-and-docker-3cn4</guid>
      <description>&lt;p&gt;Part 4 actually of building a retail inventory API and then giving it a brain.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://medium.com/@ck.chris.kechagias/building-a-chatbot-api-from-scratch-13-prs-a-lot-of-broken-things-and-a-context-window-that-d2ac7c9f3b25" rel="noopener noreferrer"&gt;&lt;strong&gt;Part 3&lt;/strong&gt;&lt;/a&gt; I built the chatbot foundation: FastAPI, PostgreSQL, conversation memory, context trimming, rolling summarization, and 13 PRs worth of broken things. The API worked. It remembered what you said. It didn't fall over when the context got too long.&lt;/p&gt;

&lt;p&gt;That was enough to call it functional. But it didn't feel finished. No streaming. No real identity. No way to run it anywhere except my machine.&lt;/p&gt;

&lt;p&gt;Five PRs later, all of that changed. Some of it was clean. Some of it was not.&lt;/p&gt;




&lt;h2&gt;
  
  
  PR 14 — Auto-Title Generation
&lt;/h2&gt;

&lt;p&gt;Small PR. Big quality-of-life improvement.&lt;/p&gt;

&lt;p&gt;Every new conversation started with the title &lt;code&gt;"New Chat..."&lt;/code&gt; and stayed that way forever. I wanted it to generate automatically from the first message, without blocking the response.&lt;/p&gt;

&lt;p&gt;The approach: fire a background task after the conversation is created.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;update_conversation_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;asyncio.create_task()&lt;/code&gt; schedules it and moves on. The &lt;code&gt;201 Created&lt;/code&gt; fires immediately. The title shows up a second or two later. Clean.&lt;/p&gt;

&lt;p&gt;But before I got there, I spent an embarrassing amount of time debugging. The first version of &lt;code&gt;generate_conversation_title&lt;/code&gt; was calling the main model (&lt;code&gt;gpt-5-mini&lt;/code&gt;) and getting back empty responses. Latency was around 22 seconds. 22 seconds for a title.&lt;/p&gt;

&lt;p&gt;The problem was &lt;code&gt;max_completion_tokens&lt;/code&gt;. I had it set to &lt;code&gt;1000&lt;/code&gt; which is too low for reasoning models (they need token budget to think before responding). But even after bumping it, the main model was overkill for something this simple.&lt;/p&gt;

&lt;p&gt;The fix was a dual model setup. A utility model (&lt;code&gt;gpt-5-nano&lt;/code&gt;) for cheap background tasks, and the main model only for actual chat. After the switch, latency dropped from 22 seconds to under 2. While testing the fix I noticed OpenAI had released &lt;code&gt;gpt-5.4-mini&lt;/code&gt; and &lt;code&gt;gpt-5.4-nano&lt;/code&gt; in March 2026, so I bumped both models while I was in there. 3x faster, same quality.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;latency_ms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4527&lt;/span&gt;  &lt;span class="c1"&gt;# gpt-5-mini&lt;/span&gt;
&lt;span class="na"&gt;latency_ms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1418&lt;/span&gt;  &lt;span class="c1"&gt;# gpt-5.4-mini&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The background task lives in &lt;code&gt;summarizer.py&lt;/code&gt; and uses that utility model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_conversation_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conversation_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Background task to generate and set a title for a newly created conversation.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;conv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conversation_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;conv&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generate_conversation_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
                &lt;span class="n"&gt;conv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;
                &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Title updated for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;conversation_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Title generation failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two lessons that burned me:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background tasks need their own DB session.&lt;/strong&gt; You can't pass the request session in (it gets closed before the task runs). Always create a fresh &lt;code&gt;Session(engine)&lt;/code&gt; inside the background function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;@handle_openai_errors&lt;/code&gt; cannot be used on background tasks.&lt;/strong&gt; The decorator wraps exceptions into HTTP responses, which makes no sense in a fire-and-forget context. Plain &lt;code&gt;try/except&lt;/code&gt; is the right pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  PR 15 — Streaming (SSE)
&lt;/h2&gt;

&lt;p&gt;This one took the most time.&lt;/p&gt;

&lt;p&gt;The goal was to replace the blocking endpoint (wait for the full response, return it) with a streaming one. Tokens arrive at the client as they're generated, using Server-Sent Events.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Service Layer
&lt;/h3&gt;

&lt;p&gt;The streaming function is an async generator. This is where the first real problem appeared.&lt;/p&gt;

&lt;p&gt;I tried to decorate it with &lt;code&gt;@handle_openai_errors&lt;/code&gt; like everything else:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@handle_openai_errors&lt;/span&gt;  &lt;span class="c1"&gt;# THIS BREAKS IT
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_chat_completion_stream&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The decorator wraps the function with &lt;code&gt;return await func(...)&lt;/code&gt;. But &lt;code&gt;func&lt;/code&gt; is an async generator — you can't &lt;code&gt;await&lt;/code&gt; a generator. It returns a generator object, not a coroutine. The error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;object async_generator can't be used in 'await' expression
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix: remove the decorator entirely and handle errors inline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_chat_completion_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Streams a chat completion response from the OpenAI API.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai_model&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;stream_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;include_usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;max_completion_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai_max_completion_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;APIError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;APITimeoutError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;wait_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;OpenAIServiceException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI stream error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;502&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;error_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_STREAM_ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the &lt;code&gt;stream_options={"include_usage": True}&lt;/code&gt;. This tells OpenAI to include token usage in the final chunk, so you don't need tiktoken on the streaming path.&lt;/p&gt;

&lt;p&gt;The final chunk has &lt;code&gt;chunk.choices == []&lt;/code&gt; and &lt;code&gt;chunk.usage&lt;/code&gt; populated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;get_chat_completion_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trimmed_messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="n"&gt;full_content&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Controller
&lt;/h3&gt;

&lt;p&gt;The controller returns a &lt;code&gt;StreamingResponse&lt;/code&gt; wrapping an async generator. DB writes happen after the stream completes (never during).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_generator&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;full_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;get_chat_completion_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trimmed_messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
                &lt;span class="n"&gt;full_content&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Stream done — now persist
&lt;/span&gt;        &lt;span class="n"&gt;latency_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
        &lt;span class="n"&gt;new_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;conversation_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ai_response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;full_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ai_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai_model&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: [DONE]&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mid-stream failure: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Stream interrupted&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;stream_generator&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;time.perf_counter()&lt;/code&gt; for latency, not &lt;code&gt;time.time()&lt;/code&gt;. It's higher resolution and not affected by system clock changes.&lt;/p&gt;

&lt;p&gt;To verify streaming is actually working (not just dumping everything at once):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-N&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8000/chat/ &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"user_id": "your-uuid", "user_message": "Tell me a short story"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-N&lt;/code&gt; flag disables buffering. If it works, you'll see chunks appear one by one in the terminal.&lt;/p&gt;




&lt;h2&gt;
  
  
  PR 16 — Composable Prompt System
&lt;/h2&gt;

&lt;p&gt;This is the one I'm most proud of.&lt;/p&gt;

&lt;p&gt;The system prompt was hardcoded in config. One string. Not flexible, not maintainable, and impossible to tune without touching code. I wanted something I could compose and experiment with.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;

&lt;p&gt;The idea: YAML as a control layer, markdown files as the actual content. Each prompt is assembled at request time by layering components in a fixed order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;base → core → styles → rules → intensity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The folder structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;prompts/
├── prompts.yaml          &lt;span class="c"&gt;# which components each prompt uses&lt;/span&gt;
├── base/                 &lt;span class="c"&gt;# the foundation (default, concise, tutor...)&lt;/span&gt;
├── core/                 &lt;span class="c"&gt;# identity and persona&lt;/span&gt;
├── styles/               &lt;span class="c"&gt;# tone modifiers (casual, formal, sarcastic)&lt;/span&gt;
├── rules/                &lt;span class="c"&gt;# behavioral constraints (communication, factuality...)&lt;/span&gt;
└── intensity/            &lt;span class="c"&gt;# tone calibration (low, medium, high)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The YAML defines each named prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stoic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;base&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;base/default.md&lt;/span&gt;
  &lt;span class="na"&gt;core&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;identity&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;persona&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;communication&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;factuality&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;intensity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;medium&lt;/span&gt;

&lt;span class="na"&gt;summarizer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;base&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;base/summarizer.md&lt;/span&gt;
  &lt;span class="na"&gt;core&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;identity&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;suppression&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;factuality&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;intensity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The PromptLoader
&lt;/h3&gt;

&lt;p&gt;The loader reads everything at startup and caches it in memory. No file I/O on every request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PromptLoader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_path&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompts.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;safe_load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_preload_files&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_preload_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;folder&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;core&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rules&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;styles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intensity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;dir_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_path&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;folder&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;dir_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dir_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stem&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intensity_override&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cfg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="n"&gt;base_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;stem&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;base_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;core&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;styles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rules&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
                &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="n"&gt;intensity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intensity_override&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intensity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intensity/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;intensity&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;**kwargs&lt;/code&gt; handles variable injection. The summarizer prompt has &lt;code&gt;{input}&lt;/code&gt; and &lt;code&gt;{existing_summary}&lt;/code&gt; placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;new_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;existing_summary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No previous summary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing to watch: any &lt;code&gt;{&lt;/code&gt; or &lt;code&gt;}&lt;/code&gt; in a markdown prompt file that isn't a variable placeholder will break &lt;code&gt;prompt.format(**kwargs)&lt;/code&gt; with a &lt;code&gt;KeyError&lt;/code&gt;. Avoid curly braces in prompt content unless they're intentional variables.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Persona
&lt;/h3&gt;

&lt;p&gt;The identity prompt is the philosophical foundation of the whole thing. This is what I actually care about:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The assistant exists to sharpen thinking, not replace it.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;It does not comfort. It does not flatter. It does not fill silence with noise.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;When a question is asked, it answers. When a problem is presented, it cuts to what matters. When thinking is lazy or circular, it names that — once, directly, without judgment.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;The goal is not to be helpful in the way a tool is helpful. The goal is to leave the person thinking more clearly than they arrived.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The chat endpoint now accepts a &lt;code&gt;prompt_key&lt;/code&gt; field. Omit it and it defaults to &lt;code&gt;"stoic"&lt;/code&gt;. Every conversation persists the prompt key so it stays consistent across messages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"What is RAG?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tutor"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The prompt library lives outside &lt;code&gt;app/&lt;/code&gt; as a standalone content layer. It's not application code — it's configuration. Same reasoning as why tests live outside &lt;code&gt;app/&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  PR 17 — Post-Feature Cleanup
&lt;/h2&gt;

&lt;p&gt;After three feature PRs, the codebase had accumulated some debt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;chat_controller&lt;/code&gt; was still in the file (fully functional, no longer routed anywhere)&lt;/li&gt;
&lt;li&gt;Three separate &lt;code&gt;PromptLoader()&lt;/code&gt; instances across different files&lt;/li&gt;
&lt;li&gt;Missing docstrings on &lt;code&gt;PromptLoader&lt;/code&gt;, &lt;code&gt;build()&lt;/code&gt;, &lt;code&gt;stream_generator()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;No success logging on the streaming path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The singleton fix: export one shared instance from &lt;code&gt;prompt_loader.py&lt;/code&gt; and import it everywhere.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# prompt_loader.py — bottom of file
&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PromptLoader&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# everywhere else
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;.prompt_loader&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same pattern as &lt;code&gt;client = AsyncOpenAI(...)&lt;/code&gt; in the OpenAI service. One instance, loaded once at startup.&lt;/p&gt;

&lt;p&gt;Cleanup PRs don't feel exciting but they're the difference between a codebase you're proud of and one you're embarrassed to show.&lt;/p&gt;




&lt;h2&gt;
  
  
  PR 18 — Containerization
&lt;/h2&gt;

&lt;p&gt;Dockerfile is deliberately minimal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.11-slim&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; pyproject.toml .&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; uv.lock .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;uv
&lt;span class="k"&gt;RUN &lt;/span&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--frozen&lt;/span&gt; &lt;span class="nt"&gt;--no-dev&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8000&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; [".venv/bin/uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;uv sync --frozen --no-dev&lt;/code&gt; is the important part — reproducible install, no dev dependencies in the image.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;docker-compose.yml&lt;/code&gt; spins up the API and a PostgreSQL 15 container, with a health check on the DB before the API starts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without &lt;code&gt;condition: service_healthy&lt;/code&gt;, the API container starts before Postgres is ready and crashes. Learned this the first time.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.dockerignore&lt;/code&gt; uses a whitelist approach (deny everything, explicitly allow what's needed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;*

!&lt;span class="n"&gt;pyproject&lt;/span&gt;.&lt;span class="n"&gt;toml&lt;/span&gt;
!&lt;span class="n"&gt;uv&lt;/span&gt;.&lt;span class="n"&gt;lock&lt;/span&gt;
!&lt;span class="n"&gt;main&lt;/span&gt;.&lt;span class="n"&gt;py&lt;/span&gt;
!&lt;span class="n"&gt;app&lt;/span&gt;/
!&lt;span class="n"&gt;prompts&lt;/span&gt;/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;!prompts/&lt;/code&gt; line is easy to miss. The prompt library lives outside &lt;code&gt;app/&lt;/code&gt;, so without it the container starts with an empty cache and every &lt;code&gt;loader.build()&lt;/code&gt; call returns an empty string. No crash, no error, just silently wrong behavior.&lt;/p&gt;

&lt;p&gt;All Docker commands are wired into taskipy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;task build    &lt;span class="c"&gt;# docker compose up --build -d&lt;/span&gt;
task start    &lt;span class="c"&gt;# docker compose up -d&lt;/span&gt;
task stop     &lt;span class="c"&gt;# docker compose down&lt;/span&gt;
task logs     &lt;span class="c"&gt;# docker compose logs -f&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Phase 2 is complete. The API streams responses, has a composable identity, runs in Docker, and manages context intelligently.&lt;/p&gt;

&lt;p&gt;Phase 3 starts with a Telegram bot as a real frontend (no more Swagger UI demos). It'll live in a separate repo, store a per-user OpenAI key, and talk to this API over HTTP. After that: testing, then RAG.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The repo is public: &lt;a href="https://github.com/chris-kechagias/simple-chatbot-api" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the journey: &lt;a href="https://www.linkedin.com/in/chkechagias" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://medium.com/@ck.chris.kechagias" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built PR by PR. Mistakes included.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>openai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Building a Chatbot API From Scratch: 13 PRs, a Lot of Broken Things, and a Context Window That Actually Works</title>
      <dc:creator>Chris Kechagias</dc:creator>
      <pubDate>Sat, 28 Mar 2026 12:12:22 +0000</pubDate>
      <link>https://dev.to/kris_k/building-a-chatbot-api-from-scratch-13-prs-a-lot-of-broken-things-and-a-context-window-that-3i73</link>
      <guid>https://dev.to/kris_k/building-a-chatbot-api-from-scratch-13-prs-a-lot-of-broken-things-and-a-context-window-that-3i73</guid>
      <description>&lt;p&gt;Part 3 of building a retail inventory API and then giving it a brain.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/kris_k/why-i-rebuilt-my-first-api-from-scratch-p63"&gt;Part 1&lt;/a&gt; I explained why I archived my first API and started over. In &lt;a href="https://dev.to/kris_k/restructuring-a-fastapi-project-migrating-to-supabase-and-hitting-97-test-coverage-59eb"&gt;Part 2&lt;/a&gt; I restructured it properly, migrated to Supabase, and got to 97% test coverage.&lt;/p&gt;

&lt;p&gt;The retail API is solid now. Working in production. Tests passing. Architecture I can explain.&lt;/p&gt;

&lt;p&gt;So I started the next thing: a chatbot API. Same stack, new layer. The goal: a conversational AI service that remembers what you said, manages long conversations intelligently, and eventually connects to the retail inventory data.&lt;/p&gt;

&lt;p&gt;This is what the last few weeks looked like.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Build the API at All
&lt;/h2&gt;

&lt;p&gt;I could use a ready-made chatbot SDK. Drop in a library, wrap the OpenAI call, done in an afternoon.&lt;/p&gt;

&lt;p&gt;The problem is the same one I had with the retail API the first time. I could make it work without understanding any of it.&lt;/p&gt;

&lt;p&gt;I wanted to know what happens when a conversation gets too long for the context window. How tokens get counted. Why the response structure changed between model versions. What "streaming" actually means at the transport layer.&lt;/p&gt;

&lt;p&gt;The only way to learn that is to build it yourself and break it repeatedly.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack and the Plan
&lt;/h2&gt;

&lt;p&gt;Same core as the retail API: FastAPI, SQLModel, PostgreSQL. Add OpenAI's Python SDK.&lt;/p&gt;

&lt;p&gt;Three-layer architecture: routers take requests, controllers handle logic, services talk to external APIs. Models define the data shapes. Everything lives under &lt;code&gt;app/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The plan was a roadmap of PRs, one feature at a time. No commits directly to main. Every change reviewed before merge.&lt;/p&gt;

&lt;p&gt;I had a senior dev reviewing. Same as the retail API. He reviews my PRs, asks hard questions, and doesn't let things slide.&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Six PRs Were Fast
&lt;/h2&gt;

&lt;p&gt;Scaffold, config, models, logging, error handling, health endpoint. These were mechanical. I'd done the patterns before on the retail API. Took a few days.&lt;/p&gt;

&lt;p&gt;The models are worth mentioning because they drove everything else.&lt;/p&gt;

&lt;p&gt;Two tables: &lt;code&gt;Conversation&lt;/code&gt; (user_id, title, timestamps) and &lt;code&gt;Message&lt;/code&gt; (conversation_id, user_message, ai_response, model used, tokens consumed, latency). Every API call gets stored. I wanted to track exactly what the model said, which version said it, how many tokens it used, and how long it took.&lt;/p&gt;

&lt;p&gt;This felt like overkill at the time. It wasn't. That data saved me multiple times during debugging.&lt;/p&gt;




&lt;h2&gt;
  
  
  PR 7: First Working Chat Endpoint
&lt;/h2&gt;

&lt;p&gt;This is where it got real.&lt;/p&gt;

&lt;p&gt;The chat endpoint needed to do a few things at once: create or continue a conversation, load the message history, build the right payload for OpenAI, call the API, store the result, return a clean response.&lt;/p&gt;

&lt;p&gt;I wrote a single &lt;code&gt;chat_controller&lt;/code&gt; function that handles both new and existing conversations. If no &lt;code&gt;conversation_id&lt;/code&gt; is in the request, create one. If there is one, fetch the history and continue.&lt;/p&gt;

&lt;p&gt;The controller builds the messages array like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai_system_prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ai_response&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple. Obvious in hindsight.&lt;/p&gt;

&lt;p&gt;The decorator for OpenAI error handling was more interesting. A &lt;code&gt;@handle_openai_errors&lt;/code&gt; wrapper catches &lt;code&gt;APITimeoutError&lt;/code&gt;, &lt;code&gt;APIError&lt;/code&gt;, and generic exceptions, and converts them into clean HTTP responses with consistent error codes.&lt;/p&gt;

&lt;p&gt;The controller doesn't need to know how OpenAI fails( it just calls the service ).&lt;/p&gt;




&lt;h2&gt;
  
  
  PR 8: The Project Structure Refactor
&lt;/h2&gt;

&lt;p&gt;The retail API taught me about &lt;code&gt;app/&lt;/code&gt; structure. I used it from the start here. But after seven PRs, I had a problem: config, database, and logging were scattered.&lt;/p&gt;

&lt;p&gt;PR 8 moved everything infrastructure-related into &lt;code&gt;app/core/&lt;/code&gt;. Config lives there. Database engine and session dependency live there. Logger setup lives there. Custom exceptions live there under &lt;code&gt;app/core/errors/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This sounds like housekeeping. It was. But now when I onboard someone, I can say: "The business logic is in &lt;code&gt;controllers/&lt;/code&gt;. The infrastructure is in &lt;code&gt;core/&lt;/code&gt;. The data shapes are in &lt;code&gt;models/&lt;/code&gt;. Nothing bleeds between them."&lt;/p&gt;

&lt;p&gt;That's worth a whole PR.&lt;/p&gt;




&lt;h2&gt;
  
  
  PR 11: The Debug Session From Hell
&lt;/h2&gt;

&lt;p&gt;I had a working structure. Clean architecture. Good patterns.&lt;/p&gt;

&lt;p&gt;Then I ran the API for the first time locally and nothing worked.&lt;/p&gt;

&lt;p&gt;The first error was a datetime serialization crash. &lt;code&gt;JSONResponse&lt;/code&gt; couldn't serialize a Python &lt;code&gt;datetime&lt;/code&gt; object. The fix was one method call — &lt;code&gt;model_dump(mode="json")&lt;/code&gt; instead of &lt;code&gt;model_dump()&lt;/code&gt;. Mode &lt;code&gt;"json"&lt;/code&gt; converts datetimes and UUIDs to strings. Mode &lt;code&gt;""&lt;/code&gt; leaves them as Python objects. I didn't know that distinction existed.&lt;/p&gt;

&lt;p&gt;The second error was the OpenAI API key not being found. I'd configured it with &lt;code&gt;pydantic-settings&lt;/code&gt;, which reads &lt;code&gt;.env&lt;/code&gt; files into a &lt;code&gt;Settings&lt;/code&gt; object. What I didn't know: &lt;code&gt;pydantic-settings&lt;/code&gt; populates the &lt;code&gt;Settings&lt;/code&gt; object, but it doesn't write to &lt;code&gt;os.environ&lt;/code&gt;. The OpenAI library reads &lt;code&gt;os.environ&lt;/code&gt;. So the key existed in &lt;code&gt;config.openai_api_key&lt;/code&gt; but the OpenAI client couldn't see it.&lt;/p&gt;

&lt;p&gt;Fix: explicitly pass the key at client initialization.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai_api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'd never had to think about this before because I'd never mixed a settings manager with a third-party SDK that reads environment variables directly.&lt;/p&gt;

&lt;p&gt;Third error: &lt;code&gt;max_tokens&lt;/code&gt; is deprecated for GPT-5 models. Use &lt;code&gt;max_completion_tokens&lt;/code&gt;. Got a 400.&lt;/p&gt;

&lt;p&gt;Fourth error: &lt;code&gt;temperature&lt;/code&gt; isn't supported by GPT-5 mini. Another 400.&lt;/p&gt;

&lt;p&gt;Fifth error: responses came back empty. The conversation history was being sent in reverse order ( newest messages first ). The model received the conversation backwards and returned nothing useful. Changed &lt;code&gt;order_by(desc())&lt;/code&gt; to &lt;code&gt;order_by(asc())&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Sixth error: message history serialized as &lt;code&gt;{}&lt;/code&gt; in the response. SQLModel's &lt;code&gt;table=True&lt;/code&gt; models loaded from the database don't serialize cleanly through Pydantic v2 when using &lt;code&gt;sa_column&lt;/code&gt;. The fix: &lt;code&gt;Message.model_validate(msg)&lt;/code&gt; on each history item before returning.&lt;/p&gt;

&lt;p&gt;Six separate bugs in one session. Each one taught me something I couldn't have read in a tutorial, because tutorials don't show you what breaks when you combine five technologies at once.&lt;/p&gt;

&lt;p&gt;The API worked at the end of the day.&lt;/p&gt;




&lt;h2&gt;
  
  
  Context Window: The Feature I Underestimated
&lt;/h2&gt;

&lt;p&gt;GPT-5 mini has a 128k token context window. Sounds like enough. It isn't a reason to be lazy.&lt;/p&gt;

&lt;p&gt;Sending the entire conversation history on every request is wasteful. At scale it's expensive. And for a long conversation, the model spends time processing messages from hours ago that aren't relevant anymore.&lt;/p&gt;

&lt;p&gt;The initial implementation was message-count based: keep the last 15 messages. Simple, configurable via env var.&lt;/p&gt;

&lt;p&gt;But message count and token count aren't the same thing. Ten short messages is not the same as ten essay-length responses.&lt;/p&gt;

&lt;p&gt;After seeing GPT's assessment of the code, I moved to proper token-based trimming using &lt;code&gt;tiktoken&lt;/code&gt;. The trimmer preserves the system prompt and the latest user message always, those are non-negotiable. Then it walks backwards through history, removing the oldest pairs of messages until the total token count fits within the limit.&lt;/p&gt;

&lt;p&gt;When messages get evicted, they don't disappear. A background task runs after the main response is returned. It takes the evicted messages and calls a cheap utility model to update a rolling summary, which gets stored on the &lt;code&gt;Conversation&lt;/code&gt; record and injected back into context on the next request.&lt;/p&gt;

&lt;p&gt;The result: conversations can run indefinitely without the model losing the thread of what was discussed early on.&lt;/p&gt;

&lt;p&gt;This is the kind of problem that seems solved by "just use a big context window" until you start thinking about cost, latency, and what actually matters in a conversation.&lt;/p&gt;




&lt;h2&gt;
  
  
  PR 12: CRUD Endpoints and a Database Lesson
&lt;/h2&gt;

&lt;p&gt;PATCH to update a conversation title. DELETE to remove a conversation entirely.&lt;/p&gt;

&lt;p&gt;The DELETE looked simple. Fetch the conversation, delete the messages, delete the conversation, commit. Four lines.&lt;/p&gt;

&lt;p&gt;It crashed with a foreign key violation. PostgreSQL won't let you delete a &lt;code&gt;conversation&lt;/code&gt; record if &lt;code&gt;message&lt;/code&gt; records still reference it, even if you've already queued those messages for deletion in the same session. SQLAlchemy batches the deletes and executes them in the wrong order.&lt;/p&gt;

&lt;p&gt;The fix is one line: &lt;code&gt;db.flush()&lt;/code&gt; after deleting the messages. This forces the message deletions to hit the database before the conversation deletion is issued.&lt;/p&gt;

&lt;p&gt;I knew foreign keys existed. I didn't know SQLAlchemy's unit of work could silently reorder operations in a way that breaks FK constraints. Now I do.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Code Looks Like Now
&lt;/h2&gt;

&lt;p&gt;The API has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;POST &lt;code&gt;/chat/&lt;/code&gt; — new conversation&lt;/li&gt;
&lt;li&gt;POST &lt;code&gt;/chat/{conversation_id}&lt;/code&gt; — continue existing conversation&lt;/li&gt;
&lt;li&gt;GET &lt;code&gt;/chat/{conversation_id}&lt;/code&gt; — full message history&lt;/li&gt;
&lt;li&gt;GET &lt;code&gt;/chat/conversations/{user_id}&lt;/code&gt; — list user's conversations&lt;/li&gt;
&lt;li&gt;PATCH &lt;code&gt;/chat/{conversation_id}/title&lt;/code&gt; — rename a conversation&lt;/li&gt;
&lt;li&gt;DELETE &lt;code&gt;/chat/{conversation_id}&lt;/code&gt; — delete conversation and all messages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Token-based context trimming. Rolling conversation summary. Exponential backoff retry for empty responses. Dual model setup — expensive model for chat, cheap model for background tasks like summarization.&lt;/p&gt;

&lt;p&gt;The architecture is clean enough that adding streaming later won't require rewriting the controller ( just changing how the service returns its response ).&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned That I Didn't Expect to Learn
&lt;/h2&gt;

&lt;p&gt;OpenAI changes their API more than I expected. Parameters get deprecated between minor versions. A field that works on GPT-4o doesn't work on GPT-5 mini. If you don't read the changelog you get 400 errors with no obvious explanation.&lt;/p&gt;

&lt;p&gt;Background tasks in FastAPI are useful but need their own database sessions. The main request session might be closed by the time the background task runs. Passing the engine and creating a new session inside the task is the right pattern.&lt;/p&gt;

&lt;p&gt;TypeVar matters for decorator type safety. &lt;code&gt;@handle_openai_errors&lt;/code&gt; originally had &lt;code&gt;Any&lt;/code&gt; as the return type, which silently disabled type checking on every wrapped function. Fixing it to &lt;code&gt;Callable[..., Awaitable[T]]&lt;/code&gt; took ten minutes and caught nothing immediately, but it makes the codebase defensible.&lt;/p&gt;

&lt;p&gt;PR descriptions are not commit messages. A commit says what changed. A PR description explains what you changed, why you changed it, and what wasn't obvious. I'm still getting this wrong, still getting feedback from the senior dev on it, still improving.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where It Goes From Here
&lt;/h2&gt;

&lt;p&gt;Auto-title generation: the model names the conversation from the first message instead of truncating it to 50 characters.&lt;/p&gt;

&lt;p&gt;Streaming responses: instead of waiting for the full reply, tokens arrive as they're generated.&lt;/p&gt;

&lt;p&gt;Prompt loader: load system prompts from a file, select them via config. This is where the behavioral prompt work I've been doing separately gets wired in.&lt;/p&gt;

&lt;p&gt;After that: Docker, tests, then the RAG layer that connects this chatbot to the retail inventory data.&lt;/p&gt;

&lt;p&gt;The retail API is the foundation. This is the layer that makes it conversational. The third layer will make it actually know something about the domain.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Transitioning from retail operations to AI engineering. Building a fashion-focused retail API with a chatbot layer on top. &lt;br&gt;
Follow the journey: &lt;a href="https://github.com/chris-kechagias/simple-chatbot-api" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/chkechagias" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>openai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Restructuring a FastAPI Project, Migrating to Supabase, and Hitting 97% Test Coverage</title>
      <dc:creator>Chris Kechagias</dc:creator>
      <pubDate>Tue, 17 Mar 2026 07:15:59 +0000</pubDate>
      <link>https://dev.to/kris_k/restructuring-a-fastapi-project-migrating-to-supabase-and-hitting-97-test-coverage-59eb</link>
      <guid>https://dev.to/kris_k/restructuring-a-fastapi-project-migrating-to-supabase-and-hitting-97-test-coverage-59eb</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 2 of building a retail inventory API from scratch.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;In &lt;a href="https://dev.to/kris_k/why-i-rebuilt-my-first-api-from-scratch-p63"&gt;Part 1&lt;/a&gt;, I explained why I archived my first API and started over. I ended that post with a confession: the new version still had a flat structure and no tests. Not portfolio material yet.&lt;/p&gt;

&lt;p&gt;This is what happened next.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Flat Structure Problem
&lt;/h2&gt;

&lt;p&gt;The v2 codebase worked, but everything lived at the root level. &lt;code&gt;products.py&lt;/code&gt;, &lt;code&gt;variants.py&lt;/code&gt;, &lt;code&gt;analytics.py&lt;/code&gt;, &lt;code&gt;database.py&lt;/code&gt; — all siblings, all imports pointing everywhere. When I added a new router, I had to touch three files. When something broke, I had to search across the whole project.&lt;/p&gt;

&lt;p&gt;The fix: move everything into an &lt;code&gt;app/&lt;/code&gt; package with proper separation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app/
├── __init__.py
├── config.py
├── database.py
├── controllers/
├── models/
├── routers/
├── middleware/
└── utils/
    └── errors/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each folder has a clear responsibility. &lt;code&gt;controllers/&lt;/code&gt; is business logic. &lt;code&gt;routers/&lt;/code&gt; is just route definitions. &lt;code&gt;models/&lt;/code&gt; is data shapes. &lt;code&gt;middleware/&lt;/code&gt; handles logging and exception handling. &lt;code&gt;utils/errors/&lt;/code&gt; defines custom exceptions.&lt;/p&gt;

&lt;p&gt;This sounds obvious. It wasn't to me three months ago.&lt;/p&gt;




&lt;h2&gt;
  
  
  The &lt;code&gt;__init__.py&lt;/code&gt; Pattern
&lt;/h2&gt;

&lt;p&gt;Every folder has an &lt;code&gt;__init__.py&lt;/code&gt; that re-exports everything inside it. The goal: any file in the project should be able to import from a single clean path instead of hunting through nested modules.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/controllers/__init__.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;.products&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;create_product_controller&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;create_product_controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;delete_product_controller&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;delete_product_controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;get_product_controller&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;get_product_controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;get_products_controller&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;get_products_controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;update_product_controller&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;update_product_controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;as X&lt;/code&gt; syntax is not redundant — it tells the linter (ruff) that these re-exports are intentional, not unused imports. Without it, CI fails.&lt;/p&gt;

&lt;p&gt;Now a router just does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;..controllers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_product_controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_product_controller&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean. One source of truth per namespace.&lt;/p&gt;




&lt;h2&gt;
  
  
  Dev Tooling: uv + Taskipy + Ruff
&lt;/h2&gt;

&lt;p&gt;Three tools that changed how I work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;uv&lt;/strong&gt; — package manager. Replaces pip. Dramatically faster, lock file included, virtual env handled automatically. No more &lt;code&gt;pip install -r requirements.txt&lt;/code&gt; and hoping for the best.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Taskipy&lt;/strong&gt; — task runner. Instead of remembering long commands, I define shortcuts in &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool.taskipy.tasks]&lt;/span&gt;
&lt;span class="py"&gt;dev&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"uvicorn main:app --reload"&lt;/span&gt;
&lt;span class="py"&gt;build&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"docker compose up --build"&lt;/span&gt;
&lt;span class="py"&gt;lint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ruff check ."&lt;/span&gt;
&lt;span class="py"&gt;fix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ruff check . --fix"&lt;/span&gt;
&lt;span class="py"&gt;test&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"pytest -v"&lt;/span&gt;
&lt;span class="py"&gt;dev_test&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ruff check . &amp;amp;&amp;amp; pytest -v"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;task dev_test&lt;/code&gt; runs lint then tests in one shot. &lt;code&gt;task fix&lt;/code&gt; auto-corrects most ruff violations. Small thing, big quality of life improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ruff&lt;/strong&gt; — linter and formatter. Replaces flake8, isort, and black in one tool. Fast, opinionated, catches real problems. Made me write better imports and stop accumulating dead code.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bug Safari
&lt;/h2&gt;

&lt;p&gt;Restructuring is not just moving files. Every import path breaks. Here's what I found:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reserved logging keys.&lt;/strong&gt; I had this in a controller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Product created&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This caused a 500 error. &lt;code&gt;name&lt;/code&gt; is a reserved key in Python's &lt;code&gt;LogRecord&lt;/code&gt; — you can't use it in &lt;code&gt;extra&lt;/code&gt;. Renamed to &lt;code&gt;product_name&lt;/code&gt;, fixed. The error only appeared at runtime, not at import time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Route ordering conflict.&lt;/strong&gt; &lt;code&gt;GET /products/total_value&lt;/code&gt; kept returning 422. The analytics router was registered &lt;em&gt;after&lt;/em&gt; the products router in &lt;code&gt;main.py&lt;/code&gt;. FastAPI matched &lt;code&gt;/total_value&lt;/code&gt; as a product ID (a string, hence 422). Moved analytics router first. Fixed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing &lt;code&gt;product_id&lt;/code&gt; in variant creation.&lt;/strong&gt; &lt;code&gt;ProductVariant.model_validate(variant)&lt;/code&gt; failed validation because &lt;code&gt;product_id&lt;/code&gt; was required but not in the incoming request body — it comes from the URL path. The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;db_variant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ProductVariant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pydantic v2's &lt;code&gt;update&lt;/code&gt; parameter merges extra fields into the model during validation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Migrating to Supabase
&lt;/h2&gt;

&lt;p&gt;The original setup used a local PostgreSQL container via Docker Compose. Fine for development, annoying for deployment. Supabase gives you a managed PostgreSQL instance with a connection pooler.&lt;/p&gt;

&lt;p&gt;The config change was minimal — just different environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Local Docker
DB_HOST=localhost
DB_PORT=5432

# Supabase (Transaction pooler)
DB_HOST=aws-0-eu-central-1.pooler.supabase.com
DB_PORT=6543
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One gotcha: Supabase has two pooler types. &lt;strong&gt;Session pooler&lt;/strong&gt; keeps a persistent connection per client. &lt;strong&gt;Transaction pooler&lt;/strong&gt; reuses connections across requests — better for serverless and low-traffic APIs on free tier. I use the Transaction pooler.&lt;/p&gt;

&lt;p&gt;The connection string is assembled in &lt;code&gt;database.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_username&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_password&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_host&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_port&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;connect_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sslmode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prefer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your password contains special characters (&lt;code&gt;@&lt;/code&gt;, &lt;code&gt;#&lt;/code&gt;, &lt;code&gt;!&lt;/code&gt;), they'll break the URL. Use &lt;code&gt;urllib.parse.quote_plus(password)&lt;/code&gt; to encode it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Writing Tests
&lt;/h2&gt;

&lt;p&gt;This was the part I'd been avoiding. Not because I didn't want tests — because I didn't know how to write them without hitting the real database.&lt;/p&gt;

&lt;p&gt;The solution: SQLite in-memory + FastAPI's dependency override pattern.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tests/conftest.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asynccontextmanager&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.testclient&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TestClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlmodel&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SQLModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlmodel.pool&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StaticPool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;app.database&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_session&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite://&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;connect_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;check_same_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;poolclass&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;StaticPool&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@asynccontextmanager&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lifespan_override&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;session_fixture&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;SQLModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;SQLModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;
    &lt;span class="n"&gt;SQLModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;client&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;client_fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dependency_overrides&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_session&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lifespan_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lifespan_override&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;TestClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dependency_overrides&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things happening here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;SQLite in-memory replaces PostgreSQL. No cloud DB needed, no network, instant setup.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_session&lt;/code&gt; dependency is overridden — every request gets the test session, not a real DB connection.&lt;/li&gt;
&lt;li&gt;The lifespan is overridden with a no-op — prevents the app from trying to connect to Supabase on test startup.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each test gets a fresh database. No shared state, no cleanup between tests.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Tests Found
&lt;/h2&gt;

&lt;p&gt;This is the part worth highlighting.&lt;/p&gt;

&lt;p&gt;While writing tests for the variant endpoints, I noticed the &lt;code&gt;PATCH&lt;/code&gt; and &lt;code&gt;DELETE&lt;/code&gt; routes didn't validate product existence before operating on variants:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before: missing product check
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_product_variant_router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...):&lt;/span&gt;
    &lt;span class="n"&gt;variant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;update_product_variant_controller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;variant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ProductVariantNotFoundException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;variant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;variant&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You could &lt;code&gt;PATCH /products/9999/variants/1&lt;/code&gt; and it would happily update the variant — even if product 9999 didn't exist. Same for DELETE. Inconsistent behavior, not caught until tests tried to assert &lt;code&gt;PRODUCT_NOT_FOUND&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The fix was two lines per route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_product_controller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ProductNotFoundException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is exactly what tests are for. Not just confirming the happy path — stress-testing assumptions.&lt;/p&gt;

&lt;p&gt;Final result: 29 tests, 97% coverage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOTAL    412    11    97%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 3% uncovered is intentional: database error branches that only trigger on real DB failures, and the Supabase connection code skipped by the lifespan override.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where It Is Now
&lt;/h2&gt;

&lt;p&gt;The API is live on Render, connected to Supabase, monitored by UptimeRobot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live docs:&lt;/strong&gt; &lt;a href="https://retail-inventory-api-yati.onrender.com/docs" rel="noopener noreferrer"&gt;retail-inventory-api-yati.onrender.com/docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What's still missing: integration tests against the live deployment, an analytics expansion, and a repository pattern refactor I've been putting off.&lt;/p&gt;

&lt;p&gt;What's next: a chatbot API. This retail API becomes the data layer. The chatbot queries it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Transitioning from retail operations to AI engineering. Follow the journey:&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://github.com/chris-kechagias" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/chris-kechagias" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>testing</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Why I Rebuilt My First API From Scratch</title>
      <dc:creator>Chris Kechagias</dc:creator>
      <pubDate>Sat, 07 Mar 2026 12:59:28 +0000</pubDate>
      <link>https://dev.to/kris_k/why-i-rebuilt-my-first-api-from-scratch-p63</link>
      <guid>https://dev.to/kris_k/why-i-rebuilt-my-first-api-from-scratch-p63</guid>
      <description>&lt;p&gt;I  built my first API two months ago. It worked. I deployed it. Users could hit endpoints, get responses, everything functioned. By most measures, it was a success.&lt;/p&gt;

&lt;p&gt;Then I looked at the code with someone who actually knew what they were doing, and realized: working isn't the same as professional.&lt;/p&gt;

&lt;p&gt;So I archived it and started over.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Version
&lt;/h2&gt;

&lt;p&gt;My original retail inventory API did what it was supposed to do. CRUD operations, PostgreSQL database, running in production on Render. I followed tutorials, copied patterns I found online, debugged until things worked.&lt;/p&gt;

&lt;p&gt;The problem wasn't that it failed. The problem was I couldn't explain &lt;em&gt;why&lt;/em&gt; it succeeded.&lt;/p&gt;

&lt;p&gt;When someone asked me to walk through the architecture, I stumbled. Why did I structure the database this way? "That's what the tutorial showed." What's your error handling strategy? I didn't have one—just try-catch blocks scattered everywhere.&lt;/p&gt;

&lt;p&gt;I could make it run, but I couldn't defend the design. That bothered me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting Over
&lt;/h2&gt;

&lt;p&gt;The easy path: refactor incrementally. Add tests here, clean up there, gradually improve. But the foundation was shaky. I'd built on patterns I didn't understand. Every new feature would compound that.&lt;/p&gt;

&lt;p&gt;Starting from scratch felt wasteful. Two months of work, gone. But those two months taught me what questions to ask. The second time, I wasn't following tutorials blindly—I was making actual decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Database Design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Version one: single &lt;code&gt;products&lt;/code&gt; table with basic fields. Version two: proper product-variant relationship.&lt;/p&gt;

&lt;p&gt;Products have category (Tees, Sweaters, Pants), name, color, price, and collection ("FW24", "SS25"). Variants add size (S-M, L-XL, One Size) with individual &lt;code&gt;quantity&lt;/code&gt; and &lt;code&gt;in_stock&lt;/code&gt; tracking.&lt;/p&gt;

&lt;p&gt;This solves real fashion problems: when a customer wants Size L-XL specifically, you need to know if &lt;em&gt;that&lt;/em&gt; variant is in stock, not just whether the base product exists. Categories and sizes use Enums—can't create invalid entries. Field validators catch edge cases like empty strings.&lt;/p&gt;

&lt;p&gt;More complex than a single table, but it mirrors how actual retail inventory works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Everything used to live in &lt;code&gt;main.py&lt;/code&gt;. Routes, business logic, database queries, all mixed. Worked for five endpoints. Would've been unmaintainable at fifty.&lt;/p&gt;

&lt;p&gt;Now: separation of concerns. &lt;code&gt;main.py&lt;/code&gt; launches. Services handle logic. Routes define endpoints. Models define data. When I need to change product creation, I know exactly where to look.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error Handling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Old way: wrap things in try-catch when they broke, handle locally, hope nothing unexpected happens.&lt;/p&gt;

&lt;p&gt;New way: centralized error handler. One place catches exceptions, formats responses consistently, logs properly. Something goes wrong anywhere, it gets handled the same way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Development Workflow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Biggest change wasn't the code—it was how I wrote it. Version one was commits to main whenever something worked. Version two uses pull requests for every feature. Every change gets reviewed before merge.&lt;/p&gt;

&lt;p&gt;Felt slow. It is slower. But "slower" means "catching issues before production" and "understanding why patterns matter."&lt;/p&gt;

&lt;h2&gt;
  
  
  What Code Review Actually Teaches You
&lt;/h2&gt;

&lt;p&gt;Having someone review my code exposed gaps I didn't know existed.&lt;/p&gt;

&lt;p&gt;I renamed functions but didn't understand what I was renaming. I was splitting routers from controllers mechanically but couldn't explain the pattern. When asked "are you sure you only need &lt;code&gt;.venv&lt;/code&gt; in dockerignore?" I realized I was thinking about exclusions wrong—should've been excluding everything and whitelisting what I need.&lt;/p&gt;

&lt;p&gt;I named a docker service &lt;code&gt;web&lt;/code&gt; generically. Got told: "You're &lt;code&gt;api&lt;/code&gt;. When you add a frontend later, that's &lt;code&gt;web&lt;/code&gt;." Right. I was thinking "web server." Should've been thinking "what does this service actually do."&lt;/p&gt;

&lt;p&gt;These weren't "change this line" comments. They were "here's why this matters" and "think about scaling." That's what tutorials don't teach.&lt;/p&gt;

&lt;p&gt;Some PRs got approved quickly. Others took three rounds. The ones that took longer taught me more.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Gaps I'm Still Filling
&lt;/h2&gt;

&lt;p&gt;Rebuilding didn't make me an expert. Made me aware of what I don't know.&lt;/p&gt;

&lt;p&gt;I implement CRUD operations but couldn't clearly explain in an interview what CRUD means and why it matters as a pattern. I use an ORM (SQLModel) but can't give a solid answer on when to use ORM versus raw SQL and what the tradeoffs are. I have a centralized error handler but had to look up why it's "error handler" not "exception handler" and what that distinction means.&lt;/p&gt;

&lt;p&gt;These are basic things. Interview questions. The fact that I'm building working software while having these gaps means I need to go back to fundamentals—not to learn how to code, but to learn how to articulate what I'm doing and why.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;Original API: archived, private. Not bad code—just a snapshot of where I was three months ago. Not portfolio material.&lt;/p&gt;

&lt;p&gt;New version isn't production-ready either, still flat, need tests and a proper folder-structure architecture. I'm honest about that. But it's built on patterns I can explain. When I add tests next week, the architecture supports it. When I build a chatbot layer next month, I'll have confidence in the foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  If You're In The Same Position
&lt;/h2&gt;

&lt;p&gt;If you built something that works but can't explain it, consider rebuilding. Not because the first version was wrong—because the second version will teach you things the first couldn't.&lt;/p&gt;

&lt;p&gt;You'll see patterns you missed. You'll understand why approaches matter. You'll find knowledge gaps you didn't know existed. When someone asks you to explain design decisions, you'll have answers deeper than "the tutorial told me to."&lt;/p&gt;

&lt;p&gt;First version proves you can make it work. Second version proves you understand why it works.&lt;/p&gt;

&lt;p&gt;That difference matters.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Transitioning from retail operations to AI engineering. Building a fashion-focused retail API with FastAPI + PostgreSQL. Next: chatbot layer, then RAG, then multi-agent systems. Follow the journey: &lt;a href="https://github.com/chris-kechagias/retail-inventory-api" rel="noopener noreferrer"&gt;github.com/chris-kechagias/retail-inventory-api&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>coding</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
