<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Louis Sanna</title>
    <description>The latest articles on DEV Community by Louis Sanna (@louis-sanna).</description>
    <link>https://dev.to/louis-sanna</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2442380%2Ff36ea6e2-423c-4c75-8785-0d785e5ca20d.jpeg</url>
      <title>DEV Community: Louis Sanna</title>
      <link>https://dev.to/louis-sanna</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/louis-sanna"/>
    <language>en</language>
    <item>
      <title>The Hidden Bottleneck in LLM Streaming: Function Calls (And How to Fix It)</title>
      <dc:creator>Louis Sanna</dc:creator>
      <pubDate>Thu, 12 Dec 2024 16:27:46 +0000</pubDate>
      <link>https://dev.to/louis-sanna/the-hidden-bottleneck-in-llm-streaming-function-calls-and-how-to-fix-it-1622</link>
      <guid>https://dev.to/louis-sanna/the-hidden-bottleneck-in-llm-streaming-function-calls-and-how-to-fix-it-1622</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;📢 Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Picture this: You’re building a real-time LLM-powered app. Your users are expecting fast, continuous updates from the AI, but instead, they’re staring at a frozen screen. What gives?&lt;/p&gt;

&lt;p&gt;Spoiler alert — it’s not your LLM that’s slowing things down. It’s &lt;strong&gt;your function calls&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Every time your app makes a call to process data, hit an API, or load a large file, you risk blocking the stream. The result? Delays, lag, and an experience that feels anything but “real-time.”&lt;/p&gt;

&lt;p&gt;But don’t worry — this bottleneck has 3 simple fixes. In this post, I’ll show you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why function calls block LLM streams&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The 3 strategies to prevent bottlenecks&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How to keep your streams fast, smooth, and uninterrupted&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s get into it. 🚀&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;❌ Why Function Calls Are Slowing You Down&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;LLM streaming works by sending a steady flow of small chunks of text to the client. But here’s the catch: Every time you call a function during the stream — to process data, access an API, or run a calculation — the stream &lt;em&gt;pauses&lt;/em&gt; until the function finishes.&lt;/p&gt;

&lt;p&gt;This happens because most functions are &lt;strong&gt;synchronous&lt;/strong&gt; by default, which means they block the current thread. Imagine you’re in a group chat, but one friend keeps pausing the conversation to answer a phone call. Annoying, right?&lt;/p&gt;

&lt;p&gt;Here’s what’s really happening:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔁 &lt;strong&gt;Synchronous (Blocking) Functions&lt;/strong&gt;: The stream has to “wait” for these functions to finish before sending the next chunk of data.&lt;/li&gt;
&lt;li&gt;🔥 &lt;strong&gt;Non-blocking (Asynchronous) Functions&lt;/strong&gt;: The stream continues while the function does its work in the background.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s a visual of the difference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ Blocking Call ] ---&amp;gt; Stream Pauses
[ Async Call ] ------&amp;gt; Stream Continues

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;🛠️ 3 Ways to Fix It&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To avoid blocking the stream, you need to make your app &lt;strong&gt;non-blocking&lt;/strong&gt;. Here are the &lt;strong&gt;3 best techniques&lt;/strong&gt; to do just that:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1️⃣ Use Asynchronous Functions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If your function is doing I/O (like hitting an API), make it asynchronous so it can "wait" for the API &lt;strong&gt;without pausing the stream&lt;/strong&gt;. Async functions allow the app to keep streaming while the function completes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When calling external APIs&lt;/li&gt;
&lt;li&gt;When reading/writing to files or databases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use Python’s &lt;code&gt;async def&lt;/code&gt; for your functions.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;await&lt;/code&gt; to “pause” the function without blocking the stream.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example: Streaming an LLM While Calling an API&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamingResponse&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;async_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate a slow API call
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data_chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;processed_chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;async_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;processed_chunk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate delay between chunks
&lt;/span&gt;
&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;stream_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔍 &lt;strong&gt;What’s happening here?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each chunk is processed &lt;strong&gt;asynchronously&lt;/strong&gt;: the two-second wait happens inside the event loop instead of blocking a thread.&lt;/li&gt;
&lt;li&gt;While &lt;code&gt;async_function&lt;/code&gt; awaits, the server stays free to handle other requests and streams.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro Tip:&lt;/strong&gt; Use &lt;code&gt;await asyncio.sleep()&lt;/code&gt; to simulate non-blocking behavior. Replace this with actual I/O tasks like API calls, file reads, or database queries.&lt;/p&gt;
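
&lt;p&gt;For instance, here is what &lt;code&gt;async_function&lt;/code&gt; could look like with real I/O. This is a minimal sketch assuming the &lt;code&gt;httpx&lt;/code&gt; client library and a placeholder endpoint URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import httpx

API_URL = "https://api.example.com/process"  # placeholder endpoint, not a real service

async def async_function(data):
    # The await hands control back to the event loop while the request is in flight
    async with httpx.AsyncClient() as client:
        response = await client.post(API_URL, json={"data": data})
    return response.json()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;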




&lt;h3&gt;
  
  
  &lt;strong&gt;2️⃣ Leverage Background Tasks&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you have heavy computations (like ML inference), you don’t want to keep your stream waiting. Instead, offload the task into the &lt;strong&gt;background&lt;/strong&gt; and continue streaming while the computation runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you have &lt;strong&gt;CPU-heavy computations&lt;/strong&gt; (e.g., model predictions)&lt;/li&gt;
&lt;li&gt;When dealing with large files or datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Move heavy functions into a &lt;strong&gt;background task&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Spawn it with &lt;code&gt;asyncio.create_task()&lt;/code&gt; so the generator keeps streaming. (FastAPI’s &lt;code&gt;BackgroundTasks&lt;/code&gt; helper only runs &lt;em&gt;after&lt;/em&gt; the response has been sent, so it can’t feed an in-flight stream.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example: Stream LLM Responses While Running a Background Computation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BackgroundTasks&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamingResponse&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;background_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate a heavy ML computation
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data_chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;background_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: Processing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate a slight delay
&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# Wait for all background tasks
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BackgroundTasks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;stream_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔍 &lt;strong&gt;What’s happening here?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The heavy computation (&lt;code&gt;background_task&lt;/code&gt;) runs in the background.&lt;/li&gt;
&lt;li&gt;The stream stays responsive, sending "Processing..." updates in real time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro Tip:&lt;/strong&gt; An asyncio task still runs on the event-loop thread, so for truly &lt;strong&gt;CPU-bound operations&lt;/strong&gt; like ML inference, large file processing, and batch jobs, hand the work to a thread or process pool so the loop stays free.&lt;/p&gt;
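
&lt;p&gt;For example, here is a minimal sketch of that offloading with &lt;code&gt;asyncio.to_thread()&lt;/code&gt; (Python 3.9+); &lt;code&gt;run_inference&lt;/code&gt; is a hypothetical stand-in for a heavy model call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

def run_inference(chunk):
    # Hypothetical CPU-heavy model call that would otherwise block the event loop
    return f"Predicted: {chunk}"

async def stream_with_offload(chunks):
    for chunk in chunks:
        # Runs in a worker thread, so the event loop (and every stream) stays responsive
        result = await asyncio.to_thread(run_inference, chunk)
        yield f"data: {result}\n\n"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the work is pure Python and holds the GIL, swap the thread for a process pool via &lt;code&gt;loop.run_in_executor()&lt;/code&gt; with a &lt;code&gt;ProcessPoolExecutor&lt;/code&gt;.&lt;/p&gt;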




&lt;h3&gt;
  
  
  &lt;strong&gt;3️⃣ Chunk Your Data&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you have to process &lt;strong&gt;large datasets&lt;/strong&gt;, break them into smaller "chunks" and process each one at a time. This keeps the stream alive, rather than forcing it to wait for the whole dataset to be processed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use it&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When dealing with large datasets (e.g., CSV files, large JSON)&lt;/li&gt;
&lt;li&gt;When paginating results from a large database query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Divide large datasets into chunks.&lt;/li&gt;
&lt;li&gt;Process each chunk and stream it immediately.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example: Stream Responses While Processing Large Datasets&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamingResponse&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate processing time
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data_chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;processed_chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;process_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;processed_chunk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate delay between chunks
&lt;/span&gt;
&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;stream_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔍 &lt;strong&gt;What’s happening here?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instead of processing a &lt;strong&gt;big file all at once&lt;/strong&gt;, the data is processed in chunks.&lt;/li&gt;
&lt;li&gt;The stream stays responsive, sending updates as each chunk finishes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro Tip:&lt;/strong&gt; Use chunked processing for large datasets (like CSVs) to stream "partial results" instead of waiting for the whole job to finish.&lt;/p&gt;
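
&lt;p&gt;Here is a minimal sketch of that idea for a CSV file (the file name is illustrative); each batch is read in a worker thread so the event loop keeps streaming:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import csv
from itertools import islice

CSV_PATH = "big_dataset.csv"  # hypothetical file

async def csv_chunk_stream(batch_size=500):
    with open(CSV_PATH, newline="") as f:
        reader = csv.reader(f)
        while True:
            # Pull the next batch of rows in a worker thread
            batch = await asyncio.to_thread(lambda: list(islice(reader, batch_size)))
            if not batch:
                break
            yield f"data: processed {len(batch)} rows\n\n"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;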




&lt;h3&gt;
  
  
  &lt;strong&gt;📊 Which Method Should You Use?&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Method&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Use For&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Use Case Example&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Async Functions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;I/O tasks (like APIs)&lt;/td&gt;
&lt;td&gt;Streaming responses from API calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Background Tasks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Heavy computation&lt;/td&gt;
&lt;td&gt;Running ML inference while streaming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chunked Processing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large datasets&lt;/td&gt;
&lt;td&gt;Streaming data from large files&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;🚀 Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When it comes to LLM streaming, blocking function calls are a &lt;strong&gt;hidden bottleneck&lt;/strong&gt;. They stop the stream, causing lags and bad user experiences.&lt;/p&gt;

&lt;p&gt;But now you know the &lt;strong&gt;3 ways to fix it&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;1️⃣ Use &lt;strong&gt;Async Functions&lt;/strong&gt; for I/O tasks.&lt;/p&gt;

&lt;p&gt;2️⃣ Use &lt;strong&gt;Background Tasks&lt;/strong&gt; for heavy computations.&lt;/p&gt;

&lt;p&gt;3️⃣ Use &lt;strong&gt;Chunked Processing&lt;/strong&gt; for large datasets.&lt;/p&gt;

&lt;p&gt;By using these techniques, you’ll keep your streams fast, smooth, and &lt;strong&gt;real-time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;Want more LLM superpowers?&lt;/strong&gt; Check out my course on newline: &lt;a href="https://www.newline.co/courses/responsive-llm-applications-with-server-sent-events" rel="noopener noreferrer"&gt;Responsive LLM Applications with Server-Sent Events&lt;/a&gt;. It’s a complete toolkit for building high-performance, real-time AI apps.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building the Ideal AI Agent: From Async Event Streams to Context-Aware State Management</title>
      <dc:creator>Louis Sanna</dc:creator>
      <pubDate>Thu, 12 Dec 2024 16:26:41 +0000</pubDate>
      <link>https://dev.to/louis-sanna/building-the-ideal-ai-agent-from-async-event-streams-to-context-aware-state-management-33</link>
      <guid>https://dev.to/louis-sanna/building-the-ideal-ai-agent-from-async-event-streams-to-context-aware-state-management-33</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The dream of an &lt;strong&gt;autonomous AI agent&lt;/strong&gt; isn’t just about generating smart responses — it’s about making those responses fast, interactive, and context-aware. To achieve this, you need to manage state across asynchronous tasks, handle real-time communication, and separate logic cleanly.&lt;/p&gt;

&lt;p&gt;In this blog, you’ll learn how to design an &lt;strong&gt;ideal AI agent&lt;/strong&gt; by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using &lt;strong&gt;asynchronous server-sent events (SSE)&lt;/strong&gt; to create live, real-time AI responses.&lt;/li&gt;
&lt;li&gt;Simplifying state management with &lt;strong&gt;context variables&lt;/strong&gt; (&lt;code&gt;contextvars&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Decoupling logic from network operations to build a &lt;strong&gt;scalable, maintainable architecture&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you’ll have a step-by-step understanding of how to design an agent that’s efficient, elegant, and easy to scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;1️⃣ The Architecture of an Ideal AI Agent&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most "basic" AI agents are tangled in a mess of network calls, domain logic, and asynchronous event management. This makes it difficult to debug and hard to scale.&lt;/p&gt;

&lt;p&gt;The ideal agent separates these concerns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Event-driven Communication&lt;/strong&gt;: Uses asynchronous &lt;strong&gt;Server-Sent Events (SSE)&lt;/strong&gt; to stream updates to users in real time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context-Aware State Management&lt;/strong&gt;: Manages context across multiple async calls using Python’s &lt;strong&gt;contextvars&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decoupled Business Logic&lt;/strong&gt;: Avoids tightly coupling logic with network operations, making it easier to maintain.&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Diagram: Ideal AI Agent Architecture&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------------------------+
|     Client (Web Browser)    |
|    Listens for SSE Events   |
+----------------------------+
            ⬇
+----------------------------+
|    FastAPI Backend          |
|    (Async Streaming)        |
+----------------------------+
            ⬇
+----------------------------+
|    Agent Logic              |
|  1️⃣ Generates Output       |
|  2️⃣ Emits Real-Time Events |
+----------------------------+
            ⬇
+----------------------------+
|    Event Queue + Context    |
|   Context-Aware State       |
+----------------------------+

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;strong&gt;2️⃣ Key Concepts to Build the Ideal Agent&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Asynchronous Event Streaming (SSE)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of waiting for the entire AI response to finish, we can stream each "chunk" of the response to the user. This makes the interaction feel faster, even if the total response time is the same.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How It Works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The client opens an &lt;strong&gt;event stream&lt;/strong&gt; (&lt;code&gt;text/event-stream&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Every time the agent generates new content (like a sentence or paragraph), it streams that chunk to the client (the wire format is shown just below).&lt;/li&gt;
&lt;li&gt;When the full response is complete, the event stream closes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why It’s Important:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feels more &lt;strong&gt;interactive&lt;/strong&gt; to users.&lt;/li&gt;
&lt;li&gt;Allows for &lt;strong&gt;partial responses&lt;/strong&gt; — users can see content as it’s created.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
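
&lt;p&gt;On the wire, each chunk is just a &lt;code&gt;data:&lt;/code&gt; line followed by a blank line, which is all the &lt;code&gt;text/event-stream&lt;/code&gt; format requires:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data: First chunk of the answer

data: Second chunk of the answer

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;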




&lt;h3&gt;
  
  
&lt;strong&gt;2. Context-Aware State Management (&lt;code&gt;contextvars&lt;/code&gt;)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Agents often deal with &lt;strong&gt;asynchronous tasks&lt;/strong&gt; that happen in parallel. Without context, shared state between these tasks becomes difficult.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Problem:&lt;/p&gt;

&lt;p&gt;Two user requests hit the server at the same time. How do you ensure their states are separate?&lt;/p&gt;

&lt;p&gt;Solution:&lt;/p&gt;

&lt;p&gt;Use Python’s &lt;strong&gt;&lt;code&gt;contextvars&lt;/code&gt;&lt;/strong&gt;. This lets you manage request-specific variables, even when multiple requests happen at once. Think of it like &lt;strong&gt;thread-local storage&lt;/strong&gt; for async code (see the demo just after this note).&lt;/p&gt;

&lt;p&gt;How It Works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a new request arrives, a &lt;strong&gt;queue&lt;/strong&gt; is created in the context.&lt;/li&gt;
&lt;li&gt;This queue holds the &lt;strong&gt;event messages&lt;/strong&gt; (chunks) for that specific request.&lt;/li&gt;
&lt;li&gt;As the agent generates output, it &lt;strong&gt;emits chunks&lt;/strong&gt; into the queue.&lt;/li&gt;
&lt;li&gt;Once the queue is empty and the task is done, the context is &lt;strong&gt;cleaned up&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
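
&lt;p&gt;Here is a minimal, self-contained demo of that isolation (the names are illustrative and separate from the agent code below):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import contextvars

request_id_var = contextvars.ContextVar("request_id")

async def handle(request_id):
    request_id_var.set(request_id)
    await asyncio.sleep(0.1)  # another request runs during this await...
    print(request_id_var.get())  # ...yet each task still sees its own value

async def main():
    await asyncio.gather(handle("req-1"), handle("req-2"))

asyncio.run(main())  # each task prints its own id; values never leak across tasks

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;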




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Decoupled Agent Logic&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The best AI agents keep &lt;strong&gt;network logic&lt;/strong&gt; and &lt;strong&gt;business logic&lt;/strong&gt; separate. Instead of directly streaming from the agent, we push events into a queue and handle event streaming &lt;strong&gt;separately&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This separation makes it easier to test, debug, and maintain the system.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Concepts You’ll Need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;emit_event()&lt;/strong&gt;: Adds events to the queue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;close()&lt;/strong&gt;: Closes the queue when the task finishes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming Response&lt;/strong&gt;: Sends chunks from the queue to the client.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;3️⃣ Step-by-Step Guide to Building the Ideal Agent&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 1: Setup the Environment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Install the necessary libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi uvicorn pydantic

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: Build the Context-Aware Event System&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This system tracks and streams events from the agent to the client. Here’s how:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use a &lt;strong&gt;&lt;code&gt;contextvars.ContextVar&lt;/code&gt;&lt;/strong&gt; to store the event queue.&lt;/li&gt;
&lt;li&gt;Create functions to &lt;strong&gt;emit events&lt;/strong&gt; and &lt;strong&gt;close the queue&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The agent generates chunks and adds them to the queue; a streaming response reads them off the queue to the client.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;context.py&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;contextvars&lt;/span&gt;

&lt;span class="c1"&gt;# Create a context variable to store request-specific data
&lt;/span&gt;&lt;span class="n"&gt;chat_context_var&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;contextvars&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ContextVar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Function to initialize the context (per request)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_chat_context&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;emit_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Signals to close the stream
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;emit_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;Step 3: The Chat Service&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is the core logic where the agent processes a request.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When a client sends a chat request, we create a new context.&lt;/li&gt;
&lt;li&gt;The context tracks the queue where messages (events) are stored.&lt;/li&gt;
&lt;li&gt;Each message chunk from the agent is streamed to the client in real time.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;chat.py&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;APIRouter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamingResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;build_chat_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chat_context_var&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="n"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;APIRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;emit_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_context_var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate a delay to send chunks one-by-one
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;emit_event&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# End of stream
&lt;/span&gt;
&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/api/stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;emit_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_chat_context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;chat_context_var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;emit_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;process_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How are you?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Goodbye&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;event_generator&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# End of stream
&lt;/span&gt;                &lt;span class="k"&gt;break&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;event_generator&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;Step 4: Run the Service&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Run the FastAPI server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvicorn chat:app &lt;span class="nt"&gt;--reload&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make a POST request to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8000/api/stream

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Watch the &lt;strong&gt;real-time response&lt;/strong&gt; as chunks appear.&lt;/p&gt;
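
&lt;p&gt;For example, with curl (&lt;code&gt;-N&lt;/code&gt; disables buffering so each event shows up as it arrives):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -N -X POST http://localhost:8000/api/stream

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;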




&lt;h2&gt;
  
  
  &lt;strong&gt;4️⃣ Advanced Techniques&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Use &lt;code&gt;async def emit_event()&lt;/code&gt; Instead of &lt;code&gt;yield&lt;/code&gt;&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The "yield" method can cause issues when events come from multiple functions. Instead, push events to a queue using &lt;strong&gt;&lt;code&gt;emit_event()&lt;/code&gt;&lt;/strong&gt;. This avoids yield problems, especially when sub-functions need to send events.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Manage Long-Running Tasks&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use &lt;strong&gt;&lt;code&gt;asyncio.create_task()&lt;/code&gt;&lt;/strong&gt; to process long-running tasks without blocking the entire stream. This allows multiple users to receive independent updates.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Use WebSockets Instead of SSE&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For more interactive experiences, use WebSockets. Unlike SSE, WebSockets support &lt;strong&gt;bi-directional communication&lt;/strong&gt;.&lt;/p&gt;
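
&lt;p&gt;A minimal sketch of what that looks like with FastAPI’s WebSocket support:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws")
async def chat_ws(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            user_message = await websocket.receive_text()  # client sends to server
            await websocket.send_text(f"echo: {user_message}")  # server replies on the same socket
    except WebSocketDisconnect:
        pass  # client closed the connection

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;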




&lt;h2&gt;
  
  
  &lt;strong&gt;5️⃣ Key Takeaways&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context-aware agents&lt;/strong&gt; can separate network logic from agent logic.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;SSE (Server-Sent Events)&lt;/strong&gt; for real-time feedback.&lt;/li&gt;
&lt;li&gt;Manage agent state using Python’s &lt;strong&gt;contextvars&lt;/strong&gt; to keep state isolated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;emit_event()&lt;/strong&gt; makes it simple to send updates from any part of the agent logic.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;6️⃣ Full Code Example&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here’s the complete file structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── context.py         # Handles contextvars and event system
├── chat.py            # The core logic for the streaming service
└── main.py            # Starts the FastAPI server

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
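
&lt;p&gt;&lt;code&gt;main.py&lt;/code&gt; isn’t shown above; a minimal version, assuming &lt;code&gt;chat.py&lt;/code&gt; exposes the &lt;code&gt;router&lt;/code&gt;, could be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# main.py (a minimal sketch)
from fastapi import FastAPI
from chat import router

app = FastAPI()
app.include_router(router)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;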






&lt;h2&gt;
  
  
  &lt;strong&gt;7️⃣ Final Thoughts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building an "ideal" AI agent isn’t just about improving its intelligence — it’s about making it more interactive, more maintainable, and more scalable. By using &lt;strong&gt;async events&lt;/strong&gt;, &lt;strong&gt;contextvars&lt;/strong&gt;, and &lt;strong&gt;real-time streams&lt;/strong&gt;, you can create an agent that "feels" fast and responsive.&lt;/p&gt;

&lt;p&gt;If you’re ready to &lt;strong&gt;level up your agents&lt;/strong&gt;, apply these principles to your next AI project.&lt;/p&gt;

&lt;p&gt;Want to learn more about building Responsive LLMs? Check out my course on newline: &lt;a href="https://www.newline.co/courses/responsive-llm-applications-with-server-sent-events" rel="noopener noreferrer"&gt;Responsive LLM Applications with Server-Sent Events&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to design systems for AI applications&lt;/li&gt;
&lt;li&gt;How to stream the answer of a Large Language Model&lt;/li&gt;
&lt;li&gt;Differences between Server-Sent Events and WebSockets&lt;/li&gt;
&lt;li&gt;Importance of real-time for GenAI UI&lt;/li&gt;
&lt;li&gt;How asynchronous programming in Python works&lt;/li&gt;
&lt;li&gt;How to integrate LangChain with FastAPI&lt;/li&gt;
&lt;li&gt;What problems Retrieval Augmented Generation can solve&lt;/li&gt;
&lt;li&gt;How to create an AI agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...and much more.&lt;/p&gt;

&lt;p&gt;Worth checking out if you want to build your own LLM applications.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Self-Correcting AI Agents: How to Build AI That Learns From Its Mistakes</title>
      <dc:creator>Louis Sanna</dc:creator>
      <pubDate>Thu, 12 Dec 2024 16:25:27 +0000</pubDate>
      <link>https://dev.to/louis-sanna/self-correcting-ai-agents-how-to-build-ai-that-learns-from-its-mistakes-39f1</link>
      <guid>https://dev.to/louis-sanna/self-correcting-ai-agents-how-to-build-ai-that-learns-from-its-mistakes-39f1</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;What if your AI agent could recognize its own mistakes, learn from them, and try again — without human intervention? Welcome to the world of &lt;strong&gt;self-correcting AI agents&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most AI models generate outputs in a single attempt. But self-correcting agents go further. They can identify when an error occurs, analyze the cause, and apply a fix — all in real time. Think of it as an AI with a built-in "trial and error" mindset.&lt;/p&gt;

&lt;p&gt;In this blog, you’ll learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What &lt;strong&gt;self-correction&lt;/strong&gt; means for AI agents.&lt;/li&gt;
&lt;li&gt;How to build an agent that &lt;strong&gt;adapts to mistakes&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;How to apply the &lt;strong&gt;reflection pattern&lt;/strong&gt; in agent design.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you’ll know how to design AI agents that not only fail gracefully but also improve on every attempt.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1️⃣ What is a Self-Correcting Agent?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;self-correcting agent&lt;/strong&gt; is an AI system that can recognize its own failures and attempt a new strategy. If the initial approach doesn't work, the agent re-evaluates and tries an alternative path.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Analogy:&lt;/p&gt;

&lt;p&gt;Imagine asking a chef to bake a cake, but they use too much sugar the first time. A standard AI would keep making the same mistake. But a self-correcting AI would notice the error, reduce the sugar next time, and adjust until the cake tastes perfect.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why Do Self-Correcting Agents Matter?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Most AI tools (like ChatGPT) can only give you a single response. If it's wrong, you have to manually ask it to "try again." But a self-correcting agent can &lt;strong&gt;autonomously retry&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🛠️ Example Use Case:&lt;/p&gt;

&lt;p&gt;An AI is asked to write a Python function that calculates Fibonacci numbers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1:&lt;/strong&gt; The AI writes a slow recursive function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-Correction:&lt;/strong&gt; It notices that recursion is too slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2:&lt;/strong&gt; The AI rewrites the function using dynamic programming, making it faster.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2️⃣ Key Techniques for Self-Correction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;How do we make an agent self-aware enough to recognize its mistakes? Here are three main techniques:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Error Detection&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Identify if the result is "wrong" (like an API call failure, incorrect output, or bad performance).&lt;/li&gt;
&lt;li&gt;Use error codes, exceptions, or test cases to detect failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Reflection&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Agents reflect on their decisions, ask, "What went wrong?", and plan their next step.&lt;/li&gt;
&lt;li&gt;Reflection can be achieved by logging errors, tracking unsuccessful API calls, or re-evaluating response quality.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Retry Logic&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;After reflection, agents retry with an improved strategy.&lt;/li&gt;
&lt;li&gt;This might mean switching API providers, using more efficient logic, or applying a backup approach (a combined sketch follows below).&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Pro Tip:&lt;/p&gt;

&lt;p&gt;Error logs can be fed back into the AI model to improve future performance.&lt;/p&gt;
&lt;/blockquote&gt;
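
&lt;p&gt;Putting the three techniques together, here is a minimal sketch of the detect/reflect/retry loop (all names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def run_with_self_correction(generate, test, max_attempts=3):
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)  # generation, optionally guided by past errors
        ok, error = test(candidate)     # error detection
        if ok:
            return candidate
        feedback = f"Attempt {attempt} failed: {error}"  # reflection fed into the next try
    raise RuntimeError("No working solution after retries")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;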




&lt;h3&gt;
  
  
  &lt;strong&gt;3️⃣ Self-Correction in Practice&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let’s build a self-correcting AI agent using Python and FastAPI.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;🧑‍💻 Step 1: The Problem&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We want an AI agent that can generate a Python function. If the function fails to run or produces the wrong output, the agent will automatically &lt;strong&gt;correct itself&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Write a Fibonacci function that calculates the 10th Fibonacci number.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge:&lt;/strong&gt; If the agent generates a recursive version (which is slow), it should recognize this and rewrite it using dynamic programming.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;🧑‍💻 Step 2: Set Up the Agent&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Install the necessary dependencies (the example code below uses the legacy pre-1.0 OpenAI SDK, so the version is pinned):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai fastapi uvicorn

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;🧑‍💻 Step 3: Write the Agent&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s how the agent works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It generates a Python function using &lt;strong&gt;OpenAI’s API&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;It &lt;strong&gt;runs the function&lt;/strong&gt; to check if it works.&lt;/li&gt;
&lt;li&gt;If the function fails (slow, wrong, or error), it reflects and &lt;strong&gt;corrects the approach&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Code Implementation&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="c1"&gt;# 🔐 Replace with your OpenAI API key
&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_openai_api_key_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# 🎉 Step 1: Ask the AI to generate a Fibonacci function
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_fibonacci_function&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python function to calculate the 10th Fibonacci number.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChatCompletion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;function_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;function_code&lt;/span&gt;

&lt;span class="c1"&gt;# 🧪 Step 2: Test the function to see if it works
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_fibonacci_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_code&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Run the function in a safe execution environment
&lt;/span&gt;        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fibonacci(10)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Call the function with n=10
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;55&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Correct Fibonacci value for n=10
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wrong_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 🌀 Step 3: Self-Correct by asking for a new version of the function
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;self_correct_function&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;max_attempts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🟢 Attempt &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Generate a new Fibonacci function
&lt;/span&gt;        &lt;span class="n"&gt;function_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generate_fibonacci_function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generated function:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;function_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Test the function to see if it works
&lt;/span&gt;        &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;test_fibonacci_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Success! Fibonacci(10) = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wrong_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Incorrect result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Asking AI to try a better method.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;💥 Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Asking AI to try again.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Max attempts reached. Could not generate a correct function.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 🔥 Run the correction process
&lt;/span&gt;&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;self_correct_function&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
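
&lt;p&gt;One caveat: as written, every retry re-sends the same prompt, so the agent relies on sampling luck rather than true reflection. A small variant (a sketch, assuming the same pre-1.0 &lt;code&gt;openai&lt;/code&gt; client as above) threads the failure back into the next request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import openai  # assumes openai.api_key is already set, as above

async def generate_fibonacci_function(feedback=None):
    prompt = "Write a Python function named fibonacci(n) that returns the nth Fibonacci number. Return only code."
    if feedback:
        # Reflection in action: tell the model what went wrong last time.
        prompt += f"\nThe previous attempt failed with: {feedback}. Fix this."
    response = await openai.ChatCompletion.acreate(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response['choices'][0]['message']['content']

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the retry loop, pass the failure along on the next iteration, e.g. &lt;code&gt;function_code = await generate_fibonacci_function(feedback=result)&lt;/code&gt; whenever the status is not &lt;code&gt;success&lt;/code&gt;.&lt;/p&gt;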






&lt;h3&gt;
  
  
  &lt;strong&gt;4️⃣ How It Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generate Function&lt;/strong&gt;: The AI writes a Python function for Fibonacci.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run the Function&lt;/strong&gt;: The agent executes the function and checks the result.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-Correct&lt;/strong&gt;: If the result is wrong, it prompts OpenAI to &lt;strong&gt;try again&lt;/strong&gt; with a smarter approach.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Output Example&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🟢 Attempt 1
❌ Incorrect result: 42. Asking AI to try a better method.

🟢 Attempt 2
💥 Error: NameError: name 'fibonacci' is not defined. Asking AI to try again.

🟢 Attempt 3
✅ Success! Fibonacci(10) = 55

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;5️⃣ Key Patterns in Self-Correcting Agents&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Error Detection&lt;/strong&gt;: Look for incorrect output, slow performance, or exceptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reflection&lt;/strong&gt;: Log the problem. Why did it fail?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retry Logic&lt;/strong&gt;: Call a new version of the function, but smarter this time.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Pro Tip:&lt;/p&gt;

&lt;p&gt;Use a &lt;strong&gt;feedback loop&lt;/strong&gt; to let the agent learn from mistakes. Feed logs back into the agent to help it recognize common issues.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;6️⃣ When Should You Use Self-Correcting Agents?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Self-correcting agents are most useful where failures are frequent and manual intervention is costly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Calls&lt;/strong&gt;: Retry if an API fails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Generation&lt;/strong&gt;: Re-generate code if it throws errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Analysis&lt;/strong&gt;: Fix incorrect predictions in ML models.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;7️⃣ Benefits of Self-Correcting Agents&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Problem&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Agent gets it wrong&lt;/td&gt;
&lt;td&gt;Retry with a better approach&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API request fails&lt;/td&gt;
&lt;td&gt;Retry with exponential backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code generation error&lt;/td&gt;
&lt;td&gt;Use a smarter prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
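
&lt;p&gt;The "retry with exponential backoff" row deserves a concrete shape. Here’s a minimal sketch in plain asyncio, with jitter so concurrent retries don’t all wake up at once (&lt;code&gt;make_call&lt;/code&gt; is any async callable you supply):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import random

async def call_with_backoff(make_call, max_attempts=5, base_delay=1.0):
    """Retry an async call, doubling the wait (plus jitter) after each failure."""
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except Exception as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)

# Usage with a hypothetical tool: await call_with_backoff(lambda: flaky_api_call())

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;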




&lt;h3&gt;
  
  
  &lt;strong&gt;8️⃣ Take It to the Next Level&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use a Cache&lt;/strong&gt;: Store successful outputs so the agent doesn’t start from scratch (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add Feedback Loops&lt;/strong&gt;: If a function fails often, feed logs into a training process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track Agent Confidence&lt;/strong&gt;: If the agent is unsure, have it run test cases.&lt;/li&gt;
&lt;/ol&gt;
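
&lt;p&gt;For the cache, even an in-memory dict goes a long way. A minimal sketch (&lt;code&gt;solve_from_scratch&lt;/code&gt; is a hypothetical stand-in for the self-correction loop above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

solution_cache = {}  # problem statement -&gt; validated function code

async def solve_from_scratch(problem):
    # Hypothetical stand-in for the self-correction loop above.
    return f"def solution(): ...  # solves: {problem}"

async def solve_with_cache(problem):
    if problem in solution_cache:
        return solution_cache[problem]  # cache hit: skip generation entirely
    code = await solve_from_scratch(problem)
    solution_cache[problem] = code  # only store output that passed its tests
    return code

print(asyncio.run(solve_with_cache("10th Fibonacci number")))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;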




&lt;h3&gt;
  
  
  &lt;strong&gt;9️⃣ Wrapping Up&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You now have the blueprint for a &lt;strong&gt;self-correcting agent&lt;/strong&gt; that can write, test, and fix Python functions. Here’s what we covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The 3 pillars of self-correction: &lt;strong&gt;Error Detection, Reflection, Retry Logic&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;How to build an agent that generates and tests Python functions.&lt;/li&gt;
&lt;li&gt;Best practices for building smarter, more reliable agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💪 Challenge:&lt;/p&gt;

&lt;p&gt;Build a self-correcting agent that not only generates code but &lt;strong&gt;evaluates runtime performance&lt;/strong&gt;. If the function is too slow, have it re-write the function for optimization.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Want to learn more about building Responsive LLMs? Check out my course on newline: &lt;a href="https://www.newline.co/courses/responsive-llm-applications-with-server-sent-events" rel="noopener noreferrer"&gt;Responsive LLM Applications with Server-Sent Events&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to design systems for AI applications&lt;/li&gt;
&lt;li&gt;How to stream the answer of a Large Language Model&lt;/li&gt;
&lt;li&gt;Differences between Server-Sent Events and WebSockets&lt;/li&gt;
&lt;li&gt;Importance of real-time for GenAI UI&lt;/li&gt;
&lt;li&gt;How asynchronous programming in Python works&lt;/li&gt;
&lt;li&gt;How to integrate LangChain with FastAPI&lt;/li&gt;
&lt;li&gt;What problems Retrieval Augmented Generation can solve&lt;/li&gt;
&lt;li&gt;How to create an AI agent
... and much more.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>langchain</category>
      <category>fastapi</category>
      <category>llm</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Build Smarter AI Agents with Dynamic Tooling</title>
      <dc:creator>Louis Sanna</dc:creator>
      <pubDate>Thu, 12 Dec 2024 16:23:59 +0000</pubDate>
      <link>https://dev.to/louis-sanna/how-to-build-smarter-ai-agents-with-dynamic-tooling-12i</link>
      <guid>https://dev.to/louis-sanna/how-to-build-smarter-ai-agents-with-dynamic-tooling-12i</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Imagine having an AI agent that can access real-time weather data, process complex calculations, and improve itself after making a mistake — all without human intervention. Sounds kinda neat, right? Well, it’s not as hard to build as you might think.&lt;/p&gt;

&lt;p&gt;Large Language Models (LLMs) like GPT-4 are impressive, but they have limits. Out-of-the-box, they can't access live data or perform calculations that require real-time inputs. But with &lt;strong&gt;dynamic tooling&lt;/strong&gt;, you can break these limits, allowing agents to fetch live information, make decisions, and even self-correct when things go wrong.&lt;/p&gt;

&lt;p&gt;In this guide, we’ll walk you through how to build an AI agent that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access real-time API data.&lt;/li&gt;
&lt;li&gt;Self-correct and improve its performance.&lt;/li&gt;
&lt;li&gt;Use a clean, maintainable architecture for future upgrades.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you'll have the tools you need to build an agent that's as flexible as it is powerful.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1️⃣ What are Dynamic Tools in AI Agents?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Dynamic tools allow AI agents to go beyond static responses. Instead of just generating text, agents can "call" specific actions, like fetching real-time data, executing scripts, or correcting their own mistakes.&lt;/p&gt;

&lt;p&gt;Here’s a simple analogy:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Static AI is like a person who answers only from their memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic AI&lt;/strong&gt; is like someone who can use a search engine, calculator, or dictionary to give you better answers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why Does This Matter?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With dynamic tools, you can build smarter AI agents that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Call external APIs&lt;/strong&gt; for real-time data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process information&lt;/strong&gt; like calculations, translations, or data transformation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-correct&lt;/strong&gt; errors in their own logic or execution.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🛠️ Example Use Case:&lt;/p&gt;

&lt;p&gt;"What's the average temperature in Tokyo and Paris right now?"&lt;/p&gt;

&lt;p&gt;A static AI would fail, but a dynamic AI can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Call a weather API for Tokyo.&lt;/li&gt;
&lt;li&gt;Call a weather API for Paris.&lt;/li&gt;
&lt;li&gt;Calculate the average temperature.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2️⃣ Building a Real-Time API Agent&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let’s create an AI agent that can fetch real-time weather data from an API and compute the average temperature between two cities.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Tools Required&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI&lt;/strong&gt;: To build the backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asyncio&lt;/strong&gt;: For asynchronous event handling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An External Weather API&lt;/strong&gt;: For real-time temperature data (like OpenWeatherMap or WeatherAPI).&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;🧑‍💻 Step 1: Setting up the Environment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Start by installing the necessary packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi uvicorn httpx

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;🧑‍💻 Step 2: Writing the Agent&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s how the agent works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It receives a list of cities (like Tokyo and Paris).&lt;/li&gt;
&lt;li&gt;It fetches live weather data for each city using &lt;strong&gt;HTTPX&lt;/strong&gt; (an async HTTP client).&lt;/li&gt;
&lt;li&gt;It calculates the average temperature and returns it.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Code Implementation&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 🌦️ Step 1: Function to get weather data
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch the current temperature for a given city using an API.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openweathermap.org/data/2.5/weather?q=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;appid=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;units=metric&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;

&lt;span class="c1"&gt;# 🌦️ Step 2: Agent to calculate average temperature
&lt;/span&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/average-temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_average_temperature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Takes a list of cities and returns the average temperature.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;cities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please provide a list of cities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;temperatures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cities&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;average_temp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperatures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperatures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;average_temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;average_temp&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;How It Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;get_weather(city)&lt;/code&gt;&lt;/strong&gt;: Calls the weather API to get the temperature for a city.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;calculate_average_temperature(request)&lt;/code&gt;&lt;/strong&gt;: Reads the list of cities from the request body, fetches the weather for each city concurrently with &lt;code&gt;asyncio.gather&lt;/code&gt;, and returns the average temperature.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;🔥 Test It!&lt;br&gt;
Start the server:&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvicorn filename:app &lt;span class="nt"&gt;--reload&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Send a POST request to &lt;code&gt;http://127.0.0.1:8000/average-temperature&lt;/code&gt; with this JSON body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Tokyo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Paris"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response will look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"average_temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;18.5&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;3️⃣ Building Self-Correcting Agents&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;What if the agent calls the API but gets a rate-limiting error or the data is incomplete? Can it &lt;strong&gt;self-correct&lt;/strong&gt;? Yes, it can!&lt;/p&gt;

&lt;p&gt;Self-correcting agents work by analyzing failures and then attempting a new approach. Here’s a simple example.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;🧑‍💻 Step 1: Detecting Errors&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When we call the weather API, we may receive an error. Instead of crashing, the agent should recognize the failure and &lt;strong&gt;retry&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Updated Code for Resilient Agent&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch the current temperature, retrying if necessary.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openweathermap.org/data/2.5/weather?q=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;appid=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;units=metric&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# Retry up to 3 times
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attempt &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to get weather data for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; after 3 attempts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;4️⃣ How to Build a Smarter Agent&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To make our agent &lt;strong&gt;smarter&lt;/strong&gt;, we can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add Self-Correction Loops&lt;/strong&gt;: Retry API calls on failure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Reflection&lt;/strong&gt;: If a calculation fails, reattempt using a new approach (e.g., switch API providers).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modularize Tools&lt;/strong&gt;: Use dynamic tools so the agent can call functions &lt;strong&gt;as needed&lt;/strong&gt; rather than using them all the time.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s an enhanced design for our agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1️⃣ Agent Receives User Request (e.g., "What's the average temperature in Tokyo and Paris?")
2️⃣ Agent Uses Tool: Calls get_weather(city) for each city.
3️⃣ Agent Handles Errors: If a call fails, it retries with a new method.
4️⃣ Agent Returns the Result: Final response is sent to the user.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 Pro Tip:&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;reflection&lt;/strong&gt; by logging all errors and decisions the agent makes. This log can be used to self-correct on future requests.&lt;/p&gt;
&lt;/blockquote&gt;
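
&lt;p&gt;One lightweight way to implement that log is a plain in-memory list (a sketch; swap in structured logging or a database for production):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

decision_log = []  # in production, prefer structured logging or a database

def log_decision(tool, status, detail=""):
    """Record one agent decision so later runs (or prompts) can learn from it."""
    decision_log.append({
        "timestamp": time.time(),
        "tool": tool,      # e.g. "get_weather"
        "status": status,  # "success", "error", "retry", ...
        "detail": detail,  # error message, fallback chosen, etc.
    })

log_decision("get_weather", "retry", "HTTP 429 from provider")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;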




&lt;h3&gt;
  
  
  &lt;strong&gt;5️⃣ Best Practices for Dynamic Tool Agents&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use context-aware logging&lt;/strong&gt;: Track failures and successes so the agent can learn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limit API calls&lt;/strong&gt;: Use caching to avoid unnecessary calls to the same data (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decouple logic&lt;/strong&gt;: Split agent logic (like get_weather) into independent "tool" functions.&lt;/li&gt;
&lt;/ul&gt;
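
&lt;p&gt;Here’s what the caching point can look like: a small TTL cache wrapped around the &lt;code&gt;get_weather&lt;/code&gt; tool defined earlier (a sketch; the 5-minute TTL is an assumption, tune it to your data):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

weather_cache = {}  # city -&gt; (fetched_at, temperature)
TTL_SECONDS = 300   # assumption: 5-minute-old weather is fresh enough

async def get_weather_cached(city: str) -&gt; float:
    now = time.monotonic()
    cached = weather_cache.get(city)
    if cached and now - cached[0] &lt; TTL_SECONDS:
        return cached[1]                   # cache hit: no API call made
    temperature = await get_weather(city)  # the tool defined earlier
    weather_cache[city] = (now, temperature)
    return temperature

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;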




&lt;h3&gt;
  
  
  &lt;strong&gt;6️⃣ Key Concepts Recap&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Tools&lt;/strong&gt;: Functions that agents can call (like APIs, calculators, or scripts).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-Correction&lt;/strong&gt;: Agents analyze their own failures and retry tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple Code, Smart Logic&lt;/strong&gt;: Make your agents smarter with minimal code changes (like retries).&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;7️⃣ Wrapping Up&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Congratulations! You've built a smarter AI agent with &lt;strong&gt;dynamic tools&lt;/strong&gt;. It can access weather APIs, calculate the average temperature, and retry when things go wrong. The key concepts here — &lt;strong&gt;API calls, self-correction, and modular tools&lt;/strong&gt; — are foundational for building more advanced agents that handle &lt;strong&gt;any&lt;/strong&gt; user request.&lt;/p&gt;

&lt;p&gt;Here’s a quick recap of what we did:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built a &lt;strong&gt;weather-fetching agent&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Added &lt;strong&gt;resilience with self-correction&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Discussed best practices for agent development.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to take it to the next level? Here’s a challenge:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Build an AI agent that can pull data from multiple sources, merge it, and generate insights.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Want to learn more about building Responsive LLMs? Check out my course on newline: &lt;a href="https://www.newline.co/courses/responsive-llm-applications-with-server-sent-events" rel="noopener noreferrer"&gt;Responsive LLM Applications with Server-Sent Events&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to design systems for AI applications&lt;/li&gt;
&lt;li&gt;How to stream the answer of a Large Language Model&lt;/li&gt;
&lt;li&gt;Differences between Server-Sent Events and WebSockets&lt;/li&gt;
&lt;li&gt;Importance of real-time for GenAI UI&lt;/li&gt;
&lt;li&gt;How asynchronous programming in Python works&lt;/li&gt;
&lt;li&gt;How to integrate LangChain with FastAPI&lt;/li&gt;
&lt;li&gt;What problems Retrieval Augmented Generation can solve&lt;/li&gt;
&lt;li&gt;How to create an AI agent
... and much more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Worth checking out if you want to build your own LLM applications.&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Now it’s your turn to build smarter agents. Happy coding!&lt;/strong&gt; 🚀&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>fastapi</category>
      <category>llm</category>
    </item>
    <item>
      <title>Mastering Real-Time AI: A Developer’s Guide to Building Streaming LLMs with FastAPI and Transformers</title>
      <dc:creator>Louis Sanna</dc:creator>
      <pubDate>Thu, 12 Dec 2024 16:00:08 +0000</pubDate>
      <link>https://dev.to/louis-sanna/mastering-real-time-ai-a-developers-guide-to-building-streaming-llms-with-fastapi-and-transformers-2be8</link>
      <guid>https://dev.to/louis-sanna/mastering-real-time-ai-a-developers-guide-to-building-streaming-llms-with-fastapi-and-transformers-2be8</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Introduction: Why Real-Time Streaming AI is the Future&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Real-time AI is transforming how users experience applications. Gone are the days when users had to wait for entire responses to load. Instead, modern apps stream data in chunks.&lt;/p&gt;

&lt;p&gt;For developers, this shift isn't just a "nice-to-have" — it's essential. Chatbots, search engines, and AI-powered customer support apps are now expected to integrate streaming LLM (Large Language Model) responses. But how do you actually build one?&lt;/p&gt;

&lt;p&gt;This guide walks you through the process, step-by-step, using FastAPI, Transformers, and a healthy dose of asynchronous programming. By the end, you'll have a working streaming endpoint capable of serving LLM-generated text in real-time.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Who This Is For:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Software Engineers&lt;/strong&gt; who want to upgrade their back-end skills with text streaming and event-driven programming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Scientists&lt;/strong&gt; who want to repurpose ML skills for production-ready AI services.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Table of Contents&lt;/strong&gt;
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What Is a Streaming LLM and Why It Matters?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tech Stack Overview: The Tools You'll Need&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project Walkthrough: Building the Streaming LLM Backend&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Environment Setup&lt;/li&gt;
&lt;li&gt;Setting Up FastAPI&lt;/li&gt;
&lt;li&gt;Building the Streaming Endpoint&lt;/li&gt;
&lt;li&gt;Connecting the LLM with Transformers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Client-Side Integration: Consuming the Stream&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deploying Your Streaming AI App&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conclusion and Next Steps&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;1️⃣ What Is a Streaming LLM and Why It Matters?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When you type into ChatGPT or ask a question in Google Bard, you'll notice the response appears &lt;strong&gt;one word at a time&lt;/strong&gt;. Streaming LLMs send chunks of text as they're generated rather than waiting for the entire message to finish, delivering the response in real time.&lt;/p&gt;

&lt;p&gt;Here’s why you should care as a developer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faster User Feedback&lt;/strong&gt;: Users see responses sooner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower Latency Perception&lt;/strong&gt;: Users &lt;em&gt;feel&lt;/em&gt; like the system is faster, even if total time is the same.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved UX for AI Chatbots&lt;/strong&gt;: Streaming text "feels" human, mimicking natural conversation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’ve used ChatGPT, you’ve already experienced this. Now it’s time to learn how to build one yourself.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;2️⃣ Tech Stack Overview: The Tools You'll Need&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To build your streaming LLM backend, you’ll need the following tools:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;📦 Core Technologies&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Tool&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FastAPI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Handles API requests and real-time streaming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Uvicorn&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runs the FastAPI app as an ASGI server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transformers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Access pre-trained language models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;asyncio&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Handles asynchronous event loops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;contextvars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Keeps track of context in async tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Server-Sent Events (SSE)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Streams messages to the client&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docker&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optional for containerization and deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Note: Server-Sent Events (SSE) is different from WebSockets. SSE lets the server push data to the client over a plain HTTP connection, while WebSockets support bi-directional communication. For LLM streaming, where data only flows from server to client, SSE is the simpler fit.&lt;/p&gt;
&lt;/blockquote&gt;
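
&lt;p&gt;For reference, the SSE wire format is plain text: each event is one or more &lt;code&gt;data:&lt;/code&gt; lines, optionally preceded by &lt;code&gt;event:&lt;/code&gt; or &lt;code&gt;id:&lt;/code&gt; fields, and terminated by a blank line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data: a first chunk of text

event: token
data: a chunk with a named event type

id: 42
data: a chunk with an id the client can resume from

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;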




&lt;h2&gt;
  
  
  &lt;strong&gt;3️⃣ Project Walkthrough: Building the Streaming LLM Backend&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 1: Environment Setup&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install Python and Pip&lt;/strong&gt;: Ensure Python 3.7+ is installed.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create a Virtual Environment&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate  &lt;span class="c"&gt;# On Windows: venv\Scripts\activate&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install Dependencies&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi uvicorn transformers asyncio

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: Set Up FastAPI&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Create a file named &lt;code&gt;app.py&lt;/code&gt;. Here’s the basic FastAPI setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;root&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Welcome to Real-Time LLM Streaming!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvicorn app:app &lt;span class="nt"&gt;--reload&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visit &lt;code&gt;http://127.0.0.1:8000/&lt;/code&gt; in your browser. You should see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Welcome to Real-Time LLM Streaming!"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;Step 3: Build the Streaming Endpoint&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of returning a single response, we’ll &lt;strong&gt;stream it chunk-by-chunk&lt;/strong&gt;. Here’s the idea:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The client makes a request to &lt;code&gt;/stream&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The server "yields" parts of the response as they are generated.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s the code for the streaming endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamingResponse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;event_stream&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate response delay
&lt;/span&gt;        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: Message &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_response&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;event_stream&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;🔥 Test It:&lt;/p&gt;

&lt;p&gt;Run the server and visit &lt;code&gt;http://127.0.0.1:8000/stream&lt;/code&gt; — you'll see "Message 0", "Message 1", etc., appear every second.&lt;/p&gt;
&lt;/blockquote&gt;
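
&lt;p&gt;You can also watch the raw event stream from a terminal; curl’s &lt;code&gt;-N&lt;/code&gt; flag disables output buffering so each chunk prints as it arrives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -N http://127.0.0.1:8000/stream

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;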




&lt;h3&gt;
  
  
  &lt;strong&gt;Step 4: Connect the LLM with Transformers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now, let’s swap out the dummy messages for &lt;strong&gt;LLM-generated responses&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamingResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;llm_pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;llm_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_full_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;generated_text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;🔥 Test It:&lt;/p&gt;

&lt;p&gt;Run the server and visit:&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://127.0.0.1:8000/stream?prompt=Once upon a time

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll see the AI model stream the response live.&lt;/p&gt;
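
&lt;p&gt;One caveat: the &lt;code&gt;text-generation&lt;/code&gt; pipeline generates the whole completion before returning, so the loop above yields one large chunk rather than token-by-token output. For true incremental streaming, Transformers ships &lt;code&gt;TextIteratorStreamer&lt;/code&gt;, which you can swap into &lt;code&gt;generate_response&lt;/code&gt; while keeping the same &lt;code&gt;/stream&lt;/code&gt; route. Here's a minimal sketch (the &lt;code&gt;run_in_executor&lt;/code&gt; hop keeps the blocking queue reads off the event loop):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

async def generate_response(prompt: str):
    # model.generate feeds the streamer from a background thread
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    Thread(target=model.generate,
           kwargs={**inputs, "max_new_tokens": 50, "streamer": streamer}).start()

    loop = asyncio.get_running_loop()
    tokens = iter(streamer)
    while True:
        # next() blocks on an internal queue, so run it in a worker thread
        text = await loop.run_in_executor(None, next, tokens, None)
        if text is None:  # generation finished
            break
        yield f"data: {text}\n\n"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;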




&lt;h2&gt;
  
  
  &lt;strong&gt;4️⃣ Client-Side Integration: Consuming the Stream&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;On the front end, you can use &lt;strong&gt;EventSource&lt;/strong&gt; (a native browser API) to consume the stream.&lt;/p&gt;

&lt;p&gt;Here’s the simplest way to do it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;LLM Streaming Demo&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;pre&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"stream-output"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/pre&amp;gt;&lt;/span&gt;

  &lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stream-output&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;eventSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://127.0.0.1:8000/stream?prompt=Tell me a story&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;eventSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onmessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will display a live feed of the AI response on your webpage.&lt;/p&gt;
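
&lt;p&gt;If you prefer to sanity-check the stream outside the browser, a few lines of Python work too. This sketch assumes the &lt;code&gt;httpx&lt;/code&gt; package is installed (&lt;code&gt;pip install httpx&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import httpx

# Open the SSE endpoint and print each data line as it arrives
with httpx.stream("GET", "http://127.0.0.1:8000/stream",
                  params={"prompt": "Tell me a story"}, timeout=None) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            print(line[len("data: "):])

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;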




&lt;h2&gt;
  
  
  &lt;strong&gt;5️⃣ Deploying Your Streaming AI App&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You’ve got it working locally, but now you want to &lt;strong&gt;deploy it to the world&lt;/strong&gt;. Here’s how:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 1: Dockerize the App&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Create a file called &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8

WORKDIR /app
COPY . /app

RUN pip install -r /app/requirements.txt

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "80"]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
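
&lt;p&gt;The image installs whatever &lt;code&gt;requirements.txt&lt;/code&gt; lists. A plausible minimal file for this app (exact versions are up to you; &lt;code&gt;torch&lt;/code&gt; is the backend the pipeline uses by default):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fastapi
uvicorn
transformers
torch

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;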



&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: Build and Run the Docker Image&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; streaming-llm &lt;span class="nb"&gt;.&lt;/span&gt;
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 80:80 streaming-llm

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
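
&lt;p&gt;Once the container is up, the same test endpoint is available on port 80: &lt;code&gt;http://localhost/stream?prompt=Once upon a time&lt;/code&gt;.&lt;/p&gt;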






&lt;h2&gt;
  
  
  &lt;strong&gt;6️⃣ Conclusion: What’s Next?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Congratulations! 🎉 You’ve built a real-time LLM streaming app from scratch using &lt;strong&gt;FastAPI&lt;/strong&gt;, &lt;strong&gt;Transformers&lt;/strong&gt;, and &lt;strong&gt;Server-Sent Events&lt;/strong&gt;. Here’s what you’ve learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;How streaming works&lt;/strong&gt; (and why it matters).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How to use FastAPI for streaming endpoints&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How to stream LLM responses with Hugging Face Transformers&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Where to Go Next?&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Optimize Your LLM&lt;/strong&gt;: Swap in a smaller, faster Hugging Face model such as DistilGPT2, or a larger, more capable one such as GPT-J.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore WebSockets&lt;/strong&gt;: For two-way streaming (not just server-to-client).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy to Cloud&lt;/strong&gt;: Deploy your app to AWS, GCP, or Heroku.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🧠 Pro Tip: Add an interactive client-side UI, like a chat interface, to create your own mini ChatGPT!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With this guide, you're ready to &lt;strong&gt;level up your developer skills&lt;/strong&gt; and build interactive, AI-driven experiences. 🚀&lt;/p&gt;

&lt;p&gt;Want to learn more about building Responsive LLMs? Check out my course on newline: &lt;a href="https://www.newline.co/courses/responsive-llm-applications-with-server-sent-events" rel="noopener noreferrer"&gt;Responsive LLM Applications with Server-Sent Events&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to design systems for AI applications&lt;/li&gt;
&lt;li&gt;How to stream the answer of a Large Language Model&lt;/li&gt;
&lt;li&gt;Differences between Server-Sent Events and WebSockets&lt;/li&gt;
&lt;li&gt;Importance of real-time for GenAI UI&lt;/li&gt;
&lt;li&gt;How asynchronous programming in Python works&lt;/li&gt;
&lt;li&gt;How to integrate LangChain with FastAPI&lt;/li&gt;
&lt;li&gt;What problems Retrieval Augmented Generation can solve&lt;/li&gt;
&lt;li&gt;How to create an AI agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...and much more.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>fastapi</category>
      <category>llm</category>
    </item>
    <item>
      <title>Integrating LangChain with FastAPI for Asynchronous Streaming</title>
      <dc:creator>Louis Sanna</dc:creator>
      <pubDate>Thu, 12 Dec 2024 15:55:17 +0000</pubDate>
      <link>https://dev.to/louis-sanna/integrating-langchain-with-fastapi-for-asynchronous-streaming-5d0o</link>
      <guid>https://dev.to/louis-sanna/integrating-langchain-with-fastapi-for-asynchronous-streaming-5d0o</guid>
      <description>&lt;p&gt;LangChain and FastAPI working in tandem provide a strong setup for the asynchronous streaming endpoints that LLM-integrated applications need. Modern chat applications live or die by how effectively they handle live data streams and how quickly they can respond.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction to LangChain
&lt;/h2&gt;

&lt;p&gt;LangChain is a library that simplifies the incorporation of language models into applications. It provides an abstracted layer over various components such as large language models (LLMs), data retrievers, and vector storage solutions. This abstraction allows developers to integrate and switch between different backend providers or technologies seamlessly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction to FastAPI
&lt;/h2&gt;

&lt;p&gt;FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints. It is designed for creating RESTful APIs quickly and efficiently, with automatic interactive API documentation provided by Swagger UI and ReDoc.&lt;/p&gt;

&lt;h2&gt;
  
  
  Combining LangChain with FastAPI
&lt;/h2&gt;

&lt;p&gt;By combining LangChain with FastAPI, developers can create robust, asynchronous streaming APIs that handle real-time data efficiently. This integration is particularly useful for applications that require live updates, such as chat applications or real-time analytics dashboards.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the FastAPI Project
&lt;/h2&gt;

&lt;p&gt;First, install the necessary packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi langchain pydantic uvicorn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, define the FastAPI router and Pydantic models to structure and validate incoming messages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;APIRouter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;APIRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChatPayload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;schema_extra&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who are you?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating the Streaming API Endpoint
&lt;/h2&gt;

&lt;p&gt;Create an endpoint to receive chat messages and stream responses back to the client using LangChain. This is done by emitting server-sent events (SSE).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi.responses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamingResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/api/completion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatPayload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;send_completion_events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_completion_events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astream_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;op&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/streamed_output/-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
                &lt;span class="n"&gt;json_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_chunk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="n"&gt;json_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;n&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;include_router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running the FastAPI Application
&lt;/h2&gt;

&lt;p&gt;Run the FastAPI application using Uvicorn, an ASGI server implementation for Python.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uvicorn main:app --reload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;a href="http://127.0.0.1:8000/docs" rel="noopener noreferrer"&gt;http://127.0.0.1:8000/docs&lt;/a&gt; to see the interactive API documentation generated by FastAPI. This documentation provides an easy way to test the API endpoints.&lt;/p&gt;
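
&lt;p&gt;Note that &lt;code&gt;EventSource&lt;/code&gt; only supports GET requests, so to exercise this POST endpoint outside the docs page you can stream it from a small Python client instead. A sketch assuming &lt;code&gt;httpx&lt;/code&gt; is installed (&lt;code&gt;pip install httpx&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import httpx

payload = {"messages": [{"role": "user", "content": "Who are you?"}]}

# POST the chat payload and print each SSE chunk as it arrives
with httpx.stream("POST", "http://127.0.0.1:8000/api/completion",
                  json=payload, timeout=None) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            print(line[len("data: "):])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;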

&lt;h2&gt;
  
  
  Why JSON Patch?
&lt;/h2&gt;

&lt;p&gt;LangChain’s &lt;code&gt;astream_log&lt;/code&gt; method streams its output as a sequence of JSON Patch operations, so a working knowledge of JSON Patch is essential for implementing this integration. JSON Patch provides an efficient way to update parts of a JSON document incrementally, without resending the entire document. This is particularly useful in real-time applications where data changes frequently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Brief Overview of JSON Patch
&lt;/h2&gt;

&lt;p&gt;JSON Patch defines six operation types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add: Inserts a new value into the JSON document at the specified path.&lt;/li&gt;
&lt;li&gt;Remove: Removes the value at the specified path.&lt;/li&gt;
&lt;li&gt;Replace: Replaces the value at the specified path with a new value.&lt;/li&gt;
&lt;li&gt;Move: Moves a value from one path to another in the document.&lt;/li&gt;
&lt;li&gt;Copy: Copies a value from one path to another.&lt;/li&gt;
&lt;li&gt;Test: Checks that the value at the specified path equals a given value.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Consider the original document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"baz"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qux"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"foo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bar"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Applying the patch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"op"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"replace"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/baz"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"boo"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"op"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"add"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/hello"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"world"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"op"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"remove"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/foo"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"baz"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"boo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hello"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"world"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;JSON Patch allows for efficient, incremental updates, making it ideal for applications that require frequent or real-time updates to their data.&lt;/p&gt;
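
&lt;p&gt;You can reproduce this result programmatically with the &lt;code&gt;jsonpatch&lt;/code&gt; package (an assumption here; install it with &lt;code&gt;pip install jsonpatch&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import jsonpatch

doc = {"baz": "qux", "foo": "bar"}
patch = [
    {"op": "replace", "path": "/baz", "value": "boo"},
    {"op": "add", "path": "/hello", "value": ["world"]},
    {"op": "remove", "path": "/foo"},
]

# apply_patch returns a new document with all three operations applied
print(jsonpatch.apply_patch(doc, patch))
# {'baz': 'boo', 'hello': ['world']}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;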

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By integrating LangChain with FastAPI, developers can build efficient asynchronous streaming APIs capable of handling real-time data. This setup is ideal for applications like chatbots, where timely responses and data processing are crucial. FastAPI's ease of use and LangChain's abstraction capabilities, combined with the efficiency of JSON Patch, make this combination a powerful tool for modern web development.&lt;/p&gt;

&lt;p&gt;Want to learn more about building Responsive LLMs? Check out my course on newline: &lt;a href="https://www.newline.co/courses/responsive-llm-applications-with-server-sent-events" rel="noopener noreferrer"&gt;Responsive LLM Applications with Server-Sent Events&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to design systems for AI applications&lt;/li&gt;
&lt;li&gt;How to stream the answer of a Large Language Model&lt;/li&gt;
&lt;li&gt;Differences between Server-Sent Events and WebSockets&lt;/li&gt;
&lt;li&gt;Importance of real-time for GenAI UI&lt;/li&gt;
&lt;li&gt;How asynchronous programming in Python works&lt;/li&gt;
&lt;li&gt;How to integrate LangChain with FastAPI&lt;/li&gt;
&lt;li&gt;What problems Retrieval Augmented Generation can solve&lt;/li&gt;
&lt;li&gt;How to create an AI agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...and much more.&lt;/p&gt;

&lt;p&gt;Worth checking out if you want to build your own LLM applications: the course provides extensive code examples and helps you go from concept to deployment.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>fastapi</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
