<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: DigitalOcean</title>
    <description>The latest articles on DEV Community by DigitalOcean (@digitalocean).</description>
    <link>https://dev.to/digitalocean</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F175%2F369f1227-0eac-4a88-8d3c-08851bf0b117.png</url>
      <title>DEV Community: DigitalOcean</title>
      <link>https://dev.to/digitalocean</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/digitalocean"/>
    <language>en</language>
    <item>
      <title>The Hidden Cost of Complex AI Platforms: Why Developer Experience Matters</title>
      <dc:creator>Shaoni Mukherjee</dc:creator>
      <pubDate>Thu, 28 May 2026 16:00:00 +0000</pubDate>
      <link>https://dev.to/digitalocean/the-hidden-cost-of-complex-ai-platforms-why-developer-experience-matters-1bb5</link>
      <guid>https://dev.to/digitalocean/the-hidden-cost-of-complex-ai-platforms-why-developer-experience-matters-1bb5</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developer experience is a real cost, not a soft metric&lt;/strong&gt;: Time lost in setup, debugging, and switching tools directly slows down how fast teams can build and iterate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Most friction comes from fragmented workflows&lt;/strong&gt;: When model hosting, compute, and deployment live in different places, even simple tasks become multi-step processes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time-to-First-Value (TTFV) is a critical signal:&lt;/strong&gt; The longer it takes to get a working output, the more likely teams are to lose momentum or abandon ideas early.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling introduces a hidden breaking point:&lt;/strong&gt; Moving from a simple API to dedicated infrastructure often forces teams to relearn workflows and rebuild systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This is a systems problem, not a feature gap&lt;/strong&gt;: Many platforms weren’t designed end-to-end, which leads to disconnected experiences as teams grow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The fastest teams aren’t just using better models&lt;/strong&gt;: They’re working in environments where they can build, test, and scale without constant reconfiguration.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;The &lt;a href="https://www.digitalocean.com/resources/articles/leading-ai-cloud-providers" rel="noopener noreferrer"&gt;cloud AI platform&lt;/a&gt; ecosystem today looks more powerful than ever, with access to powerful GPUs like NVIDIA H100 and H200, massive libraries of pre-trained models, and full pipelines for &lt;a href="https://www.digitalocean.com/community/tutorials/fine-tuning-llms-on-budget-digitalocean-gpu" rel="noopener noreferrer"&gt;fine-tuning&lt;/a&gt; and inference.&lt;/p&gt;

&lt;p&gt;​​I recently tried deploying a simple inference endpoint for a model. Ideally, it should have taken a few minutes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;provision compute&lt;/li&gt;
&lt;li&gt;load the model&lt;/li&gt;
&lt;li&gt;send a request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead, it took closer to &lt;strong&gt;two hours&lt;/strong&gt; before I got a successful response.&lt;/p&gt;

&lt;p&gt;Not because the model was difficult to run, but because of everything around it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Figuring out where to start&lt;/li&gt;
&lt;li&gt;No clear documentation&lt;/li&gt;
&lt;li&gt;Generating and configuring the right credentials&lt;/li&gt;
&lt;li&gt;Troubleshooting why the instance wasn’t accessible&lt;/li&gt;
&lt;li&gt;Installing dependencies that weren’t preconfigured&lt;/li&gt;
&lt;li&gt;Retrying after unclear or failed setup steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these steps was particularly complex on its own. But together, they created enough friction to delay even a basic task.&lt;/p&gt;

&lt;p&gt;This pattern shows up often when working with AI platforms today.&lt;/p&gt;

&lt;p&gt;Most discussions focus on visible costs like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compute pricing&lt;/li&gt;
&lt;li&gt;Storage usage&lt;/li&gt;
&lt;li&gt;API costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But in practice, the higher cost is harder to measure.&lt;/p&gt;

&lt;p&gt;It’s the time spent navigating setup, resolving infrastructure issues, and figuring out how different parts of a platform fit together before any real work begins.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real cost of building AI systems
&lt;/h2&gt;

&lt;p&gt;When teams evaluate AI platforms, the focus usually stays on obvious metrics like compute pricing or model performance. But the actual cost of &lt;a href="https://www.digitalocean.com/community/tutorials/build-ai-agents-the-right-way" rel="noopener noreferrer"&gt;building AI systems&lt;/a&gt; runs much deeper. It shows up in how long it takes to get started, how mentally demanding the platform is, and how much time is lost dealing with infrastructure instead of building products.&lt;/p&gt;

&lt;p&gt;One of the most overlooked factors is &lt;strong&gt;Time-to-First-Value (TTFV)&lt;/strong&gt;, the time it takes to go from signing up on a platform to getting your first meaningful output.&lt;/p&gt;

&lt;p&gt;But when TTFV stretches into hours or even days due to setup issues, unclear steps, or complex configuration, it creates friction right from the start. Developers lose patience, delay experimentation, or abandon the platform altogether. Over time, this directly impacts developer retention and slows down innovation, because fewer ideas make it past the initial stage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fragmentation: When one platform feels like many
&lt;/h2&gt;

&lt;p&gt;Imagine when a developer tries to log in and finds out multiple logins to separate platforms, which feels not only confusing but also hard to understand. When a single platform feels like multiple disconnected products stitched together.&lt;/p&gt;

&lt;p&gt;On the surface, everything may exist under one umbrella. But once you start using it, the experience tells a different story.&lt;/p&gt;

&lt;h3&gt;
  
  
  Split product surfaces
&lt;/h3&gt;

&lt;p&gt;On platforms like &lt;a href="https://nebius.com/" rel="noopener noreferrer"&gt;Nebius&lt;/a&gt;, you have AI Cloud and Token Factory, which require separate logins; this infrastructure feels like two separate worlds.&lt;/p&gt;

&lt;p&gt;You might provision compute in one place, manage models in another, and handle access or tokens somewhere else entirely. Each part works on its own, but they don’t always feel connected.&lt;/p&gt;

&lt;p&gt;For example, a developer might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up a GPU instance in one interface&lt;/li&gt;
&lt;li&gt;Switch to another section to access models&lt;/li&gt;
&lt;li&gt;Move again to configure authentication or tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even though it’s technically one platform, it doesn’t feel like a single, cohesive system. This lack of cohesion forces developers to constantly piece together workflows on their own.&lt;/p&gt;

&lt;h3&gt;
  
  
  Confusing navigation
&lt;/h3&gt;

&lt;p&gt;Fragmentation often leads to a simple but frustrating question:&lt;br&gt;
 &lt;strong&gt;“Where do I even start?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When features are spread across different sections or products, developers are left guessing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which interface should I use first?&lt;/li&gt;
&lt;li&gt;Where do I run my model?&lt;/li&gt;
&lt;li&gt;Where do I manage credentials or access?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of a clear starting point, the experience becomes exploratory—and not in a good way.&lt;/p&gt;

&lt;p&gt;A common situation is having to jump between different portals just to complete a basic setup. For instance, setting up access in one place and then realizing you need to log into a completely different interface to actually use it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Broken flow
&lt;/h3&gt;

&lt;p&gt;This fragmentation becomes even more apparent when workflows are interrupted.&lt;/p&gt;

&lt;p&gt;Developers may encounter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separate logins for different parts of the platform&lt;/li&gt;
&lt;li&gt;Different dashboards that don’t share context&lt;/li&gt;
&lt;li&gt;Disconnected user experiences that don’t carry over progress&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What fragmentation looks like
&lt;/h3&gt;

&lt;p&gt;A typical workflow, for example, building and deploying an agent, might look simple:&lt;/p&gt;

&lt;p&gt;But instead of happening in a single, continuous flow, each step exists in a different part of the platform.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Compute is managed in one dashboard&lt;/li&gt;
&lt;li&gt;Model configuration happens in another section&lt;/li&gt;
&lt;li&gt;Workflows are defined in a separate interface&lt;/li&gt;
&lt;li&gt;Logs and monitoring are located somewhere else&lt;/li&gt;
&lt;li&gt;Access and credentials are handled independently&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each step works on its own.&lt;/p&gt;

&lt;h3&gt;
  
  
  The hidden cost
&lt;/h3&gt;

&lt;p&gt;Fragmentation usually doesn’t hurt in the beginning. When a single developer is experimenting, it’s still manageable to move between different sections of a platform and piece things together. The problem starts when the team grows, and the workflow becomes more complex. This typically happens when:&lt;/p&gt;

&lt;p&gt;1) Multiple components like models, agents, and data sources are involved,&lt;/p&gt;

&lt;p&gt;2) More than one developer is working on the system, and&lt;/p&gt;

&lt;p&gt;3) Faster iteration and debugging become important.&lt;/p&gt;

&lt;p&gt;At this stage, constantly switching between interfaces, tools, and dashboards slows everything down because there is no single place to see or manage the full workflow. This issue exists because most platforms are not built as a unified system from the start.&lt;/p&gt;

&lt;p&gt;Fragmentation is not about missing features, but it is about how those features are connected to make it feel like a single system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The anti-developer experience
&lt;/h2&gt;

&lt;p&gt;A common pattern across many AI platforms is asking developers to commit before they’ve had a chance to see real value.&lt;/p&gt;

&lt;p&gt;In some cases, you’re required to add billing details even before running your first model. In others, the free credits are so limited that you can barely complete a meaningful experiment. You might start testing an idea, only to run out of credits halfway through, without fully understanding whether it works.&lt;/p&gt;

&lt;p&gt;This creates psychological friction.&lt;/p&gt;

&lt;p&gt;Instead of freely exploring, developers become cautious. They hesitate to try new models, avoid running multiple experiments, and constantly think about cost rather than creativity. The experience shifts from curiosity to calculation.&lt;/p&gt;

&lt;p&gt;But better-designed platforms take a different approach.&lt;/p&gt;

&lt;p&gt;They give developers enough room to explore properly, sometimes even offering generous free credits, so you can actually spin up resources, run models, and experiment without immediate pressure. You can try things, make mistakes, and learn before worrying about billing.&lt;/p&gt;

&lt;p&gt;Because once developers see something work, they’re far more likely to continue building.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scaling cliff nobody talks about
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.digitalocean.com/resources/articles/inference-as-service" rel="noopener noreferrer"&gt;Inference-as-a-service&lt;/a&gt; feels effortless in the beginning. You send a request to an API, get a response, and move on. There is no need to think about infrastructure, scaling, or deployment. This makes it incredibly effective during the early stages, where the focus is on building quickly, experimenting, and testing ideas without friction.&lt;/p&gt;

&lt;p&gt;In this phase, everything works because the system is still small.&lt;/p&gt;

&lt;p&gt;1) The number of requests is low,&lt;/p&gt;

&lt;p&gt;2) Latency is not critical, and&lt;/p&gt;

&lt;p&gt;3) Occasional failures are acceptable.&lt;/p&gt;

&lt;p&gt;The platform handles everything behind the scenes, allowing developers to focus entirely on the product.&lt;/p&gt;

&lt;p&gt;The problem starts when the system begins to grow.&lt;/p&gt;

&lt;p&gt;As usage increases, the same setup is now operating under very different conditions. More users mean more requests, often happening at the same time. Latency is no longer just a technical detail; it becomes part of the user experience. Failures are no longer minor inconveniences; they directly impact reliability.&lt;/p&gt;

&lt;p&gt;This is where cracks begin to appear.&lt;/p&gt;

&lt;h3&gt;
  
  
  A common scaling cliff in inference
&lt;/h3&gt;

&lt;p&gt;A typical early setup looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A hosted model endpoint&lt;/li&gt;
&lt;li&gt;Pay-per-request pricing&lt;/li&gt;
&lt;li&gt;No infrastructure management&lt;/li&gt;
&lt;li&gt;Acceptable latency (often in the 300–500 ms range)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At low to moderate usage, this model works well. Teams can ship quickly, iterate rapidly, and avoid thinking about GPUs or deployment complexity.&lt;/p&gt;

&lt;p&gt;The problem is not at the start, but it emerges when usage becomes &lt;em&gt;predictable and sustained&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where things start breaking
&lt;/h3&gt;

&lt;p&gt;As request volume grows (for example, into the range of thousands of requests per day), a consistent pattern of issues begins to appear:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Latency variability increases&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cold starts become more frequent&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://medium.com/javarevisited/mastering-latency-metrics-p90-p95-p99-d5427faea879" rel="noopener noreferrer"&gt;P95&lt;/a&gt; latency spikes unpredictably&lt;/li&gt;
&lt;li&gt;Limited ability to tune performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Cost efficiency degrades&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pay-per-request pricing scales linearly with usage&lt;/li&gt;
&lt;li&gt;No optimization for steady workloads&lt;/li&gt;
&lt;li&gt;The same workload becomes disproportionately expensive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Lack of capacity guarantees&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No predictable throughput&lt;/li&gt;
&lt;li&gt;No visibility into resource allocation&lt;/li&gt;
&lt;li&gt;No way to reserve or prioritize compute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this stage, the limitation is not a missing feature but a mismatch between the pricing and deployment model and the workload.&lt;/p&gt;

&lt;h3&gt;
  
  
  The forced transition
&lt;/h3&gt;

&lt;p&gt;The natural next step is moving to dedicated infrastructure.&lt;/p&gt;

&lt;p&gt;In practice, this transition introduces significant complexity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Selecting GPU types without clear workload mapping&lt;/li&gt;
&lt;li&gt;Configuring deployment environments manually&lt;/li&gt;
&lt;li&gt;Implementing autoscaling policies&lt;/li&gt;
&lt;li&gt;Managing routing, load balancing, and failure handling&lt;/li&gt;
&lt;li&gt;Rebuilding abstractions that were previously handled by the platform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What begins as a simple API integration evolves into a full infrastructure problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  The real cost
&lt;/h3&gt;

&lt;p&gt;Teams are forced to shift from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product iteration → infrastructure management&lt;/li&gt;
&lt;li&gt;Application logic → deployment tuning&lt;/li&gt;
&lt;li&gt;Fast experimentation → operational maintenance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shift directly impacts development velocity.&lt;/p&gt;

&lt;p&gt;In many cases, the bottleneck is no longer model performance or GPU access, but the effort required to operate the system reliably at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters
&lt;/h3&gt;

&lt;p&gt;Inference is often presented as two separate modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serverless APIs for getting started&lt;/li&gt;
&lt;li&gt;Dedicated infrastructure for scaling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, the transition between these modes is fragmented.&lt;/p&gt;

&lt;p&gt;This creates a gap where teams:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overpay for convenience longer than they should&lt;/li&gt;
&lt;li&gt;Delay scaling due to operational complexity&lt;/li&gt;
&lt;li&gt;Or prematurely invest in infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The issue is not the availability of tools.&lt;br&gt;
 It is the lack of a smooth, continuous path between them.&lt;/p&gt;

&lt;p&gt;This is a structural problem in the current inference ecosystem — and one that directly impacts how quickly teams can move from prototype to production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it feels like a cliff
&lt;/h3&gt;

&lt;p&gt;This shift feels difficult not just because there is more to do, but because the change is abrupt.&lt;/p&gt;

&lt;p&gt;Teams go from a world where everything is abstracted behind a simple API to one where they are responsible for compute, scaling, and reliability. There is no gradual transition between these two states.&lt;/p&gt;

&lt;p&gt;There is no middle layer that offers both simplicity and control.&lt;/p&gt;

&lt;p&gt;That is why it feels like a cliff instead of a smooth progression.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this happens
&lt;/h3&gt;

&lt;p&gt;This gap exists because platforms are built with different starting points. Inference-focused platforms are designed for simplicity and fast onboarding, so they abstract away infrastructure details. Compute-focused platforms, on the other hand, are built for flexibility and performance, which means they require deeper involvement from the developer.&lt;/p&gt;

&lt;p&gt;Over time, both types of platforms try to expand their capabilities. Inference platforms add more control, and compute platforms add higher-level abstractions. But these additions are layered on top rather than designed as a unified system.&lt;/p&gt;

&lt;p&gt;As a result, the transition between simplicity and control is not seamless.&lt;/p&gt;

&lt;h3&gt;
  
  
  The real impact
&lt;/h3&gt;

&lt;p&gt;This shift usually happens at a critical moment, when the product is gaining traction and needs to scale reliably.&lt;/p&gt;

&lt;p&gt;Instead of focusing on improving the product, teams find themselves dealing with infrastructure, performance issues, and system stability. The pace of development slows down, not because the problem is harder, but because the platform now requires significantly more effort to manage.&lt;/p&gt;

&lt;p&gt;It is what happens when they begin to work at scale, and the platform that once made things easy is no longer enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  What good AI platforms actually look like
&lt;/h2&gt;

&lt;p&gt;After all the friction, the starting problem, platform debugging, understanding the documentation, and platform fragmentation, it is easy to think the problem is missing features, but it's not.&lt;/p&gt;

&lt;p&gt;Most platforms already have the same core capabilities. What actually matters is how much effort it takes to go from an idea to something that works and keep it working as it grows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1&lt;/strong&gt;: Building an AI Agent in an Integrated Workflow&lt;br&gt;
Consider building a simple AI agent or &lt;a href="https://www.digitalocean.com/community/tutorials/build-ai-agent-chatbot-with-gradient-platform" rel="noopener noreferrer"&gt;chatbot&lt;/a&gt; on an integrated platform where models, Knowledge bases, embedding models, and workflows are available in one place.&lt;/p&gt;

&lt;p&gt;A simpler platform will make this process pretty straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the model&lt;/li&gt;
&lt;li&gt;Define the agent logic&lt;/li&gt;
&lt;li&gt;Add appropriate knowledge base&lt;/li&gt;
&lt;li&gt;Add a data source to your knowledge base&lt;/li&gt;
&lt;li&gt;Run a test input&lt;/li&gt;
&lt;li&gt;Make your agent publicly available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that's it. What stands out in this setup is not the number of features, but how the flow behaves.&lt;/p&gt;

&lt;p&gt;You don’t need to switch between multiple interfaces to connect components. The model, workflow, and execution are visible in the same place. When you make a change, it reflects immediately without requiring additional setup or restarts.&lt;/p&gt;

&lt;p&gt;If something fails, the issue is tied directly to the step where it happened. You don’t have to search across different dashboards to understand what went wrong.&lt;/p&gt;

&lt;p&gt;The experience feels continuous.&lt;/p&gt;

&lt;p&gt;You start with an idea, implement it, and see the result without getting pulled into infrastructure or configuration issues.&lt;/p&gt;

&lt;p&gt;This is what a unified workflow looks like in practice, not just having all the pieces, but having them work together in a way that reduces effort at every step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2:&lt;/strong&gt; Consider a setup where a team moves from a basic API-based workflow to dedicated inference in order to handle real user traffic more reliably.&lt;/p&gt;

&lt;p&gt;The goal is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy a model with dedicated capacity&lt;/li&gt;
&lt;li&gt;Send requests through a stable endpoint&lt;/li&gt;
&lt;li&gt;Maintain consistent response times&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What changes in this setup is not the workflow itself, but how predictable it becomes.&lt;/p&gt;

&lt;p&gt;Once the model is deployed on dedicated infrastructure, requests are no longer competing for shared resources. Response times become more consistent, even as usage increases. Instead of worrying about rate limits or sudden slowdowns, the system behaves in a way that is easier to reason about.&lt;/p&gt;

&lt;p&gt;At the same time, the transition does not require rebuilding everything from scratch. The way requests are sent and responses are handled remains familiar. The difference is that there is more control over how the system performs under load.&lt;/p&gt;

&lt;p&gt;If something needs to be adjusted, such as scaling capacity or tuning performance, it can be done without changing the core application logic.&lt;/p&gt;

&lt;p&gt;This is where dedicated inference makes a difference in practice, not by adding complexity, but by making the system more stable as it grows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You don’t switch contexts to get basic work done&lt;/strong&gt;
In a well-designed platform, deploying a model, testing it, and monitoring it all happen in one place. You’re not jumping between dashboards, CLI tools, and cloud consoles just to complete a single workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time-to-First-Value (TTFV) stays consistently low&lt;/strong&gt;
It shouldn’t take hours to figure out how to get a model running. A good platform makes the “first successful response” happen quickly — not just in ideal conditions, but even when you’re unfamiliar with the setup. If you’re spending time debugging environment issues instead of validating outputs, that’s a design failure, not a user error.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The path from prototype to scale doesn’t change shape&lt;/strong&gt;
One of the biggest failure points in current platforms is that the workflow breaks when you scale. A well-designed system keeps the same mental model — the way you deploy and interact with a model at a small scale should still work when traffic increases. You shouldn’t need to relearn everything just to handle more requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure decisions are abstracted until they actually matter&lt;/strong&gt;
You shouldn’t need to think about GPU types, networking, or provisioning just to test an idea. Good platforms delay these decisions without hiding them completely — they only surface when you have a real reason to care, like optimizing latency or cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure modes are visible and easy to debug&lt;/strong&gt;
When something breaks, it’s obvious where and why. You’re not digging through multiple systems trying to trace a failed request. Logs, errors, and performance signals are tied directly to the workflow you’re already using, so debugging doesn’t become a separate project.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The hardest part of building AI systems today isn’t getting access to models or GPUs, but it’s everything that happens around them.&lt;/p&gt;

&lt;p&gt;It’s the time lost moving between tools.&lt;/p&gt;

&lt;p&gt;It’s the friction of stitching together workflows that were never designed to work as one.&lt;br&gt;
It’s the moment when something that worked at a small scale suddenly forces a complete rewrite.&lt;/p&gt;

&lt;p&gt;And most of this doesn’t show up in benchmarks or pricing comparisons. It shows up in delays, workarounds, and abandoned ideas.&lt;/p&gt;

&lt;p&gt;The teams that will win on inference aren’t the ones with the most compute. They’re the ones that can move from idea to working system and then to scale without having to change how they build along the way.&lt;/p&gt;

&lt;p&gt;The real question isn’t which platform has the best features.&lt;/p&gt;

&lt;p&gt;It’s this:&lt;br&gt;
&lt;strong&gt;How many times does your workflow break before you get to something that actually works?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/javarevisited/mastering-latency-metrics-p90-p95-p99-d5427faea879" rel="noopener noreferrer"&gt;Mastering Latency Metrics: P90, P95, P99&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/bulk-inference-content-pipeline-digitalocean-serverless" rel="noopener noreferrer"&gt;How I Built a Content Generation Pipeline Using DigitalOcean Serverless Inference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wired.com/story/the-ai-industrys-scaling-obsession-is-headed-for-a-cliff/" rel="noopener noreferrer"&gt;The AI Industry’s Scaling Obsession Is Headed for a Cliff&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>developers</category>
      <category>discuss</category>
      <category>devex</category>
    </item>
    <item>
      <title>How to Deploy Hermes' Self-Improving AI Agent</title>
      <dc:creator>haimantika mitra</dc:creator>
      <pubDate>Tue, 26 May 2026 16:00:00 +0000</pubDate>
      <link>https://dev.to/digitalocean/how-to-deploy-hermes-self-improving-ai-agent-4gm6</link>
      <guid>https://dev.to/digitalocean/how-to-deploy-hermes-self-improving-ai-agent-4gm6</guid>
      <description>&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You host Hermes on a DigitalOcean Droplet (Ubuntu 24.04, at least 2 vCPUs and 4 GB RAM), install with the official script, and run &lt;code&gt;hermes setup&lt;/code&gt; for your LLM provider.&lt;/li&gt;
&lt;li&gt;The Telegram bot uses &lt;code&gt;hermes gateway setup&lt;/code&gt; and a persistent gateway service so Hermes stays reachable after you close SSH.&lt;/li&gt;
&lt;li&gt;Skills are Markdown files under &lt;code&gt;~/.hermes/skills/&lt;/code&gt;. MCP servers go in Hermes config and expose tools such as grocery search and cart actions.&lt;/li&gt;
&lt;li&gt;HTTP MCP with OAuth on a headless server uses the URL Hermes prints plus an SSH tunnel from your laptop browser to the Droplet loopback port.&lt;/li&gt;
&lt;li&gt;The grocery walkthrough uses Swiggy Instamart where available; you swap in another MCP URL for your region or stack.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;2026 is the year of agents doing the task for us rather than just guiding us to. We saw the rise of OpenClaw and now we have &lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="noopener noreferrer"&gt;Hermes agent&lt;/a&gt;. It is an open-source, self-improving AI agent built by &lt;a href="https://nousresearch.com/" rel="noopener noreferrer"&gt;Nous Research&lt;/a&gt;. It has a built-in learning loop that creates skills from experience, improves them during use, nudges itself to persist knowledge, and builds a deepening model of who you are across sessions.&lt;/p&gt;

&lt;p&gt;In this tutorial, you will deploy Hermes on a &lt;a href="https://www.digitalocean.com/products/droplets" rel="noopener noreferrer"&gt;DigitalOcean Droplet&lt;/a&gt;, connect it to Telegram, and extend it with a custom skill. As a practical example, you will see how to build a grocery tracking agent that monitors daily consumption, alerts you when stock is low, and places orders automatically through a grocery delivery service.&lt;/p&gt;

&lt;p&gt;The grocery example uses &lt;a href="https://github.com/Swiggy/swiggy-mcp-server-manifest" rel="noopener noreferrer"&gt;Swiggy Instamart&lt;/a&gt;, which is available in India. But the approach works with any MCP-compatible service, such as a local grocery API, a task manager, a calendar, or a home automation system. The pattern is the same no matter what you connect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you begin, you will need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A DigitalOcean account. If you do not have one, &lt;a href="https://cloud.digitalocean.com/registrations/new" rel="noopener noreferrer"&gt;sign up here&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://www.digitalocean.com/community/tutorials/initial-server-setup-with-ubuntu" rel="noopener noreferrer"&gt;DigitalOcean Droplet&lt;/a&gt; running Ubuntu 24.04 with at least 2 vCPUs and 4 GB RAM.
&lt;/li&gt;
&lt;li&gt;A Telegram account.
&lt;/li&gt;
&lt;li&gt;API key from an LLM provider. Hermes supports Anthropic, OpenAI, OpenRouter, and others.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What are we building
&lt;/h2&gt;

&lt;p&gt;Hermes is model-agnostic and platform-agnostic. It can connect to any tool that supports the &lt;a href="https://www.digitalocean.com/community/tutorials/control-apps-using-mcp-server" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt;, and it can send and receive messages on Telegram, WhatsApp, Discord, and Slack. You can find more information on their &lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The architecture in this tutorial looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv19jyonedh6059nid2fo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv19jyonedh6059nid2fo.png" alt="Architecture" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s get started building:&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1. Create the Droplet
&lt;/h2&gt;

&lt;p&gt;Sign in to your DigitalOcean account and &lt;a href="https://cloud.digitalocean.com/droplets/new" rel="noopener noreferrer"&gt;create a new Droplet&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;On the creation page, select:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Region:&lt;/strong&gt; The one closest to you
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image:&lt;/strong&gt; Ubuntu 24.04 LTS
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan:&lt;/strong&gt; Basic, with at least 2 vCPUs and 4 GB RAM (the &lt;code&gt;s-2vcpu-4gb&lt;/code&gt; size)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication:&lt;/strong&gt; SSH Key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you need help adding an SSH key, follow the &lt;a href="https://docs.digitalocean.com/products/droplets/how-to/add-ssh-keys/" rel="noopener noreferrer"&gt;DigitalOcean SSH key guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once the Droplet is created, SSH into it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh root@YOUR_DROPLET_IP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update the system before installing anything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2. Install Hermes Agent
&lt;/h2&gt;

&lt;p&gt;Hermes provides an install script that handles all dependencies including Python, &lt;code&gt;uv&lt;/code&gt;, and the &lt;code&gt;hermes&lt;/code&gt; binary itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once it finishes, reload your shell so the &lt;code&gt;hermes&lt;/code&gt; command is available:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source&lt;/span&gt; ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the setup wizard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The wizard will ask you to choose an LLM provider and enter your API key. Select your provider, paste your key, and Hermes will save it to &lt;code&gt;~/.hermes/.env&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Verify the installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3. Connect Hermes to Telegram
&lt;/h2&gt;

&lt;p&gt;Hermes connects to Telegram through a bot. Run the gateway setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes gateway setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When it asks which platform to use, select Telegram. Hermes will walk you through creating a bot with Telegram's BotFather. Follow the steps in your terminal.&lt;/p&gt;

&lt;p&gt;Once setup is complete, start the gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes gateway start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To keep it running after you disconnect from the Droplet, enable it as a system service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;hermes-gateway
systemctl start hermes-gateway
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open Telegram, find the bot you just created, and send it a message like "hello." If it responds, Hermes is connected and ready.&lt;/p&gt;

&lt;p&gt;You can now talk to Hermes from anywhere on your phone. Ask it to check the weather, set a reminder, search the web, or run a command on your Droplet. It handles all of this out of the box.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4. Understand skills and MCP servers
&lt;/h3&gt;

&lt;p&gt;Before building the automation example, let's understand two core Hermes concepts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills&lt;/strong&gt; - These are Markdown files that teach Hermes how to handle specific tasks. You write a skill file describing what the task is, what triggers it, and what steps to follow. Hermes reads all skill files in &lt;code&gt;~/.hermes/skills/&lt;/code&gt; at startup and uses them to handle relevant requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Servers&lt;/strong&gt; - It gives Hermes access to external services. &lt;a href="https://www.digitalocean.com/community/tutorials/model-context-protocol" rel="noopener noreferrer"&gt;MCP (Model Context Protocol)&lt;/a&gt; is an open standard that lets AI agents communicate with APIs in a structured way. If a service publishes an MCP server, Hermes can search its products, manage carts, read calendars, create tasks, and more. You add MCP servers to your Hermes config and Hermes uses them automatically when relevant.&lt;/p&gt;

&lt;p&gt;Together, skills and MCP servers let you build automations that are specific to your life and the tools you use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5. Build real-world automation: Grocery tracking
&lt;/h2&gt;

&lt;p&gt;Personally, I am very bad at keeping a tab of groceries and often find myself out of essential stock just when I am about to cook. So I wanted to automate this process for me. I eat similar food everyday and also measure my calories. I used all of this information to automate grocery shopping for me.&lt;/p&gt;

&lt;p&gt;The goal of Hermes was to track my daily grocery consumption, alert me on Telegram when something is running low, and place an order through a grocery delivery service when I say yes.&lt;/p&gt;

&lt;p&gt;This example uses the &lt;a href="https://github.com/Swiggy/swiggy-mcp-server-manifest" rel="noopener noreferrer"&gt;Swiggy Instamart MCP server&lt;/a&gt;, which is available in India. If you are in a different country, you can swap it out for any MCP-compatible grocery or delivery service. The skill logic stays exactly the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating the skill file
&lt;/h3&gt;

&lt;p&gt;Create a directory for the skill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.hermes/skills/grocery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the skill file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano ~/.hermes/skills/grocery/grocery_tracker.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste the following and edit the &lt;code&gt;groceries:&lt;/code&gt; section to match your actual diet and quantities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Grocery Auto-Order Skill&lt;/span&gt;

&lt;span class="gu"&gt;### My Daily Grocery List&lt;/span&gt;

groceries:
&lt;span class="p"&gt;  -&lt;/span&gt; name: "Milk"
    unit: "ml"
    daily_consumption: 500
    reorder_quantity: 2000
    alert_threshold_days: 3
    search_query: "fresh milk 1 litre"
&lt;span class="p"&gt;
  -&lt;/span&gt; name: "Eggs"
    unit: "pieces"
    daily_consumption: 2
    reorder_quantity: 12
    alert_threshold_days: 3
    search_query: "eggs 12 pack"
&lt;span class="p"&gt;
  -&lt;/span&gt; name: "Oats"
    unit: "gm"
    daily_consumption: 60
    reorder_quantity: 1000
    alert_threshold_days: 7
    search_query: "rolled oats 1kg"

  # Add your own items following the same format

&lt;span class="gu"&gt;### Instructions for Hermes&lt;/span&gt;

&lt;span class="gu"&gt;#### Daily check&lt;/span&gt;
Read ~/.hermes/grocery_inventory.json. Subtract each item's
daily_consumption from its current quantity. For any item where
quantity / daily_consumption is less than or equal to
alert_threshold_days, send a Telegram alert listing the low
items and ask if the user wants to place an order.

&lt;span class="gu"&gt;#### On YES&lt;/span&gt;
Search each low item using its search_query at the user's saved
delivery address. Add reorder_quantity of each to cart. Send a
cart summary via Telegram with prices and total. Wait for CONFIRM.

&lt;span class="gu"&gt;#### On CONFIRM&lt;/span&gt;
Place the order. Update grocery_inventory.json with the restocked
quantities. Send a delivery confirmation message.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save with &lt;code&gt;Ctrl+O&lt;/code&gt;, then &lt;code&gt;Enter&lt;/code&gt;, then exit with &lt;code&gt;Ctrl+X&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating the inventory file
&lt;/h3&gt;

&lt;p&gt;The inventory file tracks how much of each item you have at home right now.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano ~/.hermes/grocery_inventory.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"last_updated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-07"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"delivery_address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR FULL ADDRESS HERE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"snooze_until"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Milk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"quantity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"unit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ml"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Eggs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"quantity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"unit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pieces"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Oats"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"quantity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"unit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gm"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace the quantities with what you actually have at home. Save and exit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connect to an MCP server
&lt;/h3&gt;

&lt;p&gt;Add the grocery service MCP to your Hermes config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes config edit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scroll to the bottom of the file and add your MCP server. For Swiggy Instamart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;mcp_servers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;swiggy-instamart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://mcp.swiggy.com/im"&lt;/span&gt;
    &lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oauth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For any other MCP-compatible service, replace the name and URL with the values from that service's documentation.&lt;/p&gt;

&lt;p&gt;Save and exit, then verify it appears:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes mcp list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6. Authenticate with your MCP server
&lt;/h2&gt;

&lt;p&gt;Most MCP servers require OAuth authentication. Because your Droplet is headless, the OAuth callback needs to reach your browser through an &lt;a href="https://www.digitalocean.com/community/tutorials/ssh-port-forwarding" rel="noopener noreferrer"&gt;SSH tunnel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Run the login command on your Droplet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes mcp login swiggy-instamart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hermes will print a URL containing a port number in the redirect URI, like &lt;code&gt;http://127.0.0.1:45123/callback&lt;/code&gt;. Note that port number.&lt;/p&gt;

&lt;p&gt;On your local machine, open a new terminal and run the SSH tunnel using that port:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh &lt;span class="nt"&gt;-i&lt;/span&gt; ~/.ssh/id_ed25519 &lt;span class="nt"&gt;-L&lt;/span&gt; 45123:127.0.0.1:45123 root@YOUR_DROPLET_IP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep that terminal open, then immediately open the URL from your Droplet in your browser and complete the login. You have about 30 seconds before the token expires, so have both terminals ready before you start.&lt;/p&gt;

&lt;p&gt;When authentication succeeds, your Droplet terminal will show:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✓ Authenticated — tools available
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Telling Hermes Your Current Stock
&lt;/h3&gt;

&lt;p&gt;Start Hermes and initialize the inventory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Set up my grocery tracker. Ask me for current stock levels for each item in the grocery_tracker skill.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hermes will ask about each item one by one. Answer with what you currently have at home and it will update &lt;code&gt;grocery_inventory.json&lt;/code&gt; automatically. Once you have given the update, you will receive a message similar to this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fio61346lvkbkrs9fvvvy.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fio61346lvkbkrs9fvvvy.jpeg" alt="image2" width="708" height="1536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up the daily alert
&lt;/h3&gt;

&lt;p&gt;Still inside Hermes, set up the cron job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Add a daily cron at 8am: check my grocery inventory using the grocery_tracker skill, subtract daily consumption, and send me a Telegram message if anything is running low.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hermes will schedule this and confirm. From now on, every morning it checks your stock and messages you on Telegram if you need to reorder. This is how it looks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcclwz1w6o4zjddrvi1r.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcclwz1w6o4zjddrvi1r.jpeg" alt="image3" width="800" height="1482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7. Test the full flow
&lt;/h2&gt;

&lt;p&gt;Send a test message to your Telegram bot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Check my groceries and tell me what's running low.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should receive a Telegram message like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flblvjle66s7ejl9uvviu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flblvjle66s7ejl9uvviu.png" alt="image4" width="800" height="1734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reply &lt;strong&gt;YES&lt;/strong&gt;. Hermes searches your connected grocery service, builds a cart, and sends a summary.&lt;/p&gt;

&lt;p&gt;Once everything is set up, you can manage your agent entirely from Telegram:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Message&lt;/th&gt;
&lt;th&gt;What Hermes does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Check my groceries&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Shows all items and days of stock remaining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;I bought 12 eggs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Updates egg stock in the inventory file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Order groceries now&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Skips the check and goes straight to ordering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;What's running low?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lists items near their alert threshold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SKIP&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Snoozes today's alert until tomorrow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You now have a self-hosted AI agent running on DigitalOcean that works for you around the clock. Hermes connects to Telegram and more messaging platforms so you can reach it from anywhere, and skills plus MCP servers let you extend it to handle almost anything.&lt;/p&gt;

&lt;p&gt;The grocery automation you built in this tutorial is a starting point. The pattern is reusable: write a skill file that describes the task, connect an MCP server that gives Hermes access to the right service, and set a cron job to trigger it automatically. The grocery automation is one example. The same pattern works for any repetitive task in your life. A few ideas to get you started:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bill reminders-&lt;/strong&gt; Create a skill that tracks recurring payments, calculates due dates, and alerts you three days before each bill is due.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Health tracking-&lt;/strong&gt;  Log your workouts or meals via Telegram and have Hermes summarize your week every Sunday.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Home automation-&lt;/strong&gt; Connect Hermes to a smart home MCP server and have it adjust lights, thermostats, or appliances based on a schedule or your location.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/nousresearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent on GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;Model Context Protocol documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.digitalocean.com/products/droplets/" rel="noopener noreferrer"&gt;DigitalOcean Droplets documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.digitalocean.com/products/droplets/how-to/add-ssh-keys/" rel="noopener noreferrer"&gt;How to add SSH keys to Droplets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://core.telegram.org/bots/tutorial" rel="noopener noreferrer"&gt;Telegram BotFather guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>hermes</category>
      <category>ai</category>
      <category>agents</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>April 2026 DigitalOcean Tutorials: Inference Optimization and AI Infrastructure</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Fri, 22 May 2026 21:55:02 +0000</pubDate>
      <link>https://dev.to/digitalocean/april-2026-digitalocean-tutorials-inference-optimization-and-ai-infrastructure-5fcf</link>
      <guid>https://dev.to/digitalocean/april-2026-digitalocean-tutorials-inference-optimization-and-ai-infrastructure-5fcf</guid>
      <description>&lt;p&gt;Most AI teams hit the same walls once they move past prototyping. The RAG pipeline that worked flawlessly in a demo starts hallucinating under real traffic. Inference costs climb without clear optimization levers. GPU resources sit underutilized while workloads spike elsewhere. &lt;/p&gt;

&lt;p&gt;Most of the time, the root cause traces back to architecture decisions that weren't pressure-tested for production. This month's &lt;a href="https://www.digitalocean.com/community/tutorials" rel="noopener noreferrer"&gt;DigitalOcean tutorials&lt;/a&gt; focus on diagnosing and fixing those failure points across the AI infrastructure stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/conceptual-articles/why-rag-systems-fail-in-production" rel="noopener noreferrer"&gt;Why RAG Systems Fail in Production&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Why do seemingly solid RAG demos collapse under real-world conditions? This article traces failures back to retrieval quality, latency tradeoffs, and embedding drift. You’ll get a clear picture of how upstream decisions—such as chunking strategy and ranking—directly affect downstream LLM outputs. If your team is building production pipelines, evaluation, monitoring, and retrieval engineering matter just as much as model choice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogkanycez0ox4kj1fjmi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogkanycez0ox4kj1fjmi.png" alt="RAG System architecture" width="800" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/conceptual-articles/dedicated-vs-serverless-inference-at-scale" rel="noopener noreferrer"&gt;Dedicated vs. Serverless Inference as You Scale&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;The choice between serverless and dedicated inference isn't a one-time decision but an evolution driven by how your workload changes over time. Early on, serverless makes sense because traffic is unpredictable and iteration speed matters more than performance optimization. As usage stabilizes, the cracks show up—latency variability frustrates users and per-request pricing gets expensive for always-on systems. Walk-throughs of Modal and Together.ai show where that transition point hits and why delaying it costs you.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/conceptual-articles/serverless-fine-tuned-llms" rel="noopener noreferrer"&gt;Fine-Tuned LLMs on Serverless Architecture&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Parameter-efficient methods like LoRA let platforms serve hundreds of fine-tuned model variants from a single GPU by layering small adapter weights on top of a shared frozen base model. This makes serverless, pay-per-token inference possible for custom models without dedicated GPU deployments. The tradeoff is cold starts: idle adapters get evicted from VRAM and need to be reloaded, adding a few hundred milliseconds of latency to the first token. You’ll learn how to minimize that with keep-alive requests, adapter rank tuning, and smarter layer targeting.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/model-silent-versioning-problem" rel="noopener noreferrer"&gt;The Silent Versioning Problem in AI Inference&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;This one is a cautionary tale about what happens when the model behind your endpoint changes and nobody tells you. The serving stack is full of moving parts that can shift independently of the model name, and the result is silent regressions that break prompt tuning and invalidate your evaluations before you even know something moved. It includes a practical buyer's checklist for pressing inference platforms on snapshot pinning, retention commitments, and how they handle disclosure when something in the stack changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fai6i5tt8aydh7zx752l8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fai6i5tt8aydh7zx752l8.png" alt="Silent versioning framework" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/conceptual-articles/bottlenecks-llm-inference-optimization" rel="noopener noreferrer"&gt;The Hidden Bottlenecks in LLM Inference and How to Fix Them&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Faster GPUs are not the answer if the rest of your serving stack can't keep up. Spoiler: the bottlenecks are GPU underutilization from rigid batching, memory bandwidth constraints during decode, KV cache fragmentation, and CPU-side overhead from tokenization and prompt assembly. Click through for a deeper look at each one and practical fixes.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/ai-platform-security-review" rel="noopener noreferrer"&gt;We Built a Private-Document AI App to Test Platform Security. Here Is What We Could Actually Verify&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;AI security should always be treated as a first-class concern, not an afterthought. This tutorial puts that to the test by building a private-document chatbot and running the same workflow across six inference platforms: DigitalOcean, Baseten, Nebius, Fireworks AI, Modal, and Together AI. Each platform is evaluated on access controls, data retention defaults, network isolation, audit logging, and shared responsibility clarity. It doubles as a practical framework for figuring out what you can actually verify before sensitive data is in flight.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/post-inference-querying-mongodb" rel="noopener noreferrer"&gt;Post-Inference Storage and Querying with MongoDB&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Many inference tutorials stop at the model response. This one keeps going. You'll build a FastAPI app that sends images through a vision model, stores the structured predictions in MongoDB, and then exposes endpoints that let you filter by detected labels and confidence scores or run aggregation pipelines across your full dataset. It's a practical blueprint for turning raw model output into something queryable and operational.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-build-multi-agent-ai-system-docker-agent-digitalocean" rel="noopener noreferrer"&gt;How to Build a Multi-Agent AI System with Docker and DigitalOcean&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Instead of routing everything through a single model, multi-agent systems let you split a workflow across specialized agents that each handle a different part of the problem and pass results between them. The tradeoff is coordination complexity. This walkthrough covers how to containerize each agent with Docker, manage communication between them, and deploy the full system on DigitalOcean. You'll come away with a working deployment pattern you can adapt to your own orchestration needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.digitalocean.com/community/tutorials/gpu-fleet-optimizer" rel="noopener noreferrer"&gt;Building an AI-Powered GPU Fleet Optimizer with the DigitalOcean AI Platform ADK&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;A single idle GPU Droplet left running overnight can add hundreds of dollars to your monthly bill, and standard CPU monitoring won't catch it because it can't see whether the GPU is actually doing work. This tutorial builds an AI-powered agent using the DigitalOcean AI Platform ADK that scrapes NVIDIA DCGM metrics like VRAM usage, engine utilization, and power draw across your fleet in real time. It compares those metrics against configurable thresholds to flag idle resources before they inflate your cloud spend. The repo is designed to be forked and customized to your own workloads, including adding tools that let the agent take action like powering off idle nodes.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>infrastructure</category>
      <category>tutorial</category>
      <category>learning</category>
    </item>
    <item>
      <title>Tutorial: This AI Now Tells You if a Meeting Could Be an Email</title>
      <dc:creator>Andrew Dugan</dc:creator>
      <pubDate>Thu, 21 May 2026 16:00:00 +0000</pubDate>
      <link>https://dev.to/digitalocean/tutorial-this-ai-now-tells-you-if-a-meeting-could-be-an-email-2m3f</link>
      <guid>https://dev.to/digitalocean/tutorial-this-ai-now-tells-you-if-a-meeting-could-be-an-email-2m3f</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;DigitalOcean's Inference Router semantically routes prompts to the most appropriate model based on custom instructions. The setup process is 'point-and-click', with no hardcoded "if/else" logic required.&lt;/li&gt;
&lt;li&gt;The router is built directly into the inference pipeline. Users can make inference requests normally, and the router automatically handles the workflow.&lt;/li&gt;
&lt;li&gt;In our workflow, it determines the nature of the task and routes the request to a cheaper, faster model to write an email or a larger, more advanced model to write a meeting agenda. This architecture can scale beyond meetings and can be used for support tickets, code reviews, legal documents, and more.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Think back to the last time you received a calendar invite with no agenda, 12 attendees, and a title that says "Quick Sync". We've all either held or attended meetings that "could have been an email" at some point, but what if there was a way to have a gentle nudge built straight into your workflow that only leads us into a meeting when the task requires it. Instead of defaulting to a meeting, one could describe the details of the task that needs to be addressed, and immediately either an email is written for you to send out or a meeting agenda is written ready to attach to your calendar invites. To take it a step further, emails and meeting agendas require different levels of depth and consideration, and ultimately different &lt;a href="https://www.digitalocean.com/resources/articles/large-language-models" rel="noopener noreferrer"&gt;LLMs&lt;/a&gt; to write them.&lt;/p&gt;

&lt;p&gt;We've built exactly this using DigitalOcean's new &lt;a href="https://docs.digitalocean.com/products/inference/how-to/use-inference-router/" rel="noopener noreferrer"&gt;Inference Router&lt;/a&gt;, a policy-driven routing layer that matches each incoming prompt to the right model based on task complexity without hardcoded "if/else" logic required. In this tutorial, we will cover the "Could have been an email" router that we built using this new feature, how it works, and how to build your own custom router with DigitalOcean's tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the router works with DigitalOcean
&lt;/h2&gt;

&lt;p&gt;Traditional LLM (large language model) inference involves sending a request to a single model and getting a response. The better or worse the model, the better or worse the response. LLM routers are a layer in between you and a group of models that takes your request, identifies the best model for the request, and has that specific model handle it. Routers can be customized to choose models based on speed, price, specific task, or any other optimization you are looking for. It allows teams to set up a single endpoint for a wide range of needs while getting the best possible price and speed for each request.&lt;/p&gt;

&lt;p&gt;In our case, we built a router with two tasks. The first task we made is &lt;code&gt;write_email&lt;/code&gt;. It is backed by a cheap, fast model (&lt;a href="https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct" rel="noopener noreferrer"&gt;Llama 3.3 Instruct 70B&lt;/a&gt;) for writing a simple email. The second task is &lt;code&gt;write_meeting_agenda&lt;/code&gt;. It is backed by a frontier model (Anthropic &lt;a href="https://www.anthropic.com/news/claude-opus-4-7" rel="noopener noreferrer"&gt;Claude Opus 4.7&lt;/a&gt;) to create a detailed meeting plan to discuss decisions that genuinely require talking to each other. In the request, you describe what you need done, the topic, the stakeholders, and any agenda items, and the router reads that description, matches it against the task definitions, and routes it to whichever model fits. If the request lands on the &lt;code&gt;write_email&lt;/code&gt; task, the router delivers a verdict of "this could be an email" and generates a ready-to-send email draft. If it lands on &lt;code&gt;write_meeting_agenda&lt;/code&gt;, the app confirms the meeting is warranted and produces a structured agenda with talking points and action items. The routing decision itself is the verdict. No additional classification logic is needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1. Build the router
&lt;/h2&gt;

&lt;p&gt;The first step to building a router is to log in to your DigitalOcean cloud account, or create an account if you don't have one already. Navigate to the &lt;a href="https://cloud.digitalocean.com/model-studio/router/" rel="noopener noreferrer"&gt;router page&lt;/a&gt; and select "Create Router".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdoimages.nyc3.cdn.digitaloceanspaces.com%2F010AI-ML%2F2025%2FAndrew%2F19_Meeting_or_Email%2F1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdoimages.nyc3.cdn.digitaloceanspaces.com%2F010AI-ML%2F2025%2FAndrew%2F19_Meeting_or_Email%2F1.png" title="Create a Router in the DigitalOcean Control Panel" alt="The DigitalOcean Create a Router page showing the name and description fields" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the Create a Router page, give the router a unique name and a description. That description is not just metadata. It serves as a routing prompt, giving the router overall context so it can identify the most appropriate task for each incoming request. From there you define the tasks that make up the router's logic. Each task combines a name, a description, and a model pool with a selection policy. You can either add pre-configured tasks that DigitalOcean has already benchmarked and optimized, or define fully custom tasks that specify exactly which models to use and how to rank them, whether by cost efficiency, speed (&lt;a href="https://www.digitalocean.com/blog/llm-inference-benchmarking" rel="noopener noreferrer"&gt;Time To First Token&lt;/a&gt;), or a manual ranking you control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.digitalocean.com%2Fscreenshots%2Finference%2Fadd-custom-task.fc4a50918dd6700b40e7f2bdca0e20b358a2f6b7322dc31921cbe8d3f448a21b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.digitalocean.com%2Fscreenshots%2Finference%2Fadd-custom-task.fc4a50918dd6700b40e7f2bdca0e20b358a2f6b7322dc31921cbe8d3f448a21b.png" title="Add a custom task to the router" alt="The Add Custom Task dialog showing task name, description, and model pool fields" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once your tasks are in place, the last piece is specifying fallback models. Fallback models catch any request that does not cleanly match one of your configured tasks, and they are tried in the priority order you set. This gives the router a safety net so that even if the incoming prompt is ambiguous or outside the scope of your named tasks, a response is still generated rather than failing silently. For our email/meeting router, that means a borderline "is this a meeting or an email?" input never goes unanswered.&lt;/p&gt;

&lt;p&gt;If you prefer automation over the control panel, you can also create the router with a single POST request to &lt;code&gt;https://api.digitalocean.com/v2/gen-ai/models/routers&lt;/code&gt;, passing in the same names, task definitions, selection policies, and fallback models as a JSON body, which is also useful for version-controlling your router alongside your application code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2. Build the app
&lt;/h2&gt;

&lt;p&gt;With the router created, integrating it into an application is straightforward because the router is a drop-in replacement for any direct model call. You use the same Chat Completions endpoint (&lt;code&gt;https://inference.do-ai.run/v1/chat/completions&lt;/code&gt;) and the same request shape, but instead of naming a specific model you prefix your router's name with &lt;code&gt;router:&lt;/code&gt; in the &lt;code&gt;model&lt;/code&gt; field. For this app, the field would look like &lt;code&gt;"model": "router:meeting-or-email"&lt;/code&gt;. Authentication works the same way. You generate a Model Access Key from the DigitalOcean Control Panel, export it as &lt;code&gt;MODEL_ACCESS_KEY&lt;/code&gt;, and pass it as a Bearer token in your request header. The user's meeting description, agenda, and attendee list become the message content, and the router takes it from there.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="n"&gt;meeting_or_email&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;meeting_or_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://inference.do-ai.run/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &amp;lt;^&amp;gt;YOUR_MODEL_ACCESS_KEY&amp;lt;^&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;router:meeting-or-email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a workplace productivity assistant that evaluates whether a task &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requires a live meeting or can be handled asynchronously via email. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If the request involves a straightforward update, announcement, or single-topic &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;communication with no real-time decision-making needed, write a concise, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;professional email draft and state that this could have been an email. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If the request requires discussion, real-time collaboration, debate, or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coordination among multiple stakeholders with competing priorities, produce &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a structured meeting agenda with talking points and action items, and confirm &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;that a meeting is warranted. Always begin your response by clearly stating &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your verdict: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;This could be an email.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;This warrants a meeting.&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Message: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;model&lt;/code&gt; field in the response body tells you exactly which model the router selected for that request. Requests the router judged as routine land on the cheaper, faster model, while requests it judged as genuinely complex land on the frontier model. The &lt;code&gt;x-model-router-selected-route&lt;/code&gt; response header tells you which task was matched, for example &lt;code&gt;write_email&lt;/code&gt; vs &lt;code&gt;write_meeting_agenda&lt;/code&gt;, or &lt;code&gt;fallback&lt;/code&gt; if none of the tasks matched. The app does not need any if/else logic to decide what kind of meeting it is. It reads the header the router already populated and maps it to a verdict message for the user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;meeting_or_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I need to plan a large event with multiple stakeholders that will all be involved.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[secondary_label Output]
Model: anthropic-claude-opus-4.7
Message: This warrants a meeting.

Coordinating a large event with multiple stakeholders involves competing priorities, real-time negotiation of responsibilities, and collaborative decision-making that simply cannot be handled efficiently via email threads. Below is a structured agenda to make the meeting productive.

---

## Event Planning Kickoff Meeting
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see above that with a large project the task is routed to Opus 4.7. With a smaller task that just warrants an email, below, the task is routed to Llama3.3.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;meeting_or_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I have some metrics I want to share with my team.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[secondary_label Output]
Model: llama3.3-70b-instruct
Message: This could be an email. 

Here's a draft email you could send to your team:

Subject: Update on Key Metrics

Dear Team,
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3. Deploy to DigitalOcean App Platform
&lt;/h2&gt;

&lt;p&gt;Before deploying your own router, it is worth spending a few minutes in the Inference Router playground to validate that the router is routing the way you expect. From the &lt;code&gt;My Routers&lt;/code&gt; tab, click the menu next to your router and select a model to compare it against. The Playground opens in a split view where you can type a meeting description and see both the router's response and the comparison model's response side by side. Each result shows the cost difference, end-to-end latency, the specific model the router selected, and the task that was matched for that query. This is a useful check to confirm that your task descriptions are correctly discriminating between routine syncs and complex-coordination requests before any real traffic hits the router.&lt;/p&gt;

&lt;p&gt;Once deployed, the Analyze tab gives you a live view of how the router is performing in production. You can see aggregate metrics across all your routers or drill into a specific one, including total requests, total token usage, model match rate, and fallback rate. Model match rate is the percentage of requests matched to a configured task, and fallback rate is the percentage that fell through to the fallback models instead. For accuracy evaluation, the Router Evaluation tool in the Playground tab lets you upload a labeled dataset and run an LLM-as-a-Judge evaluation that scores responses on completeness, correctness, token usage, and latency. Together these two views give you what you need to iterate on your task descriptions and model pools after launch as you accumulate real meeting data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The meeting app we built is a thin wrapper around a genuinely powerful idea. You do not have to choose which model handles a request, you just have to describe the conditions under which each model makes sense and let the router enforce those conditions at runtime. The router does not just save money on tokens. It changes how you think about designing for complexity. Instead of building one prompt that works adequately for everything, you build narrow, well-described task buckets and let semantic matching handle the dispatch.&lt;/p&gt;

&lt;p&gt;The broader lesson here extends well beyond meetings and emails. The same pattern applies anywhere you have a mix of requests hitting a single endpoint. This could include a customer support queue where most tickets are simple FAQs but a few require nuanced reasoning, a code review pipeline where style fixes and architecture feedback warrant very different models, or a legal document classifier where boilerplate and novel clauses should not cost the same to process. Once you have written a router description and a pair of task definitions, you have infrastructure that scales horizontally without adding branching logic to your application code. DigitalOcean's &lt;a href="https://www.digitalocean.com/" rel="noopener noreferrer"&gt;platform&lt;/a&gt; keeps that infrastructure on one bill and one security model, which removes the operational overhead that typically discourages teams from adopting multi-model strategies in the first place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.digitalocean.com/products/inference/how-to/use-inference-router/" rel="noopener noreferrer"&gt;How to Use Inference Router&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-build-parallel-agentic-workflows-with-python" rel="noopener noreferrer"&gt;How to Build Parallel Agentic Workflows with Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/mistral-7b-fine-tuning" rel="noopener noreferrer"&gt;Fine-Tune Mistral-7B with LoRA: A Quickstart Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>agentskills</category>
      <category>inference</category>
    </item>
    <item>
      <title>Tutorial: Build a Cost-Aware AI Support Triage API</title>
      <dc:creator>James Skelton</dc:creator>
      <pubDate>Tue, 19 May 2026 23:10:07 +0000</pubDate>
      <link>https://dev.to/digitalocean/tutorial-build-a-cost-aware-ai-support-triage-api-24m5</link>
      <guid>https://dev.to/digitalocean/tutorial-build-a-cost-aware-ai-support-triage-api-24m5</guid>
      <description>&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AI applications use a single endpoint to handle multiple complex tasks: classification, urgency scoring, customer-facing drafting, and long-form summarization. &lt;/li&gt;
&lt;li&gt;This does not account for varying cost, latency, and quality requirements. &lt;/li&gt;
&lt;li&gt;Building a FastAPI and using serverless inference infrastructure makes it possible to address these requirements through effective routing.
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Most AI applications start with a single model hard-coded into the app. That works well for a prototype, but it breaks down the moment a single endpoint has to handle multiple complex task categories: classification, urgency scoring, customer-facing drafting, and long-form summarization all benefit from different model choices. Those tasks do not share the same cost, latency, or quality requirements.&lt;/p&gt;

&lt;p&gt;Support triage is the cleanest example of this. A user types "how do I reset my password?" and you spend the same per-token rate as you do on a multi-paragraph escalation from an enterprise customer with logs pasted in. You can branch on ticket type in your app code and pick a different model per branch, but now your model selection logic lives inside your handler, your fallback strategy is a try/except, and every pricing change means a redeploy. The consequences include a 70B model classifying one-word tickets, no fallback when that model is slow, and a redeploy every time pricing shifts.&lt;/p&gt;

&lt;p&gt;In this tutorial, we'll use &lt;a href="https://docs.digitalocean.com/products/inference/how-to/use-serverless-inference/" rel="noopener noreferrer"&gt;serverless inference via DigitalOcean's Inference router&lt;/a&gt; to easily and quickly build a &lt;a href="https://fastapi.tiangolo.com/" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt; support triage endpoint that solves all these problems at once. By the end, you'll route classification, urgency scoring, customer replies, and escalation summaries to the right model for each job — automatically, with built-in fallback, and without a single model name in your application code. You'll have a production-ready API that's 71% cheaper than running everything on a frontier model.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you're building
&lt;/h2&gt;

&lt;p&gt;Let's construct a single endpoint, &lt;code&gt;POST /triage&lt;/code&gt;, that takes a ticket payload and returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classification: the issue category (billing, bug, how-to, account, etc.)&lt;/li&gt;
&lt;li&gt;Urgency + sentiment: a severity score and a read on customer mood&lt;/li&gt;
&lt;li&gt;Drafted reply: a short, customer-facing response&lt;/li&gt;
&lt;li&gt;Escalation summary: a structured brief for a human agent, generated only when the ticket is complex enough to need one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture moves from this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;App → hardcoded model &lt;span class="o"&gt;(&lt;/span&gt;one model handles every task&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;App → Serverless inference via Inference Router → best-fit model per task
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The router is what makes the second diagram possible without your app knowing anything about which models exist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Serverless Inference and DigitalOcean's Inference Router
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.digitalocean.com/products/inference-engine" rel="noopener noreferrer"&gt;Inference Router&lt;/a&gt; lets you define tasks and model pools, then routes incoming prompts to the best-fit model based on those task definitions and selection policies. A task is a named job with a description: "&lt;code&gt;classify_ticket&lt;/code&gt;, for example. A model pool is the set of candidate models the router can choose from for that task, governed by a selection policy: lowest cost, lowest latency, a manually set ranking, or a fallback order. You configure all of this once at the router level, and your app calls the router instead of any specific model."&lt;/p&gt;

&lt;p&gt;Serverless inference lets you send API requests to models without having to create an AI agent or worry about managing infrastructure. This allow you to get started quickly without managing any components behind an inference endpoint.&lt;/p&gt;

&lt;p&gt;The API surface is OpenAI-compatible. The base URL is &lt;code&gt;https://inference.do-ai.run/v1/&lt;/code&gt;, and a single model access key covers both foundation models and routers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project setup
&lt;/h2&gt;

&lt;p&gt;In order to continue, you need Python 3.10+, a &lt;a href="https://cloud.digitalocean.com/login" rel="noopener noreferrer"&gt;DigitalOcean account&lt;/a&gt; with &lt;a href="https://docs.digitalocean.com/products/inference/how-to/use-serverless-inference/" rel="noopener noreferrer"&gt;Serverless Inference&lt;/a&gt; enabled, and a &lt;a href="https://docs.digitalocean.com/products/inference/how-to/model-access-keys/" rel="noopener noreferrer"&gt;model access key&lt;/a&gt;. We have already configured the &lt;a href="https://github.com/Jameshskelton/triage_app" rel="noopener noreferrer"&gt;full project in this repository&lt;/a&gt; for your convenience, but follow along in this next section to build out your own version of the API and learn why we made specific choices for the API.&lt;/p&gt;

&lt;p&gt;The project layout is intentionally small:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;support-triage/
├── main.py
├── sample_tickets.json
├── requirements.txt
└── .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;main.py&lt;/code&gt; holds the application code, &lt;code&gt;requirements.txt&lt;/code&gt; the required packages,  &lt;code&gt;sample_tickets.json&lt;/code&gt; is a sample for testing the router, and &lt;code&gt;.env&lt;/code&gt; holds the required secrets, keys, and URL base values.&lt;/p&gt;

&lt;p&gt;To get started, clone the repo onto your machine and install everything by pasting the following into your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Jameshskelton/triage_app
&lt;span class="nb"&gt;cd &lt;/span&gt;triage_app
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv_triage
&lt;span class="nb"&gt;source &lt;/span&gt;venv_triage/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The OpenAI SDK works as-is for DigitalOcean's Serverless Inference: you just point base_url and api_key at DigitalOcean instead of OpenAI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: The baseline - direct model calls
&lt;/h3&gt;

&lt;p&gt;Before we touch the router, let's build the version most developers would write first: one model, hardcoded, doing all four jobs. The next few step sections outline the work we did to build the application demo. If you would like to just test the final version, check out our &lt;a href="https://github.com/Jameshskelton/triage_app" rel="noopener noreferrer"&gt;repository&lt;/a&gt; where we stored this project.&lt;/p&gt;

&lt;p&gt;To get started, we created &lt;code&gt;main.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DO_INFERENCE_BASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DO_MODEL_ACCESS_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.3-70b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# one model for everything
&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Ticket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/triage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;triage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Ticket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Subject: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify this support ticket into one of: billing, bug, how-to, account, other. Reply with one word.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;urgency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score urgency from 1 (low) to 5 (critical) and note sentiment. Reply as &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score: N, sentiment: X&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a short, professional reply to this customer. Maximum 4 sentences.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize this ticket for a human agent. Include the problem, what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s been tried, and recommended next steps.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;urgency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation_summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we have set up our &lt;code&gt;.env&lt;/code&gt; file correctly with the right API keys and values, we can run it using the code below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvicorn main:app &lt;span class="nt"&gt;--reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s test it with two tickets (one trivial, one complex), and audit the results.&lt;/p&gt;

&lt;p&gt;Example input 1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST localhost:8000/triage &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "subject": "Password reset",
  "body": "How do I reset my password?"
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output 1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"category"&lt;/span&gt;: &lt;span class="s2"&gt;"account"&lt;/span&gt;,
  &lt;span class="s2"&gt;"urgency"&lt;/span&gt;: &lt;span class="s2"&gt;"score: 1, sentiment: neutral"&lt;/span&gt;,
  &lt;span class="s2"&gt;"reply"&lt;/span&gt;: &lt;span class="s2"&gt;"You can reset your password by selecting the Forgot password link on the sign-in page and following the email instructions. If you do not receive the reset email, check your spam folder or contact support for help."&lt;/span&gt;,
  &lt;span class="s2"&gt;"escalation_summary"&lt;/span&gt;: &lt;span class="s2"&gt;"The customer is asking how to reset their password. No signs of account compromise, outage, or escalation risk. Recommended next step: provide standard password reset instructions."&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example input 2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST localhost:8000/triage &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "subject": "Production outage on enterprise account",
  "body": "Our team has been unable to access the dashboard since 09:14 UTC. We have ~200 internal users blocked. Attached are logs showing 502s from the API gateway..."
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us something like the corresponding example output 2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"category"&lt;/span&gt;: &lt;span class="s2"&gt;"bug"&lt;/span&gt;,
  &lt;span class="s2"&gt;"urgency"&lt;/span&gt;: &lt;span class="s2"&gt;"score: 5, sentiment: frustrated"&lt;/span&gt;,
  &lt;span class="s2"&gt;"reply"&lt;/span&gt;: &lt;span class="s2"&gt;"Thank you for reporting this. We understand that a production dashboard outage affecting around 200 users is urgent, and we are escalating this to our engineering team immediately. Please continue to share any relevant logs or timestamps while we investigate."&lt;/span&gt;,
  &lt;span class="s2"&gt;"escalation_summary"&lt;/span&gt;: &lt;span class="s2"&gt;"Enterprise customer reports a production dashboard outage beginning at 09:14 UTC. Approximately 200 internal users are blocked. Logs indicate 502 responses from the API gateway. Recommended next steps: escalate to engineering, inspect gateway and upstream service health, correlate errors around 09:14 UTC, and provide the customer with frequent status updates."&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both responses are useful. That is exactly why this baseline is tempting.&lt;/p&gt;

&lt;p&gt;But look at what just happened: the same 70B model handled everything. The model classified "How do I reset my password?" into a simple category, scored urgency, drafted a short reply, and wrote an escalation summary that the ticket did not really need. Then it handled the enterprise outage, where the larger model actually makes sense.&lt;/p&gt;

&lt;p&gt;That is the problem. The trivial ticket and the production outage have very different cost, latency, and quality requirements, but the app treats them the same. You are paying overkill rates for simple work, there is no fallback if the model is slow or unavailable, and any model-selection change means editing application code and redeploying. Let's fix that.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Configure the Inference Router
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdoimages.nyc3.cdn.digitaloceanspaces.com%2F010AI-ML%2F2026%2FJames%2Finference%2520router.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdoimages.nyc3.cdn.digitaloceanspaces.com%2F010AI-ML%2F2026%2FJames%2Finference%2520router.gif" alt="image" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://cloud.digitalocean.com" rel="noopener noreferrer"&gt;DigitalOcean control panel&lt;/a&gt;, navigate to the Inference Router using the left-hand sidebar. Then, create a new Inference Router. Name your Router appropriately, and give it a descriptive description of what it will do. For example, we named ours &lt;code&gt;triage-router&lt;/code&gt;, and described it as “Demo Triage API for DO tutorial”.&lt;/p&gt;

&lt;p&gt;The router then needs its four tasks, each with a description and a model pool with a selection policy. Each of these is outlined below. If you want to copy them to recreate this experiment, copy and paste the values within to the Router tasks individually. This will make probabilistically similar results to what we have.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskqr7zhd8cvix23kr48e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskqr7zhd8cvix23kr48e.png" alt="image" width="800" height="1443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task name&lt;/th&gt;
&lt;th&gt;Description (fed to the router)&lt;/th&gt;
&lt;th&gt;Model pool strategy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;classify_ticket&lt;/td&gt;
&lt;td&gt;Categorize short support messages into issue types (billing, bug, how-to, account).&lt;/td&gt;
&lt;td&gt;Lowest cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;urgency_detection&lt;/td&gt;
&lt;td&gt;Detect severity, sentiment, and escalation risk in a single pass.&lt;/td&gt;
&lt;td&gt;Lowest latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;draft_customer_reply&lt;/td&gt;
&lt;td&gt;Generate a short, professional customer-facing reply.&lt;/td&gt;
&lt;td&gt;Manual ranking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;escalate_complex_issue&lt;/td&gt;
&lt;td&gt;Summarize complex tickets into structured briefs for a human agent.&lt;/td&gt;
&lt;td&gt;Manual ranking&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkoio5hpfuv1l6enql9fj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkoio5hpfuv1l6enql9fj.png" alt="image" width="800" height="1312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we are creating the description, selecting the router prioritization policy, and selecting the model, we need to consider the exact task we want completed to optimize our results. Here are a few things worth noting as you configure these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task descriptions matter. The router uses them to match incoming requests to the right task. Be specific about what the task does, what kind of input it expects, and the format of the output.&lt;/li&gt;
&lt;li&gt;Put at least two models in every pool. A pool of one is a single point of failure. Even your "lowest cost" pool should have a fallback in case the primary is unavailable.&lt;/li&gt;
&lt;li&gt;The selection policy is enforced inside the pool, not across pools. "Lowest cost" means "the cheapest model in this pool that's currently healthy," not "the cheapest model on the platform."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the router is saved, you'll get a router ID. That's what your app will call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Refactor the app to use the router
&lt;/h3&gt;

&lt;p&gt;Now the satisfying part. Replace the hardcoded MODEL constant with the router ID, and pass the task name through the request. Below is an example of what you could do to make it work, though not exactly what we did in our final release.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ROUTER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-router-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# from the DigitalOcean control panel
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_urgency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urgency_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract the integer score from &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score: N, sentiment: X&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. Defaults to 3 if unparseable.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score:\s*(\d)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urgency_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ROUTER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# router uses this to pick the pool
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;served_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# the model the router actually picked
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/triage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;triage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Ticket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Subject: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify_ticket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify this support ticket into one of: billing, bug, how-to, account, other. Reply with one word.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;urgency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgency_detection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score urgency from 1 (low) to 5 (critical) and note sentiment. Reply as &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score: N, sentiment: X&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;draft_customer_reply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a short, professional reply to this customer. Maximum 4 sentences.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Only escalate when urgency warrants a human brief
&lt;/span&gt;    &lt;span class="n"&gt;urgency_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_urgency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urgency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;urgency_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalate_complex_issue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize this ticket for a human agent. Include the problem, what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s been tried, and recommended next steps.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;urgency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgency_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;urgency_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation_summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify_ticket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;served_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgency_detection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;urgency&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;served_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;draft_customer_reply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;served_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalate_complex_issue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;served_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole change. It’s already done for you in the GitHub version, so there’s no need to manually do it yourself.&lt;/p&gt;

&lt;p&gt;With this, there are no model names anywhere in the app. The router decides which model handles each task, using the policies you configured. If you want to swap the underlying model for draft_customer_reply next month, you do it in the router, not in this file.&lt;/p&gt;

&lt;p&gt;The app triages one ticket by breaking it into smaller AI jobs instead of asking one model to do everything at once. When you call POST /triage, main.py builds the ticket text, then sends separate router calls for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classify_ticket: decides the ticket category, like billing, bug, how-to, account, or other.&lt;/li&gt;
&lt;li&gt;urgency_detection: scores severity from 1 to 5 and detects sentiment; the code uses the score to decide whether to escalate.&lt;/li&gt;
&lt;li&gt;draft_customer_reply: writes a short customer-facing response.&lt;/li&gt;
&lt;li&gt;escalate_complex_issue: Tickets scoring 4 or 5 on urgency trigger the escalation summary; lower scores skip it entirely, which is where most of the cost savings live.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key thing: the app always calls your DigitalOcean router ID from .env as the model, and the router decides which underlying model should handle each prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Run mixed tickets through the router
&lt;/h3&gt;

&lt;p&gt;With the router wired in, let's test it. The interesting behavior shows up when you feed the endpoint a mix of simple and complex examples. Here's a small batch of simple to complex examples in sample_tickets.json:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;
  &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"subject"&lt;/span&gt;: &lt;span class="s2"&gt;"Password reset"&lt;/span&gt;, &lt;span class="s2"&gt;"body"&lt;/span&gt;: &lt;span class="s2"&gt;"How do I reset my password?"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"subject"&lt;/span&gt;: &lt;span class="s2"&gt;"Invoice question"&lt;/span&gt;, &lt;span class="s2"&gt;"body"&lt;/span&gt;: &lt;span class="s2"&gt;"Why was I charged twice on invoice INV-3382?"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"subject"&lt;/span&gt;: &lt;span class="s2"&gt;"This is ridiculous"&lt;/span&gt;, &lt;span class="s2"&gt;"body"&lt;/span&gt;: &lt;span class="s2"&gt;"Third time this week your dashboard has gone down during our standup. We're seriously evaluating alternatives."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"subject"&lt;/span&gt;: &lt;span class="s2"&gt;"Dashboard weird"&lt;/span&gt;, &lt;span class="s2"&gt;"body"&lt;/span&gt;: &lt;span class="s2"&gt;"the dashboard is weird since yesterday"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"subject"&lt;/span&gt;: &lt;span class="s2"&gt;"Production outage"&lt;/span&gt;, &lt;span class="s2"&gt;"body"&lt;/span&gt;: &lt;span class="s2"&gt;"Our team has been unable to access the dashboard since 09:14 UTC. ~200 internal users blocked. Logs attached show 502s from the API gateway, traced to..."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"subject"&lt;/span&gt;: &lt;span class="s2"&gt;"Feature request + complaint"&lt;/span&gt;, &lt;span class="s2"&gt;"body"&lt;/span&gt;: &lt;span class="s2"&gt;"Can you add bulk export? Also the existing export is too slow and crashes on &amp;gt;10k rows."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"subject"&lt;/span&gt;: &lt;span class="s2"&gt;"API auth"&lt;/span&gt;, &lt;span class="s2"&gt;"body"&lt;/span&gt;: &lt;span class="s2"&gt;"Getting 401s after rotating my key. Following the docs at /auth/rotate but the new key returns invalid."&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In order to test them in sequence, we have provided &lt;code&gt;run_batch.py&lt;/code&gt; to facilitate this test. You can run it yourself with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 run_batch.py sample_tickets.json &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Loop through them and you'll see the routing do its job. The one-line "how do I reset my password?" hits the lowest-cost pool for classification and a small, fast model for urgency. The angry churn-risk message gets flagged high-urgency quickly, but the drafted reply comes from the higher-quality pool because that response is going to a real customer. The production outage gets routed to the higher-quality pool for the escalation summary, because that summary is what a human engineer is going to read at 09:15 UTC.&lt;/p&gt;

&lt;p&gt;Because call_router surfaces resp.model as served_by, every response now tells you exactly which model handled each task. Here's what the production outage ticket returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"category"&lt;/span&gt;: &lt;span class="s2"&gt;"bug"&lt;/span&gt;,
  &lt;span class="s2"&gt;"urgency"&lt;/span&gt;: &lt;span class="s2"&gt;"score: 5, sentiment: frustrated"&lt;/span&gt;,
  &lt;span class="s2"&gt;"urgency_score"&lt;/span&gt;: 5,
  &lt;span class="s2"&gt;"reply"&lt;/span&gt;: &lt;span class="s2"&gt;"Thank you for reporting this..."&lt;/span&gt;,
  &lt;span class="s2"&gt;"escalation_summary"&lt;/span&gt;: &lt;span class="s2"&gt;"Enterprise customer reports a production dashboard outage..."&lt;/span&gt;,
  &lt;span class="s2"&gt;"routing"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"classify_ticket"&lt;/span&gt;: &lt;span class="s2"&gt;"openai-gpt-5-nano"&lt;/span&gt;,
    &lt;span class="s2"&gt;"urgency_detection"&lt;/span&gt;: &lt;span class="s2"&gt;"anthropic-claude-haiku-4.5"&lt;/span&gt;,
    &lt;span class="s2"&gt;"draft_customer_reply"&lt;/span&gt;: &lt;span class="s2"&gt;"anthropic-claude-sonnet-4.6"&lt;/span&gt;,
    &lt;span class="s2"&gt;"escalate_complex_issue"&lt;/span&gt;: &lt;span class="s2"&gt;"anthropic-claude-opus-4.7"&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One request, four different models, zero model names in your application code. The cheap classifier handled the one-word category decision, Haiku scored urgency in a single fast pass, Sonnet drafted the customer-facing reply, and Opus produced the brief your on-call engineer reads. Run the password-reset ticket and the routing.escalate_complex_issue field comes back as null — the urgency score didn't clear the threshold, and that null is real money saved.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this actually saves you
&lt;/h2&gt;

&lt;p&gt;Let's put numbers on it. Assume an average ticket is 300 input tokens, with output tokens varying by task (40 for classification, 30 for urgency, 150 for a reply, 250 for an escalation summary). In our 7-ticket sample, 2-3 score high enough to escalate; we use 20% as a steady-state estimate.&lt;/p&gt;

&lt;p&gt;Using DigitalOcean's published serverless inference rates:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Per-ticket cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;classify_ticket&lt;/td&gt;
&lt;td&gt;GPT-5 Nano&lt;/td&gt;
&lt;td&gt;$0.000031&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;urgency_detection&lt;/td&gt;
&lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
&lt;td&gt;$0.000450&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;draft_customer_reply&lt;/td&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;$0.003150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;escalate_complex_issue (fires ~20% of tickets)&lt;/td&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;$0.007750&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At 100,000 tickets/month, three strategies compared:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hardcoded Llama 3.3 70B for everything&lt;/td&gt;
&lt;td&gt;$109&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Router (cost-aware)&lt;/td&gt;
&lt;td&gt;$518&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardcoded Claude Opus 4.7 for everything&lt;/td&gt;
&lt;td&gt;$1,775&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The honest result: the router isn't the cheapest option. Hardcoded Llama 70B is. But Llama 70B writing your enterprise outage reply is the cost. You're only saving money by treating a churn-risk ticket the same as a password reset.&lt;/p&gt;

&lt;p&gt;The fair comparison is against the realistic alternative: once you decide Llama's customer-facing replies aren't good enough, the choice is Opus-for-everything or the router. The router is 71% cheaper than all-Opus while only routing the expensive Opus 4.7 model to the tickets that actually need it.&lt;/p&gt;

&lt;p&gt;Run this math on your own ticket mix before committing. The ratio of trivial-to-complex tickets is the biggest lever: a queue that's 80% password resets saves far more than one that's 80% escalations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production checklist
&lt;/h2&gt;

&lt;p&gt;Before you put this in front of real tickets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log task type, latency, token usage, and selected model on every call. You can't tune what you can't see, and the router's value is invisible without per-task metrics.&lt;/li&gt;
&lt;li&gt;Build a small eval set per task. Maybe 20 tickets per task with known-good outputs. Run it before changing pool composition. The whole point of the router is that you can swap models without code changes, but you still want to know whether the swap was an improvement.&lt;/li&gt;
&lt;li&gt;Keep at least one fallback in every pool. A pool of one defeats half the reason to use a router.&lt;/li&gt;
&lt;li&gt;Use direct model calls for controlled benchmarks. When you're measuring a specific model's behavior, you don't want the router making your benchmark non-deterministic.&lt;/li&gt;
&lt;li&gt;Revisit routing rules quarterly. Model pricing and quality shift. The pool that was "lowest cost" six months ago might not be today.&lt;/li&gt;
&lt;li&gt;Treat task descriptions as production config. Version them, review changes, don't edit them in the UI without a record.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;The app you ended up with isn't bigger than the one you started with: it's actually smaller, because the model selection logic moved out of the code and into the router. The router is doing the work that used to be a match statement: matching tasks to models, falling back when something's unavailable, and giving you a single place to change strategy. Serverless inference via DigitalOcean's Inference Router enables your app more flexibility and efficiency without any of the hassle of a hardcoded setup.&lt;/p&gt;

&lt;p&gt;From here, a few natural next steps: stream the &lt;code&gt;draft_customer_reply&lt;/code&gt; task back to the client so agents can start reading before generation finishes; wire the escalation summaries into your real ticketing system; or stand up a second router for an unrelated workflow and reuse the same access key.&lt;/p&gt;

&lt;p&gt;The full sample code is available in the companion repo, and the router configuration takes about five minutes in the &lt;a href="https://cloud.digitalocean.com" rel="noopener noreferrer"&gt;DigitalOcean control panel&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>api</category>
      <category>inference</category>
    </item>
    <item>
      <title>Python Decorators: From Basics to Real-World Use Cases</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Tue, 12 May 2026 21:04:07 +0000</pubDate>
      <link>https://dev.to/digitalocean/python-decorators-from-basics-to-real-world-use-cases-n5f</link>
      <guid>https://dev.to/digitalocean/python-decorators-from-basics-to-real-world-use-cases-n5f</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by Shaoni Mukherjee (AI Technical Writer)&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python decorators allow additional functionality to be added to functions without changing the original function code.&lt;/li&gt;
&lt;li&gt;Decorators help reduce repeated code and improve code reusability.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;@decorator_name&lt;/code&gt; syntax is a cleaner way of wrapping functions.&lt;/li&gt;
&lt;li&gt;Decorators are commonly used for logging, authentication, caching, validation, and performance monitoring.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;*args&lt;/code&gt; and &lt;code&gt;**kwargs&lt;/code&gt; make decorators flexible enough to work with different function arguments.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;functools.wraps&lt;/code&gt; helps preserve the original function metadata and should be considered a best practice.&lt;/li&gt;
&lt;li&gt;Multiple decorators can be chained together to add multiple layers of functionality.&lt;/li&gt;
&lt;li&gt;Frameworks like Flask and Django rely heavily on decorators for routing, authentication, and request handling.&lt;/li&gt;
&lt;li&gt;Decorators should be kept simple and focused to maintain readability and easier debugging.&lt;/li&gt;
&lt;li&gt;Understanding decorators is important for writing cleaner and more maintainable Python applications.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;While building real-world &lt;a href="https://www.digitalocean.com/community/tutorials/python-tutorial" rel="noopener noreferrer"&gt;Python&lt;/a&gt; applications, a common challenge is the repetition of certain logic codes, such as logging, authentication, validation, time, or performance monitoring across multiple functions. For instance, API endpoints often require user authentication checks, and performance-critical functions may need execution time tracking.&lt;/p&gt;

&lt;p&gt;Adding the same logic code within each function often leads to cluttered code, reduced readability, and increased maintenance effort. Decorators address this problem by creating the separation of such cross-cutting concerns into reusable components that can be applied to functions in a clean and consistent manner. In frameworks like &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-create-your-first-web-application-using-flask-and-python-3" rel="noopener noreferrer"&gt;Flask&lt;/a&gt;, the &lt;code&gt;@app.route("/")&lt;/code&gt; decorator links a URL to a function without requiring explicit routing logic, while in &lt;a href="https://www.digitalocean.com/solutions/django-hosting" rel="noopener noreferrer"&gt;Django&lt;/a&gt;, decorators such as &lt;code&gt;@login_required&lt;/code&gt; enforce access control by restricting views to authenticated users. This approach promotes modularity, improves code clarity, and simplifies the overall structure of applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are Python decorators?
&lt;/h3&gt;

&lt;p&gt;Decorators are basically a wrapper around a function to modify it for better use. The function remains the same, but the decorator adds an extra something to the function.&lt;/p&gt;

&lt;h4&gt;
  
  
  The core idea
&lt;/h4&gt;

&lt;p&gt;Say you have a simple function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, world!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now imagine you want to print a line before and after every function you write, without modifying each one. A decorator lets you do exactly that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--- Before ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;           &lt;span class="c1"&gt;# calls the original function
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--- After ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;

&lt;span class="nd"&gt;@my_decorator&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, world!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;---&lt;/span&gt; &lt;span class="n"&gt;Before&lt;/span&gt; &lt;span class="o"&gt;---&lt;/span&gt;
&lt;span class="n"&gt;Hello&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;world&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;
&lt;span class="o"&gt;---&lt;/span&gt; &lt;span class="n"&gt;After&lt;/span&gt; &lt;span class="o"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@my_decorator&lt;/code&gt; line is just shorthand for &lt;code&gt;greet = my_decorator(greet)&lt;/code&gt;. Python replaces your function with the wrapped version automatically.&lt;br&gt;
To understand the concept better, let us take a real-world example of timing a function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;timer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;        &lt;span class="c1"&gt;# *args lets it work with ANY function
&lt;/span&gt;        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; took &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;

&lt;span class="nd"&gt;@timer&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;slow_task&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task done!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;slow_task&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Task&lt;/span&gt; &lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;
&lt;span class="n"&gt;slow_task&lt;/span&gt; &lt;span class="n"&gt;took&lt;/span&gt; &lt;span class="mf"&gt;1.0012&lt;/span&gt; &lt;span class="n"&gt;seconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Why decorators matter (especially in real projects)
&lt;/h4&gt;

&lt;p&gt;They're everywhere in Python. Common use cases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;@staticmethod&lt;/code&gt; / &lt;code&gt;@classmethod&lt;/code&gt;&lt;/strong&gt; — built into Python for class methods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;@app.route('/home')&lt;/code&gt;&lt;/strong&gt; — Flask/Django use them to define web routes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;@login_required&lt;/code&gt;&lt;/strong&gt; — Django uses this to protect pages behind authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging, caching, retrying failed requests&lt;/strong&gt; — all cleanly handled with decorators&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A decorator takes a function, adds behavior around it, and returns a new function without touching the original code.&lt;/p&gt;

&lt;h2&gt;
  
  
  How decorators work internally
&lt;/h2&gt;

&lt;p&gt;To understand decorators better, we will first need to understand a few core Python concepts:&lt;/p&gt;

&lt;h3&gt;
  
  
  Foundation: Functions are objects in Python
&lt;/h3&gt;

&lt;p&gt;In Python, functions aren't special, but they're just objects like integers or strings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;say_hello&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Pass a function as an argument
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;run_it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;say_hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# prints: Hello!
&lt;/span&gt;
&lt;span class="c1"&gt;# Assign a function to a variable
&lt;/span&gt;&lt;span class="n"&gt;my_func&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;say_hello&lt;/span&gt;
&lt;span class="nf"&gt;my_func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;           &lt;span class="c1"&gt;# prints: Hello!
&lt;/span&gt;
&lt;span class="c1"&gt;# Return a function from another function
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_greeter&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;say_hi&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;say_hi&lt;/span&gt;   &lt;span class="c1"&gt;# returning the function, not calling it
&lt;/span&gt;
&lt;span class="n"&gt;greeter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_greeter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;greeter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;           &lt;span class="c1"&gt;# prints: Hi!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the entire foundation that decorators are built on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why are decorators needed?
&lt;/h3&gt;

&lt;p&gt;Imagine there are many functions in a project, and each function needs logging.&lt;/p&gt;

&lt;h4&gt;
  
  
  Without decorators:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function ended&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function ended&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repeated code&lt;/li&gt;
&lt;li&gt;Hard to maintain in large projects&lt;/li&gt;
&lt;li&gt;If logging changes, every function must be updated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Decorators solve this problem by reusing common functionality.&lt;/p&gt;

&lt;h4&gt;
  
  
  With decorators:
&lt;/h4&gt;

&lt;p&gt;Using decorators, the repeated code (&lt;code&gt;"Function started"&lt;/code&gt; and &lt;code&gt;"Function ended"&lt;/code&gt;) can be moved into a single reusable decorator.&lt;br&gt;
Instead of writing the same lines inside every function, the decorator handles it automatically.&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 1: Create the decorator
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function ended&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 2: Apply the Decorator&lt;/strong&gt;
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@log_function&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;


&lt;span class="nd"&gt;@log_function&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Calling the Functions&lt;/strong&gt;
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Function&lt;/span&gt; &lt;span class="n"&gt;started&lt;/span&gt;
&lt;span class="n"&gt;Function&lt;/span&gt; &lt;span class="n"&gt;ended&lt;/span&gt;
&lt;span class="mi"&gt;5&lt;/span&gt;

&lt;span class="n"&gt;Function&lt;/span&gt; &lt;span class="n"&gt;started&lt;/span&gt;
&lt;span class="n"&gt;Function&lt;/span&gt; &lt;span class="n"&gt;ended&lt;/span&gt;
&lt;span class="mi"&gt;20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  What changed?
&lt;/h4&gt;

&lt;p&gt;The functions now only contain their main logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The extra behavior (logging) is handled by the decorator separately.&lt;/p&gt;

&lt;h3&gt;
  
  
  Visual understanding
&lt;/h3&gt;

&lt;p&gt;When this runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python internally does this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;add&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;log_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the actual flow becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function ended&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="err"&gt;└──&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Better Version Using *args and **kwargs
&lt;/h2&gt;

&lt;p&gt;The previous decorator only works for functions with two arguments.&lt;br&gt;
A more reusable decorator looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function ended&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now it works with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;any number of arguments&lt;/li&gt;
&lt;li&gt;positional arguments&lt;/li&gt;
&lt;li&gt;keyword arguments&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this is powerful
&lt;/h2&gt;

&lt;p&gt;Imagine 100 functions needing logging.&lt;/p&gt;

&lt;p&gt;Without decorators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated code everywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With decorators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;write logging once&lt;/li&gt;
&lt;li&gt;reuse everywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is one of the biggest reasons decorators are widely used in real-world Python projects and frameworks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://flask.palletsprojects.com/en/stable/" rel="noopener noreferrer"&gt;Flask&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.djangoproject.com/" rel="noopener noreferrer"&gt;Django&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fastapi.tiangolo.com/" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/pytorch-101-advanced" rel="noopener noreferrer"&gt;PyTorch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/introduction-to-tensorflow-build-ai-across-domains" rel="noopener noreferrer"&gt;TensorFlow&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common practical examples of Python decorators
&lt;/h2&gt;

&lt;p&gt;A few of the most common practical examples are listed here, from solo projects to production systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Timing and performance measurement
&lt;/h3&gt;

&lt;p&gt;Useful when profiling slow functions or benchmarking code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;timer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ran in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;

&lt;span class="nd"&gt;@timer&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;

&lt;span class="nf"&gt;process_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# process_data ran in 0.0312s
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;perf_counter()&lt;/code&gt; is preferred over &lt;code&gt;time.time()&lt;/code&gt; for short measurements, and it's higher resolution and is not affected by system clock adjustments.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Logging
&lt;/h3&gt;

&lt;p&gt;Instead of adding print statements everywhere, a logging decorator handles it in one place.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;

&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_calls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calling &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | args=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; kwargs=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; returned &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;

&lt;span class="nd"&gt;@log_calls&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

&lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# INFO: Calling multiply | args=(4, 5) kwargs={}
# INFO: multiply returned 20
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production, you'd swap &lt;code&gt;logging.info&lt;/code&gt; for a structured logger like &lt;code&gt;structlog&lt;/code&gt; or a cloud logging sink.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Retry on failure
&lt;/h3&gt;

&lt;p&gt;Critical for network calls, API requests, or anything that can fail transiently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attempt &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed after &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; attempts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;decorator&lt;/span&gt;

&lt;span class="nd"&gt;@retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;times&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com/data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Attempt 1 failed: Connection timeout
# Attempt 2 failed: Connection timeout
# Attempt 3 failed: Connection timeout
# Exception: fetch_data failed after 3 attempts
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice this is a &lt;strong&gt;decorator factory&lt;/strong&gt; — &lt;code&gt;retry(times=3)&lt;/code&gt; returns the actual decorator. This is how you pass arguments to decorators.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Caching memoization
&lt;/h3&gt;

&lt;p&gt;Avoids recomputing expensive results by storing previous outputs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;memoize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cache miss — computing for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cache hit for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;

&lt;span class="nd"&gt;@memoize&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Cache miss — computing for (6,)
# Cache miss — computing for (5,)
# ...
&lt;/span&gt;&lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Cache hit for (6,)   ← instantly returns stored result
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python actually ships a production-grade version of this built in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lru_cache&lt;/span&gt;

&lt;span class="nd"&gt;@lru_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;lru_cache&lt;/code&gt; (Least Recently Used) is thread-safe and evicts old entries when the cache is full — use it over a hand-rolled version in real projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Access control authorization
&lt;/h3&gt;

&lt;p&gt;A staple in web frameworks like Flask and Django.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;require_role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;PermissionError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Access denied. Required role: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;decorator&lt;/span&gt;

&lt;span class="nd"&gt;@require_role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deleting user &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;admin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Shaoni&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;guest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Guest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;viewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;delete_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;admin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# Deleting user 42
&lt;/span&gt;&lt;span class="nf"&gt;delete_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;guest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# PermissionError: Access denied. Required role: admin
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Django's &lt;code&gt;@login_required&lt;/code&gt; and &lt;code&gt;@permission_required&lt;/code&gt; follow this exact pattern internally.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Input validation
&lt;/h3&gt;

&lt;p&gt;Validate arguments before they even reach your function's logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_positive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;arg_positions&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;arg_positions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Argument at position &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; must be positive, got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;decorator&lt;/span&gt;

&lt;span class="nd"&gt;@validate_positive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_area&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;

&lt;span class="nf"&gt;calculate_area&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# 50
&lt;/span&gt;&lt;span class="nf"&gt;calculate_area&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# ValueError: Argument at position 0 must be positive
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;7. Rate Limiting&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Preventing a function from being called too frequently is very common in API clients.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calls_per_second&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;min_interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;calls_per_second&lt;/span&gt;
    &lt;span class="n"&gt;last_called&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;# mutable container to hold state in closure
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;last_called&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;wait&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;min_interval&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;wait&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate limit: waiting &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;last_called&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;decorator&lt;/span&gt;

&lt;span class="nd"&gt;@rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calls_per_second&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calling &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;call_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;call_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/posts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# Rate limit: waiting 0.49s
&lt;/span&gt;&lt;span class="nf"&gt;call_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/comments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Rate limit: waiting 0.49s
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quick reference
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decorator&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Real-world Equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@timer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Measure execution time&lt;/td&gt;
&lt;td&gt;Profiling, benchmarking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@log_calls&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Audit function calls&lt;/td&gt;
&lt;td&gt;Observability, debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@retry&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Handle transient failures&lt;/td&gt;
&lt;td&gt;API clients, DB connections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@lru_cache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cache expensive results&lt;/td&gt;
&lt;td&gt;ML inference, DB queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@require_role&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Guard endpoints by role&lt;/td&gt;
&lt;td&gt;Django, Flask auth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@validate_positive&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sanitize inputs early&lt;/td&gt;
&lt;td&gt;Data pipelines, APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@rate_limit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Throttle call frequency&lt;/td&gt;
&lt;td&gt;External API clients&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Real-world use cases in frameworks
&lt;/h2&gt;

&lt;p&gt;Decorators are heavily used in modern Python frameworks because they provide a clean and reusable way to add functionality to applications without modifying the core business logic.&lt;br&gt;
Frameworks such as Flask and Django use decorators for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Routing&lt;/li&gt;
&lt;li&gt;Authentication&lt;/li&gt;
&lt;li&gt;Authorization&lt;/li&gt;
&lt;li&gt;Caching&lt;/li&gt;
&lt;li&gt;Request validation&lt;/li&gt;
&lt;li&gt;Restricting HTTP methods&lt;/li&gt;
&lt;li&gt;Logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These decorators make applications cleaner, easier to maintain, and more readable.&lt;/p&gt;
&lt;h3&gt;
  
  
  Flask routing decorator
&lt;/h3&gt;

&lt;p&gt;One of the most common examples of decorators appears in Flask routing.&lt;br&gt;
Using Flask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;home&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Homepage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;is a decorator.&lt;br&gt;
It tells Flask:&lt;br&gt;
“When a user visits &lt;code&gt;/&lt;/code&gt;, execute the &lt;code&gt;home()&lt;/code&gt; function.”&lt;/p&gt;
&lt;h3&gt;
  
  
  Flask authentication decorator
&lt;/h3&gt;

&lt;p&gt;Decorators are also commonly used for authentication.&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/dashboard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@login_required&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dashboard&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dashboard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@login_required&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;checks whether the user is logged in before allowing access to the dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this is useful
&lt;/h3&gt;

&lt;p&gt;Without decorators, authentication checks would need to be repeated inside every protected function.&lt;br&gt;
Example without decorator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dashboard&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;logged_in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please log in&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dashboard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using decorators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;avoids repeated code&lt;/li&gt;
&lt;li&gt;keeps route definitions clean&lt;/li&gt;
&lt;li&gt;centralizes authentication logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This becomes extremely useful in large applications with many protected routes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Django authentication decorator
&lt;/h3&gt;

&lt;p&gt;Django also uses decorators extensively.&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.contrib.auth.decorators&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;login_required&lt;/span&gt;
&lt;span class="nd"&gt;@login_required&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dashboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;HttpResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Welcome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@login_required&lt;/code&gt; decorator ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;only authenticated users can access the view&lt;/li&gt;
&lt;li&gt;unauthorized users are redirected to the login page&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Reusable security checks&lt;/li&gt;
&lt;li&gt;Cleaner view functions&lt;/li&gt;
&lt;li&gt;Better maintainability&lt;/li&gt;
&lt;li&gt;Centralized authentication handling&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Django HTTP method restriction
&lt;/h3&gt;

&lt;p&gt;Django provides decorators to restrict HTTP request methods.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.views.decorators.http&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;require_POST&lt;/span&gt;
&lt;span class="nd"&gt;@require_POST&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;HttpResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Submitted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The decorator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@require_POST&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ensures the function only accepts POST requests.&lt;br&gt;
If a GET request is sent, Django automatically returns an error.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why this matters
&lt;/h3&gt;

&lt;p&gt;This helps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enforce API rules&lt;/li&gt;
&lt;li&gt;improve security&lt;/li&gt;
&lt;li&gt;prevent invalid request types&lt;/li&gt;
&lt;li&gt;simplify validation logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without decorators, manual checks would be needed inside every function.&lt;/p&gt;
&lt;h3&gt;
  
  
  Django caching decorator
&lt;/h3&gt;

&lt;p&gt;Decorators are also used for performance optimization.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.views.decorators.cache&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cache_page&lt;/span&gt;
&lt;span class="nd"&gt;@cache_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;HttpResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cached&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@cache_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;stores the response for 60 seconds.&lt;br&gt;
If another user requests the same page during that time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Django serves the cached version&lt;/li&gt;
&lt;li&gt;the function does not run again&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Advanced decorator concepts
&lt;/h2&gt;

&lt;p&gt;Once the basic concepts are understood, the next step is to learn how decorators are implemented in production-grade Python applications. Advanced decorator patterns solve practical problems such as preserving function metadata, creating configurable decorators, and combining multiple decorators together.&lt;/p&gt;

&lt;p&gt;These concepts are widely used in frameworks, libraries, and enterprise-level Python applications.&lt;/p&gt;
&lt;h3&gt;
  
  
  Preserving function metadata with &lt;code&gt;functools.wraps&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;One common issue with decorators is that they replace the original function with the wrapper function. As a result, important metadata such as the function name, documentation string, annotations, and debugging information may be lost.&lt;/p&gt;

&lt;p&gt;Consider the following decorator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@decorator&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;This function greets the user&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now checking the function name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;wrapper&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of returning &lt;code&gt;"greet"&lt;/code&gt;, Python returns &lt;code&gt;"wrapper"&lt;/code&gt; because the original metadata has been overridden by the wrapper function.&lt;br&gt;
This creates problems for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;debugging&lt;/li&gt;
&lt;li&gt;logging&lt;/li&gt;
&lt;li&gt;API documentation&lt;/li&gt;
&lt;li&gt;introspection&lt;/li&gt;
&lt;li&gt;testing frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To solve this problem, Python provides &lt;code&gt;functools.wraps&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Using &lt;code&gt;functools.wraps&lt;/code&gt;
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

   &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Using it again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@decorator&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;This function greets the user&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;greet&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@wraps(func)&lt;/code&gt; decorator copies the original function metadata into the wrapper function. This is considered a best practice when writing decorators in production applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decorators with arguments
&lt;/h3&gt;

&lt;p&gt;In many real-world scenarios, decorators need configuration values. This requires creating decorators that accept arguments.&lt;br&gt;
A decorator with arguments introduces an additional level of nesting.&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

       &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

           &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
               &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;

   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;decorator&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Hello&lt;/span&gt;
&lt;span class="n"&gt;Hello&lt;/span&gt;
&lt;span class="n"&gt;Hello&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Understanding the structure
&lt;/h2&gt;

&lt;p&gt;This example contains three functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;        &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;accepts&lt;/span&gt; &lt;span class="n"&gt;decorator&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;
&lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;     &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;accepts&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;
&lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;       &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;executes&lt;/span&gt; &lt;span class="n"&gt;additional&lt;/span&gt; &lt;span class="n"&gt;logic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution flow becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;greet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is heavily used in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retry mechanisms&lt;/li&gt;
&lt;li&gt;caching systems&lt;/li&gt;
&lt;li&gt;rate limiting&lt;/li&gt;
&lt;li&gt;authorization frameworks&lt;/li&gt;
&lt;li&gt;logging systems&lt;/li&gt;
&lt;li&gt;timeout handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, a retry decorator may accept the number of retries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A caching decorator may accept an expiration time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expire&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Decorator arguments make decorators significantly more flexible and reusable.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Chaining Multiple Decorators&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Python allows multiple decorators to be applied to the same function.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@decorator_one&lt;/span&gt;
&lt;span class="nd"&gt;@decorator_two&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is internally interpreted as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decorator_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;decorator_two&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The execution order is important.&lt;/p&gt;

&lt;p&gt;Python applies decorators from bottom to top:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;decorator_two&lt;/code&gt; wraps the function first&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;decorator_one&lt;/code&gt; wraps the result next&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Example of chained decorators
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Decorator One - Before&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

       &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Decorator One - After&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator_two&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Decorator Two - Before&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

       &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Decorator Two - After&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Applying both decorators:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@decorator_one&lt;/span&gt;
&lt;span class="nd"&gt;@decorator_two&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Decorator&lt;/span&gt; &lt;span class="n"&gt;One&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Before&lt;/span&gt;
&lt;span class="n"&gt;Decorator&lt;/span&gt; &lt;span class="n"&gt;Two&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Before&lt;/span&gt;
&lt;span class="n"&gt;Hello&lt;/span&gt;
&lt;span class="n"&gt;Decorator&lt;/span&gt; &lt;span class="n"&gt;Two&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;After&lt;/span&gt;
&lt;span class="n"&gt;Decorator&lt;/span&gt; &lt;span class="n"&gt;One&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;After&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding the execution flow
&lt;/h3&gt;

&lt;p&gt;The function call stack becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;decorator_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="nf"&gt;decorator_two&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;greet&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates nested execution layers where each decorator adds behavior before and after the wrapped function. Decorator chaining is extensively used in frameworks. For example, a web route may simultaneously use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;authentication&lt;/li&gt;
&lt;li&gt;caching&lt;/li&gt;
&lt;li&gt;rate limiting&lt;/li&gt;
&lt;li&gt;logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/dashboard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@login_required&lt;/span&gt;
&lt;span class="nd"&gt;@cache_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dashboard&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dashboard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each decorator contributes a separate layer of functionality while keeping the core business logic clean and isolated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Python decorators provide a clean and powerful way to add extra functionality to functions without modifying the original code. They help reduce code duplication, improve reusability, and make applications easier to maintain.&lt;/p&gt;

&lt;p&gt;From simple logging examples to advanced use cases in frameworks like Flask and Django, decorators play an important role in modern Python development. Understanding how decorators work helps in writing cleaner, more scalable, and more professional Python code.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
      <category>learning</category>
    </item>
    <item>
      <title>NVIDIA B300 Blackwell Ultra: A Technical Deep Dive</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Thu, 07 May 2026 23:53:39 +0000</pubDate>
      <link>https://dev.to/digitalocean/nvidia-b300-blackwell-ultra-a-technical-deep-dive-5c6i</link>
      <guid>https://dev.to/digitalocean/nvidia-b300-blackwell-ultra-a-technical-deep-dive-5c6i</guid>
      <description>&lt;p&gt;The NVIDIA B300 (Blackwell Ultra) is NVIDIA's latest data center GPU, built for AI training and inference. In this deep dive, we break down the full architecture, from its dual-die design and 5th-generation tensor cores to NVFP4 precision and NVLink 5 scaling.        &lt;/p&gt;

&lt;p&gt;What we cover:&lt;br&gt;
  &lt;a href="https://www.youtube.com/watch?v=Kf_3n_pxa0I" rel="noopener noreferrer"&gt;00:00&lt;/a&gt; - Introduction&lt;br&gt;
  &lt;a href="https://www.youtube.com/watch?v=Kf_3n_pxa0I&amp;amp;t=56s" rel="noopener noreferrer"&gt;00:56&lt;/a&gt; - Why the B300 exists&lt;br&gt;
  &lt;a href="https://www.youtube.com/watch?v=Kf_3n_pxa0I&amp;amp;t=145s" rel="noopener noreferrer"&gt;02:25&lt;/a&gt; - B300 vs B200 vs H100 — the numbers&lt;br&gt;
  &lt;a href="https://www.youtube.com/watch?v=Kf_3n_pxa0I&amp;amp;t=239s" rel="noopener noreferrer"&gt;03:59&lt;/a&gt; - Dual-reticle design &amp;amp; NV-HBI interconnect&lt;br&gt;
  &lt;a href="https://www.youtube.com/watch?v=Kf_3n_pxa0I&amp;amp;t=304s" rel="noopener noreferrer"&gt;05:04&lt;/a&gt; - 5th-gen tensor cores &amp;amp; NVFP4&lt;br&gt;
  &lt;a href="https://www.youtube.com/watch?v=Kf_3n_pxa0I&amp;amp;t=476s" rel="noopener noreferrer"&gt;07:56&lt;/a&gt; - 288GB HBM3e memory breakdown&lt;br&gt;
  &lt;a href="https://www.youtube.com/watch?v=Kf_3n_pxa0I&amp;amp;t=544s" rel="noopener noreferrer"&gt;09:04&lt;/a&gt; - Multi-GPU &amp;amp; NVLink 5 architecture&lt;br&gt;
  &lt;a href="https://youtu.be/watch?v=Kf_3n_pxa0I&amp;amp;t=578s" rel="noopener noreferrer"&gt;10:38&lt;/a&gt; - Performance &amp;amp; efficiency summary&lt;/p&gt;

</description>
      <category>nvidia</category>
      <category>ai</category>
      <category>gpu</category>
      <category>hardware</category>
    </item>
    <item>
      <title>Video Demo: How Does Model Compression Change AI Reasoning?</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Thu, 30 Apr 2026 16:00:00 +0000</pubDate>
      <link>https://dev.to/digitalocean/video-demo-how-does-model-compression-change-ai-reasoning-3jdk</link>
      <guid>https://dev.to/digitalocean/video-demo-how-does-model-compression-change-ai-reasoning-3jdk</guid>
      <description>&lt;p&gt;In this video, I benchmark Mistral-7B-Instruct-v0.2 on an NVIDIA H200 DigitalOcean GPU in three formats: FP16, INT8, and 4-bit AWQ — and test how precision impacts reasoning quality, speed, VRAM usage, and real serving density.&lt;/p&gt;

&lt;p&gt;We’ll cover:&lt;br&gt;
👉 What quantization actually does to model weights&lt;br&gt;
👉 Where reasoning starts breaking down (FP16 → INT8 → 4-bit)&lt;br&gt;
👉 Why memory savings don’t always reduce total GPU usage in vLLM&lt;br&gt;
👉 Tokens/sec vs aggregate throughput&lt;br&gt;
👉 When 4-bit wins — and when it doesn’t&lt;/p&gt;

&lt;p&gt;If you're building AI systems and deciding between full precision and aggressive quantization, this is a practical infrastructure-level breakdown of the real tradeoffs.&lt;/p&gt;

&lt;p&gt;Chapters:&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=50YBZvdwMWM" rel="noopener noreferrer"&gt;00:00&lt;/a&gt; Introduction&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=50YBZvdwMWM&amp;amp;t=41s" rel="noopener noreferrer"&gt;00:41&lt;/a&gt; Understanding how quantization works&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=50YBZvdwMWM&amp;amp;t=102s" rel="noopener noreferrer"&gt;01:42&lt;/a&gt; Why do you even need quantization&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=50YBZvdwMWM&amp;amp;t=158s" rel="noopener noreferrer"&gt;02:38&lt;/a&gt; The experiment we ran&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=50YBZvdwMWM&amp;amp;t=236s" rel="noopener noreferrer"&gt;03:56&lt;/a&gt; The observations we had&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=50YBZvdwMWM&amp;amp;t=343s" rel="noopener noreferrer"&gt;05:43&lt;/a&gt; Overall learnings&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nvidia</category>
      <category>tutorial</category>
      <category>models</category>
    </item>
    <item>
      <title>Tutorial: Build Long-Term Memory in AI Agents with LangGraph and Mem0</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:41:16 +0000</pubDate>
      <link>https://dev.to/digitalocean/tutorial-build-long-term-memory-in-ai-agents-with-langgraph-and-mem0-2ln1</link>
      <guid>https://dev.to/digitalocean/tutorial-build-long-term-memory-in-ai-agents-with-langgraph-and-mem0-2ln1</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was written by Adrian Payong (AI Consultant and Technical Writer) and edited by Shaoni Mukherjee (AI Technical Writer, DigitalOcean)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent Memory Enhances Agents:&lt;/strong&gt; LangGraph agents will persist memory between conversations that you can use to customize your interactions from session to session. Agents will remember who you are and learn about you over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory vs Context Window:&lt;/strong&gt; Context window provides short-term contextual memory that expires at the end of the session. Long-term memory (Mem0) stores user-specific facts persistently. &lt;a href="https://www.digitalocean.com/community/tutorials/production-ready-rag-pipelines-haystack-langchain" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; augments both short-term and long-term memory by retrieving external knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph Structure:&lt;/strong&gt; LangGraph's graph structure makes adding memory nodes straightforward. Define a State with &lt;em&gt;mem0_user_id&lt;/em&gt; and build your chatbot node to perform a search/index of memories, then add that memory each turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mem0 Capabilities:&lt;/strong&gt; Mem0 allows extracting semantic memory and offers flexible persistent storage. It’s compatible with any LLM and enables you to define your own memory functionality, unlike closed systems like OpenAI Memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory System Design:&lt;/strong&gt; Use semantic search to retrieve facts, filter or consolidate memories to avoid duplicates, and balance detail vs summary for efficiency. Choosing the right vector DB and indexing strategy is crucial.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production Concerns:&lt;/strong&gt; Plan for privacy, retention policies, and scalability. Memory greatly reduces token usage and improves response relevance, but adds a layer of storage and computation.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Traditional &lt;a href="https://www.digitalocean.com/community/conceptual-articles/rag-ai-agents-agentic-rag-comparative-analysis" rel="noopener noreferrer"&gt;AI agents&lt;/a&gt; use short-term context (aka the current conversation window) and often &lt;strong&gt;forget&lt;/strong&gt; previous sessions after a chat ends. But what if we could give agents long-term memory? Building agents with memories of user preferences, facts, and history allows us to build more personalized and capable agents. This can be done by combining &lt;a href="https://www.digitalocean.com/community/tutorials/getting-started-agentic-ai-langgraph" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; – a stateful graph-based agent framework – with Mem0, a purpose-built memory layer. Using memories, an LLM agent can “remember” past information and leverage it.&lt;/p&gt;

&lt;p&gt;When combining LangGraph with &lt;a href="https://arxiv.org/abs/2504.19413" rel="noopener noreferrer"&gt;Mem0&lt;/a&gt;, you get context-aware agents. Since Mem0 will store and retrieve memories, each new session with LangGraph can add a summary of relevant previous interactions to the prompt. This allows building agents that can have longer, more personal, coherent conversations with users over time. In this article, we cover the main types of memory, walk through the LangGraph+Mem0 workflow, provide code examples, compare different memory strategies (rag vs memory), and discuss things to consider at scale (&lt;a href="https://www.digitalocean.com/resources/articles/what-are-vector-databases" rel="noopener noreferrer"&gt;vector DBs&lt;/a&gt;, privacy, cost).&lt;/p&gt;

&lt;h2&gt;
  
  
  AI memory: Short-term vs retrieval vs long-term
&lt;/h2&gt;

&lt;p&gt;AI agents use different &lt;em&gt;memory types&lt;/em&gt; depending on scope:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Short-term (Session) Memory:&lt;/strong&gt; Also known as window memory, this refers to your current chat history in a single conversation thread. This thread-scoped state is automatically handled by LangGraph. However, after the conversation ends, that window is closed. If you ask your agent to “list my previously saved documents”, it can only recall documents you’ve provided during that same chat session. When operating directly on raw chat history (past messages), you’re limited by the LLM context window, which causes prompt bloat and higher costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval Memory (RAG):&lt;/strong&gt; This refers to the process of retrieving information from external sources, such as documents or a database. Retrieval-Augmented Generation pipelines leverage a &lt;a href="https://www.digitalocean.com/resources/articles/what-are-vector-databases" rel="noopener noreferrer"&gt;vector database&lt;/a&gt; to dynamically retrieve related information based on the user’s current query. You can think of RAG as your agent “reading” external documents each time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term (Persistent) Memory:&lt;/strong&gt; This is a stable, user-specific memory that persists across sessions. Long-term memory allows you to store distilled facts, preferences, and experiences about the user that can be recalled in later conversations. Unlike RAG, which only brings in generic info, long-term memory stores personalized context about the user.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, short-term memory handles the current conversation, RAG augments with external data, and long-term memory (Mem0) provides a continuity of user-specific context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of LangGraph
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/local-ai-agents-with-langgraph-and-ollama" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; is a framework for building stateful graph-based agents. Instead of a linear chain, a LangGraph lets you construct nodes and edges that represent your agent's workflow. Nodes handle small pieces of functionality, such as calling an LLM, performing calculations, or retrieving data from memory, and then return their updated state. Edges are conditionally executed based on the current state and are responsible for routing flow between nodes. There is a central &lt;a href="https://docs.langchain.com/oss/python/langgraph/graph-api" rel="noopener noreferrer"&gt;StateGraph&lt;/a&gt; object that maintains the agent's shared state throughout the workflow. Key points about LangGraph:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmjzojdylokj36m5dk4wg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmjzojdylokj36m5dk4wg.png" alt="image" width="800" height="565"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;State Management:&lt;/strong&gt; LangGraph maintains conversation state in a State object, which flows through nodes. This contains all message history as well as any metadata you want to associate with the user. You can persist state across nodes via checkpointing, but by default, it’s only retained within a single session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conditional Edges:&lt;/strong&gt; Edges can be conditioned, so instead of simply chaining nodes, a LangGraph can branch or even loop. For example, you can route to different tools based on user intent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible:&lt;/strong&gt; You want to use a different LLM provider? (OpenAI? Anthropic? Google? ...) You can do it!. It is designed with production in mind. Supports streaming, error handling, and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session Scope:&lt;/strong&gt; By default, if you build a LangGraph agent, it will only have access to the context of the current session. Once the chat “ends,”  the state is cleared unless you store it externally.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Mem0 provides
&lt;/h2&gt;

&lt;p&gt;Mem0 is a persistent memory solution for &lt;a href="https://www.digitalocean.com/resources/articles/types-of-ai-agents" rel="noopener noreferrer"&gt;AI agents.&lt;/a&gt; Think of it as a semantic memory layer: Mem0 extracts, stores, and retrieves information from conversations &amp;amp; facts you tell it about your users. Mem0 is not an LLM. It is a database + search layer built specifically for “AI memory”. Key features of Mem0 include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Memory:&lt;/strong&gt; Mem0 extracts only the factual knowledge from each raw chat message and stores it in short memory phrases. Ex: “I love pizza” → Stored memory “Loves pizza”. This helps keep the overall memory size small.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Level Memory:&lt;/strong&gt; Mem0 has several levels of namespaces you can define (user-level, session-level, agent-level). You can isolate each user’s memories or share global agent facts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Retrieval:&lt;/strong&gt; Given a query (ex, the user’s latest message), Mem0 will search via vector similarity and return the most relevant stored memories. It scopes by default to a user ID, so you only access that user’s stored history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible Storage:&lt;/strong&gt; Connect mem0 to any storage backend. Use &lt;a href="https://www.digitalocean.com/community/tutorials/how-and-when-to-use-sqlite" rel="noopener noreferrer"&gt;SQLite&lt;/a&gt; for local testing, or connect it to vector databases like Qdrant, Pinecone, Weaviate, and more. In the cloud version, Mem0 manages this for you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open Source + Cloud:&lt;/strong&gt; There’s an open-source client library for self-hosting, and a cloud platform ( app.mem0.ai ) for easy setup.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Integration architecture
&lt;/h2&gt;

&lt;p&gt;Bringing it all together, the integration follows a clear flow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtb4mky9or7qxr83nogd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtb4mky9or7qxr83nogd.png" alt="image" width="800" height="977"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Message reception&lt;/strong&gt; – your agent gets a user message through the LangGraph node (e.g., chatbot).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory search&lt;/strong&gt; –  The node calls &lt;em&gt;mem0.search()&lt;/em&gt;, providing the latest user message and their &lt;em&gt;userId&lt;/em&gt;. This returns a list of memories likely to contain relevant memories, ranked by vector similarity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context construction&lt;/strong&gt; – the memory list is formatted into a human‑readable context string, which is prepended to the system prompt. This allows the LLM to be "aware" of past messages when formulating its response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM invocation&lt;/strong&gt; – the agent feeds the system message and conversation history into the LLM (ChatOpenAI, or other provider). The response includes the current user input along with any memories supplied.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory update&lt;/strong&gt; –  once the response has been sent to the user, the agent calls &lt;em&gt;mem0.add()&lt;/em&gt; asynchronously to store the interaction (user message and assistant response) for later retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LangGraph maintains state across iterations, and Mem0 persists long‑term storage. Below is a code sketch example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chatbot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mem0_user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# 1. Retrieve relevant memories with user filter
&lt;/span&gt;        &lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mem0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;memory_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
        &lt;span class="c1"&gt;# 2. Build context string
&lt;/span&gt;        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Relevant information from previous conversations:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;memory_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="c1"&gt;# 3. Prepend system message
&lt;/span&gt;        &lt;span class="n"&gt;system_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            You are a helpful assistant. Use the provided context to personalize your response.
            &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;full_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;
        &lt;span class="c1"&gt;# 4. Generate response
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# 5. Store interaction with explicit user_id
&lt;/span&gt;        &lt;span class="n"&gt;interaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;mem0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Fallback without memory
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Memory extraction, filtering, and summarization strategies
&lt;/h2&gt;

&lt;p&gt;This diagram illustrates conceptual memory architecture at a high level for AI applications. Reliable persistent memory is built through three controls: defining what should be stored, specifying how memory should be updated over time, and filtering writes to preserve accuracy and usefulness.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfmwlt0dc9hjwm2hy2l1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfmwlt0dc9hjwm2hy2l1.png" alt="image" width="800" height="716"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, define what counts as memory. Mem0’s framework for writing &lt;a href="https://docs.mem0.ai/open-source/features/custom-fact-extraction-prompt#see-it-in-action" rel="noopener noreferrer"&gt;custom fact extraction prompts&lt;/a&gt; encourages you to clearly define exactly what facts should be stored. This is valuable if you want order numbers, preferences, support history, or task constraints written to persistent memory, but don’t want casual small talk entering long-term storage. The &lt;a href="https://docs.mem0.ai/platform/features/v2-memory-filters" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; clearly explains how broad prompts lead to noisy memory.&lt;/p&gt;

&lt;p&gt;Second, define how memory changes over time. Mem0 also provides a configurable &lt;a href="https://docs.mem0.ai/open-source/features/custom-fact-extraction-prompt" rel="noopener noreferrer"&gt;&lt;em&gt;custom_update_memory_prompt&lt;/em&gt;&lt;/a&gt; instructing the LLM to choose among ADD, UPDATE, DELETE, or NONE actions when new facts must be reconciled with existing memory. Without this level of instruction, when users correct themselves, change preferences, or revoke earlier instructions, the system will simply layer stale facts on top of each other indefinitely.&lt;/p&gt;

&lt;p&gt;Third, &lt;a href="https://docs.mem0.ai/cookbooks/essentials/controlling-memory-ingestion" rel="noopener noreferrer"&gt;control ingestion&lt;/a&gt; quality. Uncontrolled writing can store speculation as fact. For example, if an AI assistant stores every user message without filtering, temporary questions, misunderstandings, or incomplete information may become permanent memory entries. This can lead to incorrect assumptions in future interactions. A healthy production practice is to store only verified facts and important preferences in real time, while processing less critical conversational data asynchronously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade‑offs between memory approaches
&lt;/h2&gt;

&lt;p&gt;Integrating long‑term memory into an agent introduces trade‑offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storage vs latency&lt;/strong&gt; – storing full conversations allows perfect recall, but comes at the cost of higher storage requirements and latency when retrieving memories. Summarization can reduce storage and increase retrieval at the expense of precision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy vs personalization&lt;/strong&gt; –  memory solutions must protect user privacy. Mem0 isolates memories by user ID by scoping them, but you should also consider applying data retention policies and allowing users to delete memories via the API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy vs cost&lt;/strong&gt; – retrieving too many memories can confuse the LLM, while retrieving too few may leave out critical information. You’ll need to tune &lt;em&gt;max_memories&lt;/em&gt; and the relevance threshold for your use case.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database choice&lt;/strong&gt; – a vector database like &lt;a href="https://supabase.com/docs/guides/database/extensions/pgvector" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt;, &lt;a href="https://www.digitalocean.com/community/conceptual-articles/a-dive-into-vector-databases#introduction-to-pinecone" rel="noopener noreferrer"&gt;Pinecone&lt;/a&gt;, or &lt;a href="https://www.digitalocean.com/community/conceptual-articles/a-dive-into-vector-databases#weaviate-an-open-source-vector-database" rel="noopener noreferrer"&gt;Weaviate&lt;/a&gt;, differs in scalability and cost. Mem0 ships with pgvector in its reference implementation, but you can replace it with a different backend or managed service if you prefer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these trade‑offs will help you design a memory system that balances performance, cost, and user experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  A step-by-step overview of the Mem0–LangGraph integration
&lt;/h2&gt;

&lt;p&gt;Here's a quick-start guide to connect Mem0 to LangGraph. This is a summary of the official documentation with some tips on how to optimize it.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Install dependencies
&lt;/h3&gt;

&lt;p&gt;Install the required libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;langgraph&lt;/span&gt; &lt;span class="n"&gt;langchain&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="n"&gt;mem0ai&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;dotenv&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a &lt;em&gt;.env&lt;/em&gt; file with your API keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sk&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;
&lt;span class="n"&gt;MEM0_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mem0&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set the embedding provider, model, and dimensions based on your preference.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Initialize LangGraph and Mem0
&lt;/h3&gt;

&lt;p&gt;Create a &lt;em&gt;State&lt;/em&gt; class that holds the conversation messages and a user ID. Initialize &lt;em&gt;StateGraph&lt;/em&gt;  and define the &lt;em&gt;chatbot&lt;/em&gt; node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;START&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph.message&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mem0&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemoryClient&lt;/span&gt;
&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;State&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;HumanMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;mem0_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mem0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# No API key needed for local/serverless mode
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Imports packages necessary for state management, messages, OpenAI chat, and Mem0 memory.&lt;/li&gt;
&lt;li&gt;Loads environment variables from .env.&lt;/li&gt;
&lt;li&gt;Initializes a &lt;em&gt;State&lt;/em&gt; object with conversation history and a Mem0 user ID.&lt;/li&gt;
&lt;li&gt;Initializes a GPT-4o chat model and a Mem0 client.&lt;/li&gt;
&lt;li&gt;Creates a LangGraph state graph, which will be used to build the agent workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You will define th*e chatbot* function as shown earlier to search for memories, build context, generate a response, and store the interaction.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Build the conversation graph
&lt;/h3&gt;

&lt;p&gt;Add the &lt;em&gt;chatbot&lt;/em&gt; node and edges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chatbot&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;compiled_graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above code builds a basic LangGraph workflow that has the &lt;em&gt;chatbot&lt;/em&gt; node set as the starting execution point. It specifies the &lt;em&gt;chatbot&lt;/em&gt; function as the primary step to run and then loops back to itself for each turn of conversation; finally &lt;em&gt;graph.compile()&lt;/em&gt; translates that graph definition into an executable app.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Create a conversation runner
&lt;/h3&gt;

&lt;p&gt;Write a &lt;em&gt;run_connversation&lt;/em&gt; function that streams events from the compiled graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mem0_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;mem0_user_id&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
   &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mem0_user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;mem0_user_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;compiled_graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;last_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
       &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
           &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;span class="c1"&gt;# Main interaction loop
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
   &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter your user ID: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chatbot ready! Type &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to exit.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;You: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="k"&gt;break&lt;/span&gt;
       &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bot: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code executes the chatbot, passing the user's message, assembling the root conversation state, and streaming through the compiled LangGraph workflow to receive the AI's response. The &lt;em&gt;main()&lt;/em&gt; function creates a basic command-line chat loop, prompting the user for input and displaying the bot's response until the user types to quit.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Deploy and monitor
&lt;/h3&gt;

&lt;p&gt;Deploy the agent in your preferred environment. Store memories in a vector database (&lt;a href="https://cloud.google.com/discover/what-is-pgvector?hl=en" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt;, Pinecone, Weaviate, etc). Keep track of memory growth. Adjust cleanup frequencies. Tune retrieval settings to balance personalization, relevance, and system performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production considerations
&lt;/h2&gt;

&lt;p&gt;There are a few things you may want to think about when running a LangGraph+Mem0 agent in production:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Topic&lt;/th&gt;
&lt;th&gt;Main idea&lt;/th&gt;
&lt;th&gt;Practical notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vector Database&lt;/td&gt;
&lt;td&gt;Mem0 uses SQLite by default for quick testing, but production systems usually need a vector database.&lt;/td&gt;
&lt;td&gt;Ensure the database has an index on &lt;code&gt;user_id&lt;/code&gt;. Managed options such as Mem0 Cloud can handle this, while self-hosting is also possible. The database choice, such as Qdrant or Pinecone, affects cost, speed, and available features.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Privacy &amp;amp; Retention&lt;/td&gt;
&lt;td&gt;Memory systems store user data, so privacy and retention must be handled carefully.&lt;/td&gt;
&lt;td&gt;Encrypt sensitive fields when needed, remove memories after a defined period, and obtain user consent before storing personal data. Mem0 APIs can help delete or export data. &lt;a href="https://www.digitalocean.com/products/vpc" rel="noopener noreferrer"&gt;DigitalOcean VPC&lt;/a&gt;  can improve protection for the vector store.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost &amp;amp; Performance&lt;/td&gt;
&lt;td&gt;Adding memory lowers LLM token usage because prompts stay smaller, but it introduces database lookups.&lt;/td&gt;
&lt;td&gt;Semantic search is usually very fast and can be batched. &lt;a href="https://arxiv.org/abs/2504.19413?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Mem0 reports&lt;/a&gt; about 90% token savings and 91% lower p95 latency versus a full-context method. Benchmark your own LLM setup to confirm latency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;The memory database and LangGraph state should be designed for fault tolerance.&lt;/td&gt;
&lt;td&gt;Use LangGraph checkpoints to recover from crashes and maintain backups for memory storage. As the vector database grows, monitor usage and plan for scaling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;The Mem0 API key and database must be protected.&lt;/td&gt;
&lt;td&gt;Restrict write access so only the agent can modify memory. In multi-agent or multi-tenant systems, isolate namespaces to improve security and separation.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Pairing LangGraph with Mem0 is one potential path towards transitioning from session-based agents to agents with persistent, long-lived memory scoped to individual users. LangGraph offers structured orchestration and short-lived conversation state management, while Mem0 enables persistent semantic memories that can be retrieved across sessions to increase continuity, personalization, and relevance. Carefully architected (e.g., selective extraction and retention, privacy controls, retrieval settings, etc.), this combined approach enables developers to create more powerful agents that remain efficient at scale, without relying on inflated chat history or generic document retrieval.&lt;/p&gt;

&lt;p&gt;In addition to local examples, a production-ready memory architecture also requires deployment infrastructure. &lt;a href="https://www.digitalocean.com/blog/digitalocean-gradient-ai-platform-langchain-integration" rel="noopener noreferrer"&gt;DigitalOcean's Langchain gradient integration&lt;/a&gt; allows connecting LangChain-powered workflows to the Gradient AI Platform. This provides developers with access to various models using GPU-accelerated serverless inference with a path to scale AI apps beyond the prototype phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.mem0.ai/integrations/langgraph" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.langchain.com/oss/python/langgraph/overview" rel="noopener noreferrer"&gt;LangGraph overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.langchain.com/oss/python/langgraph/graph-api" rel="noopener noreferrer"&gt;Graph API overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.mem0.ai/open-source/features/custom-fact-extraction-prompt#see-it-in-action" rel="noopener noreferrer"&gt;Custom Fact Extraction Prompt&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2504.19413?utm_source=chatgpt.comhttps://arxiv.org/abs/2504.19413?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>langgraph</category>
      <category>tutorial</category>
      <category>agents</category>
    </item>
    <item>
      <title>Building an LLM Tool Calling Workflow with DigitalOcean and Connected Databases</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Thu, 23 Apr 2026 17:50:49 +0000</pubDate>
      <link>https://dev.to/digitalocean/building-an-llm-tool-calling-workflow-with-digitalocean-and-connected-databases-12op</link>
      <guid>https://dev.to/digitalocean/building-an-llm-tool-calling-workflow-with-digitalocean-and-connected-databases-12op</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by  Shamim Raashid (Senior Solutions Architect) and Anish Singh Walia (Senior Technical Content Strategist)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/solutions/llm-applications" rel="noopener noreferrer"&gt;Intent-driven data interfaces&lt;/a&gt; give users flexible access to data through natural language, while your application keeps strict control over queries.&lt;/li&gt;
&lt;li&gt;The guardrail pattern places the AI system behind a strict tool menu so your backend owns every query and enforces permissions on DigitalOcean Managed Databases.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/community/conceptual-articles/integrate-gen-ai-agents" rel="noopener noreferrer"&gt;Gradient™ AI Platform Agents&lt;/a&gt; handle routing and memory, while &lt;a href="https://docs.digitalocean.com/products/functions/" rel="noopener noreferrer"&gt;DigitalOcean Functions&lt;/a&gt; and &lt;a href="https://www.digitalocean.com/resources/articles/serverless-inference" rel="noopener noreferrer"&gt;Serverless Inference&lt;/a&gt; handle secure execution and orchestration.&lt;/li&gt;
&lt;li&gt;Serverless Inference with local tools keeps database credentials in your environment and lets your existing backend own all validation and logging.&lt;/li&gt;
&lt;li&gt;This pattern scales across departments by adding new tools instead of exposing raw database access or writing new endpoints for every question.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Modern applications are undergoing a massive shift. End-users and customers no longer want to hunt through complex navigation menus or rely on rigid, predefined UI buttons to find what they need. They expect conversational, ad-hoc access to their data, asking questions like, "Where is my order from last Tuesday?" or "How does my usage this month compare to last year?" Historically, bridging this gap meant trapping users in a bottleneck, waiting for product teams to design, code, and deploy new UI features for every single unanticipated question.&lt;/p&gt;

&lt;p&gt;The naive AI solution to this bottleneck is "Text-to-SQL": handing an LLM your database schema and letting it translate user questions directly into queries. While this might be acceptable for internal, trusted analysts, it is a security nightmare for untrusted end-users and customers. It exposes your production systems to prompt injection (jailbreaking), hallucinated table names, and potential data leaks.&lt;/p&gt;

&lt;p&gt;We need a secure middle ground. We need a system that offers the infinite flexibility of natural language without ever letting the AI directly access the database.&lt;/p&gt;

&lt;p&gt;This blueprint outlines a modern architectural pattern using &lt;a href="https://www.digitalocean.com/products/managed-databases" rel="noopener noreferrer"&gt;DigitalOcean Managed Databases&lt;/a&gt; and &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/" rel="noopener noreferrer"&gt;Gradient™ AI Platform&lt;/a&gt; to achieve exactly that. By shifting from direct query generation to Intent-Driven Function Routing (Tool Calling), the AI acts purely as an intelligent dispatcher. It safely brokers flexible, unanticipated data access for untrusted users, protecting your infrastructure while delivering a frictionless user experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Guardrail Pattern: Why Tool-Calling Outperforms Text-to-SQL
&lt;/h2&gt;

&lt;p&gt;The naive approach to building natural-language data interfaces is "Text-to-SQL", giving an LLM your database schema and asking it to write queries based on user prompts. While this might be acceptable for internal, trusted data analysts, for customer-facing applications, it is a security nightmare.&lt;/p&gt;

&lt;p&gt;Exposing your schema to untrusted users opens your system to prompt injection, hallucinations (the AI inventing columns that don't exist), and severe data leaks if a malicious user tricks the AI into querying another tenant's data or dropping tables. To solve this, modern applications use the Guardrail Pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Securing the Perimeter: The AI as an Intelligent Dispatcher
&lt;/h3&gt;

&lt;p&gt;In the Guardrail Pattern, the AI is placed in a secure zone and never touches your database directly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No Schema Exposure:&lt;/strong&gt; The LLM never sees your database schema, tables, or connection strings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Tool Menu:&lt;/strong&gt; Instead, it is given a simple menu of predefined tools, essentially function signatures like &lt;code&gt;get_order_status(order_id)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intent to Execution:&lt;/strong&gt; When a customer asks a question, the LLM translates their natural language into a standardized JSON payload requesting to use a specific tool. Your backend application receives this payload, validates the user's permissions, and executes hardcoded, highly optimized SQL queries against your &lt;a href="https://docs.digitalocean.com/products/databases/" rel="noopener noreferrer"&gt;DigitalOcean Managed Database&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3t2wj3j6wl4tlfjji6f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3t2wj3j6wl4tlfjji6f.png" alt="Secure" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because the execution layer remains entirely in your backend, you guarantee deterministic, secure data retrieval. The AI handles the messy natural language; your code handles the secure database execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Magic of Tool Chaining: Answering the Unanticipated
&lt;/h3&gt;

&lt;p&gt;A common critique of structured data access is: "Doesn't this just mean users have to wait for an engineer to write a new Python tool instead of waiting for a custom SQL query?" If tools were rigidly mapped one-to-one with user questions, the answer would be yes. But this is where Tool Chaining changes the engineering ROI entirely.&lt;/p&gt;

&lt;p&gt;Instead of building hyper-specific endpoints for every possible user question, your engineering team only needs to write foundational, primitive functions (e.g., get_user_orders and get_product_specs). Because the LLM is a reasoning engine, it can dynamically chain these primitive tools together to answer incredibly complex, unanticipated questions.&lt;/p&gt;

&lt;p&gt;For example, if a customer asks, "Based on my last three orders, which of your new products am I most likely to enjoy?" the LLM can autonomously:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Call the &lt;code&gt;get_user_orders&lt;/code&gt; tool.&lt;/li&gt;
&lt;li&gt;Analyze the returned JSON results.&lt;/li&gt;
&lt;li&gt;Call the &lt;code&gt;get_product_specs&lt;/code&gt; tool based on those results.&lt;/li&gt;
&lt;li&gt;Synthesize a final custom response for the user.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcw4z1e4dsqyrur15www.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcw4z1e4dsqyrur15www.png" alt="Tool_Chaining" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The engineer never had to build a complex, dedicated "Recommendation Endpoint." Providing secure access to a few basic building blocks helps the AI retrieve data in combinations you never anticipated, providing massive flexibility without requiring new code for every request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Paths on DigitalOcean
&lt;/h2&gt;

&lt;p&gt;To demonstrate how this architecture functions in practice, we will explore two distinct paths using a shared hypothetical scenario: A customer asking, "What is the current status of my order &lt;code&gt;#5529&lt;/code&gt;?"&lt;/p&gt;

&lt;p&gt;For both examples, we assume you have a &lt;a href="https://www.digitalocean.com/products/managed-databases-mysql" rel="noopener noreferrer"&gt;DigitalOcean Managed MySQL database&lt;/a&gt; with an orders table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Path A: Gradient™ AI Platform Agents (The Declarative Approach)
&lt;/h3&gt;

&lt;p&gt;This path uses &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-agents/" rel="noopener noreferrer"&gt;DigitalOcean Gradient™ AI Platform Agents&lt;/a&gt; to handle the conversational state and the intelligence of when to route to a function. It is a "declarative" approach because you define your tools via schemas and let the Agent handle the orchestration.&lt;/p&gt;

&lt;p&gt;In this model, your backend acts as a serverless fulfillment worker. When the Agent identifies the user’s intent to query data, it securely triggers a DigitalOcean Function to execute the SQL query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxchulttqgowqmmmxari.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxchulttqgowqmmmxari.png" alt="Declarative" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  How to Implement Gradient™ AI Platform Agents
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Step 1: Create the Agent
&lt;/h5&gt;

&lt;p&gt;You can create agents using the DigitalOcean API, CLI, Control Panel, or the Agent Development Kit. When configuring the agent, you give it strict system instructions to govern its behavior. For more details, refer to &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-agents/" rel="noopener noreferrer"&gt;How to Create Agents on DigitalOcean Gradient™ AI Platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Example Instruction:&lt;/em&gt;&lt;/strong&gt; "You are a database auditor. Use your tools to answer questions about customer metrics securely. Do not guess data if a tool fails."&lt;/p&gt;

&lt;h5&gt;
  
  
  Step 2: Create the DigitalOcean Function
&lt;/h5&gt;

&lt;p&gt;You need to first create a serverless function using DigitalOcean Functions that executes your secure database logic. Refer to &lt;a href="https://docs.digitalocean.com/products/functions/how-to/create-functions/" rel="noopener noreferrer"&gt;How to Create Functions&lt;/a&gt; for more details. Make sure the function meets the requirements described in this section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note on Function Limits:&lt;/em&gt;&lt;/strong&gt; When designing DO Functions, keep the platform's execution limits in mind. By default, functions have a timeout (e.g., 15 minutes max, but usually much lower for synchronous web requests) and memory limits (configurable from 128 MB - 1 GB, defaulting to 256 MB). Ensure your database query is optimized so it doesn't cause the function to time out. You will also need to bundle dependencies like &lt;code&gt;mysql-connector-python&lt;/code&gt; into your deployment package.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Python DO Function (main.py):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Refer to this guide for &lt;a href="https://docs.digitalocean.com/products/functions/how-to/configure-functions/#environment-variables" rel="noopener noreferrer"&gt;adding environment variables&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mysql.connector&lt;/span&gt;

&lt;span class="c1"&gt;# Credentials injected via DO Functions Environment Variables
# Best Practice: Never hardcode credentials in the function. Use Environment Variables.
&lt;/span&gt;
&lt;span class="n"&gt;DB_HOST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_HOST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_PORT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_PORT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25060&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Defaults to DO's standard 25060
&lt;/span&gt;&lt;span class="n"&gt;DB_USER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_USER&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_PASS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_PASS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    The entry point for DigitalOcean Functions.
    The Agent passes input data inside the &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; dictionary.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract the limit parameter passed by the Agent (defaults to 5 if missing)
&lt;/span&gt;    &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# 1. CONNECT TO DO MANAGED MYSQL
&lt;/span&gt;        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mysql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DB_PORT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;# Explicitly cast to integer
&lt;/span&gt;            &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_USER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_PASS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ssl_ca&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ca-certificate.crt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Required for DO Managed DBs
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dictionary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Return rows as dictionaries
&lt;/span&gt;
        &lt;span class="c1"&gt;# 2. EXECUTE SECURE SQL
&lt;/span&gt;        &lt;span class="c1"&gt;# Using parameterized queries to prevent SQL injection
&lt;/span&gt;        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT customer_id, name, total_spent FROM customers ORDER BY total_spent DESC LIMIT %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;),))&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# 3. RETURN DATA TO THE AGENT
&lt;/span&gt;        &lt;span class="c1"&gt;# DO Functions must return a dictionary. The 'body' contains the JSON response.
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;mysql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Database error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Internal database error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unexpected error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Internal server error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;pass&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Step 3: Define the Route
&lt;/h5&gt;

&lt;p&gt;In the Agent's routing configuration, add a new function route. This links the Agent's "brain" to the specific DigitalOcean Function you just deployed. You can do this via the DigitalOcean Control Panel by following the steps in this guide: &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/how-to/route-agent-functions/#add-a-function-route-using-the-control-panel" rel="noopener noreferrer"&gt;Add a Function Route Using the Control Panel&lt;/a&gt;.&lt;/p&gt;

&lt;h5&gt;
  
  
  Step 4: Define the Input and Output Schemas
&lt;/h5&gt;

&lt;p&gt;The schema provides a detailed description of the inputs, outputs, and the logic required for the agent to call and use your database function. The agent uses this to understand when to trigger the route.&lt;/p&gt;

&lt;h5&gt;
  
  
  Input Schema
&lt;/h5&gt;

&lt;p&gt;Specify input schema parameters by following the format of the example in the code block below. You can add as many input schema parameters as you need, but be aware more parameters and longer descriptions will incur more token usage.&lt;/p&gt;

&lt;p&gt;The input schema supports the OpenAPI parameters JSON specification format for defining parameter details.&lt;/p&gt;

&lt;p&gt;Example Input Schema for the Agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"limit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"in"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The number of top customers to return (e.g., 3, 5, or 10)."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a user asks the Agent, "Who are our top 10 customers?", the Agent matches the intent, generates the payload {"parameters": {"limit": 10}}, and triggers the DO Function. The Function securely queries MySQL and returns the raw data, which the Agent then synthesizes into a natural-language report.&lt;/p&gt;

&lt;h5&gt;
  
  
  Output Schema
&lt;/h5&gt;

&lt;p&gt;In the DigitalOcean Gradient™ AI Platform, the Output Schema field requires the specific structure of the data returned by your function. While the platform documentation mentions it is optional, providing this schema is the most effective way to prevent the LLM from &lt;a href="https://www.digitalocean.com/resources/articles/ai-hallucination" rel="noopener noreferrer"&gt;hallucinating&lt;/a&gt; data points that aren't there.&lt;/p&gt;

&lt;p&gt;Here is the simplified JSON structure for the Define output schema section in the Control Panel, followed by the descriptive paragraph for your documentation.&lt;/p&gt;

&lt;p&gt;The Output Schema JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"top_customers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"array"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"An array containing customer records retrieved from the database."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The unique identifier for the customer."&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The full name of the customer."&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"total_spent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"number"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The total revenue generated by this customer."&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By providing this output schema, you eliminate hallucinations. When the Agent receives the payload from the DigitalOcean Function, it knows exactly that total_spent is a number and name is a string, allowing it to accurately generate a response like: "Our top customer is Jane Doe, who has spent $4,500."&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Sample Interaction: Path A&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;To understand how this path works in practice, let’s look at a real-world interaction between a business user and the AI Agent.&lt;/p&gt;

&lt;h5&gt;
  
  
  &lt;strong&gt;The Test Database&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;For this scenario, let’s assume our DigitalOcean Managed MySQL database contains a table named customers with the following records:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;customer_id&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;name&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;total_spent&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Stark Industries&lt;/td&gt;
&lt;td&gt;125000.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Acme Corp&lt;/td&gt;
&lt;td&gt;54000.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Initech&lt;/td&gt;
&lt;td&gt;41200.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Globex Corporation&lt;/td&gt;
&lt;td&gt;38500.75&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h5&gt;
  
  
  &lt;strong&gt;The Question&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;A business stakeholder asks the AI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Who are our top 2 customers? I need to know the revenue gap between the `#1` and `#2` spots to calculate our client concentration.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  &lt;strong&gt;The Process (Behind the Scenes)&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;This is where the "Intent-Driven" architecture takes over. The system follows a three-step loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Intent Mapping: The AI analyzes the prompt. It identifies that "top 2" maps to the &lt;code&gt;get_top_customers&lt;/code&gt; tool and intelligently sets the limit parameter to 2.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Secure Execution: Instead of the AI writing SQL, it sends a structured JSON request to your DigitalOcean Function (Path A) or Local Script (Path B). Your code executes the hardcoded query:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;SELECT name, total_spent FROM customers ORDER BY total_spent DESC LIMIT 2&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Retrieval: The database returns the raw data for Stark Industries and Acme Corp.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;
  
  
  &lt;strong&gt;The Answer&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;The AI receives the raw data, performs the subtraction ($125,000.00 - $54,000.50 = $70,999.50$), and synthesizes a natural language response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="s2"&gt;"Our top two customers are **Stark Industries** (&lt;/span&gt;&lt;span class="nv"&gt;$125&lt;/span&gt;&lt;span class="s2"&gt;,000.00) and **Acme Corp** (&lt;/span&gt;&lt;span class="nv"&gt;$54&lt;/span&gt;&lt;span class="s2"&gt;,000.50). The revenue gap between the #1 and #2 spots is currently **&lt;/span&gt;&lt;span class="nv"&gt;$70&lt;/span&gt;&lt;span class="s2"&gt;,999.50**, which you can use to assess your client concentration levels."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Why this matters
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;The "Gap" Logic: You never wrote a SQL query to calculate a "gap." The AI used its own reasoning to perform math on the raw data returned by your tool.&lt;/li&gt;
&lt;li&gt;Zero Risk: If the user had asked to "Delete all customers," the AI would have checked its "Tool Menu," realized no such command exists, and safely refused.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Path B: Serverless Inference (The Code-First Approach)
&lt;/h3&gt;

&lt;p&gt;While &lt;strong&gt;Gradient™ AI Platform Agents&lt;/strong&gt; relies on DigitalOcean Agents to manage the conversational state and trigger your functions, &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/" rel="noopener noreferrer"&gt;Serverless Inference&lt;/a&gt; is designed for developers who need absolute control over the orchestration. In this model, you use DigitalOcean Serverless Inference as a stateless &lt;em&gt;"intelligence engine"&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;You don't upload your data to the AI; instead, you ask the AI what data it needs, you fetch it locally from your DigitalOcean Managed Database, and then you send only the relevant results back to the AI for a final summary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2k8wskfrcsu8qjdzg0fx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2k8wskfrcsu8qjdzg0fx.png" alt="Serverless" width="800" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  How to Implement Path B: Step-by-Step
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Step 1: Secure Your Inference Credentials
&lt;/h5&gt;

&lt;p&gt;Before writing code, you must generate a Model Access Key in the &lt;a href="https://cloud.digitalocean.com/login" rel="noopener noreferrer"&gt;DigitalOcean Control Panel&lt;/a&gt; under the &lt;a href="https://cloud.digitalocean.com/gradient-ai-platform" rel="noopener noreferrer"&gt;Gradient AI Platform&lt;/a&gt; section. Serverless Inference on DO is optimized for high-throughput and low-latency, meaning your application can scale without managing GPU clusters.&lt;/p&gt;

&lt;p&gt;Refer to this guide for &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#keys" rel="noopener noreferrer"&gt;gathering access keys&lt;/a&gt;.&lt;/p&gt;

&lt;h5&gt;
  
  
  Step 2: Define Your Database "Tools" Locally
&lt;/h5&gt;

&lt;p&gt;In your backend (e.g., &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu" rel="noopener noreferrer"&gt;Django&lt;/a&gt;, &lt;a href="https://www.digitalocean.com/community/tech-talks/getting-started-with-python-fastapi" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt;, or &lt;a href="https://www.digitalocean.com/community/tutorials/nodejs-express-basics" rel="noopener noreferrer"&gt;Express&lt;/a&gt;), you write standard Python functions. The AI will never see this code, it only sees the "Function Signature" (the name and description) that you provide in the next step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Python Tool:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mysql.connector&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;decimal&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;

&lt;span class="c1"&gt;# Best Practice: Never hardcode credentials in the function. Use Environment Variables.
&lt;/span&gt;&lt;span class="n"&gt;DB_HOST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_HOST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_PORT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_PORT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25060&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_USER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_USER&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_PASS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_PASS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DB_NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_top_customers_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Secure, hardcoded function to query the MySQL DB locally.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mysql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DB_PORT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_USER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_PASS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DB_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ssl_ca&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ca-certificate.crt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Required for DO Managed DBs
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dictionary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Parameterized query to prevent SQL injection
&lt;/span&gt;        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT customer_id, name, total_spent FROM customers ORDER BY total_spent DESC LIMIT %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;),))&lt;/span&gt;
        &lt;span class="n"&gt;raw_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Clean up Decimal types for JSON serialization
&lt;/span&gt;        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_spent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_spent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_spent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;mysql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Database connection failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Step 3: Define the Tool Schema for the LLM
&lt;/h5&gt;

&lt;p&gt;You must describe your functions to the LLM using the OpenAI-compatible JSON schema. This acts as the "Menu" that you pass to the Serverless Inference endpoint so the model knows what capabilities are available.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="n"&gt;tools_definition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_top_customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieves the highest spending customers from the database. Use the limit parameter to specify the count.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The number of top customers to return (e.g., 5).&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Step 4: Implement the Orchestration Loop
&lt;/h5&gt;

&lt;p&gt;The "Loop" is the logic that coordinates the conversation. When you call the &lt;a href="https://www.digitalocean.com/community/tutorials/serverless-inference-gradient" rel="noopener noreferrer"&gt;DigitalOcean Serverless Inference&lt;/a&gt; endpoint, the model will respond with a tool_calls request if it determines it needs database data to answer the user's prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="c1"&gt;# Best Practice: Never hardcode credentials in the function. Use Environment Variables.
&lt;/span&gt;&lt;span class="n"&gt;DO_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DO_INFERENCE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;INFERENCE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DO_SERVERLESS_INFERENCE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://inference.do-ai.run/v1/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the client
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DO_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;INFERENCE_URL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.3-70b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_secure_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="c1"&gt;# 1. INITIAL LLM CALL: Ask the AI how to handle the prompt
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools_definition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;response_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. CHECK IF A TOOL CALL IS REQUESTED
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;available_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_top_customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;get_top_customers_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
            &lt;span class="n"&gt;function_to_call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;available_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;function_to_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# 3. EXECUTE THE SECURE FUNCTION LOCALLY
&lt;/span&gt;                &lt;span class="n"&gt;function_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;limit_arg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;function_args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="n"&gt;db_response_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;function_to_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;limit_arg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="c1"&gt;# Append the raw data back to the conversation history
&lt;/span&gt;                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;db_response_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# 4. FINAL LLM CALL: Send history + raw data back for synthesis
&lt;/span&gt;        &lt;span class="n"&gt;final_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Sample Interaction: Path B&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To see the power of DigitalOcean Serverless Inference combined with a local dispatcher, let’s look at a real-world trace of the script in action.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Test Database
&lt;/h4&gt;

&lt;p&gt;For this terminal session, our DigitalOcean Managed MySQL database is populated with the following dummy data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;total_spent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Stark Industries&lt;/td&gt;
&lt;td&gt;125000.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Acme Corp&lt;/td&gt;
&lt;td&gt;54000.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Initech&lt;/td&gt;
&lt;td&gt;41200.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Globex Corporation&lt;/td&gt;
&lt;td&gt;38500.75&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  The Question
&lt;/h4&gt;

&lt;p&gt;The user runs the script and asks a conversational question about the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Who are my top 3 customers and how much have they spent?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The Process (The Terminal Trace)
&lt;/h4&gt;

&lt;p&gt;When the user hits enter, the following "thinking" loop occurs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Intent Recognition:&lt;/strong&gt; The prompt is sent to the &lt;strong&gt;DigitalOcean Serverless Inference&lt;/strong&gt; endpoint. The LLM identifies the intent and returns a "Tool Call" request for &lt;code&gt;get_top_customers&lt;/code&gt; with &lt;code&gt;limit=3&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local Execution:&lt;/strong&gt; Your Python script intercepts this request. Because the database logic is hardcoded in your &lt;code&gt;get_top_customers_db&lt;/code&gt; function, it safely executes the SQL query against your Managed Database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The System Log:&lt;/strong&gt; You will see a status message in your terminal indicating the "Guardrail" has been triggered.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Final Synthesis:&lt;/strong&gt; The raw JSON results are sent back to the LLM, which formats them into a human-readable summary.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  The Terminal Output
&lt;/h4&gt;

&lt;p&gt;This is exactly what you will see in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;python app.py
Ask your database a question: Who are my top 3 customers and how much have they spent?
&lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;System] Executing SQL: Top 3 Customers

Based on the provided data, your top 3 customers are:

1. Stark Industries - &lt;span class="nv"&gt;$125&lt;/span&gt;,000.00
2. Acme Corp - &lt;span class="nv"&gt;$54&lt;/span&gt;,000.50
3. Initech - &lt;span class="nv"&gt;$41&lt;/span&gt;,200.00

These customers have spent the most with your company, with Stark Industries being the largest spender.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the line &lt;code&gt;--&amp;gt; [System] Executing SQL: Top 3 Customers&lt;/code&gt;. This is the moment of maximum security. It proves that the AI did not write the SQL itself; it simply requested to use a tool that you wrote. Your database credentials never left your environment, and the LLM only saw the specific 3 rows it needed to answer the question.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which Path Should You Choose?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choose Path A (DigitalOcean Gradient™ AI Platform Agents)&lt;/strong&gt;: If you want to get to market quickly, need built-in chat memory, and prefer maintaining schemas over writing orchestration loops. It is perfect for standalone chatbots.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose Path B (Serverless Inference)&lt;/strong&gt;: If you are embedding AI into a complex, pre-existing backend (like a Django or Express app), require highly custom user authentication before executing tools, or want to strictly control the exact prompts and token limits sent to the model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Path B is More Powerful for Production Apps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pre-Execution Validation:&lt;/strong&gt; You can verify a user's session or permissions before your Python script hits the database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Efficiency:&lt;/strong&gt; With Serverless Inference, you only pay for the tokens generated during the "Intent Analysis" and "Summary" phases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Sovereignty:&lt;/strong&gt; Since the "dispatcher" logic lives on your server, your database credentials and &lt;code&gt;ca-certificate.crt&lt;/code&gt; never leave your secure DigitalOcean environment.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Extending the Architecture: Moving Beyond the Baseline
&lt;/h2&gt;

&lt;p&gt;The examples provided above represent the foundational blueprint of an intent-driven data interface. Because you control the application logic, and because the AI acts strictly as a dispatcher, this architecture is inherently modular. You can extend it to serve complex, enterprise-scale requirements without re-engineering the core.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Horizontal Scaling Across Departments
&lt;/h3&gt;

&lt;p&gt;You don’t need a separate AI agent for every team. You can build a single, unified &lt;strong&gt;"Data Gateway"&lt;/strong&gt; that serves multiple departments by simply expanding the tools array.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For HR:&lt;/strong&gt; Add a &lt;code&gt;get_leave_balance&lt;/code&gt; tool querying an internal employee database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For Logistics:&lt;/strong&gt; Add a &lt;code&gt;lookup_shipping_status&lt;/code&gt; tool querying your tracking tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For Sales:&lt;/strong&gt; Add a &lt;code&gt;get_quarterly_pipeline&lt;/code&gt; tool that aggregates MySQL CRM data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The LLM is intelligent enough to analyze a user's prompt and route it to the correct department's tool automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Multi-Step Reasoning (Tool Chaining)
&lt;/h3&gt;

&lt;p&gt;Modern models are capable of &lt;em&gt;multi-step reasoning&lt;/em&gt;, meaning the AI can call multiple tools in sequence to answer a single complex question.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User asks:&lt;/strong&gt; &lt;em&gt;"What is the email of the customer who placed the largest order yesterday?"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 1:&lt;/strong&gt; The AI calls &lt;code&gt;get_largest_order(date="yesterday")&lt;/code&gt; to retrieve a &lt;code&gt;customer_id&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2:&lt;/strong&gt; Your backend returns the ID (e.g., &lt;code&gt;5529&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 3:&lt;/strong&gt; The AI analyzes that result and automatically triggers a second call: &lt;code&gt;get_customer_details(customer_id="5529")&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesis:&lt;/strong&gt; The AI receives the email and provides the final answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Safe Write Operations
&lt;/h3&gt;

&lt;p&gt;While read-only analytics are the safest starting point, you can use Function Routing to safely execute database writes (&lt;code&gt;UPDATE&lt;/code&gt; or &lt;code&gt;INSERT&lt;/code&gt;). Because the AI only outputs a JSON parameter request, your &lt;strong&gt;&lt;a href="https://www.digitalocean.com/products/functions" rel="noopener noreferrer"&gt;DigitalOcean Function&lt;/a&gt;&lt;/strong&gt; or backend can enforce strict validation (RBAC, input sanitization, and business logic) before any data is changed.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Integrating External APIs
&lt;/h3&gt;

&lt;p&gt;Your tools are not restricted to your &lt;strong&gt;&lt;a href="https://www.digitalocean.com/products/managed-databases/" rel="noopener noreferrer"&gt;DigitalOcean Managed Databases&lt;/a&gt;&lt;/strong&gt;. Your backend dispatcher can route requests to third-party APIs just as easily. You could provide a tool called &lt;code&gt;refund_customer&lt;/code&gt; that, when triggered, tells your backend to hit a payment gateway API (like Stripe) after verifying the order status in MySQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Capabilities
&lt;/h2&gt;

&lt;p&gt;Because this architecture enforces a strict boundary between the AI's intent parsing and your backend's execution layer, you unlock powerful capabilities that are otherwise too risky to implement with untrusted users.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Beyond Read-Only: Executing Secure Actions
&lt;/h3&gt;

&lt;p&gt;Traditional Text-to-SQL is strictly limited to &lt;code&gt;SELECT&lt;/code&gt; statements because allowing an LLM to generate &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;INSERT&lt;/code&gt;, or &lt;code&gt;DELETE&lt;/code&gt; commands based on user prompts is catastrophically dangerous. However, with the Guardrail Pattern, executing state changes becomes perfectly safe.&lt;/p&gt;

&lt;p&gt;Because the LLM only outputs structured JSON intent, you can safely expose tools that perform actions—such as &lt;code&gt;process_refund(order_id)&lt;/code&gt; or &lt;code&gt;update_shipping_address(order_id, new_address)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The security is guaranteed by your DigitalOcean backend infrastructure. When the Agent triggers the &lt;code&gt;process_refund&lt;/code&gt; tool route, your backend receives the request and can execute complex validation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does this user own this order?&lt;/li&gt;
&lt;li&gt;Is the order within the 30-day return window?&lt;/li&gt;
&lt;li&gt;Does the user have the correct RBAC (Role-Based Access Control) permissions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only after your code validates these parameters does it execute the database &lt;code&gt;UPDATE&lt;/code&gt;. The AI never touches the transaction logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Agentic Evolution: The Metadata Flywheel
&lt;/h3&gt;

&lt;p&gt;One of the most profound benefits of this architecture addresses the fundamental bottleneck of software development: &lt;em&gt;knowing what to build next.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In a traditional application, if a user wants to know something your UI doesn't support, they leave frustrated, and you never know why. In an intent-driven interface, what happens when a customer asks a question and the Agent doesn't have the right tool to answer it?&lt;/p&gt;

&lt;p&gt;Instead of these queries falling into a black hole, they become your most valuable data stream.&lt;/p&gt;

&lt;p&gt;You can pipe your Agent's chat logs, specifically the conversations where the Agent replied, &lt;em&gt;"I don't have access to that information"&lt;/em&gt;, into a secondary, internal &lt;strong&gt;Developer Agent&lt;/strong&gt;. This secondary agent analyzes what your customers are trying to do and automatically generates a prioritized backlog for your engineering team.&lt;/p&gt;

&lt;p&gt;It can even go a step further: by analyzing the user's prompt, the Developer Agent can draft the exact schema and the Python starter code for the missing DigitalOcean Function. This creates a "Metadata Flywheel," transforming your engineering pipeline from reactive ticket-taking to proactive, data-driven development based on actual customer intent.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. How do intent-driven data interfaces stay secure on DigitalOcean?
&lt;/h3&gt;

&lt;p&gt;An intent driven interface stays secure when your application never exposes database credentials or schemas to the AI system. The approach in this tutorial keeps all &lt;em&gt;&lt;a href="https://docs.digitalocean.com/products/databases/" rel="noopener noreferrer"&gt;DigitalOcean Managed Databases&lt;/a&gt;&lt;/em&gt; access inside &lt;em&gt;&lt;a href="https://docs.digitalocean.com/products/functions/" rel="noopener noreferrer"&gt;DigitalOcean Functions&lt;/a&gt;&lt;/em&gt; or your backend code, where you enforce role checks, tenant isolation, and parameterized queries before any request reaches the cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Why use DigitalOcean Managed Databases for intent driven data interfaces?
&lt;/h3&gt;

&lt;p&gt;DigitalOcean Managed Databases provide automated backups, high availability, and private networking by default, which reduces operational risk for data facing workloads. When you pair those features with strict function routes or local tools, you get predictable performance and secure query execution for AI driven requests without extra infrastructure work.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. How does Gradient AI Platform support this architecture?
&lt;/h3&gt;

&lt;p&gt;Gradient AI Platform supplies the agents and serverless inference endpoints, which translate natural language into structured tool calls. Agents manage chat history and routing to functions, while serverless inference models handle the reasoning loop when your backend runs the orchestration code and forwards only the minimal data needed for each answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. When should you choose Agents versus Serverless Inference?
&lt;/h3&gt;

&lt;p&gt;Agents fit best when you want a managed conversational layer with built in memory, routing, and configuration through schemas and routes. Serverless Inference fits best when your team needs tighter control over prompts, authentication, logging, and tool orchestration inside an existing framework such as Django or Express.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. How does this pattern help with multi tenant SaaS security?
&lt;/h3&gt;

&lt;p&gt;The logic that checks tenant ownership and access rules lives in your tools and functions, not in the AI layer. Each tool verifies user identity and tenant context before running a query on DigitalOcean Managed Databases, which prevents cross tenant data access even when users share the same agent or model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building natural language interfaces for end-users does not mean you have to sacrifice security, nor does it mean you must lock your data behind rigid, static UI dashboards.&lt;/p&gt;

&lt;p&gt;The naive approach of exposing your database schema to an LLM is a non-starter for customer-facing applications. By adopting an Intent-Driven Architecture using DigitalOcean Managed Databases for highly available, optimized query execution and DigitalOcean Agents &amp;amp; Functions for secure intent processing via Tool Calling, teams can deliver magical, highly flexible experiences.&lt;/p&gt;

&lt;p&gt;You protect your infrastructure, eliminate SQL injection and hallucination risks, and, most importantly, empower your customers to find exactly what they need, exactly when they need it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps with DigitalOcean
&lt;/h2&gt;

&lt;p&gt;To move from architecture to implementation, start with these resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/community/tutorials/serverless-inference-genai" rel="noopener noreferrer"&gt;Serverless Inference with the DigitalOcean Gradient Platform&lt;/a&gt; walks you through setting up model access keys and running your first inference call with Python.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/community/tutorials/bulk-inference-content-pipeline-digitalocean-serverless" rel="noopener noreferrer"&gt;Building a Content Generation Pipeline with DigitalOcean Serverless Inference&lt;/a&gt; shows how to build a bulk processing pipeline on top of the same Serverless Inference endpoint used in this tutorial.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/community/tutorials/build-ai-agents-the-right-way" rel="noopener noreferrer"&gt;A Simple Guide to Building AI Agents Correctly&lt;/a&gt; covers agent architecture, tool design, guardrails, and production deployment patterns.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/community/tutorials/build-ai-agents-conversation-memory" rel="noopener noreferrer"&gt;AI Agents with Memory via DigitalOcean Gradient AI and Memori Labs&lt;/a&gt; demonstrates persistent conversation memory for customer support agents on Gradient AI Platform.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/community/tutorials/effective-context-engineering-ai-agents" rel="noopener noreferrer"&gt;Effective Context Engineering to Build Better AI Agents&lt;/a&gt; explains how to structure system prompts, retrieval, and context compression for reliable agent behavior.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/community/tutorials/data-secure-ai-workflows" rel="noopener noreferrer"&gt;Create and Implement Data Secure AI Workflows&lt;/a&gt; covers model provider selection, data flow security, and testing strategies for production LLM applications.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-build-parallel-agentic-workflows-with-python" rel="noopener noreferrer"&gt;How to Build Parallel Agentic Workflows with Python&lt;/a&gt; shows how to run multiple agent tasks concurrently for complex orchestration scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/community/tutorials/deploy-coreflux-mqtt-broker" rel="noopener noreferrer"&gt;Deploy Coreflux MQTT Broker with Managed Databases&lt;/a&gt; walks through a production style data pipeline on DigitalOcean Managed Databases.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/products/managed-databases" rel="noopener noreferrer"&gt;DigitalOcean Managed Databases&lt;/a&gt; product overview to choose the right engine and cluster size for your workload.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>learning</category>
    </item>
    <item>
      <title>How to Optimize LLM Pipeline Builds with DSPy</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Tue, 21 Apr 2026 19:10:39 +0000</pubDate>
      <link>https://dev.to/digitalocean/how-to-optimize-llm-pipeline-builds-with-dspy-7j1</link>
      <guid>https://dev.to/digitalocean/how-to-optimize-llm-pipeline-builds-with-dspy-7j1</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by Adrian Payong (AI Consultant and Technical Writer) and Shaoni Mukherjee (AI Technical Writer, DigitalOcean)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;DSPy turns LLM development into a programmable workflow by using signatures, modules, metrics, and optimizers instead of relying on manual prompt tweaking alone.
&lt;/li&gt;
&lt;li&gt;It is especially useful for production-style pipelines that combine routing, retrieval, reasoning, tool use, structured output, and evaluation inside one maintainable system.&lt;/li&gt;
&lt;li&gt;Core DSPy modules such as Predict, ChainOfThought, ReAct, and Module let you build practical applications like QA systems, RAG pipelines, multi-step agents, and classifiers.&lt;/li&gt;
&lt;li&gt;DSPy optimizers such as BootstrapFewShot, MIPROv2, and COPRO help improve program quality automatically by tuning instructions and demonstrations against a metric.&lt;/li&gt;
&lt;li&gt;For reliable deployment, DSPy works best when paired with evaluation, grounding checks, typed outputs, constraint enforcement, and stable infrastructure such as DigitalOcean for hosting models, retrieval, and agent pipelines.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a href="https://www.digitalocean.com/resources/articles/large-language-models" rel="noopener noreferrer"&gt;LLM&lt;/a&gt; application development has grown past simple &lt;a href="https://www.digitalocean.com/resources/articles/prompt-engineering-best-practices" rel="noopener noreferrer"&gt;prompt engineering&lt;/a&gt;. As systems become more complex, you need a stronger mental model to structure reasoning, retrieval, tool use, evaluation, and optimization within one maintainable workflow. &lt;a href="https://www.digitalocean.com/community/tutorials/prompting-with-dspy" rel="noopener noreferrer"&gt;DSPy&lt;/a&gt; was designed to help with that. Rather than manually tuning lengthy prompt templates, you define signatures, compose modules, and then optimize the entire program against a metric. This makes LLM development feel less like prompt trial and error and more like building a measurable, improvable software pipeline.&lt;/p&gt;

&lt;p&gt;This article covers practical DSPy use cases you will encounter when building production-quality applications. We dive into how DSPy enables question answering, retrieval-augmented generation, multi-step reasoning agents, text classification, and much more. Along the way, you'll learn about DSPy's approach to metric evaluation, assertion-style constraints, and choosing an optimizer. By the end, you should have a clearer view of how DSPy can help you move from isolated prompts to scalable, structured, production-ready &lt;a href="https://www.digitalocean.com/community/tutorials/end-to-end-rag-pipeline" rel="noopener noreferrer"&gt;LLM pipelines&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is DSPy and why use it for LLM pipelines?
&lt;/h2&gt;

&lt;p&gt;DSPy's design philosophy is to program declarative LM programs (signatures, modules, and control flow), then compile them towards a metric, rather than manually engineering long prompt templates.&lt;/p&gt;

&lt;p&gt;The authors of DSPy reframe this as compiling declarative LM calls into self-improving pipelines, as in the original paper. The compile step searches for better instructions, few-shot demonstrations, (in some modes) fine-tuned weights. Doing DSPy in practice tends to look more like "lightweight ML" than prompt engineering:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define your interface: a DSPy prompt signature (inputs/outputs + types).&lt;/li&gt;
&lt;li&gt;Implement the pipeline logic as modules (DSPy Predict module, DSPy &lt;em&gt;ChainOfThought&lt;/em&gt; module, DSPy &lt;em&gt;ReAct&lt;/em&gt; module, etc) + Python control flow with &lt;em&gt;dspy.Module&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Define a metric function to measure quality (often calling an LLM for metric evaluation, sometimes via a DSPy "judge" program).&lt;/li&gt;
&lt;li&gt;Run an optimizer (previously known as "teleprompters") such as DSPy &lt;em&gt;BootstrapFewShot&lt;/em&gt; optimizer or MIPROv2 optimizer to DSPy to improve your score.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Where DSPy fits versus LangChain and LlamaIndex
&lt;/h3&gt;

&lt;p&gt;DSPy is often compared to orchestration frameworks, such as LangChain, and data-centric &lt;a href="https://www.digitalocean.com/community/tutorials/end-to-end-rag-pipeline" rel="noopener noreferrer"&gt;RAG frameworks&lt;/a&gt;, like LlamaIndex. One helpful way to think about their differences is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/community/tutorials/langchain-language-model" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; centers around composing chains together, agents, tools, and integrations (extensive tooling for “wiring things together”).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/resources/articles/what-is-llamaindex" rel="noopener noreferrer"&gt;LlamaIndex&lt;/a&gt; centers around data ingestion, building indexes, and querying LLM over your data (it's built around RAG-style retrievers + query engines).&lt;/li&gt;
&lt;li&gt;DSPy emphasizes programmatic optimization of the LM behavior within your stack: signatures, modules, metrics, and optimizers that can automatically improve your prompts/demos throughout the system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many real-world production stacks combine these approaches: use LlamaIndex (or another retriever) to power ingestion and retrieval, then utilize DSPy to wrap the generation and routing logic to optimize prompts and typed outputs.&lt;/p&gt;

&lt;h3&gt;
  
  
  DSPy core building blocks you will use in this tutorial
&lt;/h3&gt;

&lt;p&gt;Signatures describe what the model should do: input fields, output fields, and their semantic names. Optionally specify types and instructions. Field names are important because they indicate the role (“question” vs “answer”, “context” vs “summary”, etc).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modules&lt;/strong&gt; define how to solve it. Key ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dspy.ai/api/modules/Predict/" rel="noopener noreferrer"&gt;dspy.Predict&lt;/a&gt;: The basic building block that maps inputs → outputs using an LM. Configured by a signature.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dspy.ai/api/modules/ChainOfThought/" rel="noopener noreferrer"&gt;dspy.ChainOfThought&lt;/a&gt;: A predictor that reasons step-by-step. Outputs are the same as your signature, but with an additional “reasoning” field prepended.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dspy.ai/api/modules/ReAct/" rel="noopener noreferrer"&gt;dspy.ReAct&lt;/a&gt;: An iterative “Reasoning and Acting” tool-using agent loop where the model chooses tools and produces final outputs.&lt;/li&gt;
&lt;li&gt;dspy.Module: the base class for multi-step programs where you implement forward() and compose submodules.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Adapters determine how “structured” your LM I/O is. &lt;em&gt;ChatAdapter&lt;/em&gt; is DSPy’s default field-marker format. &lt;em&gt;JSONAdapter&lt;/em&gt; forces models that support structured output formatting to emit JSON so that you can reliably parse typed outputs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unified end-to-end pipeline example
&lt;/h3&gt;

&lt;p&gt;This code implements a small but realistic “router” program which brings together Predict, RAG + ChainOfThought, and ReAct end-to-end flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pip install -U dspy  (or: pip install -U dspy-ai)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;
&lt;span class="c1"&gt;# 1) Configure the language model once near the top of your app.
&lt;/span&gt;&lt;span class="n"&gt;lm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# reads OPENAI_API_KEY from env
&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;JSONAdapter&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="c1"&gt;# 2) A small intent classifier (Predict) to route requests.
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Signature&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Route the user request to the best handler.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InputField&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rag_qa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;direct_qa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OutputField&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# 3) A RAG-style answerer (we'll implement it fully later).
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RagAnswer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Signature&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Answer using only the provided context passages.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InputField&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InputField&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OutputField&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OutputField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;indices of context passages used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;rag_answerer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ChainOfThought&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RagAnswer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# 4) A ReAct agent with tools (we'll implement tools later).
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReAct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question -&amp;gt; answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;max_iters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# 5) Tie it together as a program.
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UnifiedAssistant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_passages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;intent&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rag_qa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retrieved_passages&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;rag_answerer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# default: direct QA, still using a CoT-style module for robustness
&lt;/span&gt;        &lt;span class="n"&gt;direct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ChainOfThought&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question -&amp;gt; answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;direct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;assistant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;UnifiedAssistant&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above script builds a lightweight DSPy assistant capable of serving multiple types of user queries within a single workflow. After setting up an LLM and JSON adapter, it creates a &lt;em&gt;Predict&lt;/em&gt; router that classifies which of three intents a new query belongs to: RAG-based question answering, tool-based agent reasoning, or direct question answering. Queries that require external knowledge are routed to a &lt;em&gt;ChainOfThought&lt;/em&gt; RAG module that answers the question given retrieved passages, and returns citations. Queries that require tool usage are routed to a &lt;em&gt;ReAct&lt;/em&gt; agent coupled with an &lt;em&gt;add&lt;/em&gt; tool; all other queries fall back to a direct ChainOfThought answer module. This program demonstrates how DSPy can orchestrate routing, retrieval, reasoning, and tool use within a single modular assistant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case 1: Question answering with ChainOfThought
&lt;/h2&gt;

&lt;p&gt;By default, the DSPy &lt;em&gt;ChainOfThought&lt;/em&gt; module is designed towards problems where providing intermediate reasoning improves correctness. Let’s consider the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dspy.evaluate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Evaluate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dspy.evaluate.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;answer_exact_match&lt;/span&gt;
&lt;span class="c1"&gt;# Configure once per process.
# (OPENAI_API_KEY must be set in your environment.)
&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# A minimal CoT QA module.
&lt;/span&gt;&lt;span class="n"&gt;qa_cot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ChainOfThought&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question -&amp;gt; answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# A tiny devset (start small, then grow).
&lt;/span&gt;&lt;span class="n"&gt;devset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Example&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the capital of France?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Paris&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;with_inputs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Example&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is 2+2?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;with_inputs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# Metric: exact match on the final answer field.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;em_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;answer_exact_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;evaluator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;devset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;devset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;display_progress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;baseline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qa_cot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;em_metric&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Baseline score:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This program set up a small DSPy question-answering evaluation pipeline. It initializes DSPy with the &lt;em&gt;openai/gpt-4o-mini&lt;/em&gt; model, then defines a simple &lt;em&gt;ChainOfThought&lt;/em&gt; module which accepts a question and generates an answer. The program defines a small development dataset consisting of two example question answering pairs and builds an exact-match metric for evaluating the predicted answer against the expected ones. It then launches DSPy's &lt;em&gt;Evaluate&lt;/em&gt; utility to apply that module to each question in the dataset in parallel. It computes and outputs the baseline score, indicating how accurately the unoptimized Chain-of-Thought question answering module answered those sample questions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Improving question answering with BootstrapFewShot
&lt;/h3&gt;

&lt;p&gt;If you only have a few examples, &lt;em&gt;BootstrapFewShot&lt;/em&gt; is a good starting point. This optimizer composes demos from labeled examples + bootstrapped demos created by a teacher, filtering to only keep demos that pass your metric.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dspy.teleprompt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BootstrapFewShot&lt;/span&gt;
&lt;span class="c1"&gt;# A very small trainset is acceptable (DSPy is designed to start small).
&lt;/span&gt;&lt;span class="n"&gt;trainset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;devset&lt;/span&gt;
&lt;span class="n"&gt;teleprompter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BootstrapFewShot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;em_metric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_bootstrapped_demos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_labeled_demos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;qa_optimized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;teleprompter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;qa_cot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trainset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;trainset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;optimized_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qa_optimized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;em_metric&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Optimized score:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimized_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we improved the original &lt;em&gt;qa_cot&lt;/em&gt; question-answering module with DSPy's &lt;em&gt;BootstrapFewShot&lt;/em&gt; optimizer. We use the small &lt;em&gt;trainset&lt;/em&gt; as learning examples for better few-shot demonstrations. Then we compiled an optimized version of the model using up to 2 bootstrapped demos + 2 labeled demos. Finally, we run an evaluation on the new model with the same exact-match metric and print out the optimized score to show whether the performance improved over the baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case 2: Retrieval-augmented generation (RAG) pipeline
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.digitalocean.com/resources/articles/rag" rel="noopener noreferrer"&gt;Retrieval-augmented generation (RAG)&lt;/a&gt; solves a major pain point. Without RAG, LLMs can’t access your private or continuously changing knowledge unless you directly supply it at inference time. A typical end-to-end RAG pipeline consists of ingestion/chunking, embeddings, storage + retrieval, and final generation grounded on retrieved documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step-by-step RAG with typed outputs and structured JSON
&lt;/h3&gt;

&lt;p&gt;In the following program, we define a typed signature (lists and ints), use JSONAdapter, and return citations as indices into retrieved passages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;

&lt;span class="c1"&gt;# Configure LM with JSONAdapter so lists (like citations)
# are parsed reliably from model output.
&lt;/span&gt;&lt;span class="n"&gt;lm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# reads OPENAI_API_KEY from env
&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;JSONAdapter&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Minimal local corpus for demo; replace with your documents or a vector DB.
&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Linux divides memory into regions; on 32-bit systems highmem is not permanently mapped.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Low memory is directly addressable by the kernel; high memory is mapped on demand.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unrelated passage about iPhone apps.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Embedder for dense retrieval.
&lt;/span&gt;&lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dimensions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrievers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RagAnswer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Signature&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Answer using only the provided context passages.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InputField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieved passages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InputField&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OutputField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final answer grounded in context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OutputField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;indices of context passages used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;respond&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ChainOfThought&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RagAnswer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Retrieve top‑k passages.
&lt;/span&gt;        &lt;span class="n"&gt;retrieved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;passages&lt;/span&gt;

        &lt;span class="c1"&gt;# Generate answer and citations.
&lt;/span&gt;        &lt;span class="n"&gt;pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Lightweight validation of citations indices.
&lt;/span&gt;        &lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

        &lt;span class="c1"&gt;# Return a structured prediction.
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Prediction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Instantiate the RAG module.
&lt;/span&gt;&lt;span class="n"&gt;rag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RAG&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Run a demo question.
&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are high memory and low memory in Linux?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Citations (indices into context):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we retrieve information from a small knowledge base in order to answer a question. The language model is configured with &lt;em&gt;JSONAdapter&lt;/em&gt; to properly parse structured output (citation lists). An embedding-based retriever is created to find the most relevant passages from the corpus. Typed &lt;em&gt;Signature&lt;/em&gt; defines a structured RAG task with fields for &lt;em&gt;context&lt;/em&gt;, &lt;em&gt;question&lt;/em&gt;, &lt;em&gt;answer&lt;/em&gt;, and &lt;em&gt;citations&lt;/em&gt;. The &lt;em&gt;RAG&lt;/em&gt; module follows &lt;em&gt;ChainOfThought&lt;/em&gt; to produce a grounded answer from the retrieved passages. Lastly, the citation indices are checked for validity before returning structured prediction, and a demo query is run about Linux memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add a RAG metric that checks both correctness and grounding
&lt;/h3&gt;

&lt;p&gt;Here's a small example of a composite metric. It checks if the label matches and whether the predicted answer was found in the retrieved context. It returns a float for evaluation and a boolean for bootstrapping.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dspy.evaluate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Evaluate&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;grounded_answer_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Case‑insensitive exact or near‑exact match on answer.
&lt;/span&gt;    &lt;span class="n"&gt;answer_match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# Answer should appear in at least one retrieved passage.
&lt;/span&gt;    &lt;span class="n"&gt;context_match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# For evaluation: soft score between 0 and 1.
&lt;/span&gt;        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer_match&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;context_match&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;
    &lt;span class="c1"&gt;# For bootstrapping / optimization: require both.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer_match&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;context_match&lt;/span&gt;

&lt;span class="n"&gt;devset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Example&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is low memory in Linux?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;directly addressable by the kernel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;with_inputs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;evaluator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;devset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;devset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;display_progress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;grounded_answer_metric&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code computes a custom metric to score how well a DSPy RAG pipeline is answering a question with grounded answers. &lt;em&gt;grounded_answer_metric&lt;/em&gt; checks two things: 1) whether the predicted matches the expected answer, and 2) whether that answer can be grounded in the retrieved context passages. Then, &lt;em&gt;Evaluate&lt;/em&gt; runs that metric on a small development set to validate whether your RAG pipeline returns grounded, correct answers before using it for optimization or production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimize the RAG program with MIPROv2
&lt;/h3&gt;

&lt;p&gt;Here we use &lt;a href="https://dspy.ai/api/optimizers/MIPROv2/" rel="noopener noreferrer"&gt;DSPy’s MIPROv2&lt;/a&gt; optimizer to improve the original RAG program against your custom grounding metric, then recompile the module with a small demo set and evaluate whether the optimized version performs better.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dspy.teleprompt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MIPROv2&lt;/span&gt;
&lt;span class="c1"&gt;# Set up MIPROv2 optimizer with your custom metric.
&lt;/span&gt;&lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MIPROv2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;grounded_answer_metric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;light&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# or "medium" / "heavy"
&lt;/span&gt;    &lt;span class="n"&gt;num_threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Compile the original RAG module using the dev/train set.
&lt;/span&gt;&lt;span class="n"&gt;rag_optimized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;trainset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;devset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_bootstrapped_demos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_labeled_demos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Re‑evaluate the optimized RAG module.
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Evaluation after MIPROv2 optimization:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rag_optimized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;grounded_answer_metric&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Use Case 3: Multi-Step reasoning agent with ReAct
&lt;/h2&gt;

&lt;p&gt;When you have tasks that require tool use (whether that's doing calculations, calling internal APIs, fetching knowledge, or taking actions), DSPy provides &lt;em&gt;dspy.ReAct&lt;/em&gt;, which implements the ReAct ("Reasoning and Acting") paradigm: the model reasons, chooses which tool to call, observes the results, and repeats until it can output final answers. ReAct can be generalized to function over any signature. It can accept either functions or &lt;em&gt;dspy.Tool&lt;/em&gt; objects as tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  A minimal ReAct agent with typed tools
&lt;/h3&gt;

&lt;p&gt;The script below implements a small DSPy &lt;em&gt;ReAct&lt;/em&gt; agent that answers questions by utilizing tools as needed. It sets up an LLM, defines two tools - one that returns the current UTC time and another that multiplies numbers - and passes those tools to &lt;em&gt;dspy.ReAct&lt;/em&gt;. The agent will reason if it should use a tool, call it if needed, and then return the final answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;
&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;JSONAdapter&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;utc_now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

&lt;span class="c1"&gt;# Create a ReAct agent that can use utc_now and multiply.
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReAct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question -&amp;gt; answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;utc_now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_iters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Example queries.
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What time is it in UTC right now?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is 19.5 * 4.2?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Production concern: agent reliability, costs, and guardrails
&lt;/h3&gt;

&lt;p&gt;Agent loops can silently accumulate high costs (repeated LLM calls, repeated tool calls) or hallucinate invalid actions without guardrails and observability. A reasonable set of guardrails includes cap iterations (max_iters), tightening tool schemas and permissions, and validating on real traffic-like prompts before rollout.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimize a ReAct agent with DSPy optimizers
&lt;/h3&gt;

&lt;p&gt;DSPy optimizers can optimize entire programs, including end-to-end complex multi-module systems (such as agents, retrieval, and extraction), as long as you specify a metric to improve. For many teams, a pattern that works well is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bootstrap a few demos with &lt;em&gt;BootstrapFewShot&lt;/em&gt;(cheap);&lt;/li&gt;
&lt;li&gt;Then, run MIPROv2 in auto="light" or auto="medium" depending on budget.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use Case 4: Text classification with LLM metric evaluation
&lt;/h2&gt;

&lt;p&gt;Classification is an ideal DSPy use case because while success metrics (accuracy, F1) are straightforward, you can still take advantage of DSPy’s programmatic structure, typed outputs, and optimizers.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Build a typed classifier with Predict&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s code that builds a simple DSPy text classifier for support tickets. It sets up the model, declares a signature with one input (ti*cket*) and one constrained output (&lt;em&gt;label&lt;/em&gt;), then calls &lt;em&gt;dspy.Predict&lt;/em&gt; to classify the ticket as one of four types: &lt;em&gt;billing&lt;/em&gt;, &lt;em&gt;bug&lt;/em&gt;, &lt;em&gt;feature&lt;/em&gt;, or &lt;em&gt;security&lt;/em&gt;. In this example, the “I was charged twice” complaint is correctly classified as &lt;em&gt;billing.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;
&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;JSONAdapter&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TicketLabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Signature&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Classify a support ticket into a fixed taxonomy.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InputField&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;billing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OutputField&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TicketLabel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;example&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I was charged twice for my subscription this month.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Evaluate with a metric (and optionally build an LLM-judge metric)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Metrics are ordinary &lt;a href="https://www.digitalocean.com/community/tutorials/python-tutorial" rel="noopener noreferrer"&gt;Python&lt;/a&gt; functions. They should follow the signature(example, pred, trace=None); for complex outputs, metrics can use AI feedback via additional predictor calls.&lt;/p&gt;

&lt;p&gt;The code below uses DSPy’s &lt;em&gt;Evaluate&lt;/em&gt; utility to test a classifier, clf, on a small labeled dataset of support tickets. The &lt;em&gt;trainset&lt;/em&gt; has three examples. For each example, the text of a ticket is labeled with the correct category (&lt;em&gt;billing&lt;/em&gt;, &lt;em&gt;bug, or&lt;/em&gt; &lt;em&gt;feature&lt;/em&gt;). Passing &lt;em&gt;.with_inputs("ticket")&lt;/em&gt; specifies to DSPy that the model should only receive the ticket text as input. The &lt;em&gt;accuracy_metric&lt;/em&gt; function checks if the classifier's predicted label matches the true label. It returns 1.0 if the prediction is correct and 0.0 otherwise. &lt;em&gt;Evaluate&lt;/em&gt; runs &lt;em&gt;clf&lt;/em&gt; on the dataset with 2 threads, displays progress while running, and &lt;em&gt;print(evaluator(clf, metric=accuracy_metric))&lt;/em&gt; prints the final result, which is usually the accuracy of the model on those examples.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dspy.evaluate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Evaluate&lt;/span&gt;
&lt;span class="n"&gt;trainset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Example&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I was charged twice.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;billing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;with_inputs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Example&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The app crashes on launch.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;with_inputs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Example&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please add export to CSV.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;with_inputs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;accuracy_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;evaluator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;devset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;trainset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_threads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;display_progress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;accuracy_metric&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Assertion testing and constraint enforcement in modern DSPy
&lt;/h3&gt;

&lt;p&gt;In production, people often ask for “verification” operations: ("assertion testing"; the label must be one of X; JSON must parse; citations must be in range).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dspy.ai/api/modules/Refine/" rel="noopener noreferrer"&gt;&lt;em&gt;dspy.Refine&lt;/em&gt;&lt;/a&gt; was purpose-built to be a best-of-N refinement loop with &lt;em&gt;reward_fn&lt;/em&gt; and threshold. It repeatedly calls the module N times and returns the best prediction, generating feedback between attempts if necessary. Here's a real-world “constraint enforcement” wrapper: retry until output taxonomy is respected. Let’s consider the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Set&lt;/span&gt;
&lt;span class="n"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;billing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;label_is_valid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;allowed&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;span class="n"&gt;robust_clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Refine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reward_fn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;label_is_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;robust_clf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please add SSO support.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code wraps the original classifier with &lt;em&gt;dspy.Refine&lt;/em&gt;, which allows DSPy to retry up to 3 times and retain only outputs that passed &lt;em&gt;reward_fn&lt;/em&gt;. The reward function ensures the predicted label is one of our allowed categories, and the &lt;em&gt;threshold=1.0&lt;/em&gt; means only a fully valid label will be accepted before returning the result.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right DSPy Optimizer
&lt;/h2&gt;

&lt;p&gt;DSPy now refers to these algorithms as optimizers (previously teleprompters). According to the optimizer documentation, an optimizer is an algorithm that tunes a DSPy program’s parameters (prompts and/or LM weights) to maximize your metrics using your program, metric, and training inputs. The training inputs are often a small set of examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practical decision criteria
&lt;/h3&gt;

&lt;p&gt;This table lists the 3 optimizers your brief prioritizes—&lt;a href="https://dspy.ai/api/optimizers/BootstrapFewShot/" rel="noopener noreferrer"&gt;&lt;strong&gt;BootstrapFewShot&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;, &lt;a href="https://dspy.ai/api/optimizers/MIPROv2/" rel="noopener noreferrer"&gt;MIPROv2&lt;/a&gt;, and COPRO&lt;/strong&gt;—as well as &lt;em&gt;BootstrapFewShotWithRandomSearch&lt;/em&gt;, which DSPy recommends after you have more data.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimizer&lt;/th&gt;
&lt;th&gt;What it does and when to use it&lt;/th&gt;
&lt;th&gt;Data guidance and key config knobs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BootstrapFewShot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tunes few-shot demos assembled from labeled and bootstrapped examples validated by the metric. It works well for fast wins on small datasets and is a strong first compile option.&lt;/td&gt;
&lt;td&gt;Start here when you have around 10 examples. &lt;strong&gt;Knobs:&lt;/strong&gt; &lt;code&gt;max_labeled_demos&lt;/code&gt;, &lt;code&gt;max_bootstrapped_demos&lt;/code&gt;, &lt;code&gt;teacher_settings&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BootstrapFewShotWithRandomSearch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tunes few-shot demos like BootstrapFewShot, but tests multiple candidate demo sets and keeps the best one. It is better for a more robust few-shot selection while staying relatively simple.&lt;/td&gt;
&lt;td&gt;Best when you have around 50 or more examples. &lt;strong&gt;Knobs:&lt;/strong&gt; &lt;code&gt;num_candidate_programs&lt;/code&gt;, plus the BootstrapFewShot knobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;COPRO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tunes prompt instructions through iterative search, documented as coordinate ascent in the optimizer guide. It is useful when you want instruction tuning without focusing heavily on demos.&lt;/td&gt;
&lt;td&gt;Usually needs a train set and a metric. &lt;strong&gt;Knobs:&lt;/strong&gt; &lt;code&gt;breadth&lt;/code&gt;, &lt;code&gt;depth&lt;/code&gt;, &lt;code&gt;init_temperature&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MIPROv2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Jointly tunes instructions and few-shot examples using &lt;a href="https://en.wikipedia.org/wiki/Bayesian_optimization" rel="noopener noreferrer"&gt;Bayesian optimization&lt;/a&gt;. It is the strongest choice when you want higher-quality prompt optimization and have enough budget and data.&lt;/td&gt;
&lt;td&gt;Best for longer runs, such as 40 or more trials, with around 200 or more examples to reduce overfitting risk. &lt;strong&gt;Knobs:&lt;/strong&gt; &lt;code&gt;auto&lt;/code&gt; (“light/medium”), &lt;code&gt;num_threads&lt;/code&gt;, plus demo knobs in &lt;code&gt;compile()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Running DSPy on DigitalOcean
&lt;/h2&gt;

&lt;p&gt;Deployment should provide you with two things: (1) infrastructure to run your DSPy program (stable runtime) and (2) access to LLMs you can reliably call to run retrieval and add guardrails.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment patterns that map well to DSPy pipelines
&lt;/h3&gt;

&lt;p&gt;Deploy your DSPy service to a Virtual Machine (VM) or GPU instance if you want full control of everything in your stack (vector DB, embeddings, model runtime). &lt;a href="https://www.digitalocean.com/community/tutorials/build-rag-application-using-gpu-droplets" rel="noopener noreferrer"&gt;Building a RAG application on GPU Droplets&lt;/a&gt; is covered in step-by-step detail with DigitalOcean’s RAG tutorials.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use a fully managed model access for simpler operations&lt;/strong&gt;. The DigitalOcean Gradient platform describes serverless inference (no infrastructure management) and API access to models hosted by major vendors (OpenAI, Anthropic, etc) as well as managed scalability and security features for open-source models hosted directly in-platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build agentic apps with managed agent features&lt;/strong&gt;. &lt;a href="https://www.digitalocean.com/products/gradient/platform" rel="noopener noreferrer"&gt;DigitalOcean’s Gradient AI Platform&lt;/a&gt; quickstart describes fully managed agents with knowledge bases for retrieval-augmented generation, multi-agent routing, and guardrails.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;DSPy represents a meaningful shift in how modern LLM systems are built. Instead of viewing prompts as static strings, DSPy treats them as components of a larger program composed of signatures, modules, metrics, and control flow. This approach really shines when you graduate from simple completions to authoring tangible application patterns such as ChainOfThought QA, RAG with structured outputs, ReAct-based tool use, and classification pipelines with integrated quality checks.&lt;/p&gt;

&lt;p&gt;The larger point here is that DSPy isn’t simply a playground for prompt engineering. DSPy is a practical foundation for building, validating, iterating, and scaling your LLM systems with more rigor. As engineering teams require better guarantees around reliability, observability, and control over agentic behavior, DSPy will be ready to take on a larger role in production AI stacks. The future will belong to those engineers who build LLM workflows that are modular, testable, and optimization-driven from the start.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dspy.ai/learn/programming/signatures/" rel="noopener noreferrer"&gt;Why should I use a DSPy Signature?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2310.03714" rel="noopener noreferrer"&gt;DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dspy.ai/tutorials/rag/" rel="noopener noreferrer"&gt;Tutorial: Retrieval-Augmented Generation (RAG)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dspy.ai/api/modules/Refine/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;dspy.Refine&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/prompting-with-dspy" rel="noopener noreferrer"&gt;Prompting with DSPy: A New Approach&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>tutorial</category>
      <category>dspy</category>
      <category>ai</category>
    </item>
    <item>
      <title>Tutorial: Build an AI-Powered GPU Fleet Optimizer</title>
      <dc:creator>DigitalOcean</dc:creator>
      <pubDate>Fri, 17 Apr 2026 19:00:00 +0000</pubDate>
      <link>https://dev.to/digitalocean/tutorial-build-an-ai-powered-gpu-fleet-optimizer-8bl</link>
      <guid>https://dev.to/digitalocean/tutorial-build-an-ai-powered-gpu-fleet-optimizer-8bl</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally written by Shamim Raashid (Senior Solutions Architect) and Anish Singh Walla (Senior Technical Content Strategist and Team Lead)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deploy a serverless LangGraph agent&lt;/strong&gt; on the DigitalOcean Gradient AI Platform that monitors your GPU fleet using natural language queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scrape real-time NVIDIA DCGM metrics&lt;/strong&gt; (temperature, power, VRAM, engine utilization) from GPU Droplets over Prometheus-style endpoints on port 9400.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detect idle and underutilized GPUs automatically&lt;/strong&gt; by defining configurable threshold dictionaries that compare live metrics against your baseline workload patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customize the blueprint to your needs:&lt;/strong&gt; Change target Droplet types, adjust idle detection thresholds, enrich the data payload with additional metrics, and add actionable tools like automated power-off commands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduce GPU cloud costs&lt;/strong&gt; by replacing reactive dashboard monitoring with a proactive AI agent that identifies waste the moment it starts.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Managing a GPU fleet in the cloud is a constant balancing act between performance and cost. A single idle GPU Droplet left running overnight can add hundreds of dollars to your monthly bill. Traditional monitoring dashboards surface raw metrics, but they still require a human to interpret whether a machine is “working” or “wasting money.”&lt;/p&gt;

&lt;p&gt;This tutorial walks you through building an AI-powered GPU fleet optimizer using the DigitalOcean Gradient AI Platform and the Agent Development Kit (ADK). You will deploy a serverless, natural-language AI agent that audits your GPU infrastructure in real time, scrapes NVIDIA DCGM (Data Center GPU Manager) metrics like temperature, power draw, VRAM usage, and engine utilization, and flags idle resources before they inflate your cloud bill.&lt;/p&gt;

&lt;p&gt;This blueprint is designed to be forked and customized. By the end of this guide, you will know how to tune the agent's personality and efficiency thresholds, add new monitoring tools, and deploy the agent as a production-ready serverless endpoint.&lt;/p&gt;

&lt;h4&gt;
  
  
  Reference repository
&lt;/h4&gt;

&lt;p&gt;You can view the complete blueprint code here: &lt;a href="https://github.com/dosraashid/do-adk-gpu-monitor" rel="noopener noreferrer"&gt;dosraashid/do-adk-gpu-monitor&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DigitalOcean Account:&lt;/strong&gt; With at least one active GPU Droplet running.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DigitalOcean API Token:&lt;/strong&gt; A Personal Access Token with read permissions and GenAI scopes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradient Model Access Key:&lt;/strong&gt; Generated from the Gradient AI Dashboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.12:&lt;/strong&gt; Recommended for the latest LangGraph and asyncio features.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Familiarity with Python, REST APIs, and Linux command-line basics.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The challenge: “Invisible” cloud waste
&lt;/h2&gt;

&lt;p&gt;When scaling AI workloads, engineering teams often spin up expensive, specialized GPU Droplets (like NVIDIA H100s or H200s) for training or inference tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem: Hidden costs and wasted resources
&lt;/h3&gt;

&lt;p&gt;Once a training script finishes or a model endpoint stops receiving traffic, the Droplet itself remains online and billing by the hour. This creates two compounding issues:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Generic monitoring falls short:&lt;/strong&gt; Standard cloud dashboards typically show host-level metrics like CPU and RAM. A machine learning node might report 1% CPU utilization, but those monitors do not reveal whether the GPU's VRAM is empty or whether the compute engine is completely idle.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Dashboard fatigue:&lt;/strong&gt; Even if you install specialized tools like Grafana to track NVIDIA DCGM metrics, an engineer still has to remember to log in, interpret the charts, and manually map the IP address of an idle node back to a specific cloud resource to shut it down.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbiwytf0raeao1je60mni.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbiwytf0raeao1je60mni.png" alt="A a weary developer looking at a screen while money flies out of the data center server" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: A proactive AI fleet analyst
&lt;/h3&gt;

&lt;p&gt;Instead of waiting for an engineer to check a dashboard, you can build an AI agent that acts as an autonomous infrastructure analyst. &lt;/p&gt;

&lt;p&gt;Using the DigitalOcean Gradient ADK, you will deploy a Large Language Model (LLM) equipped with custom Python tools. When you ask the agent a question like, “Are any of my GPUs wasting money right now?”, it executes a multi-step reasoning loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Discovery:&lt;/strong&gt; Calls the DigitalOcean API to get a live inventory of your Droplets.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Interrogation:&lt;/strong&gt; Pings the NVIDIA DCGM exporter on each node's public IP to read VRAM, temperature, and engine load.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Analysis:&lt;/strong&gt; Runs those raw metrics against a threshold dictionary you define (e.g., “If VRAM usage is below 5% and engine utilization is below 2%, mark this GPU as IDLE”).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Actionable Output:&lt;/strong&gt; Replies in plain English, naming the specific node, its current hourly cost, and the exact metrics proving it is idle.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuiy0rs2lojv908252rar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuiy0rs2lojv908252rar.png" alt="Stressed developer on the left, image of a chatbot providing the solution on the right" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding NVIDIA DCGM metrics for GPU monitoring
&lt;/h2&gt;

&lt;p&gt;NVIDIA Data Center GPU Manager (DCGM) exposes hardware telemetry through a Prometheus-compatible exporter that runs on port 9400. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;What It Measures&lt;/th&gt;
&lt;th&gt;Why It Matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DCGM_FI_DEV_GPU_TEMP&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GPU die temperature in Celsius&lt;/td&gt;
&lt;td&gt;High temperatures indicate active computation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DCGM_FI_DEV_POWER_USAGE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Current power draw in watts&lt;/td&gt;
&lt;td&gt;Idle GPUs draw significantly less power than busy ones.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DCGM_FI_DEV_FB_USED&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Framebuffer (VRAM) memory in use&lt;/td&gt;
&lt;td&gt;Empty VRAM means no models are loaded.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DCGM_FI_DEV_GPU_UTIL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GPU engine utilization percentage&lt;/td&gt;
&lt;td&gt;The most direct indicator of compute work.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can query these metrics directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://&amp;lt;DROPLET_PUBLIC_IP&amp;gt;:9400/metrics | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"DCGM_FI_DEV_GPU_TEMP|DCGM_FI_DEV_POWER_USAGE|DCGM_FI_DEV_FB_USED|DCGM_FI_DEV_GPU_UTIL"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://www.digitalocean.com/resources/articles/ai-agents" rel="noopener noreferrer"&gt;AI agent&lt;/a&gt; in this blueprint automates this scraping across your entire fleet, parses the Prometheus text format, and feeds the structured data into the LLM for analysis. If DCGM is not available on a particular node (for example, because the exporter is not installed or port &lt;code&gt;9400&lt;/code&gt; is blocked by a firewall), the agent falls back to standard CPU and RAM metrics and reports “DCGM Missing” for that node.&lt;/p&gt;

&lt;p&gt;For production deployments, consider pairing DCGM data collection with a full &lt;a href="https://dev.toPrometheus%20and%20Grafana%20monitoring%20stack"&gt;Prometheus and Grafana monitoring stack&lt;/a&gt; for historical trend analysis alongside the AI agent’s real-time assessments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Clone the blueprint and set up your environment
&lt;/h2&gt;

&lt;p&gt;Start with the foundational repository rather than writing everything from scratch.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clone the repo and set up your &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-programming-environment-on-an-ubuntu-22-04-server" rel="noopener noreferrer"&gt;Python environment&lt;/a&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/dosraashid/do-adk-gpu-monitor
&lt;span class="nb"&gt;cd &lt;/span&gt;&lt;span class="k"&gt;do&lt;/span&gt;&lt;span class="nt"&gt;-adk-gpu-monitor&lt;/span&gt;
python3.12 &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Configure your secrets by creating a .env file in the root directory:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;DIGITALOCEAN_API_TOKEN&lt;/span&gt;=&lt;span class="s2"&gt;"your_do_token"&lt;/span&gt;
&lt;span class="n"&gt;GRADIENT_MODEL_ACCESS_KEY&lt;/span&gt;=&lt;span class="s2"&gt;"your_gradient_key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Security note: Never commit &lt;code&gt;.env&lt;/code&gt; files to version control. The repository’s &lt;code&gt;.gitignore&lt;/code&gt; already excludes this file.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Step 2: How it works (the architecture)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fds5p9hftjariheuwthdg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fds5p9hftjariheuwthdg.png" alt="AI Agent LangGraph architecture diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before you customize the blueprint, it helps to understand the data flow inside the code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User Prompt&lt;/strong&gt;: You ask the agent a question via the &lt;code&gt;/run&lt;/code&gt; endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph State&lt;/strong&gt;: The agent checks its conversation memory &lt;code&gt;(thread_id)&lt;/code&gt; via &lt;code&gt;MemorySaver&lt;/code&gt;, which enables multi-turn follow-up questions within the same session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Execution&lt;/strong&gt;: The LLM decides to call &lt;code&gt;@tool def analyze_gpu_fleet()&lt;/code&gt; defined in &lt;code&gt;main.py&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Scraping&lt;/strong&gt;: &lt;code&gt;analyzer.py&lt;/code&gt; uses Python’s &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; to query the DigitalOcean API and each Droplet’s DCGM endpoint &lt;code&gt;(metrics.py)&lt;/code&gt; concurrently. This parallel approach prevents network bottlenecks when monitoring dozens of nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Omniscient Payload&lt;/strong&gt;: The analyzer packages all raw data (temperature, power, VRAM, RAM, CPU, cost) into a structured JSON dictionary that the LLM can reason about.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesis&lt;/strong&gt;: The LLM reads the JSON payload and responds in natural language with specific node names, costs, and actionable recommendations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to learn more about building stateful AI agents with LangGraph, follow the &lt;a href="https://www.digitalocean.com/community/tutorials/getting-started-agentic-ai-langgraph" rel="noopener noreferrer"&gt;Getting Started with Agentic AI Using LangGraph tutorial&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Customizing the blueprint to your needs
&lt;/h2&gt;

&lt;p&gt;This repository is built to be forked and modified. Here are the four main areas you should adjust to match your organization’s requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Customization 1: Tuning the logic (config.py)
&lt;/h3&gt;

&lt;p&gt;Open &lt;code&gt;config.py.&lt;/code&gt; This is the control center for your agent’s behavior.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Persona&lt;/strong&gt;: Edit &lt;code&gt;AGENT_SYSTEM_PROMPT&lt;/code&gt; to change how the AI communicates. For a highly technical DevOps assistant, remove the emojis and instruct it to output raw bullet points. For a management-facing report, tell it to summarize in cost terms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Thresholds&lt;/strong&gt;: The blueprint considers a GPU “Idle” when utilization falls below 2% by default. If your baseline workloads idle at a higher percentage, adjust the &lt;code&gt;THRESHOLDS&lt;/code&gt; dictionary:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;THRESHOLDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_temp_c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;82.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_util_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;95.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_vram_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;95.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idle_util_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idle_vram_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optimized_util_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;40.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optimized_vram_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;50.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idle_cpu_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idle_ram_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;15.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idle_load_15&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;starved_cpu_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;85.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;starved_ram_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;90.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optimized_cpu_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;40.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optimized_ram_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;50.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, if your inference servers typically idle at 8% GPU utilization between request bursts, set idle_util_percent to 10.0 to avoid false positives.&lt;/p&gt;

&lt;h3&gt;
  
  
  Customization 2: Changing the target infrastructure (analyzer.py)
&lt;/h3&gt;

&lt;p&gt;By default, the blueprint only scans Droplets with &lt;code&gt;"gpu"&lt;/code&gt; in the &lt;code&gt;size_slug&lt;/code&gt; to reduce unnecessary API calls. Open &lt;code&gt;analyzer.py&lt;/code&gt; and locate the slug filter. If you want the agent to monitor CPU-optimized or standard Droplets, modify this line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Change "gpu" to "c-" for CPU-Optimized, or remove the filter entirely to scan all Droplets.
&lt;/span&gt;&lt;span class="n"&gt;target_droplets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_droplets&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size_slug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Customization 3: Enriching the omniscient payload (analyzer.py and metrics.py)
&lt;/h3&gt;

&lt;p&gt;The LLM only knows what you explicitly pass to it. The default payload includes temperature, power, and VRAM data. If you install &lt;a href="https://prometheus.io/docs/guides/node-exporter/" rel="noopener noreferrer"&gt;Prometheus Node Exporter&lt;/a&gt; on your instances and want the AI to also analyze disk space, you would:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Update &lt;code&gt;metrics.py&lt;/code&gt; to scrape disk metrics from Node Exporter on port &lt;code&gt;9100&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Update the return dictionary at the bottom of &lt;code&gt;process_single_droplet&lt;/code&gt; in &lt;code&gt;analyzer.py&lt;/code&gt; to include the new field:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;droplet_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;droplet_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu_temp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;temp_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu_power&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;power_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vram_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;vram_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;disk_space_free_gb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;disk_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# New metric
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Customization 4: Adding actionable tools (main.py)
&lt;/h3&gt;

&lt;p&gt;The default blueprint is read-only. The most powerful upgrade is giving the AI permission to act on your infrastructure. In &lt;code&gt;main.py&lt;/code&gt;, you can add a new function with the &lt;code&gt;@tool&lt;/code&gt; decorator that uses the DigitalOcean API to power off a specific Droplet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;power_off_droplet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;droplet_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Power off a Droplet by ID. Use only when the user explicitly asks to stop an idle node.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

    &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DIGITALOCEAN_API_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.digitalocean.com/v2/droplets/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;droplet_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/actions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;power_off&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Successfully sent power-off command to Droplet &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;droplet_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to power off Droplet &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;droplet_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After adding any new tools, bind them to the LLM so the agent can invoke them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;llm_with_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind_tools&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;analyze_gpu_fleet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;power_off_droplet&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Warning&lt;/strong&gt;: Giving an AI agent write access to your infrastructure requires careful guardrails. Consider adding confirmation prompts, restricting which Droplet tags the agent can act on, and logging all actions for audit purposes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Step 4: Testing your custom agent
&lt;/h2&gt;

&lt;p&gt;Once you have tailored the code, test it locally before deploying. Start the local development server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gradient agent run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a separate terminal, simulate user requests using curl.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jpmyvirmaagjrtig3kd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jpmyvirmaagjrtig3kd.png" alt="Agent testing workflow diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 1: Deep diagnostic
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/run &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
           "prompt": "Give me a full diagnostic on my GPU nodes including temperature and power.",
           "thread_id": "audit-session-1"
         }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expected Output&lt;/strong&gt;: The AI uses the Omniscient Payload to report exact temperatures, wattage, and RAM utilization for each GPU Droplet, alongside cost-saving recommendations for any idle nodes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 2: Contextual memory
&lt;/h3&gt;

&lt;p&gt;Because you are passing &lt;code&gt;thread_id: "audit-session-1"&lt;/code&gt;, the agent retains conversation context. You can ask follow-up questions without triggering a full re-scan of your infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/run &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
           "prompt": "Which of those nodes was the most expensive?",
           "thread_id": "audit-session-1"
         }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 3: Thread isolation
&lt;/h3&gt;

&lt;p&gt;The memory is strictly scoped by &lt;code&gt;thread_id&lt;/code&gt;. A request with a different thread ID sees no prior history and starts a fresh conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/run &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
           "prompt": "What was the second question I asked you?",
           "thread_id": "audit-session-2"
         }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expected Output&lt;/strong&gt;: The agent responds that it has no record of previous questions in this session, confirming that thread isolation is working correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Cloud deployment:
&lt;/h2&gt;

&lt;p&gt;Once you are satisfied with your customizations, deploy the agent as a serverless endpoint on the DigitalOcean Gradient AI Platform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gradient agent deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will receive a public endpoint URL that you can integrate into Slack bots, internal dashboards, &lt;a href="https://www.digitalocean.com/solutions/cicd-pipelines" rel="noopener noreferrer"&gt;CI/CD pipelines&lt;/a&gt;, or any HTTP client. The Gradient platform handles scaling, so your agent can serve multiple concurrent users without manual infrastructure management.&lt;/p&gt;

&lt;p&gt;For more details on building and deploying agents with the ADK, see &lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/how-to/build-agents-using-adk/" rel="noopener noreferrer"&gt;How to Build Agents Using ADK&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPU fleet cost optimization: When to use an AI agent vs. static dashboards
&lt;/h3&gt;

&lt;p&gt;One of the most common questions teams face when setting up &lt;a href="https://www.digitalocean.com/community/tutorials/monitoring-gpu-utilization-in-real-time" rel="noopener noreferrer"&gt;GPU monitoring&lt;/a&gt; is whether to build a custom AI agent or rely on traditional dashboard tooling. The right choice depends on your fleet size, the complexity of your workloads, and how quickly you need to act on idle resources.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Static Dashboards (Grafana + Prometheus)&lt;/th&gt;
&lt;th&gt;AI Agent (This Blueprint)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate: requires Prometheus server, Grafana, and DCGM exporter configuration&lt;/td&gt;
&lt;td&gt;Low: clone the repo, set env vars, deploy with &lt;code&gt;gradient agent deploy&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time alerting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rule-based alerts with fixed thresholds&lt;/td&gt;
&lt;td&gt;Natural language queries with adaptive reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-metric correlation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual: you visually compare multiple charts&lt;/td&gt;
&lt;td&gt;Automatic: the LLM correlates temperature, power, VRAM, and cost in a single response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Actionability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Read-only dashboards; separate automation needed&lt;/td&gt;
&lt;td&gt;Extensible with &lt;code&gt;@tool&lt;/code&gt; decorator for direct API actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Conversational follow-ups&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Built-in via LangGraph &lt;code&gt;MemorySaver&lt;/code&gt; and &lt;code&gt;thread_id&lt;/code&gt; scoping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large teams with dedicated SRE/DevOps staff and historical trend analysis&lt;/td&gt;
&lt;td&gt;Small-to-mid teams that need fast, conversational GPU auditing without building dashboard infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For teams running fewer than 20 GPU Droplets, the AI agent approach eliminates the overhead of maintaining a full monitoring stack while still providing actionable insights. For larger fleets, consider running both: use &lt;a href="https://www.digitalocean.com/community/developer-center/setting-up-monitoring-for-digitalocean-managed-databases-with-prometheus-and-grafana" rel="noopener noreferrer"&gt;Prometheus and Grafana&lt;/a&gt; for long-term trend storage and the AI agent for on-demand, conversational diagnostics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advantages and tradeoffs
&lt;/h2&gt;

&lt;p&gt;When adapting this blueprint for production, keep these architectural considerations in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Contextual intelligence&lt;/strong&gt;: LangGraph’s &lt;code&gt;MemorySaver&lt;/code&gt; gives the agent conversation history, allowing natural drill-down investigations. You can ask “Which node is idle?” followed by “How much is it costing me per hour?” without repeating context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel processing&lt;/strong&gt;: The analyzer uses Python’s &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; to scan dozens of Droplets concurrently, preventing the LLM from timing out while waiting for sequential network calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost justification&lt;/strong&gt;: If the AI agent spots a single idle $500/month GPU instance, it pays for itself many times over. The inference cost of running a single diagnostic query on the Gradient platform is negligible compared to the savings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graceful degradation&lt;/strong&gt;: If the DCGM metric scraper cannot reach port &lt;code&gt;9400&lt;/code&gt; (for example, because of firewall rules or the exporter not being installed), the agent reports “DCGM Missing” for that node and falls back to standard CPU and RAM metrics rather than failing entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security considerations&lt;/strong&gt;: The agent requires a DigitalOcean API token with read permissions. If you add write tools (like the &lt;code&gt;power_off_droplet&lt;/code&gt; example), scope the token’s permissions carefully and implement audit logging.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You have successfully deployed a multi-tool AI agent using the DigitalOcean Gradient AI Platform that transforms raw infrastructure metrics into conversational, actionable intelligence. By combining DigitalOcean API data with real-time NVIDIA DCGM telemetry and an LLM reasoning engine, you have built a system that addresses three major operational challenges:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Stopping the silent budget drain
&lt;/h3&gt;

&lt;p&gt;The most immediate value this agent delivers is catching “forgotten resources.” When engineers spin up GPU Droplets for experiments or temporary training runs, those instances often continue billing long after the work is done. Standard CPU monitors might show background processes at 1%, making the instance look active.&lt;/p&gt;

&lt;p&gt;By querying the NVIDIA DCGM exporter directly for engine and VRAM utilization, the AI agent cuts through that noise. It identifies premium GPU nodes that are doing no meaningful compute work, letting you stop the financial drain before it compounds.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Eliminating dashboard fatigue
&lt;/h3&gt;

&lt;p&gt;In a traditional workflow, diagnosing a cloud infrastructure issue means opening the DigitalOcean Control Panel to check Droplet status, switching to Grafana to review DCGM metrics, and consulting an architecture diagram to remember what each node is responsible for.&lt;/p&gt;

&lt;p&gt;This agent consolidates that entire workflow. Using LangGraph’s conversational memory and the Omniscient Payload, you ask a single question and receive a complete summary of host details, GPU temperature, power usage, and cost impact in one response.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Bridging observability and action
&lt;/h3&gt;

&lt;p&gt;Traditional dashboards are read-only. They can alert you that a resource is idle, but they do not provide the tools to act on that information.&lt;/p&gt;

&lt;p&gt;Because this blueprint is built on the Gradient ADK, the agent is inherently extensible. By adding a few lines of Python using the &lt;code&gt;@tool decorator&lt;/code&gt;, you can upgrade this agent from a passive monitor into an active operator that executes API commands to power off idle nodes, resize underutilized instances, or trigger scaling events automatically.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/dosraashid/do-adk-gpu-monitor" rel="noopener noreferrer"&gt;do-adk-gpu-monitor&lt;/a&gt; repository is your starting point. Clone the code, adjust the efficiency thresholds to match your specific workloads, and start having conversations with your infrastructure today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference and resources
&lt;/h2&gt;

&lt;p&gt;Ready to take your GPU fleet management and AI agent development further? Explore these resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/" rel="noopener noreferrer"&gt;DigitalOcean Gradient AI Platform Documentation&lt;/a&gt;&lt;/strong&gt;: Full reference for deploying and managing AI agents, models, and inference endpoints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.digitalocean.com/products/gradient-ai-platform/how-to/build-agents-using-adk/" rel="noopener noreferrer"&gt;How to Build Agents Using ADK&lt;/a&gt;&lt;/strong&gt;: Step-by-step guide to creating custom agents with the Agent Development Kit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://dev.tourl"&gt;Getting Started with Agentic AI Using LangGraph&lt;/a&gt;&lt;/strong&gt;: Learn the fundamentals of building stateful, multi-step AI agents with LangGraph.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/stable-diffusion-gpu-droplet" rel="noopener noreferrer"&gt;Stable Diffusion on DigitalOcean GPU Droplets&lt;/a&gt;&lt;/strong&gt;: Run GPU-accelerated AI workloads on DigitalOcean GPU Droplets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/harnessing-gpus-glb-vpc-for-genai-products" rel="noopener noreferrer"&gt;Scaling Gradient with GPU Droplets and Networking&lt;/a&gt;&lt;/strong&gt;: Architect production GenAI deployments with GPU Droplets, global load balancers, and VPC networking.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>gpu</category>
      <category>nvidia</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
