<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hedi Manai</title>
    <description>The latest articles on DEV Community by Hedi Manai (@hedimanai).</description>
    <link>https://dev.to/hedimanai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3921983%2F211b83f8-97df-4d81-868e-b3171d62fa23.jpg</url>
      <title>DEV Community: Hedi Manai</title>
      <link>https://dev.to/hedimanai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hedimanai"/>
    <language>en</language>
    <item>
      <title>ToolOps: Stop Rewriting the Same Boilerplate Every Time You Build an AI Agent</title>
      <dc:creator>Hedi Manai</dc:creator>
      <pubDate>Sat, 09 May 2026 23:04:33 +0000</pubDate>
      <link>https://dev.to/hedimanai/toolops-stop-rewriting-the-same-boilerplate-every-time-you-build-an-ai-agent-363b</link>
      <guid>https://dev.to/hedimanai/toolops-stop-rewriting-the-same-boilerplate-every-time-you-build-an-ai-agent-363b</guid>
      <description>&lt;p&gt;You've built the demo. It works. The LLM responds, the tools fire, the output looks great.&lt;/p&gt;

&lt;p&gt;Then you push it to production — and everything breaks.&lt;/p&gt;

&lt;p&gt;API calls fail with no retry logic. Identical queries hammer your LLM endpoint ten times per minute, burning through credits. A single bad response cascades into an agent loop. You have no idea what's happening inside because there's nothing to look at.&lt;/p&gt;

&lt;p&gt;So you start writing infrastructure. A retry decorator here. A cache manager there. A circuit-breaker wrapper you found on Stack Overflow. Eighty lines of boilerplate — just to make one tool call production-safe.&lt;/p&gt;
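&lt;p&gt;For reference, a condensed version of that hand-rolled layer (a generic asyncio sketch, not ToolOps code) looks like this:&lt;/p&gt;

```python
import asyncio
import functools
import time

def with_retry_and_cache(ttl=3600, retries=3, backoff=0.5):
    """A condensed version of the wrapper many teams rewrite per project."""
    cache = {}  # key: (stored_at, value)

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args):  # sketch ignores kwargs for brevity
            key = (fn.__name__, args)
            hit = cache.get(key)
            # Serve from cache while the entry is younger than ttl
            if hit is not None and ttl > time.monotonic() - hit[0]:
                return hit[1]
            last_exc = None
            for attempt in range(retries):
                try:
                    value = await fn(*args)
                    cache[key] = (time.monotonic(), value)
                    return value
                except Exception as exc:  # real code narrows this
                    last_exc = exc
                    # Exponential backoff between attempts
                    await asyncio.sleep(backoff * 2 ** attempt)
            raise last_exc
        return wrapper
    return decorator
```

&lt;p&gt;Now imagine maintaining a copy of that, plus circuit breaking and tracing, in every project.&lt;/p&gt;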

&lt;p&gt;&lt;strong&gt;This is the problem ToolOps was built to solve.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;The Production Gap Nobody Talks About&lt;/h2&gt;

&lt;p&gt;Building AI agents has never been easier. Frameworks like LangChain, CrewAI, and LlamaIndex get you from idea to working prototype in an afternoon. But moving that prototype to production exposes a gap that frameworks don't fill: &lt;strong&gt;the reliability, cost, and observability layer&lt;/strong&gt; that every real agent needs.&lt;/p&gt;

&lt;p&gt;Every external call your agent makes — to an LLM, an API, a database — is a tool call. In production, those calls are expensive, slow, and unreliable. Without proper infrastructure around them, you're flying blind.&lt;/p&gt;

&lt;p&gt;Most developers solve this by copy-pasting the same boilerplate across every project. ToolOps solves it with a single decorator.&lt;/p&gt;




&lt;h2&gt;What ToolOps Actually Does&lt;/h2&gt;

&lt;p&gt;ToolOps is a framework-agnostic middleware SDK for Python. It sits between your agent and the external world, wrapping any async function with caching, retries, circuit breakers, request coalescing, and observability — without touching your business logic.&lt;/p&gt;

&lt;p&gt;The core idea is elegant:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before: 80+ lines of custom infrastructure
# After:
&lt;/span&gt;
&lt;span class="nd"&gt;@readonly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retry_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_market_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One decorator. Your function is now cached for an hour, automatically retried on failure, and fully traced. That's the entire API surface for most use cases.&lt;/p&gt;




&lt;h2&gt;Two Decorators, Every Case Covered&lt;/h2&gt;

&lt;p&gt;ToolOps makes a clean architectural distinction between two types of tool calls:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;@readonly&lt;/code&gt;&lt;/strong&gt; — for functions that read data. API lookups, database queries, LLM calls, file reads. These get full caching + retry support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;@sideeffect&lt;/code&gt;&lt;/strong&gt; — for functions that write or act. Sending emails, executing trades, posting messages. These are never cached (you genuinely want them to run), but they're protected by retries and circuit breakers.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Read: cache it, retry it, trace it
&lt;/span&gt;&lt;span class="nd"&gt;@readonly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retry_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stale_if_error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_stock_price&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;market_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Write: protect it, but always execute it
&lt;/span&gt;&lt;span class="nd"&gt;@sideeffect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;circuit_breaker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retry_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_trade&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;broker_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separation is intentional and surprisingly useful. It forces you to think clearly about what your agent is actually doing — and gives each class of operation exactly the protection it needs.&lt;/p&gt;




&lt;h2&gt;The Features That Matter in Production&lt;/h2&gt;

&lt;h3&gt;Semantic Caching&lt;/h3&gt;

&lt;p&gt;Standard caches match on exact strings. &lt;code&gt;"weather in Paris"&lt;/code&gt; and &lt;code&gt;"Paris weather"&lt;/code&gt; hit different cache keys, so your LLM gets called twice for the same answer.&lt;/p&gt;

&lt;p&gt;ToolOps includes a semantic cache that matches by &lt;strong&gt;meaning&lt;/strong&gt; using vector embeddings. Queries above a configurable similarity threshold share the same cached result:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformerEmbedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cache_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SemanticCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.92&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nd"&gt;@readonly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Three prompts, one real LLM call:
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ask_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the latest AI news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ask_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Give me a summary of recent AI news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# Cache hit ✅
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ask_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s happening in AI recently?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# Cache hit ✅
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For agents that handle natural language queries, this can cut LLM calls by up to 90%.&lt;/p&gt;
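&lt;p&gt;Under the hood, a semantic cache is a nearest-neighbor lookup with a similarity threshold. Here is a toy illustration of just the matching step, with a trivial bag-of-words embedding standing in for a real sentence model (illustrative only, not ToolOps internals):&lt;/p&gt;

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; a real cache uses a sentence model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class TinySemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_value)

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        # Only reuse a result when similarity clears the threshold
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, value):
        self.entries.append((embed(query), value))
```

&lt;p&gt;The real implementation swaps the toy embedding for model vectors, but the threshold-gated lookup is the essential idea.&lt;/p&gt;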

&lt;h3&gt;Request Coalescing&lt;/h3&gt;

&lt;p&gt;When 50 concurrent agents request the same data during a cache miss, ToolOps fires &lt;strong&gt;one&lt;/strong&gt; real API call and returns the result to all 50. Without this, a thundering herd can overwhelm your API rate limits instantly. With it, the problem simply doesn't exist.&lt;/p&gt;
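&lt;p&gt;The pattern behind coalescing (sometimes called single-flight) can be sketched in plain asyncio: concurrent callers for the same key all await one shared in-flight task. A generic illustration, not ToolOps internals:&lt;/p&gt;

```python
import asyncio

class SingleFlight:
    """Deduplicate concurrent calls for the same key into one real call."""
    def __init__(self):
        self._inflight = {}  # key: running Task

    async def do(self, key, fn):
        task = self._inflight.get(key)
        if task is None:
            # First caller starts the real work...
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task
            # ...and the entry is removed once the task finishes
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        # Every caller, first or not, awaits the same task
        return await task
```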

&lt;h3&gt;Stale-If-Error Fallback&lt;/h3&gt;

&lt;p&gt;If your upstream API goes down, ToolOps can serve the last known good cached value instead of throwing an exception. For slowly changing data like exchange rates or configuration, this is often exactly the right behavior:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@readonly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cache_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;stale_if_error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;stale_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Accept data up to 24 hours old if the API is down
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_exchange_rates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;forex_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
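&lt;p&gt;The fallback logic itself is straightforward: keep entries past their fresh TTL and reach for them only when the live call fails. A generic synchronous sketch of the idea:&lt;/p&gt;

```python
import time

class StaleCache:
    """Fresh entries are served directly; expired entries survive as a
    fallback until stale_ttl, used only when the loader raises."""
    def __init__(self, fresh_ttl, stale_ttl):
        self.fresh_ttl = fresh_ttl
        self.stale_ttl = stale_ttl
        self.store = {}  # key: (stored_at, value)

    def fetch(self, key, loader):
        entry = self.store.get(key)
        now = time.monotonic()
        if entry is not None and self.fresh_ttl > now - entry[0]:
            return entry[1]  # fresh hit
        try:
            value = loader()
            self.store[key] = (now, value)
            return value
        except Exception:
            # Upstream is down: serve stale data if still within stale_ttl
            if entry is not None and self.stale_ttl > now - entry[0]:
                return entry[1]
            raise
```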



&lt;h3&gt;Multiple Cache Backends&lt;/h3&gt;

&lt;p&gt;Register as many backends as you need and route different functions to the right one:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MemoryCache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Development, single-process, low-latency hot data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;FileCache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local scripts, lightweight persistence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PostgresCache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Production, distributed, durable across restarts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SemanticCache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;NLP queries, RAG pipelines, LLM cost reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A hot-cold cache pattern — in-memory for frequent reads, Postgres for expensive computations — is a single configuration call.&lt;/p&gt;
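&lt;p&gt;Conceptually, a hot-cold tier is an ordered lookup with promotion to the faster tier. Here is an illustrative sketch with two plain dicts standing in for the memory and Postgres backends:&lt;/p&gt;

```python
class TieredCache:
    """Check the hot tier first; on a cold hit, promote the value."""
    def __init__(self, hot, cold):
        self.hot = hot    # fast, volatile (stand-in for MemoryCache)
        self.cold = cold  # durable (stand-in for PostgresCache)

    def get(self, key):
        if key in self.hot:
            return self.hot[key]
        if key in self.cold:
            value = self.cold[key]
            self.hot[key] = value  # promote for future low-latency reads
            return value
        return None

    def put(self, key, value):
        self.hot[key] = value
        self.cold[key] = value
```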

&lt;h3&gt;Built-In Observability&lt;/h3&gt;

&lt;p&gt;Every cache hit, miss, retry, timeout, and circuit-breaker event is logged as structured JSON, compatible with Datadog, Loki, CloudWatch, and any log aggregator. Add the &lt;code&gt;[otel]&lt;/code&gt; extra and you get full OpenTelemetry tracing and Prometheus metrics with zero extra code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agent_run (450ms)
  ├── get_market_data (12ms)  [cache: hit]
  ├── get_news_feed (310ms)   [cache: miss, retries: 1]
  └── send_report (128ms)     [circuit: closed]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Going from zero insight to full distributed tracing takes about five lines.&lt;/p&gt;
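&lt;p&gt;Structured logging itself is just one JSON object per event, so any aggregator can index the fields. A minimal sketch of the shape (hypothetical field names, not the exact ToolOps schema):&lt;/p&gt;

```python
import json
import time

def log_tool_event(tool, event, **fields):
    """Emit one machine-parseable JSON line per tool-call event."""
    record = {"ts": time.time(), "tool": tool, "event": event, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line

log_tool_event("get_market_data", "cache_hit", latency_ms=12)
log_tool_event("get_news_feed", "retry", attempt=1, reason="timeout")
```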




&lt;h2&gt;Framework Agnostic by Design&lt;/h2&gt;

&lt;p&gt;ToolOps wraps plain Python async functions. That means it works with whatever agent framework you're using — no special integration required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangChain / LangGraph&lt;/strong&gt; — stack &lt;code&gt;@readonly&lt;/code&gt; under &lt;code&gt;@tool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CrewAI&lt;/strong&gt; — apply it directly to &lt;code&gt;BaseTool._run()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LlamaIndex&lt;/strong&gt; — decorate then pass to &lt;code&gt;FunctionTool.from_defaults()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP&lt;/strong&gt; — generate a fully typed MCP tool definition with &lt;code&gt;MCPIntegration.to_mcp_definition()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PydanticAI, Agno, AutoGPT, Haystack&lt;/strong&gt; — any framework that calls Python async functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you migrate frameworks (and you will), your infrastructure layer stays the same.&lt;/p&gt;




&lt;h2&gt;Getting Started in Under 2 Minutes&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Install:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"toolops[all]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verify:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;toolops doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;toolops&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;readonly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_manager&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;toolops.cache&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemoryCache&lt;/span&gt;

&lt;span class="n"&gt;cache_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;MemoryCache&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;is_default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@readonly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retry_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;weather_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The modular install system means zero required external dependencies for the core package. Add &lt;code&gt;[postgres]&lt;/code&gt;, &lt;code&gt;[semantic]&lt;/code&gt;, or &lt;code&gt;[otel]&lt;/code&gt; only when you need them.&lt;/p&gt;

&lt;p&gt;The CLI (&lt;code&gt;toolops stats&lt;/code&gt;, &lt;code&gt;toolops clear&lt;/code&gt;, &lt;code&gt;toolops doctor&lt;/code&gt;) gives you a live view into cache hit rates, latency, and backend health without touching your code.&lt;/p&gt;




&lt;h2&gt;Why This Matters Now&lt;/h2&gt;

&lt;p&gt;AI agents are moving fast from demos to production. The infrastructure gap between "it works on my machine" and "it's running reliably at scale" is real, and it's expensive to rebuild from scratch every time.&lt;/p&gt;

&lt;p&gt;ToolOps is a clean answer to a problem that every agent developer hits eventually. It's not a framework — it's the layer beneath your framework, the one that makes your tools trustworthy.&lt;/p&gt;

&lt;p&gt;The code is open source, Apache 2.0 licensed, and actively maintained.&lt;/p&gt;

&lt;p&gt;If you're building agents that need to survive real traffic, real failures, and real costs, it's worth ten minutes of your time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/hedimanai-pro/toolops" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt; · &lt;strong&gt;&lt;a href="https://pypi.org/project/toolops/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;&lt;/strong&gt; · &lt;strong&gt;&lt;a href="https://hedimanai.vercel.app/projects/toolops.html" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
