<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pratik Pathak</title>
    <description>The latest articles on DEV Community by Pratik Pathak (@pratikpathak).</description>
    <link>https://dev.to/pratikpathak</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F602830%2F664eea36-3e68-40f5-b284-c40d635debd5.jpg</url>
      <title>DEV Community: Pratik Pathak</title>
      <link>https://dev.to/pratikpathak</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pratikpathak"/>
    <language>en</language>
    <item>
      <title>The Real Difference Between Azure OpenAI and the Standard API</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Fri, 24 Apr 2026 03:30:00 +0000</pubDate>
      <link>https://dev.to/pratikpathak/the-real-difference-between-azure-openai-and-the-standard-api-29f9</link>
      <guid>https://dev.to/pratikpathak/the-real-difference-between-azure-openai-and-the-standard-api-29f9</guid>
      <description>&lt;p&gt;Azure OpenAI Service is increasingly becoming a critical decision point for enterprise teams. Artificial Intelligence has come a long way, and today, tools like ChatGPT, GPT-4, and DALL-E are helping developers, students, and businesses every day. But here’s a common question I hear people ask: “What’s the difference between OpenAI and Azure OpenAI?” If you’ve ever wondered which one to use, or if the Azure wrapper is worth the cloud overhead, let’s break it down.&lt;/p&gt;

&lt;p&gt;I decided to dig deep into the architectural differences to see how much of a technical edge Azure OpenAI actually gives over just hitting the standard OpenAI API. Spoiler alert: OpenAI gives you the model, but Azure OpenAI gives you the model plus an entire enterprise cloud ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Architectural Differences
&lt;/h2&gt;

&lt;p&gt;At first glance, hitting the direct OpenAI API feels identical to the Azure endpoint. You pass your payload, and you get your tokens back. However, the infrastructure layer is entirely different.&lt;/p&gt;

&lt;p&gt;OpenAI (via OpenAI.com or their direct API) hosts its models on its own proprietary compute instances. It’s built for rapid iteration and developer access. Azure OpenAI, on the other hand, runs the exact same foundational models (GPT-4o, DALL-E 3, Whisper) but hosts them inside Microsoft Azure, scoped to the region and network boundary of your own subscription.&lt;/p&gt;

&lt;p&gt;The models themselves are mathematically identical. The difference lies entirely in the infrastructure, data residency, and compliance wrapper.&lt;/p&gt;
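
&lt;p&gt;For reference, here is a minimal sketch of a direct-API call (assuming the &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable is set). Keep it in mind for comparison with the Azure version below; note that you address the model by its global name rather than a deployment:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
from openai import OpenAI

# Direct API: one shared endpoint, global model names
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4o",  # the model name itself, not a custom deployment
    messages=[{"role": "user", "content": "Explain VNet integration."}],
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;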

&lt;h3&gt;
  
  
  Network Isolation &amp;amp; Security
&lt;/h3&gt;

&lt;p&gt;This is usually the dealbreaker for enterprise deployments. With the direct OpenAI API, your data travels over the public internet to OpenAI’s servers. While they have strict privacy policies (API data isn’t used for training by default), the network path is public.&lt;/p&gt;

&lt;p&gt;Azure OpenAI allows you to use Azure Virtual Networks (VNet) and Azure Private Link. This means your application can communicate with the AI models entirely within the Microsoft backbone network. Your traffic never hits the public internet. If you want to dive deeper into the official setup, you can read more in the &lt;a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/overview" rel="noopener noreferrer"&gt;official Microsoft documentation&lt;/a&gt;. Let’s look at how a basic Python integration looks when hitting an Azure endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AzureOpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AzureOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AZURE_OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  
    &lt;span class="n"&gt;api_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-04-01-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;azure_endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AZURE_OPENAI_ENDPOINT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-deployment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Notice this is a custom deployment name, not just the model name
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a technical assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain VNet integration.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Data Residency and Compliance
&lt;/h2&gt;

&lt;p&gt;Why did I decide to prioritize Azure for production workloads? Simply put: data residency. When you deploy an instance of Azure OpenAI, you select a specific geographic region (e.g., East US, West Europe). All prompts, completions, and fine-tuning data are stored within that specific region.&lt;/p&gt;

&lt;p&gt;Direct OpenAI doesn’t give you this granular geographical control. Furthermore, Azure OpenAI inherits all of Microsoft’s compliance certifications, including HIPAA, SOC 2, and ISO 27001. If you’re building in healthcare or finance, this isn’t just a nice-to-have; it’s a hard requirement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Identity and Access Management (IAM)
&lt;/h2&gt;

&lt;p&gt;OpenAI uses standard API keys. If a key leaks, anyone can use it until it’s revoked. Azure OpenAI natively integrates with Microsoft Entra ID (formerly Azure AD). This allows for Role-Based Access Control (RBAC).&lt;/p&gt;

&lt;p&gt;Instead of hardcoding API keys, your application can authenticate to Azure OpenAI using Managed Identities, removing long-lived secrets from your codebase entirely.&lt;/p&gt;

&lt;p&gt;Here is what authenticating via Azure DefaultAzureCredential looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;azure.identity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DefaultAzureCredential&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AzureOpenAI&lt;/span&gt;

&lt;span class="n"&gt;credential&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DefaultAzureCredential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;credential&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://cognitiveservices.azure.com/.default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AzureOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;azure_endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://my-custom-endpoint.openai.azure.com/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;azure_ad_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-04-01-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Content Filtering and Responsible AI
&lt;/h2&gt;

&lt;p&gt;Another massive difference is the Azure AI Content Safety layer. While OpenAI has baseline moderation, Azure OpenAI lets you create custom content filters. You can configure the exact severity thresholds (Low, Medium, High) for categories like hate speech, sexual content, violence, and self-harm. You can even create custom blocklists for specific industry terms.&lt;/p&gt;
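
&lt;p&gt;On the API side, a tripped filter surfaces as an HTTP 400 response with the error code &lt;code&gt;content_filter&lt;/code&gt;. Here is a minimal sketch of handling it with the Python SDK (reusing the &lt;code&gt;client&lt;/code&gt; from the earlier snippet; treat the exact error fields as an assumption to verify against your SDK version):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import openai

try:
    response = client.chat.completions.create(
        model="gpt-4o-deployment",  # your custom deployment name, as above
        messages=[{"role": "user", "content": "Some user-supplied text"}],
    )
except openai.BadRequestError as e:
    # Azure signals a filtered prompt with the "content_filter" error code
    if e.code == "content_filter":
        print("Request blocked by the configured Azure content filter.")
    else:
        raise
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;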

&lt;h2&gt;
  
  
  Pros, Cons, and Trade-offs
&lt;/h2&gt;

&lt;h3&gt;Azure OpenAI Service&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Enterprise security (VNet, Private Link), strict data residency, Managed Identities via Entra ID, customizable content filtering, backed by an Azure SLA.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt; Can be slightly slower to receive the newest model versions from OpenAI. Requires navigating the complex Azure portal.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;OpenAI Direct API&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Immediate access to the latest models on day one. Extremely simple to set up and start coding. Lower barrier to entry for solo developers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt; Lacks enterprise VNet isolation. Less granular control over geographic data residency. API keys are harder to secure at scale.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;For side projects, hackathons, or general scripting, I’ll still reach for the direct OpenAI API. It’s frictionless. But if I’m building an AI agent that touches PII, requires strict compliance, or lives inside a corporate network, Azure OpenAI Service is the only logical choice. You get the brilliance of GPT-4o with the fortress of Microsoft Azure.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>aicompliance</category>
      <category>aisecurity</category>
      <category>apimanagement</category>
    </item>
    <item>
      <title>I Run Code AI Locally, Fully Offline, and Pay $0 in Subscriptions</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Thu, 23 Apr 2026 06:25:08 +0000</pubDate>
      <link>https://dev.to/pratikpathak/how-to-run-offline-code-ai-locally-complete-guide-2026-443k</link>
      <guid>https://dev.to/pratikpathak/how-to-run-offline-code-ai-locally-complete-guide-2026-443k</guid>
      <description>&lt;p&gt;I was working on a sensitive client architecture last week, sitting in a coffee shop with spotty Wi-Fi, when my IDE suddenly crawled to a halt. My cloud-based AI coding assistant could not connect to its API. It was in that frustrating moment that I realized relying entirely on cloud-hosted LLMs for daily engineering tasks is a single point of failure. Why are we sending every keystroke, every proprietary function, and every sensitive database schema over the internet when modern laptops have enough compute to run these models natively?&lt;/p&gt;

&lt;p&gt;That is when I decided to fully explore the world of &lt;strong&gt;offline code AI&lt;/strong&gt;. The ecosystem has matured incredibly fast in 2026. You no longer need a massive GPU server rack to run a competent coding assistant locally. If you have an Apple Silicon Mac (M1/M2/M3/M4) or a Windows machine with a decent dedicated GPU, you can run powerful code generation models directly on your hardware, completely offline, with no network latency and zero subscription fees.&lt;/p&gt;

&lt;p&gt;Let’s figure out how to set this up together, exploring the best tools, models, and configurations to replace cloud-dependent assistants.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why You Need Offline Code AI in 2026
&lt;/h2&gt;

&lt;p&gt;Beyond the obvious benefit of working on an airplane or during an internet outage, there are three massive reasons why engineering teams are shifting toward local LLMs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Privacy and Security:&lt;/strong&gt; When you work with healthcare data, financial systems, or highly confidential proprietary code, sending context to a third-party API is a massive compliance risk. Offline AI guarantees your code never leaves your machine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero API Costs:&lt;/strong&gt; Cloud models charge per token. If your IDE assistant is constantly indexing your workspace and sending context windows to the cloud, the bill adds up quickly. Local models are free forever.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customization:&lt;/strong&gt; You can fine-tune or swap out models instantly based on the specific language you are writing. You can run a specialized Rust model one minute, and a Python-optimized model the next.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are working in an enterprise environment, many CISOs are now actively blocking cloud-based code assistants. Getting comfortable with offline code AI is becoming a mandatory engineering skill.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack: Ollama and Continue.dev
&lt;/h2&gt;

&lt;p&gt;There are many ways to run local models, but the absolute best developer experience right now is the combination of &lt;strong&gt;Ollama&lt;/strong&gt; (for model hosting) and &lt;strong&gt;Continue.dev&lt;/strong&gt; (for IDE integration).&lt;/p&gt;

&lt;h2&gt;
  
  
  Downloads &amp;amp; Tools Needed
&lt;/h2&gt;

&lt;p&gt;To get your offline code AI stack running, you’ll need to download these free, open-source tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama:&lt;/strong&gt; The local model runner and API backend. Download it at &lt;a href="https://ollama.com/download" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continue.dev:&lt;/strong&gt; The IDE extension (VS Code or JetBrains) that connects your editor to Ollama. Download the extension at &lt;a href="https://continue.dev" rel="noopener noreferrer"&gt;continue.dev&lt;/a&gt; or directly from your IDE’s marketplace.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1. Setting up the Local API with Ollama
&lt;/h3&gt;

&lt;p&gt;Ollama is a lightweight tool that allows you to run open-source LLMs locally. It acts as the backend server. Download and install it, then open your terminal to pull a coding-specific model. For general coding tasks, I highly recommend downloading the DeepSeek Coder model or CodeLlama.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pull and run the DeepSeek Coder model locally&lt;/span&gt;
ollama run deepseek-coder

&lt;span class="c"&gt;# Alternatively, if you have more RAM (16GB+), run the larger 7b version&lt;/span&gt;
ollama run deepseek-coder:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the model is downloaded, Ollama exposes a local API (usually on port 11434) that your IDE can talk to. Your machine is now officially an AI server.&lt;/p&gt;
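
&lt;p&gt;You can sanity-check the server before touching your IDE. Here is a minimal sketch that calls Ollama’s &lt;code&gt;/api/generate&lt;/code&gt; endpoint (assuming you pulled &lt;code&gt;deepseek-coder&lt;/code&gt; as shown above, and that the &lt;code&gt;requests&lt;/code&gt; library is installed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import requests  # assumption: any HTTP client works; requests must be installed

# Ollama listens on localhost:11434 by default
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-coder",
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;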

&lt;h3&gt;
  
  
  2. Bridging the Gap with Continue.dev
&lt;/h3&gt;

&lt;p&gt;Continue.dev is an open-source extension for VS Code and JetBrains that brings the “Copilot” experience to your local models. Instead of hardcoding the assistant to a cloud provider, you can configure it to talk to your local Ollama instance.&lt;/p&gt;

&lt;p&gt;After installing the extension, you simply open the &lt;code&gt;config.json&lt;/code&gt; file for Continue and point it to your local environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Coder (Local)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:11434"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Starcoder 2 (Autocomplete)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"starcoder2:3b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:11434"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how we configured two different models! We use a larger model (DeepSeek) for the chat interface where we ask complex questions, and a much smaller, faster model (Starcoder2 3B) for real-time tab autocomplete. This is the secret to a snappy offline experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Top Local Models for Offline Code AI
&lt;/h2&gt;

&lt;p&gt;The beauty of this architecture is that you can swap out the “brain” of your assistant whenever a new model drops. Here is what I am running locally right now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek Coder V2:&lt;/strong&gt; Unbelievably good at Python, JavaScript, and C++. It punches way above its weight class and handles complex logic refactoring beautifully.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Starcoder 2 (3B):&lt;/strong&gt; The absolute king of low-latency autocomplete. If you want your code completions to feel instantaneous on a laptop, this is the model you run in the background.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3 (8B):&lt;/strong&gt; While not strictly a coding model, the base Llama 3 model is fantastic for generating documentation, writing commit messages, and explaining abstract architectural concepts offline.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Trade-offs: Hardware Constraints
&lt;/h2&gt;

&lt;p&gt;I have to be honest here. Running offline code AI is not pure magic – it is bound by the laws of physics and RAM. If you are running a 5-year-old laptop with 8GB of memory, your experience is going to be painful.&lt;/p&gt;

&lt;p&gt;To run a 7B or 8B parameter model comfortably while also running Docker, VS Code, and a browser, you really need 16GB of Unified Memory (like an M-series Mac) or a dedicated Nvidia GPU with at least 8GB of VRAM. If your hardware is constrained, you can still participate! Just download smaller, highly quantized models (like 1.5B parameter models) which can run on almost anything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Why did I decide to fully transition my workflow? Because having a coding assistant that works at 35,000 feet, never exposes my client’s proprietary algorithms, and costs zero dollars a month is an absolute superpower. It forces you to understand how these models actually work under the hood, rather than just treating them as magic black boxes provided by massive tech monopolies.&lt;/p&gt;

&lt;p&gt;If you haven’t tried running an offline code AI stack yet, take 15 minutes today, install Ollama and Continue, and pull a local model. You will be shocked at how capable your local hardware actually is.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>aicodeautocomplete</category>
      <category>aionapplesilicon</category>
      <category>aionmacbookm1</category>
    </item>
    <item>
      <title>LangGraph vs Azure AI Agents: Orchestration Frameworks Compared</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Wed, 22 Apr 2026 04:30:00 +0000</pubDate>
      <link>https://dev.to/pratikpathak/langgraph-vs-azure-ai-agents-orchestration-frameworks-compared-234d</link>
      <guid>https://dev.to/pratikpathak/langgraph-vs-azure-ai-agents-orchestration-frameworks-compared-234d</guid>
      <description>&lt;p&gt;I was sitting in a design review last week, staring at a whiteboard covered in multi-agent workflows, and a terrifying thought crossed my mind: how on earth are we going to orchestrate all of this reliably in production? We developers get so obsessed with crafting the perfect prompts and tool use that we often forget about the underlying framework. Orchestrating multi-agent workflows is rapidly becoming the new frontier in AI development. As applications evolve from simple chat interfaces to complex, autonomous agents that can plan, execute, and collaborate, the framework you choose becomes your most critical architectural decision.&lt;/p&gt;

&lt;p&gt;Two powerful contenders have emerged at the forefront of this space: LangGraph (by LangChain) and Azure AI Agents. Both offer robust solutions for building stateful, multi-agent applications, but they take fundamentally different approaches to architecture, deployment, and developer experience. Let’s figure out which one makes sense for your next enterprise build.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is LangGraph?
&lt;/h2&gt;

&lt;p&gt;LangGraph is an open-source library built on top of LangChain, designed specifically for creating stateful, multi-actor applications with LLMs. At its core, LangGraph models agent workflows as graphs. Nodes represent agents or functions, and edges represent the flow of data or control between them.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Developer’s Playground
&lt;/h3&gt;

&lt;p&gt;If you can write it in Python or TypeScript, you can model it in LangGraph. You have absolute control over the execution flow, state transitions, and tool integrations. Unlike standard Directed Acyclic Graphs (DAGs), LangGraph natively supports cyclic workflows. This is absolutely essential for agents that need to reflect, self-correct, or retry actions until a condition is met. Why did I decide to use LangGraph for a recent open-source project? Because it gave me granular control over the state checkpointing system, allowing me to pause, resume, or “time travel” through agent states.&lt;/p&gt;

&lt;p&gt;Being part of the LangChain ecosystem means immediate access to thousands of community tools, document loaders, and vector store integrations out of the box.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Azure AI Agents?
&lt;/h2&gt;

&lt;p&gt;Azure AI Agents (formerly part of the Azure OpenAI Assistant API features) represents Microsoft’s enterprise-grade, managed approach to building intelligent applications. It abstracts away much of the infrastructure complexity required to run multi-agent systems securely at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Managed Enterprise Engine
&lt;/h3&gt;

&lt;p&gt;With Azure AI Agents, there is no need to provision custom state stores or handle checkpointing databases manually. Azure manages the underlying compute and state persistence, often backed securely by Cosmos DB or Azure Storage. The biggest selling point for me? Out-of-the-box compliance with enterprise standards, including Entra ID (formerly Azure AD) integration, private endpoints, and data residency guarantees.&lt;/p&gt;

&lt;p&gt;It also features seamless Azure ecosystem integration. You get native connectivity to Azure OpenAI models, Azure AI Search for RAG pipelines, and Azure Monitor for telemetry without writing extensive glue code. The built-in threading simplifies conversational state management by providing managed threads, completely removing the headache of manual context window management.&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-Head Architectural Comparison
&lt;/h2&gt;

&lt;p&gt;Let’s look at how these two frameworks stack up across the most critical dimensions for engineering teams.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Developer Experience and Control
&lt;/h3&gt;

&lt;p&gt;LangGraph is a developer’s playground. You define the exact state schema, write the reducer functions, and wire up the nodes manually. This gives you granular control but comes with a steeper learning curve and more boilerplate code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;

&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_agent_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;should_continue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Azure AI Agents abstracts the graph away. You define instructions, equip the agent with tools (like Code Interpreter or Retrieval), and let the managed API handle the orchestration. It’s faster to market but less customizable if you need a highly specific, non-standard routing logic.&lt;/p&gt;
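
&lt;p&gt;For contrast, here is roughly what that looks like in code. This is a hedged sketch based on the preview &lt;code&gt;azure-ai-projects&lt;/code&gt; SDK; method and parameter names have shifted between preview releases, so treat the identifiers below as assumptions and check the current reference docs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Assumption: a project connection string copied from the Azure AI portal
project = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="&lt;your-project-connection-string&gt;",
)

# No graph wiring: instructions plus tools, and the service orchestrates the rest
agent = project.agents.create_agent(
    model="gpt-4o",
    name="support-agent",
    instructions="You triage internal support tickets.",
)

thread = project.agents.create_thread()  # managed, persisted conversation state
project.agents.create_message(thread_id=thread.id, role="user", content="Summarize my open tickets.")
run = project.agents.create_and_process_run(thread_id=thread.id, agent_id=agent.id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;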

&lt;h3&gt;
  
  
  2. State Management and Memory
&lt;/h3&gt;

&lt;p&gt;In LangGraph, state is a first-class citizen. You can use SQLite locally or PostgreSQL in production via LangGraph Cloud or custom deployments. You can easily inject human-in-the-loop steps to approve actions before they execute.&lt;/p&gt;
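
&lt;p&gt;Here is a minimal sketch of that pattern, using the built-in in-memory checkpointer (you would swap in the SQLite or Postgres saver for real persistence) and pausing before the tool node for human approval:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from langgraph.checkpoint.memory import MemorySaver

# Reusing the `workflow` graph from the snippet above
checkpointer = MemorySaver()  # swap in a SQLite/Postgres checkpointer for production
app = workflow.compile(
    checkpointer=checkpointer,
    interrupt_before=["action"],  # human-in-the-loop: pause before the tool node runs
)

# Each thread_id gets its own persisted, resumable state
config = {"configurable": {"thread_id": "user-42"}}
state = app.invoke({"messages": ["Draft a refund email"]}, config)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;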

&lt;p&gt;Azure AI Agents handles state opaquely via its managed Threads API. While incredibly convenient, you have less visibility into the raw state object at intermediate steps compared to LangGraph’s transparent checkpointing. However, for most conversational and task-oriented workflows, Azure’s managed memory is more than sufficient and entirely maintenance-free.&lt;/p&gt;

&lt;p&gt;If you are dealing with strict compliance regulations that require you to audit every intermediate thought process of the LLM, LangGraph’s transparent state database might be legally required over Azure’s managed opaque threads.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Deployment and Scalability
&lt;/h3&gt;

&lt;p&gt;Deploying a LangGraph application into production requires setting up your own API layer (e.g., FastAPI), managing a state database, and handling worker scaling. LangSmith and LangGraph Cloud are changing this, but they are still separate platforms you have to adopt and manage.&lt;/p&gt;

&lt;p&gt;Azure AI Agents is essentially serverless. You call the API, and Microsoft scales the underlying infrastructure. If your organization is already embedded in the Azure cloud, deploying Azure AI Agents is a natural extension of your existing architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict: Which Should You Choose?
&lt;/h2&gt;

&lt;h3&gt;Choose LangGraph if:&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You are building highly custom, complex cognitive architectures (e.g., hierarchical agent teams with non-standard reflection loops).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You want zero vendor lock-in and prefer open-source Python or TypeScript solutions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need deep, programmatic control over every step of the agent’s thought process.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Choose Azure AI Agents if:&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You are building enterprise applications where security, compliance, and data privacy are non-negotiable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You want to ship to production quickly without managing state databases or underlying compute infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your tech stack is already heavily invested in Azure (Azure OpenAI, Cosmos DB, Entra ID).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Both LangGraph and Azure AI Agents are powerful tools, but they cater to different philosophies. LangGraph gives you the steering wheel, the engine, and the raw parts to build your own custom vehicle. Azure AI Agents gives you a managed, enterprise-ready fleet that gets you to your destination safely and securely. The best choice depends entirely on your team’s expertise, timeline, and security constraints. I’ve found myself using LangGraph for rapid prototyping and Azure AI Agents for production systems that handle PII. Let’s keep building and experimenting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related Reading:&lt;/strong&gt; For more on architectural decisions in AI, check out my thoughts on &lt;a href="https://pratikpathak.com/managing-state-in-multi-agent-workflows-redis-vs-cosmos-db-in-production/" rel="noopener noreferrer"&gt;Managing State in Multi-Agent Workflows&lt;/a&gt; and how to handle &lt;a href="https://pratikpathak.com/silent-failures-the-hidden-reason-your-ai-agents-keep-getting-stuck-in-production/" rel="noopener noreferrer"&gt;Silent Failures in Production AI Agents&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>azure</category>
      <category>azuredeployments</category>
      <category>azureidentity</category>
    </item>
    <item>
      <title>I Saved 80% on Azure OpenAI Costs by Making These 7 Architectural Decisions</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Tue, 21 Apr 2026 04:30:00 +0000</pubDate>
      <link>https://dev.to/pratikpathak/i-saved-up-80-azure-openai-cost-optimization-by-making-these-7-architectural-decision-438f</link>
      <guid>https://dev.to/pratikpathak/i-saved-up-80-azure-openai-cost-optimization-by-making-these-7-architectural-decision-438f</guid>
      <description>&lt;p&gt;&lt;strong&gt;Azure OpenAI cost optimization&lt;/strong&gt; becomes a real concern not during experimentation, but after your system goes live.&lt;br&gt;&lt;br&gt;
A fintech team running ~50,000 daily queries saw their monthly bill jump from $3,000 to $28,000 in six weeks-with no new features shipped.&lt;br&gt;&lt;br&gt;
Nothing obvious broke.&lt;br&gt;&lt;br&gt;
Latency stayed stable. Outputs looked fine. But under the hood, retries increased, prompts grew longer, and multi-step workflows quietly multiplied token usage.&lt;br&gt;&lt;br&gt;
This is where &lt;strong&gt;azure-openai-cost-optimization&lt;/strong&gt; shifts from a pricing problem to an architectural one.&lt;/p&gt;


&lt;h2&gt;
  
  
  Decision 1: Single-Call Simplicity vs Multi-Step Expansion
&lt;/h2&gt;

&lt;p&gt;The fastest way to increase cost is to increase the number of model calls per request.&lt;/p&gt;

&lt;p&gt;A simple system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input → LLM → Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A production system often becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input → Planner → Tool → Re-ask → Summarize → Final Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One request can easily turn into 5-10 model calls.&lt;/p&gt;

&lt;p&gt;Each additional step introduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More tokens&lt;/li&gt;
&lt;li&gt;More latency&lt;/li&gt;
&lt;li&gt;More failure points&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key issue is not just cost; it’s &lt;em&gt;unbounded execution&lt;/em&gt;.&lt;br&gt;&lt;br&gt;
Multi-step workflows make sense when the problem genuinely requires decomposition: autonomous agents, tool orchestration, or complex reasoning chains. But for most use cases, a well-structured prompt with clear instructions can achieve the same outcome in a single call, with far lower cost and complexity.&lt;br&gt;&lt;br&gt;
A customer support classifier, for instance, doesn’t need a planner; a single prompt with few-shot examples handles intent detection reliably, as the sketch below shows. Reserve orchestration for tasks where intermediate tool results actually change the next step.&lt;/p&gt;
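
&lt;p&gt;A minimal sketch of that single-call classifier (assuming an already-configured Azure OpenAI &lt;code&gt;client&lt;/code&gt;; the deployment name is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;response = client.chat.completions.create(
    model="gpt-4o-mini-deployment",  # placeholder Azure deployment name
    messages=[
        {
            "role": "system",
            "content": (
                "Classify the support message as one of: billing, refund, technical, other. "
                "Reply with the label only. "
                "Example: 'My card was charged twice' is billing. "
                "Example: 'The app crashes on login' is technical."
            ),
        },
        {"role": "user", "content": "I returned the item last week, where is my money?"},
    ],
    max_tokens=5,   # the label is short, so cap output tokens explicitly
    temperature=0,  # deterministic labels
)
intent = response.choices[0].message.content.strip()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;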




&lt;h2&gt;
  
  
  Decision 2: Model Selection – Capability vs Cost Efficiency
&lt;/h2&gt;

&lt;p&gt;Model choice has a direct and often underestimated cost impact.&lt;br&gt;&lt;br&gt;
Many teams default to a high-capability model for all requests, even when unnecessary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real Pricing Difference (Illustrative)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4o → higher reasoning capability, higher cost&lt;/li&gt;
&lt;li&gt;GPT-4o-mini → significantly cheaper, lower latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, you should also review Microsoft’s official &lt;strong&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/pricing" rel="noopener noreferrer"&gt;Azure OpenAI pricing&lt;/a&gt;&lt;/strong&gt; to understand model cost differences.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4o-mini can be &lt;strong&gt;5-10× cheaper per token&lt;/strong&gt; than GPT-4o&lt;/li&gt;
&lt;li&gt;For classification, routing, or formatting tasks, the quality difference is often negligible&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Practical Routing Pattern
&lt;/h3&gt;

&lt;p&gt;Instead of sending everything to a large model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a lightweight model to classify intent&lt;/li&gt;
&lt;li&gt;Route only complex tasks to a higher-capability model
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classification&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;gpt&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mini&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;gpt&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In high-traffic systems, even shifting 30-40% of requests to smaller models can significantly reduce total cost while improving latency.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision 3: Token Budgeting – Input Size Is the Hidden Multiplier
&lt;/h2&gt;

&lt;p&gt;Most cost does not come from output tokens. It comes from input size.&lt;br&gt;&lt;br&gt;
Common production issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sending full conversation history every time&lt;/li&gt;
&lt;li&gt;Including irrelevant system prompts&lt;/li&gt;
&lt;li&gt;Passing entire documents instead of filtered chunks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Practical Optimization Techniques
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Trim conversation windows (last N turns only; see the sketch below)&lt;/li&gt;
&lt;li&gt;Use embeddings to retrieve relevant context&lt;/li&gt;
&lt;li&gt;Summarize long histories before reuse&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of passing a full document, embed it into a vector store and retrieve only the top 2-3 relevant chunks at query time, often under 500 tokens total. This reduces input size without sacrificing answer quality.&lt;/p&gt;
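
&lt;p&gt;The window-trimming technique from the list above is nearly a one-liner. A minimal sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def trim_history(messages: list, keep_turns: int = 4) -&gt; list:
    """Keep the system prompt plus only the last N user/assistant turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_turns * 2:]  # one turn = a user + assistant pair
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;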

&lt;h3&gt;
  
  
  Example Impact
&lt;/h3&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5,000 tokens per request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reduce to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,000 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At scale, this can translate into a 60-80% reduction in token-related cost for that workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision 4: Caching – Avoid Paying Twice for the Same Work
&lt;/h2&gt;

&lt;p&gt;A surprising amount of LLM traffic is repetitive.&lt;br&gt;&lt;br&gt;
Without caching, you pay for the same computation repeatedly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Two Types of Caching
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Exact Match Caching&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same input → same output&lt;/li&gt;
&lt;li&gt;Simple and fast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Semantic Caching&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Similar inputs → reused responses&lt;/li&gt;
&lt;li&gt;Uses embeddings to detect similarity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“What is my refund status?”&lt;/li&gt;
&lt;li&gt;“Can you check my refund?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These queries can map to the same cached response.&lt;/p&gt;

&lt;h3&gt;
  
  
  Azure Implementation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Azure Cache for Redis for low-latency storage&lt;/li&gt;
&lt;li&gt;Embedding similarity search for semantic matching
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Caching reduces repeated model calls without affecting output quality. The main tradeoff is maintaining cache freshness, especially when underlying data changes.&lt;/p&gt;
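
&lt;p&gt;A semantic cache needs little more than an embedding call and a similarity check. Here is a minimal sketch (assuming a configured Azure OpenAI &lt;code&gt;client&lt;/code&gt;; the embedding deployment name and the 0.9 threshold are assumptions you would tune, and production systems would store vectors in Redis or a vector index rather than a Python list):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

semantic_cache = []  # (embedding, cached_response) pairs

def embed(text):
    # "text-embedding-deployment" is a placeholder Azure deployment name
    result = client.embeddings.create(model="text-embedding-deployment", input=text)
    return np.array(result.data[0].embedding)

def cached_answer(query, threshold=0.9):
    q = embed(query)
    for vec, cached_response in semantic_cache:
        cosine = np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec))
        if cosine &gt;= threshold:
            return cached_response  # close enough: reuse instead of re-calling the model
    return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;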




&lt;h2&gt;
  
  
  Decision 5: Retry and Loop Control – The Silent Cost Multiplier
&lt;/h2&gt;

&lt;p&gt;Retries are necessary in distributed systems, but they are dangerous in LLM workflows, especially once you start hitting &lt;a href="https://pratikpathak.com/azure-openai-rate-limits-guide/" rel="noopener noreferrer"&gt;Azure OpenAI rate limits&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real Scenario
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;API returns error&lt;/li&gt;
&lt;li&gt;System retries&lt;/li&gt;
&lt;li&gt;Model re-plans&lt;/li&gt;
&lt;li&gt;Same failure repeats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;1 request → 3 retries → 4× cost&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Causes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;429 rate limit errors&lt;/li&gt;
&lt;li&gt;Transient API failures&lt;/li&gt;
&lt;li&gt;Unbounded agent loops&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example: Exponential Backoff
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Control Mechanisms
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Max retry limits&lt;/li&gt;
&lt;li&gt;Exponential backoff&lt;/li&gt;
&lt;li&gt;Failure classification (retry vs stop)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For agent-based systems, also add a hard step limit: if the agent hasn’t resolved the task within N iterations, surface a fallback response rather than continuing indefinitely.&lt;br&gt;&lt;br&gt;
Without explicit controls, retries silently multiply both cost and latency.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision 6: Observability – You Can’t Optimize What You Can’t See
&lt;/h2&gt;

&lt;p&gt;Most teams track total cost.&lt;br&gt;&lt;br&gt;
That’s not enough.&lt;br&gt;&lt;br&gt;
You need visibility into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost per request&lt;/li&gt;
&lt;li&gt;Tokens per feature&lt;/li&gt;
&lt;li&gt;Model usage distribution&lt;/li&gt;
&lt;li&gt;Retry frequency&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Minimal Trace Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;trace&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"feature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"support_agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tokens_input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tokens_output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.02&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure Implementation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Application Insights for logging&lt;/li&gt;
&lt;li&gt;Custom dashboards for aggregation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set cost alert thresholds in Azure Cost Management to notify your team when daily or hourly spend exceeds a defined limit. This helps catch runaway loops before they become expensive surprises.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision 7: System Design – Cost as a First-Class Constraint
&lt;/h2&gt;

&lt;p&gt;Cost should not be optimized after deployment. It should shape architecture from the start.&lt;/p&gt;

&lt;h3&gt;
  
  
  Concrete Example
&lt;/h3&gt;

&lt;p&gt;Assume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avg request = $0.02&lt;/li&gt;
&lt;li&gt;Daily requests = 50,000
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Daily cost = $1,000  
Monthly ≈ $30,000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;30% token reduction&lt;/li&gt;
&lt;li&gt;20% cache hit rate
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;New daily cost ≈ $560
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Compounding Effect
&lt;/h3&gt;

&lt;p&gt;Small improvements at each layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model routing&lt;/li&gt;
&lt;li&gt;Token trimming&lt;/li&gt;
&lt;li&gt;Caching&lt;/li&gt;
&lt;li&gt;Retry control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together can reduce cost by &lt;strong&gt;40-70%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A system that costs $30,000/month at launch can realistically operate at $10,000-$18,000 with these controls in place, not through a single optimization, but through compounding small decisions across every layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Azure OpenAI Cost Optimization Matters Most
&lt;/h2&gt;

&lt;p&gt;Focus on optimization when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic is scaling – small inefficiencies multiply quickly at volume&lt;/li&gt;
&lt;li&gt;Multi-step workflows are introduced – each layer increases call depth&lt;/li&gt;
&lt;li&gt;Costs are unpredictable – a sign of uncontrolled execution paths&lt;/li&gt;
&lt;li&gt;Multiple teams share infrastructure – shared systems amplify waste&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Avoid over-optimizing when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are still experimenting – premature optimization slows iteration&lt;/li&gt;
&lt;li&gt;Usage is low – cost signals are not yet meaningful&lt;/li&gt;
&lt;li&gt;System behavior is unstable – fix correctness before efficiency&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Azure OpenAI cost optimization is not about reducing tokens in isolation.&lt;br&gt;&lt;br&gt;
It is about controlling system behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How often models are called&lt;/li&gt;
&lt;li&gt;How much context is passed&lt;/li&gt;
&lt;li&gt;How retries are handled&lt;/li&gt;
&lt;li&gt;How work is reused&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tradeoff is clear:&lt;br&gt;&lt;br&gt;
You can build flexible systems that do everything…&lt;br&gt;&lt;br&gt;
or controlled systems that do only what is necessary.&lt;br&gt;&lt;br&gt;
The systems that scale sustainably are not the ones that generate the most intelligence.&lt;br&gt;&lt;br&gt;
They are the ones that generate it efficiently.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the biggest cost driver in Azure OpenAI systems?
&lt;/h3&gt;

&lt;p&gt;The number of model calls per request. Multi-step workflows and retries can multiply costs quickly.  &lt;/p&gt;

&lt;h3&gt;
  
  
  How can I reduce token usage effectively?
&lt;/h3&gt;

&lt;p&gt;Trim conversation history, retrieve only relevant data using embeddings, and summarize long inputs before sending them to the model.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Should I always use the most advanced model?
&lt;/h3&gt;

&lt;p&gt;No. Use smaller models for simple tasks and reserve advanced models for complex reasoning.  &lt;/p&gt;

&lt;h3&gt;
  
  
  How does semantic caching reduce cost?
&lt;/h3&gt;

&lt;p&gt;Semantic caching reuses responses for similar queries using embeddings, reducing repeated model calls even when inputs are not identical.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Why do retries increase cost so much?
&lt;/h3&gt;

&lt;p&gt;Each retry often triggers a full model call. Without limits, retries multiply both token usage and API costs.  &lt;/p&gt;

&lt;h3&gt;
  
  
  When should I start optimizing costs?
&lt;/h3&gt;

&lt;p&gt;Once your system reaches production scale or costs become unpredictable, optimization should be treated as a core architectural concern.  &lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between exact match and semantic caching?
&lt;/h3&gt;

&lt;p&gt;Exact match requires identical inputs. Semantic caching uses embedding similarity to reuse responses for queries that are phrased differently but mean the same thing, making it far more effective in real user traffic.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>azure</category>
      <category>intelligence</category>
      <category>python</category>
    </item>
    <item>
      <title>Do You Know Gemini Chrome Skills? A Single Line Makes Your Browser an AI Agent</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Sun, 19 Apr 2026 04:30:00 +0000</pubDate>
      <link>https://dev.to/pratikpathak/do-you-know-gemini-chrome-skills-a-single-line-makes-browser-your-ai-agent-3e1o</link>
      <guid>https://dev.to/pratikpathak/do-you-know-gemini-chrome-skills-a-single-line-makes-browser-your-ai-agent-3e1o</guid>
      <description>&lt;p&gt;If you want to know how to master &lt;strong&gt;Gemini Chrome skills&lt;/strong&gt; , your life is about to get a lot easier. Google recently started rolling out ‘Skills’ for Gemini directly inside the Chrome browser. This update effectively turns Chrome into a lightweight, personalized AI agent that remembers your favorite workflows and can run them across multiple tabs simultaneously.&lt;/p&gt;

&lt;p&gt;Why does this matter? Instead of treating AI as a basic chatbot, Skills allow you to build repeatable, customized processes for tasks like summarizing long documents, comparing products side-by-side, or analyzing recipes. Let’s break down exactly how to create, use, and master Gemini Chrome Skills.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Gemini Chrome Skills?
&lt;/h2&gt;

&lt;p&gt;At its core, a Skill is simply a saved prompt. Whether it is a highly specific set of instructions for analyzing the ingredients of a skincare product or a prompt to extract action items from a meeting transcript, you can save that exact command to your Chrome profile.&lt;/p&gt;

&lt;p&gt;Instead of manually writing it out every time, you can trigger a saved Skill by typing a forward slash (/) or clicking the plus (+) button in your Gemini chat history. Your saved Skills sync across all desktop versions of Chrome (Mac, Windows, ChromeOS) where you are signed in with your Google account.&lt;/p&gt;

&lt;p&gt;Note: The feature began rolling out in mid-April 2026. Initially, your Chrome browser language must be set to US English to access the Skills interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Create Your Own Custom Skill
&lt;/h2&gt;

&lt;p&gt;Creating a custom workflow is incredibly intuitive. Here is the step-by-step process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the Gemini side panel in Google Chrome.
&lt;/li&gt;
&lt;li&gt;Browse to a webpage you want to analyze (for example, a recipe blog).
&lt;/li&gt;
&lt;li&gt;Type your complex prompt. For instance: ‘Analyze this recipe, identify all ingredients, and suggest high-protein substitutions.’
&lt;/li&gt;
&lt;li&gt;Once Gemini answers, look for the option to save that exact prompt as a Skill from your chat history.
&lt;/li&gt;
&lt;li&gt;Give it a memorable name.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The next time you visit a completely different recipe site, you do not need to retype anything. You just trigger your newly created Skill, and Gemini runs the exact same analysis on the new page.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Magic of Multi-Tab Analysis
&lt;/h2&gt;

&lt;p&gt;The most powerful feature of Gemini Chrome skills is its ability to operate across multiple tabs at the same time. This fundamentally changes how you do research.&lt;/p&gt;

&lt;p&gt;Imagine you are shopping for a new laptop or researching skincare products. You can open three different product pages in three separate tabs. By selecting those tabs and triggering a ‘Product Comparison’ Skill, Gemini will pull data from all three pages simultaneously. It will generate a clean, side-by-side comparison factoring in price points, specs, and user reviews without you ever having to copy and paste text between tabs.&lt;/p&gt;

&lt;p&gt;Pro Tip: Multi-tab Skills work beautifully with Google Drive. You can open a recipe in one tab and your personal grocery list in Google Docs in another, then run a Meal Planner Skill to cross-reference and update your list automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pre-Built Skills Library
&lt;/h2&gt;

&lt;p&gt;If you don’t want to build prompts from scratch, Google included a built-in Skills Library. You can browse ready-made workflows for common tasks and add them to your profile with a single click. Every pre-built Skill is fully editable, so you can tweak the underlying prompt to match your exact preferences.&lt;/p&gt;

&lt;p&gt;Some of the top pre-built Skills include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gift Concierge:&lt;/strong&gt; A smart product comparison tool designed for multi-tab shopping.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Ingredient Decoder:&lt;/strong&gt; Instantly breaks down complex ingredient lists on health or beauty pages, explaining what each component does and highlighting allergens.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Calendar Creator:&lt;/strong&gt; Scans a webpage for event details and formats them for your schedule.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Meal Planner:&lt;/strong&gt; Analyzes recipes and helps build weekly plans and shopping lists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Privacy and Security Measures
&lt;/h2&gt;

&lt;p&gt;Giving an AI agent the ability to run automated workflows across your browser raises valid security questions. Google built confirmation gates into the system to handle this. If a Skill attempts to perform a high-impact action like sending an email or creating an event on your calendar, the system halts and asks for your explicit manual approval before executing the task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Why did I decide to start using these immediately? Because they eliminate the biggest friction in modern AI workflows: re-engineering the perfect prompt over and over again. By saving these prompts as executable Skills, Gemini transforms Chrome from a simple web viewer into a personalized research assistant.&lt;/p&gt;

&lt;p&gt;Give it a try today, and let’s figure out the most creative ways to automate our daily browsing habits together! For more technical updates on AI and Chrome, you can always check out the official &lt;a href="https://blog.google/products/chrome/" rel="noopener noreferrer"&gt;Google Chrome Blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related Reading:&lt;/strong&gt; For a deep dive into extending AI agent capabilities natively inside your IDE instead of Chrome, read my guide on &lt;a href="https://pratikpathak.com/how-to-download-vs-code-extensions-vsix-offline/" rel="noopener noreferrer"&gt;VS Code Extensions (VSIX) Offline Downloads&lt;/a&gt;. If you want to compress costs across your entire generative AI tech stack, check out &lt;a href="https://pratikpathak.com/stop-overpaying-for-rag-how-we-cut-azure-openai-costs-by-40-with-one-architecture-tweak/" rel="noopener noreferrer"&gt;How We Cut Azure OpenAI Costs by 40%&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiagents</category>
      <category>aibrowserassistant</category>
      <category>aiproductivitytools</category>
    </item>
    <item>
      <title>Top 25+ Advanced DSA Projects in C++ with Source Code</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Sat, 18 Apr 2026 14:42:03 +0000</pubDate>
      <link>https://dev.to/pratikpathak/top-25-advanced-dsa-projects-in-c-with-source-code-193n</link>
      <guid>https://dev.to/pratikpathak/top-25-advanced-dsa-projects-in-c-with-source-code-193n</guid>
      <description>&lt;p&gt;When you are serious about mastering Data Structures and Algorithms (DSA), building a high-complexity &lt;strong&gt;DSA project in C++&lt;/strong&gt; is the ultimate test. I wanted to put together a definitive list of advanced C++ projects that don’t just use basic arrays, but actually engineer optimal time and space complexities with professional patterns. Let’s figure this out together.&lt;/p&gt;

&lt;p&gt;Why did I decide to compile this? Because most ‘beginner’ projects don’t teach you how to handle dynamic rehashing, memory coalescing, or thread-safe state. If we really want to get better at C++, we need to build systems that scale. Every project in this collection has been engineered to showcase high-fidelity logic and optimal complexities.&lt;/p&gt;

&lt;h2&gt;Top 25+ Advanced DSA Projects in C++&lt;/h2&gt;

&lt;h3&gt;1. Student Records System&lt;/h3&gt;

&lt;p&gt;This project implements a custom hash map with dynamic rehashing and O(1) chaining to handle student records efficiently. You will learn how to manage persistent storage directly via C++ File I/O streams while maintaining rapid lookup times. This is perfect for understanding how databases manage indexing under the hood.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/01-Student-Record-System" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;2. Snake Game Logic Engine&lt;/h3&gt;

&lt;p&gt;A completely decoupled simulation of the classic Snake game using queues and threading. It features thread-safe state management to ensure input doesn’t block the rendering loop, and a scaling difficulty mechanism that tests your ability to handle real-time game loops in C++.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/02-Snake-Game-Logic" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;3. Library Management with AVL Trees&lt;/h3&gt;

&lt;p&gt;Managing inventory requires rapid search and insertion. This project uses self-balancing AVL trees to ensure O(log N) operations. It heavily utilizes smart pointers to prevent memory leaks and supports multi-criteria search for complex queries across the library database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/03-Library-Management-System" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;4. Sudoku Solver Engine&lt;/h3&gt;

&lt;p&gt;Standard backtracking is too slow for complex puzzles. This implementation supercharges the solver using the Minimum Remaining Values (MRV) heuristic and bitmasking optimization. Forward checking prunes the search tree significantly, making this an excellent study in constraint satisfaction problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/04-Sudoku-Solver" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;5. GPS Navigator (Dijkstra)&lt;/h3&gt;

&lt;p&gt;Pathfinding visualizers are incredibly satisfying to build. This GPS navigator uses Dijkstra’s Algorithm backed by a priority queue to achieve O(E log V) complexity. It reconstructs the shortest path dynamically across named nodes, simulating how Google Maps calculates routes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/05-Dijkstra-Pathfinding-Visualizer" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;6. Huffman Coding Compression&lt;/h3&gt;

&lt;p&gt;File compression is a classic greedy algorithm problem. This engine constructs optimal prefix codes using Huffman Trees. You will learn how to handle full bitstream encoding and decoding in C++, which is much trickier than simple character mapping.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/06-Huffman-Coding-Compression" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;7. File System Simulator&lt;/h3&gt;

&lt;p&gt;Navigating nested directories is essentially traversing an N-ary tree. This project simulates a Unix-like file system using Tries and Trees. It supports recursive path navigation, metadata tracking, and full CRUD operations on simulated files in memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/07-File-System-Simulator" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;8. Bank Core Management&lt;/h3&gt;

&lt;p&gt;A deep dive into object-oriented programming (OOP) and hashing. This bank core system handles the full transaction lifecycle, simulates savings interest accumulation over time, and provides an audited statement history. It is a fantastic practice for writing robust, enterprise-like logic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/08-Bank-Management-System" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;9. Social Graph Analysis&lt;/h3&gt;

&lt;p&gt;How does LinkedIn know you are 2nd-degree connections? This project uses Breadth-First Search (BFS) on graphs to calculate influence centrality and degrees of separation. It can even generate mutual friend recommendations based on network topology.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/09-Social-Network-Analysis" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;10. Text Editor Engine&lt;/h3&gt;

&lt;p&gt;Implementing an editor requires instantaneous edits. By combining Stacks and Linked Lists, this engine achieves O(1) text modifications. It also implements the Command Pattern to support multi-level undo and redo functionalities, a must-have for modern UI apps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/10-Text-Editor-Engine" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;11. Search Engine Indexer&lt;/h3&gt;

&lt;p&gt;Search engines don’t scan documents line-by-line; they use inverted indexes. This project builds a case-insensitive Trie to map words to document IDs. It tracks word frequency and allows for blazing-fast multi-document queries across large datasets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/11-Search-Engine-Indexer" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;12. Stock Span Analyzer&lt;/h3&gt;

&lt;p&gt;Financial algorithms require speed. Using monotonic stacks, this stock span analyzer processes historical price data in linear O(N) time. It identifies buy/sell signals and calculates moving metrics without nested loops ruining performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/12-Stock-Span-Analyzer" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;13. LRU Cache Implementation&lt;/h3&gt;

&lt;p&gt;The Least Recently Used (LRU) Cache is a classic interview question. This hybrid Hash-List architecture ensures O(1) reads and writes. I included template generic support so you can cache any data type, along with eviction analytics to monitor the hit rate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/13-LRU-Cache-Implementation" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;14. Expression Tree Evaluator&lt;/h3&gt;

&lt;p&gt;Parsing mathematical expressions requires an understanding of precedence. This uses the Shunting-yard algorithm to convert infix expressions to postfix, then builds a Binary Expression Tree for recursive numerical evaluation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/14-Expression-Tree-Evaluator" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;15. Contact Book with Trie&lt;/h3&gt;

&lt;p&gt;When you type a name into your phone, it instantly suggests contacts. That is a Prefix Tree (Trie) in action. This contact book features case-insensitive prefix-based autocomplete and attaches metadata grouping to each complete node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/15-Contact-Book-with-Trie" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;16. Chess Move Validator&lt;/h3&gt;

&lt;p&gt;A heavily OOP-focused project utilizing polymorphism. The validator ensures each piece follows its specific movement rules and implements recursive path-clearing checks, so knights can jump over pieces while rooks are correctly blocked by intervening pawns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/16-Chess-Move-Validator" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;17. A-Star Pathfinder&lt;/h3&gt;

&lt;p&gt;Unlike Dijkstra, A* uses heuristics to guess the direction of the target. This pathfinder calculates Euclidean distances on a 2D obstacle grid, allowing for optimized 8-way movement. It is the foundation of AI navigation in video games.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/17-A-Star-Pathfinder" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;18. N-Queens Visualizer&lt;/h3&gt;

&lt;p&gt;The N-Queens problem is the ultimate test of Backtracking. This visualizer performs an exhaustive multi-solution search on the board while tracking performance metrics to see how fast your CPU can prune invalid branches.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/18-N-Queens-Visualizer" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;19. Inventory Management System&lt;/h3&gt;

&lt;p&gt;To keep the most critical items at the top, this system is built on Max-Heaps. It features dynamic restock alerting when thresholds are breached and guarantees O(1) lookups for the highest priority inventory items.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/19-Inventory-Management-System" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;20. Transit Simulator&lt;/h3&gt;

&lt;p&gt;If you need the shortest path from every city to every other city, running Dijkstra from every node is too slow. This uses the Floyd-Warshall algorithm to generate an All-Pairs Path Matrix, mapping named cities to indices and representing disconnected nodes gracefully with infinite distances.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/20-Shortest-Path-in-Cities" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;21. OS Task Scheduler&lt;/h3&gt;

&lt;p&gt;Operating systems juggle thousands of processes. This Priority Queue implementation simulates multi-criteria scheduling. It balances First-Come-First-Serve (FCFS) arrivals with critical system categorizations to avoid process starvation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/21-Task-Scheduler" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;22. Autocomplete Engine&lt;/h3&gt;

&lt;p&gt;Standard Tries don’t know what you want to type the most. This advanced engine combines a Trie with DFS weighting. It ranks suggestions based on frequency tracking, making sure the most commonly searched terms appear first.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/22-Autocomplete-System" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;23. Packet Routing Simulator&lt;/h3&gt;

&lt;p&gt;Networking is just massive Graph Theory. This simulator maps out network topologies and calculates latency costs using an OSPF Dijkstra simulation. It dynamically adjusts paths if a ‘router’ node goes down.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/23-Packet-Routing-Simulator" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;24. Event Planner and Calendar&lt;/h3&gt;

&lt;p&gt;Using Red-Black Trees, this calendar application keeps the tree balanced on every insertion, guaranteeing O(log N) operations. It provides rapid range-based date searches, automatic scheduling conflict detection, and priority sorting for overlapping events.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/24-Calendar-and-Event-Planner" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;25. Cipher Encryption System&lt;/h3&gt;

&lt;p&gt;Cryptography relies heavily on bitwise operations. This project builds a multi-layer XOR and transposition cipher. It also generates data integrity checksums to ensure the payload hasn’t been tampered with during transit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/25-Encryption-Decryption-System" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;26. Memory Kernel Allocator&lt;/h3&gt;

&lt;p&gt;Writing malloc from scratch. This linked-list-based memory kernel simulates heap allocation. It uses a Best-Fit policy to find available memory chunks, coalesces free blocks to prevent fragmentation, and handles dynamic block splitting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/Top-25-DSA-Projects-CPP/tree/main/26-Memory-Allocator-Simulator" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Wrapping Up&lt;/h3&gt;

&lt;p&gt;Building these projects is the fastest way to move from theoretical DSA knowledge to practical engineering. Which one will you tackle first? Dive into the repository and let me know!&lt;/p&gt;

</description>
      <category>azure</category>
      <category>astaralgorithm</category>
      <category>advanceddsa</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>I just solved the rate throttling issue by changing one line</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Fri, 17 Apr 2026 04:30:00 +0000</pubDate>
      <link>https://dev.to/pratikpathak/azure-openai-rate-limits-guide-how-to-prevent-throttling-in-production-ai-systems-2527</link>
      <guid>https://dev.to/pratikpathak/azure-openai-rate-limits-guide-how-to-prevent-throttling-in-production-ai-systems-2527</guid>
      <description>&lt;p&gt;Azure OpenAI rate limits become a real concern the moment an AI application moves from development into production. During early testing, everything usually works perfectly. A developer sends prompts to the model, receives responses instantly, and the system behaves exactly as expected. Then, real users arrive.&lt;/p&gt;

&lt;p&gt;Multiple requests begin hitting the API simultaneously. Prompt sizes grow as applications include conversation history, system instructions, and retrieved documents. Suddenly, responses start failing with 429 errors. The model itself isn’t failing. The system is hitting rate limits. Small development workloads rarely exceed quotas, but real applications quickly reach limits on tokens per minute (TPM) or requests per minute (RPM).&lt;/p&gt;

&lt;p&gt;Without the right architecture, throttling can create cascading problems like retry storms, backed-up queues, and massive latency spikes across your entire distributed system.&lt;/p&gt;

&lt;p&gt;Understanding how Azure OpenAI rate limits work, and designing systems around them, is absolutely essential for building reliable, production-grade AI applications.&lt;/p&gt;

&lt;h2&gt;Understanding Azure OpenAI Rate Limits&lt;/h2&gt;

&lt;p&gt;Azure OpenAI controls throughput using two primary quotas: Requests per minute (RPM) and Tokens per minute (TPM). These limits protect the platform from overload and ensure fair resource usage across all enterprise customers.&lt;/p&gt;

&lt;h3&gt;RPM vs TPM&lt;/h3&gt;

&lt;p&gt;RPM limits how many API requests your application can send each minute, while TPM limits the total tokens processed per minute, including both input tokens and output tokens. For example, if you send 10 requests per minute and each request uses 2000 tokens, your total usage is 20,000 TPM. Even if request limits are not exceeded, the system can still throttle traffic if TPM limits are reached.&lt;/p&gt;

&lt;p&gt;In Azure OpenAI, RPM is effectively derived from TPM capacity. A typical ratio is 1000 TPM to roughly 6 RPM. This means applications with large prompts may hit TPM limits long before reaching RPM limits!&lt;/p&gt;
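
&lt;p&gt;A quick back-of-the-envelope check makes this concrete; the 240,000 TPM quota and the 2000-token average request below are illustrative assumptions:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Which quota binds first? Illustrative numbers only.
tpm_quota = 240_000                      # tokens per minute on the deployment
rpm_quota = (tpm_quota // 1000) * 6      # typical Azure ratio: ~6 RPM per 1000 TPM

avg_tokens_per_request = 2_000           # prompt + completion combined

requests_allowed_by_tpm = tpm_quota // avg_tokens_per_request
print(requests_allowed_by_tpm)  # 120 requests/min before TPM throttles
print(rpm_quota)                # 1440 requests/min before RPM throttles&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With prompts this large, the TPM quota caps you at 120 requests per minute while the RPM quota would have allowed 1440. Shrink the prompts and the picture flips.&lt;/p&gt;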

&lt;h2&gt;Regional Quotas and Deployment Allocation&lt;/h2&gt;

&lt;p&gt;Azure OpenAI quotas are allocated per subscription, region, and model. For example, you might have a GPT-4 deployment in East US and a GPT-3.5 deployment in West Europe. Each deployment has independent rate limits, allowing organizations to distribute traffic across multiple regions. This is a common and highly recommended scaling strategy in production AI systems.&lt;/p&gt;

&lt;p&gt;Furthermore, Azure assigns a quota pool per model per region. If you have 240,000 TPM available for GPT-4, you can distribute it across deployments. You could have one deployment with the full 240k TPM, or two deployments with 120k TPM each. This allows teams to precisely balance throughput across different environments or workloads.&lt;/p&gt;

&lt;h2&gt;Handling Rate Limits with Exponential Backoff&lt;/h2&gt;

&lt;p&gt;When quotas are exceeded, Azure returns a 429 Too Many Requests response. This indicates the service is protecting its throughput capacity. Production systems must be designed to handle these responses gracefully using exponential backoff.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os
import time
import openai

# Client setup shown for completeness; endpoint and key come from your environment.
client = openai.AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

prompt = "Summarize the latest deployment report."
MAX_RETRIES = 5

for attempt in range(MAX_RETRIES):
    try:
        response = client.chat.completions.create(
            model="gpt-4",  # your Azure deployment name
            messages=[{"role": "user", "content": prompt}]
        )
        break
    except openai.RateLimitError:
        if attempt == MAX_RETRIES - 1:
            raise  # out of retries: surface the 429 to the caller
        time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, 8s&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This approach prevents retry storms while still allowing temporary spikes to recover smoothly. Immediate retries worsen throttling, whereas exponential backoff gradually increases the wait time between attempts (1 second, then 2, 4, and 8 seconds in the example above).&lt;/p&gt;

&lt;h2&gt;Strategies to Prevent Throttling&lt;/h2&gt;

&lt;p&gt;Production AI systems require architectural strategies that respect API quotas. Here are the top three approaches to ensure your agents never get stuck:&lt;/p&gt;

&lt;h3&gt;1. Token Optimization&lt;/h3&gt;

&lt;p&gt;Reducing token usage often yields the biggest scalability improvements. Common techniques include summarizing conversation history, limiting retrieved documents, compressing prompts, and removing redundant system instructions. Dropping a prompt from 4000 tokens to an 800-token summary allows significantly more requests within your TPM limits.&lt;/p&gt;
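
&lt;p&gt;As a minimal sketch of the history-trimming technique (the 800-token budget, the model name, and the message shape are assumptions, not prescriptions):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import tiktoken  # OpenAI's open-source tokenizer

enc = tiktoken.encoding_for_model("gpt-4")

def trim_history(messages, budget=800):
    # Always keep the system prompt, then add the most recent
    # turns until the token budget is spent.
    system, turns = messages[0], messages[1:]
    kept = []
    used = len(enc.encode(system["content"]))
    for msg in reversed(turns):
        cost = len(enc.encode(msg["content"]))
        if used + cost &gt; budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))&lt;/code&gt;&lt;/pre&gt;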

&lt;h3&gt;2. Queue-Based Architectures&lt;/h3&gt;

&lt;p&gt;High-traffic AI systems often rely on asynchronous processing. By introducing a message queue (like Azure Service Bus or Azure Queue Storage) between your API Gateway and your Worker Service, you can smooth traffic spikes. The queue prevents sudden bursts from overwhelming rate limits. While this introduces slight latency, the trade-off dramatically improves system reliability.&lt;/p&gt;
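
&lt;p&gt;In production this layer would be Azure Service Bus feeding a dedicated worker service; the following is only a minimal in-process sketch of the same smoothing idea, with a hypothetical &lt;code&gt;call_model&lt;/code&gt; stub and an assumed pace of 60 requests per minute:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import asyncio

async def call_model(prompt):
    # Stand-in for your real Azure OpenAI call.
    await asyncio.sleep(0.1)
    return f"response to: {prompt}"

async def worker(queue, max_per_minute=60):
    # Drain queued requests at a steady pace so traffic bursts
    # never hit the rate limit all at once.
    interval = 60 / max_per_minute
    while True:
        prompt, future = await queue.get()
        future.set_result(await call_model(prompt))
        queue.task_done()
        await asyncio.sleep(interval)

async def main():
    queue = asyncio.Queue()
    worker_task = asyncio.create_task(worker(queue))
    future = asyncio.get_running_loop().create_future()
    await queue.put(("Summarize this document.", future))
    print(await future)
    worker_task.cancel()  # tidy shutdown for the demo

asyncio.run(main())&lt;/code&gt;&lt;/pre&gt;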

&lt;h3&gt;3. Monitor Token Usage and Telemetry&lt;/h3&gt;

&lt;p&gt;Managing rate limits effectively requires constant monitoring. You should track tokens per request, requests per minute, API latency, and throttling errors using tools like Azure Monitor and Application Insights. Here is a simple logging implementation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# "response" is the completion object from the earlier backoff example;
# the SDK returns token counts on every response.
usage = response.usage

logger.info(
    "openai_request",
    extra={
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens
    }
)&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Final Thoughts on Production AI&lt;/h2&gt;

&lt;p&gt;Rate limits are not an error condition; they are an architectural constraint. Systems designed without considering quotas often work during development but fail under production traffic. Most AI systems evolve from direct API calls at low traffic, to token optimization at moderate traffic, and finally to queue-based processing and regional scaling at enterprise scale.&lt;/p&gt;

&lt;p&gt;Designing with rate limits in mind from the beginning ensures your applications remain stable as user demand increases. Let’s build resilient infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Related Reading:&lt;/b&gt; If you want to dive deeper into securing and observing these workloads, check out our recent guides on &lt;a href="https://pratikpathak.com/silent-failures-the-hidden-reason-your-ai-agents-keep-getting-stuck-in-production/" rel="noopener noreferrer"&gt;Observability and Silent Failures&lt;/a&gt; and &lt;a href="https://pratikpathak.com/managing-state-in-multi-agent-workflows-redis-vs-cosmos-db-in-production/" rel="noopener noreferrer"&gt;Managing State with Cosmos DB&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>azure</category>
      <category>developer</category>
    </item>
    <item>
      <title>Stop Hardcoding API Keys in LangChain: Securing AI Agents with Azure Key Vault</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Thu, 16 Apr 2026 04:30:00 +0000</pubDate>
      <link>https://dev.to/pratikpathak/stop-hardcoding-api-keys-in-langchain-securing-ai-agents-with-azure-key-vault-ehl</link>
      <guid>https://dev.to/pratikpathak/stop-hardcoding-api-keys-in-langchain-securing-ai-agents-with-azure-key-vault-ehl</guid>
      <description>&lt;p&gt;I spent an hour reviewing the architecture of a new multi-agent system recently, and one line of code made my stomach drop:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os
os.environ["OPENAI_API_KEY"] = "sk-..."&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It’s the classic “getting started” tutorial code. The problem? That agent was bound for an enterprise production environment. When developers build with LangChain, Semantic Kernel, or AutoGen, they get so focused on crafting the perfect prompt or building the retrieval pipeline that basic security hygiene goes completely out the window. Hardcoding API keys or dropping them into an unencrypted &lt;code&gt;.env&lt;/code&gt; file in a massive serverless orchestration environment is basically handing the keys to your billing account over to anyone who can peek at your repo or server environment.&lt;/p&gt;

&lt;p&gt;If you are pushing AI agents to production, you need to stop hardcoding keys immediately. Let’s fix this using Azure Key Vault and Managed Identities so your agents stay secure without the hassle of rotating raw keys.&lt;/p&gt;

&lt;h2&gt;The Danger of the .env File in AI Workflows&lt;/h2&gt;

&lt;p&gt;Why is this such a big deal for AI agents compared to standard web apps? Because AI agents are highly autonomous. They make external API calls, scrape the web, and execute arbitrary Python code. If an agent falls victim to a prompt injection attack, a malicious actor might trick the agent into printing its environment variables. Suddenly, your $10,000/month Azure OpenAI provisioned throughput is fully exposed.&lt;/p&gt;

&lt;p&gt;Prompt injection can expose environment variables. If your API key is in memory, it can be leaked by the LLM itself!&lt;/p&gt;
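
&lt;p&gt;A deliberately naive sketch (with a fake key) shows why: any tool the agent can invoke runs in the same process as your secrets, so one injected instruction is all it takes.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os

os.environ["OPENAI_API_KEY"] = "sk-demo-not-a-real-key"  # the anti-pattern

def run_python_tool(code):
    # Simplified stand-in for an agent's code-execution tool.
    return eval(code)

# A prompt-injected instruction the LLM obediently forwards to its tool:
injected = "os.environ.get('OPENAI_API_KEY')"
print(run_python_tool(injected))  # prints the key: secret leaked&lt;/code&gt;&lt;/pre&gt;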

&lt;h2&gt;The Fix: Azure Key Vault + Managed Identity&lt;/h2&gt;

&lt;p&gt;The solution is to decouple the secret from the application entirely. By storing the OpenAI API Key in Azure Key Vault and granting your host (like Azure Container Apps or Azure Functions) a System Assigned Managed Identity, your code authenticates seamlessly to the Vault. No passwords. No connection strings.&lt;/p&gt;

&lt;h3&gt;Step 1: Set Up Azure Key Vault&lt;/h3&gt;

&lt;p&gt;First, create your Key Vault in the Azure Portal and add your OpenAI key as a new Secret. Give it a clear name like &lt;code&gt;AzureOpenAIKey-Prod&lt;/code&gt;. Next, go to your compute resource (where your agent runs), enable “System assigned identity”, and grant that identity “Key Vault Secrets User” access to your vault.&lt;/p&gt;

&lt;h3&gt;Step 2: Modify the Python Agent&lt;/h3&gt;

&lt;p&gt;Now, we rip out the hardcoded &lt;code&gt;os.environ&lt;/code&gt; logic and replace it with the &lt;code&gt;azure-identity&lt;/code&gt; and &lt;code&gt;azure-keyvault-secrets&lt;/code&gt; Python SDKs.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from langchain.chat_models import AzureChatOpenAI

# 1. Authenticate using the machine's managed identity (NO PASSWORDS!)
credential = DefaultAzureCredential()

# 2. Connect to the Vault
vault_url = "https://your-vault-name.vault.azure.net/"
client = SecretClient(vault_url=vault_url, credential=credential)

# 3. Retrieve the secret dynamically at runtime
openai_secret = client.get_secret("AzureOpenAIKey-Prod")

# 4. Initialize LangChain
llm = AzureChatOpenAI(
    openai_api_key=openai_secret.value,
    azure_endpoint="https://your-endpoint.openai.azure.com/",
    openai_api_version="2023-05-15",
    deployment_name="gpt-4"
)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice how clean this is? &lt;code&gt;DefaultAzureCredential()&lt;/code&gt; automatically detects that it is running in Azure and uses the machine’s identity to fetch the key. If this code leaks to GitHub, the attacker gets absolutely nothing. If they run it on their local machine, it crashes because they lack your Azure identity context.&lt;/p&gt;

&lt;h2&gt;Why This Matters for Enterprise Agents&lt;/h2&gt;

&lt;p&gt;By integrating Azure Key Vault, you also unlock secret rotation. If your OpenAI key is compromised or expires, you change it in exactly one place, the Key Vault, and all of your deployed agents pick up the new key on their next execution without a single line of code changing.&lt;/p&gt;

&lt;p&gt;Stop copying and pasting tutorials straight into production. AI security starts at the infrastructure level.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Related Reading:&lt;/b&gt; To see how else infrastructure can save your AI agents from catastrophic failures, check out &lt;a href="https://pratikpathak.com/silent-failures-the-hidden-reason-your-ai-agents-keep-getting-stuck-in-production/" rel="noopener noreferrer"&gt;Silent Failures: The Hidden Reason Your AI Agents Keep Getting Stuck&lt;/a&gt; and learn why &lt;a href="https://pratikpathak.com/your-ai-agent-is-leaking-data-heres-how-azure-ad-b2c-plugs-the-hole-in-5-minutes/" rel="noopener noreferrer"&gt;Your AI Agent is Leaking Data without Azure AD B2C&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak" rel="noopener noreferrer"&gt;View More Azure Security Tips&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>errorwithlangchainch</category>
      <category>langchainpythontutor</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>Finally: Setting Up a Local, Offline AI Coding Assistant in VS Code</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Wed, 15 Apr 2026 13:46:40 +0000</pubDate>
      <link>https://dev.to/pratikpathak/finally-setting-up-a-local-offline-ai-coding-assistant-in-vs-code-2gkd</link>
      <guid>https://dev.to/pratikpathak/finally-setting-up-a-local-offline-ai-coding-assistant-in-vs-code-2gkd</guid>
      <description>&lt;p&gt;I’ve been experimenting with AI coding assistants for a long time, but let’s be honest: the subscriptions add up, and there is always that nagging feeling in the back of your mind. Every single snippet of code you write is being beamed to a remote server you have zero control over. For personal side projects, maybe that’s fine. But for anything professional, sensitive, or enterprise-grade? That’s a complete dealbreaker.&lt;/p&gt;

&lt;p&gt;I finally decided to stop paying for cloud-based AI tools and build my own local coding assistant right inside my editor. The goal was simple: it had to run entirely offline, it had to live inside VS Code, and it had to cost exactly $0 to maintain. After a ton of trial and error, I found the “gold” stack that actually works without friction.&lt;/p&gt;

&lt;p&gt;By shifting your AI assistant to your local machine, you guarantee 100% data privacy. No code ever leaves your hard drive.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Local AI Stack: Ollama + Continue + Qwen
&lt;/h2&gt;

&lt;p&gt;The entire setup relies on three specific tools that click together perfectly. No API keys, no monthly billing, and no complex networking required.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Ollama (The Engine)
&lt;/h3&gt;

&lt;p&gt;If you aren’t using Ollama yet, you are missing out. It is a lightweight local runtime that abstracts away all the nightmare-inducing complexities of running Large Language Models (LLMs) locally. It handles hardware compatibility, VRAM allocation, and spinning up a local API on your machine. You just download the executable, run it, and you are good to go.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpratikpathak.com%2Fwp-content%2Fuploads%2F2026%2F04%2Fimage-1-aa2a4e3c-d7ef-4903-a0f6-e72df9b92ec1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpratikpathak.com%2Fwp-content%2Fuploads%2F2026%2F04%2Fimage-1-aa2a4e3c-d7ef-4903-a0f6-e72df9b92ec1.jpg" title="Finally: Setting Up a Local, Offline AI Coding Assistant in VS Code 1" alt="Ollama Local LLM Terminal" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Continue.dev (The VS Code Extension)
&lt;/h3&gt;

&lt;p&gt;Continue.dev is the bridge between your local AI engine and your editor. It is an open-source VS Code extension that hooks directly into Ollama. It gives you all the premium features you’d expect from a paid tool: inline autocomplete, a docked chat panel, code refactoring, and quick explanations—all natively inside VS Code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpratikpathak.com%2Fwp-content%2Fuploads%2F2026%2F04%2Fimage-1-a8617e4a-3cd3-4fd5-955a-71fbe53f70e4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpratikpathak.com%2Fwp-content%2Fuploads%2F2026%2F04%2Fimage-1-a8617e4a-3cd3-4fd5-955a-71fbe53f70e4.jpg" title="Finally: Setting Up a Local, Offline AI Coding Assistant in VS Code 2" alt="VS Code Continue.dev Extension" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Qwen2.5-Coder (The Brains)
&lt;/h3&gt;

&lt;p&gt;This is where the magic happens. You don’t need a monster $2000 GPU to run capable AI models anymore. I found the absolute sweet spot with Alibaba’s &lt;strong&gt;Qwen2.5-Coder&lt;/strong&gt;. The 7B version runs incredibly fast on just 8GB of VRAM, and the 32B model rivals GPT-4o on code generation benchmarks. It supports over 90 programming languages and is terrifyingly good at fixing broken syntax.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: If you have 16GB+ of VRAM, you should also experiment with DeepSeek-Coder-V2. Its reasoning capabilities are insane for building projects from scratch.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Set It Up in 3 Steps
&lt;/h2&gt;

&lt;p&gt;The setup is surprisingly painless. You can go from zero to a fully functional AI assistant in a few minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Download the Model&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Install Ollama, open your terminal, and run the following command to pull the model and spin up the local server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run qwen2.5-coder:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Install Continue.dev&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Open VS Code, head to the Extensions marketplace, and search for “Continue”. Install it and wait for the logo to appear in your sidebar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Configure the Extension&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Click the gear icon inside the Continue extension panel to open its &lt;code&gt;config.json&lt;/code&gt;. You just need to point it to your local Ollama instance using this simple configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Qwen2.5-Coder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen2.5-coder:7b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://127.0.0.1:11434"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Qwen2.5-Coder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen2.5-coder:7b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://127.0.0.1:11434"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final Thoughts: Better Flow, Zero Subscriptions
&lt;/h2&gt;

&lt;p&gt;The difference is tangible. Because everything runs locally, there is absolutely zero latency waiting for a remote server to respond. The autocomplete feels snappier than any cloud tool I’ve used, and I don’t have to constantly break my flow state to context-switch into a browser window.&lt;/p&gt;

&lt;p&gt;Most importantly, I have total peace of mind. Whether I’m auditing a client’s infrastructure or drafting experimental backend architecture, my code stays on my machine. If you’re tired of paying monthly fees or compromising on privacy, it’s time to bring your AI home.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related Reading:&lt;/strong&gt; Once you have your local environment set up, you might want to start building your own autonomous systems. Check out my guide on &lt;a href="https://pratikpathak.com/building-a-proactive-web-scraping-agent-with-python-firecrawl-and-azure-openai/" rel="noopener noreferrer"&gt;Building a Proactive Web-Scraping Agent&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak" rel="noopener noreferrer"&gt;Explore More Open Source AI&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>python</category>
      <category>5githubhacks</category>
      <category>aiagentarchitecture</category>
    </item>
    <item>
      <title>The 3 Lines of Python Code That Fixed My AI Agent’s Hallucinations</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Wed, 15 Apr 2026 05:52:47 +0000</pubDate>
      <link>https://dev.to/pratikpathak/the-3-lines-of-python-code-that-fixed-my-ai-agents-hallucinations-446f</link>
      <guid>https://dev.to/pratikpathak/the-3-lines-of-python-code-that-fixed-my-ai-agents-hallucinations-446f</guid>
      <description>&lt;h2&gt;The Fallacy of Prompt Engineering&lt;/h2&gt;

&lt;p&gt;There is a widespread misconception in the AI engineering community that hallucinations can be solved with better words. Developers spend hours appending phrases like &lt;em&gt;“Think step-by-step,” “You are a helpful expert,” “Output strictly in JSON,”&lt;/em&gt; and &lt;em&gt;“Do not lie under any circumstances”&lt;/em&gt; to their system prompts.&lt;/p&gt;

&lt;p&gt;No amount of prompt engineering can completely eradicate LLM hallucinations in a production agentic system. The fundamental flaw is treating the LLM as a magical black box that will always output valid, parseable text.&lt;/p&gt;

&lt;h2&gt;The Architectural Challenge: Parsing Unpredictable Text&lt;/h2&gt;

&lt;p&gt;When an LLM generates a response, standard systems attempt to parse it using regular expressions, string splitting, or loose &lt;code&gt;json.loads()&lt;/code&gt; wrappers. If the model hallucinates an extra sentence, forgets a trailing comma, or decides to wrap its JSON in markdown backticks (&lt;code&gt;```json&lt;/code&gt;), your downstream Python logic crashes.&lt;/p&gt;

&lt;p&gt;In multi-agent systems, hallucinations aren’t just factual errors (like stating the wrong capital of a country); they are &lt;em&gt;structural&lt;/em&gt; errors. If the Routing Agent hallucinates an invalid route name, the entire orchestration graph fails.&lt;/p&gt;

&lt;h2&gt;The Fix: Strict Pydantic Enforcement&lt;/h2&gt;

&lt;p&gt;The secret to reliable AI agents is removing the LLM’s ability to be creative with its output format. By using Pydantic models combined with modern Structured Output APIs (introduced by OpenAI in late 2024), you can force the model at the API-level to conform strictly to a predefined JSON schema.&lt;/p&gt;

&lt;h3&gt;The 3 Lines of Code&lt;/h3&gt;

&lt;p&gt;Using LangChain’s wrapper around OpenAI’s structured outputs, we can bind a rigid Python class to the generation pipeline.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from typing import Literal

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

# 1. Define the absolute constraints of the Agent's decision
class AgentDecision(BaseModel):
    confidence_score: float = Field(ge=0.0, le=1.0, description="Confidence metric")
    # Literal produces a strict three-value enum in the generated JSON schema
    action: Literal["refund", "escalate", "ignore"] = Field(description="Strict enum routing")
    reasoning: str = Field(description="Brief explanation for audit logs")

llm = ChatOpenAI(model="gpt-4o")

# 2. The Magic Line: Bind the schema to the LLM natively
structured_llm = llm.with_structured_output(AgentDecision)

# 3. Invoke. The output is a guaranteed Pydantic object, not a string!
output = structured_llm.invoke("The customer is furiously demanding their money back for a broken monitor.")

# No parsing, no regex, no crashes. Just raw object properties.
print(f"Action chosen: {output.action} with {output.confidence_score} confidence.")&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Why This Kills Hallucinations&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema Coercion at the Token Level:&lt;/strong&gt; OpenAI’s native structured outputs actually constrain the token generation probabilities on the server side. If the model tries to output an action like “send_email” instead of the allowed “refund”, the API refuses to generate the invalid token.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grounding via Data Types:&lt;/strong&gt; Forcing the LLM to output specific variable types (like floats strictly between 0 and 1) anchors its generation process, leaving less computational room for erratic generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic Routing:&lt;/strong&gt; You can now use standard Python &lt;code&gt;if/elif&lt;/code&gt; statements for your LangGraph edges, completely eliminating string-matching bugs (see the sketch below).&lt;/li&gt;
&lt;/ul&gt;
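
&lt;p&gt;For example, the routing edge collapses to ordinary control flow over the typed object from the snippet above (the handler names here are hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Plain Python routing on the validated Pydantic object
if output.action == "refund":
    next_node = "refund_workflow"
elif output.action == "escalate":
    next_node = "human_review"
else:  # "ignore" is the only remaining valid value
    next_node = "archive"&lt;/code&gt;&lt;/pre&gt;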

&lt;p&gt;Stop asking models to format things nicely using prompts. Force them using Pydantic.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Related Reading:&lt;/b&gt; We discuss how structured outputs also drastically cut costs in &lt;a href="https://pratikpathak.com/i-saved-80k-tokens-a-day-just-by-changing-how-my-ai-agents-talk-to-each-other/" rel="noopener noreferrer"&gt;I Saved 80k Tokens a Day&lt;/a&gt;, and how to trace these outputs in &lt;a href="https://pratikpathak.com/silent-failures-the-hidden-reason-your-ai-agents-keep-getting-stuck-in-production/" rel="noopener noreferrer"&gt;Silent Failures in Production&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/pratikpathak" rel="noopener noreferrer"&gt;View Source Code on GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>errorwithlangchainch</category>
      <category>pydanticerrorwrapper</category>
      <category>agentframeworks</category>
    </item>
    <item>
      <title>I learned this in 13 Years, Here are my DevOps Tips and Tricks</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Wed, 15 Apr 2026 04:30:00 +0000</pubDate>
      <link>https://dev.to/pratikpathak/i-learned-this-in-13-years-here-are-my-devops-tips-and-tricks-5gh9</link>
      <guid>https://dev.to/pratikpathak/i-learned-this-in-13-years-here-are-my-devops-tips-and-tricks-5gh9</guid>
      <description>&lt;p&gt;If you’re serious about DevOps, there are some GitHub repositories you simply cannot afford to miss.&lt;/p&gt;

&lt;p&gt;After 13 years in the trenches, I have realized that the best resources aren’t necessarily the ones with the most stars, but the ones that teach you how to think, build, and operate systems the right way. I wanted to share this curated list that I’ve personally found invaluable. These repositories act as your comprehensive guide to mastering the DevOps lifecycle, from basic Linux commands to advanced Kubernetes orchestration and AWS cloud architecture.&lt;/p&gt;

&lt;p&gt;Here are 9 must-follow repositories packed with real, hands-on DevOps knowledge that you should bookmark right now.&lt;/p&gt;

&lt;p&gt;Bookmark this page! You will want to refer back to these repositories as you progress through your DevOps journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Book of Secret Knowledge
&lt;/h2&gt;

&lt;p&gt;A goldmine of CLI tricks, cheat sheets, and deep DevOps wisdom. Whether you are debugging a network issue at 3 AM or looking for the most efficient way to parse logs, this collection of secret knowledge has you covered. It includes tools for OSINT, penetration testing, and general system administration, making it a mandatory bookmark for any serious systems engineer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/trimstray/the-book-of-secret-knowledge" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Ftrimstray%2Fthe-book-of-secret-knowledge" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Ftrimstray%2Fthe-book-of-secret-knowledge" title="I learned this in 13 Years, Here are my DevOps Tips and Tricks 1" alt="The Book of Secret Knowledge Demo" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. DevOps Exercises
&lt;/h2&gt;

&lt;p&gt;Hands-on DevOps, Linux, networking &amp;amp; cloud exercises. Theory is great, but practice is what makes you an engineer. This repository is packed with interview questions and practical exercises covering Linux, networking, Python, and cloud infrastructure. It bridges the gap between reading about a concept and actually deploying it, ensuring you are prepared for real-world scenarios and tough technical interviews.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/bregman-arie/devops-exercises" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fbregman-arie%2Fdevops-exercises" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fbregman-arie%2Fdevops-exercises" title="I learned this in 13 Years, Here are my DevOps Tips and Tricks 2" alt="DevOps Exercises Demo" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. AWS DevOps Zero to Hero
&lt;/h2&gt;

&lt;p&gt;Complete AWS + DevOps roadmap with projects. Abhishek Veeramalla’s roadmap is widely considered one of the best structured paths for learning AWS from a DevOps perspective. It takes you from the absolute basics of IAM and EC2 all the way to complex CI/CD pipelines, EKS deployments, and infrastructure as code. The project-based approach ensures you aren’t just memorizing AWS services, but actually building interconnected cloud architectures.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/iam-veeramalla/aws-devops-zero-to-hero" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fiam-veeramalla%2Faws-devops-zero-to-hero" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fiam-veeramalla%2Faws-devops-zero-to-hero" title="I learned this in 13 Years, Here are my DevOps Tips and Tricks 3" alt="AWS DevOps Zero to Hero Demo" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Docker Zero to Hero
&lt;/h2&gt;

&lt;p&gt;Learn Docker with practical, beginner-friendly content. Containerization is the bedrock of modern DevOps. This repository breaks down Docker into easily digestible tutorials, starting with simple container runs to advanced Docker Compose setups and multi-stage builds. If you want to understand how to truly isolate environments and optimize your container images for production, this is the place to start.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/iam-veeramalla/Docker-Zero-to-Hero" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fiam-veeramalla%2FDocker-Zero-to-Hero" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fiam-veeramalla%2FDocker-Zero-to-Hero" title="I learned this in 13 Years, Here are my DevOps Tips and Tricks 4" alt="Docker Zero to Hero Demo" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Awesome Kubernetes
&lt;/h2&gt;

&lt;p&gt;Everything you need to learn Kubernetes ecosystem. The Kubernetes ecosystem is vast and constantly changing. Trying to navigate it without a map is a recipe for disaster. This curated list acts as your compass, pointing you toward the best operators, monitoring tools, networking plugins, and security scanners available for k8s. It is an indispensable resource when you are tasked with expanding your cluster’s capabilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ramitsurana/awesome-kubernetes" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Framitsurana%2Fawesome-kubernetes" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Framitsurana%2Fawesome-kubernetes" title="I learned this in 13 Years, Here are my DevOps Tips and Tricks 5" alt="Awesome Kubernetes Demo" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Terraform Zero to Hero
&lt;/h2&gt;

&lt;p&gt;Terraform explained with hands-on examples. Infrastructure as Code (IaC) is non-negotiable in modern cloud environments. This hands-on guide to Terraform explains state management, modules, and providers with extreme clarity. By following along with the practical examples, you will learn how to provision, modify, and destroy infrastructure predictably and safely.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/iam-veeramalla/terraform-zero-to-hero" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fiam-veeramalla%2Fterraform-zero-to-hero" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fiam-veeramalla%2Fterraform-zero-to-hero" title="I learned this in 13 Years, Here are my DevOps Tips and Tricks 6" alt="Terraform Zero to Hero Demo" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  7. How They DevOps
&lt;/h2&gt;

&lt;p&gt;Understand how companies implement DevOps in production. Ever wondered how Netflix, Spotify, or Uber actually handle their deployments? This repository aggregates engineering blogs and architectural teardowns from top-tier tech companies. It provides invaluable insight into how massive scale is managed in production, giving you ideas and patterns to adapt for your own organizational challenges.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/bregman-arie/howtheydevops" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fbregman-arie%2Fhowtheydevops" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fbregman-arie%2Fhowtheydevops" title="I learned this in 13 Years, Here are my DevOps Tips and Tricks 7" alt="How They DevOps Demo" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  8. System Design Resources
&lt;/h2&gt;

&lt;p&gt;Curated system design interview preparation resources. A DevOps engineer who doesn’t understand system design will struggle to scale applications effectively. This repository offers a curated list of system design resources, architectures, and principles. It helps you think about caching layers, database sharding, load balancing, and overall system resilience: skills that are critical whether you’re interviewing or architecting a new platform.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/InterviewReady/system-design-resources" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2FInterviewReady%2Fsystem-design-resources" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2FInterviewReady%2Fsystem-design-resources" title="I learned this in 13 Years, Here are my DevOps Tips and Tricks 8" alt="System Design Resources Demo" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Project Based Learning
&lt;/h2&gt;

&lt;p&gt;Learn by building real-world projects across domains. The fastest way to learn is by doing. This legendary repository aggregates tutorials where you build applications from scratch, from writing your own web server in C to building a bot in Python. Stepping outside pure operations and writing code makes you a vastly better DevOps engineer, and it gives you the empathy and knowledge to better support the developers you work with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/practical-tutorials/project-based-learning" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fpractical-tutorials%2Fproject-based-learning" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F1%2Fpractical-tutorials%2Fproject-based-learning" title="I learned this in 13 Years, Here are my DevOps Tips and Tricks 9" alt="Project Based Learning Demo" width="1200" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;These repositories have shaped the way I think about infrastructure and automation over the years. Remember, DevOps is not a specific tool; it is a culture and a mindset. Spend time exploring these resources, try building the projects yourself, and you will undoubtedly level up your engineering career. Let’s keep building and learning together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related Reading:&lt;/strong&gt; If you want to dive deeper into system architectures and state management, check out my recent guide on &lt;a href="https://pratikpathak.com/managing-state-in-multi-agent-workflows-redis-vs-cosmos-db-in-production/" rel="noopener noreferrer"&gt;Managing State: Redis vs Cosmos DB&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>python</category>
      <category>5githubhacks</category>
      <category>addingpersistencetoa</category>
    </item>
    <item>
      <title>VSIX Download: How to Install VS Code Extensions Offline (The Easy Way)</title>
      <dc:creator>Pratik Pathak</dc:creator>
      <pubDate>Mon, 13 Apr 2026 15:05:04 +0000</pubDate>
      <link>https://dev.to/pratikpathak/vsix-download-how-to-install-vs-code-extensions-offline-the-easy-way-1mlf</link>
      <guid>https://dev.to/pratikpathak/vsix-download-how-to-install-vs-code-extensions-offline-the-easy-way-1mlf</guid>
      <description>&lt;p&gt;Have you ever tried to install a VS Code extension on an offline machine, or maybe an enterprise environment that actively blocks the VS Code Marketplace? Yeah, I’ve been there too. The frustration is real. Why did I decide to build a tool for this? Because digging through Microsoft’s undocumented APIs to manually download a &lt;code&gt;.vsix&lt;/code&gt; file, figure out the target platform, and distinguish between stable and pre-release versions is an absolute nightmare. I wanted something simple, fast, and local. Let’s figure this out together.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;VSIX Downloader&lt;/strong&gt;: a completely client-side tool to search and fetch any VS Code extension directly from the Microsoft Marketplace. No backend, no telemetry, just pure downloading power.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://github.pratikpathak.com/vsix-downloader/" rel="noopener noreferrer"&gt;View Live Demo&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why You Need a .vsix Downloader
&lt;/h2&gt;

&lt;p&gt;In the standard VS Code workflow, you just hit “Install” from the extensions tab. But what if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re air-gapped and need to transfer extensions via USB.&lt;/li&gt;
&lt;li&gt;Your corporate firewall blocks &lt;code&gt;marketplace.visualstudio.com&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;You want to archive a specific version of an extension before an update breaks it.&lt;/li&gt;
&lt;li&gt;You are building an automated environment setup script.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where downloading the raw &lt;code&gt;.vsix&lt;/code&gt; (Visual Studio Extension) file comes in handy. You can install any &lt;code&gt;.vsix&lt;/code&gt; file offline directly from the VS Code command palette or CLI using &lt;code&gt;code --install-extension my-extension.vsix&lt;/code&gt;. Let me show you how to get these files using the VSIX Downloader Web GUI.&lt;/p&gt;

&lt;p&gt;The web app runs entirely in your browser using the VS Code Marketplace API.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to the &lt;a href="http://github.pratikpathak.com/vsix-downloader/" rel="noopener noreferrer"&gt;live web app&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Type the name of the extension you want (e.g., “GitHub Copilot” or “Python”).&lt;/li&gt;
&lt;li&gt;Hit Fetch. The app queries the marketplace and displays the exact versions available.&lt;/li&gt;
&lt;li&gt;Click Download on the version you need.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpratikpathak.com%2Fwp-content%2Fuploads%2F2026%2F04%2Fvsix-live-screenshot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpratikpathak.com%2Fwp-content%2Fuploads%2F2026%2F04%2Fvsix-live-screenshot.png" title="VSIX Download: How to Install VS Code Extensions Offline (The Easy Way) 1" alt="VSIX Downloader Web GUI Interface" width="800" height="359"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The sleek, dark-mode Web GUI for VSIX Downloader.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What I love about this is the version matrix. You can easily see which versions are stable and which are pre-release, and download exactly what you need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pro Tip:&lt;/strong&gt; Because the Web GUI is completely static, you can clone the repo and open &lt;code&gt;index.html&lt;/code&gt; locally without any server!&lt;/p&gt;
&lt;h2&gt;
  
  
  The “Dependency Hell” Problem
&lt;/h2&gt;

&lt;p&gt;When I first started installing extensions offline, I realized many popular extensions don’t work in isolation. You download the Python extension, install it on an air-gapped machine, and suddenly it complains about missing Pylance or Jupyter. That’s because complex extensions have strict dependencies defined in their manifests.&lt;/p&gt;

&lt;p&gt;If you’re operating in a completely disconnected environment, you have to play detective. The easiest way to handle this is to install the extension on a connected machine, open the extension’s folder (usually &lt;code&gt;~/.vscode/extensions&lt;/code&gt; on Linux/macOS, or &lt;code&gt;%USERPROFILE%\.vscode\extensions&lt;/code&gt; on Windows), and inspect its &lt;code&gt;package.json&lt;/code&gt; manifest. Once you identify the required dependencies, you can search for them by their exact publisher ID in VSIX Downloader and grab those as well.&lt;/p&gt;
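
&lt;p&gt;As a rough sketch, assuming the default extensions directory on Linux/macOS and that you have &lt;code&gt;jq&lt;/code&gt; installed (the Python extension folder here is just an example), you can read the declared dependencies straight out of the manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Print hard dependencies and any bundled extension pack
# (~/.vscode/extensions is the default location on Linux/macOS)
jq '.extensionDependencies, .extensionPack' \
    ~/.vscode/extensions/ms-python.python-*/package.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;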

&lt;p&gt;Always check if the extension attempts to download language servers or binaries at runtime. Some extensions require you to download a specific offline build to avoid runtime fetch failures.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Target Platform Trap (Windows vs Linux vs Mac)
&lt;/h2&gt;

&lt;p&gt;A massive pain point I ran into was platform-specific extensions. A few years ago, a single VSIX file worked everywhere. Now, VS Code supports platform-specific builds. If you download the generic version of an extension (like C/C++) on your Windows workstation but try to install it on a remote Alpine Linux Docker container, it will fail silently or throw architecture errors.&lt;/p&gt;

&lt;p&gt;This is exactly why VSIX Downloader explicitly exposes the platform tags. When you search for a tool, you might see matrix options like Windows x64, Linux ARM64, or Alpine Linux. You absolutely must match the target architecture of the machine where the extension will ultimately run, not the machine you are downloading it from.&lt;/p&gt;
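
&lt;p&gt;If you would rather script the download itself, the marketplace gallery endpoint that downloaders rely on accepts a &lt;code&gt;targetPlatform&lt;/code&gt; query parameter. Treat the sketch below with care: the endpoint is undocumented and may change, and the publisher, name, version, and platform values are placeholders you must replace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Download a platform-specific VSIX (all values below are placeholders)
PUBLISHER=ms-vscode
NAME=cpptools
VERSION=1.20.5
TARGET=linux-arm64   # e.g. win32-x64, linux-x64, alpine-x64, darwin-arm64

curl -L -o "${NAME}-${VERSION}-${TARGET}.vsix" \
  "https://marketplace.visualstudio.com/_apis/public/gallery/publishers/${PUBLISHER}/vsextensions/${NAME}/${VERSION}/vspackage?targetPlatform=${TARGET}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
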
&lt;h2&gt;
  
  
  Automating Installations (The CLI Approach)
&lt;/h2&gt;

&lt;p&gt;If you’re provisioning enterprise environments, dragging and dropping files into the VS Code UI is a waste of time. You want to automate everything. Once you’ve downloaded the necessary VSIX files using the downloader, you can script their installation across a fleet of machines or bake them directly into a Docker image.&lt;/p&gt;

&lt;p&gt;The VS Code CLI makes this incredibly straightforward. Here is how I usually approach it in my setup scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install a single downloaded extension&lt;/span&gt;
code &lt;span class="nt"&gt;--install-extension&lt;/span&gt; ./downloaded-extension.vsix &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have an entire directory of downloaded extensions, you can batch install them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Iterate through a directory of VSIX files and install all of them&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;ext &lt;span class="k"&gt;in&lt;/span&gt; ./extensions/&lt;span class="k"&gt;*&lt;/span&gt;.vsix&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;code &lt;span class="nt"&gt;--install-extension&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ext&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By using the &lt;code&gt;--force&lt;/code&gt; flag, you ensure that any existing corrupted installations are overwritten cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Version Pinning for Enterprise Stability
&lt;/h2&gt;

&lt;p&gt;Why do enterprise teams actively choose to be disconnected? It’s not always about air-gapping; often, it’s about stability. I’ve had perfectly functioning CI/CD pipelines break overnight simply because a VS Code extension auto-updated and changed its formatting rules.&lt;/p&gt;

&lt;p&gt;Downloading the exact VSIX file gives you the power of version pinning. You can grab a known stable release of an extension, store it in an internal Artifactory or private file share, and distribute it to your team. Everyone gets the exact same tooling environment, guaranteed.&lt;/p&gt;

&lt;p&gt;Just make sure to disable auto-updates in your workspace settings to prevent VS Code from trying to upgrade the pinned extensions the moment it connects to the internet.&lt;/p&gt;
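
&lt;p&gt;For reference, the two settings in question live in your user or workspace &lt;code&gt;settings.json&lt;/code&gt; (VS Code’s settings file tolerates comments):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  // Stop VS Code from silently upgrading pinned extensions
  "extensions.autoUpdate": false,
  // Skip the update check entirely so no upgrade prompts appear
  "extensions.autoCheckUpdates": false
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;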

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Whether you’re a developer stuck behind a corporate proxy, a security researcher air-gapping an environment, or an AI builder giving tools to your agents, downloading VS Code extensions shouldn’t be a hassle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zpratikpathak/vsix-downloader" rel="noopener noreferrer"&gt;Check out the GitHub Repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try out the web GUI and let me know what you think!&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pratikpathak.com/the-3-lines-of-python-code-that-fixed-my-ai-agents-hallucinations/" rel="noopener noreferrer"&gt;The 3 Lines of Python Code That Fixed My AI Agent’s Hallucinations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pratikpathak.com/top-25-aws-devops-projects-for-practice-github/" rel="noopener noreferrer"&gt;Top 25+ AWS DevOps Projects for Practice on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pratikpathak.com/you-dont-need-a-vector-database-do-this-instead/" rel="noopener noreferrer"&gt;You Don’t Need a Vector Database. Do This Instead.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>azure</category>
      <category>bracketpaircolorizer</category>
      <category>coderunnervsixdownlo</category>
      <category>copilotchatvsixdownl</category>
    </item>
  </channel>
</rss>
