<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bhumika Makker</title>
    <description>The latest articles on DEV Community by Bhumika Makker (@bhumika_makker_e0906f1f4b).</description>
    <link>https://dev.to/bhumika_makker_e0906f1f4b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3871669%2Fc6796aa9-1006-4ada-bffc-246681c3f8e6.png</url>
      <title>DEV Community: Bhumika Makker</title>
      <link>https://dev.to/bhumika_makker_e0906f1f4b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bhumika_makker_e0906f1f4b"/>
    <language>en</language>
    <item>
      <title>Modal — Deep Dive</title>
      <dc:creator>Bhumika Makker</dc:creator>
      <pubDate>Sun, 12 Apr 2026 06:26:08 +0000</pubDate>
      <link>https://dev.to/bhumika_makker_e0906f1f4b/modal-deep-dive-50bi</link>
      <guid>https://dev.to/bhumika_makker_e0906f1f4b/modal-deep-dive-50bi</guid>
      <description>&lt;h2&gt;
  
  
  Company Overview
&lt;/h2&gt;

&lt;p&gt;Modal is a serverless cloud infrastructure platform designed specifically for AI, machine learning, and data-intensive applications. Founded in 2021 by &lt;strong&gt;Erik Bernhardsson&lt;/strong&gt;—the former CTO of Better.com, who previously built Spotify's music recommendation system—Modal Labs has quickly emerged as a critical piece of infrastructure for developers looking to run Python code in the cloud without the operational burden of managing servers, Kubernetes clusters, or GPU provisioning.&lt;/p&gt;

&lt;p&gt;The company's mission is straightforward yet ambitious: &lt;strong&gt;eliminate infrastructure complexity so developers can focus on building intelligent applications&lt;/strong&gt;. Modal's platform allows developers to write standard Python code and execute it in the cloud with automatic containerization, scaling, and GPU provisioning. This approach has resonated strongly with the data science and AI engineering communities, who have historically struggled with the operational overhead of deploying ML models at scale.&lt;/p&gt;

&lt;p&gt;While Modal does not publicly disclose specific funding figures or team size, it has positioned itself as a cloud-native platform that enables developers to run inference, training, batch jobs, sandboxes, and notebooks with sub-second cold starts. The platform competes in the rapidly growing AI infrastructure space, addressing a critical pain point: the gap between local development and production deployment for AI workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latest News &amp;amp; Announcements
&lt;/h2&gt;

&lt;p&gt;Modal continues to maintain active development and community engagement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GitHub Repository Activity&lt;/strong&gt;: The &lt;a href="https://github.com/modal-labs/modal-examples" rel="noopener noreferrer"&gt;modal-labs/modal-examples&lt;/a&gt; repository shows recent activity with the last commit on April 10, 2026, demonstrating ongoing updates and community contributions. The repository maintains 1,153 stars and features multiple examples including flash-attention implementations and LangChain agents. &lt;a href="https://github.com/modal-labs" rel="noopener noreferrer"&gt;source&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Claude Agent SDK Integration&lt;/strong&gt;: A new package &lt;a href="https://github.com/sshh12/modal-claude-agent-sdk-python" rel="noopener noreferrer"&gt;modal-claude-agent-sdk-python&lt;/a&gt; was released on January 18, 2026, wrapping the Claude Agent SDK to execute AI agents in secure, scalable Modal containers. This integration shows Modal's expanding ecosystem and compatibility with major AI frameworks. &lt;a href="https://github.com/sshh12/modal-claude-agent-sdk-python" rel="noopener noreferrer"&gt;source&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-Modal Agent Support&lt;/strong&gt;: GitHub repositories are increasingly leveraging Modal for multi-modal AI applications, including courses and frameworks for building production-ready multi-modal agents. &lt;a href="https://github.com/multi-modal-ai/multimodal-agents-course" rel="noopener noreferrer"&gt;source&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Platform Recognition&lt;/strong&gt;: Modal continues to be recognized by AI tool directories including &lt;a href="https://topai.tools/t/modal" rel="noopener noreferrer"&gt;topai.tools&lt;/a&gt;, &lt;a href="https://aiwiki.ai/wiki/modal" rel="noopener noreferrer"&gt;AI Wiki&lt;/a&gt;, and &lt;a href="https://www.thenextai.com/ai-tools/modal/" rel="noopener noreferrer"&gt;The Next AI&lt;/a&gt;, highlighting its growing presence in the developer tool ecosystem.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Product &amp;amp; Technology Deep Dive
&lt;/h2&gt;

&lt;p&gt;Modal's architecture represents a paradigm shift in how developers approach cloud infrastructure for AI workloads. Let's break down the core components and technical capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Platform Architecture
&lt;/h3&gt;

&lt;p&gt;Modal's foundation is built on several key principles:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Serverless Execution Model&lt;/strong&gt;: Unlike traditional cloud providers where you provision and manage servers, Modal abstracts away all infrastructure. Developers write Python functions decorated with Modal's decorators, and the platform handles everything else—containerization, scaling, GPU allocation, and execution. This serverless approach means you pay only for what you use, with automatic scaling from zero to thousands of containers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatic Containerization&lt;/strong&gt;: Every function executed on Modal runs in an isolated container environment. The platform builds container images from your declared dependencies, eliminating the need to write Dockerfiles or manage container registries. This is particularly valuable for ML workloads, where dependency management can be complex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sub-Second Cold Starts&lt;/strong&gt;: One of Modal's standout features is its ability to start containers in under a second. This is critical for interactive applications, APIs, and real-time inference where latency matters. Traditional serverless platforms often struggle with cold starts, especially for GPU workloads, but Modal has engineered its platform specifically to minimize startup time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPU Provisioning&lt;/strong&gt;: Modal provides seamless access to various GPU types including NVIDIA A100s, V100s, and other accelerators. The platform handles GPU allocation automatically based on your requirements, and you can request specific GPU types using simple decorators. This eliminates the need to manage GPU instances or deal with cloud provider-specific GPU provisioning APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Flexible Execution Modes&lt;/strong&gt;: Modal supports multiple execution patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Functions&lt;/strong&gt;: Run individual Python functions in the cloud&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classes&lt;/strong&gt;: Deploy entire Python classes with stateful behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandboxes&lt;/strong&gt;: Interactive environments for development and debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch Jobs&lt;/strong&gt;: Process large datasets with automatic parallelization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Endpoints&lt;/strong&gt;: Expose functions as HTTP endpoints with automatic API gateway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Filesystem Integration&lt;/strong&gt;: Modal provides a distributed filesystem that works seamlessly across containers. You can mount volumes, share data between functions, and persist results without worrying about storage infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheduled Execution&lt;/strong&gt;: Native support for cron-like scheduling allows you to run functions on recurring schedules—perfect for daily model retraining, data pipelines, or periodic batch processing jobs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secrets Management&lt;/strong&gt;: Securely store and access API keys, database credentials, and other secrets without exposing them in your code or configuration files.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works Under the Hood
&lt;/h3&gt;

&lt;p&gt;When you deploy a Modal application:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Code Analysis&lt;/strong&gt;: Modal analyzes your Python code, identifying functions decorated with &lt;code&gt;@app.function&lt;/code&gt; or similar decorators&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency Resolution&lt;/strong&gt;: Dependencies are declared on a Modal &lt;code&gt;Image&lt;/code&gt; (for example via &lt;code&gt;pip_install&lt;/code&gt; or a requirements file) and resolved by the platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container Building&lt;/strong&gt;: Containers are built with your dependencies and cached for rapid deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution&lt;/strong&gt;: When a function is called, Modal allocates resources (CPU, GPU, memory), starts a container, and executes your function&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling&lt;/strong&gt;: The platform automatically scales based on demand—spinning up containers for increased load and scaling to zero when idle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result Handling&lt;/strong&gt;: Return values are serialized and transmitted back to the caller, with automatic handling of large objects through filesystem references&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture enables developers to transition from local development to production deployment without changing their code or learning new paradigms.&lt;/p&gt;
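&lt;p&gt;In practice, that local-to-production transition maps onto a short CLI flow; as a sketch using Modal's standard tooling:&lt;/p&gt;

```shell
pip install modal      # install the client library and CLI
modal setup            # authenticate this machine with a token
modal run app.py       # execute the app once, streaming logs locally
modal deploy app.py    # deploy persistently (schedules, web endpoints)
```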

&lt;h2&gt;
  
  
  GitHub &amp;amp; Open Source
&lt;/h2&gt;

&lt;p&gt;Modal maintains an active open-source presence that serves as both documentation and community hub. Let's examine their GitHub footprint and ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Official Repositories
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/modal-labs/modal-examples" rel="noopener noreferrer"&gt;modal-labs/modal-examples&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stars&lt;/strong&gt;: 1,153&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language&lt;/strong&gt;: Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Last Updated&lt;/strong&gt;: April 10, 2026&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Collection of examples demonstrating Modal's capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This repository is the primary resource for developers learning Modal. Recent activity shows ongoing maintenance with examples covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flash-attention implementations (forked from Dao-AILab)&lt;/li&gt;
&lt;li&gt;LangChain agent integration&lt;/li&gt;
&lt;li&gt;Sandbox environments&lt;/li&gt;
&lt;li&gt;Various ML and data processing patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository demonstrates real-world use cases and serves as executable documentation. The MIT license encourages community contribution and adaptation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Community Ecosystem
&lt;/h3&gt;

&lt;p&gt;Modal's open-source ecosystem extends beyond official repositories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/sshh12/modal-claude-agent-sdk-python" rel="noopener noreferrer"&gt;modal-claude-agent-sdk-python&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Released: January 18, 2026&lt;/li&gt;
&lt;li&gt;Purpose: Wraps Claude Agent SDK for execution in Modal containers&lt;/li&gt;
&lt;li&gt;Significance: Demonstrates Modal's integration with major AI frameworks and its growing ecosystem of third-party tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Multi-Modal Agent Projects&lt;/strong&gt;&lt;br&gt;
Multiple repositories are leveraging Modal for multi-modal AI applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/multi-modal-ai/multimodal-agents-course" rel="noopener noreferrer"&gt;multimodal-agents-course&lt;/a&gt;: Educational content for building production-ready multi-modal agents&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aiming-lab/MDocAgent" rel="noopener noreferrer"&gt;MDocAgent&lt;/a&gt;: Multi-modal multi-agent framework for document understanding&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/jun0wanan/awesome-large-multimodal-agents" rel="noopener noreferrer"&gt;awesome-large-multimodal-agents&lt;/a&gt;: Curated list of multi-modal agent resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Community Engagement
&lt;/h3&gt;

&lt;p&gt;Modal's GitHub presence shows healthy community engagement with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular updates to example repositories&lt;/li&gt;
&lt;li&gt;Active forks and contributions from the community&lt;/li&gt;
&lt;li&gt;Integration with popular AI frameworks (LangChain, Claude SDK, etc.)&lt;/li&gt;
&lt;li&gt;Educational content and courses built on Modal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The platform's Python-first approach has resonated with the data science community, as evidenced by the numerous ML-focused examples and integrations in the ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started — Code Examples
&lt;/h2&gt;

&lt;p&gt;Let's dive into practical code examples showing how to use Modal for different AI and ML workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 1: Basic Serverless Function with GPU
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize Modal app
&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ml-inference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define an image with dependencies
&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debian_slim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;pip_install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transformers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accelerate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A100&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Request NVIDIA A100 GPU
&lt;/span&gt;    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate text using a pre-trained transformer model.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;

    &lt;span class="c1"&gt;# Load model (cached across invocations)
&lt;/span&gt;    &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate text
&lt;/span&gt;    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;do_sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Local usage
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The future of AI is&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example demonstrates Modal's core value proposition: write Python code locally, execute it in the cloud with GPU resources, and pay only for actual execution time. The &lt;code&gt;@app.function&lt;/code&gt; decorator handles all the infrastructure magic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 2: Batch Processing with Parallel Execution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batch-processor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debian_slim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;pip_install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pandas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numpy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;  &lt;span class="c1"&gt;# 2GB memory
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process a batch of data with ML model inference.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

    &lt;span class="c1"&gt;# Convert to DataFrame
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Simulate ML processing
&lt;/span&gt;    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate model inference
&lt;/span&gt;
    &lt;span class="c1"&gt;# Add computed features
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_parallel_processing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process data in parallel batches.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Split data into batches
&lt;/span&gt;    &lt;span class="n"&gt;batches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;all_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Process batches in parallel
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process_batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batches&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Flatten results
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Usage
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Generate sample data
&lt;/span&gt;    &lt;span class="n"&gt;sample_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="c1"&gt;# Process in parallel
&lt;/span&gt;    &lt;span class="n"&gt;processed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;run_parallel_processing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processed &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example showcases Modal's ability to handle batch processing workloads with automatic parallelization. The &lt;code&gt;map&lt;/code&gt; function distributes work across multiple containers, processing batches in parallel without manual orchestration.&lt;/p&gt;
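&lt;p&gt;The scatter/gather shape described above can be sketched locally with Python's own &lt;code&gt;concurrent.futures&lt;/code&gt;. This is not Modal's API, just the same fan-out pattern that &lt;code&gt;map&lt;/code&gt; automates across containers:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch):
    """Stand-in for the remote batch function: double each item's value."""
    return [{"id": item["id"], "value": item["value"] * 2} for item in batch]

def run_locally(data, batch_size=50, workers=4):
    # Scatter: split the input into fixed-size batches.
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    # Fan out to local threads; Modal's map does this across containers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_batch, batches))
    # Gather: flatten the list of batches back into one list.
    return [item for batch in results for item in batch]
```

&lt;p&gt;Because &lt;code&gt;pool.map&lt;/code&gt; preserves input order, the flattened output lines up with the original data, which is why the flattening comprehension at the end of the example above is safe.&lt;/p&gt;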

&lt;h3&gt;Example 3: Web Endpoint for ML Inference&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inference-api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;web_app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debian_slim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;pip_install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fastapi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transformers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define request/response models
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InferenceRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InferenceResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;generated_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;model_used&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@app.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;T4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Use T4 for cost-effective inference
&lt;/span&gt;    &lt;span class="n"&gt;container_idle_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;  &lt;span class="c1"&gt;# Keep container warm for 5 minutes
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@modal.asgi_app&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fastapi_app&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;FastAPI application wrapped in Modal.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="nd"&gt;@web_app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;InferenceResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InferenceRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;

        &lt;span class="c1"&gt;# Load model
&lt;/span&gt;        &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Generate
&lt;/span&gt;        &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;generated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;InferenceResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;generated_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;generated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;model_used&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;web_app&lt;/span&gt;

&lt;span class="c1"&gt;# The endpoint is automatically deployed and accessible via Modal's URL
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example demonstrates how to deploy an ML inference API with Modal. (For production use you would load the model once at container startup, e.g. via Modal's container lifecycle hooks, rather than on every request as the simplified handler above does.) The platform handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic HTTPS endpoint creation&lt;/li&gt;
&lt;li&gt;Container scaling based on traffic&lt;/li&gt;
&lt;li&gt;GPU provisioning&lt;/li&gt;
&lt;li&gt;Load balancing&lt;/li&gt;
&lt;li&gt;Health checks&lt;/li&gt;
&lt;/ul&gt;
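&lt;p&gt;Calling the deployed endpoint is plain HTTPS. A minimal client sketch, assuming a hypothetical deployment URL (Modal assigns the real hostname when you deploy):&lt;/p&gt;

```python
import json
import urllib.request

# Hypothetical URL -- Modal prints the real one on `modal deploy`.
ENDPOINT = "https://example--inference-api.modal.run/generate"

def build_request(text, model="gpt2", max_length=50):
    """Build a POST request matching the InferenceRequest schema."""
    body = json.dumps({"text": text, "model": model, "max_length": max_length})
    return urllib.request.Request(
        ENDPOINT,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it:
# with urllib.request.urlopen(build_request("Once upon a time")) as resp:
#     print(json.load(resp)["generated_text"])
```

&lt;p&gt;The request body mirrors the &lt;code&gt;InferenceRequest&lt;/code&gt; model, so FastAPI validates it automatically on the server side.&lt;/p&gt;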

&lt;h2&gt;Market Position &amp;amp; Competition&lt;/h2&gt;

&lt;p&gt;Modal operates in the rapidly evolving AI infrastructure market, competing with both established cloud providers and specialized AI platforms. Let's analyze their position.&lt;/p&gt;

&lt;h3&gt;Competitive Landscape&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Established Cloud Providers (AWS, GCP, Azure)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strengths&lt;/strong&gt;: Massive infrastructure, comprehensive services, enterprise contracts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaknesses&lt;/strong&gt;: Complexity, high operational overhead, steep learning curve&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modal's Advantage&lt;/strong&gt;: Developer experience, automatic scaling, Python-first approach&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Specialized AI Platforms (RunPod, Lambda Labs, CoreWeave)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strengths&lt;/strong&gt;: GPU-focused, competitive pricing, ML-optimized&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaknesses&lt;/strong&gt;: Often require infrastructure management, limited serverless capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modal's Advantage&lt;/strong&gt;: True serverless experience, integrated platform, sub-second cold starts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Serverless Platforms (Vercel, AWS Lambda)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strengths&lt;/strong&gt;: Mature serverless offerings, large ecosystems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaknesses&lt;/strong&gt;: Limited GPU support, longer cold starts, not ML-optimized&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modal's Advantage&lt;/strong&gt;: GPU-first design, ML-optimized cold starts, Python-native&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ML Deployment Platforms (SageMaker, Vertex AI)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strengths&lt;/strong&gt;: Integrated ML workflows, enterprise features&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaknesses&lt;/strong&gt;: Vendor lock-in, complex configuration, high cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modal's Advantage&lt;/strong&gt;: Flexibility, simplicity, pay-per-use pricing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Market Position Analysis&lt;/h3&gt;

&lt;p&gt;Modal has carved out a unique position by focusing on &lt;strong&gt;developer experience&lt;/strong&gt; and &lt;strong&gt;Python-native workflows&lt;/strong&gt;. While competitors offer similar capabilities piecemeal, Modal provides an integrated platform specifically designed for AI/ML workloads.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Modal&lt;/th&gt;
&lt;th&gt;AWS SageMaker&lt;/th&gt;
&lt;th&gt;Google Vertex AI&lt;/th&gt;
&lt;th&gt;RunPod&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ease of Use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (Python decorators)&lt;/td&gt;
&lt;td&gt;Medium (console/CLI)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low (manual setup)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cold Start Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt;1 second&lt;/td&gt;
&lt;td&gt;10-30 seconds&lt;/td&gt;
&lt;td&gt;10-30 seconds&lt;/td&gt;
&lt;td&gt;N/A (always-on)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pricing Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pay-per-use&lt;/td&gt;
&lt;td&gt;Complex tiered&lt;/td&gt;
&lt;td&gt;Complex tiered&lt;/td&gt;
&lt;td&gt;Hourly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python Native&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auto-scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;Pricing Philosophy&lt;/h3&gt;

&lt;p&gt;Modal's pay-per-use pricing model aligns with modern cloud economics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pay only for actual execution time&lt;/li&gt;
&lt;li&gt;No minimum commitments&lt;/li&gt;
&lt;li&gt;Automatic scaling to zero when idle&lt;/li&gt;
&lt;li&gt;Transparent per-second billing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This contrasts with traditional GPU providers that charge by the hour, even for brief workloads. For intermittent or bursty ML workloads, Modal can offer significant cost savings.&lt;/p&gt;
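&lt;p&gt;The difference is easy to quantify. A back-of-the-envelope sketch with illustrative rates (placeholders, not Modal's actual prices; see the pricing page for real numbers):&lt;/p&gt;

```python
import math

# Illustrative rates only -- check modal.com/pricing for real numbers.
PER_SECOND_RATE = 0.000164  # hypothetical $/s for a T4-class GPU
HOURLY_RATE = 0.59          # hypothetical $/h for the same GPU

def per_second_cost(seconds):
    """Per-second billing: pay only for actual execution time."""
    return seconds * PER_SECOND_RATE

def hourly_cost(seconds):
    """Hourly billing: every session rounds up to a full hour."""
    return math.ceil(seconds / 3600) * HOURLY_RATE

# 100 separate 30-second inference jobs spread across a day:
jobs = [30] * 100
pay_per_use = sum(per_second_cost(s) for s in jobs)   # 3000 s of compute
hour_rounded = sum(hourly_cost(s) for s in jobs)      # 100 rounded-up hours
```

&lt;p&gt;With these placeholder rates, the bursty workload costs about $0.49 under per-second billing versus $59 when each short session is rounded up to an hour. That gap is exactly what the scale-to-zero model targets.&lt;/p&gt;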

&lt;h2&gt;Developer Impact&lt;/h2&gt;

&lt;p&gt;Modal's emergence represents a significant shift in how developers approach AI infrastructure. Let's explore what this means for builders.&lt;/p&gt;

&lt;h3&gt;Who Should Use Modal?&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;ML Engineers and Data Scientists&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy models without learning Docker/Kubernetes&lt;/li&gt;
&lt;li&gt;Run experiments at scale without managing infrastructure&lt;/li&gt;
&lt;li&gt;Transition from notebooks to production seamlessly&lt;/li&gt;
&lt;li&gt;Access diverse GPU hardware without procurement complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI Startup Founders&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ship products faster by eliminating infrastructure setup&lt;/li&gt;
&lt;li&gt;Reduce burn rate with pay-per-use pricing&lt;/li&gt;
&lt;li&gt;Scale from prototype to production without architectural changes&lt;/li&gt;
&lt;li&gt;Focus on product differentiation rather than infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Enterprise ML Teams&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standardize ML deployment across teams&lt;/li&gt;
&lt;li&gt;Reduce operational overhead and infrastructure costs&lt;/li&gt;
&lt;li&gt;Enable rapid experimentation and iteration&lt;/li&gt;
&lt;li&gt;Maintain flexibility without vendor lock-in&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Research Scientists&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run large-scale experiments without managing clusters&lt;/li&gt;
&lt;li&gt;Access specialized hardware on-demand&lt;/li&gt;
&lt;li&gt;Reproduce results with consistent environments&lt;/li&gt;
&lt;li&gt;Share reproducible workflows via code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Key Benefits for Developers&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Elimination of Infrastructure Anxiety&lt;/strong&gt;&lt;br&gt;
Modal removes the cognitive load associated with infrastructure decisions. No more debating instance types, container registries, or scaling strategies. Write Python code, deploy, and iterate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rapid Iteration Cycles&lt;/strong&gt;&lt;br&gt;
Sub-second cold starts mean developers can test changes quickly. This accelerates the feedback loop between code changes and production testing, which is critical for ML experimentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Efficiency&lt;/strong&gt;&lt;br&gt;
Pay-per-use pricing means you're not paying for idle resources. For sporadic workloads like model retraining, batch processing, or development environments, costs can be dramatically lower than always-on alternatives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware Accessibility&lt;/strong&gt;&lt;br&gt;
Access to diverse GPU types (A100, V100, T4, etc.) without procurement lead times or capital expenditure. Developers can experiment with different hardware configurations to optimize for their specific workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collaboration Friendly&lt;/strong&gt;&lt;br&gt;
Modal's code-based infrastructure definition enables version control, code review, and collaboration—practices that are difficult with GUI-based cloud consoles.&lt;/p&gt;

&lt;h3&gt;The Opinionated Take&lt;/h3&gt;

&lt;p&gt;From my perspective as a developer advocate, Modal represents the &lt;strong&gt;democratization of AI infrastructure&lt;/strong&gt;. Just as Vercel and Netlify democratized web deployment, Modal is doing the same for AI workloads.&lt;/p&gt;

&lt;p&gt;The genius of Modal's approach is recognizing that &lt;strong&gt;infrastructure should be invisible&lt;/strong&gt;. Developers shouldn't need to be infrastructure experts to deploy ML models. By making the common case (deploy Python functions to the cloud with GPUs) trivial and the complex case possible, Modal lowers the barrier to entry for AI development.&lt;/p&gt;

&lt;p&gt;However, this approach isn't without trade-offs. Modal's abstraction layer means less control over low-level infrastructure details. For teams with specialized requirements or existing infrastructure investments, the trade-off may not be worth it. But for the vast majority of AI developers, the productivity gains far outweigh the loss of control.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;Based on current trends and Modal's trajectory, here are predictions for what's next:&lt;/p&gt;

&lt;h3&gt;Near-Term Predictions (2026)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Expanded Hardware Support&lt;/strong&gt;&lt;br&gt;
Modal will likely add support for newer GPU architectures (H100, Blackwell) and specialized AI accelerators (TPUs, custom silicon). The platform's abstraction layer makes it relatively straightforward to add new hardware options without changing user code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enhanced Observability&lt;/strong&gt;&lt;br&gt;
As production deployments scale, developers will need better monitoring, logging, and debugging tools. Expect Modal to invest in observability features that provide visibility into execution, performance, and costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration Expansion&lt;/strong&gt;&lt;br&gt;
The Claude Agent SDK integration is just the beginning. Expect deeper integrations with major ML frameworks (PyTorch, TensorFlow, JAX), MLOps tools (MLflow, Weights &amp;amp; Biases), and data platforms (Snowflake, Databricks).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise Features&lt;/strong&gt;&lt;br&gt;
As Modal matures, expect enterprise-grade features including SSO, audit logging, compliance certifications, and dedicated support. These are table stakes for enterprise adoption.&lt;/p&gt;

&lt;h3&gt;Medium-Term Predictions (2026-2027)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Multi-Cloud Support&lt;/strong&gt;&lt;br&gt;
Currently, Modal abstracts infrastructure but likely runs on a single cloud provider. Multi-cloud support would provide redundancy, compliance flexibility, and cost optimization across providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advanced Scheduling&lt;/strong&gt;&lt;br&gt;
While Modal already supports cron-based scheduling, expect more sophisticated workflow orchestration capabilities—DAG-based pipelines, conditional execution, and failure handling—for complex ML workflows.&lt;/p&gt;
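&lt;p&gt;The shape of such orchestration can be sketched in a few lines of plain Python. Below is a toy DAG runner with conditional execution and failure handling; it is an illustration of what these pipelines involve, not a Modal feature:&lt;/p&gt;

```python
def run_dag(tasks, deps):
    """Run callables in dependency order; skip descendants of failures.

    tasks: {name: callable}, deps: {name: [upstream names]}
    Returns {name: "ok" | "failed" | "skipped"}.
    """
    status = {}
    remaining = dict(deps)
    while remaining:
        # Pick tasks whose upstream dependencies are all resolved.
        ready = [n for n, up in remaining.items()
                 if all(u in status for u in up)]
        if not ready:
            raise ValueError("cycle detected in dependency graph")
        for name in ready:
            ups = remaining.pop(name)
            if any(status[u] != "ok" for u in ups):
                status[name] = "skipped"   # conditional execution
                continue
            try:
                tasks[name]()
                status[name] = "ok"
            except Exception:
                status[name] = "failed"    # failure handling
    return status
```

&lt;p&gt;A real orchestrator adds retries, persistence, and distributed execution on top of this skeleton, which is where a platform-level offering would add value.&lt;/p&gt;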

&lt;p&gt;&lt;strong&gt;Edge Deployment&lt;/strong&gt;&lt;br&gt;
As edge AI grows, Modal could extend its platform to support deployment to edge devices, maintaining the same developer experience across cloud and edge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Community Marketplace&lt;/strong&gt;&lt;br&gt;
An ecosystem of pre-built functions, models, and templates could emerge, allowing developers to share and discover Modal components. This would accelerate development and grow the community.&lt;/p&gt;

&lt;h3&gt;Long-Term Vision&lt;/h3&gt;

&lt;p&gt;Modal has the potential to become the &lt;strong&gt;default infrastructure layer for AI development&lt;/strong&gt;. By abstracting infrastructure complexity while maintaining flexibility, they could occupy the same role in AI that AWS occupies in general cloud computing.&lt;/p&gt;

&lt;p&gt;The key question is whether they can scale their platform and team to meet growing demand while maintaining the developer experience that sets them apart. If they can, Modal could become one of the foundational platforms of the AI era.&lt;/p&gt;

&lt;h2&gt;Key Takeaways&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Modal is a serverless cloud platform specifically designed for AI/ML workloads&lt;/strong&gt;, enabling developers to run Python code in the cloud with automatic containerization, scaling, and GPU provisioning without managing infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Founded in 2021 by Erik Bernhardsson&lt;/strong&gt; (former CTO of Better.com and creator of Spotify's recommendation algorithms), Modal addresses the critical gap between local ML development and production deployment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Key technical advantages include sub-second cold starts&lt;/strong&gt;, automatic GPU provisioning, flexible execution modes (functions, classes, sandboxes, batch jobs), and pay-per-use pricing that eliminates costs for idle resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The platform maintains active open-source presence&lt;/strong&gt; with &lt;a href="https://github.com/modal-labs/modal-examples" rel="noopener noreferrer"&gt;modal-labs/modal-examples&lt;/a&gt; (1,153 stars, updated April 10, 2026) and growing ecosystem integrations including the Claude Agent SDK.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Modal's competitive advantage lies in developer experience&lt;/strong&gt;—Python decorators replace infrastructure configuration, enabling rapid iteration and reducing the cognitive load of infrastructure management.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ideal users include ML engineers, AI startups, enterprise ML teams, and research scientists&lt;/strong&gt; who need to deploy models at scale without becoming infrastructure experts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Future trajectory likely includes expanded hardware support, enhanced observability, enterprise features, and potentially multi-cloud support&lt;/strong&gt; as Modal aims to become the default infrastructure layer for AI development.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Resources &amp;amp; Links&lt;/h2&gt;

&lt;h3&gt;Official Resources&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://modal.com/" rel="noopener noreferrer"&gt;Modal Website&lt;/a&gt; - Official platform homepage&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://modal.com/docs" rel="noopener noreferrer"&gt;Modal Documentation&lt;/a&gt; - Platform documentation and guides&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://modal.com/pricing" rel="noopener noreferrer"&gt;Modal Pricing&lt;/a&gt; - Pricing information&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;GitHub Repositories&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/modal-labs/modal-examples" rel="noopener noreferrer"&gt;modal-labs/modal-examples&lt;/a&gt; - Official examples repository (1,153 stars)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/modal-labs" rel="noopener noreferrer"&gt;modal-labs Organization&lt;/a&gt; - Official GitHub organization&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/sshh12/modal-claude-agent-sdk-python" rel="noopener noreferrer"&gt;modal-claude-agent-sdk-python&lt;/a&gt; - Claude Agent SDK integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Community &amp;amp; Ecosystem&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aiwiki.ai/wiki/modal" rel="noopener noreferrer"&gt;Modal on AI Wiki&lt;/a&gt; - Community documentation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://topai.tools/t/modal" rel="noopener noreferrer"&gt;Modal on topai.tools&lt;/a&gt; - Tool directory listing&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.thenextai.com/ai-tools/modal/" rel="noopener noreferrer"&gt;Modal on The Next AI&lt;/a&gt; - Platform review and use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Related Projects&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/multi-modal-ai/multimodal-agents-course" rel="noopener noreferrer"&gt;multimodal-agents-course&lt;/a&gt; - Multi-modal AI course using Modal&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aiming-lab/MDocAgent" rel="noopener noreferrer"&gt;MDocAgent&lt;/a&gt; - Multi-modal document understanding framework&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/langchain-ai/langchain" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; (133,212 stars) - Agent engineering platform with Modal integration examples&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Generated on 2026-04-12 by &lt;a href="https://github.com/gautammanak1/ai-tech-daily-agent" rel="noopener noreferrer"&gt;AI Tech Daily Agent&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
  </channel>
</rss>
