> A data scientist told me: "I spent a week configuring Kubernetes, Docker, and GPU drivers to run my model. With Modal, I deployed the same model in 10 minutes — from my laptop."
## What Modal Offers for Free

Modal's free tier:
- $30/month free credits — generous for development
- GPU access — A100, H100, T4 (pay per second)
- CPU compute — scale to thousands of containers
- No Docker — define environment in Python
- Instant cold starts — < 1 second
- Cron jobs — scheduled functions
- Web endpoints — HTTP APIs from Python functions
- Volumes — persistent storage
- Secrets — secure environment variables
## Quick Start

```bash
pip install modal
modal token new
```

```python
import modal

app = modal.App("hello-world")

@app.function()
def square(x: int) -> int:
    return x ** 2

@app.local_entrypoint()
def main():
    # main() runs locally, but square() executes in the cloud
    print(square.remote(42))  # 1764
```

```bash
modal run hello.py
# Your function runs in Modal's cloud — no Docker, no deploy
```
## GPU Functions

```python
import modal

app = modal.App("gpu-inference")

# Define the environment in Python (no Dockerfile needed)
image = modal.Image.debian_slim().pip_install("torch", "transformers")

@app.function(gpu="A100", image=image, timeout=300)
def generate_text(prompt: str) -> str:
    from transformers import pipeline

    generator = pipeline("text-generation", model="meta-llama/Llama-3-8b", device="cuda")
    result = generator(prompt, max_length=200)
    return result[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate_text.remote("The future of AI is"))
```
## Web Endpoints

```python
import modal
from modal import web_endpoint

app = modal.App("my-api")

@app.function()
@web_endpoint()
def predict(text: str) -> dict:
    # Your ML model inference (analyze_sentiment is a placeholder for your own model)
    sentiment = analyze_sentiment(text)
    return {"text": text, "sentiment": sentiment, "score": 0.95}
```

Deploy with `modal deploy my_api.py` and the endpoint goes live at a URL like `https://your-username--my-api-predict.modal.run?text=hello`.
## Scheduled Jobs (Cron)

```python
@app.function(schedule=modal.Cron("0 9 * * *"))  # every day at 9:00 (UTC)
def daily_report():
    # fetch_metrics and send_slack_report are placeholders for your own helpers
    data = fetch_metrics()
    send_slack_report(data)
    print(f"Report sent: {len(data)} metrics")
```
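As a refresher on the schedule string itself, a cron expression has five fields: minute, hour, day of month, month, and day of week. A quick plain-Python breakdown of the `"0 9 * * *"` expression used above (no Modal required):

```python
# Split the cron expression into its five named fields.
expr = "0 9 * * *"
names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
fields = dict(zip(names, expr.split()))

# minute 0 of hour 9, with the remaining fields wildcarded: fires once a day at 9:00.
print(fields)
```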
## Parallel Processing

```python
@app.function()
def process_item(item: dict) -> dict:
    # Heavy processing per item (expensive_computation is a placeholder)
    result = expensive_computation(item)
    return result

@app.local_entrypoint()
def main():
    items = load_1000_items()
    # Process all 1000 items in parallel (Modal scales automatically)
    results = list(process_item.map(items))
    print(f"Processed {len(results)} items")
```

With enough containers running at once, 1000 items at 10 seconds each finish in roughly 10 seconds of wall-clock time, not 10,000.
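If you want to prototype the fan-out pattern locally before pushing anything to Modal, the standard library gives a rough analogue: `.map` on a Modal function behaves much like `Executor.map`, returning results in input order. A minimal local sketch (the squaring function is just a stand-in for real per-item work):

```python
from concurrent.futures import ThreadPoolExecutor

def process_item(item: int) -> int:
    # Stand-in for the heavy per-item computation
    return item * item

items = list(range(100))
with ThreadPoolExecutor(max_workers=16) as pool:
    # Like process_item.map(items) on Modal, results come back in input order
    results = list(pool.map(process_item, items))

print(len(results))  # 100
```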
## Persistent Volumes

```python
vol = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": vol})
def save_model():
    # Train model... (training code omitted)
    model.save("/data/model.pt")  # persists between runs

@app.function(volumes={"/data": vol})
def load_and_predict(input):
    model = load("/data/model.pt")  # load() is a placeholder for your framework's loader
    return model.predict(input)
```
## Use Cases
- ML inference — deploy models with GPU in minutes
- Data processing — parallel batch jobs at scale
- Web scraping — run 1000 scrapers in parallel
- PDF processing — OCR thousands of documents
- Video processing — transcription, thumbnail generation
Need web scraping at scale? Check out my web scraping actors on Apify — no infrastructure to manage.
Need cloud compute for your project? Email me at spinov001@gmail.com.