> A data scientist told me: "I spent a week configuring Kubernetes, Docker, and GPU drivers to run my model. With Modal, I deployed the same model in 10 minutes — from my laptop."
## What Modal Offers for Free

Modal's free tier:
- $30/month free credits — generous for development
- GPU access — A100, H100, T4 (pay per second)
- CPU compute — scale to thousands of containers
- No Docker — define environment in Python
- Instant cold starts — < 1 second
- Cron jobs — scheduled functions
- Web endpoints — HTTP APIs from Python functions
- Volumes — persistent storage
- Secrets — secure environment variables
## Quick Start

```bash
pip install modal
modal token new
```

```python
import modal

app = modal.App("hello-world")

@app.function()
def square(x: int) -> int:
    return x ** 2

@app.local_entrypoint()
def main():
    # main() runs locally, but square() executes in the cloud
    print(square.remote(42))  # 1764
```

```bash
modal run hello.py
# Your function runs in Modal's cloud — no Docker, no deploy
```
## GPU Functions

```python
import modal

app = modal.App("gpu-inference")

# Define the environment in Python (no Dockerfile needed)
image = modal.Image.debian_slim().pip_install("torch", "transformers")

@app.function(gpu="A100", image=image, timeout=300)
def generate_text(prompt: str) -> str:
    from transformers import pipeline

    generator = pipeline("text-generation", model="meta-llama/Llama-3-8b", device="cuda")
    result = generator(prompt, max_length=200)
    return result[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate_text.remote("The future of AI is"))
```
## Web Endpoints

```python
import modal
from modal import web_endpoint

app = modal.App("my-api")

@app.function()
@web_endpoint()
def predict(text: str) -> dict:
    # Your ML model inference (analyze_sentiment is a placeholder for your own model)
    sentiment = analyze_sentiment(text)
    return {"text": text, "sentiment": sentiment, "score": 0.95}
```

Deploy with `modal deploy my_api.py` and the endpoint goes live at a URL like `https://your-username--my-api-predict.modal.run?text=hello`.
## Scheduled Jobs (Cron)

```python
@app.function(schedule=modal.Cron("0 9 * * *"))  # every day at 9:00 (UTC)
def daily_report():
    # fetch_metrics and send_slack_report are placeholders for your own helpers
    data = fetch_metrics()
    send_slack_report(data)
    print(f"Report sent: {len(data)} metrics")
```
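As a refresher on the schedule string itself, a cron expression has five fields: minute, hour, day of month, month, and day of week. A quick plain-Python breakdown of the `"0 9 * * *"` expression used above (no Modal required):

```python
# Split the cron expression into its five named fields.
expr = "0 9 * * *"
names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
fields = dict(zip(names, expr.split()))

# minute 0 of hour 9, with the remaining fields wildcarded: fires once a day at 9:00.
print(fields)
```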
## Parallel Processing

```python
@app.function()
def process_item(item: dict) -> dict:
    # Heavy processing per item (expensive_computation is a placeholder)
    result = expensive_computation(item)
    return result

@app.local_entrypoint()
def main():
    items = load_1000_items()
    # Process all 1000 items in parallel (Modal scales automatically)
    results = list(process_item.map(items))
    print(f"Processed {len(results)} items")
```

With enough containers running at once, 1000 items at 10 seconds each finish in roughly 10 seconds of wall-clock time, not 10,000.
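If you want to prototype the fan-out pattern locally before pushing anything to Modal, the standard library gives a rough analogue: `.map` on a Modal function behaves much like `Executor.map`, returning results in input order. A minimal local sketch (the squaring function is just a stand-in for real per-item work):

```python
from concurrent.futures import ThreadPoolExecutor

def process_item(item: int) -> int:
    # Stand-in for the heavy per-item computation
    return item * item

items = list(range(100))
with ThreadPoolExecutor(max_workers=16) as pool:
    # Like process_item.map(items) on Modal, results come back in input order
    results = list(pool.map(process_item, items))

print(len(results))  # 100
```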
## Persistent Volumes

```python
vol = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": vol})
def save_model():
    # Train model... (training code omitted)
    model.save("/data/model.pt")  # persists between runs

@app.function(volumes={"/data": vol})
def load_and_predict(input):
    model = load("/data/model.pt")  # load() is a placeholder for your framework's loader
    return model.predict(input)
```
## Use Cases
- ML inference — deploy models with GPU in minutes
- Data processing — parallel batch jobs at scale
- Web scraping — run 1000 scrapers in parallel
- PDF processing — OCR thousands of documents
- Video processing — transcription, thumbnail generation
Need web scraping at scale? Check out my web scraping actors on Apify — no infrastructure to manage.
Need cloud compute for your project? Email me at spinov001@gmail.com.