
The Death of YAML: How I Build and Deploy My Entire LLMOps Stack in Pure Python

We’ve all been there. It’s 2 AM, and the production deployment is failing. You’re staring at a 500-line YAML file, and the error message is a cryptic “mapping values are not allowed in this context”. You find the culprit: a single, misplaced space.

For years, we’ve been told that Infrastructure as Code (IaC) is the answer to manual, error-prone cloud management. But for many of us, IaC has become synonymous with “Configuration File Hell.” We are software engineers who build logic with loops, functions, and classes, yet we are forced to define our critical infrastructure using static, unforgiving data formats like YAML or proprietary languages like HCL.

What if we could build our cloud infrastructure with the same tools we use to build our applications? What if we could define our entire AWS EKS cluster, our GPU node groups, and our serverless endpoints using pure, idiomatic Python?

This isn’t a fantasy. This is Infrastructure as Software (IaS), and it will change the way you think about DevOps.

The Problem: YAML is for Data, Not Logic

YAML is a great format for representing structured data. It is not, however, a programming language. When you need to create ten similar S3 buckets, you can’t write a for loop. When you need to deploy to staging with 2 replicas and to production with 20, you can’t write an if statement. You are forced to resort to complex templating tools like Helm or Jinja, adding another layer of abstraction and another potential point of failure.

For complex LLMOps platforms, this becomes untenable. We need to dynamically calculate resource limits based on model size, configure intricate network security rules, and manage dependencies between dozens of microservices. We need logic. We need Python.

The Solution: Infrastructure as Software with Pulumi

Pulumi is an open-source IaC tool that allows you to define and manage your cloud infrastructure using general-purpose programming languages like Python. It’s not just a wrapper around an API; it’s a full-fledged engine that translates your Python code into a desired state and reconciles it with your cloud provider.
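To make that concrete before we build anything real, here is a minimal sketch of the for loop and the if statement that YAML simply cannot express. It assumes the pulumi_aws provider, and the resource names are illustrative:

import pulumi
import pulumi_aws as aws

# A for loop over infrastructure: ten similar S3 buckets.
buckets = [aws.s3.Bucket(f"data-bucket-{i}") for i in range(10)]

# An if statement over infrastructure: per-environment sizing.
# pulumi.get_stack() returns the name of the currently selected stack.
replicas = 20 if pulumi.get_stack() == "production" else 2

pulumi.export("bucket_names", [b.bucket for b in buckets])
pulumi.export("replicas", replicas)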

Let’s build something real. Instead of a simple S3 bucket, let’s provision a complete, scalable, serverless endpoint for a containerized Python application on AWS. This is the kind of infrastructure that would normally require a complex mix of CloudFormation or Terraform templates spread across multiple YAML files.

Here’s how we do it in a single Python script.

#
# File: __main__.py (A Pulumi program to deploy a serverless container)
#
import json
import os

import pulumi
import pulumi_aws as aws
import pulumi_awsx as awsx
import pulumi_aws_apigateway as apigateway

# --- 1. Create a private Docker image repository (ECR) ---
# This is where our FastAPI application image will be stored.
# awsx is a high-level library that simplifies resource creation.
ecr_repo = awsx.ecr.Repository("llm-service-repo")

# Build and push the application image from the local Dockerfile.
# The context is the directory containing the app and its Dockerfile.
app_image = awsx.ecr.Image("llm-service-image",
    repository_url=ecr_repo.url,
    context="./app",
    platform="linux/amd64",
)

# --- 2. Define an IAM Role (permissions) for the Lambda function ---
# Our function needs permission to execute and write logs to CloudWatch.
# We define the trust policy as a standard Python dictionary.
lambda_role = aws.iam.Role("llm-lambda-role",
    assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
        }],
    }),
)

# Attach the basic AWS-managed policy for Lambda execution.
aws.iam.RolePolicyAttachment("lambda-exec-policy",
    role=lambda_role.name,
    policy_arn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
)

# --- 3. Create the Lambda function and the API Gateway ---
# The function runs our container image and is configured through
# environment variables, just like any Python app.
llm_fn = aws.lambda_.Function("llm-service-fn",
    role=lambda_role.arn,
    package_type="Image",  # We are deploying a container!
    image_uri=app_image.image_uri,  # The image we just built and pushed
    memory_size=1024,
    timeout=30,
    environment=aws.lambda_.FunctionEnvironmentArgs(
        variables={
            "LLM_MODEL_ID": "gemini-flash-latest",
            # Read from the deployment environment; never hard-code secrets.
            "API_KEY": os.getenv("SECRET_API_KEY", ""),
        },
    ),
)

# apigateway.RestAPI creates the gateway, the route, and the Lambda
# integration all in one declarative Python object.
api = apigateway.RestAPI("llm-api-gateway",
    routes=[apigateway.RouteArgs(
        path="/api/v1/process-query",
        method=apigateway.Method.POST,
        event_handler=llm_fn,  # The function that handles the request
    )],
)

# --- 4. Export the final, public URL of our API ---
# This URL is the output of our program, ready to be used.
pulumi.export("endpoint_url", api.url)

What Just Happened?

When you run pulumi up, this Python script executes and tells Pulumi’s engine to provision the following resources on AWS, in the correct order:

  1. An ECR Repository to store our application’s Docker image, plus the build-and-push of that image from the local Dockerfile.
  2. An IAM Role with the exact permissions our function needs.
  3. An AWS Lambda Function configured to run our container image.
  4. An API Gateway that exposes a public POST endpoint and routes traffic to our Lambda.
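How does Pulumi know the correct order? Dependencies are inferred from how Output values flow between resources: because app_image.image_uri feeds the Lambda, the image is built before the function is created. Here is a minimal illustration of that tracking, reusing names from the program above:

# Outputs are promise-like values; anything derived with .apply() or
# Output.all() carries its dependencies along with it.
pulumi.export("summary",
    pulumi.Output.all(ecr_repo.url, api.url).apply(
        lambda args: f"images in {args[0]} are served at {args[1]}"))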

We just defined a complete, scalable, serverless microservice infrastructure in well under a hundred lines of idiomatic Python. We can add loops to create multiple endpoints, use if statements to change configurations for production, and even write pytest unit tests to validate our infrastructure logic before a single dollar is spent on cloud resources.
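Here is a minimal sketch of such a test using Pulumi’s built-in mocking support. It assumes the resources above live in an importable module, hypothetically named infra, rather than directly in __main__.py:

# File: test_infra.py
import unittest
import pulumi

class MyMocks(pulumi.runtime.Mocks):
    def new_resource(self, args: pulumi.runtime.MockResourceArgs):
        # Fake every resource: return a made-up ID and echo inputs as outputs.
        return [args.name + "_id", args.inputs]
    def call(self, args: pulumi.runtime.MockCallArgs):
        return {}

pulumi.runtime.set_mocks(MyMocks(), preview=False)

import infra  # hypothetical module holding the resources defined above

class TestLambdaConfig(unittest.TestCase):
    @pulumi.runtime.test
    def test_timeout_is_bounded(self):
        # Outputs resolve asynchronously, so assertions run inside apply().
        def check(timeout):
            self.assertLessEqual(timeout, 60, "Lambda timeout exceeds budget")
        return infra.llm_fn.timeout.apply(check)

No AWS account is touched: the mocks intercept every resource registration, so the test suite runs in milliseconds under plain pytest or unittest.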

Why This is a Game-Changer for LLMOps

LLMOps platforms are notoriously complex. A production RAG system might require:

  • An EKS cluster with specialized GPU node groups.
  • A stateful Vector Database like Weaviate, deployed with persistent storage.
  • Multiple microservices for data ingestion, orchestration, and inference.
  • Complex IAM roles and security groups to ensure data privacy.

Trying to manage this with hundreds of disconnected YAML files is a recipe for disaster. By defining this entire stack in Python with Pulumi, we create a single, cohesive, and testable codebase for both our application and its infrastructure. We can create a class EKSCluster or a class RAGService that encapsulates all the underlying complexity, allowing us to deploy our entire “Private ChatGPT” with a few lines of Python.
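A minimal sketch of that encapsulation pattern, using Pulumi’s ComponentResource base class. The RAGService name and its contents are illustrative, not a full implementation:

import pulumi
import pulumi_aws as aws

class RAGService(pulumi.ComponentResource):
    """Bundles the resources for one RAG microservice into a single unit."""
    def __init__(self, name: str, replicas: int, opts=None):
        super().__init__("llmops:index:RAGService", name, None, opts)
        # Child resources are parented to the component, so the whole
        # service shows up as one logical unit in pulumi up.
        self.artifacts = aws.s3.Bucket(f"{name}-artifacts",
            opts=pulumi.ResourceOptions(parent=self))
        # ...the Lambda, IAM role, and gateway would follow the same
        # pattern, sized by the replicas argument.
        self.register_outputs({"artifacts_bucket": self.artifacts.bucket})

# The entire service now deploys in a few lines, per environment:
replicas = 20 if pulumi.get_stack() == "production" else 2
service = RAGService("private-chatgpt", replicas=replicas)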

Your Infrastructure is Now Software. Treat It That Way.

The days of treating infrastructure as a separate, arcane discipline are over. If you are a Python developer, you already have the skills to build and manage scalable, production-grade cloud infrastructure. You just need to stop thinking in terms of static configuration files and start thinking in terms of software.

This approach — Infrastructure as Software — is the future of DevOps and a non-negotiable requirement for building reliable AI systems at scale.

Stop fighting with indentation. Start building with code.


From a Single Endpoint to a Full-Scale AI Factory

This article provides a glimpse into the power of Infrastructure as Software, but it only scratches the surface. We’ve built a scalable serverless endpoint, but a true enterprise-grade LLMOps platform requires orchestrating stateful databases, managing GPU-powered Kubernetes clusters, and implementing robust security and monitoring.

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in “Cloud-Native Python, DevOps & LLMOps”, Volume 8 of the Python Programming Series (each book can be read as a standalone). Specifically, this workflow synthesizes the principles from:

  • Chapter 11 (The Orchestrator): Understanding the Kubernetes primitives that our Pulumi code abstracts.
  • Chapter 14 (Managing Secrets): Securely handling the SECRET_API_KEY in a production environment.
  • Chapter 15 (Helm Charts): Packaging the containerized application for repeatable deployments.
  • Chapter 19 (Infrastructure as Software): The deep dive into using Pulumi and Python to provision the entire AWS stack, including the EKS cluster this service would run on in a full-scale deployment.

The book includes production-ready code and architectural patterns that extend beyond the scope of an article format:
Amazon Link

Explore the complete “Python Programming Series” for a comprehensive journey from Python fundamentals to advanced AI deployment. Each book can be read as a standalone.

Python Programming Series on Amazon

Subscribe to my weekly newsletter on Substack:
https://programmingcentral.substack.com
