Metta Surendhar

OpsFusion 2024: Insights into MLOps, DevOps, and Platform Engineering

Recently, I had the opportunity to attend OpsFusion: Where Dev Meets ML—a technical meetup that brought together practitioners and enthusiasts across DevOps, MLOps, and Platform Engineering. The event was an excellent blend of hands-on sessions, real-world experiences, and emerging trends across these intersecting domains.

In this blog, I’ve shared a structured summary of each session, along with key takeaways that resonated with me.

MLOps in Vertex AI – by Navaneethan Gopal

This session focused on building end-to-end machine learning pipelines using Vertex AI, with a specific emphasis on automating the ML lifecycle beyond model development.

Key Highlights

  • The demonstration used a multi-class classification problem (Dry Beans dataset) developed in Google Colab using Gemini for code assistance.
  • It was emphasized that actual ML code makes up less than 1% of an end-to-end MLOps effort; the vast majority lies in operations such as infrastructure, orchestration, testing, and monitoring.

Core Components of MLOps

  • Data collection and validation
  • Model training and testing
  • Debugging and analysis
  • Model monitoring post-deployment
  • Cross-functional collaboration

MLOps Lifecycle Phases

  1. Discovery – problem and data exploration
  2. Development – feature engineering, dataset versioning, and integration with feature stores
  3. Deployment – serving the model through automated pipelines

Maturity Levels in MLOps

  • Level 0: Manual build and deploy
  • Level 1: Automated training workflows
  • Level 2: Fully automated and reproducible pipelines across environments

Vertex AI Pipeline Overview

The speaker provided a walkthrough of how to build and deploy a Vertex AI pipeline triggered from Bitbucket or a cron job. The steps included the following (a short, hypothetical code sketch follows the list):

  • Creating a GCS (Google Cloud Storage) bucket
  • Defining dataset and training components using XGBoost
  • Initializing and deploying the pipeline via SDK integration
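
To make the walkthrough more concrete, here is a minimal, hypothetical sketch of such a pipeline built with the KFP v2 SDK and submitted through the google-cloud-aiplatform client. The project ID, region, bucket path, and the toy training step (Iris standing in for Dry Beans) are placeholders rather than details from the talk.

# Minimal Vertex AI pipeline sketch using the KFP v2 SDK and the
# google-cloud-aiplatform client. Project, region, and bucket names
# are hypothetical placeholders.
from kfp import compiler, dsl
from google.cloud import aiplatform

PROJECT_ID = "my-project"                          # placeholder
REGION = "us-central1"                             # placeholder
PIPELINE_ROOT = "gs://my-ml-bucket/pipeline-root"  # placeholder GCS bucket


@dsl.component(base_image="python:3.10", packages_to_install=["xgboost", "scikit-learn"])
def train_model(accuracy: dsl.Output[dsl.Metrics]):
    """Train a toy XGBoost multi-class classifier (stand-in for the Dry Beans model)."""
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = load_iris(return_X_y=True)  # stand-in dataset, for illustration only
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
    model = XGBClassifier(n_estimators=50)
    model.fit(X_tr, y_tr)
    accuracy.log_metric("accuracy", float(model.score(X_te, y_te)))


@dsl.pipeline(name="dry-beans-demo-pipeline", pipeline_root=PIPELINE_ROOT)
def pipeline():
    train_model()


if __name__ == "__main__":
    # Compile the pipeline definition, then submit it to Vertex AI Pipelines.
    compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")
    aiplatform.init(project=PROJECT_ID, location=REGION)
    job = aiplatform.PipelineJob(
        display_name="dry-beans-demo",
        template_path="pipeline.json",
        pipeline_root=PIPELINE_ROOT,
    )
    job.submit()  # a CI trigger (e.g., Bitbucket Pipelines) or a cron job can run this script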

Emerging Operations in ML

  • FMOps (Foundation Model Operations): Operating foundation models in production, with attention to latency, token usage, and cost (a small tracking sketch follows this list)
  • LLMOps: Operations tailored to large language models and Retrieval-Augmented Generation (RAG) systems
  • PromptOps: Monitoring and optimizing prompt performance, including hallucination tracking
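
To make these terms a bit more concrete, here is a small, purely illustrative Python sketch of the kind of per-request bookkeeping FMOps implies: recording latency, token counts, and an estimated cost. The call_llm() function and the pricing figure are placeholders, not anything shown at the event.

# Illustrative FMOps-style bookkeeping: measure latency and token usage per
# request so cost can be tracked. call_llm() and the price are placeholders.
import time

PRICE_PER_1K_TOKENS = 0.002  # hypothetical pricing, for illustration only


def call_llm(prompt: str) -> tuple[str, int]:
    """Placeholder for a real model call; returns (response, tokens_used)."""
    return "stub response", len(prompt.split())


def tracked_call(prompt: str) -> dict:
    start = time.perf_counter()
    response, tokens = call_llm(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "response": response,
        "latency_ms": round(latency_ms, 2),
        "tokens": tokens,
        "estimated_cost_usd": tokens / 1000 * PRICE_PER_1K_TOKENS,
    }


print(tracked_call("Classify this support ticket into one of three categories."))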

Kubeflow

  • Introduction to Kubeflow as a Kubernetes-native platform for ML workflows
  • Creating custom components and reusable pipelines (a brief sketch follows this list)
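
As an example of what a reusable custom component can look like with the KFP v2 SDK, here is a hypothetical preprocessing step that reads one dataset artifact and writes another; the step itself is made up for illustration.

# A reusable KFP v2 component: typed input/output artifacts let the same
# step be wired into many pipelines.
from kfp import dsl
from kfp.dsl import Dataset, Input, Output


@dsl.component(base_image="python:3.10", packages_to_install=["pandas"])
def drop_missing_rows(raw_data: Input[Dataset], clean_data: Output[Dataset]):
    """Read a CSV artifact, drop rows with missing values, write the result."""
    import pandas as pd

    df = pd.read_csv(raw_data.path)
    df = df.dropna()
    df.to_csv(clean_data.path, index=False)

# Inside a pipeline, this step would be wired to an upstream task, e.g.
# drop_missing_rows(raw_data=ingest_task.outputs["dataset"])  (names illustrative).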

This session bridged the gap between foundational ML and scalable production pipelines, highlighting the growing need for robust, reproducible ML systems.


Trunk-Based Development with Terraform – by Harini Muralidharan

This session covered the developer-driven DevOps model, focusing on enabling application developers to define and manage infrastructure using Infrastructure as Code (IaC).

Context: Challenges in Traditional DevOps

  • Frequent inconsistencies between dev and production environments
  • Developer reliance on operations teams for even minor infrastructure changes
  • Lack of visibility and traceability in changes made to the system

Principles of Developer-Driven DevOps

  • Developers define and version infrastructure alongside application code
  • Early detection and mitigation of issues via automation
  • Promotes ownership without expecting developers to become operations experts

Introduction to Terraform

The session provided a deep dive into Terraform, its ecosystem, and how it enables scalable infrastructure on GCP.

Why Terraform?

  • Open-source and cloud-agnostic
  • Declarative syntax (HCL)
  • Native support for GCP
  • Strong community adoption and extensibility

Core Components

  • Providers: Connect Terraform with cloud services
  • Resources: Define infrastructure components
  • Variables & Outputs: Parameterization and visibility
  • State Management: Track infrastructure state across teams

Common Workflow

terraform init → terraform plan → terraform apply → terraform destroy

Integrating Terraform with CI/CD

  • Using CI/CD pipelines (YAML) to automate Terraform commands
  • Promotes consistent, reliable infrastructure changes with version control

Best Practices

  • Store code in Git with proper version control
  • Use remote state storage (e.g., GCS or Terraform Cloud)
  • Follow the principle of least privilege
  • Modularize Terraform codebases for reusability
  • Perform automated testing on infra modules
  • Monitor for configuration drift and enforce corrective actions (see the sketch after this list)
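
As a rough illustration of the last two points, the sketch below shells out to the Terraform CLI and relies on terraform plan's -detailed-exitcode flag (exit code 0 means no changes, 2 means pending changes or drift), so a CI job can fail when live infrastructure diverges from the code. The module path is a placeholder.

# Drift check sketch: run `terraform plan -detailed-exitcode` against a module
# and report whether live infrastructure has diverged from the code.
# Exit codes: 0 = no changes, 1 = error, 2 = changes present (possible drift).
import subprocess
import sys

MODULE_DIR = "infra/envs/prod"  # placeholder path


def check_drift(module_dir: str) -> int:
    subprocess.run(["terraform", "init", "-input=false"], cwd=module_dir, check=True)
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=module_dir,
    )
    return result.returncode


if __name__ == "__main__":
    code = check_drift(MODULE_DIR)
    if code == 2:
        print("Drift detected: live state differs from the Terraform code.")
    elif code == 1:
        print("terraform plan failed; check the output above.")
    else:
        print("No drift detected.")
    sys.exit(code)
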
This talk emphasized the benefits of empowering developers while maintaining operational integrity, security, and scalability.



Platform Engineering vs DevOps: Evolution or Revolution? – by Crystal Darling

This session helped clarify the difference between DevOps, SRE, and the growing field of Platform Engineering.

Challenges in Traditional DevOps

  • Operations teams are often blocked by development timelines
  • Developers submit tickets for operational support, resulting in slow turnaround
  • Limited autonomy in environments, infrastructure, and tool usage

What Is Platform Engineering?

  • The practice of building and maintaining Internal Developer Platforms (IDPs)
  • Platform engineers build self-service tools and abstractions for developers
  • Treat developers as clients, providing them with consistent and secure environments

Key Platform Engineering Skills

  • Kubernetes orchestration
  • IaC tools like Terraform and Helm
  • CI/CD systems
  • CNCF tooling for observability, deployment, and monitoring

Core Message

Platform Engineering is not a rebranding of DevOps. It is a cultural and architectural evolution focused on developer experience, autonomy, and scalability.


Discussions on ML Research and Networking


The event concluded with group discussions on recent research papers from Microsoft and Google—specifically those related to Copilot, RAG, and the inner workings of generative systems.

It was a highly engaging session where I got to connect with fellow learners, exchange ideas, and hear how others are applying these concepts in real-world environments.


Closing Thoughts

Attending OpsFusion gave me a broader and more integrated view of how software systems are evolving—whether it’s about scaling ML models through MLOps, automating infrastructure with Terraform, or building robust internal platforms that make developer lives easier.

If you're someone who is navigating the intersection of ML, infrastructure, and deployment—or wants to bridge the gap between development and operations—events like these are immensely valuable.
