Metta Surendhar

OpsFusion 2024: Insights into MLOps, DevOps, and Platform Engineering

Recently, I had the opportunity to attend OpsFusion: Where Dev Meets ML—a technical meetup that brought together practitioners and enthusiasts across DevOps, MLOps, and Platform Engineering. The event was an excellent blend of hands-on sessions, real-world experiences, and emerging trends across these intersecting domains.

In this blog, I’ve shared a structured summary of each session, along with key takeaways that resonated with me.

MLOps in Vertex AI – by Navaneethan Gopal

This session focused on building end-to-end machine learning pipelines using Vertex AI, with a specific emphasis on automating the ML lifecycle beyond model development.

Key Highlights

  • The demonstration used a multi-class classification problem (Dry Beans dataset) developed in Google Colab using Gemini for code assistance.
  • It was emphasized that actual ML code makes up less than 1% of an end-to-end MLOps effort; the vast majority lies in operations such as infrastructure, orchestration, testing, and monitoring.

Core Components of MLOps

  • Data collection and validation
  • Model training and testing
  • Debugging and analysis
  • Model monitoring post-deployment
  • Cross-functional collaboration

MLOps Lifecycle Phases

  1. Discovery – problem and data exploration
  2. Development – feature engineering, dataset versioning, and integration with feature stores
  3. Deployment – serving the model through automated pipelines

Maturity Levels in MLOps

  • Level 0: Manual build and deploy
  • Level 1: Automated training workflows
  • Level 2: Fully automated and reproducible pipelines across environments

Vertex AI Pipeline Overview

The speaker provided a walkthrough of how to build and deploy a Vertex AI pipeline triggered from Bitbucket or a cron job. The steps included the following (a short, hypothetical code sketch follows the list):

  • Creating a GCS (Google Cloud Storage) bucket
  • Defining dataset and training components using XGBoost
  • Initializing and deploying the pipeline via SDK integration
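
To make the walkthrough more concrete, here is a minimal, hypothetical sketch of such a pipeline built with the KFP v2 SDK and submitted through the google-cloud-aiplatform client. The project ID, region, bucket path, and the toy training step (Iris standing in for Dry Beans) are placeholders rather than details from the talk.

# Minimal Vertex AI pipeline sketch using the KFP v2 SDK and the
# google-cloud-aiplatform client. Project, region, and bucket names
# are hypothetical placeholders.
from kfp import compiler, dsl
from google.cloud import aiplatform

PROJECT_ID = "my-project"                          # placeholder
REGION = "us-central1"                             # placeholder
PIPELINE_ROOT = "gs://my-ml-bucket/pipeline-root"  # placeholder GCS bucket


@dsl.component(base_image="python:3.10", packages_to_install=["xgboost", "scikit-learn"])
def train_model(accuracy: dsl.Output[dsl.Metrics]):
    """Train a toy XGBoost multi-class classifier (stand-in for the Dry Beans model)."""
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = load_iris(return_X_y=True)  # stand-in dataset, for illustration only
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
    model = XGBClassifier(n_estimators=50)
    model.fit(X_tr, y_tr)
    accuracy.log_metric("accuracy", float(model.score(X_te, y_te)))


@dsl.pipeline(name="dry-beans-demo-pipeline", pipeline_root=PIPELINE_ROOT)
def pipeline():
    train_model()


if __name__ == "__main__":
    # Compile the pipeline definition, then submit it to Vertex AI Pipelines.
    compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")
    aiplatform.init(project=PROJECT_ID, location=REGION)
    job = aiplatform.PipelineJob(
        display_name="dry-beans-demo",
        template_path="pipeline.json",
        pipeline_root=PIPELINE_ROOT,
    )
    job.submit()  # a CI trigger (e.g., Bitbucket Pipelines) or a cron job can run this script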

Emerging Operations in ML

  • FMOps (Foundation Model Operations): Operating foundation models in production, with attention to latency, token usage, and cost (a small tracking sketch follows this list)
  • LLMOps: Operations tailored to large language models and Retrieval-Augmented Generation (RAG) systems
  • PromptOps: Monitoring and optimizing prompt performance, including hallucination tracking
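
To make these terms a bit more concrete, here is a small, purely illustrative Python sketch of the kind of per-request bookkeeping FMOps implies: recording latency, token counts, and an estimated cost. The call_llm() function and the pricing figure are placeholders, not anything shown at the event.

# Illustrative FMOps-style bookkeeping: measure latency and token usage per
# request so cost can be tracked. call_llm() and the price are placeholders.
import time

PRICE_PER_1K_TOKENS = 0.002  # hypothetical pricing, for illustration only


def call_llm(prompt: str) -> tuple[str, int]:
    """Placeholder for a real model call; returns (response, tokens_used)."""
    return "stub response", len(prompt.split())


def tracked_call(prompt: str) -> dict:
    start = time.perf_counter()
    response, tokens = call_llm(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "response": response,
        "latency_ms": round(latency_ms, 2),
        "tokens": tokens,
        "estimated_cost_usd": tokens / 1000 * PRICE_PER_1K_TOKENS,
    }


print(tracked_call("Classify this support ticket into one of three categories."))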

Kubeflow

  • Introduction to Kubeflow as a Kubernetes-native platform for ML workflows
  • Creating custom components and reusable pipelines (a brief sketch follows this list)
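
As an example of what a reusable custom component can look like with the KFP v2 SDK, here is a hypothetical preprocessing step that reads one dataset artifact and writes another; the step itself is made up for illustration.

# A reusable KFP v2 component: typed input/output artifacts let the same
# step be wired into many pipelines.
from kfp import dsl
from kfp.dsl import Dataset, Input, Output


@dsl.component(base_image="python:3.10", packages_to_install=["pandas"])
def drop_missing_rows(raw_data: Input[Dataset], clean_data: Output[Dataset]):
    """Read a CSV artifact, drop rows with missing values, write the result."""
    import pandas as pd

    df = pd.read_csv(raw_data.path)
    df = df.dropna()
    df.to_csv(clean_data.path, index=False)

# Inside a pipeline, this step would be wired to an upstream task, e.g.
# drop_missing_rows(raw_data=ingest_task.outputs["dataset"])  (names illustrative).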

This session bridged the gap between foundational ML and scalable production pipelines, highlighting the growing need for robust, reproducible ML systems.


Trunk-Based Development with Terraform – by Harini Muralidharan

This session covered the developer-driven DevOps model, focusing on enabling application developers to define and manage infrastructure using Infrastructure as Code (IaC).

Context: Challenges in Traditional DevOps

  • Frequent inconsistencies between dev and production environments
  • Developer reliance on operations teams for even minor infrastructure changes
  • Lack of visibility and traceability in changes made to the system

Principles of Developer-Driven DevOps

  • Developers define and version infrastructure alongside application code
  • Early detection and mitigation of issues via automation
  • Promotes ownership without expecting developers to become operations experts

Introduction to Terraform

The session provided a deep dive into Terraform, its ecosystem, and how it enables scalable infrastructure on GCP.

Why Terraform?

  • Open-source and cloud-agnostic
  • Declarative syntax (HCL)
  • Native support for GCP
  • Strong community adoption and extensibility

Core Components

  • Providers: Connect Terraform with cloud services
  • Resources: Define infrastructure components
  • Variables & Outputs: Parameterization and visibility
  • State Management: Track infrastructure state across teams

Common Workflow

terraform init → terraform plan → terraform apply → terraform destroy

Integrating Terraform with CI/CD

  • Using CI/CD pipelines (YAML) to automate Terraform commands
  • Promotes consistent, reliable infrastructure changes with version control

Best Practices

  • Store code in Git with proper version control
  • Use remote state storage (e.g., GCS or Terraform Cloud)
  • Follow the principle of least privilege
  • Modularize Terraform codebases for reusability
  • Perform automated testing on infra modules
  • Monitor for configuration drift and enforce corrective actions (see the sketch after this list)
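
As a rough illustration of the last two points, the sketch below shells out to the Terraform CLI and relies on terraform plan's -detailed-exitcode flag (exit code 0 means no changes, 2 means pending changes or drift), so a CI job can fail when live infrastructure diverges from the code. The module path is a placeholder.

# Drift check sketch: run `terraform plan -detailed-exitcode` against a module
# and report whether live infrastructure has diverged from the code.
# Exit codes: 0 = no changes, 1 = error, 2 = changes present (possible drift).
import subprocess
import sys

MODULE_DIR = "infra/envs/prod"  # placeholder path


def check_drift(module_dir: str) -> int:
    subprocess.run(["terraform", "init", "-input=false"], cwd=module_dir, check=True)
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=module_dir,
    )
    return result.returncode


if __name__ == "__main__":
    code = check_drift(MODULE_DIR)
    if code == 2:
        print("Drift detected: live state differs from the Terraform code.")
    elif code == 1:
        print("terraform plan failed; check the output above.")
    else:
        print("No drift detected.")
    sys.exit(code)
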
This talk emphasized the benefits of empowering developers while maintaining operational integrity, security, and scalability.



Platform Engineering vs DevOps: Evolution or Revolution? – by Crystal Darling

This session helped clarify the difference between DevOps, SRE, and the growing field of Platform Engineering.

Challenges in Traditional DevOps

  • Operations teams are often blocked by development timelines
  • Developers submit tickets for operational support, resulting in slow turnaround
  • Limited autonomy in environments, infrastructure, and tool usage

What Is Platform Engineering?

  • The practice of building and maintaining Internal Developer Platforms (IDPs)
  • Platform engineers build self-service tools and abstractions for developers
  • Treat developers as clients, providing them with consistent and secure environments

Key Platform Engineering Skills

  • Kubernetes orchestration
  • IaC tools like Terraform and Helm
  • CI/CD systems
  • CNCF tooling for observability, deployment, and monitoring

Core Message

Platform Engineering is not a rebranding of DevOps. It is a cultural and architectural evolution focused on developer experience, autonomy, and scalability.


Discussions on ML Research and Networking


The event concluded with group discussions on recent research papers from Microsoft and Google—specifically those related to Copilot, RAG, and the inner workings of generative systems.

It was a highly engaging session where I got to connect with fellow learners, exchange ideas, and hear how others are applying these concepts in real-world environments.


Closing Thoughts

Attending OpsFusion gave me a broader and more integrated view of how software systems are evolving—whether it’s about scaling ML models through MLOps, automating infrastructure with Terraform, or building robust internal platforms that make developer lives easier.

If you're someone who is navigating the intersection of ML, infrastructure, and deployment—or wants to bridge the gap between development and operations—events like these are immensely valuable.
