DEV Community: Pete Miloravac

Building an AI-Native CI/CD Experience with sem-ai

Pete Miloravac — Wed, 27 May 2026 09:35:38 +0000

Developers don’t want to spend their time writing CI configuration, debugging flaky pipelines, or digging through logs to understand why a build failed.

They want to ship software.

That’s the idea behind sem-ai – Semaphore’s approach to building an AI-native CI experience where developers interact with CI/CD using natural language, directly from the tools they already use.

In our latest product update, we demonstrated what this looks like in practice: going from an empty repository to a fully working CI pipeline using Claude Code and sem-ai, without requiring any prior Semaphore knowledge.

The full walkthrough and live demo are available as a video recording.

This is part of a broader direction for Semaphore in 2026: extending CI/CD with AI-powered automation that reduces developer toil while keeping developers fully in control.

Going from Zero to CI with Natural Language

In the demo, Marcos started with a fork of the gorilla/mux repository after removing all existing CI configuration and GitHub Actions workflows.

Using the new sem-ai slash commands inside Claude Code, he initialized a complete Semaphore project with a single command:

/sem-ai:init

The initialization flow analyzes the repository, detects the technology stack, and proposes a tailored CI setup based on best practices.

For the Go project in the demo, sem-ai automatically suggested:

Linting with golangci-lint
Security scanning with gosec
Matrix testing across multiple Go versions
A recommended CI topology for the repository

Instead of manually learning Semaphore YAML, developers describe intent and review generated configuration.

This is exactly the onboarding experience we believe modern CI/CD platforms should provide:

Developers should not need to learn how to configure CI/CD systems before they can start shipping software.

Why Slash Commands Matter for AI Workflows

One of the most interesting insights from the week came from Nick, who worked on sem-ai’s onboarding and agent workflows.

Initially, the team experimented with “skills” alone — giving AI coding agents contextual information about Semaphore and hoping they would discover the right workflows automatically.

In practice, the results were inconsistent.

Agents sometimes failed to recognize Semaphore-specific concepts or didn’t know which tools to use. Success depended heavily on prompt quality.

That changed with the introduction of dedicated sem-ai slash commands.

Instead of relying purely on inference, slash commands provide a predictable interface between developers, agents, and Semaphore workflows.

The result is a much more reliable experience for agentic development.

Embedding CI/CD Best Practices into Agents

A major focus last week was improving the contextual “skills” that guide agents during CI/CD workflows.

The team expanded sem-ai’s understanding of:

Semaphore pipeline structure
Caching workflows
Artifact management
Test reports
Failure diagnostics
Pipeline optimization strategies

For example, when debugging failed jobs, agents now prioritize structured test reports instead of raw logs whenever available.
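To illustrate why a structured report is easier for an agent to work with than a raw log, here is a minimal Python sketch that pulls failing tests out of a JUnit-style XML report. The report contents and the `failed_tests` helper are assumptions made for illustration; the post does not say which report format sem-ai reads or how it parses it.

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit-style report, inlined so the example is self-contained.
SAMPLE_REPORT = """
<testsuite name="mux" tests="3" failures="1" errors="0">
  <testcase classname="mux" name="TestRouter"/>
  <testcase classname="mux" name="TestSubrouter">
    <failure message="expected 200, got 404">router_test.go:42</failure>
  </testcase>
  <testcase classname="mux" name="TestMiddleware"/>
</testsuite>
"""


def failed_tests(report_xml: str) -> list:
    """Return one record per test case that has a <failure> or <error> child."""
    root = ET.fromstring(report_xml)
    failures = []
    for case in root.iter("testcase"):
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            failures.append({
                "test": f'{case.get("classname")}.{case.get("name")}',
                "message": problem.get("message", ""),
                "detail": (problem.text or "").strip(),
            })
    return failures
```

The agent gets the failing test name, the assertion message, and a source location in one step, with no need to scan thousands of log lines for the relevant part.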

This seemingly small improvement dramatically increases the quality of automated debugging and resolution.

As Marko explained during the update:

High-quality skills with focused context dramatically improve success rates.

The result is a significantly better developer experience — one where best practices are embedded directly into the workflow instead of requiring developers to memorize them.

Self-Healing Pipelines

After sem-ai generated the initial pipeline, Marcos instructed the agent to:

“Work until the pipeline is green.”

The agent monitored pipeline execution, identified failures, applied fixes, and iterated until the build passed successfully.
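The loop described here can be sketched in a few lines of Python. This is only an illustration of the pattern, not sem-ai's implementation: `run_pipeline` and `apply_fix` are hypothetical callables standing in for whatever the agent does, and the attempt cap is an assumption added so the loop cannot run unbounded.

```python
from typing import Callable, List, Optional


def work_until_green(
    run_pipeline: Callable[[], Optional[str]],
    apply_fix: Callable[[str], None],
    max_attempts: int = 5,
) -> List[str]:
    """Rerun the pipeline, fixing one failure per attempt, until it passes.

    run_pipeline returns None when the pipeline is green, otherwise a
    description of the failure. The return value lists every failure that
    was fixed, which doubles as the change summary for the developer.
    """
    fixed = []
    for _ in range(max_attempts):
        failure = run_pipeline()
        if failure is None:
            return fixed
        apply_fix(failure)
        fixed.append(failure)
    raise RuntimeError(f"pipeline still red after {max_attempts} attempts")
```

Returning the list of fixes keeps the developer in the loop: nothing is applied silently, and the cap turns a stuck agent into a visible error.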

Once the pipeline was green, sem-ai summarized all changes it had made and even proposed additional optimizations to improve pipeline topology and execution speed.

This is an important distinction in how we think about AI inside Semaphore.

Agents are not replacing developers.

They are automating repetitive operational work inside CI/CD workflows while developers remain in control of what gets applied and shipped.

That principle is central to Semaphore’s product strategy:

Developers define intent
Automation executes repetitive work
Developers stay in control of outcomes

AI-Native CI/CD Built Around Developer Workflows

What we’re building with sem-ai is not “AI bolted onto CI.”

We believe CI/CD should evolve into a control plane for developer intent — where developers and agents collaborate directly inside the tools they already use.

That means:

Creating CI pipelines using natural language
Automatically diagnosing failed builds
Optimizing workflows continuously
Embedding organizational best practices into agents
Running AI-driven workflows safely on Semaphore infrastructure

Over time, this becomes much bigger than onboarding.

It becomes a new interface for CI/CD itself.

Watch the Full Demo

The video includes:

A live walkthrough of sem-ai initialization
Setting up CI/CD from scratch using natural language
Agent-driven pipeline fixes
Pipeline optimization examples
Insights into how sem-ai skills and slash commands evolved internally

If you want to see what AI-native CI/CD looks like in practice, check out the full video.

What’s Next

This update focused on onboarding and pipeline setup, but the next phase is even more exciting.

We’re continuing to expand sem-ai’s capabilities around:

Pipeline optimization
Failure analysis
Workflow discovery
Test automation
Agent-driven development workflows

The long-term goal is simple: help developers spend less time on repetitive CI/CD work and more time building software.

The post Building an AI-Native CI/CD Experience with sem-ai appeared first on Semaphore.

Introducing Semaphore for AI Agents: An AI-Native Developer Experience for CI/CD

Pete Miloravac — Thu, 14 May 2026 08:28:46 +0000

Developers are no longer spending most of their time inside traditional IDEs.

Today, workflows increasingly happen inside AI-powered coding environments like Claude Code, Cursor, and Codex. Developers are collaborating with agents, asking questions in natural language, debugging through conversations, and automating repetitive work directly from their coding environment.

At Semaphore, we believe CI/CD needs to evolve alongside those workflows.

That’s why we’re introducing Semaphore for AI Agents: a new open-source CLI and agentic interface designed to make Semaphore fully accessible from AI coding agents.

This is the beginning of a broader initiative we call the AI-native Semaphore experience: a vision where developers can interact with Semaphore entirely through agents, natural language, and automation.

CI/CD Built for Agentic Workflows

Traditional CI/CD tools were designed around dashboards, manual configuration, and humans clicking through interfaces.

But modern development workflows are changing.

Developers want to stay focused inside their coding environment without constantly switching tabs, navigating dashboards, or manually gathering CI/CD context. Agents are becoming the new interface layer for software development.

Semaphore for AI Agents is designed specifically for that reality.

Instead of requiring developers to manually inspect pipelines, analyze failures, or collect build metrics, Semaphore for AI Agents gives AI coding assistants a structured way to understand and interact with Semaphore.

The result is a workflow where developers can simply ask:

“Why is my CI failing?”
“What tests are flaky?”
“Show me the critical path in this pipeline.”
“Summarize the health of this project over the last week.”

And their coding agent can retrieve, analyze, and act on that information directly.

What You Can Do With Semaphore for AI Agents

The first release already includes a growing set of commands focused on CI/CD visibility, debugging, and workflow automation.

Diagnose failing pipelines

Semaphore for AI Agents can analyze projects and identify failing workflows, blocks, and tests without requiring developers to manually navigate logs or dashboards.

Instead of piecing information together across multiple views, the CLI aggregates data into structured output designed specifically for agents.

Developers can retrieve:

Failing tests
Pipeline summaries
Workflow diagnostics
Commit metadata
Flaky test information

All of it is available in machine-readable formats that coding agents can immediately process.

Critical path and blast radius analysis

Semaphore for AI Agents introduces commands specifically designed to help agents reason about CI/CD systems.

For example:

Critical path analysis helps identify which jobs or blocks are having the largest impact on pipeline execution.
Blast radius analysis helps evaluate how failures propagate through workflows.
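To make the two ideas concrete, here is a small Python sketch over a pipeline modeled as a dependency graph. It reflects one common reading of the terms (critical path as the slowest dependency chain, blast radius as everything downstream of a failure) and is not taken from the sem-ai source. The block names and durations are made up.

```python
from functools import lru_cache

# Hypothetical pipeline: block durations in seconds and the blocks each one waits for.
DURATIONS = {"lint": 40, "build": 120, "unit": 300, "gosec": 90, "deploy": 60}
DEPENDS_ON = {"unit": ["build"], "gosec": ["build"], "deploy": ["lint", "unit", "gosec"]}


def critical_path(durations, depends_on):
    """Return (total_seconds, [blocks]) for the slowest dependency chain.

    Assumes the dependency graph has no cycles.
    """
    @lru_cache(maxsize=None)
    def slowest_chain_ending_at(block):
        # Slowest prerequisite chain (or an empty one), then the block itself.
        prior_time, prior_chain = max(
            (slowest_chain_ending_at(dep) for dep in depends_on.get(block, [])),
            key=lambda result: result[0],
            default=(0, ()),
        )
        return prior_time + durations[block], prior_chain + (block,)

    total, chain = max(
        (slowest_chain_ending_at(block) for block in durations),
        key=lambda result: result[0],
    )
    return total, list(chain)


def blast_radius(failed_block, depends_on):
    """Return every block that transitively depends on failed_block."""
    affected = set()
    frontier = [failed_block]
    while frontier:
        current = frontier.pop()
        for block, needs in depends_on.items():
            if current in needs and block not in affected:
                affected.add(block)
                frontier.append(block)
    return affected
```

In this example the critical path is build, unit, deploy, so speeding up `gosec` or `lint` would not shorten the pipeline, while a failure in `build` blocks everything except `lint`.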

These workflows are especially useful for AI agents trying to debug complex pipelines automatically.

Organization-level insights

Semaphore for AI Agents can also analyze entire organizations and summarize CI/CD health over time.

Developers and engineering leaders can retrieve insights such as:

Pass rates
Queue times
Workflow durations
Job execution trends
Frequently failing tests
Historical comparisons across weeks

Because the CLI outputs structured JSON, agents can summarize and contextualize large amounts of operational data automatically.
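As a rough picture of what that looks like on the consuming side, the sketch below summarizes a list of workflow runs. The JSON shape and field names (`result`, `duration_s`, `queue_s`) are invented for this example; the actual sem-ai output schema may look different.

```python
import json
from statistics import mean

# Hypothetical payload; field names are assumptions, not the real sem-ai schema.
SAMPLE_RUNS = """[
  {"workflow": "w1", "result": "passed", "duration_s": 310, "queue_s": 4},
  {"workflow": "w2", "result": "failed", "duration_s": 290, "queue_s": 12},
  {"workflow": "w3", "result": "passed", "duration_s": 400, "queue_s": 7},
  {"workflow": "w4", "result": "passed", "duration_s": 300, "queue_s": 3}
]"""


def summarize(raw_json: str) -> dict:
    """Reduce a list of workflow runs to a few health indicators."""
    runs = json.loads(raw_json)
    passed = sum(1 for run in runs if run["result"] == "passed")
    return {
        "runs": len(runs),
        "pass_rate": passed / len(runs),
        "avg_duration_s": mean(run["duration_s"] for run in runs),
        "max_queue_s": max(run["queue_s"] for run in runs),
    }
```

Because every field is typed and named, the same few lines work for four runs or four thousand, which is what makes week-over-week comparisons cheap for an agent.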

Instead of manually generating reports, developers can ask their coding assistant for an overview of the organization’s CI/CD health and immediately get actionable insights.

Designed for AI Coding Agents

Semaphore for AI Agents was built from the ground up for agentic workflows.

The CLI includes:

Agent-oriented command structures
Discoverable commands and examples
Structured JSON outputs
Schema-aware responses
Built-in support for coding assistants

The project also ships with:

Claude Code skills
Generic agent skills
A local MCP server

This allows coding assistants to directly interact with Semaphore using the Model Context Protocol (MCP).

In practice, this means developers can stay entirely inside their coding environment while their agent:

Investigates CI failures
Retrieves pipeline information
Identifies flaky tests
Summarizes project health
Runs diagnostics
Executes workflows

without requiring developers to leave their workflow.

CI Infrastructure for AI Agents

One of the most exciting capabilities demonstrated in the release is the ability to dynamically provision machines on Semaphore infrastructure for agent-driven workflows.

Using Semaphore for AI Agents, developers can:

Spawn ephemeral machines
Sync local files
Run tests remotely
Execute agent workflows at scale
Keep environments warm for rapid iteration

Instead of waiting for commits to trigger CI, developers can use Semaphore infrastructure directly as an extension of their development environment.

This creates entirely new workflows where AI agents can:

Continuously rerun tests
Debug code remotely
Scale workloads dynamically
Execute large parallel workloads
Operate safely in isolated environments

All powered by the same infrastructure Semaphore already uses to run CI/CD at scale.

A Foundation for the AI-Native Semaphore Experience

Semaphore for AI Agents is not a standalone experiment.

It is the first step toward a larger vision for how software delivery evolves in an AI-native world.

At Semaphore, we believe the future of CI/CD is not just about executing pipelines reliably.

It is about helping developers continuously improve software quality by automating the repetitive and operational work surrounding software delivery.

Our vision is simple:

Developers define intent
Agents handle repetitive execution
Semaphore provides the infrastructure, orchestration, and control layer

Developers remain in control.

Agents become powerful collaborators.

And CI/CD evolves into a platform for developer automation.

Open Source, Built in Public, and Developer-Controlled

Semaphore for AI Agents is fully open source — and that is a core part of how we think about AI-powered developer tooling.

As an open-source company, we believe developers should be able to inspect, understand, and extend the tools they rely on every day. Especially when AI is involved.

Developers should be able to:

Understand how tools work
Inspect prompts and behavior
Extend workflows
Build custom automations
Contribute new commands and integrations

There is no black-box automation here.

Semaphore for AI Agents is designed as infrastructure developers can trust, customize, and improve together with us.

And this is only the beginning.

The current release focuses on foundational capabilities for debugging, diagnostics, visibility, and agent interaction, but we’ll continue shipping new workflows and capabilities in the open.

Over the coming weeks, we’ll expand:

MCP integrations
CI analysis tooling
Agent-driven workflows
Automation primitives
Testing workflows
Scalable agent execution capabilities

We’ll also continue publishing demos, examples, and real-world workflows showing how developers can integrate Semaphore for AI Agents into their daily development process.

Because our goal is not to add AI for the sake of AI.

Our goal is to help developers spend less time on repetitive operational work and more time building great software.

Get Started

Semaphore for AI Agents is available today as an open-source project.

You can:

Explore the repository: https://github.com/semaphoreio/sem-ai
Compile it locally
Connect it to your existing Semaphore CLI configuration
Experiment with MCP integrations
Build your own workflows and automations
Contribute new commands and ideas

If you already use the Semaphore CLI, Semaphore for AI Agents reuses the same authentication and configuration setup, making onboarding straightforward.

We’re excited to see what developers build with it.

We also recorded a full demo showing Semaphore for AI Agents in action, including CI/CD debugging workflows, MCP integrations, organization-wide insights, and remote execution capabilities on Semaphore infrastructure.

👉 Watch the demo here.

👉 Explore the project on GitHub.

👉 Read the docs.

The post Introducing Semaphore for AI Agents: An AI-Native Developer Experience for CI/CD appeared first on Semaphore.

What CI/CD strategies work for embedded or IoT projects that require hardware testing?

Pete Miloravac — Thu, 30 Apr 2026 10:48:53 +0000

Embedded and IoT teams face a very different CI/CD reality than traditional SaaS teams. While most continuous integration and continuous delivery pipelines assume everything can run in the cloud, embedded systems depend on physical hardware, constrained environments, and real-world signals.

If you search forums like Reddit (r/embedded, r/devops), Stack Overflow, or vendor communities, the same questions keep appearing:

How do I run CI tests when I need actual hardware?
How do I scale hardware testing across teams?
How do I avoid flaky tests caused by devices?
Can I still use modern CI/CD tools or do I need something custom?

This guide walks through practical CI/CD strategies used by engineering teams building firmware, IoT platforms, and hardware-dependent systems. It focuses on approaches that scale, reduce cost, and maintain reliability.

Why CI/CD is harder for embedded and IoT projects

Unlike pure software systems, embedded pipelines must deal with:

Physical device availability
Hardware state and reset issues
Long flashing and boot cycles
Non-deterministic behavior (timing, sensors, connectivity)
Limited parallelization

This creates a mismatch with traditional CI/CD platforms that expect fast, stateless, fully virtualized execution.

For engineering leaders, this often results in slower pipelines, higher costs, and fragile automation. The goal is not to force cloud-native assumptions onto hardware, but to design a hybrid pipeline.

The core strategy: split your pipeline by test layers

The most effective approach seen across teams is to separate tests into layers, minimizing hardware usage to only what is necessary.

1. Fast feedback layer (no hardware)

Run as much as possible without devices:

Unit tests for firmware logic
Static analysis (clang-tidy, cppcheck)
Build validation
Simulation or emulation (QEMU, Renode)

Example Semaphore pipeline block:

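version: v1.0
name: Firmware CI
# Semaphore pipelines need an agent; this machine type is only an example
agent:
  machine:
    type: e1-standard-2
    os_image: ubuntu2204

blocks:
  - name: Build and unit tests
    task:
      jobs:
        - name: Build
          commands:
            - checkout
            - make build
            - make test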

This layer should cover 70 to 90 percent of your test surface. It keeps pipelines fast and cost-effective.

2. Hardware-in-the-loop (HIL) testing

Only after passing fast checks should jobs use real devices.

Typical setup discussed in forums:

USB-connected device farms
Network-controlled power switches
Raspberry Pi or similar acting as device controllers

Example structure:

  - name: Hardware tests
    task:
      prologue:
        commands:
          - checkout
      jobs:
        - name: Run on device farm
          commands:
            - ./scripts/flash_device.sh
            - ./scripts/run_integration_tests.sh

With Semaphore, you can integrate self-hosted agents that have direct access to hardware, allowing your pipeline to orchestrate physical tests while keeping control centralized.

This hybrid model is critical for teams that have outgrown default CI tools that cannot reliably interface with hardware.

Designing a reliable device farm

A common pain point across discussions is flaky hardware tests. The root cause is usually poor device management.

Best practices:

Isolate devices per job

Avoid shared hardware when possible. If sharing is required, implement locking mechanisms.
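One simple way to implement such a lock on a single controller machine is an exclusive lock file per device. The sketch below is a minimal example under that assumption; the `device_lock` name, the device ID, and the lock directory are hypothetical, and a multi-host lab would need a shared lock service instead.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def device_lock(device_id: str, lock_dir: str):
    """Hold an exclusive lock on one device for the duration of a job.

    Creating the file with O_CREAT | O_EXCL is atomic on a local filesystem,
    so only one process can succeed. Raises FileExistsError if the device
    is already in use.
    """
    path = os.path.join(lock_dir, f"{device_id}.lock")
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield path
    finally:
        os.close(fd)
        os.remove(path)
```

A job would wrap its flash and test steps in `with device_lock("board-1", "/var/lock/devices"):`. If a job is killed hard, the lock file can be left behind, so pair this with a cleanup step or a lock age check.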

Automate reset and recovery

Use:

Smart power switches
USB relay boards
Watchdog scripts

Example reset script:

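#!/bin/bash
# Power cycle the device through a network-controlled power switch.
set -euo pipefail

curl -fsS http://power-switch.local/off   # cut power
sleep 2                                   # give the device time to power down
curl -fsS http://power-switch.local/on    # restore power
sleep 5                                   # wait for boot before tests start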

Make tests idempotent

Each test run should not depend on previous device state.

Collect logs externally

Stream logs over serial or the network to a host outside the device so failures can be diagnosed after the run.

Orchestrating hardware with CI/CD

One of the biggest questions teams ask is: how do I connect CI/CD pipelines to real devices?

The most robust approach is:

Use cloud CI/CD for orchestration
Use self-hosted runners or agents for hardware interaction

With Semaphore, this means:

Running standard pipeline steps in the cloud
Routing hardware jobs to agents inside your lab

This keeps pipelines fast while maintaining control over physical infrastructure.

Handling test flakiness and timing issues

Forum discussions consistently highlight flaky tests as a major blocker.

Strategies that work:

Add retry logic at the pipeline level

# Illustrative only: the exact retry syntax depends on your CI platform
retry:
  limit: 2

Add health checks before tests

Ensure the device is reachable and responsive before running tests.
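A basic health check can be as small as a TCP connection attempt with a few retries. The sketch below assumes the device exposes some TCP port (SSH, a debug agent, or similar); the function name and defaults are illustrative, and devices reachable only over serial would need a different probe.

```python
import socket
import time


def wait_for_device(host: str, port: int, attempts: int = 5,
                    delay_s: float = 1.0, timeout_s: float = 2.0) -> bool:
    """Return True once a TCP connection succeeds, False if every attempt fails."""
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                return True
        except OSError:
            # Refused, unreachable, or timed out: pause, then try again.
            if attempt < attempts - 1:
                time.sleep(delay_s)
    return False
```

Running this before the test step lets the pipeline fail fast with a clear "device unreachable" result, or trigger a power cycle, instead of reporting a misleading test failure.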

Use time budgets instead of strict timing

Avoid asserting exact timing unless necessary.
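In practice a time budget means polling for a condition and failing only when an upper bound is exceeded. Here is a minimal sketch; the helper name and the polling interval are assumptions for illustration.

```python
import time
from typing import Callable


def wait_within_budget(condition: Callable[[], bool], budget_s: float,
                       poll_s: float = 0.05) -> float:
    """Poll until condition() is true and return the elapsed seconds.

    Raises TimeoutError if the condition is still false once the budget is spent.
    """
    start = time.monotonic()
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() - start >= budget_s:
            raise TimeoutError(f"condition not met within {budget_s}s")
        time.sleep(poll_s)
```

A test then asserts "the device reports ready within 10 seconds" rather than "the device reports ready after exactly 3.2 seconds", which tolerates normal variation in boot and network timing. The returned elapsed time can still be logged to spot slow drifts.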

Tag and isolate flaky tests

Run them separately to avoid blocking the main pipeline.

Parallelizing hardware testing

Scaling is a key concern for engineering managers.

Options include:

Expanding device farms
Sharding tests across devices
Prioritizing critical test subsets

Example:

jobs:
  - name: Device 1
    commands:
      - ./run_tests.sh --group=1
  - name: Device 2
    commands:
      - ./run_tests.sh --group=2

Semaphore allows parallel job execution, which is especially valuable when coordinating multiple hardware nodes.
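How tests are assigned to groups matters for reproducibility. The sketch below shows one simple option, round-robin over a sorted list, so the same test always lands on the same shard. The test names are made up, and how a script such as `run_tests.sh` maps `--group` to tests is an assumption here.

```python
def shard_tests(tests, shard_count):
    """Split tests into shard_count groups, round-robin over the sorted names.

    Sorting first makes the assignment deterministic across runs.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards = [[] for _ in range(shard_count)]
    for index, test in enumerate(sorted(tests)):
        shards[index % shard_count].append(test)
    return shards
```

With five tests and two devices, shard 0 would map to `--group=1` and shard 1 to `--group=2`. Round-robin balances test counts, not durations, so teams with very uneven test times may prefer to shard by recorded runtime instead.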

Cost optimization strategies

Hardware pipelines can become expensive quickly.

Key tactics:

Maximize non-hardware test coverage
Use on-demand hardware instead of always-on
Optimize pipeline duration
Avoid rerunning full hardware suites unnecessarily

Engineering teams often switch CI/CD platforms when costs become unpredictable or tied to inefficient pipelines. A system that allows fine-grained control over execution and resource usage is critical.

When to consider custom tooling vs CI/CD platforms

Many teams ask whether they should build their own system.

Build custom only if:

Your workflow is highly specialized
You require deep hardware orchestration beyond standard pipelines

Otherwise, modern CI/CD platforms like Semaphore can handle orchestration while letting you customize the hardware layer.

This balance avoids reinventing core CI/CD capabilities while still supporting embedded workflows.

Putting it all together

A production-ready embedded CI/CD pipeline typically looks like:

Code commit triggers pipeline
Build and unit tests run in parallel
Simulation tests validate behavior
Hardware tests run on device farm via self-hosted agents
Results aggregated and reported
Deployment triggered if all checks pass

This structure aligns with how high-performing engineering teams reduce risk while maintaining delivery speed.

FAQs

How do I run CI tests on real hardware?

Use a device farm connected to self-hosted CI agents. The pipeline triggers scripts that flash firmware and execute tests on physical devices.

What tools are commonly used for hardware testing in CI?

Teams commonly use QEMU or Renode for simulation, and custom scripts combined with device controllers for real hardware testing.

How do I reduce flaky tests in IoT CI/CD?

Focus on device reset automation, idempotent tests, retries, and proper health checks before execution.

Can CI/CD pipelines scale with hardware constraints?

Yes, by parallelizing across device farms and minimizing reliance on hardware through simulation and unit testing.

Is CI/CD worth it for embedded systems?

Yes. Teams that invest in CI/CD for embedded systems see improvements in reliability, faster debugging, and more predictable releases.

The post What CI/CD strategies work for embedded or IoT projects that require hardware testing? appeared first on Semaphore.

How to orchestrate IaC and application deployments together in CI/CD?

Pete Miloravac — Tue, 28 Apr 2026 12:28:28 +0000

As teams scale, one of the first cracks in a CI/CD setup appears between infrastructure and application deployments.

Infrastructure (Terraform, Pulumi, CloudFormation) evolves on one track. Application code ships on another. Eventually, they drift—and that’s when deployments become fragile, slow, and risky.

The question engineering leaders increasingly ask is:

How do we orchestrate infrastructure-as-code (IaC) and application deployments together in a single, reliable CI/CD pipeline?

This isn’t just a tooling problem. It’s about ensuring your system evolves as a unit, without introducing coupling that slows teams down.

Why This Matters at Scale

In smaller systems, it’s common to apply infrastructure changes manually and deploy applications independently. But as teams grow, this breaks down.

Infrastructure changes lag behind application needs. Deployments fail due to missing resources or configuration drift. Rollback becomes unclear or impossible.

This directly impacts key engineering metrics like change failure rate, deployment frequency, and mean time to restore (MTTR).

The goal is not to tightly couple everything—but to coordinate changes safely and predictably.

The Core Principle: One Pipeline, Two Lifecycles

Infrastructure and application code should not be treated the same.

Infrastructure changes are slower, riskier, and often irreversible. Application changes are faster, frequent, and easier to roll back.

Instead of merging them into one lifecycle, the better approach is to orchestrate both lifecycles within the same CI/CD pipeline, with clear boundaries and sequencing.

A Practical Architecture for IaC + Application CI/CD

At a high level, your pipeline should follow this flow:

Validate infrastructure changes
Plan infrastructure updates
Apply infrastructure with controls
Deploy application
Run post-deploy checks

Step 1: Validate Infrastructure Changes Early

Before anything is applied, validate IaC changes.

terraform init -input=false              # required before validate and plan
terraform fmt -check
terraform validate
terraform plan -input=false -out=tfplan

This should run on every pull request to catch errors early.

Step 2: Separate Plan and Apply

Separating plan from apply introduces visibility and control.

if terraform_plan_has_changes:
    require_manual_approval()
    run_terraform_apply()
else:
    skip_infra_step()
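One concrete way to answer "does the plan have changes?" is Terraform's `-detailed-exitcode` flag, which makes `terraform plan` exit with 0 for no changes, 1 for an error, and 2 when changes are present. The Python sketch below wraps that; the function names are illustrative, and it assumes `terraform` is on the PATH and the working directory is already initialized.

```python
import subprocess


def classify_plan_exit_code(code: int) -> str:
    """Interpret the exit status of `terraform plan -detailed-exitcode`."""
    if code == 0:
        return "no_changes"
    if code == 2:
        return "changes"
    return "error"


def plan_has_changes(workdir: str = ".") -> bool:
    """Run the plan and report whether an apply (and approval) is needed."""
    result = subprocess.run(
        ["terraform", "plan", "-input=false", "-detailed-exitcode", "-out=tfplan"],
        cwd=workdir,
    )
    outcome = classify_plan_exit_code(result.returncode)
    if outcome == "error":
        raise RuntimeError("terraform plan failed")
    return outcome == "changes"
```

Treating exit code 1 as a hard failure matters: without the flag, a pipeline that only checks for a non-zero status cannot tell "changes pending" apart from "plan broke".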

Step 3: Sequence Infrastructure Before Application Deployment

Application deployments should depend on infrastructure readiness.

if infra_apply_successful:
    build_application()
    run_tests()
    deploy_application()
else:
    block_pipeline()

Step 4: Maintain Traceability

Track which infrastructure version supports which application version.

export DEPLOYMENT_VERSION="app:v1.4.2-infra:v0.9.1"
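A combined tag like this is only useful if tooling can take it apart again. The following sketch builds and parses the same format; the helper names are hypothetical, and it assumes the application version never contains the literal text `-infra:`.

```python
def make_deployment_version(app_version: str, infra_version: str) -> str:
    """Build a tag in the form app:<version>-infra:<version>."""
    return f"app:{app_version}-infra:{infra_version}"


def parse_deployment_version(tag: str) -> dict:
    """Split a combined tag back into its application and infrastructure versions."""
    app_part, separator, infra_version = tag.partition("-infra:")
    if not separator or not app_part.startswith("app:"):
        raise ValueError(f"unrecognized deployment version: {tag!r}")
    return {"app": app_part[len("app:"):], "infra": infra_version}
```

During an incident, parsing the tag from a running deployment immediately tells you which infrastructure revision to compare against, without cross-referencing two separate release logs.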

Step 5: Handle Rollbacks Carefully

Application rollback is simple. Infrastructure rollback is not.

Avoid automatic infrastructure rollback. Use controlled processes instead.

Step 6: Use Environment-Based Controls

Different environments require different levels of control.

if environment == "production":
    require_manual_approval_for_infra()
    deploy_with_canary_strategy()
else:
    auto_apply_infra()
    deploy_application()

Step 7: Optimize for Cost and Performance

Running IaC and application pipelines together increases cost and execution time.

Optimize by skipping unnecessary steps, caching dependencies, and parallelizing jobs.
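Skipping unnecessary steps usually starts with asking whether a commit touched any infrastructure code at all. The sketch below makes that decision from a list of changed file paths, for example the output of `git diff --name-only`. The directory names and suffixes describe a hypothetical repository layout and would need adjusting.

```python
from pathlib import PurePosixPath

# Hypothetical layout: adjust to where your IaC actually lives.
INFRA_DIRS = ("infra", "terraform")
INFRA_SUFFIXES = (".tf", ".tfvars")


def needs_infra_stage(changed_files) -> bool:
    """Return True if any changed path looks like infrastructure code."""
    for name in changed_files:
        path = PurePosixPath(name)
        if path.suffix in INFRA_SUFFIXES:
            return True
        if path.parts and path.parts[0] in INFRA_DIRS:
            return True
    return False
```

When this returns False, the pipeline can skip validate, plan, and apply entirely and go straight to the application build, which removes both the cost and the approval wait for most commits.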

Example: End-to-End Flow

run_terraform_validate()
plan = run_terraform_plan()

if plan.has_changes:
    require_manual_approval()
    run_terraform_apply()

build_application()
run_tests()

if tests_pass:
    deploy_application()
else:
    block_deployment()

run_smoke_tests()
monitor_metrics()

Strategic Takeaway

The goal is not to tightly couple infrastructure and application code, but to orchestrate them intentionally.

High-performing teams validate early, control execution, sequence deployments clearly, and maintain full traceability.

Final Thought

As systems grow, the boundary between infrastructure and application becomes a source of risk. Your CI/CD pipeline is where that boundary should be resolved.

Because mature teams don’t deploy code or infrastructure separately—they deploy systems that evolve together.

FAQs

Should infrastructure and application deployments be in the same pipeline?

Yes, but they should maintain separate lifecycles. The pipeline should orchestrate both with clear sequencing rather than tightly coupling them.

Why separate Terraform plan and apply steps?

Separating plan and apply increases visibility, allows for manual approvals, and reduces the risk of unintended infrastructure changes.

Can infrastructure changes be rolled back automatically?

No, infrastructure rollback is complex and should be handled carefully through controlled processes rather than automatic rollback mechanisms.

What happens if infrastructure changes are not needed?

If no infrastructure changes are detected during the plan phase, the pipeline should skip the apply step and proceed with application deployment.

How do you ensure traceability between infrastructure and application versions?

By tagging deployments with both application and infrastructure versions, teams can track compatibility and quickly diagnose issues.

The post How to orchestrate IaC and application deployments together in CI/CD? appeared first on Semaphore.

Air-Gapped Deployments: How to Deploy to Servers Without Internet Access (Complete Guide)

Pete Miloravac — Fri, 24 Apr 2026 11:55:00 +0000

Deploying to servers with no internet access, also known as air-gapped environments, is a common requirement in regulated industries, enterprise on-prem setups, and high-security networks. However, most modern CI/CD pipelines assume constant access to public registries, APIs, and external services.

This mismatch is one of the biggest causes of failed deployments when teams move from standard cloud environments to restricted networks.

In this guide, you’ll learn how to deploy applications without internet access, including proven strategies, tools, and patterns used by production engineering teams.

What Is an Air-Gapped Environment?

An air-gapped environment is a system or network that is physically or logically isolated from the public internet. These environments are designed to maximize security by preventing external communication.

Common use cases include:

Financial institutions and regulated industries
Government and defense systems
On-prem enterprise infrastructure
Internal production networks with strict security policies

Why Deployments Fail Without Internet Access

Most CI/CD pipelines are built around assumptions that break in air-gapped environments:

Pulling dependencies from npm, pip, Maven, or apt
Downloading Docker images from Docker Hub
Calling external APIs during deployment
Installing packages at runtime

Typical errors teams encounter:

“Cannot reach package registry”
“Docker pull failed”
“Dependency install timeout”

The root cause is simple: deployments rely on external resources that are unavailable.

Core Principle: Build Once, Deploy Anywhere

The key to reliable air-gapped deployments is shifting your pipeline design:

❌ Pull dependencies during deployment

✅ Package everything during CI

This means your CI pipeline must produce a fully self-contained artifact that includes:

Application code
All dependencies
Runtime components (if needed)
Configuration templates

Once built, deployment becomes a simple transfer + execution step.

Approach 1: Artifact-Based Deployments (Recommended)

This is the most common and reliable method.

Instead of installing dependencies on the target server, you package everything during CI.

Example (Node.js)

npm ci
npm run build
tar -czf app.tar.gz dist/ node_modules package.json

Semaphore CI Example

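version: v1.0
name: Build and Package
# Semaphore pipelines need an agent; this machine type is only an example
agent:
  machine:
    type: e1-standard-2
    os_image: ubuntu2204

blocks:
  - name: Build
    task:
      jobs:
        - name: Build app
          commands:
            - checkout
            - npm ci
            - npm run build
            - tar -czf app.tar.gz dist/ node_modules package.json
            # Store the bundle in the workflow's artifact store
            - artifact push workflow app.tar.gz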

Deployment (Offline)

tar -xzf app.tar.gz
npm start

No internet required.

Approach 2: Private Package Registries and Mirrors

If your environment allows internal networking, you can mirror dependencies inside the network.

Popular tools:

npm: Verdaccio, Nexus
Python (pip): Devpi
Docker: Private registry
OS packages: Aptly, Artifactory

Example: Docker Image Push/Pull

docker build -t registry.internal/app:1.0 .
docker push registry.internal/app:1.0

Inside the air-gapped network:

docker pull registry.internal/app:1.0

This approach preserves developer workflows while removing external dependencies.

Approach 3: Docker Image Transfer (Offline)

If no registry access is possible, transfer images as files.

docker save app:1.0 -o app.tar

Transfer the file securely, then:

docker load -i app.tar

This method is widely used in highly restricted environments.

Approach 4: Immutable Infrastructure (Golden Images)

For larger systems, consider building pre-configured machine images.

Using tools like Packer:

packer build image.json

The resulting image includes:

Application
Dependencies
Configuration

Deployment becomes provisioning infrastructure instead of running scripts.

How to Handle Secrets in Air-Gapped Deployments

Secrets management becomes more complex without external services.

Best practices:

Inject secrets at deploy time (not build time)
Use encrypted configuration bundles
Avoid embedding credentials in artifacts
Use internal secret management systems where possible
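Injecting secrets at deploy time can be as simple as shipping a configuration template in the artifact and filling it in on the target host. Here is a minimal sketch using Python's standard `string.Template`; the template contents, the `DB_PASSWORD` name, and the hostname are all made up for illustration.

```python
from string import Template

# Hypothetical template shipped inside the artifact; it contains no secret values.
CONFIG_TEMPLATE = Template("db_url=postgres://app:${DB_PASSWORD}@db.internal:5432/app\n")


def render_config(template: Template, secrets: dict) -> str:
    """Fill in secrets at deploy time.

    substitute() raises KeyError when a placeholder has no value, so a
    missing secret stops the deploy and no broken config gets written.
    """
    return template.substitute(secrets)
```

On the target server the `secrets` mapping would typically come from `os.environ` or an internal secret store, so the artifact that crosses the network boundary never contains credentials.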

Networking Patterns That Work

Common real-world setups:

Bastion host for controlled access
One-way artifact transfer (CI → production)
Scheduled sync between environments

Avoid manual, untracked file transfers whenever possible.

End-to-End Air-Gapped Deployment Workflow

Developer pushes code
CI pipeline builds application
Dependencies are bundled into artifact or image
Artifact is transferred into secure network
Deployment runs locally (no internet required)

This ensures reproducibility and consistency across environments.
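
The transfer step is where an artifact is most likely to be truncated or swapped, so it can help to record a checksum in CI and verify it inside the secure network before deploying. Below is a minimal Python sketch of that check, assuming the artifact is a single file. The function names sha256_of and verify_artifact are illustrative and not part of any Semaphore tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_digest: str) -> bool:
    """Compare the file's digest with the one recorded at build time."""
    return sha256_of(path) == expected_digest.strip().lower()
```

In this sketch the CI job would write the digest next to the artifact, and the deployment script would refuse to unpack the bundle when verify_artifact returns False.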

Common Mistakes to Avoid

Installing dependencies during deployment
Relying on external APIs at runtime
Not versioning artifacts
Manual deployments without audit trail

These issues lead to fragile and non-reproducible systems.

Benefits of Proper Air-Gapped Deployment Strategy

Organizations that adopt these patterns see improvements in:

Deployment reliability
Change failure rate
Security posture
Operational efficiency

Final Thoughts

Air-gapped deployments introduce constraints—but also force better engineering practices.

By adopting a build once, deploy anywhere model, teams can achieve:

More predictable releases
Fewer deployment failures
Better scalability across environments

If your current pipeline struggles in restricted environments, it’s a strong signal that it relies too heavily on runtime assumptions.

Fixing that will improve your entire delivery process—not just air-gapped deployments.

FAQ: Air-Gapped Deployments

How do you deploy software without internet access?

Build a self-contained artifact in CI and transfer it to the target environment for execution.

Can Docker run in air-gapped environments?

Yes. Use private registries or transfer images via docker save and docker load.

How do you install dependencies offline?

Either bundle them into your artifact or use internal mirrors.

What is the best deployment strategy for secure environments?

Artifact-based deployments and immutable infrastructure are the most reliable.

Do CI/CD tools support air-gapped deployments?

Yes, but support varies. Look for tools with strong artifact management and flexible workflows.

The post Air-Gapped Deployments: How to Deploy to Servers Without Internet Access (Complete Guide) appeared first on Semaphore.

How Does CI/CD Differ for Machine Learning Pipelines (MLOps)?

Pete Miloravac — Thu, 23 Apr 2026 15:24:09 +0000

For most engineering teams, CI/CD is already a solved problem—at least on the surface. You commit code, run tests, build artifacts, and deploy.

But when teams start introducing machine learning into production systems, that familiar pipeline begins to break down.

Across forums like Reddit (r/MachineLearning, r/devops), Stack Overflow, and Hacker News, the same questions come up repeatedly:

“How do I version datasets in CI/CD?”
“Why does my model degrade after deployment even though tests pass?”
“How do I test something that learns from data?”
“Should I deploy models the same way as application code?”

This tutorial answers those questions with a practical lens. More importantly, it explains what engineering leaders need to rethink when adapting CI/CD pipelines for MLOps.

Why Traditional CI/CD Breaks Down for Machine Learning

In traditional software delivery, your pipeline is built around code determinism.

Given the same input, your application produces the same output. Your CI/CD pipeline enforces this through:

Unit tests
Integration tests
Build reproducibility
Static artifacts

Machine learning systems violate this assumption in three key ways:

Data is a first-class dependency
Outputs are probabilistic, not deterministic
Performance degrades over time (data drift)

This fundamentally changes how you design continuous integration and continuous deployment.

For engineering managers and CTOs, this is where pipelines often become fragile, slow, and expensive—especially when built on top of tools that were not designed for these workflows.

Key Differences Between CI/CD and MLOps Pipelines

1. What You Version: Code vs Code + Data + Models

In a standard CI/CD pipeline:

You version application code
Dependencies are managed via package managers
Builds are reproducible

In MLOps, you must version:

Training data
Feature engineering logic
Model artifacts
Hyperparameters

A typical approach is to combine Git with a data versioning tool like DVC.

Example:

# Track dataset
dvc add data/training.csv

# Push data to remote storage
dvc push

# Commit metadata
git add data/training.csv.dvc .gitignore
git commit -m "Track training dataset"

Your CI/CD pipeline now needs to fetch not just code, but also the correct dataset version.
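
One way to make that link explicit is to store a small manifest with every trained model that records the code revision, a hash of the training data, and the hyperparameters. The following Python sketch shows the idea under simple assumptions: the dataset fits in memory, and the field names (code_revision, data_sha256, hyperparameters) are hypothetical rather than a DVC format.

```python
import hashlib
import json


def dataset_fingerprint(data: bytes) -> str:
    """Hash the raw dataset bytes so any change produces a new fingerprint."""
    return hashlib.sha256(data).hexdigest()


def training_manifest(code_revision: str, data: bytes, hyperparameters: dict) -> str:
    """Serialize what a model was trained from as stable, sorted JSON."""
    manifest = {
        "code_revision": code_revision,
        "data_sha256": dataset_fingerprint(data),
        "hyperparameters": hyperparameters,
    }
    return json.dumps(manifest, sort_keys=True, indent=2)
```

Because the JSON keys are sorted, two runs with the same inputs produce byte-identical manifests, which makes them easy to diff when a model behaves differently.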

2. What You Test: Logic vs Behavior

Traditional CI focuses on correctness:

assert(add(2, 2) === 4)

In machine learning, you test behavior:

Accuracy thresholds
Precision and recall
Model drift
Bias detection

Example test step in a pipeline:

assert model_accuracy > 0.87, "Model accuracy below threshold"

This introduces a new challenge: tests can fail even when code hasn’t changed.
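
To make the behavioral checks concrete, here is a small Python sketch that computes accuracy, precision, and recall from labels and predictions without any ML library. The sample labels and the 0.7 threshold are made up for illustration. A real pipeline would load predictions from a held-out evaluation set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative data only: 8 labelled examples and a model's predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Hypothetical threshold; pick one that reflects your own quality bar.
assert accuracy(y_true, y_pred) >= 0.7, "Model accuracy below threshold"
```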

3. What You Build: Binaries vs Experiments

In traditional pipelines:

Build once
Deploy artifact

In MLOps:

Train model
Evaluate multiple experiments
Select best candidate

Your pipeline becomes iterative and branching.

Example workflow:

blocks:
  - name: Train models
    task:
      jobs:
        - name: train-xgboost
        - name: train-random-forest

  - name: Evaluate
    task:
      jobs:
        - name: compare-metrics

  - name: Deploy best model
    task:
      jobs:
        - name: deploy
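
The compare-metrics job in a workflow like this has to turn several experiment results into one decision. A minimal Python sketch of that selection step follows, assuming each training job reports a single score where higher is better. The function name select_best and the sample scores are hypothetical.

```python
from typing import Optional


def select_best(metrics: dict[str, float], minimum: float) -> Optional[str]:
    """Return the highest-scoring candidate that clears the minimum, or None."""
    eligible = {name: score for name, score in metrics.items() if score >= minimum}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Illustrative scores, not real results.
candidates = {"train-xgboost": 0.91, "train-random-forest": 0.89}
best = select_best(candidates, minimum=0.87)
```

Returning None when nothing clears the bar lets the deploy block be skipped instead of shipping the least bad model.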

4. Deployment: Static Releases vs Continuous Retraining

Traditional deployment:

Triggered by code changes
Releases are versioned and stable

MLOps deployment:

Triggered by new data
Models may be retrained daily or hourly
Performance must be monitored continuously

This is where many teams struggle. They try to force data-driven workflows into code-driven pipelines.

Designing a CI/CD Pipeline for Machine Learning

Let’s walk through a practical pipeline using Semaphore.

Semaphore is particularly well-suited here because it allows you to orchestrate complex workflows without introducing unnecessary pipeline overhead—critical for compute-heavy ML workloads.

Step 1: Reproducible Environment

version: v1.0
name: ML Pipeline

agent:
  machine:
    type: e1-standard-4
    os_image: ubuntu2004

Pin dependencies:

pip install -r requirements.txt

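For ML, reproducibility is everything. Use Docker images or exact version pins (for example, package==1.2.3 entries in requirements.txt) so that dependency drift cannot change results or break a run.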

Step 2: Fetch Data and Dependencies

blocks:
  - name: Setup
    task:
      jobs:
        - name: Fetch data
          commands:
            - checkout
            - dvc pull

This step is often missing in traditional pipelines—and is one of the main sources of confusion discussed in forums.

Step 3: Train Model

  - name: Train
    task:
      jobs:
        - name: Train model
          commands:
            - python train.py

Step 4: Evaluate Model

  - name: Evaluate
    task:
      jobs:
        - name: Evaluate model
          commands:
            - python evaluate.py

Example evaluation script:

if accuracy < 0.87:
    raise Exception("Model did not meet quality threshold")
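
A CI job fails when a command exits with a non-zero status, so the evaluation step only needs to translate the metric into an exit code. The sketch below assumes the training step wrote a JSON document containing an accuracy field. That file layout and the gate function name are assumptions for illustration, not something train.py is known to produce.

```python
import json
import sys

THRESHOLD = 0.87  # same threshold as the example above


def gate(metrics_json: str, threshold: float = THRESHOLD) -> int:
    """Return a process exit code: 0 if the model passes, 1 otherwise."""
    accuracy = json.loads(metrics_json)["accuracy"]
    if accuracy < threshold:
        print(f"accuracy {accuracy:.3f} is below {threshold}", file=sys.stderr)
        return 1
    print(f"accuracy {accuracy:.3f} meets {threshold}")
    return 0
```

In an evaluate.py script this could be wired up by reading the metrics file and calling sys.exit(gate(contents)), which stops later blocks from running when the model falls short.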

Step 5: Conditional Deployment

  - name: Deploy
    task:
      jobs:
        - name: Deploy model
          commands:
            - python deploy.py

In Semaphore, you can gate this step using promotions, approvals, or conditions—important for controlling risk in ML deployments.

Common Pitfalls

Treating Models Like Code Artifacts

Models are not static. If you deploy them once and forget them, they will degrade.

Fix: Add monitoring and retraining triggers.
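
A retraining trigger needs some numeric signal that production data has moved away from training data. One common choice is the population stability index (PSI). The Python sketch below is a simplified version for a single numeric feature, assuming both samples are non-empty. The bin count and the often-quoted 0.2 alert level are rules of thumb, not fixed standards.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one numeric feature; larger means more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back when all values are equal

    def histogram(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A scheduled job could compute this per feature against the training snapshot and start the training pipeline when the value crosses the chosen alert level.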

Ignoring Data Versioning

Without versioned data, debugging becomes impossible.

Fix: Use DVC, feature stores, or data snapshots.

Overloading CI with Training Jobs

Training jobs can be expensive and slow.

Fix: Separate lightweight CI from heavy training workflows.

Lack of Observability

Traditional CI/CD tools focus on build logs—not model performance.

Fix: Integrate monitoring and metrics.

Strategic Implications for Engineering Leaders

For decision makers, the shift to MLOps is not just technical—it affects:

Cost structure
Reliability
Tooling decisions

Teams that succeed treat CI/CD for ML as a first-class system, not an extension of existing pipelines.

This is where platforms like Semaphore position themselves differently:

Flexible pipeline orchestration for complex workflows
Predictable performance at scale
Cost efficiency compared to legacy tools

When Should You Adapt Your Pipeline?

You likely need to rethink your CI/CD if:

You are deploying models to production
Your pipelines are slowing down due to training workloads
You cannot reproduce model results reliably
CI/CD costs are increasing unpredictably

FAQs

What is the main difference between CI/CD and MLOps pipelines?

Traditional CI/CD focuses on deterministic code, while MLOps pipelines must handle data, probabilistic outputs, and continuous retraining.

Can I use standard CI/CD tools for machine learning?

Yes, but most teams need to extend them significantly to support data versioning, model evaluation, and retraining workflows.

How do you test machine learning models in CI/CD?

By validating metrics such as accuracy, precision, recall, and monitoring for drift.

Should model training run in CI?

Not always. Many teams separate training pipelines from CI to control cost and runtime.

How do you deploy machine learning models safely?

Use staged rollouts, approval gates, and continuous monitoring.

The post How Does CI/CD Differ for Machine Learning Pipelines (MLOps)? appeared first on Semaphore.

Can ChatGPT generate a CI/CD YAML pipeline for my Node.js project?

Pete Miloravac — Tue, 07 Apr 2026 10:20:20 +0000

If you have searched this question online, you have likely seen a pattern across GitHub discussions, Reddit threads, and Stack Overflow posts:

“Can ChatGPT write my CI pipeline from scratch?”
“Why does the generated YAML fail in CI but work locally?”
“How do I adapt AI generated pipelines to my actual environment?”
“Is it safe to rely on AI for production CI/CD pipelines?”

The short answer is yes — ChatGPT can generate a CI/CD YAML pipeline for your Node.js project. But the more useful answer for engineering leaders is this:

AI can accelerate pipeline creation, but it cannot replace engineering judgment, environment awareness, or platform specific optimization.

In this tutorial, we will walk through how to use ChatGPT effectively to generate a CI/CD pipeline for a Node.js application, how to validate and productionize it, and where teams typically run into issues at scale.

Why this matters for engineering teams

For engineering managers and CTOs, CI/CD is not just about “having a pipeline”. It directly impacts:

Deployment frequency
Change failure rate
Developer productivity
CI/CD cost predictability

Many teams using default tools or legacy setups (like heavily customized Jenkins pipelines) struggle with:

Slow builds due to poor caching strategies
Fragile pipelines that break with small changes
Increasing costs as usage scales

AI generated pipelines promise faster setup, but without structure, they often introduce hidden complexity and instability.

The goal is not to replace your CI/CD system with AI — it is to use AI to bootstrap and iterate faster, while relying on a platform like Semaphore to run fast, reliable pipelines at scale.

Step 1: Ask ChatGPT the right way

The quality of the YAML pipeline depends heavily on your prompt.

A weak prompt:

“Generate a CI/CD pipeline for Node.js”

A strong prompt:

“Generate a CI/CD pipeline YAML for a Node.js project using npm. Include steps for installing dependencies, running tests, caching node_modules, and deploying to a staging environment. Assume Node 18. Optimize for fast builds.”

This additional context ensures the output is closer to production ready.

Step 2: Example AI generated pipeline

Below is a Semaphore specific pipeline generated from a strong prompt. Semaphore pipelines are defined as code and executed in blocks and jobs, which makes them easy to parallelize and optimize for performance.

If you are new to Semaphore, you can explore the official pipeline configuration docs here.

Semaphore’s model (blocks, jobs, tasks) is especially useful for teams that need to scale CI/CD without rewriting pipelines as complexity grows.

Here is a simplified Semaphore pipeline generated from a strong prompt:

version: v1.0
name: Node.js CI Pipeline

agent:
  machine:
    type: e1-standard-2
    os_image: ubuntu2004

blocks:
  - name: Install Dependencies
    task:
      jobs:
        - name: npm install
          commands:
            - checkout
            - cache restore node-modules-$(checksum package-lock.json)
            - npm install
            - cache store node-modules-$(checksum package-lock.json) node_modules

  - name: Run Tests
    task:
      jobs:
        - name: npm test
          commands:
            - checkout
            - cache restore node-modules-$(checksum package-lock.json)
            - npm test

  - name: Deploy to Staging
    task:
      jobs:
        - name: deploy
          commands:
            - echo "Deploying to staging..."

This is a solid starting point. But it is not production ready yet.

Step 3: Fix the most common issues (based on real forum questions)

Before jumping into fixes, it is worth noting that many of these issues are not just “AI problems”. They are symptoms of weak CI/CD foundations. Semaphore addresses many of these at the platform level through:

Built in caching primitives
First class parallelism
Ephemeral, reproducible environments

Docs reference for deeper exploration.

Now let’s look at the most common issues.

Across developer forums, the same issues appear repeatedly.

1. “Works locally but fails in CI”

Common causes:

Missing environment variables
Node version mismatch
Implicit local dependencies

Fix by explicitly defining runtime:

commands:
  - nvm install 18
  - nvm use 18
  - npm ci

And define environment variables in Semaphore project settings.
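
Failures of this kind are easier to diagnose when the pipeline checks its assumptions up front and reports all problems at once. The following Python sketch shows one way to do that. The preflight function, its parameters, and the API_URL variable name are hypothetical. In CI you would pass in os.environ and the output of node --version.

```python
def preflight(env: dict, node_version: str, required_vars: list, expected_major: int) -> list:
    """Return a list of human-readable problems; empty means the environment looks fine."""
    problems = []
    for name in required_vars:
        if not env.get(name):
            problems.append(f"missing environment variable: {name}")
    # node --version prints something like "v18.19.0".
    major = int(node_version.strip().lstrip("v").split(".")[0])
    if major != expected_major:
        problems.append(
            f"Node {node_version.strip()} found, expected major version {expected_major}"
        )
    return problems
```

Failing the job when the returned list is non-empty turns a confusing test failure into a one-line explanation at the top of the log.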

2. Inefficient dependency installation

Many AI generated pipelines use npm install instead of npm ci.

For CI environments, always prefer:

npm ci

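npm ci installs exactly the versions recorded in package-lock.json and fails if the lockfile and package.json disagree, which gives deterministic installs and is usually faster on a clean checkout.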

3. Poor caching strategy

AI often adds caching, but not always correctly.

Key improvement:

Use lockfile checksum
Cache only what is needed

Semaphore provides native caching commands that make this easier and more reliable compared to ad hoc scripts.

Semaphore caching docs.
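
The reason a lockfile checksum works well as a cache key is that the key changes exactly when the resolved dependencies change. The Python sketch below illustrates the principle only. It is not how Semaphore's checksum command is implemented, and the cache_key function and its 16-character truncation are arbitrary choices for the example.

```python
import hashlib


def cache_key(prefix: str, lockfile_bytes: bytes) -> str:
    """Derive a cache key that changes whenever the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{prefix}-{digest}"
```

Two branches with identical lockfiles share a cache entry, while a single dependency bump produces a new key and a clean install.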

4. No parallelization

Most generated pipelines are sequential.

At scale, this becomes a bottleneck.

Semaphore is designed for parallel execution by default, which allows teams to split workloads across jobs without additional tooling.

Improve by splitting jobs:

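- name: Run Tests
  task:
    jobs:
      - name: unit tests
        commands:
          - checkout
          - cache restore node-modules-$(checksum package-lock.json)
          - npm run test:unit
      - name: integration tests
        commands:
          - checkout
          - cache restore node-modules-$(checksum package-lock.json)
          - npm run test:integration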

This directly improves pipeline speed and developer feedback loops.
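
When a suite is split across more than two jobs, the total time is set by the slowest job, so balancing matters. Here is a small Python sketch of a greedy split that assigns the longest test files first to the least loaded job. The split_tests function and the timings are hypothetical. Real durations would come from previous test reports.

```python
import heapq


def split_tests(durations: dict[str, float], jobs: int) -> list[list[str]]:
    """Assign tests to jobs, longest first, always onto the least loaded job."""
    heap = [(0.0, index) for index in range(jobs)]  # (total seconds, job index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(jobs)]
    for name, seconds in sorted(durations.items(), key=lambda item: -item[1]):
        total, index = heapq.heappop(heap)
        buckets[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return buckets


# Illustrative timings in seconds.
plan = split_tests({"a": 60, "b": 30, "c": 20, "d": 10}, jobs=2)
```

With these sample timings both jobs end up with 60 seconds of work, compared with 120 seconds when everything runs in sequence.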

5. Missing failure handling and visibility

AI rarely includes:

Test reporting
Artifact storage
Debug logs

These are critical for teams managing multiple services.

Step 4: Productionizing the pipeline

This is where Semaphore becomes particularly valuable for engineering teams that have outgrown default CI/CD tools.

Unlike generic CI systems, Semaphore is optimized for:

Fast execution through efficient resource allocation
Predictable performance at scale
Clear cost control through usage based pricing

To move from “AI generated” to “team ready”, apply these principles.

Make pipelines predictable

Pin Node versions
Use npm ci
Avoid implicit dependencies

Optimize for speed

Use caching correctly
Parallelize test suites
Avoid unnecessary steps

Control costs

Engineering leaders often overlook this.

Inefficient pipelines increase CI/CD spend significantly. Semaphore helps teams reduce costs by optimizing execution time and resource usage.

Align with your workflow

AI does not know your:

Branching strategy
Deployment approvals
Security requirements

You must adapt the pipeline to match your actual delivery process.

Step 5: Where ChatGPT helps vs where it does not

Where it helps

Bootstrapping pipelines quickly
Converting ideas into YAML
Suggesting improvements (caching, parallelism)

Where it falls short

Understanding your infrastructure
Handling edge cases at scale
Optimizing for cost and performance across teams

This is why high performing teams pair AI with a purpose built CI/CD platform instead of relying on generated YAML alone.

Example: Improved production ready pipeline

version: v1.0
name: Node.js Optimized Pipeline

agent:
  machine:
    type: e1-standard-2
    os_image: ubuntu2004

blocks:
  - name: Setup
    task:
      jobs:
        - name: Install dependencies
          commands:
            - checkout
            - nvm install 18
            - nvm use 18
            - cache restore node-modules-$(checksum package-lock.json)
            - npm ci
            - cache store node-modules-$(checksum package-lock.json) node_modules

  - name: Test
    task:
      jobs:
        - name: Unit tests
          commands:
            - checkout
            - cache restore node-modules-$(checksum package-lock.json)
            - npm run test:unit

        - name: Integration tests
          commands:
            - checkout
            - cache restore node-modules-$(checksum package-lock.json)
            - npm run test:integration

  - name: Deploy
    task:
      jobs:
        - name: Deploy to staging
          commands:
            - echo "Deploying..."

Key takeaway for engineering leaders

ChatGPT can generate a CI/CD pipeline YAML for your Node.js project, but it should be treated as a starting point, not a finished solution.

The real differentiation comes from the platform running your pipelines.

With Semaphore, teams can:

Run pipelines faster through parallel execution and optimized infrastructure
Reduce CI/CD costs by eliminating inefficiencies
Scale pipelines without rewriting YAML as complexity grows

This is especially important for teams migrating from tools like Jenkins or GitHub Actions where performance and cost often degrade over time.

In day-to-day terms, that comes down to:

How fast your pipelines run
How reliable they are under scale
How predictable your costs remain

Teams that outgrow default tools typically need more than generated YAML — they need a platform that enforces performance, reliability, and consistency.

Semaphore is designed for this stage: when your team has moved beyond basic CI/CD and needs fast, scalable pipelines without operational overhead.

FAQ

Can ChatGPT generate a complete CI/CD pipeline for Node.js?

Yes, it can generate a functional YAML pipeline, but it usually requires adjustments for your environment, dependencies, and deployment workflow.

Is it safe to use AI generated pipelines in production?

Only after review and testing. AI does not understand your infrastructure, so validation is critical before production use.

Why do AI generated pipelines fail in CI?

Common reasons include missing environment variables, incorrect Node versions, and differences between local and CI environments.

How can I improve an AI generated CI/CD pipeline?

Focus on deterministic installs, proper caching, parallelization, and aligning the pipeline with your team’s workflow.

When should a team move beyond basic generated pipelines?

When pipelines become slow, fragile, or expensive. At that point, adopting a platform optimized for CI/CD performance becomes more effective than iterating on YAML alone.

The post Can ChatGPT generate a CI/CD YAML pipeline for my Node.js project? appeared first on Semaphore.

How to Manage CI/CD for Game Development (Unity, Unreal, Large Binaries)

Pete Miloravac — Fri, 27 Mar 2026 10:43:02 +0000

Game development teams face a very different CI/CD reality than traditional SaaS engineering teams. Instead of small, stateless builds, you’re dealing with gigabytes of assets, long build times, platform-specific toolchains, and fragile pipelines that often break under scale.

If you’ve ever searched for this topic on forums like Reddit, Stack Overflow, or Unreal/Unity communities, the same patterns emerge:

“Our builds take hours—how do we speed this up?”
“How do we version large assets in CI?”
“Why does Unity/Unreal behave differently in CI than locally?”
“How do we cache dependencies and avoid re-importing everything?”

This guide walks through how to design a CI/CD pipeline for game development that is fast, reliable, and cost-efficient—without relying on brittle workarounds.

Why CI/CD is Harder for Game Development

Unlike typical web services, game pipelines introduce three unique challenges:

1. Large Binary Assets

Game projects include textures, audio, models, and compiled assets that don’t behave well with traditional Git workflows.

2. Heavy Build Steps

Unity and Unreal builds involve asset import, shader compilation, and platform packaging—often taking 30–120 minutes.

3. Environment Sensitivity

Builds depend on specific engine versions, OS configurations, GPU drivers, and SDKs.

This means your CI/CD pipeline must optimize for caching, reproducibility, and parallelization—not just correctness.

Step 1: Structure Your Repository for CI/CD

The biggest mistake teams make is treating game repos like standard application repos.

Use Git LFS for Large Files

git lfs track "*.psd"
git lfs track "*.fbx"
git lfs track "*.wav"

Without LFS, your CI pipeline will choke on clone times and storage.
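
A cheap guard is to fail fast when someone commits a large file that no LFS pattern covers. The Python sketch below checks a mapping of file paths to sizes against the patterns tracked above. The 10 MB limit and the function name are assumptions, and fnmatch only approximates the matching rules Git uses in .gitattributes.

```python
from fnmatch import fnmatchcase

LFS_PATTERNS = ["*.psd", "*.fbx", "*.wav"]


def untracked_large_files(files: dict[str, int], patterns=LFS_PATTERNS,
                          limit_bytes: int = 10 * 1024 * 1024) -> list[str]:
    """Return paths larger than the limit whose file name matches no LFS pattern."""
    offenders = []
    for path, size in files.items():
        file_name = path.rsplit("/", 1)[-1]
        covered = any(fnmatchcase(file_name, pattern) for pattern in patterns)
        if size > limit_bytes and not covered:
            offenders.append(path)
    return sorted(offenders)
```

Run as an early job, a check like this reports the offending paths before the repository history grows by hundreds of megabytes.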

Separate Code and Assets (When Possible)

Core game logic → standard Git repo
Large assets → LFS or external storage (S3, artifact storage)

This reduces pipeline overhead and improves caching efficiency.

Step 2: Use Deterministic Build Environments

Forum discussions frequently highlight “works locally but not in CI” issues for Unity/Unreal.

The root cause is almost always environment drift.

Solution: Containerized or Prebuilt Environments

For example, using a Docker-based Unity build:

version: v1.0
name: Unity Build Pipeline

agent:
  machine:
    type: e1-standard-4
    os_image: ubuntu2004

blocks:
  - name: Build
    task:
      jobs:
        - name: Unity Build
          commands:
            - checkout
            - ./ci/install-unity.sh
            - ./ci/build.sh

Key idea: lock Unity/Unreal versions and dependencies.

For Unreal, you might pre-bake images with:

Unreal Engine installed
Required SDKs (Android, iOS, etc.)

Step 3: Cache Aggressively (This Is Non-Negotiable)

The #1 complaint across forums: “CI rebuilds everything every time.”

Cache Unity Library Folder

cache:
  paths:
    - Library

This avoids re-importing assets on every build.

Cache Unreal Derived Data Cache (DDC)

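# Bash cannot export names containing hyphens, so set it via env for the build command (placeholder script)
env "UE-SharedDataCachePath=/cache/ue-ddc" ./ci/build.sh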

This dramatically reduces shader compilation time.

Use Remote Caching

For distributed teams, store caches in shared storage (S3 or CI-native cache).

Step 4: Parallelize Builds Across Platforms

Game teams often target multiple platforms:

Windows
macOS
iOS
Android
Consoles

Running these sequentially kills productivity.

Example Parallel Pipeline

blocks:
  - name: Build Matrix
    task:
      jobs:
        - name: Windows Build
          commands:
            - ./build-windows.sh

        - name: Android Build
          commands:
            - ./build-android.sh

        - name: iOS Build
          commands:
            - ./build-ios.sh

Parallelization is one of the fastest ways to reduce total pipeline time.

Step 5: Manage Artifacts Efficiently

Game builds produce large outputs (often multiple GBs).

Best Practices

Store artifacts outside the CI workspace
Use artifact versioning
Expire old builds automatically

Example:

artifacts:
  paths:
    - build/
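
Automatic expiry can be as simple as keeping the newest N builds and deleting the rest. A minimal Python sketch of that retention rule follows, assuming each build is described by a name and a Unix timestamp. The builds_to_expire function is illustrative, and the actual deletion depends on where your artifacts live.

```python
def builds_to_expire(builds: list[tuple[str, int]], keep: int) -> list[str]:
    """Return the names of all builds except the `keep` most recent ones."""
    newest_first = sorted(builds, key=lambda build: build[1], reverse=True)
    return [name for name, _ in newest_first[keep:]]


# Illustrative build names and timestamps.
history = [("build-101", 100), ("build-103", 300), ("build-102", 200)]
expired = builds_to_expire(history, keep=2)
```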

For large studios, push builds to:

S3
CDN
Internal distribution systems

Step 6: Automate Testing (Even for Games)

A common misconception: “Games are hard to test in CI.”

But modern pipelines support:

Unity Test Runner

/Applications/Unity/Hub/Editor/Unity \
  -batchmode \
  -runTests \
  -testPlatform PlayMode \
  -projectPath .

Unreal Automation Framework

RunUAT BuildCookRun -RunAutomationTests

Even partial test coverage dramatically improves confidence.

Step 7: Control Costs and Scale Intelligently

Game CI/CD pipelines are expensive by default.

Common issues raised by engineering leaders:

“We’re paying too much for idle build agents”
“Scaling builds is unpredictable”

Strategies

Use autoscaling runners
Avoid over-provisioning
Cache to reduce compute time
Use pay-per-use CI/CD platforms

This is where teams outgrow default tools like Jenkins or basic GitHub Actions setups.

Step 8: Handle Long Build Times with Pipeline Design

Instead of one monolithic pipeline, split workflows:

Fast checks (lint, unit tests) → run on every commit
Full builds → run on merge or nightly

Example:

pipeline:
  stages:
    - quick-checks
    - full-build

This keeps feedback loops tight while still validating full builds.
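
The rule behind this split is small enough to write down explicitly. The Python sketch below decides which stages to run from the trigger type and branch. The event names push and schedule, the default branch name main, and the stages_for function are assumptions for illustration. Most CI platforms express the same rule in their own configuration syntax.

```python
def stages_for(event: str, branch: str, main_branch: str = "main") -> list[str]:
    """Quick checks always run; full builds run on merges to main or on a schedule."""
    stages = ["quick-checks"]
    if event == "schedule" or (event == "push" and branch == main_branch):
        stages.append("full-build")
    return stages
```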

Step 9: Debugging CI Failures in Game Pipelines

From forum discussions, common failure causes include:

Missing licenses (Unity)
Incorrect SDK versions
Asset import failures
Path length issues (Windows)

Add Debug Visibility

set -x

And always log:

Engine version
Build parameters
Environment variables
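
Logging environment variables is useful only if secrets such as license keys stay out of the log. Here is a Python sketch of a diagnostic header that masks sensitive values. The marker list and the diagnostic_lines function are assumptions for this example, so extend the markers to match your own variable naming.

```python
SENSITIVE_MARKERS = ("TOKEN", "SECRET", "PASSWORD", "KEY", "LICENSE")


def diagnostic_lines(engine_version: str, build_params: dict, env: dict) -> list[str]:
    """Build log lines for the start of a job, masking likely secrets."""
    lines = [f"engine_version={engine_version}"]
    lines += [f"param.{key}={value}" for key, value in sorted(build_params.items())]
    for name, value in sorted(env.items()):
        is_secret = any(marker in name.upper() for marker in SENSITIVE_MARKERS)
        lines.append(f"env.{name}={'***' if is_secret else value}")
    return lines
```

Printing these lines at the top of every build makes it possible to compare a failing CI run with a working local setup line by line.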

Step 10: Choosing the Right CI/CD Platform

Game development pushes CI/CD tools to their limits.

Engineering leaders should evaluate:

Performance with large repositories
Caching capabilities
Parallel execution
Cost predictability
Ease of environment setup

Modern platforms like Semaphore are designed for teams that have outgrown default tools, offering:

Fast pipelines
Flexible caching
Pay-per-use pricing
Strong support for custom workflows

Putting It All Together

A production-ready game CI/CD pipeline should:

Use Git LFS or external storage for assets
Cache aggressively (Unity Library, Unreal DDC)
Run builds in parallel
Use deterministic environments
Split pipelines for faster feedback
Optimize for cost and scale

This isn’t just about automation—it’s about enabling your team to ship faster without increasing infrastructure complexity.

FAQs

Why are Unity builds so slow in CI?

Because asset import and shader compilation are expensive. Without caching (Library folder), CI rebuilds everything from scratch.

How do I reduce Unreal build times?

Use a shared Derived Data Cache (DDC), prebuilt engine images, and parallel builds.

Should I store game assets in Git?

Yes, but use Git LFS or external storage. Standard Git is not optimized for large binaries.

How do I debug CI-only failures?

Ensure environment parity, log all build parameters, and verify engine and SDK versions match local setups.

What’s the biggest mistake teams make?

Treating game CI/CD like web CI/CD. Game pipelines require different optimization strategies—especially around caching and artifacts.

The post How to Manage CI/CD for Game Development (Unity, Unreal, Large Binaries) appeared first on Semaphore.

How does AI-driven deployment differ between traditional software and ML models (MLOps)?

Pete Miloravac — Thu, 26 Mar 2026 10:27:33 +0000

AI is increasingly involved in deployment decisions—auto-rollbacks, approvals, test selection—but not all “AI-driven deployments” are the same.

There’s a critical distinction engineering leaders need to understand:

How does AI-driven deployment differ between traditional software and ML models (MLOps), and what does that mean for our CI/CD pipeline?

If you don’t account for this difference, you risk building pipelines that are:

difficult to reason about
expensive to operate
unreliable at scale

Why This Matters for Engineering Leaders

Most teams are now operating in one (or both) of these modes:

Adding AI-assisted capabilities into existing applications
Deploying machine learning models into production systems

At the same time, they’re still accountable for core outcomes:

deployment frequency
change failure rate
time to restore (MTTR)
cost predictability

Many teams already struggle with pipeline fragility, scaling limits, or rising CI/CD costs. Introducing AI, especially ML models, without adapting your deployment approach amplifies those problems.

The key insight:

AI doesn’t just change what you deploy. It changes how your pipeline behaves.

The Core Difference in One Sentence

Here’s the simplest way to think about it:

Traditional CI/CD deploys deterministic code.

MLOps deploys probabilistic behavior shaped by data.

Everything else—testing, rollout, monitoring—follows from that.

Where AI-Driven Deployment Diverges in Practice

To make this actionable, it helps to break the differences into four areas that directly impact your CI/CD pipeline: artifact, validation, rollout, and feedback loops.

1. Artifact: What You Deploy Changes Fundamentally

In traditional CI/CD pipelines:

you deploy versioned code and dependencies
behavior is defined by logic

In MLOps:

you deploy a trained model plus implicit assumptions about data
behavior depends on both code and data distribution

This introduces a new requirement: you need to version behavior, not just code.

In practice, this means:

treating model artifacts as first-class outputs
tracking which model version is running in production
linking deployments to training data and configuration

In Semaphore, this maps naturally to pipelines where artifacts (including models) are versioned and passed between workflow stages.

2. Validation: From Pass/Fail to Acceptable Risk

In traditional CI/CD:

tests are deterministic
failures block deployment

In MLOps:

evaluation is probabilistic
decisions are based on thresholds

This changes how pipelines enforce quality.

Instead of:

“Does this pass?”

You ask:

“Is this good enough to deploy?”

That requires:

evaluation datasets
comparison against previous models
clearly defined acceptance thresholds

This is where many default tools break down—they are built for binary gates, not graded decision-making.

You can extend CI/CD pipelines to support this by introducing evaluation stages and conditional logic.

3. Rollout: Releases vs Controlled Experiments

In traditional deployments:

you release a new version
you monitor for errors
you roll back if needed

In MLOps:

deployments are often experiments
multiple versions may run simultaneously
behavior is validated in production

This introduces patterns like:

canary releases
shadow deployments
A/B testing

The implication for engineering leaders is clear:

Your pipeline needs to support experimentation—not just delivery.

That requires flexible workflows and conditional execution, not rigid, linear pipelines.
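
As an illustration of what running multiple versions simultaneously involves, here is a Python sketch of deterministic canary routing. It hashes a user identifier into one of 100 buckets so the same user always sees the same model version. The route function and the bucket count are assumptions for the example, and production systems usually delegate this to a load balancer or feature-flag service.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Send a stable share of users to the canary model based on a hash of their id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # 0..99, stable for a given user
    return "canary" if bucket < canary_percent else "stable"
```

Raising canary_percent step by step widens exposure without reshuffling users who were already on the canary.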

4. Feedback Loops: Monitoring vs Continuous Learning

In traditional CI/CD:

monitoring detects failures
teams fix issues and redeploy

In MLOps:

monitoring detects drift and degradation
pipelines may trigger retraining automatically

This creates a continuous loop:

build → train → evaluate → deploy → monitor → retrain

This loop increases:

pipeline frequency
infrastructure usage
operational complexity

Without guardrails, this can quickly lead to cost overruns and unstable systems, two major concerns for engineering leaders.
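
One possible guardrail for the retraining loop is a simple limiter: a daily cap plus a cooldown between runs. The sketch below assumes timestamps in seconds; `RetrainGuard` and its default limits are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RetrainGuard:
    """Hypothetical limiter for automatically triggered retraining runs."""
    max_runs_per_day: int = 2
    min_seconds_between_runs: float = 6 * 3600
    run_times: list = field(default_factory=list)

    def allow(self, now: float) -> bool:
        # Only runs from the last 24 hours count against the daily cap.
        day_ago = now - 24 * 3600
        self.run_times = [t for t in self.run_times if t > day_ago]
        if len(self.run_times) >= self.max_runs_per_day:
            return False
        # Enforce a cooldown so repeated drift alerts do not stack up runs.
        if self.run_times and now - self.run_times[-1] < self.min_seconds_between_runs:
            return False
        self.run_times.append(now)
        return True


guard = RetrainGuard()
print(guard.allow(now=0.0), guard.allow(now=3600.0))
```

A drift alert that arrives while the guard says no can be queued or escalated to a person instead of starting another training job.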

Where AI-Driven Deployment Overlaps (and Compounds Risk)

AI-driven deployment decisions—like auto-rollback or dynamic approvals—apply to both systems.

But the impact is different:

In traditional CI/CD, AI optimizes deterministic systems
In MLOps, AI operates on top of already probabilistic systems

That compounds uncertainty.

This is why governance becomes critical. If you haven’t defined guardrails yet, that is the place to start.

Example: CI/CD vs MLOps Deployment Logic

Traditional CI/CD pipeline:

if tests_fail:
   block_deployment()

elif error_rate_increases:
   rollback()

else:
   deploy()

MLOps pipeline:

if model_accuracy < threshold:
   block_deployment()

elif performance_delta < acceptable_range:
   require_review()

elif drift_detected:
   trigger_retraining()

else:
   deploy_model()

The difference is subtle but important:

CI/CD enforces correctness
MLOps manages performance over time

What This Means for Your CI/CD Platform

As soon as you introduce ML models—or AI-driven decisions—your CI/CD platform needs to support more than just execution.

Engineering leaders should evaluate:

Can we model both deterministic and probabilistic workflows?
Does the system support conditional logic and branching at scale?
Can we version and trace artifacts beyond code (e.g. models)?
Do we have visibility into decisions and outcomes?
Can we maintain predictable cost as pipelines grow in complexity?

Many default tools struggle here because they were designed for simpler workflows—not dynamic, evolving systems.

How This Looks in a Modern CI/CD Platform

In a modern CI/CD platform, CI/CD and MLOps are not separate systems—they are variations of the same pipeline.

You should be able to:

define pipelines that include training, evaluation, and deployment
version both code and model artifacts
implement threshold-based decision gates
run controlled rollout strategies
maintain performance and cost predictability at scale

Semaphore is designed for teams that have outgrown default tools and need this level of flexibility—without sacrificing speed or reliability.

Strategic Takeaway

AI-driven deployment is not a single pattern—it’s two overlapping systems:

deterministic CI/CD for application code
probabilistic MLOps for machine learning models

The teams that succeed are the ones that:

understand the difference
adapt their pipelines accordingly
avoid overcomplicating their tooling

Final Thought

The biggest mistake teams make is treating ML deployments like traditional software releases.

They’re not.

And as AI becomes embedded in both, your CI/CD pipeline needs to evolve into a system that can handle both determinism and uncertainty—without losing control.

FAQs

What is the main difference between AI-driven deployment in CI/CD and MLOps?

The key difference is that traditional CI/CD operates on deterministic code paths, where behavior is predictable and testable with pass or fail conditions. In contrast, MLOps deploys probabilistic systems where behavior depends on data, model performance, and changing real-world inputs. This shifts deployment decisions from correctness to acceptable performance thresholds.

Why do standard CI/CD pipelines struggle with ML model deployments?

Most default CI/CD tools are built around binary decision-making—tests pass or fail. ML workflows require evaluating metrics like accuracy, precision, or drift within acceptable ranges. Without support for threshold-based gating, artifact versioning beyond code, and conditional workflows, pipelines become fragile or overly complex.

How should engineering teams adapt their pipelines for AI and ML workloads?

Teams should extend their CI/CD pipelines to include model training, evaluation stages, and conditional deployment logic. This includes versioning model artifacts, defining performance thresholds, enabling branching workflows for experiments, and integrating monitoring systems that can trigger retraining when needed.

Does AI-driven automation increase risk in deployment pipelines?

Yes—especially in MLOps. In traditional CI/CD, AI can optimize decisions like rollback timing or test selection within predictable systems. In MLOps, AI operates on top of already uncertain systems, compounding risk. This is why guardrails, visibility, and clear approval policies are critical for maintaining control.

What should engineering leaders look for in a CI/CD platform supporting both CI/CD and MLOps?

Engineering leaders should evaluate whether the platform can handle both deterministic and probabilistic workflows. This includes support for conditional logic, artifact traceability (including models and datasets), scalable experimentation (e.g., canary or A/B deployments), and cost predictability as pipeline complexity grows. These capabilities directly impact outcomes like deployment frequency, reliability, and total cost of ownership.

The post How does AI-driven deployment differ between traditional software and ML models (MLOps)? appeared first on Semaphore.

What guardrails or policies should be in place when AI is part of deployment decisions (e.g., auto-rollback, approvals)?

Pete Miloravac — Tue, 24 Mar 2026 10:29:40 +0000

AI is quickly moving into the critical path of software delivery, from test automation to deployment decisions like auto-rollbacks, approvals, and release gating.

For engineering leaders, this raises a practical and urgent question:

What guardrails do we need to safely use AI in our CI/CD pipeline without increasing risk?

If your continuous integration and continuous delivery (CI/CD) system becomes partially autonomous, you’re no longer just optimizing for speed – you’re redefining control, accountability, and failure handling.

Why This Matters for Engineering Leaders

Engineering managers and CTOs are already accountable for outcomes like deployment frequency, change failure rate, mean time to restore service (MTTR), and cost predictability.

AI promises improvements across all of these, but without guardrails it can just as easily increase failure rates, introduce opaque decision-making, and create unpredictable production behavior.

This is especially relevant for teams already dealing with slow or fragile pipelines, scalability limits, and rising CI/CD costs. Introducing AI into deployment decisions doesn’t just optimize the system; it changes its risk profile.

Where AI Fits in the CI/CD Pipeline

In modern continuous delivery systems, AI is starting to influence key decision points:

whether a deployment proceeds or is blocked
whether approvals are required or skipped
whether a rollback is triggered automatically
which tests are prioritized or skipped

At this point, your CI/CD pipeline stops being purely deterministic. It becomes a decision-making system under uncertainty.

That shift is where most teams get into trouble and where guardrails become essential.

A Practical Framework for AI Guardrails in CI/CD

Instead of thinking about guardrails as a checklist, it’s more useful to group them into four areas: control, safety, governance, and efficiency. This is how high-performing teams reason about AI in deployment decisions.

1. Control: Keep Humans in Charge

The most common mistake teams make is assuming AI decisions are always safe to automate. In reality, control must remain explicit and immediate.

Every AI-driven action should be overridable. Engineers must be able to step in, require approvals, or disable automation entirely, especially during incidents. A useful pattern here is confidence-based decision-making: high-confidence scenarios can proceed automatically, while ambiguous cases require human review.

Without this layer, teams lose the ability to respond quickly under pressure, which directly impacts MTTR.
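
Confidence-based decision-making with a human kill switch can be sketched in a few lines. The function name, the `automation_enabled` flag, and the 0.9 threshold are assumptions chosen for illustration.

```python
def decide_action(confidence: float, automation_enabled: bool = True,
                  auto_threshold: float = 0.9) -> str:
    """Return 'auto_proceed' or 'manual_review' for one AI-proposed action."""
    # A human-controlled kill switch always wins, e.g. during an incident.
    if not automation_enabled:
        return "manual_review"
    # Treat out-of-range scores as untrustworthy rather than guessing.
    if not 0.0 <= confidence <= 1.0:
        return "manual_review"
    return "auto_proceed" if confidence >= auto_threshold else "manual_review"


print(decide_action(0.95))
print(decide_action(0.95, automation_enabled=False))
```

The important property is the ordering: the override is checked before the model's confidence is even considered.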

2. Safety: Prevent Cascading Failures

Speed without safety is where AI becomes dangerous.

Auto-rollback is a good example. While it can reduce recovery time, poorly designed rollback logic can create loops (deploy, fail, roll back, redeploy) that amplify instability instead of containing it.

High-performing teams define boundaries around where AI can act. For example, allowing autonomous decisions in staging or low-risk services, while requiring stricter controls in production systems, databases, or revenue-critical paths.

The goal is not just fast recovery, but stable recovery under pressure.

3. Governance: Make Every Decision Traceable

As soon as AI is involved in deployment decisions, explainability becomes non-negotiable.

Every action, whether it’s skipping an approval or triggering a rollback, should be accompanied by a clear, inspectable reason. Not just for debugging, but for compliance, security reviews, and internal trust.

This also ties into accountability. Teams need to know:

what decision was made
why it was made
what data influenced it

Without this, you introduce a new class of operational risk: decisions no one fully understands.
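
A traceable decision can be as simple as one structured log line that answers those three questions. The sketch below assumes a JSON log format; `DecisionRecord`, its fields, and the sample values are hypothetical.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    """Hypothetical audit entry for one AI-driven pipeline decision."""
    action: str        # what decision was made
    reason: str        # why it was made
    inputs: dict       # what data influenced it
    decided_by: str    # model or policy version that made the call
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def to_log_line(record: DecisionRecord) -> str:
    # Sorted keys keep log lines stable and easy to diff.
    return json.dumps(asdict(record), sort_keys=True)


record = DecisionRecord(
    action="rollback",
    reason="error rate and latency both above threshold",
    inputs={"error_rate": 0.07, "p95_latency_ms": 900},
    decided_by="policy-v3",
)
print(to_log_line(record))
```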

4. Efficiency: Control Cost and Scale

One of the less obvious risks of AI in CI/CD pipelines is cost creep.

AI-driven decisions can increase:

pipeline executions
test runs
infrastructure usage

Without explicit constraints, teams can lose cost predictability, one of the core evaluation criteria for engineering leaders.

This is why mature teams introduce cost guardrails alongside technical ones: limits on execution, visibility into cost per deployment, and alignment between automation behavior and budget constraints.
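
As a minimal sketch of a cost guardrail, the two helpers below check a run against a budget and compute cost per deployment. The function names and dollar figures are assumptions; where the spend numbers come from depends on your platform's billing data.

```python
def can_start_run(spent_usd: float, estimated_run_usd: float,
                  budget_usd: float) -> bool:
    """Allow a pipeline run only if it fits in the remaining budget."""
    return spent_usd + estimated_run_usd <= budget_usd


def cost_per_deployment(total_cost_usd: float, deployments: int) -> float:
    """Average CI/CD spend per deployment over some period."""
    if deployments <= 0:
        raise ValueError("deployments must be positive")
    return total_cost_usd / deployments


print(can_start_run(spent_usd=98.0, estimated_run_usd=5.0, budget_usd=100.0))
print(cost_per_deployment(total_cost_usd=120.0, deployments=40))
```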

Example: AI Guardrails in a CI/CD Pipeline

To make this concrete, here’s what a guarded deployment flow might look like:

if deployment_risk_score > 0.8:
    require_manual_approval()

elif error_rate_increase > 0.20:  # relative increase of more than 20%
    trigger_auto_rollback()
    notify_team()

elif confidence_score < 0.6:
    block_deployment()

else:
    proceed_with_deployment()

The important detail here is not the logic itself; it’s the fact that AI operates within explicit, enforceable boundaries, not as an autonomous system.

How This Should Influence Your CI/CD Platform Choice

Once AI enters your deployment workflow, your requirements for CI/CD tooling change.

It’s no longer enough to have pipelines that “run.” You need systems that can express and enforce decision logic clearly.

When evaluating platforms, engineering leaders should ask:

Can we define dynamic approval policies based on context?
Does the pipeline support conditional logic and branching at scale?
Do we have full visibility into why decisions were made?
Can we enforce both technical and cost guardrails?

Many default tools start to break down here, not because they can’t run pipelines, but because they struggle to model complex, conditional workflows with transparency.

How This Looks in a Modern CI/CD Platform

In a modern CI/CD platform, guardrails are not bolted on; they are part of how pipelines are defined.

You should expect:

pipeline logic that encodes decision-making clearly
visibility into every action taken by the system
flexible approval workflows that adapt to context
performance and cost that remain predictable at scale

This is especially important for teams that have outgrown default tools and need both speed and control as they scale.

Strategic Takeaway

AI in CI/CD is not just an automation upgrade; it’s a shift in how deployment decisions are made.

The teams that benefit most are not the ones that automate the fastest, but the ones that introduce AI with clear boundaries and strong governance.

They move faster, reduce toil, and improve reliability without increasing risk.

Final Thought

The real question isn’t whether AI should be part of your deployment pipeline.

It’s whether your system is designed to control how AI makes decisions.

Because once AI is in the loop, your CI/CD pipeline is no longer just executing code: it’s making decisions on your behalf.

FAQs

Should AI be allowed to approve deployments automatically?

Only in clearly defined, low-risk scenarios. High-performing teams use confidence thresholds and risk scoring to decide when AI can act autonomously versus when human approval is required. For production or revenue-critical systems, manual approval should remain the default unless there is strong historical reliability.

What’s the safest way to implement AI-driven auto-rollbacks?

Auto-rollbacks should always be bounded by safeguards:

Limit the number of consecutive rollbacks to prevent loops
Require human intervention after repeated failures
Tie rollback triggers to multiple signals (e.g., error rate + latency), not a single metric

The goal is controlled recovery—not blind automation.
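
Those three safeguards fit in a small state machine. This is a sketch, not a recommended implementation: `RollbackGuard`, the two boolean signals, and the limit of two consecutive rollbacks are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class RollbackGuard:
    """Hypothetical limiter that stops rollback loops and escalates to humans."""
    max_consecutive: int = 2
    consecutive: int = 0

    def decide(self, error_rate_up: bool, latency_up: bool) -> str:
        # Require agreement between signals; one noisy metric is not enough.
        if not (error_rate_up and latency_up):
            return "no_action"
        # After repeated rollbacks, stop automating and page a human.
        if self.consecutive >= self.max_consecutive:
            return "escalate_to_human"
        self.consecutive += 1
        return "rollback"

    def record_healthy_deploy(self) -> None:
        # A stable deployment resets the counter.
        self.consecutive = 0


guard = RollbackGuard()
print([guard.decide(True, True) for _ in range(3)])
```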

Can AI replace human judgment in CI/CD pipelines?

No—and it shouldn’t. AI should augment decision-making, not replace it. The most effective teams treat AI as a recommendation system operating within strict boundaries, with humans retaining final control in ambiguous or high-risk situations.

What’s the biggest mistake teams make when introducing AI into CI/CD?

Treating AI as fully autonomous too early.

Teams often skip incremental rollout and guardrails, which leads to unpredictable behavior, higher failure rates, and loss of control. The most successful teams introduce AI gradually, with strict boundaries and continuous monitoring.

Where should AI be introduced first in the CI/CD pipeline?

Start with low-risk, high-signal areas:

Test prioritization
Flaky test detection
Non-critical deployment environments (e.g., staging)

Only expand into production decision-making once reliability and behavior are well understood.

The post What guardrails or policies should be in place when AI is part of deployment decisions (e.g., auto-rollback, approvals)? appeared first on Semaphore.

How to Add AI Test Selection Without Breaking CI Reliability

Pete Miloravac — Fri, 20 Mar 2026 11:26:00 +0000

AI-based test selection promises faster CI builds by running only the tests most likely to be impacted by a code change. In large repositories with thousands of tests, this can significantly reduce build times.

But there’s a trade-off.

If implemented poorly, AI test selection can reduce reliability, increase escaped defects, and erode trust in CI pipelines.

This article explains how to introduce AI-driven test selection safely, without sacrificing CI reliability.

What AI Test Selection Actually Does

AI test selection typically analyzes signals such as:

Files changed in a commit
Historical test results
Code ownership patterns
Dependency graphs
Past failure correlations

Based on these inputs, the system predicts which subset of tests is sufficient for validating a given change.

Instead of running 5,000 tests, the pipeline might run 600.

The goal is faster feedback. The risk is incomplete validation.
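
To give a feel for the mechanics, here is a deliberately simple stand-in for a selection model: it scores each test by how often it failed in past builds that touched the same files. Real systems use richer signals and trained models; the function, the history format, and the file names below are assumptions for illustration only.

```python
from collections import defaultdict


def select_tests(changed_files, failure_history, min_score=1):
    """Pick tests that failed in past builds touching the same files.

    failure_history is a list of (changed_files, failed_tests) pairs,
    one per historical build.
    """
    changed = set(changed_files)
    scores = defaultdict(int)
    for past_files, failed_tests in failure_history:
        # A past build is relevant if it touched any file in this change.
        if changed & set(past_files):
            for test in failed_tests:
                scores[test] += 1
    return sorted(test for test, score in scores.items() if score >= min_score)


# Made-up history, purely illustrative.
history = [
    ({"api/users.py"}, {"test_users"}),
    ({"api/users.py", "db/schema.py"}, {"test_users", "test_migrations"}),
    ({"ui/app.js"}, {"test_ui"}),
]
print(select_tests(["api/users.py"], history))
```

Note the weakness this heuristic shares with any history-based approach: a file with no failure history selects no tests at all, which is exactly the false-negative risk discussed next.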

The Core Reliability Risk

The primary risk is false negatives.

If AI skips a test that should have run, the build passes even though a regression exists.

This leads to:

Defects escaping into production
Broken main branches
Increased rollback frequency
Loss of confidence in CI

Speed improvements must never compromise signal integrity.

Step 1: Establish a Strong Baseline First

AI test selection should not be introduced into an unstable pipeline.

Before adopting it, ensure:

Flaky tests are minimized
Full test suites are reliable
Test reporting is consistent
Historical build data is available

CI systems like Semaphore provide structured test reports that help track stability over time.

If your baseline signal is noisy, AI will learn from noise.

Step 2: Start in Observation Mode

Do not immediately replace full test runs.

Instead:

Run AI test selection in parallel with the full suite.
Record which tests AI would have skipped.
Compare outcomes over multiple weeks.

Key metrics to track:

Missed failure rate
Over-selection rate (too many tests selected)
Build time difference
False confidence incidents

Only after observing stable accuracy should AI influence actual test execution.
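
During observation mode, the comparison for a single build might look like the sketch below. The metric definitions are assumptions: "missed failure rate" is taken here as the share of real failures the selection would have skipped, and "selection ratio" as the share of the suite that was selected.

```python
def shadow_metrics(all_tests, selected, failed):
    """Compare an AI selection against the full-suite result for one build."""
    all_tests, selected, failed = set(all_tests), set(selected), set(failed)
    # Failures the selection would have skipped: the false negatives.
    missed = failed - selected
    return {
        "missed_failures": sorted(missed),
        "missed_failure_rate": len(missed) / len(failed) if failed else 0.0,
        "selection_ratio": len(selected) / len(all_tests) if all_tests else 0.0,
    }


suite = [f"test_{i}" for i in range(10)]
print(shadow_metrics(suite,
                     selected=["test_0", "test_1", "test_2", "test_3"],
                     failed=["test_1", "test_7"]))
```

Aggregating these per-build numbers over several weeks gives the evidence needed before the selection is allowed to skip anything.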

Step 3: Keep Full Test Runs on Main

A safe pattern is:

Pull requests: AI-selected tests
Main branch: full regression suite

This creates a safety net.

Even if AI misses something during PR validation, the main branch will catch it before production deployment.

This layered approach preserves CI reliability while reducing feedback time for developers.

Step 4: Define Guardrails Explicitly

AI test selection should operate within constraints.

Examples:

Never skip security tests
Never skip migration tests
Always run smoke tests
Always run tests touching core modules

These rules provide deterministic safety boundaries around probabilistic selection.

CI workflows can enforce structured stages and test groupings.

AI should operate inside those defined structures.
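
One way to enforce such rules is to union the AI's selection with a deterministic always-run set. The tag names and the `test_tags` mapping below are hypothetical; how tests are tagged depends on your test framework.

```python
# Tests carrying any of these tags run on every build, whatever the model says.
ALWAYS_RUN_TAGS = {"security", "migration", "smoke", "core"}


def apply_guardrails(ai_selected, test_tags):
    """Add every always-run test to the AI selection.

    test_tags maps a test name to the set of tags attached to it.
    """
    forced = {name for name, tags in test_tags.items()
              if set(tags) & ALWAYS_RUN_TAGS}
    return sorted(set(ai_selected) | forced)


tags = {
    "test_login": {"security"},
    "test_schema_upgrade": {"migration"},
    "test_homepage": {"smoke"},
    "test_report_export": {"reports"},
}
print(apply_guardrails(["test_report_export"], tags))
```

The guardrail can only add tests, never remove them, so the model cannot weaken the deterministic boundary.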

Step 5: Log What Was Skipped

Transparency is critical.

Every AI-assisted test run should record:

Which tests were selected
Which tests were skipped
Why they were skipped (if explainable)
Model version used

When regressions occur, teams must verify whether skipped tests would have detected them.

Without traceability, trust declines quickly.
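
A log entry covering those four items could be built like this. The field names, the `reasons` mapping, and the sample values are assumptions for the sake of the example.

```python
import json


def selection_log_entry(commit, model_version, all_tests, selected, reasons):
    """Build one JSON log line describing what ran and what was skipped.

    reasons maps a skipped test name to a short explanation, when available.
    """
    selected_set = set(selected)
    skipped = sorted(set(all_tests) - selected_set)
    return json.dumps({
        "commit": commit,
        "model_version": model_version,
        "selected": sorted(selected_set),
        "skipped": skipped,
        # Tests without an explanation are marked explicitly, not dropped.
        "skip_reasons": {t: reasons.get(t, "unexplained") for t in skipped},
    }, sort_keys=True)


entry = selection_log_entry(
    commit="abc1234",
    model_version="selector-v2",
    all_tests=["test_a", "test_b", "test_c"],
    selected=["test_a"],
    reasons={"test_b": "no dependency on changed files"},
)
print(entry)
```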

Step 6: Monitor Escaped Defects

Reliability is not measured only by build time.

Track:

Post-merge failures
Production incidents linked to skipped tests
Rollback frequency
Defect escape rate

If defect rates increase after introducing AI selection, the optimization is too aggressive.

Speed gains must not come at the cost of quality.

Step 7: Periodically Re-Train or Re-Validate

Codebases evolve.

Test coverage shifts.

Dependencies change.

New failure patterns emerge.

AI test selection models must be:

Re-evaluated periodically
Updated with fresh data
Validated against full-suite comparisons

Treat AI configuration like infrastructure — versioned, reviewed, and monitored.

Step 8: Avoid Over-Optimization

There is a diminishing return point.

Reducing test runs from 5,000 to 1,000 may provide major gains.

Reducing from 1,000 to 200 may introduce disproportionate risk.

Find the balance where:

Build times improve significantly
Confidence remains high
Escaped defect rate does not increase

Optimization without measurement is gambling.

A Safe Rollout Strategy

A practical rollout might look like this:

Measure full-suite baseline performance.
Introduce AI in shadow mode.
Compare AI vs full-suite outcomes.
Gradually allow AI to control PR test selection.
Keep full tests on main.
Monitor quality metrics continuously.

At any point, be ready to revert to deterministic full runs.

When AI Test Selection Works Well

AI selection tends to perform best when:

The repository is large
Test coverage is strong
Flakiness is low
Historical data is rich
Changes are modular

It performs poorly when:

Tests are unstable
Coverage is inconsistent
Architectural boundaries are unclear
Failure data is sparse

AI amplifies existing structure. It does not create it.

Summary

AI test selection can significantly reduce CI build times, but it introduces reliability risk if not carefully managed.

To add AI test selection safely:

Start with stable full-suite baselines
Run AI in observation mode first
Keep full regression suites on main
Define deterministic guardrails
Log skipped tests
Monitor defect escape rates

CI reliability must remain the priority.

Optimization is valuable. Confidence is essential.

FAQ

Can AI safely skip tests in CI?

Yes, but only with guardrails, observation periods, and continuous monitoring of defect escape rates.

Should full test suites ever be removed?

Generally no. Keeping full runs on main or scheduled pipelines preserves safety.

What is the biggest risk of AI test selection?

False negatives — skipped tests that would have caught regressions.

How do I measure success?

Track build time reduction alongside defect escape rate and rollback frequency.

The post How to Add AI Test Selection Without Breaking CI Reliability appeared first on Semaphore.

MCP OAuth in Practice: Lessons from Building Authentication for AI Agents

Pete Miloravac — Thu, 19 Mar 2026 10:25:57 +0000

As AI agents become a core part of modern development workflows, the need for secure, flexible authentication is quickly becoming essential.

In our latest product news episode, our engineering team takes a deep dive into how we implemented OAuth for Semaphore’s MCP server—and what we learned along the way.

This work is part of a broader shift in how we’re evolving Semaphore: from a traditional CI/CD platform into a foundation for running AI-powered developer workflows safely, transparently, and at scale.

Why OAuth Matters for MCP

As MCP servers move from local environments to remote infrastructure, authentication becomes critical.

API keys aren’t enough for this new world of agents and integrations. OAuth provides a more secure and flexible way for agents to authenticate and interact with MCP servers—while giving developers control over permissions and access.

But implementing OAuth in this ecosystem isn’t straightforward.

The Challenge: A Moving Target

One of the biggest challenges we encountered is that the MCP ecosystem is still evolving rapidly.

Different agents:

Discover authentication endpoints in slightly different ways
Support different versions of the MCP spec
Handle OAuth flows inconsistently

Even small differences—like variations in URL paths—can break integrations.
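
To show how much the paths can differ, the sketch below lists metadata URLs a client might probe for the same issuer, following the well-known path patterns from RFC 8414 and OpenID Connect Discovery. Which of these a given agent actually tries is not stated here and is an assumption; the issuer URL is a placeholder.

```python
from urllib.parse import urlsplit


def discovery_candidates(issuer: str) -> list:
    """List well-known metadata URLs a client might try for an issuer."""
    parts = urlsplit(issuer)
    origin = f"{parts.scheme}://{parts.netloc}"
    path = parts.path.rstrip("/")
    candidates = [
        # RFC 8414: well-known segment inserted before the issuer path.
        f"{origin}/.well-known/oauth-authorization-server{path}",
        # RFC 8414 compatibility form for OpenID Connect metadata.
        f"{origin}/.well-known/openid-configuration{path}",
        # OpenID Connect Discovery: well-known segment appended to the path.
        f"{origin}{path}/.well-known/openid-configuration",
    ]
    # Drop duplicates (they occur when the issuer has no path), keep order.
    return list(dict.fromkeys(candidates))


for url in discovery_candidates("https://auth.example.com/tenant1"):
    print(url)
```

A server that answers on only one of these paths will work with some agents and fail with others, which is why serving several of them can be worth the effort.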

On top of that, the MCP specification itself is changing quickly. New versions introduce new concepts, while older ones are still the most widely supported.

Key takeaway: the “latest” spec isn’t always the most practical one to implement.

Client Registration & Discovery Complexity

A core challenge in OAuth for MCP is client registration: how agents and servers identify and trust each other.

In a traditional system, this is predictable. In MCP:

You don’t always know the client in advance
You don’t control how agents behave
Discovery mechanisms vary across implementations

This creates friction in establishing secure, reliable authentication flows.

Real-World Testing Beats Theory

Another major learning: testing across real agents is essential.

Specs alone aren’t enough.

We found that:

Some clients behave differently than documented
Errors are often unclear or missing
Local + browser-based clients introduce additional complexity (like CORS issues)

Tools like MCPJam Inspector proved invaluable for debugging and understanding how OAuth flows actually behave step by step.

Rethinking Authorization: Beyond Identity Providers

We initially used Keycloak as an identity provider—which worked well for authentication.

But when it came to fine-grained authorization (like project-level permissions), limitations became clear.

To maintain flexibility and control, we:

Kept identity management external
Built our own authorization logic internally

This aligns with a broader principle behind Semaphore’s evolution:

developers should stay in control, while automation handles execution.
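
The split between external identity and internal authorization can be illustrated with a small permission check. This is not Semaphore's implementation: the role names, the permission table, and the idea of keying on a token's subject are assumptions, and the subject is assumed to have been authenticated by the identity provider already.

```python
# Hypothetical role-to-permission table kept inside the application.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "contributor": {"read", "run"},
    "admin": {"read", "run", "configure"},
}


def is_allowed(subject: str, project: str, action: str,
               project_roles: dict) -> bool:
    """Check a project-level permission for an already authenticated subject.

    project_roles maps (subject, project) to a role name.
    """
    role = project_roles.get((subject, project))
    if role is None:
        return False  # no role on this project means no access
    return action in ROLE_PERMISSIONS.get(role, set())


roles = {
    ("user-123", "project-a"): "admin",
    ("user-123", "project-b"): "viewer",
}
print(is_allowed("user-123", "project-b", "run", roles))
```

The identity provider answers "who is this?", while a table like this answers "what may they do on this project?".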

Practical Advice for Developers

If you’re implementing OAuth for MCP today, here’s what we recommend:

Start with a stable, widely supported spec (not the newest one)
Test with multiple agents early and often
Expect inconsistencies—and design for them
Focus on real-world compatibility over theoretical completeness

In short: be conservative, iterate quickly, and validate everything in practice.

What This Means for Semaphore

This work is a building block for a bigger vision.

As we extend CI/CD with AI-driven capabilities, secure and flexible authentication becomes foundational. MCP servers, and the agents that use them, will play a key role in enabling:

Agent-driven workflows
Secure automation at scale
Developer-controlled AI systems

This is how we move from pipelines to programmable, intelligent workflows—without sacrificing transparency or control.

🚀 Try Semaphore

Ready to explore what’s next for CI/CD and AI-powered development?

👉 Sign up for Semaphore and start building smarter, more automated workflows today.

The post MCP OAuth in Practice: Lessons from Building Authentication for AI Agents appeared first on Semaphore.