Kav Pather for Air Pipe

Posted on • Originally published at blog.airpipe.io on

Latency based container scaling with Orbit

Haven't read Part 1? Start with Building Orbit: A Lightweight Container Orchestrator in Rust to learn about our journey's beginning.

In our previous article, we introduced Orbit, our lightweight container orchestrator built in Rust. Since then, we've made significant improvements driven by both community feedback and production requirements. Let's dive into the technical evolution that's making Orbit even more powerful and efficient.

Community-Driven Development

One of the most exciting aspects of Orbit's development has been the community engagement. A perfect example is our implementation of CoDel (Controlled Delay) for scaling decisions, which came directly from a community member's suggestion on Medium. We're also grateful to community members like Josselin Chevalay who contributed the pull_policy feature in our latest release, allowing control over container image pulling behavior. This collaborative approach will continue to help shape Orbit's feature set and technical direction.

Technical Evolution: Key Improvements

1. CoDel (Controlled Delay)-Inspired Scaling: Latency-Driven Container Orchestration

Unlike traditional orchestrators that rely solely on CPU and memory metrics, we've implemented CoDel-inspired scaling - a feature not natively available in Kubernetes or other major orchestrators. Here's how it works:

pub struct CoDelMetrics {
    service_name: String,
    sojourn_times: VecDeque<(Instant, Duration)>,
    first_above_time: Option<Instant>,
    last_scale_time: Instant,
    config: CoDelConfig,
}
This metric tracker is driven by a per-service configuration:
name: adaptive-scaling
instance_count:
  min: 2
  max: 10

# CoDel-inspired adaptive scaling based on request latency
codel:
  target: 100ms                  # Target latency threshold
  interval: 1s                   # Interval for checking delays
  consecutive_intervals: 3       # Number of intervals above target before scaling
  max_scale_step: 1              # Maximum instances to scale up at once
  scale_cooldown: 30s            # Minimum time between scaling actions
  overload_status_code: 503     # Return 503 when overloaded

# Fine-tune scaling behavior
scaling_policy:
  cooldown_duration: 60s         # Wait time between scaling actions
  scale_down_threshold_percentage: 50.0  # Scale down if usage below 50%

spec:
  containers:
    - name: main
      image: airpipeio/infoapp:latest
      ports:
        - port: 80
          node_port: 4335

The CoDel-inspired implementation monitors request latency and makes scaling decisions based on both immediate and historical performance data. Benefits include:

  • More responsive scaling based on actual service performance

  • Better handling of latency spikes

  • Prevention of unnecessary scale-ups during temporary load increases

Note that this is just our initial implementation; we'll continue to improve it where possible, and may rename it when appropriate.
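To make the control loop concrete, here's a minimal sketch of how a CoDel-style scale-up decision could work, mapping `target`, `consecutive_intervals`, and `scale_cooldown` from the config above onto a small state machine. The type and method names here are hypothetical illustrations, not Orbit's actual internals:

```rust
use std::time::{Duration, Instant};

/// Simplified CoDel-style gate: scale up only after latency has stayed
/// above `target` for `consecutive_intervals` intervals in a row, and
/// at most once per `scale_cooldown`.
struct CoDelGate {
    target: Duration,
    consecutive_intervals: u32,
    scale_cooldown: Duration,
    intervals_above: u32,
    last_scale: Option<Instant>,
}

impl CoDelGate {
    fn new(target: Duration, consecutive_intervals: u32, scale_cooldown: Duration) -> Self {
        Self { target, consecutive_intervals, scale_cooldown, intervals_above: 0, last_scale: None }
    }

    /// Called once per interval with that interval's worst observed latency.
    /// Returns true when the service should scale up (by `max_scale_step`).
    fn observe(&mut self, worst_latency: Duration, now: Instant) -> bool {
        if worst_latency > self.target {
            self.intervals_above += 1;
        } else {
            self.intervals_above = 0; // any good interval resets the streak
        }
        let cooled_down = self
            .last_scale
            .map_or(true, |t| now.duration_since(t) >= self.scale_cooldown);
        if self.intervals_above >= self.consecutive_intervals && cooled_down {
            self.intervals_above = 0;
            self.last_scale = Some(now);
            return true;
        }
        false
    }
}
```

With `target: 100ms` and `consecutive_intervals: 3`, a single 150 ms spike does nothing, but three bad intervals in a row trigger one scale-up, after which the cooldown suppresses further actions.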

Key Differences from Traditional CoDel:

  • Service-Level Application:

    • Our implementation applies CoDel principles at the service level rather than packet level
    • Uses request latency instead of packet sojourn time
    • Focuses on scaling rather than packet dropping
  • State Management:

    • This is simpler than traditional CoDel's state machine.
pub struct CoDelMetrics {
    sojourn_times: VecDeque<(Instant, Duration)>,
    first_above_time: Option<Instant>,
    last_scale_time: Instant,
}

2. Health Monitoring

We've added health monitoring with TCP health checks:

pub struct HealthCheckConfig {
    pub startup_timeout: Duration,
    pub startup_failure_threshold: u32,
    pub liveness_period: Duration,
    pub liveness_failure_threshold: u32,
    pub tcp_check: Option<TcpHealthCheck>,
}

This system provides:

  • Configurable health check parameters

  • TCP-level connectivity verification

  • Granular control over failure thresholds

  • Separate startup and liveness checks
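As a rough illustration of the TCP check (hypothetical names, not Orbit's implementation): an instance counts as healthy if a TCP connection to its address succeeds within a timeout, and is flagged for restart after `liveness_failure_threshold` consecutive failures:

```rust
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// TCP-level connectivity check: healthy if a connection succeeds in time.
fn tcp_health_check(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Tracks consecutive failures against a threshold, mirroring
/// `liveness_failure_threshold` in the config struct above.
struct LivenessTracker {
    failures: u32,
    threshold: u32,
}

impl LivenessTracker {
    fn new(threshold: u32) -> Self {
        Self { failures: 0, threshold }
    }

    /// Records one probe result; returns true when the instance
    /// has failed enough times in a row to warrant a restart.
    fn record(&mut self, healthy: bool) -> bool {
        if healthy {
            self.failures = 0;
            false
        } else {
            self.failures += 1;
            self.failures >= self.threshold
        }
    }
}

/// Self-contained demo: bind an ephemeral local port and probe it.
fn demo_local_check() -> bool {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind failed");
    let addr = listener.local_addr().expect("no local addr");
    tcp_health_check(addr, Duration::from_millis(500))
}
```

Keeping startup and liveness as separate threshold counters (as in `HealthCheckConfig`) lets slow-starting containers get a longer grace period without loosening the steady-state check.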

3. Performance Optimizations

We've made several low-level optimizations to improve performance:

Switching to FxHashMap/FxHashSet

use rustc_hash::{FxHashMap, FxHashSet};

pub static INSTANCE_STORE: OnceLock<
    Arc<RwLock<FxHashMap<String, FxHashMap<Uuid, InstanceMetadata>>>>
> = OnceLock::new();

By replacing standard HashMap with FxHashMap:

  • Reduced memory overhead

  • Faster hash computation

  • Better performance for string keys

  • Lower collision rates in our specific use cases

4. Improved Resource Management

We've implemented a more sophisticated resource management system:

pub struct ResourceThresholds {
    pub cpu_percentage: Option<u8>,
    pub cpu_percentage_relative: Option<u8>,
    pub memory_percentage: Option<u8>,
    pub metrics_strategy: PodMetricsStrategy,
}

This allows for:

  • Fine-grained control over resource utilization

  • Better handling of CPU quota management

  • More accurate memory tracking

  • Customizable metrics aggregation strategies
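For illustration, here is a simplified take on how such thresholds might gate a scaling decision. The field names mirror the struct above, but the evaluation logic is our own sketch, not Orbit's actual code; unset (`None`) thresholds are simply ignored:

```rust
/// Subset of the threshold config shown above (illustrative only).
struct ResourceThresholds {
    cpu_percentage: Option<u8>,
    memory_percentage: Option<u8>,
}

/// True if observed usage exceeds any configured threshold.
/// A `None` threshold means "don't consider this metric".
fn over_threshold(t: &ResourceThresholds, cpu_pct: f64, mem_pct: f64) -> bool {
    let cpu_over = t.cpu_percentage.map_or(false, |limit| cpu_pct > limit as f64);
    let mem_over = t.memory_percentage.map_or(false, |limit| mem_pct > limit as f64);
    cpu_over || mem_over
}
```

Making each threshold optional is what gives the fine-grained control listed above: a service can scale purely on CPU, purely on memory, or on either.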

Real-World Impact

These improvements have had significant real-world impact:

  • 30% reduction in unnecessary scaling operations

  • More stable performance under varying load conditions

  • Reduced resource usage in the orchestrator itself

  • Better handling of microservices with varying performance characteristics

  • Still managed to retain a <5MB binary size footprint

What's Next: Decentralized Clustering!?

We're excited to explore our next major development focus: a decentralized clustering solution. This will allow Orbit to:

  • Operate without a central control plane

  • Provide better resilience in edge deployments

  • Enable peer-to-peer node coordination

  • Support dynamic cluster topology changes

We have some initial ideas on how to design the solution, so please follow for our next update to see how we hope to make this happen!

Building at Scale with Air Pipe

While Orbit handles container orchestration, it's just one piece of the puzzle. At Air Pipe, we're building a comprehensive platform for creating scalable, resilient APIs, integrations, and workflows. Our platform enables you to:

  • Build and deploy scalable APIs with minimal boilerplate

  • Create robust integration workflows

  • Implement resilient data processing pipelines

  • Leverage edge computing capabilities

If you're building distributed systems or scalable applications, visit airpipe.io to learn how our platform can accelerate your development.

Get Involved

We're building Orbit in the open and value community input, whether you're interested in the technical details or want to contribute to our upcoming clustering features.

Stay tuned for our next technical deep-dive where we'll explore the architecture of our decentralized clustering approach!
