<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Kraszewski</title>
    <description>The latest articles on DEV Community by Daniel Kraszewski (@kraszdan).</description>
    <link>https://dev.to/kraszdan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3330969%2Feacbbc8e-df0b-4291-848b-153e2e2eb668.jpg</url>
      <title>DEV Community: Daniel Kraszewski</title>
      <link>https://dev.to/kraszdan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kraszdan"/>
    <language>en</language>
    <item>
      <title>Architecting and Operating Geospatial Workflows with Dagster</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Tue, 17 Feb 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/u11d/architecting-and-operating-geospatial-workflows-with-dagster-5a52</link>
      <guid>https://dev.to/u11d/architecting-and-operating-geospatial-workflows-with-dagster-5a52</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft66s3rvtueth0eqvbwrs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft66s3rvtueth0eqvbwrs.png" alt="A Technical Deep Dive into Asset-Based Orchestration for Production Geospatial Data Platforms" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article is a technical companion to our earlier essay on geospatial data orchestration (link: &lt;a href="https://u11d.com/blog/geospatial-data-orchestration/" rel="noopener noreferrer"&gt;https://u11d.com/blog/geospatial-data-orchestration/&lt;/a&gt;).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Geospatial data pipelines present a distinct set of challenges that traditional ETL frameworks struggle to address. Raster datasets measured in gigabytes, coordinate reference system transformations, temporal partitioning across decades of observations, and the need for reproducible lineage across every derived artifact: these requirements demand an orchestration model that treats data as the primary actor rather than a byproduct of task execution. This article describes the architecture we developed for water management analytics: a system that ingests elevation models, meteorological observations, satellite imagery, and forecast data to support flood prediction and hydrological modeling.&lt;/p&gt;

&lt;h2&gt;
  
  
  The System Boundary
&lt;/h2&gt;

&lt;p&gt;The platform serves as a data preparation layer, not a model serving infrastructure. Its responsibility begins at external data sources (WFS endpoints, FTP servers, governmental APIs) and ends at materialized artifacts in object storage ready for consumption by downstream ML training pipelines and QGIS-based analysts. We deliberately exclude real-time inference, user-facing APIs, and visualization from this system's scope.&lt;/p&gt;

&lt;p&gt;Two Dagster projects comprise the workspace. The &lt;code&gt;landing-zone&lt;/code&gt; project handles all data ingestion and transformation: elevation tiles, meteorological station observations, satellite imagery, climatic indices, and forecast GRIB files. The &lt;code&gt;discharge-model&lt;/code&gt; project consumes these prepared datasets to train neural network models for water discharge prediction using NeuralHydrology. Both projects share a common library under &lt;code&gt;shared/&lt;/code&gt; containing IO managers, resource definitions, helper utilities, and cross-project asset references.&lt;/p&gt;

&lt;p&gt;The platform does not manage model deployment, handle user authentication, or serve predictions. Those responsibilities belong to separate systems that consume our outputs via S3-compatible object storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Types and Access Patterns
&lt;/h2&gt;

&lt;p&gt;Three fundamental data types flow through the system, each with distinct storage and access characteristics.&lt;/p&gt;

&lt;p&gt;Raster data (digital elevation models, satellite imagery, land cover maps) dominates storage volume. We store all rasters as Cloud Optimized GeoTIFFs (COGs) in S3, enabling range-request access for partial reads. The elevation pipeline alone processes tiles from multiple national geodetic services, converting formats like ARC/INFO ASCII Grid and XYZ point clouds into standardized COGs with consistent coordinate reference systems. A single consolidated elevation model can exceed several gigabytes.&lt;/p&gt;

&lt;p&gt;Tabular data encompasses meteorological observations, hydrological measurements, station metadata, and computed indices. We standardize on Parquet with zstd compression, leveraging Polars for in-process transformations and DuckDB for SQL-based quality checks. A custom IO manager handles serialization of both raw DataFrames and Pydantic model collections, automatically recording row counts and column schemas as Dagster metadata.&lt;/p&gt;
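&lt;p&gt;The metadata capture reduces to a small function. A minimal sketch (the helper name and the &lt;code&gt;columns&lt;/code&gt; key are illustrative, not taken from the codebase; the real IO manager reads row counts and schemas off a Polars DataFrame before writing zstd-compressed Parquet):&lt;/p&gt;

```python
from typing import Any


def table_metadata(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Collect the metadata recorded on each tabular materialization:
    a row count plus a column-name -> type mapping inferred from the
    first row. Sketch only; names outside dagster/row_count are assumed."""
    first = rows[0] if rows else {}
    schema = {name: type(value).__name__ for name, value in first.items()}
    return {"dagster/row_count": len(rows), "columns": schema}
```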

&lt;p&gt;Vector data (catchment boundaries, station locations, regional polygons) exists primarily as intermediate artifacts used for spatial joins and raster clipping operations. Shapely geometries serialize alongside tabular records in Parquet files, with coordinate transformations handled through pyproj.&lt;/p&gt;

&lt;p&gt;All data resides in S3-compatible object storage. The storage layout follows a predictable convention: &lt;code&gt;s3://{bucket}/{asset-path}/{partition-keys}/filename.{ext}&lt;/code&gt;, where asset paths derive directly from Dagster asset keys. This enables both programmatic access through IO managers and ad-hoc exploration via S3 browsers.&lt;/p&gt;
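&lt;p&gt;Deriving an object key from an asset key is a pure string operation. A hypothetical helper illustrating the convention (the real IO managers build this from Dagster's &lt;code&gt;AssetKey&lt;/code&gt;; the function and example names are ours):&lt;/p&gt;

```python
def object_key(
    bucket: str,
    asset_key: list[str],
    partition_keys: list[str],
    filename: str,
) -> str:
    """Build the S3 object key following the
    s3://{bucket}/{asset-path}/{partition-keys}/filename.{ext} convention."""
    parts = [bucket, *asset_key, *partition_keys, filename]
    return "s3://" + "/".join(parts)
```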

&lt;h2&gt;
  
  
  Asset Graph Design
&lt;/h2&gt;

&lt;p&gt;Every durable artifact in the system maps to a Dagster asset. This is not a philosophical preference but a practical requirement: when a hydrologist questions why a model prediction differs from last month's, we need to trace backward through the exact elevation tiles, meteorological observations, and climatic indices that produced those training features.&lt;/p&gt;

&lt;p&gt;Asset naming follows a hierarchical convention reflecting data lineage. Elevation data progresses through &lt;code&gt;elevation/dtm/wfs_index&lt;/code&gt; → &lt;code&gt;elevation/dtm/raw_tiles&lt;/code&gt; → &lt;code&gt;elevation/dtm/converted_tiles&lt;/code&gt; → &lt;code&gt;elevation/dtm/consolidated&lt;/code&gt;. Each stage represents a distinct, independently materializable artifact with its own partitioning strategy. The &lt;code&gt;shared/assets/landing_zone.py&lt;/code&gt; module maintains a centralized registry of asset keys, enabling type-safe cross-project references:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ELEVATION_DSM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AssetKey&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;elevation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dsm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;ELEVATION_DTM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AssetKey&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;elevation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dtm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;FORECAST_GRIB_RAW&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AssetKey&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forecast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grib&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;CLIMATIC_INDICES_CATCHMENT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AssetKey&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;climatic_indices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;catchment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dependencies between assets express both data flow and materialization order. The converted elevation tiles asset explicitly declares its dependency on raw tiles through the &lt;code&gt;ins&lt;/code&gt; parameter, ensuring Dagster's asset graph correctly represents the relationship and prevents stale data from propagating downstream.&lt;/p&gt;

&lt;p&gt;We employ Dagster's component pattern for assets that share structural similarities but operate on different data. The elevation pipeline defines &lt;code&gt;Converted&lt;/code&gt; as a component that can be instantiated for both DTM and DSM processing, sharing conversion logic while maintaining separate asset keys and partition spaces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Partitioning Strategy
&lt;/h2&gt;

&lt;p&gt;Partitioning serves two purposes: it bounds the scope of individual materializations to manageable sizes, and it enables incremental updates without full recomputation. We use different partitioning strategies depending on the data's natural structure.&lt;/p&gt;

&lt;p&gt;Elevation data partitions spatially by tile grid and region. A &lt;code&gt;MultiPartitionsDefinition&lt;/code&gt; combines a tile index dimension with a regional dimension, allowing selective materialization of specific geographic areas. Dynamic partition definitions enable the tile catalog to grow without code changes: sensors read from index parquet files and issue &lt;code&gt;AddDynamicPartitionsRequest&lt;/code&gt; calls to register new partitions.&lt;/p&gt;
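&lt;p&gt;The heart of such a sensor is a set difference between tile ids found in the index and partitions Dagster already knows about. A sketch of that delta computation (names are illustrative):&lt;/p&gt;

```python
def new_partition_keys(index_tiles: set[str], existing: set[str]) -> list[str]:
    """Compare tile ids read from the index parquet against the dynamic
    partitions already registered, and return the ones the sensor should
    submit via AddDynamicPartitionsRequest. Sorted for deterministic runs."""
    return sorted(index_tiles - existing)
```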

&lt;p&gt;Satellite imagery partitions temporally by year and month, built with the same &lt;code&gt;MultiPartitionsDefinition&lt;/code&gt; mechanism that combines the tile index and region dimensions for elevation data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_multipartitions_def&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MultiPartitionsDefinition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MultiPartitionsDefinition&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indices_partition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;regions_partition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New temporal partitions register automatically through a sensor that monitors the catalog file for previously unseen year/month combinations.&lt;/p&gt;

&lt;p&gt;Meteorological observations partition by data source and processing stage rather than by time. The pipeline uses a sync-plan/execute-plan pattern: a planning asset determines which source files need synchronization, and an execution asset processes only the delta. This approach handles the irregular update patterns of governmental data sources more gracefully than fixed temporal partitions.&lt;/p&gt;
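&lt;p&gt;The planning step amounts to comparing a source listing against what has already been synced. A minimal sketch, assuming each file carries a change marker such as an ETag or last-modified timestamp (names are illustrative; the real planning asset persists its plan as a durable artifact):&lt;/p&gt;

```python
from typing import Mapping


def build_sync_plan(
    source_files: Mapping[str, str],
    synced: Mapping[str, str],
) -> list[str]:
    """Select files that are new or whose change marker differs from the
    last successful sync; the execution asset then processes only these."""
    return sorted(
        name
        for name, marker in source_files.items()
        if synced.get(name) != marker
    )
```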

&lt;h2&gt;
  
  
  Raster Processing Pipeline
&lt;/h2&gt;

&lt;p&gt;The raster processing subsystem converts heterogeneous input formats into standardized COGs suitable for ML feature extraction. A dedicated module provides the core transformation utilities, built on GDAL and Rasterio.&lt;/p&gt;

&lt;p&gt;The main COG writing function orchestrates the complete transformation: coordinate reprojection, resolution resampling, nodata gap filling, geometry clipping, and overview generation. Memory management is critical; we process large rasters block-wise to avoid loading entire datasets into RAM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_fill_nodata_gaps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DatasetWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_search_distance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;smoothing_iterations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fill nodata gaps block-wise to avoid loading the whole raster into memory.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;block_windows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;masked&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_masked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;filled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fillnodata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fill_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodata&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_search_distance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_search_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;smoothing_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;smoothing_iterations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Format-specific converters handle the idiosyncrasies of source data. The XYZ converter detects axis ordering issues in point cloud data, the ASCII grid converter parses ESRI's legacy format, and the VRT builder creates virtual rasters for multi-file operations. Each converter produces consistent metadata that downstream assets can rely on.&lt;/p&gt;
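&lt;p&gt;As one concrete example of those idiosyncrasies, axis-ordering detection for XYZ files can be reduced to checking which coordinate varies between consecutive rows. A simplified sketch (the real converter is more defensive about irregular grids; the function name is ours):&lt;/p&gt;

```python
def x_varies_fastest(points: list[tuple[float, float, float]]) -> bool:
    """Heuristic for XYZ point clouds: if the first coordinate changes
    between the first two rows while the second stays fixed, the file is
    ordered row-by-row with x varying fastest; otherwise y varies fastest."""
    (x0, y0, _), (x1, y1, _) = points[0], points[1]
    return x1 != x0 and y1 == y0
```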

&lt;p&gt;For smaller rasters below a configurable threshold, we process entirely in memory using Rasterio's &lt;code&gt;MemoryFile&lt;/code&gt;. Larger rasters are written to temporary files before the final COG is copied to S3 via GDAL's &lt;code&gt;/vsis3/&lt;/code&gt; virtual filesystem driver. This dual-path approach optimizes for both small-tile throughput and large-raster memory safety.&lt;/p&gt;
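&lt;p&gt;The dispatch between the two paths is a size estimate against a threshold. A sketch of that decision (the 512&amp;nbsp;MB default is illustrative; the real threshold is configuration-driven):&lt;/p&gt;

```python
def fits_in_memory(
    width: int,
    height: int,
    bands: int,
    dtype_bytes: int,
    threshold_bytes: int = 512 * 1024 * 1024,
) -> bool:
    """Estimate the uncompressed raster size and choose the in-memory
    (MemoryFile) path when it fits under the threshold, the temp-file
    path otherwise."""
    return width * height * bands * dtype_bytes <= threshold_bytes
```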

&lt;h2&gt;
  
  
  Data Quality as Dependencies
&lt;/h2&gt;

&lt;p&gt;Data quality checks execute as first-class Dagster asset checks, not as afterthoughts in logging statements. The climatic indices pipeline defines sixteen distinct checks covering structural integrity, range validation, and business logic constraints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dg.multi_asset_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;specs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AssetCheckSpec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;non_empty_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CLIMATIC_INDICES_CATCHMENT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AssetCheckSpec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;primary_key_uniqueness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CLIMATIC_INDICES_CATCHMENT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AssetCheckSpec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pet_avg_valid_range&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CLIMATIC_INDICES_CATCHMENT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AssetCheckSpec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aridity_index_avg_valid_range&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CLIMATIC_INDICES_CATCHMENT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;# ... additional checks
&lt;/span&gt;    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;catchment_climatic_indices_checks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_catchment_climatic_indices_checks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AssetCheckExecutionContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;duckdb&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DuckDBResourceExtended&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Iterable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AssetCheckResult&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="c1"&gt;# Checks execute SQL against DuckDB, returning structured results
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Checks validate domain-specific constraints: potential evapotranspiration must fall between 0 and 15 mm/day, aridity indices between 0 and 5, precipitation seasonality between -1 and +1. Each check returns structured metadata: not just pass/fail, but the actual values found, enabling rapid diagnosis when checks fail.&lt;/p&gt;

&lt;p&gt;A helper function loads the asset's parquet file into a DuckDB temporary table, enabling SQL-based validation without loading the entire dataset into Python memory. This pattern scales to multi-million row datasets while keeping check execution times reasonable.&lt;/p&gt;
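&lt;p&gt;The shape of a single range check, independent of how the values are fetched, looks roughly like this (a pure-Python sketch with illustrative names; in the pipeline the min/max come from SQL aggregates over the DuckDB temporary table):&lt;/p&gt;

```python
def range_check(name: str, values: list[float], lo: float, hi: float) -> dict:
    """Evaluate a valid-range constraint and report the observed extremes
    as metadata, so a failing check immediately shows how far out of
    range the data is rather than a bare pass/fail flag."""
    observed_min, observed_max = min(values), max(values)
    return {
        "check_name": name,
        "passed": lo <= observed_min and observed_max <= hi,
        "metadata": {"observed_min": observed_min, "observed_max": observed_max},
    }
```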

&lt;p&gt;Geometry validation occurs inline during raster conversion. A dedicated validation function compares source and destination bounds, skipping tiles where reprojection would introduce unacceptable distortion. Rather than failing the entire partition, we record skipped tiles with explicit reasons, allowing manual review of edge cases.&lt;/p&gt;
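&lt;p&gt;The bounds comparison can be sketched as a per-edge tolerance test, assuming both bounds tuples are expressed in a common CRS after round-tripping the corners (function name and the 1% default are ours, not from the codebase):&lt;/p&gt;

```python
def bounds_distortion_ok(
    src: tuple[float, float, float, float],
    dst: tuple[float, float, float, float],
    tolerance: float = 0.01,
) -> bool:
    """Accept a tile only if each edge of the (left, bottom, right, top)
    bounds moved by less than `tolerance` of the raster's own extent;
    otherwise the tile is skipped with an explicit reason."""
    width, height = src[2] - src[0], src[3] - src[1]
    scale = (width, height, width, height)
    return all(abs(d - s) <= tolerance * ref for s, d, ref in zip(src, dst, scale))
```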

&lt;h2&gt;
  
  
  Execution and Elasticity
&lt;/h2&gt;

&lt;p&gt;The platform runs locally for development using &lt;code&gt;uv run dg dev&lt;/code&gt; and deploys to Kubernetes for production workloads. Resource requirements vary dramatically across asset types: a metadata sync might need 256 MB of memory, while elevation tile conversion demands 16 GB.&lt;/p&gt;

&lt;p&gt;We express resource requirements through operation tags that map to Kubernetes node pools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;K8sOpTags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;xlarge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;K8sOpTags&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Nodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;M6I_2XLARGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;divider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;K8sOpTags&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Nodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;G4DN_2XLARGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;divider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;divider&lt;/code&gt; parameter enables fractional node allocation: running two medium workloads on a single large instance, or dedicating an entire GPU node to model training. Assets declare their requirements via &lt;code&gt;op_tags=K8sOpTags.xlarge()&lt;/code&gt;, and the Kubernetes executor schedules pods accordingly.&lt;/p&gt;
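&lt;p&gt;A hypothetical reconstruction of what &lt;code&gt;_config&lt;/code&gt; might compute: divide a node's capacity by the divider and emit it as pod resource requests under the &lt;code&gt;dagster-k8s/config&lt;/code&gt; tag. The capacity figures, function, and mapping are our assumptions; the real mapping also pins node selectors and GPU limits:&lt;/p&gt;

```python
# Illustrative node capacities (vCPU / GiB), not authoritative AWS figures.
NODE_RESOURCES = {
    "m6i.2xlarge": {"cpu": 8, "memory_gib": 32},
    "g4dn.2xlarge": {"cpu": 8, "memory_gib": 32},
}


def k8s_op_tags(node: str, divider: int) -> dict:
    """Turn a node type plus divider into pod resource requests:
    divider=2 packs two such ops per node, divider=1 claims it whole."""
    caps = NODE_RESOURCES[node]
    return {
        "dagster-k8s/config": {
            "container_config": {
                "resources": {
                    "requests": {
                        "cpu": str(caps["cpu"] // divider),
                        "memory": f"{caps['memory_gib'] // divider}Gi",
                    }
                }
            }
        }
    }
```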

&lt;p&gt;Concurrent processing within assets uses a custom worker pool implementation that handles the messy realities of I/O-bound geospatial work: network timeouts, partial failures, and graceful cancellation. The worker pool provides retry policies, progress logging, and fail-fast behavior while aggregating per-item metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;worker_pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tiles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cancellation_event&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cancellation_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metrics_extractor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;size_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_if_failures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The worker pool tracks success, failure, skip, and cancellation states for each item, building aggregate statistics that appear in Dagster's materialization metadata. When processing fails partway through, we know exactly which items succeeded and which need retry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability and Lineage
&lt;/h2&gt;

&lt;p&gt;Every asset materialization records structured metadata: row counts for tabular data, dimensions and CRS for rasters, processing duration, and custom metrics like total bytes written. The custom IO manager automatically captures table schemas and row counts; raster assets explicitly record resolution, bounds, and file sizes.&lt;/p&gt;

&lt;p&gt;Sensors provide the observability layer for external data sources. The satellite data partition sensor polls catalog files every 30 seconds, logging new partition discoveries and registration requests. Forecast schedules run four times daily at fixed UTC times, with run keys that encode the scheduled timestamp for easy identification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dg.schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cron_schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30 8,13,16,20 * * *&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AssetSelection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FORECAST_GRIB_RAW&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forecast_grib_raw_schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ScheduleEvaluationContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RunRequest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;scheduled_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scheduled_execution_time&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RunRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;run_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forecast_grib_raw_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;scheduled_time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y%m%d_%H%M&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forecast_grib_raw&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Declarative automation conditions enable freshness policies: assets can specify how stale they are allowed to become, and Dagster automatically triggers materializations to maintain freshness. We use this for assets that need to stay current with upstream changes without manual intervention.&lt;/p&gt;
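<p>The underlying decision is easy to state: compare the asset's age against its allowed lag. A hypothetical illustration of the rule the framework evaluates for us (not Dagster's actual internals):<br>
</p>

<div class="highlight js-code-highlight">
<pre class="highlight python"><code>from datetime import datetime, timedelta, timezone

def needs_refresh(last_materialized, max_lag, now=None):
    """Return True when an asset has exceeded its allowed staleness."""
    now = now or datetime.now(timezone.utc)
    return (now - last_materialized) &amp;gt; max_lag

now = datetime(2026, 2, 17, 12, 0, tzinfo=timezone.utc)
stale = needs_refresh(datetime(2026, 2, 17, 9, 0, tzinfo=timezone.utc), timedelta(hours=2), now)
# stale is True: the asset is three hours old, one hour past its allowed lag
</code></pre>

</div>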

&lt;p&gt;Asset lineage flows automatically from dependency declarations. When investigating a model prediction, we can trace from the model asset back through training data, through climatic indices, through individual station observations, to the original source files. This lineage persists across runs, enabling historical comparisons when methodology changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons from Production: Architectural Revisions
&lt;/h2&gt;

&lt;p&gt;Three architectural decisions required significant revision after initial deployment.&lt;/p&gt;

&lt;p&gt;First, we underestimated the memory requirements for raster operations. Early implementations loaded entire tiles into memory, which worked fine for small test datasets but caused OOM kills on production-scale elevation models. The fix required systematic refactoring to block-wise processing throughout the raster pipeline: reading, transforming, and writing in chunks that fit within pod memory limits. This added complexity but eliminated an entire class of production incidents.&lt;/p&gt;
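<p>The chunking pattern is the same everywhere in the pipeline: iterate over fixed-size windows, clamped at the raster's edges, and process one window at a time. A stripped-down sketch of the window iterator (in practice the windows typically come from the raster library itself):<br>
</p>

<div class="highlight js-code-highlight">
<pre class="highlight python"><code>def block_windows(width, height, block):
    """Yield (col_off, row_off, win_width, win_height) tiles covering a raster."""
    for row_off in range(0, height, block):
        for col_off in range(0, width, block):
            yield (col_off, row_off,
                   min(block, width - col_off),
                   min(block, height - row_off))

windows = list(block_windows(1000, 700, 512))
# 4 windows; each fits within a fixed memory budget regardless of raster size
</code></pre>

</div>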

&lt;p&gt;Second, we initially implemented dynamic partitions without proper cleanup logic. Sensors would happily add new partitions as data arrived, but nothing removed partitions for data that had been superseded or corrected. Over time, the partition space accumulated stale entries that confused operators and wasted storage. We added explicit partition lifecycle management: sensors now compare desired partitions against current state and issue both &lt;code&gt;AddDynamicPartitionsRequest&lt;/code&gt; and &lt;code&gt;DeleteDynamicPartitionsRequest&lt;/code&gt; as needed.&lt;/p&gt;
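<p>The reconciliation step is a plain set difference; the two resulting lists feed the add and delete requests respectively. A minimal sketch (the function name is ours, illustrative only):<br>
</p>

<div class="highlight js-code-highlight">
<pre class="highlight python"><code>def reconcile_partitions(desired, current):
    """Diff desired partition keys against the currently registered dynamic partitions.

    Returns (to_add, to_delete): new keys to register via AddDynamicPartitionsRequest
    and stale keys to drop via DeleteDynamicPartitionsRequest.
    """
    desired, current = set(desired), set(current)
    to_add = sorted(desired - current)
    to_delete = sorted(current - desired)
    return to_add, to_delete

to_add, to_delete = reconcile_partitions(
    desired=["2026-02-16", "2026-02-17"],
    current=["2026-02-15", "2026-02-16"],
)
# to_add == ["2026-02-17"], to_delete == ["2026-02-15"]
</code></pre>

</div>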

&lt;p&gt;Third, we placed too much validation logic inside asset compute functions rather than in separate asset checks. This made debugging failures difficult: a validation error would fail the entire materialization, losing the partial work already completed. Extracting validation into dedicated asset checks with structured metadata output dramatically improved debuggability and, by running the checks separately, allowed us to materialize "known-bad" data when necessary for investigation.&lt;/p&gt;
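<p>Extracted checks are ordinary functions that return a pass/fail verdict plus structured metadata, mirroring the shape of Dagster's <code>AssetCheckResult</code>. A simplified, framework-free sketch (the check and its metadata keys are illustrative):<br>
</p>

<div class="highlight js-code-highlight">
<pre class="highlight python"><code>def check_no_null_geometries(rows):
    """Validate rows after materialization instead of inside the asset body."""
    bad = [i for i, row in enumerate(rows) if row.get("geometry") is None]
    return {
        "passed": len(bad) == 0,
        "metadata": {"num_null_geometries": len(bad), "first_bad_rows": bad[:5]},
    }

result = check_no_null_geometries([{"geometry": "POINT (0 0)"}, {"geometry": None}])
# result["passed"] is False, but the materialized data itself is kept for investigation
</code></pre>

</div>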

&lt;h2&gt;
  
  
  Open Questions and Next Steps
&lt;/h2&gt;

&lt;p&gt;Several architectural questions remain unresolved in the current implementation.&lt;/p&gt;

&lt;p&gt;The platform lacks a unified approach to schema evolution. When upstream data sources change their formats (which governmental APIs do without warning), we currently handle it through ad-hoc converter updates. A more systematic approach might involve schema registries or versioned asset definitions, but the right pattern for geospatial data with complex nested structures remains unclear.&lt;/p&gt;

&lt;p&gt;Cross-project asset dependencies work through shared asset key definitions, but the materialization coordination relies on manual scheduling or external triggers. A more elegant solution might use Dagster's cross-code-location asset dependencies, but this would require restructuring how projects deploy and discover each other's assets.&lt;/p&gt;

</description>
      <category>dagster</category>
      <category>dataengineering</category>
      <category>python</category>
      <category>geospatial</category>
    </item>
    <item>
      <title>Forwarding Cookies Using CloudFront: A Workaround for AWS Cache Policy Limitations</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Wed, 07 Jan 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/u11d/forwarding-cookies-using-cloudfront-a-workaround-for-aws-cache-policy-limitations-1hjc</link>
      <guid>https://dev.to/u11d/forwarding-cookies-using-cloudfront-a-workaround-for-aws-cache-policy-limitations-1hjc</guid>
      <description>&lt;p&gt;When building our Terraform module for deploying Medusa on AWS, we ran into an unexpected challenge with Amazon CloudFront. We wanted to use CloudFront as a simple way to provide HTTPS and a public URL without requiring users to bring their own domain or SSL certificate. However, we discovered that CloudFront's managed cache policies don't forward cookies, headers, and query parameters when caching is disabled - exactly what we needed for our backend API.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Managed Cache Policies and CachingDisabled
&lt;/h2&gt;

&lt;p&gt;AWS CloudFront offers &lt;a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/using-managed-cache-policies.html#managed-cache-policy-caching-disabled" rel="noopener noreferrer"&gt;managed cache policies&lt;/a&gt; that handle common caching scenarios. The "CachingDisabled" policy seems perfect for dynamic content that shouldn't be cached. However, &lt;strong&gt;this policy doesn't forward cookies, headers, or query parameters to your origin by default&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For e-commerce platforms like Medusa, this is a dealbreaker. The backend needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cookies&lt;/strong&gt; for session management and authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headers&lt;/strong&gt; for content negotiation and API functionality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query&lt;/strong&gt; parameters for filtering and pagination&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We initially tried to create a custom cache policy with &lt;code&gt;MinTTL=0&lt;/code&gt; (no caching) while specifying header and cookie forwarding behaviors. AWS rejected this with an error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;operation error CloudFront: CreateCachePolicy, https response error StatusCode: 400,
InvalidArgument: The parameter HeaderBehavior is invalid for policy with caching disabled.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AWS's validation logic treats forwarding settings as incompatible with disabled caching when using cache policies. The problem is clear: cache policies won't let you forward data without caching it, but dynamic applications need that data forwarded to work properly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why We Use CloudFront
&lt;/h2&gt;

&lt;p&gt;Before diving into the solution, let's clarify why we chose CloudFront in the first place:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Free HTTPS with Default Certificate&lt;/strong&gt; - CloudFront provides a free SSL/TLS certificate via &lt;code&gt;cloudfront_default_certificate = true&lt;/code&gt;, giving you a URL like &lt;code&gt;https://d123456abcdef.cloudfront.net&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Domain Required&lt;/strong&gt; - Users don't need to purchase a domain, manage DNS records, or provision ACM certificates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC Security&lt;/strong&gt; - Our Application Load Balancer (ALB) stays in private subnets, accessible only through CloudFront's VPC Origin feature&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple Setup&lt;/strong&gt; - One Terraform resource provides HTTPS, DNS, and secure origin access without additional configuration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For a deployment-focused module, this convenience is valuable. Users get a working HTTPS endpoint immediately after &lt;code&gt;terraform apply&lt;/code&gt;.&lt;/p&gt;
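<p>The certificate side of that convenience is a single attribute on the distribution. A minimal fragment (following the AWS provider's <code>aws_cloudfront_distribution</code> schema; surrounding arguments omitted):<br>
</p>

<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>viewer_certificate {
  # Use the *.cloudfront.net certificate instead of a custom ACM certificate
  cloudfront_default_certificate = true
}
</code></pre>

</div>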

&lt;h2&gt;
  
  
  The Solution: Legacy &lt;code&gt;forwarded_values&lt;/code&gt; Configuration
&lt;/h2&gt;

&lt;p&gt;The workaround is to use CloudFront's &lt;strong&gt;legacy &lt;code&gt;forwarded_values&lt;/code&gt; block&lt;/strong&gt; instead of modern cache policies. While AWS recommends cache policies for new distributions, the &lt;code&gt;forwarded_values&lt;/code&gt; configuration still works and allows zero-TTL caching with full data forwarding.&lt;/p&gt;

&lt;p&gt;Here's the configuration we use in our backend module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;default_cache_behavior {
  target_origin_id       = local.origin_id
  viewer_protocol_policy = "redirect-to-https"

  # Disable caching by setting all TTLs to zero
  min_ttl     = 0
  default_ttl = 0
  max_ttl     = 0

  forwarded_values {
    query_string = true    # Forward all query parameters
    headers      = ["*"]   # Forward all headers to origin

    cookies {
      forward = "all"      # Forward all cookies to origin
    }
  }

  allowed_methods = ["GET", "HEAD", "POST", "PUT", "PATCH", "OPTIONS", "DELETE"]
  cached_methods  = ["GET", "HEAD", "OPTIONS"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Configuration Elements
&lt;/h2&gt;

&lt;p&gt;The heart of this solution is the TTL configuration. By setting &lt;code&gt;min_ttl&lt;/code&gt;, &lt;code&gt;default_ttl&lt;/code&gt;, and &lt;code&gt;max_ttl&lt;/code&gt; all to &lt;code&gt;0&lt;/code&gt;, we're telling CloudFront "don't cache anything, ever." Every request goes straight through to the origin, which is essential for dynamic content like user sessions and real-time inventory updates.&lt;/p&gt;

&lt;p&gt;Inside the &lt;code&gt;forwarded_values&lt;/code&gt; block, we're basically saying "pass everything through." Setting &lt;code&gt;query_string = true&lt;/code&gt; ensures that API parameters like &lt;code&gt;?page=2&amp;amp;limit=20&lt;/code&gt; reach your backend. The &lt;code&gt;headers = ["*"]&lt;/code&gt; configuration is particularly important: it forwards every header, including &lt;code&gt;Authorization&lt;/code&gt;, &lt;code&gt;Content-Type&lt;/code&gt;, and custom headers your application might use. And crucially, &lt;code&gt;forward = "all"&lt;/code&gt; in the cookies block ensures that session cookies make the round trip from browser to CloudFront to your backend and back again.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;allowed_methods&lt;/code&gt; array supports the full spectrum of HTTP verbs (GET, POST, PUT, PATCH, DELETE) because Medusa's admin API needs them all. This configuration effectively turns CloudFront into a passthrough proxy with HTTPS termination: not a traditional CDN, but a secure front door for your API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs and Considerations
&lt;/h2&gt;

&lt;p&gt;This approach shines when you're working with dynamic applications that maintain session state: think authentication systems, shopping carts, or any API where each request is unique. It's particularly valuable in rapid deployment scenarios where getting HTTPS working quickly matters more than squeezing out every bit of performance optimization. We've also found it perfect for development and staging environments where managing domains and certificates feels like overkill.&lt;/p&gt;

&lt;p&gt;That said, this isn't a one-size-fits-all solution. If you're serving static content like CSS, JavaScript bundles, or images, you're missing out on CloudFront's real strength: global edge caching. Similarly, if you're running a high-traffic production service where caching could significantly reduce origin load and costs, the no-cache approach leaves performance on the table. For applications serving a global audience where edge caching could shave hundreds of milliseconds off response times, you'd want to reconsider this pattern.&lt;/p&gt;

&lt;p&gt;For our Medusa module specifically, the no-cache approach makes sense because backend APIs are inherently dynamic-every request involves database queries, authentication checks, and business logic that can't be cached safely. Caching would actually break core functionality like session management and real-time inventory updates. The convenience of instant HTTPS deployment is worth the trade-off, and users always have the option to add a proper CDN layer in front for their static storefront assets if needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AWS CloudFront's managed cache policies work well for typical CDN use cases, but they have limitations when you need no caching with full data forwarding. The legacy &lt;code&gt;forwarded_values&lt;/code&gt; configuration provides a reliable workaround that's been working in production for our Medusa deployments.&lt;/p&gt;

&lt;p&gt;While AWS's documentation encourages using modern cache policies, the &lt;code&gt;forwarded_values&lt;/code&gt; approach remains supported and is sometimes the pragmatic choice for dynamic applications. As always in infrastructure engineering, the "right" solution depends on your specific requirements; in our case, deployment convenience and session state management won the day.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article is based on our experience building the &lt;a href="https://github.com/u11d-com/terraform-aws-medusajs" rel="noopener noreferrer"&gt;terraform-aws-medusajs&lt;/a&gt; module for deploying Medusa e-commerce backends on AWS.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>cloudfront</category>
      <category>terraform</category>
    </item>
    <item>
      <title>The default user in the Docker image</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Mon, 10 Nov 2025 07:00:00 +0000</pubDate>
      <link>https://dev.to/u11d/the-default-user-in-the-docker-image-17gm</link>
      <guid>https://dev.to/u11d/the-default-user-in-the-docker-image-17gm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This fifth article concludes the series of Docker best practices that deserve more love. We will look closely at the default user in the image and pave the way for uninterrupted usage of the Docker platform.&lt;br&gt;
Make sure you reviewed the previous articles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Proper use of cache to speed up and optimize builds&lt;/li&gt;
&lt;li&gt;Selecting the appropriate base image&lt;/li&gt;
&lt;li&gt;Understanding Docker multi-stage builds&lt;/li&gt;
&lt;li&gt;Understanding the context of the build&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Using administrator privileges&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The default user in the Docker image
&lt;/h2&gt;

&lt;p&gt;Regardless of the operating system, it is always good practice to use administrator privileges sparingly. Whether it is the &lt;code&gt;root&lt;/code&gt; user on Unix-like systems or a disabled &lt;code&gt;UAC&lt;/code&gt; module on Windows, the effect is the same: an increased chance of malicious code breaking through security. The same principles apply to Docker images, even though containerization introduces additional isolation between the host system and containers.&lt;/p&gt;

&lt;p&gt;So what is the default user? The answer: the one that was set in the base image, or &lt;code&gt;root&lt;/code&gt; if the user was never changed. Using &lt;code&gt;root&lt;/code&gt; gives us some advantages, such as the ability to install additional dependencies; due to the location of some libraries, these operations cannot be performed with limited privileges.&lt;/p&gt;

&lt;p&gt;Ultimately, we aim to switch to a least-privileged user, which in a Dockerfile is quite simple. This is done with the &lt;code&gt;USER&lt;/code&gt; instruction, which comes in two variants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;USER &amp;lt;user&amp;gt;[:&amp;lt;group&amp;gt;]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;USER &amp;lt;UID&amp;gt;[:&amp;lt;GID&amp;gt;]&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first variant requires the textual names of the user and group we want to switch to, while the latter allows us to use their numeric identifiers directly. Let’s find out how the build process behaves with each variant of the &lt;code&gt;USER&lt;/code&gt; instruction.&lt;/p&gt;

&lt;p&gt;Textual user name as &lt;code&gt;nginx&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"
&amp;gt; FROM alpine:3.16
&amp;gt; USER nginx
&amp;gt; "&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt;
Sending build context to Docker daemon  10.75kB
Step 1/2 : FROM alpine:3.16
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 9b18e9b68314
Step 2/2 : USER nginx
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Running &lt;span class="k"&gt;in &lt;/span&gt;b67f70601e4c
Removing intermediate container b67f70601e4c
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 8410027170c6
Successfully built 8410027170c6

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; 8410027170c6
docker: Error response from daemon: unable to find user nginx: no matching entries &lt;span class="k"&gt;in &lt;/span&gt;passwd file.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Numeric user identifier as &lt;code&gt;1000&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"
&amp;gt; FROM alpine:3.16
&amp;gt; USER 1000
&amp;gt; "&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt;
Sending build context to Docker daemon  9.728kB
Step 1/2 : FROM alpine:3.16
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 9b18e9b68314
Step 2/2 : USER 1000
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Running &lt;span class="k"&gt;in &lt;/span&gt;869f2e785e83
Removing intermediate container 869f2e785e83
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 7d0dcceafd9f
Successfully built 7d0dcceafd9f

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; 7d0dcceafd9f
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;id
&lt;/span&gt;&lt;span class="nv"&gt;uid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000 &lt;span class="nv"&gt;gid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0&lt;span class="o"&gt;(&lt;/span&gt;root&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we can see from the examples above, both versions were built without a problem. Unfortunately, only the version with a numeric user ID ran successfully. This is because the &lt;code&gt;USER&lt;/code&gt; instruction does not create a user in the image; it only switches to it. While starting the container, Docker could not resolve the user name to its UID (numeric ID), which caused the error.&lt;/p&gt;

&lt;p&gt;The solution is to manually add the user before using it. In images based on the &lt;a href="https://alpinelinux.org/" rel="noopener noreferrer"&gt;Alpine&lt;/a&gt; distribution, we can do this with the &lt;code&gt;adduser -D -u 1000 username&lt;/code&gt; command, where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-D&lt;/code&gt; parameter disables the password configuration,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-u 1000&lt;/code&gt; sets the numeric user ID to the indicated value (a matching group is created automatically),&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;username&lt;/code&gt; is the name of the new user.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"
&amp;gt; FROM alpine:3.16
&amp;gt; RUN adduser -D -u 1000 nginx
&amp;gt; USER nginx
&amp;gt; "&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt;
Sending build context to Docker daemon  9.728kB
Step 1/3 : FROM alpine:3.16
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 9b18e9b68314
Step 2/3 : RUN adduser &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; 1000 nginx
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Running &lt;span class="k"&gt;in &lt;/span&gt;a788a92773f9
Removing intermediate container a788a92773f9
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 84aecbe49298
Step 3/3 : USER nginx
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Running &lt;span class="k"&gt;in &lt;/span&gt;4db5f569c0dc
Removing intermediate container 4db5f569c0dc
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 63a098041085
Successfully built 63a098041085

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; 63a098041085
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;id
&lt;/span&gt;&lt;span class="nv"&gt;uid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000&lt;span class="o"&gt;(&lt;/span&gt;nginx&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;gid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000&lt;span class="o"&gt;(&lt;/span&gt;nginx&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As a result, the user is successfully created and switched to in the image. The next step is to adjust file and directory permissions. When copying data into the image with the &lt;code&gt;COPY&lt;/code&gt; or &lt;code&gt;ADD&lt;/code&gt; instruction, ownership remains with &lt;code&gt;root&lt;/code&gt; by default. Docker's authors anticipated this situation and added the ability to change ownership on the fly, which saves the extra &lt;code&gt;RUN&lt;/code&gt; instruction that would otherwise have to do it separately.&lt;/p&gt;

&lt;p&gt;To change the ownership of copied files and directories use the &lt;code&gt;--chown=&amp;lt;user&amp;gt;:&amp;lt;group&amp;gt;&lt;/code&gt; parameter of the &lt;code&gt;COPY&lt;/code&gt; or &lt;code&gt;ADD&lt;/code&gt; instruction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine:3.16&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;adduser &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; 1000 nginx
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --chown=nginx . .&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;If you have trouble remembering the &lt;code&gt;chown&lt;/code&gt; shortcut, you can always think of its full version: "change owner"&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;To summarize, the security principle of least privilege (POLP) applies to the Docker platform and to containerization in general. It is always good practice to keep file and directory ownership under control and to grant only the minimum level of access required. A common pattern is for files to be owned by the same unprivileged system user that runs the process. This is good design, a way to meet security rules and policies, and an improvement to the overall security of the system.&lt;/p&gt;

&lt;p&gt;Still feeling unsure or need more information about the Docker platform? You can always go through our Docker series to catch up with things.&lt;/p&gt;

&lt;p&gt;If this is not enough and you need professional support, do not hesitate to &lt;a href="https://u11d.com/contact" rel="noopener noreferrer"&gt;contact us&lt;/a&gt;. We are always happy to help.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
    </item>
    <item>
      <title>Understanding the Docker build context</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Mon, 03 Nov 2025 13:56:03 +0000</pubDate>
      <link>https://dev.to/u11d/understanding-the-docker-build-context-4fmc</link>
      <guid>https://dev.to/u11d/understanding-the-docker-build-context-4fmc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;In the fourth installment of our Docker best practices series, we take a deep dive into the context of the build to understand further details of image building.&lt;br&gt;
Be sure you checked the previous articles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Proper use of cache to speed up and optimize builds.&lt;/li&gt;
&lt;li&gt;Selecting the appropriate base image.&lt;/li&gt;
&lt;li&gt;Understanding Docker multi-stage builds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Understanding the context of the build.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Using administrator privileges.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The context of the build - or where the megabytes in the console are coming from.
&lt;/h2&gt;

&lt;p&gt;When the sight of a Dockerfile no longer makes you shudder and you have several working images under your belt, you start paying attention to the small details, and along the way some questions arise. Why, despite following all of the best practices, does the build cache not always work? Why does Docker send hundreds of megabytes before it finally starts building the image? The answers to both questions come down to understanding what the build context is and how it works.&lt;/p&gt;

&lt;p&gt;The Docker commands you execute in the console are not run directly by the &lt;code&gt;docker&lt;/code&gt; executable. It serves only as a user interface that communicates with the Docker daemon running in the background on your computer or server. Communication usually happens over a Unix socket available as the &lt;code&gt;/var/run/docker.sock&lt;/code&gt; file, or over TCP on port &lt;code&gt;2375&lt;/code&gt; (&lt;code&gt;2376&lt;/code&gt; with encryption). The console output below shows how the &lt;code&gt;docker version&lt;/code&gt; command behaves when the Docker daemon is unavailable on the system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker version
Client:
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 22:59:14 2022
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      &lt;span class="nb"&gt;true
&lt;/span&gt;Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The image build command is no exception here. By running &lt;code&gt;docker build .&lt;/code&gt; we tell Docker to use the current directory (the dot at the end) as the data source. As a result, the directory's contents are packaged and uploaded to the Docker daemon, where the instructions from the Dockerfile are executed. Sounds reasonable, right? The data sent to the server is called the &lt;em&gt;build context&lt;/em&gt;, and its size appears in the first lines of the build logs; in the following example, it is 104.9MB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;dd &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/zero &lt;span class="nv"&gt;of&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;big_file.bin &lt;span class="nv"&gt;bs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1M &lt;span class="nv"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100
100+0 records &lt;span class="k"&gt;in
&lt;/span&gt;100+0 records out
104857600 bytes &lt;span class="o"&gt;(&lt;/span&gt;105 MB, 100 MiB&lt;span class="o"&gt;)&lt;/span&gt; copied, 0.0607578 s, 1.7 GB/s

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"
FROM alpine:3.16
COPY . .
"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt;
Sending build context to Docker daemon  104.9MB
Step 1/2 : FROM alpine:3.16
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 9b18e9b68314
Step 2/2 : COPY &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 23acc061163e
Successfully built 23acc061163e
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker copies all the existing files and directories into the context, including hidden ones like &lt;code&gt;.git&lt;/code&gt;. If we then use, for example, &lt;code&gt;COPY . .&lt;/code&gt; to conveniently copy the entire context into the image, they will be included as well. Hidden files are one of the most common reasons why the build cache works only selectively while we are unable to locate the cause.&lt;/p&gt;
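&lt;p&gt;A quick way to spot such hidden contributors is to measure them before adding a matching rule to &lt;code&gt;.dockerignore&lt;/code&gt; (the size below is purely illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tips@u11d:~$ du -sh .git
48M     .git

tips@u11d:~$ echo ".git" &gt;&gt; .dockerignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;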

&lt;p&gt;Another potential problem is copying compilation results or installed packages remaining on the host system after testing. This usually leads to transferring large amounts of data to the Docker server (e.g. a huge &lt;code&gt;node_modules&lt;/code&gt; directory) and, in the worst case, even development versions of files.&lt;/p&gt;

&lt;p&gt;How to deal with this problem? First of all, copy data into the image carefully, and in addition to the Dockerfile, prepare a &lt;code&gt;.dockerignore&lt;/code&gt; file. This file should be placed directly in the root directory of the build context and contain rules describing what should be excluded from it. Its syntax resembles that of the well-known &lt;code&gt;.gitignore&lt;/code&gt; file from the Git version control system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Dockerfile
.dockerignore
.git
.idea
.vscode

/build
/dist
/gradle
/node_modules
/reports
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Should &lt;code&gt;.dockerignore&lt;/code&gt; also exclude itself and the Dockerfile? If you don't need these files in the image, then by all means yes. Docker leaves that decision up to you.&lt;/p&gt;

&lt;p&gt;Finally, let's see what the logs from the earlier example look like when we ignore all files with the &lt;code&gt;.bin&lt;/code&gt; extension.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;dd &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/zero &lt;span class="nv"&gt;of&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;big_file.bin &lt;span class="nv"&gt;bs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1M &lt;span class="nv"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100
100+0 records &lt;span class="k"&gt;in
&lt;/span&gt;100+0 records out
104857600 bytes &lt;span class="o"&gt;(&lt;/span&gt;105 MB, 100 MiB&lt;span class="o"&gt;)&lt;/span&gt; copied, 0.0817181 s, 1.3 GB/s

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"
FROM alpine:3.16
COPY . .
"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"
&amp;gt; *.bin
&amp;gt; "&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .dockerignore

tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt;
Sending build context to Docker daemon  10.75kB
Step 1/2 : FROM alpine:3.16
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 9b18e9b68314
Step 2/2 : COPY &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
 &lt;span class="nt"&gt;---&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 0ef632c6dacf
Successfully built 0ef632c6dacf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you have probably noticed, the build context shrank to 10.75kB. This way we achieved a faster, more efficient build and transferred far less data between the host and the Docker server.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;As we found out, the build context is the set of files and directories used as input to the build process: source code, configuration files, and any other files needed to create the final Docker image. It determines what the build can see and, therefore, what can end up in the final image.&lt;br&gt;
As a next step, I suggest revisiting the images you have previously created and looking at the image building process from a build context perspective. There is a chance that you may be able to introduce some valuable optimizations.&lt;/p&gt;

&lt;p&gt;Check out the next article in the series, which touches on security and applies the principle of least privilege to control the ownership of artifacts:&lt;br&gt;
&lt;strong&gt;Using administrator privileges&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>performance</category>
    </item>
    <item>
      <title>Understanding Docker multi-stage builds</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Mon, 27 Oct 2025 09:29:26 +0000</pubDate>
      <link>https://dev.to/u11d/understanding-docker-multi-stage-builds-572n</link>
      <guid>https://dev.to/u11d/understanding-docker-multi-stage-builds-572n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The third article in the series of Docker best practices will help you understand how to make use of the multi-stage builds feature. If this is the first article in this series for you, be sure to check out the others.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Multi-stage builds
&lt;/h2&gt;

&lt;p&gt;Many best practices for preparing Docker images focus on image size and the number of layers. Both of these parameters can indeed lead to rapid exhaustion of disk space or long download times for base images, and it is very easy to "pollute" the resulting Docker images this way.&lt;br&gt;
In Docker version 17.05, a new mechanism was introduced to keep images "clean" in a quite convenient way. This mechanism is called &lt;code&gt;multi-stage builds&lt;/code&gt;. Please see the exercise below to get the idea of multi-staging and the benefits it brings.&lt;/p&gt;

&lt;p&gt;Let's start by preparing a sample application that we want to place in a Docker image. This will be a web application created using the &lt;a href="https://reactjs.org/" rel="noopener noreferrer"&gt;React&lt;/a&gt; framework and its &lt;a href="https://github.com/facebook/create-react-app" rel="noopener noreferrer"&gt;create-react-app&lt;/a&gt; tool. It will generate a code template and configuration, allowing us to focus on the image creation aspects.&lt;/p&gt;

&lt;p&gt;Instead of installing Node.js locally, let's take advantage of the benefits of Docker and spawn a temporary container to generate the skeleton of the React application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;PWD&lt;span class="si"&gt;)&lt;/span&gt;:/opt &lt;span class="nt"&gt;-w&lt;/span&gt; /opt &lt;span class="nt"&gt;--entrypoint&lt;/span&gt; sh node:18.7.0-alpine &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"npm install create-react-app &amp;amp;&amp;amp; npx create-react-app example"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Tip: If you are using Windows, we recommend using &lt;a href="https://docs.microsoft.com/en-us/windows/wsl/wsl2-index" rel="noopener noreferrer"&gt;WSL2&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The above line will start a container with Node.js 18 based on the lightweight distribution &lt;a href="https://alpinelinux.org/" rel="noopener noreferrer"&gt;Alpine&lt;/a&gt;. The current directory will be mounted under the &lt;code&gt;/opt&lt;/code&gt; path. The default action after container launch will be to install the &lt;code&gt;create-react-app&lt;/code&gt; package and run it using the &lt;code&gt;npx&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;Ok, we have the application. Now let’s proceed with the "classic" Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; nginx:1.23.1-alpine&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /opt/example&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; &lt;span class="nt"&gt;--virtual&lt;/span&gt; .build-deps &lt;span class="se"&gt;\
&lt;/span&gt;  nodejs &lt;span class="se"&gt;\
&lt;/span&gt;  npm
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package.json package-lock.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; build/&lt;span class="k"&gt;*&lt;/span&gt; /usr/share/nginx/html &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apk del .build-deps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use the NGINX base image, one of the most popular web servers. We change the current directory to &lt;code&gt;/opt/example&lt;/code&gt; and then install Node.js and NPM, which are necessary to build our application. We copy the package information and install the dependencies with the &lt;code&gt;npm ci&lt;/code&gt; command. The next step is to copy all the application code and build it with &lt;code&gt;npm run build&lt;/code&gt;. Finally, we copy the results to the directory where the web server expects them and uninstall Node.js along with NPM.&lt;/p&gt;

&lt;p&gt;Has anything caught your attention? You are right, layers! By design, Docker images are made up of layers. As a result, if Node.js and NPM were installed with a separate &lt;code&gt;RUN&lt;/code&gt; instruction, deleting them at the end would unfortunately not make the image any smaller.&lt;/p&gt;
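&lt;p&gt;You can verify this yourself with &lt;code&gt;docker history&lt;/code&gt;, which lists the size of each layer. The sketch below shows a hypothetical variant of the Dockerfile in which installation and removal happen in separate &lt;code&gt;RUN&lt;/code&gt; instructions (the output is abbreviated and the sizes are illustrative): the layer that removes the build dependencies is essentially empty, while the layer that installed them still keeps its full size inside the image.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tips@u11d:~$ sudo docker history example:naive
IMAGE   CREATED BY                                       SIZE
...     RUN apk del .build-deps                          0B
...     RUN npm run build                                2MB
...     RUN apk add --no-cache --virtual .build-deps…    46MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;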

&lt;p&gt;Before multi-stage builds were introduced, one solution to this problem was to split the Dockerfile into two parts or to copy files previously prepared on the host system into the image. However, both solutions go against the idea of moving the application preparation process into an independent environment.&lt;/p&gt;

&lt;p&gt;Let's take a look at a Dockerfile that uses multi-stage builds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:18.7.0-alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /opt/example&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package.json package-lock.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm run build

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; nginx:1.23.1-alpine&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /opt/example/build/* /usr/share/nginx/html/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Isn't this form more readable? What you should pay attention to is the use of more than one &lt;code&gt;FROM&lt;/code&gt; instruction. Each one starts the image creation process from scratch, but we can still use the results of the previous stages thanks to the &lt;code&gt;COPY --from=...&lt;/code&gt; instruction. The &lt;code&gt;--from&lt;/code&gt; parameter can take the index of a stage, the name given to it with &lt;code&gt;AS&lt;/code&gt; (in our case &lt;code&gt;builder&lt;/code&gt;), or the name of a completely separate image.&lt;/p&gt;
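&lt;p&gt;For illustration, the same copy expressed in all three forms (the configuration file path in the last line is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;# By stage index (stages are numbered from zero):
COPY --from=0 /opt/example/build/ /usr/share/nginx/html/

# By stage name given with AS:
COPY --from=builder /opt/example/build/ /usr/share/nginx/html/

# From a completely separate image:
COPY --from=nginx:1.23.1-alpine /etc/nginx/nginx.conf /etc/nginx/nginx.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;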

&lt;p&gt;Analyzing the Dockerfile step by step, you can see that the image building process starts by selecting the Node.js 18.7.0 image based on the &lt;a href="https://alpinelinux.org/" rel="noopener noreferrer"&gt;Alpine&lt;/a&gt; distribution. We labeled this stage &lt;code&gt;builder&lt;/code&gt;. In the next steps we set the current directory, copied the information about the required packages into it, installed the packages using &lt;code&gt;npm ci&lt;/code&gt;, copied the application code, and ran &lt;code&gt;npm run build&lt;/code&gt;. The next instruction starts a new stage based on the NGINX server image, this time without any additional name. We copied the files of the prepared application from the previous stage to the place required by the server configuration.&lt;/p&gt;

&lt;p&gt;Thanks to the multi-stage build, the image we built contains exactly what we need: the web server and the generated application files. This approach also brings some simplification, because we can perform the whole process by running a single &lt;code&gt;docker build&lt;/code&gt; command.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Docker multi-stage builds are a feature in Docker that allows creating Docker images using multiple build stages. This splits the build process into multiple steps, each of which can use a different base image and produce a different intermediate image. This is useful because it allows one to take advantage of the benefits of different base images, while also keeping the final image as small and efficient as possible.&lt;br&gt;
Overall, using multi-stage builds can help improve the performance and efficiency of Docker images, while also making the build process more flexible and customizable.&lt;/p&gt;

&lt;p&gt;If you are wondering why so much data may be transferred during the build process, please check the next article in the series:&lt;br&gt;
&lt;strong&gt;Understanding the context of the build&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>Speed up Docker image builds with cache management</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Wed, 15 Oct 2025 19:13:06 +0000</pubDate>
      <link>https://dev.to/u11d/speed-up-docker-image-builds-with-cache-management-1di</link>
      <guid>https://dev.to/u11d/speed-up-docker-image-builds-with-cache-management-1di</guid>
      <description>&lt;p&gt;Over the past few years I have been working in multiple IT projects where the Docker platform was used to develop, ship and run applications targeting various industries. In addition, I have conducted many interviews and over the time I have noticed that many DevOps engineers do not pay enough attention to the details and essential elements of the platform.&lt;/p&gt;

&lt;p&gt;Therefore, I decided to collect and summarize information relevant to create optimal Docker images.&lt;/p&gt;

&lt;p&gt;I see five critical areas that are often overlooked or misused by developers. This is the first article related to this topic. Please see the full list of topics that will be covered below.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Proper use of cache to speed up and optimize builds.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/u11d/selecting-the-appropriate-docker-base-image-2126"&gt;Selecting the appropriate base image.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Understanding Docker multi-stage builds.&lt;/li&gt;
&lt;li&gt;Understanding the context of the build.&lt;/li&gt;
&lt;li&gt;Using administrator privileges.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I encourage you to start your journey toward writing optimal Dockerfiles now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speed up image builds with cache management
&lt;/h2&gt;

&lt;p&gt;What interests us the most in creating Docker images is the ability to add custom files and execute commands, such as installing dependencies or compiling code. We can achieve these goals very quickly by creating a Dockerfile that resembles the steps we would perform in a console.&lt;/p&gt;

&lt;p&gt;An example for Node.js, where npm is the dependency management tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM node:18.7.0-alpine
COPY . .
RUN npm ci
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above example, we are using the Node.js 18 image in a lightweight version based on a Linux distribution called Alpine. We copy all the files from the current context and start installing dependencies using the npm ci command.&lt;/p&gt;

&lt;p&gt;Simple, right? However, this is not the optimal approach from Docker's point of view, because Docker has an internal mechanism that allows it to reuse the layers of images built earlier. This mechanism will not be used if we leave the Dockerfile as presented.&lt;/p&gt;

&lt;p&gt;The image build cache is not very complicated to use. When copying files to an image using the ADD or COPY statement, Docker compares the contents of the files and their metadata with those it already has (using checksums). If nothing has changed, it will use the previously prepared layers, including for subsequent RUN instructions. Encountering another ADD or COPY it will check the files again, and so on until the end of the build process.&lt;/p&gt;

&lt;p&gt;Our Dockerfile should therefore look like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM node:18.7.0-alpine
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As before, we use the same Node.js base image. This time, however, instead of immediately copying all the files into the image, we first copy only the files describing the required dependencies. We run the dependency installation and finally copy the application code.&lt;/p&gt;

&lt;p&gt;Let's now follow what the process of building this image will look like, assuming that only the application code has changed and the dependencies remain the same. Docker will encounter the first use of the COPY instruction and, after checking its resources, will find that nothing has changed, so it will use the previously prepared layer. In the next step, it will compare the contents of the RUN instruction. It remains the same, so the existing layer can be used here as well. In the last step, it will copy the new application files into the image, since that is the only place where changes have been made.&lt;/p&gt;

&lt;p&gt;Anyone who has at least once seen the size and number of files in the node_modules directory, where Node.js stores packages, will understand how much time and how many IOPS we have saved. And this is not an exception: similar dependency management can now be found in many languages and environments.&lt;/p&gt;

&lt;p&gt;It is worth noting that installing changed dependencies is just one of many tasks that are performed less frequently than actual application code changes. Sometimes there is a need to prepare an appropriate directory structure, permissions or user accounts. All of these operations should be declared in the Dockerfile as early as possible; following this rule is the easiest way to make proper use of Docker's caching mechanism.&lt;/p&gt;
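&lt;p&gt;Sketching this rule for the Node.js example (the user name and directory are hypothetical), the rarely changing setup goes first and the frequently changing code goes last:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM node:18.7.0-alpine
# Rarely changing setup: user account, directories, permissions
RUN addgroup -S app &amp;&amp; adduser -S app -G app \
  &amp;&amp; mkdir -p /opt/example &amp;&amp; chown app:app /opt/example
WORKDIR /opt/example
# Dependencies change less often than the code
COPY package.json package-lock.json ./
RUN npm ci
# The frequently changing application code goes last
COPY . .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;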

&lt;p&gt;If you want to know more about Docker best practices check out the next article in this series: &lt;a href="https://dev.to/u11d/selecting-the-appropriate-docker-base-image-2126"&gt;Selecting the appropriate Docker base image &lt;/a&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>security</category>
      <category>u11d</category>
    </item>
    <item>
      <title>Selecting the appropriate Docker base image</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Tue, 14 Oct 2025 15:15:25 +0000</pubDate>
      <link>https://dev.to/u11d/selecting-the-appropriate-docker-base-image-2126</link>
      <guid>https://dev.to/u11d/selecting-the-appropriate-docker-base-image-2126</guid>
      <description>&lt;p&gt;This is the second article related to Docker best practices. In this  article I will explain how exactly you can choose the best base image.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choose base images wisely
&lt;/h2&gt;

&lt;p&gt;When learning Docker, we very quickly come across descriptions of how images are built: a set of layers that, stacked one on top of another, form the final file system of a running container. Seemingly clear, but what does that give us in practice?&lt;br&gt;
First of all, the ability (though not the obligation) to use another image as the base of our own, e.g. one available on public registries such as &lt;a href="https://hub.docker.com" rel="noopener noreferrer"&gt;Docker Hub&lt;/a&gt;. It can be Ubuntu, CentOS, a Python interpreter or Bash. What matters is what libraries and tools we need to port our project to the Docker environment.&lt;/p&gt;

&lt;p&gt;We configure the base image (that is what we call the image our implementation will be based on) using one of the most commonly used instructions in the Dockerfile - the &lt;code&gt;FROM&lt;/code&gt; instruction. Assuming that this time we are Dockerizing an application written in Python, an example of its use is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.10.6&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; helloworld.py .&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; [ "python", "./helloworld.py" ]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first line instructs Docker that we want to use the existing Python image with the &lt;code&gt;3.10.6&lt;/code&gt; tag as the base image. As described in the &lt;a href="https://hub.docker.com/_/python" rel="noopener noreferrer"&gt;repository&lt;/a&gt;, the &lt;code&gt;3.10.6&lt;/code&gt; tag means Python version 3.10.6. In the second line of the Dockerfile we copy our sample application, and in the third line we declare that it will be executed when the Docker container starts.&lt;/p&gt;

&lt;p&gt;At this point, we already know that we don't need to install Python manually. Someone has prepared the Python image for us, just as the community prepares all kinds of packages used in software development. &lt;em&gt;The question arises, however: what should we look for when selecting a base image?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Who is the author?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhekqrmx0050crbx5whz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhekqrmx0050crbx5whz.png" alt="docker-official-images.png" width="800" height="152"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anyone can publish their images on &lt;a href="https://hub.docker.com" rel="noopener noreferrer"&gt;Docker Hub&lt;/a&gt;, which comes with the risk of malicious code being included in an image. To minimize this risk, it is worth checking who the author is in the &lt;code&gt;By&lt;/code&gt; section. As an added convenience, you can find trusted images labeled &lt;code&gt;Docker Official Image&lt;/code&gt; and &lt;code&gt;Verified Publisher&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2paaw0ydbawub2nn2vwp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2paaw0ydbawub2nn2vwp.png" alt="architecture.png" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Images are built to run in specific environments. Code compiled for the processor of a typical desktop computer will not run directly on a MacBook M1/M2 or a Raspberry Pi. Although this is more advanced knowledge, it is worth checking that the appropriate architecture is available in the list of tags.&lt;/p&gt;

&lt;h2&gt;
  
  
  Version
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqibmpnjeynqkt1jcq92a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqibmpnjeynqkt1jcq92a.png" alt="version.png" width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When deciding on a version, it's a good idea to choose a tag that narrows down the possible image content as much as possible. Even a patch version upgrade may cause backward compatibility problems that we do not expect. Therefore, given a choice of &lt;code&gt;python:3&lt;/code&gt;, &lt;code&gt;python:3.10&lt;/code&gt; or &lt;code&gt;python:3.10.6&lt;/code&gt;, a reasonable choice would be the last one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Size
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmhylqgtnazaol6y7imd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmhylqgtnazaol6y7imd.png" alt="size.png" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The size of the image translates directly into the amount of data needed to be sent over the network, as well as disk space. If we need to run a script in Bash, it is not worth using all of Ubuntu for this. A good practice is to select images tailored for specific needs, e.g. using smaller and leaner Linux distributions like &lt;a href="https://alpinelinux.org/" rel="noopener noreferrer"&gt;Alpine&lt;/a&gt;. Such images often have &lt;code&gt;alpine&lt;/code&gt; in the tags.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;p&gt;Typical Linux distributions contain many libraries and tools that could potentially bring vulnerabilities. An immediate way to minimize the risk is to use smaller distributions, like the aforementioned &lt;a href="https://alpinelinux.org/" rel="noopener noreferrer"&gt;Alpine&lt;/a&gt;. Fewer dependencies mean simpler monitoring and upgradeability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The C standard library
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tips@u11d:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; alpine:3.16
&lt;span class="nv"&gt;$ &lt;/span&gt;wget &lt;span class="nt"&gt;-O&lt;/span&gt; docker-compose https://github.com/docker/compose/releases/download/v2.7.0/docker-compose-linux-x86_64
Connecting to github.com &lt;span class="o"&gt;(&lt;/span&gt;140.82.121.3:443&lt;span class="o"&gt;)&lt;/span&gt;
Connecting to objects.githubusercontent.com &lt;span class="o"&gt;(&lt;/span&gt;185.199.111.133:443&lt;span class="o"&gt;)&lt;/span&gt;
saving to &lt;span class="s1"&gt;'docker-compose'&lt;/span&gt;
docker-compose       100% |&lt;span class="k"&gt;*****************************************&lt;/span&gt;| 11.6M  0:00:00 ETA
&lt;span class="s1"&gt;'docker-compose'&lt;/span&gt; saved

&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x docker-compose

&lt;span class="nv"&gt;$ &lt;/span&gt;./docker-compose
/bin/sh: ./docker-compose: not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Unix-like systems, the standard library is treated as part of the operating system. This means that we will not be able to run an application built with &lt;code&gt;glibc&lt;/code&gt; in an image based on, for example, the &lt;code&gt;musl&lt;/code&gt; library. An example of this would be trying to run the &lt;code&gt;docker-compose&lt;/code&gt; utility downloaded directly from GitHub in an image based on &lt;a href="https://alpinelinux.org/" rel="noopener noreferrer"&gt;Alpine&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Is there anything you can do even better? Yes, for example building your image completely from scratch using &lt;code&gt;FROM scratch&lt;/code&gt; and putting only the application and its direct dependencies in it. It is also worth looking at the &lt;a href="https://github.com/GoogleContainerTools/distroless" rel="noopener noreferrer"&gt;Distroless&lt;/a&gt; initiative, which strives to achieve the same goal.&lt;/p&gt;
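&lt;p&gt;A minimal sketch of the &lt;code&gt;FROM scratch&lt;/code&gt; approach, assuming a statically linked Go application (the paths and build flags are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;FROM golang:1.19-alpine AS builder
WORKDIR /src
COPY . .
# CGO_ENABLED=0 produces a statically linked binary with no libc dependency
RUN CGO_ENABLED=0 go build -o /app .

FROM scratch
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;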

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Selecting a good Docker base image is important for several reasons. First, the base image forms the foundation of a Docker image and provides the underlying OS and runtime environment that the application will run in. Choosing a base image that is well-suited to your needs helps ensure that the application runs efficiently.&lt;/p&gt;

&lt;p&gt;Second, the base image can impact the size of the final Docker image. For example, choosing a base image that includes a lot of unnecessary libraries or applications may result in a bloated Docker image. This can affect the performance of applications and make them more difficult to distribute and deploy.&lt;/p&gt;

&lt;p&gt;Finally, the base image can also impact the security of the final Docker image. Using a base image with known vulnerabilities makes the resulting image more susceptible to attack. Therefore, it's important to choose a base image that is well-maintained and regularly updated with security patches.&lt;/p&gt;

&lt;p&gt;If you want to build performant Docker images check out the next article in this series:&lt;br&gt;
Understanding Docker multi-stage builds&lt;/p&gt;

</description>
      <category>devops</category>
      <category>docker</category>
      <category>security</category>
    </item>
    <item>
      <title>Exposing Private Load Balancers with CloudFront VPC Origins</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Mon, 06 Oct 2025 08:41:06 +0000</pubDate>
      <link>https://dev.to/u11d/exposing-private-load-balancers-with-cloudfront-vpc-origins-4ffo</link>
      <guid>https://dev.to/u11d/exposing-private-load-balancers-with-cloudfront-vpc-origins-4ffo</guid>
      <description>&lt;p&gt;Let's explore &lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-cloudfront-virtual-private-cloud-vpc-origins-shield-your-web-applications-from-public-internet/" rel="noopener noreferrer"&gt;CloudFront VPC Origins&lt;/a&gt;, an AWS feature that allows you to connect CloudFront directly to private resources in your VPC without exposing them to the Internet. In this article, we'll see why this matters and how you can implement it using &lt;a href="https://developer.hashicorp.com/terraform" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction: Why Keep Your Load Balancers Private?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When building web applications on AWS, we've traditionally needed public load balancers so CloudFront could reach our origin servers. This presented several challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Your load balancer is exposed to the world&lt;/strong&gt; - Even with tight security groups, a public load balancer increases your application's attack surface&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintaining IP allow lists is a pain&lt;/strong&gt; - The traditional approach requires allowlisting CloudFront's IP ranges in your security groups, and those ranges change periodically and require updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret headers aren't secure enough&lt;/strong&gt; - Some try using secret headers as an alternative to IP whitelisting, but this "security by obscurity" approach can be discovered and exploited&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct access bypassing&lt;/strong&gt; - Attackers might discover your load balancer's public endpoint and bypass CloudFront entirely, circumventing any protections you've set up there&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;CloudFront VPC Origins provides a better solution. It creates a direct, secure connection between CloudFront and resources in your private subnets, keeping your load balancers completely hidden from the public internet while still being accessible through CloudFront.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before diving into implementation details, let's visualize how CloudFront VPC Origins creates a secure architecture. The diagram below illustrates the end-to-end flow from internet users to your private resources, showing how CloudFront VPC Origins bridges the gap between the public internet and your private VPC without exposing your infrastructure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b228g1hdsaf4gi1bo9o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b228g1hdsaf4gi1bo9o.png" alt=" " width="471" height="791"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Practical Implementation with Terraform&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's implement this using Terraform, based on our &lt;a href="https://registry.terraform.io/modules/u11d-com/medusajs/aws" rel="noopener noreferrer"&gt;Medusa.js AWS module&lt;/a&gt;. We'll go through this in a logical order based on dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Setting Up the Private Load Balancer&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;First, we create a load balancer in private subnets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_lb" "main" {
  load_balancer_type = "application"  # Default value
  subnets            = var.vpc.private_subnet_ids  # The key part - using private subnets!
  security_groups    = [aws_security_group.lb.id]
  name               = "${local.prefix}-lb"
  tags               = local.tags
}

resource "aws_lb_target_group" "main" {
  port        = 9000  # Default container port for Medusa backend
  protocol    = "HTTP"
  vpc_id      = var.vpc.id
  target_type = "ip"
  name        = "${local.prefix}-tg"
  health_check {
    protocol            = "HTTP"
    port                = 9000
    interval            = 30
    matcher             = "200"
    timeout             = 3
    path                = "/health"
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }

  tags = local.tags
}

resource "aws_lb_listener" "main" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"
    forward {
      target_group {
        arn = aws_lb_target_group.main.arn
      }
    }
  }

  lifecycle {
    replace_triggered_by = [aws_lb_target_group.main]
  }

  tags = local.tags
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;2. Creating the Security Groups&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Next, we set up security groups that only allow traffic from CloudFront VPC Origins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Using AWS-managed prefix list for CloudFront VPC Origins
data "aws_ec2_managed_prefix_list" "vpc_origin" {
  name = "com.amazonaws.global.cloudfront.origin-facing"
}

resource "aws_security_group" "lb" {
  name_prefix = "${local.prefix}-lb-"
  description = "Allow inbound traffic from CloudFront VPC Origins"
  vpc_id      = var.vpc.id
  tags        = local.tags

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_vpc_security_group_ingress_rule" "vpc_origin" {
  security_group_id = aws_security_group.lb.id
  prefix_list_id    = data.aws_ec2_managed_prefix_list.vpc_origin.id
  from_port         = 80
  to_port           = 80
  ip_protocol       = "tcp"
  tags              = local.tags
}

resource "aws_vpc_security_group_egress_rule" "lb" {
  security_group_id = aws_security_group.lb.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1"
  tags              = local.tags
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where the magic happens - instead of maintaining our own list of CloudFront IPs, we use AWS's managed prefix list &lt;code&gt;com.amazonaws.global.cloudfront.origin-facing&lt;/code&gt;. AWS keeps this updated automatically, so you don't have to worry about it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Setting Up the VPC Origin Connection&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now, we create the CloudFront VPC Origin that connects to our private load balancer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;locals {
  origin_id = "${local.prefix}-lb"
}

resource "aws_cloudfront_vpc_origin" "main" {
  vpc_origin_endpoint_config {
    name                   = local.origin_id
    arn                    = aws_lb.main.arn
    http_port              = 80
    https_port             = 443
    origin_protocol_policy = "http-only"

    origin_ssl_protocols {
      quantity = 1
      items    = ["TLSv1.2"]
    }
  }

  timeouts {
    create = "30m"  # Important: VPC Origins take time to provision
  }

  depends_on = [aws_lb_target_group.main, aws_security_group.lb]

  tags = local.tags
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: CloudFront VPC Origins can take a while to provision, and without an extended timeout, Terraform might give up too early.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Configuring the CloudFront Distribution&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Finally, we set up the CloudFront distribution to use our VPC Origin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_cloudfront_distribution" "main" {
  enabled = true
  comment = "My-Project-Dev-Backend"  # Example of what title(local.prefix) might render

  origin {
    domain_name = aws_lb.main.dns_name
    origin_id   = local.origin_id

    vpc_origin_config {
      vpc_origin_id = aws_cloudfront_vpc_origin.main.id
    }
  }

  # Default cache behavior
  default_cache_behavior {
    target_origin_id       = local.origin_id
    viewer_protocol_policy = "redirect-to-https"

    # Cache settings for APIs - disabled to allow dynamic content
    min_ttl     = 0
    default_ttl = 0
    max_ttl     = 0

    forwarded_values {
      query_string = true
      headers      = ["*"]

      cookies {
        forward = "all"
      }
    }

    allowed_methods = ["GET", "HEAD", "POST", "PUT", "PATCH", "OPTIONS", "DELETE"]
    cached_methods  = ["GET", "HEAD", "OPTIONS"]
  }

  # Certificate configuration
  viewer_certificate {
    cloudfront_default_certificate = true  # Use CloudFront's default certificate
  }

  # Other settings
  price_class = "PriceClass_100"  # Use only North America and Europe edge locations

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  tags = {
    Project     = "my-project"
    Environment = "dev"
    Owner       = "DevOps Team"
    ManagedBy   = "terraform"
    Component   = "Backend"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;vpc_origin_config&lt;/code&gt; block that references our VPC Origin is the key difference from a traditional CloudFront setup.&lt;/p&gt;
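
&lt;p&gt;For comparison, a traditional setup with a public load balancer would point CloudFront at the origin using a &lt;code&gt;custom_origin_config&lt;/code&gt; block instead (a sketch, mirroring the values used in the VPC Origin configuration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;origin {
  domain_name = aws_lb.main.dns_name
  origin_id   = local.origin_id

  # Requires the load balancer to be publicly reachable
  custom_origin_config {
    http_port              = 80
    https_port             = 443
    origin_protocol_policy = "http-only"
    origin_ssl_protocols   = ["TLSv1.2"]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;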

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion: Solving Real Security Challenges&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;CloudFront VPC Origins represents a significant advancement for securing web applications on AWS, directly addressing the security challenges we outlined at the beginning:&lt;/p&gt;

&lt;p&gt;Remember the problem of exposed load balancers? With VPC Origins, your load balancers now remain completely isolated in private subnets, dramatically reducing your attack surface. The headache of maintaining IP allowlists is eliminated through AWS's managed prefix list &lt;code&gt;com.amazonaws.global.cloudfront.origin-facing&lt;/code&gt;, which is automatically maintained for you.&lt;/p&gt;

&lt;p&gt;Insecure secret header approaches are no longer needed, as the direct connection between CloudFront and your VPC resources provides a much stronger security model. And attackers trying to bypass CloudFront? With your load balancer in a private subnet, there's simply no way for external traffic to reach it except through CloudFront.&lt;/p&gt;

&lt;p&gt;There are a few practical considerations: CloudFront VPC Origins take time to provision, each region needs its own VPC Origin, there's a cost for the connectivity, and setup is slightly more complex. However, for organizations handling sensitive data or with compliance requirements, these minor considerations are easily outweighed by the significant security benefits.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>terraform</category>
      <category>devops</category>
      <category>medusajs</category>
    </item>
    <item>
      <title>Terraform as a One-Shot Init Container in Docker Compose and CI: Ending "It Worked On My Machine"</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Wed, 24 Sep 2025 09:15:04 +0000</pubDate>
      <link>https://dev.to/u11d/terraform-as-a-one-shot-init-container-in-docker-compose-and-ci-ending-it-worked-on-my-machine-2c49</link>
      <guid>https://dev.to/u11d/terraform-as-a-one-shot-init-container-in-docker-compose-and-ci-ending-it-worked-on-my-machine-2c49</guid>
      <description>&lt;p&gt;&lt;em&gt;Picture this: It's Friday afternoon. Your pull request looks perfect locally - tests green, endpoints responsive, everything just works. You push to GitHub, confident it'll sail through CI. Twenty minutes later: red build. An Elasticsearch error pops up: "no such index [blog_posts]". This all-too-common 'it worked on my machine' problem highlights the dangers of environment drift - exactly what Terraform as a one-shot init container in Docker Compose and CI is designed to solve.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You scramble to check. Locally? Index exists. CI logs? Nothing obvious. You spend an hour discovering that your local Docker setup manually created that index months ago, while CI starts fresh every time. The infrastructure your app depends on exists in your head, not in code.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If this sounds familiar, you're not alone. Most of us have lived through the pain of &lt;strong&gt;environment drift&lt;/strong&gt; - where your local development setup slowly becomes a unique snowflake that no one else can reproduce. Your &lt;code&gt;docker compose up&lt;/code&gt; works, but only because of that one-off &lt;code&gt;curl&lt;/code&gt; command you ran three sprints ago to "fix" an index mapping.&lt;/p&gt;

&lt;p&gt;This post shows you a pattern that eliminates this drift entirely: &lt;strong&gt;treat infrastructure setup as code that runs automatically in every environment&lt;/strong&gt;. We'll use Terraform as a one-shot initialization container that provisions Elasticsearch indices before your app even starts. The same container that sets up your local dev environment also runs in CI and can deploy to production. No more manual steps, no more "works on my machine," no more Friday afternoon mysteries.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Infrastructure Assumptions Hidden in Plain Sight
&lt;/h2&gt;

&lt;p&gt;Before diving into the solution, let's get concrete about what goes wrong. Consider a typical FastAPI application that needs Elasticsearch. Most tutorials show you this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/blogs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_blog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BlogCreate&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;es&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blog_posts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;blog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks simple, right? But this code makes a &lt;strong&gt;critical assumption&lt;/strong&gt;: the &lt;code&gt;blog_posts&lt;/code&gt; index exists and has the right mapping. In development, you probably created it once with a curl command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; PUT &lt;span class="s2"&gt;"localhost:9200/blog_posts"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'
{
  "mappings": {
    "properties": {
      "title": {"type": "text"},
      "content": {"type": "text"},
      "author": {"type": "keyword"}
    }
  }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your app works perfectly... until a new team member clones the repo, runs &lt;code&gt;docker compose up&lt;/code&gt;, and gets index errors. Or until CI runs in a clean environment. Or until you deploy to production and forget to create the index there too.&lt;/p&gt;

&lt;p&gt;The real problem isn't that you forgot to document the setup step (though that happens). It's that &lt;strong&gt;the setup step exists outside your application's deployment process&lt;/strong&gt;. Your app code and your infrastructure setup live in separate worlds, creating countless opportunities for them to drift apart.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Infrastructure as Code, Everywhere
&lt;/h2&gt;

&lt;p&gt;Here's the key insight: &lt;strong&gt;if your app needs infrastructure to exist, that infrastructure should be created by code, not by hand&lt;/strong&gt;. And that code should run automatically in every environment where your app runs.&lt;/p&gt;

&lt;p&gt;We'll build a FastAPI blog application that needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Elasticsearch indices with specific mappings&lt;/li&gt;
&lt;li&gt;An API key with limited permissions&lt;/li&gt;
&lt;li&gt;Audit logging that writes to a separate index&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of hoping these exist, we'll use Terraform to create them. But here's the twist: &lt;strong&gt;Terraform runs as a container in our Docker Compose stack&lt;/strong&gt;, not as a separate manual step.&lt;/p&gt;

&lt;p&gt;Let's look at how this works in practice. Our &lt;code&gt;compose.yaml&lt;/code&gt; defines a &lt;code&gt;terraform&lt;/code&gt; service that runs once and exits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;terraform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hashicorp/terraform:1.13.1&lt;/span&gt;
  &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;no&lt;/span&gt;
  &lt;span class="na"&gt;working_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/workspace/terraform&lt;/span&gt;
  &lt;span class="na"&gt;entrypoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/bin/sh"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-ec"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;terraform init -backend-config=local.s3.tfbackend&lt;/span&gt;
    &lt;span class="s"&gt;terraform apply -var-file=local.tfvars -auto-approve&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.:/workspace&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;terraform_data:/workspace/terraform/.terraform&lt;/span&gt;
  &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;minio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
    &lt;span class="na"&gt;elasticsearch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This container waits for MinIO and Elasticsearch to be healthy, then runs &lt;code&gt;terraform apply&lt;/code&gt; to create our indices and API key. It uses the MinIO instance as an S3-compatible backend for state storage, making the whole setup self-contained.&lt;/p&gt;
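
&lt;p&gt;A &lt;code&gt;local.s3.tfbackend&lt;/code&gt; file pointing Terraform's S3 backend at the in-stack MinIO might look like this (a sketch; the bucket name and credentials are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bucket = "terraform-state"
key    = "blog/terraform.tfstate"
region = "us-east-1"  # Ignored by MinIO, but required by the backend

# Point the backend at MinIO instead of AWS S3
endpoints      = { s3 = "http://minio:9000" }
access_key     = "minioadmin"
secret_key     = "minioadmin"
use_path_style = true

# Skip AWS-specific checks that don't apply to MinIO
skip_credentials_validation = true
skip_metadata_api_check     = true
skip_requests_validation    = true
skip_region_validation      = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;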

&lt;p&gt;The magic is in the ordering guarantees. With &lt;code&gt;depends_on&lt;/code&gt; and healthchecks, we get a deterministic startup sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;MinIO starts and becomes healthy (S3 backend ready)&lt;/li&gt;
&lt;li&gt;Elasticsearch starts and becomes healthy (database ready)&lt;/li&gt;
&lt;li&gt;Terraform runs and provisions indices/API key (infrastructure ready)&lt;/li&gt;
&lt;li&gt;Only then can your application start or tests run&lt;/li&gt;
&lt;/ol&gt;
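
&lt;p&gt;The healthchecks these ordering guarantees rely on are defined on each service in &lt;code&gt;compose.yaml&lt;/code&gt;; for Elasticsearch, a sketch might look like this (the endpoint and timings are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;elasticsearch:
  image: docker.elastic.co/elasticsearch/elasticsearch:8.14.0
  healthcheck:
    # Healthy once the cluster API answers; the terraform service waits for this
    test: ["CMD-SHELL", "curl -fsS http://localhost:9200/_cluster/health || exit 1"]
    interval: 5s
    timeout: 3s
    retries: 30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;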

&lt;h2&gt;
  
  
  What Terraform Actually Creates
&lt;/h2&gt;

&lt;p&gt;Let's look at the concrete infrastructure our application needs. Here's the Terraform configuration that runs in that container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"elasticstack_elasticsearch_index"&lt;/span&gt; &lt;span class="s2"&gt;"blog_posts"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"blog_posts"&lt;/span&gt;

  &lt;span class="nx"&gt;mappings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;properties&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;title&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"text"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;content&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"text"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;author&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"keyword"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;created_at&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"date"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;updated_at&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"date"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"integer"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"elasticstack_elasticsearch_index"&lt;/span&gt; &lt;span class="s2"&gt;"blog_logs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"blog_posts_log"&lt;/span&gt;

  &lt;span class="nx"&gt;mappings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;properties&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"keyword"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;blog_id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"keyword"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;timestamp&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"date"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"integer"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;data&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"object"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"elasticstack_elasticsearch_security_api_key"&lt;/span&gt; &lt;span class="s2"&gt;"backend"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"backend"&lt;/span&gt;

  &lt;span class="nx"&gt;role_descriptors&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;blog_backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;indices&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="nx"&gt;names&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"blog_posts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"blog_posts_log"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;privileges&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"create"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"maintenance"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is infrastructure as code at its best. Every field mapping, every privilege, every index name is explicitly defined. No assumptions, no manual steps, no tribal knowledge. When you change a mapping, you update this file and redeploy. When you add a new index, it's defined here first.&lt;/p&gt;

&lt;p&gt;The beautiful part? This exact same Terraform code can run against a local Elasticsearch instance, a staging cluster, or production. The only thing that changes is the connection configuration.&lt;/p&gt;
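
&lt;p&gt;In practice, the environment-specific part reduces to the provider configuration, e.g. (a sketch; the variable names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "elasticstack" {
  elasticsearch {
    # http://elasticsearch:9200 locally, a managed cluster URL in staging/prod
    endpoints = [var.elasticsearch_endpoint]
    username  = var.elasticsearch_username
    password  = var.elasticsearch_password
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;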

&lt;h2&gt;
  
  
  Local ↔ CI Parity: The Same Flow Everywhere
&lt;/h2&gt;

&lt;p&gt;Now here's where this approach really shines: &lt;strong&gt;the exact same workflow runs locally and in CI&lt;/strong&gt;. No special CI scripts, no different Docker configurations, no "it works locally but not in CI" mysteries.&lt;/p&gt;

&lt;p&gt;Locally, you run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;span class="c"&gt;# Wait for terraform container to exit successfully&lt;/span&gt;
pytest &lt;span class="nt"&gt;-v&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In GitHub Actions, the workflow does this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Start compose stack&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker compose -p blog up -d&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Wait for terraform to finish&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;while [ "$(docker inspect -f '{{.State.Status}}' blog-terraform-1)" != "exited" ]; do&lt;/span&gt;
      &lt;span class="s"&gt;sleep 1&lt;/span&gt;
    &lt;span class="s"&gt;done&lt;/span&gt;
    &lt;span class="s"&gt;terraform_exit_code=$(docker inspect -f '{{.State.ExitCode}}' blog-terraform-1)&lt;/span&gt;
    &lt;span class="s"&gt;if [ "$terraform_exit_code" != "0" ]; then&lt;/span&gt;
      &lt;span class="s"&gt;echo "Terraform failed with exit code $terraform_exit_code"&lt;/span&gt;
      &lt;span class="s"&gt;docker logs blog-terraform-1&lt;/span&gt;
      &lt;span class="s"&gt;exit 1&lt;/span&gt;
    &lt;span class="s"&gt;fi&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pytest -v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CI workflow is just the automated version of what you do locally. Same containers, same Terraform, same tests. If it works locally, it works in CI. If it fails in CI, you can reproduce the failure locally by running the exact same commands.&lt;/p&gt;

&lt;p&gt;This eliminates the most frustrating class of CI failures: the ones that only happen "in the cloud" because the environment is subtly different from your local setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Against Real Infrastructure (Not Mocks)
&lt;/h2&gt;

&lt;p&gt;Here's where this approach gets really powerful for testing. Instead of mocking Elasticsearch calls, our tests run against the real thing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_create_blog_and_log&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TestClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;elasticsearch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_elasticsearch_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# This hits real Elasticsearch indices created by Terraform
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/blogs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;First post&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tester&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;

&lt;span class="c1"&gt;# Verify the audit log was created
&lt;/span&gt;    &lt;span class="n"&gt;elasticsearch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;refresh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blog_posts_log&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;elasticsearch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blog_posts_log&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;log_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't unit tests; they're &lt;strong&gt;integration tests that validate your entire stack&lt;/strong&gt;. They catch problems that mocks can't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrong field mappings (Elasticsearch rejects documents)&lt;/li&gt;
&lt;li&gt;Missing indices (immediate failure, not silent bugs)&lt;/li&gt;
&lt;li&gt;Permission issues (API key lacks required privileges)&lt;/li&gt;
&lt;li&gt;Data type mismatches (string where integer expected)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When your tests pass, you know your application actually works with your infrastructure, not just with your assumptions about it.&lt;/p&gt;

&lt;p&gt;The confidence boost is enormous. Instead of wondering "will this work in production?", you know it will because it's already working against the same infrastructure patterns that production uses.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Local to Production: The Path Forward
&lt;/h2&gt;

&lt;p&gt;The beauty of this approach becomes clear when you need to deploy to production. You're not rewriting infrastructure setup - you're just pointing the same Terraform code at different targets.&lt;/p&gt;

&lt;p&gt;For local development, our &lt;code&gt;local.tfvars&lt;/code&gt; file might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;elasticsearch_url&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"http://host.docker.internal:9200"&lt;/span&gt;
&lt;span class="nx"&gt;elasticsearch_indices&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;blog_posts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"blog_posts"&lt;/span&gt;
  &lt;span class="nx"&gt;blog_logs&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"blog_posts_log"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, you'd have a &lt;code&gt;prod.tfvars&lt;/code&gt; that points to your actual Elasticsearch cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;elasticsearch_url&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://your-elasticsearch.cloud.es.io:443"&lt;/span&gt;
&lt;span class="nx"&gt;elasticsearch_indices&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;blog_posts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"prod_blog_posts"&lt;/span&gt;
  &lt;span class="nx"&gt;blog_logs&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"prod_blog_posts_log"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same Terraform code, same index definitions, same API key privileges. The only difference is where it runs and what it connects to.&lt;/p&gt;
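&lt;p&gt;Under the hood, the Terraform code consuming these files is ordinary variable-driven configuration. A minimal sketch (assuming the elasticstack provider; resource names are illustrative) might look like:&lt;/p&gt;

```hcl
variable "elasticsearch_url" {
  type = string
}

variable "elasticsearch_indices" {
  type = map(string)
}

provider "elasticstack" {
  elasticsearch {
    endpoints = [var.elasticsearch_url]
  }
}

# The same index definitions are applied whether the tfvars point
# at localhost or at the production cluster.
resource "elasticstack_elasticsearch_index" "blog_posts" {
  name = var.elasticsearch_indices["blog_posts"]
}

resource "elasticstack_elasticsearch_index" "blog_logs" {
  name = var.elasticsearch_indices["blog_logs"]
}
```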

&lt;p&gt;You can also run this pattern for &lt;strong&gt;ephemeral preview environments&lt;/strong&gt;. Each pull request gets its own namespace in Kubernetes, with its own Elasticsearch indices created by the same Terraform container. Perfect isolation, perfect consistency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Beyond Just "Working"
&lt;/h2&gt;

&lt;p&gt;This isn't just about avoiding Friday afternoon debugging sessions (though that's nice). This pattern gives you something more valuable: &lt;strong&gt;confidence in your entire development workflow&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When you can run &lt;code&gt;docker compose up&lt;/code&gt; and get a perfect replica of your production infrastructure, you catch problems early. When your CI tests run against real services, you catch integration issues before they reach users. When your deployment process is identical across environments, you eliminate a huge class of production surprises.&lt;/p&gt;

&lt;p&gt;Most importantly, when a new team member joins, they don't need to run seven manual setup commands from a README that might be outdated. They run &lt;code&gt;docker compose up&lt;/code&gt;, wait for the terraform container to exit, and they have a working development environment that matches everyone else's.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started: Your Next Steps
&lt;/h2&gt;

&lt;p&gt;Ready to try this pattern? Start simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identify your manual setup steps&lt;/strong&gt; - What curl commands, database migrations, or configuration tweaks does your local environment need?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convert one step to Terraform&lt;/strong&gt; - Pick the simplest infrastructure dependency (maybe an index or a queue) and define it in Terraform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add it to your Docker Compose&lt;/strong&gt; - Create a terraform service that runs your configuration and exits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test the flow&lt;/strong&gt; - Run &lt;code&gt;docker compose up&lt;/code&gt; from a clean checkout. Does everything work without manual intervention?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expand gradually&lt;/strong&gt; - Add more infrastructure components to Terraform as you build confidence.&lt;/li&gt;
&lt;/ol&gt;
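&lt;p&gt;Step 3 from the list above can be as small as one extra service in your &lt;code&gt;docker-compose.yml&lt;/code&gt;. A hedged sketch (the image tag, paths, and service names are illustrative):&lt;/p&gt;

```yaml
services:
  terraform:
    image: hashicorp/terraform:1.9
    working_dir: /infra
    volumes:
      - ./infra:/infra
    # Apply the configuration once, then exit; the app depends on this.
    entrypoint: ["sh", "-c", "terraform init -input=false && terraform apply -auto-approve -var-file=local.tfvars"]
    depends_on:
      - elasticsearch
```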

&lt;p&gt;You don't need to solve everything at once. Even converting one manual setup step eliminates a whole class of "works on my machine" problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;This pattern is about more than just Terraform and Elasticsearch. It's about treating infrastructure as an integral part of your application, not as an afterthought. It's about making your development environment as reproducible as your CI pipeline. It's about catching problems early, when they're cheap to fix.&lt;/p&gt;

&lt;p&gt;In a world where microservices depend on dozens of backing services, and where a single misconfigured index can bring down a feature, this kind of deterministic infrastructure setup isn't just nice to have - it's essential.&lt;/p&gt;

&lt;p&gt;The next time you hear "it worked on my machine," you'll know exactly how to fix it. Not with documentation or Slack messages, but with code that runs the same way everywhere.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Want to see this in action? Check out the &lt;a href="https://github.com/u11d-com/blog_test-stack-in-gh-actions" rel="noopener noreferrer"&gt;complete example repository&lt;/a&gt; with a working FastAPI + Elasticsearch + Terraform setup you can run locally.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>docker</category>
      <category>terraform</category>
      <category>devops</category>
      <category>testing</category>
    </item>
    <item>
      <title>Creating Docker images that can run on different platforms, including Raspberry Pi</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Mon, 01 Sep 2025 07:00:00 +0000</pubDate>
      <link>https://dev.to/u11d/creating-docker-images-that-can-run-on-different-platforms-including-raspberry-pi-2bci</link>
      <guid>https://dev.to/u11d/creating-docker-images-that-can-run-on-different-platforms-including-raspberry-pi-2bci</guid>
      <description>&lt;h2&gt;
  
  
  Building multi-architecture Docker images
&lt;/h2&gt;

&lt;p&gt;As each generation of the Raspberry Pi single-board computer emerges, its processing power continues to advance. The current fourth generation boasts a processor with four Cortex-A72 cores and up to 8GB of RAM. These impressive capabilities, enclosed within a printed circuit board measuring 85✕55 mm, make the Raspberry Pi an excellent choice for use as a home server, offering a wide range of possibilities from media serving to automation of the software development process (CI/CD) or home automation. Given our commitment to containerization through Docker, it makes sense to consider deploying Docker images on the Raspberry Pi.&lt;/p&gt;

&lt;p&gt;It's worth noting that the Raspberry Pi's processor architecture and instruction set differ from those of typical desktop computers. As a result, the Docker image must be built separately, targeting specific architecture. Fortunately, the increasing popularity of other processor families has led to many Docker Hub images already being prepared for use on the Raspberry Pi. For example, let's examine the Debian &lt;code&gt;stable&lt;/code&gt; version:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdblu8tafc7lg6pnerpkd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdblu8tafc7lg6pnerpkd.png" alt="debian.png" width="800" height="242"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are most interested in the &lt;code&gt;linux/arm/v7&lt;/code&gt; and &lt;code&gt;linux/arm64&lt;/code&gt; platforms, because &lt;a href="https://en.wikipedia.org/wiki/Raspberry_Pi#Specifications" rel="noopener noreferrer"&gt;the last 3 generations&lt;/a&gt; of RPi use these instruction sets. The two platforms target different flavors of the operating system: &lt;code&gt;linux/arm/v7&lt;/code&gt; is aimed at 32-bit and &lt;code&gt;linux/arm64&lt;/code&gt; at 64-bit operating systems.&lt;/p&gt;
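&lt;p&gt;As a quick reference, the relationship between what &lt;code&gt;uname -m&lt;/code&gt; reports and the corresponding Docker platform string can be sketched in a few lines of Python (an illustrative snippet, not part of any Pi setup):&lt;/p&gt;

```python
# Map `uname -m` machine names to Docker platform strings.
# Covers the architectures relevant to PCs and recent Raspberry Pi models.
UNAME_TO_PLATFORM = {
    "x86_64": "linux/amd64",    # typical desktop/server PC
    "armv7l": "linux/arm/v7",   # 32-bit Raspberry Pi OS
    "aarch64": "linux/arm64",   # 64-bit Raspberry Pi OS
}

def docker_platform(machine: str) -> str:
    """Return the Docker platform string for a `uname -m` machine name."""
    try:
        return UNAME_TO_PLATFORM[machine]
    except KeyError:
        raise ValueError(f"no known Docker platform for machine {machine!r}")

print(docker_platform("armv7l"))  # linux/arm/v7
```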

&lt;p&gt;The question arises as to what to do when a specific Raspberry Pi version requires an image that has not been provided. While it is possible to build the image directly on a single-board computer (&lt;a href="https://en.wikipedia.org/wiki/Single-board_computer" rel="noopener noreferrer"&gt;SBC&lt;/a&gt;), this process can be complicated. In this case, we recommend using the Docker extension Buildx to build the image on a local machine. This approach offers greater flexibility and can help to streamline the development process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Docker Buildx
&lt;/h2&gt;

&lt;p&gt;Buildx is a plugin for Docker's CLI that significantly expands Docker's image building and management capabilities. Introduced in version 19.03, Buildx is currently enabled by default. For more detailed information on how to use Buildx, please consult the official &lt;a href="https://docs.docker.com/build/buildx/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ok, Docker build capabilities are enhanced, so what’s next?&lt;/p&gt;

&lt;p&gt;With Buildx, it is possible to prepare images for one or multiple platforms depending on the configuration. The building process can be performed using various drivers, each with different feature sets. These include the Docker server, an additional container capable of emulating other platforms, and the use of a Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;While Buildx offers a great deal of flexibility, its implementation has certain limitations. For instance, using the Docker server driver only allows for the creation of an image for a &lt;a href="https://docs.docker.com/build/buildx/drivers/" rel="noopener noreferrer"&gt;single platform&lt;/a&gt;. In such cases, the goal is often to build an image that can be executed on different platforms using only the local machine.&lt;/p&gt;

&lt;p&gt;Once the appropriate driver has been selected, the next step is to identify the available instances of builders (build environments) that Buildx can locate on a given machine. This can be accomplished by issuing the &lt;code&gt;docker buildx ls&lt;/code&gt; command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;➜  ~ docker buildx &lt;span class="nb"&gt;ls
&lt;/span&gt;NAME/NODE       DRIVER/ENDPOINT STATUS  PLATFORMS
default &lt;span class="k"&gt;*&lt;/span&gt;       docker
  default       default         running linux/arm64, linux/amd64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;default&lt;/code&gt; builder is included by default in Buildx, and uses the local Docker server as its backend. While it supports cross-platform image building, it is important to note that it does not enable the creation of multi-platform images, as stated in the documentation. To build multi-platform images, a different builder configuration must be used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;error: multiple platforms feature is currently not supported &lt;span class="k"&gt;for &lt;/span&gt;docker driver. Please switch to a different driver &lt;span class="o"&gt;(&lt;/span&gt;eg. &lt;span class="s2"&gt;"docker buildx create --use"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s start with creating a new builder called &lt;code&gt;multi&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;➜  ~ docker buildx create &lt;span class="nt"&gt;--driver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker-container &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;multi &lt;span class="nt"&gt;--use&lt;/span&gt;
multi
➜  ~ docker buildx &lt;span class="nb"&gt;ls
&lt;/span&gt;NAME/NODE       DRIVER/ENDPOINT             STATUS   PLATFORMS
multi &lt;span class="k"&gt;*&lt;/span&gt;         docker-container
  multi0        unix:///var/run/docker.sock inactive
default         docker
  default       default                     running  linux/arm64, linux/amd64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: in conjunction with the &lt;code&gt;docker buildx create&lt;/code&gt; command, we utilized the following options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;--driver&lt;/code&gt; - selects the driver that will be used,&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--name&lt;/code&gt; - gives a name to our builder instance,&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--use&lt;/code&gt; - instructs Buildx to use the newly created instance for subsequent operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upon analyzing the output, it is evident that the builder is currently inactive. However, this will change once the first build is initiated.&lt;/p&gt;

&lt;p&gt;In order to test our configuration, we can create a simple &lt;code&gt;Dockerfile&lt;/code&gt; that displays information about the platform upon which it is executed at build time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;FROM debian:stable-slim
RUN &lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now it's build time! Buildx command syntax is modeled on a regular &lt;code&gt;docker build&lt;/code&gt; command with additional parameters.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;code&gt;--progress&lt;/code&gt; - changes the way logs are displayed,&lt;/li&gt;
&lt;li&gt; &lt;code&gt;--platform&lt;/code&gt; - allows providing a comma separated list of platforms for which the image will be created.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this exercise, we will choose a PC as well as the two most popular platforms for Raspberry Pi.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;➜  ~ docker buildx build &lt;span class="nt"&gt;-t&lt;/span&gt; dockerpro/multi-arch:latest &lt;span class="nt"&gt;--progress&lt;/span&gt; plain &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64,linux/arm/v7,linux/arm64 &lt;span class="nb"&gt;.&lt;/span&gt;

WARNING: No output specified &lt;span class="k"&gt;for &lt;/span&gt;docker-container driver. Build result will only remain &lt;span class="k"&gt;in &lt;/span&gt;the build cache. To push result image into registry use &lt;span class="nt"&gt;--push&lt;/span&gt; or to load image into docker use &lt;span class="nt"&gt;--load&lt;/span&gt;
&lt;span class="c"&gt;#1 [internal] booting buildkit&lt;/span&gt;
&lt;span class="c"&gt;#1 pulling image moby/buildkit:buildx-stable-1&lt;/span&gt;
&lt;span class="c"&gt;#1 pulling image moby/buildkit:buildx-stable-1 5.9s done&lt;/span&gt;
&lt;span class="c"&gt;#1 creating container buildx_buildkit_multi0&lt;/span&gt;
&lt;span class="c"&gt;#1 creating container buildx_buildkit_multi0 0.6s done&lt;/span&gt;
&lt;span class="c"&gt;#1 DONE 6.5s&lt;/span&gt;

&lt;span class="c"&gt;#2 [internal] load .dockerignore&lt;/span&gt;
&lt;span class="c"&gt;#2 transferring context: 2B done&lt;/span&gt;
&lt;span class="c"&gt;#2 DONE 0.0s&lt;/span&gt;

&lt;span class="c"&gt;#3 [internal] load build definition from Dockerfile&lt;/span&gt;
&lt;span class="c"&gt;#3 transferring dockerfile: 74B done&lt;/span&gt;
&lt;span class="c"&gt;#3 DONE 0.0s&lt;/span&gt;

&lt;span class="k"&gt;***&lt;/span&gt; image downloading &lt;span class="nb"&gt;cut &lt;/span&gt;out &lt;span class="k"&gt;for &lt;/span&gt;readability &lt;span class="k"&gt;***&lt;/span&gt;

&lt;span class="c"&gt;#9 [linux/amd64 2/2] RUN uname -m&lt;/span&gt;
&lt;span class="c"&gt;#0 0.069 x86_64&lt;/span&gt;
&lt;span class="c"&gt;#9 DONE 0.2s&lt;/span&gt;

&lt;span class="c"&gt;#11 [linux/arm64 2/2] RUN uname -m&lt;/span&gt;
&lt;span class="c"&gt;#0 0.050 aarch64&lt;/span&gt;
&lt;span class="c"&gt;#11 DONE 0.1s&lt;/span&gt;

&lt;span class="c"&gt;#12 [linux/arm/v7 2/2] RUN uname -m&lt;/span&gt;
&lt;span class="c"&gt;#0 0.079 armv7l&lt;/span&gt;
&lt;span class="c"&gt;#12 DONE 0.2s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As observed in the output, a significant amount of log output is generated. This can be attributed to BuildKit, an advanced Docker build engine that enables parallel execution of tasks and offers improved cache management, among other capabilities. It is worth delving further into BuildKit, even if building Raspberry Pi images is not a priority. Additional information can be found &lt;a href="https://docs.docker.com/develop/develop-images/build_enhancements/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Going back to the log analysis: in the first step the BuildKit container is started, then the &lt;code&gt;.dockerignore&lt;/code&gt; file is loaded and the build context is transferred. The next step is to load the &lt;code&gt;Dockerfile&lt;/code&gt; and execute its instructions. Since we selected 3 platforms for our image, the instructions are executed once per platform: first the Debian image is downloaded, then &lt;code&gt;uname -m&lt;/code&gt; is executed to print on which "machine" it is currently running. You can see this in more detail in steps &lt;code&gt;#9&lt;/code&gt;, &lt;code&gt;#11&lt;/code&gt; and &lt;code&gt;#12&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As mentioned earlier, regular computers and Raspberry Pi processors are not compatible. However, Buildx and BuildKit use &lt;a href="https://www.qemu.org/" rel="noopener noreferrer"&gt;QEMU&lt;/a&gt; to emulate other hardware, allowing the &lt;code&gt;uname -m&lt;/code&gt; program to report other platforms during the build process. It should be noted that emulation using QEMU is slower than direct code execution, but it provides the ability to build images in one place. Alternatively, we can use other Raspberry Pi boards and add them to Buildx as external builders, which could be a more efficient solution depending on the requirements. Of course, this approach would require additional hardware.&lt;/p&gt;

&lt;p&gt;The image building process was completed successfully. However, it is worth noting that a warning was displayed at the beginning of the build logs.&lt;/p&gt;

&lt;p&gt;This one exactly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;WARNING: No output specified &lt;span class="k"&gt;for &lt;/span&gt;docker-container driver. Build result will only remain &lt;span class="k"&gt;in &lt;/span&gt;the build cache. To push result image into registry use &lt;span class="nt"&gt;--push&lt;/span&gt; or to load image into docker use &lt;span class="nt"&gt;--load&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we use a regular &lt;code&gt;docker build&lt;/code&gt; command, the image lands in the local image store by default, and we can run a container from it or push it to an external registry.&lt;/p&gt;

&lt;p&gt;In the Buildx context, a driver that uses an additional container to build images requires configuring what should happen with the image after it is built. The suggested &lt;code&gt;--load&lt;/code&gt; option (which is a shorthand for &lt;code&gt;--output=type=docker&lt;/code&gt;) will unfortunately not work in our case; using it causes the following error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;error: docker exporter does not currently support exporting manifest lists
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is because the docker exporter used by the &lt;code&gt;--load&lt;/code&gt; option cannot export a manifest list, the piece of metadata that ties together the images for multiple platforms. Therefore, we need to use a different output option that supports multi-platform images.&lt;/p&gt;
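&lt;p&gt;To make the idea of a manifest list more concrete, here is a toy model in Python (the digests are made up) of how a manifest list maps platforms to per-platform image manifests, and how a client such as &lt;code&gt;docker pull&lt;/code&gt; picks the entry matching its own platform:&lt;/p&gt;

```python
# Toy model of a multi-platform manifest list (digests are made up).
MANIFEST_LIST = [
    {"platform": {"os": "linux", "architecture": "amd64"}, "digest": "sha256:aaa"},
    {"platform": {"os": "linux", "architecture": "arm", "variant": "v7"}, "digest": "sha256:bbb"},
    {"platform": {"os": "linux", "architecture": "arm64"}, "digest": "sha256:ccc"},
]

def select_manifest(manifests, os_name, arch, variant=None):
    """Pick the manifest whose platform matches the client's platform."""
    for entry in manifests:
        p = entry["platform"]
        if p["os"] == os_name and p["architecture"] == arch and p.get("variant") == variant:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{arch}/{variant}")

# A 32-bit Raspberry Pi resolves the same tag to the arm/v7 image:
print(select_manifest(MANIFEST_LIST, "linux", "arm", "v7"))  # sha256:bbb
```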


&lt;h2&gt;
  
  
  Custom image registry
&lt;/h2&gt;

&lt;p&gt;Even though the image is built on the local machine, it ultimately needs to reach the Raspberry Pi. With the docker-container driver, the only available way to export a multi-platform image is to push it to a registry where images can be stored and transferred, so a registry is the missing piece of our setup.&lt;/p&gt;

&lt;p&gt;The Docker registry can run on any machine with sufficient disk space. The &lt;a href="https://docs.docker.com/registry/deploying/" rel="noopener noreferrer"&gt;official implementation&lt;/a&gt;, written in Go, requires minimal CPU and memory resources. Setting up the registry is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;➜  ~ docker volume create registry-data
registry-data
➜  ~ docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 5000:5000 &lt;span class="nt"&gt;-v&lt;/span&gt; registry-data:/var/lib/registry &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="nt"&gt;--name&lt;/span&gt; registry registry:2
216f1553784bd45d2fbaf43906ac5f55182c1475ad1d0e94d2e41175293a48c7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start the registry on port &lt;code&gt;5000&lt;/code&gt;. Additionally, the registry will automatically start after a system reboot and will save data in a volume named &lt;code&gt;registry-data&lt;/code&gt;. Note that if you plan to use the registry for anything beyond testing purposes, it is recommended to use &lt;code&gt;docker-compose&lt;/code&gt; to keep the entire configuration as code.&lt;/p&gt;
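&lt;p&gt;Following that recommendation, the same registry expressed as a &lt;code&gt;docker-compose.yml&lt;/code&gt; (equivalent to the &lt;code&gt;docker run&lt;/code&gt; command above) could look like:&lt;/p&gt;

```yaml
services:
  registry:
    image: registry:2
    ports:
      - "5000:5000"
    volumes:
      - registry-data:/var/lib/registry
    restart: always

volumes:
  registry-data:
```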

&lt;p&gt;To fully utilize the benefits of Buildx, we need to recreate the builder with a configuration file named &lt;code&gt;builder.toml&lt;/code&gt;, which will serve as a blueprint for our builder. It is important to note that the IP address used in the file (&lt;code&gt;192.168.1.2&lt;/code&gt;) must be replaced with the IP address of your computer. To find your IP address, you can use a command such as &lt;code&gt;hostname -I&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;registry.&lt;span class="s2"&gt;"192.168.1.2:5000"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  http &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true
  &lt;/span&gt;insecure &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file is required to connect to the registry without encryption. We have omitted encryption to keep this walkthrough simple, but in a production environment encryption is a must(!) to ensure the security of the data. When deploying a registry, use SSL certificates to encrypt the communication between the registry and the clients, either by obtaining a trusted certificate or by generating a self-signed one. Keep in mind that a self-signed certificate may trigger warnings from clients about the untrusted certificate.&lt;/p&gt;

&lt;p&gt;Now, we can use the &lt;code&gt;builder.toml&lt;/code&gt; configuration file to create a builder in Buildx. To use the configuration file, we need to add the &lt;code&gt;--config&lt;/code&gt; option to the &lt;code&gt;docker buildx create&lt;/code&gt; command. After that, we can build and upload the image to the registry.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;➜  ~ docker buildx &lt;span class="nb"&gt;rm &lt;/span&gt;multi
➜  ~ docker buildx create &lt;span class="nt"&gt;--driver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker-container &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;multi &lt;span class="nt"&gt;--use&lt;/span&gt; &lt;span class="nt"&gt;--config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;builder.toml
multi
➜  ~ docker buildx build &lt;span class="nt"&gt;-t&lt;/span&gt; 192.168.1.2:5000/multi-arch:latest &lt;span class="nt"&gt;--progress&lt;/span&gt; plain &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64,linux/arm/v7,linux/arm64 &lt;span class="nt"&gt;--push&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="k"&gt;***&lt;/span&gt; &lt;span class="nb"&gt;cut &lt;/span&gt;off &lt;span class="k"&gt;***&lt;/span&gt;

&lt;span class="c"&gt;#13 exporting to image&lt;/span&gt;
&lt;span class="c"&gt;#13 exporting layers 0.1s done&lt;/span&gt;
&lt;span class="c"&gt;#13 exporting manifest sha256:c6623eda280c8ee76b2bd63f9a73f429204ad4a260d60edd44de92a9401d276e&lt;/span&gt;
&lt;span class="c"&gt;#13 exporting manifest sha256:c6623eda280c8ee76b2bd63f9a73f429204ad4a260d60edd44de92a9401d276e done&lt;/span&gt;
&lt;span class="c"&gt;#13 exporting config sha256:279c30a6d250e68984e909a414149aa21a62138523b0f59c41d0beaa5e65f27d done&lt;/span&gt;
&lt;span class="c"&gt;#13 exporting manifest sha256:ba2339518538ebd6e178db4b3f604fbe82d0167f25b453bd75dbd6bde5203732 done&lt;/span&gt;
&lt;span class="c"&gt;#13 exporting config sha256:a25a708b066296c7c952fa48bd59dda84ec3a0a7206e086302fd402f0a6d4d4c done&lt;/span&gt;
&lt;span class="c"&gt;#13 exporting manifest sha256:7625f06a22638df4e5749709851816a83222ca8e8ab35c1979eb4d206d22c72b done&lt;/span&gt;
&lt;span class="c"&gt;#13 exporting config sha256:1b27246fc54aaa67e4d09b8161e1068dc8527982c4c0173a31f025bde9ebbff2 done&lt;/span&gt;
&lt;span class="c"&gt;#13 exporting manifest list sha256:95a7f6cd1a6e7d5b2f4a5dd0d936d56f767166dedc13669cbd6c627b25a43cf4 done&lt;/span&gt;
&lt;span class="c"&gt;#13 pushing layers&lt;/span&gt;
&lt;span class="c"&gt;#13 pushing layers 2.5s done&lt;/span&gt;
&lt;span class="c"&gt;#13 pushing manifest for 192.168.1.2:5000/multi-arch:latest@sha256:95a7f6cd1a6e7d5b2f4a5dd0d936d56f767166dedc13669cbd6c627b25a43cf4 0.0s done&lt;/span&gt;
&lt;span class="c"&gt;#13 DONE 2.6s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our private registry has received and stored its first image. I would like to write that all that is left is pulling the image on the Raspberry Pi, but unfortunately that is not quite the case. As before with Buildx, we must first allow Docker on the Raspberry Pi to use an unencrypted connection to the registry. To do so, create a file called &lt;code&gt;/etc/docker/daemon.json&lt;/code&gt; with the following content, replacing the IP address used here (192.168.1.2) with the IP address of your computer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"insecure-registries"&lt;/span&gt; : &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"192.168.1.2:5000"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's restart Docker and see if we can successfully run the container with the image previously prepared:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pi@raspberry:~ &lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart docker
pi@raspberry:~ &lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; 192.168.1.2:5000/multi-arch:latest &lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt;
Unable to find image &lt;span class="s1"&gt;'192.168.1.2:5000/multi-arch:latest'&lt;/span&gt; locally
latest: Pulling from multi-arch
f83522bf96d7: Pull &lt;span class="nb"&gt;complete
&lt;/span&gt;329a37630ca9: Pull &lt;span class="nb"&gt;complete
&lt;/span&gt;Digest: sha256:f994065ab94a50f09de56873529bcdb4660f50fa3c1b1fbb9797da96725c5766
Status: Downloaded newer image &lt;span class="k"&gt;for &lt;/span&gt;192.168.1.2:5000/multi-arch:latest
armv7l
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congratulations, you have successfully completed the process of building and deploying a Docker image to a private registry for your Raspberry Pi. While the process may have required some effort, the end result is a highly efficient and customizable system that you can easily manage from your home network.&lt;/p&gt;

&lt;p&gt;It is highly recommended to configure encryption and authentication to ensure the security of your registry. Additionally, you can enhance your registry with a user-friendly web interface, such as the one provided by &lt;a href="https://github.com/Joxit/docker-registry-ui" rel="noopener noreferrer"&gt;docker-registry-ui&lt;/a&gt;.&lt;/p&gt;
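&lt;p&gt;As a starting point for the authentication part, the official registry image supports &lt;code&gt;htpasswd&lt;/code&gt;-based basic auth. The sketch below follows the approach from the registry deployment guide; &lt;code&gt;myuser&lt;/code&gt; and &lt;code&gt;mypassword&lt;/code&gt; are placeholders, and keep in mind that the registry expects TLS to be configured alongside basic authentication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create an htpasswd file (the httpd image ships the htpasswd tool)
mkdir auth
docker run --rm --entrypoint htpasswd httpd:2 -Bbn myuser mypassword &amp;gt; auth/htpasswd

# Run the registry with basic authentication enabled
docker run -d -p 5000:5000 --name registry \
  -v "$(pwd)"/auth:/auth \
  -e REGISTRY_AUTH=htpasswd \
  -e "REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm" \
  -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
  registry:2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;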

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The blog post describes the process of building and deploying a Docker image on a Raspberry Pi using a private Docker registry. It covers the use of Buildx, a Docker CLI plugin, for building multi-arch images, and creating a private Docker registry to store and transfer the images to the Raspberry Pi.&lt;/p&gt;

&lt;p&gt;I hope this post has been informative and helps simplify your multi-arch builds. You now know how to build multi-arch images with Buildx and deliver them to a Raspberry Pi through your own private registry.&lt;/p&gt;

&lt;p&gt;If you require additional assistance beyond what has been provided, please feel free to &lt;a href="https://uninterrupted.tech/contact/" rel="noopener noreferrer"&gt;reach out to us&lt;/a&gt; for professional support. We are always available and eager to assist you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Useful links
&lt;/h3&gt;

&lt;p&gt;Buildx:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/buildx/working-with-buildx/" rel="noopener noreferrer"&gt;https://docs.docker.com/buildx/working-with-buildx/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/docker/buildx" rel="noopener noreferrer"&gt;https://github.com/docker/buildx&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;BuildKit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/develop/develop-images/build_enhancements/" rel="noopener noreferrer"&gt;https://docs.docker.com/develop/develop-images/build_enhancements/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/moby/buildkit" rel="noopener noreferrer"&gt;https://github.com/moby/buildkit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Registry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/registry/deploying/" rel="noopener noreferrer"&gt;https://docs.docker.com/registry/deploying/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/registry/insecure/" rel="noopener noreferrer"&gt;https://docs.docker.com/registry/insecure/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Joxit/docker-registry-ui" rel="noopener noreferrer"&gt;https://github.com/Joxit/docker-registry-ui&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Raspberry Pi:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Raspberry_Pi#Specifications" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Raspberry_Pi#Specifications&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>raspberrypi</category>
    </item>
    <item>
      <title>Short-Circuit Evaluation in Terraform: A Deep Dive</title>
      <dc:creator>Daniel Kraszewski</dc:creator>
      <pubDate>Wed, 20 Aug 2025 15:55:06 +0000</pubDate>
      <link>https://dev.to/u11d/short-circuit-evaluation-in-terraform-a-deep-dive-5bin</link>
      <guid>https://dev.to/u11d/short-circuit-evaluation-in-terraform-a-deep-dive-5bin</guid>
      <description>&lt;p&gt;Short-circuit evaluation is a fundamental concept in programming that can significantly impact how your Terraform configurations behave. Unlike traditional programming languages, Terraform's HashiCorp Configuration Language (HCL) has some unique characteristics when it comes to short-circuit evaluation that every DevOps engineer should understand.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is Short-Circuit Evaluation?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Short-circuit evaluation is an optimization technique where the second operand of a logical operation is only evaluated if the first operand doesn't determine the result. In most programming languages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;code&gt;AND&lt;/code&gt; operations: if the first condition is &lt;code&gt;false&lt;/code&gt;, the second isn't evaluated&lt;/li&gt;
&lt;li&gt;For &lt;code&gt;OR&lt;/code&gt; operations: if the first condition is &lt;code&gt;true&lt;/code&gt;, the second isn't evaluated&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How Traditional Languages Handle Short-Circuit Evaluation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In languages like JavaScript, Python, or Go, short-circuit evaluation prevents unnecessary computation and potential errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// JavaScript example&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// user.password is only checked if user exists&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;User has password set&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Python example
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# user.password only evaluated if user is not None
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User has password set&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;The Problem: Terraform's Short-Circuit Evaluation Doesn't Always Work&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here's a real-world example that demonstrates the issue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"user_config"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;username&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
    &lt;span class="nx"&gt;email&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
    &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ❌ This WILL FAIL with "Attempt to get attribute from null value"&lt;/span&gt;
&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;needs_random_password&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt; &lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="err"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why does this fail?&lt;/strong&gt; Even though we check &lt;code&gt;var.user_config != null&lt;/code&gt; first, Terraform's &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; operator does not short-circuit: both operands are always evaluated, so &lt;code&gt;var.user_config.password&lt;/code&gt; still attempts to access an attribute on a null value and the evaluation errors out.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Terraform's Unique Approach&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Terraform's HCL has a more complex relationship with short-circuit evaluation due to its declarative nature and the way it processes configurations during different phases (plan, apply, etc.).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Reality: Limited Short-Circuit Evaluation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Unlike traditional programming languages, Terraform's short-circuit evaluation is limited because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parsing Phase&lt;/strong&gt;: Terraform parses the entire expression before evaluation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type Checking&lt;/strong&gt;: All attribute accesses are validated regardless of conditions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static Analysis&lt;/strong&gt;: Terraform needs to understand all possible code paths&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Solutions: Safe Ways to Handle Null Values&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Solution 1: Explicit Boolean Variables&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The most straightforward and readable approach is to use explicit boolean variables with validation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"create_user"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Whether to create the user"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bool&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"user_config"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"User configuration (required when create_user is true)"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;username&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
    &lt;span class="nx"&gt;email&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
    &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;

  &lt;span class="nx"&gt;validation&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;create_user&lt;/span&gt; &lt;span class="err"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt; &lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
    &lt;span class="nx"&gt;error_message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user_config must be provided when create_user is true."&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"random_password"&lt;/span&gt; &lt;span class="s2"&gt;"user_password"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;create_user&lt;/span&gt; &lt;span class="err"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

  &lt;span class="nx"&gt;length&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;
  &lt;span class="nx"&gt;special&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_user"&lt;/span&gt; &lt;span class="s2"&gt;"example"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;create_user&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;username&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Email&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_user_login_profile"&lt;/span&gt; &lt;span class="s2"&gt;"example"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;create_user&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

  &lt;span class="nx"&gt;user&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;example&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;random_password&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_password&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; By separating the boolean flag from the configuration object, resource creation can be gated on &lt;code&gt;create_user&lt;/code&gt; without ever touching attributes of a possibly-null object. The validation block guarantees that &lt;code&gt;user_config&lt;/code&gt; is provided whenever &lt;code&gt;create_user&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Variable validation is a powerful tool that should be leveraged whenever possible. It provides early feedback to users, prevents configuration errors, and makes your modules more robust and user-friendly. The validation block runs during &lt;code&gt;terraform plan&lt;/code&gt; and &lt;code&gt;terraform apply&lt;/code&gt;, catching issues before resources are created.&lt;/p&gt;
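&lt;p&gt;Validation rules can also inspect the contents of the object, not just its presence. As a sketch (the regex here is deliberately simplistic and purely illustrative), &lt;code&gt;can()&lt;/code&gt; absorbs both a null &lt;code&gt;user_config&lt;/code&gt; and a malformed email:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;variable "user_config" {
  type = object({
    username = string
    email    = string
    password = optional(string)
  })
  default = null

  validation {
    # can() returns false instead of erroring when user_config is null
    # or when the email does not match the pattern
    condition     = var.user_config == null || can(regex("^[^@]+@[^@]+$", var.user_config.email))
    error_message = "email must look like name@domain."
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;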

&lt;h3&gt;
  
  
  &lt;strong&gt;Solution 2: Use &lt;code&gt;try()&lt;/code&gt; Function&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For cases where explicit boolean variables aren't preferred:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;needs_random_password&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;try&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"random_password"&lt;/span&gt; &lt;span class="s2"&gt;"user_password"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;needs_random_password&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

  &lt;span class="nx"&gt;length&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;
  &lt;span class="nx"&gt;special&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; The &lt;code&gt;try()&lt;/code&gt; function catches any errors during expression evaluation and returns the fallback value (&lt;code&gt;false&lt;/code&gt;) if the expression fails. This prevents the "Attempt to get attribute from null value" error by gracefully handling the case where &lt;code&gt;var.user_config&lt;/code&gt; is null.&lt;/p&gt;
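&lt;p&gt;&lt;code&gt;try()&lt;/code&gt; also accepts a chain of fallback expressions and returns the first one that evaluates without an error, which is convenient for nested optional attributes. A sketch, where &lt;code&gt;var.defaults&lt;/code&gt; is a hypothetical second object variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;locals {
  # The first expression that evaluates without an error wins
  effective_password = try(var.user_config.password, var.defaults.password, null)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that &lt;code&gt;try()&lt;/code&gt; falls through only on errors: if &lt;code&gt;var.user_config.password&lt;/code&gt; evaluates successfully to &lt;code&gt;null&lt;/code&gt;, that &lt;code&gt;null&lt;/code&gt; is returned rather than the next fallback.&lt;/p&gt;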

&lt;h3&gt;
  
  
  &lt;strong&gt;Solution 3: Use &lt;code&gt;can()&lt;/code&gt; Function&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;needs_random_password&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;can&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"random_password"&lt;/span&gt; &lt;span class="s2"&gt;"user_password"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;needs_random_password&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

  &lt;span class="nx"&gt;length&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;
  &lt;span class="nx"&gt;special&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; The &lt;code&gt;can()&lt;/code&gt; function tests whether an expression can be evaluated without errors. It returns &lt;code&gt;true&lt;/code&gt; if the expression is valid and &lt;code&gt;false&lt;/code&gt; if it would cause an error. This allows us to safely check if &lt;code&gt;var.user_config.password&lt;/code&gt; is accessible before evaluating it, effectively implementing our own short-circuit logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Solution 4: Nested Conditional Expression&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;needs_random_password&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt; &lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"random_password"&lt;/span&gt; &lt;span class="s2"&gt;"user_password"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;needs_random_password&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

  &lt;span class="nx"&gt;length&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;
  &lt;span class="nx"&gt;special&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; The ternary conditional operator (&lt;code&gt;condition ? true_value : false_value&lt;/code&gt;) provides proper short-circuit behavior in Terraform. The second part of the expression (&lt;code&gt;var.user_config.password == null&lt;/code&gt;) is only evaluated if the first condition (&lt;code&gt;var.user_config != null&lt;/code&gt;) is true, preventing null attribute access errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Summary&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Short-circuit evaluation in Terraform is &lt;strong&gt;not&lt;/strong&gt; the same as in traditional programming languages. The most effective approaches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use explicit boolean variables&lt;/strong&gt; with validation for feature flags and optional resources - this is the most robust approach that leverages Terraform's validation capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;try()&lt;/code&gt; function&lt;/strong&gt; for optional nested attributes and fallback values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;can()&lt;/code&gt; function&lt;/strong&gt; to test expression validity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use variable validation&lt;/strong&gt; wherever possible to catch errors early and provide better user experience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combine approaches&lt;/strong&gt; based on your team's preferences and the complexity of your configuration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The explicit boolean variable approach with validation tends to be more readable, maintainable, and provides the best user experience, while function-based approaches can be more concise for simple cases. Choose what works best for your team and use case, but always consider adding validation to improve reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Additional Resources&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.terraform.io/docs/language/functions/index.html" rel="noopener noreferrer"&gt;Terraform Functions Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.terraform.io/docs/language/expressions/conditionals.html" rel="noopener noreferrer"&gt;HCL Conditional Expressions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.terraform.io/docs/language/values/variables.html#custom-validation-rules" rel="noopener noreferrer"&gt;Variable Validation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>terraform</category>
      <category>devops</category>
      <category>infrastructureascode</category>
    </item>
  </channel>
</rss>
