DEV Community: Sumedhvats

How I Shrunk My Docker Images by 98% (Go + Next.js)

Sumedhvats — Tue, 16 Jun 2026 13:21:51 +0000

I was building pasteCTL — a real-time collaborative paste/code sharing app (source on GitHub) — and at some point I opened lazydocker to check on things and saw this:

Combined, my two services were sitting at nearly 3.6GB. For a Go API and a Next.js frontend.

The app worked fine. But those numbers were going to slow down every deploy, eat registry storage, and make cold starts painful. So I fixed both of them. Here's the full process, step by step.

Why Docker Images Get So Big

When you write FROM golang:1.26 or FROM node, you're pulling an image designed for development — it includes the compiler, build tools, package managers, debug utilities, and a full OS userland. All of that gets baked into your final image even though none of it runs in production.

The Go compiler alone is ~600MB. The default node image (Debian-based) is over 1GB before you install a single dependency. By the time you run npm install or go mod download, you're already deep in the hole.

The fix is multi-stage builds: use one container to compile and build, then throw it away and copy only the output into a minimal runtime container.

Step 1 — Fix the Go Backend

Before

FROM golang:1.26
ENV FRONTEND_URL=http://paste.sumedh.app
ENV DATABASE_URL=https://notdburl.com
RUN apt-get install -y --no-install-recommends git ca-certificates tzdata
WORKDIR /backend
COPY . .
RUN go mod download
RUN go build -o main cmd/main.go
RUN chmod +x main
EXPOSE 8080
CMD ["./main"]

This uses golang:1.26 — the full Debian-based image. It compiles the binary and then leaves the entire Go toolchain, Debian base, and all source code sitting inside the final image. Size: 1.60GB.

After

FROM golang:1.26-alpine AS builder

RUN apk add --no-cache git ca-certificates tzdata
RUN adduser -D -g '' appuser

WORKDIR /backend
COPY go.mod go.sum ./
RUN go mod download
COPY . .

RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -trimpath -ldflags="-w -s -extldflags '-static'" -o /app/main ./cmd

FROM scratch AS runner

COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /etc/passwd /etc/passwd
COPY --from=builder /app/main /main

USER appuser
EXPOSE 8080
ENTRYPOINT ["/main"]

Size: ~15MB.

Here's what changed and why each change matters:

Switch to golang:1.26-alpine for the build stage. Alpine is a minimal Linux distro. The Alpine-based Go image is a fraction of the Debian one — same compiler, none of the Debian bloat.

Use FROM scratch for the runner. This is the biggest backend win. scratch is a completely empty image — zero bytes, no OS, no shell, nothing. Since Go can compile to a fully static binary with no OS dependencies at all, you don't need an OS in the runner. The final image size is essentially just your binary.

Copy go.mod and go.sum before copying source. Docker caches each layer. If you COPY . . everything at once, any change to any source file invalidates the dependency download layer and re-downloads all your modules. Copying only the mod files first means that layer only busts when your dependencies actually change.

Add -trimpath and -extldflags '-static' to the build. -trimpath removes file system paths from the compiled binary, making it smaller and reproducible across machines. -extldflags '-static' guarantees no dynamic C libraries are linked — required for the binary to run in a scratch container.

Create the non-root user in the builder stage. Because scratch is empty, there's no adduser binary in the runner. You create the user in the builder and copy /etc/passwd across so the USER appuser directive has something to reference.

Copy certificates and timezone data from the builder. scratch has no filesystem at all, so anything your app needs at runtime must be explicitly copied. CA certificates are needed for outbound HTTPS calls, and zoneinfo is needed if your app does anything timezone-aware.

Step 2 — Fix the Next.js Frontend

Before

FROM node
ENV NEXT_PUBLIC_BACKEND_URL = http://paste.sumedh.app
ENV NEXT_PUBLIC_WS_URL = ws://paste.sumedh.app
WORKDIR /frontend
COPY . .
RUN npm install --legacy-peer-deps
RUN npm run build
EXPOSE 3000
CMD ["npm","run","start"]

This uses FROM node — the full default Node image, over 1GB by itself. Then npm install pulls in all of node_modules including every dev dependency: TypeScript, ESLint, webpack, the works. Size: 1.98GB.

After

FROM node:22-alpine AS builder

RUN apk add --no-cache libc6-compat
WORKDIR /frontend

ARG NEXT_PUBLIC_BACKEND_URL
ARG NEXT_PUBLIC_WS_URL
ENV NEXT_PUBLIC_BACKEND_URL=$NEXT_PUBLIC_BACKEND_URL \
    NEXT_PUBLIC_WS_URL=$NEXT_PUBLIC_WS_URL \
    NEXT_TELEMETRY_DISABLED=1

COPY package.json package-lock.json ./
RUN npm ci --legacy-peer-deps

COPY . .
RUN npm run build

FROM node:22-alpine AS runner
WORKDIR /app

RUN apk add --no-cache libc6-compat

ENV NODE_ENV=production \
    NEXT_TELEMETRY_DISABLED=1 \
    PORT=3000 \
    HOSTNAME=0.0.0.0

RUN addgroup --system --gid 1001 nodejs && \
    adduser --system --uid 1001 nextjs

COPY --from=builder --chown=nextjs:nodejs /frontend/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /frontend/.next/static ./.next/static
COPY --from=builder --chown=nextjs:nodejs /frontend/public ./public

USER nextjs
EXPOSE 3000

CMD ["node", "server.js"]

Size: 190.53MB.

Here's what changed:

Switch from node to node:22-alpine. Same move as the backend — Alpine drops the Debian userland. node:22-alpine is around 130MB vs 1GB+ for the default.

Add libc6-compat to both stages. Next.js uses SWC (a Rust-based compiler) and image optimization libraries like Sharp that depend on glibc. Alpine uses musl libc instead, and without this compatibility shim, the build or runtime will crash. It needs to be in both the builder and runner.

Copy package.json and package-lock.json first. Same layer caching logic as the Go backend. Your node_modules only rebuilds when your dependency files change.

Use npm ci instead of npm install. npm ci installs exactly what's in package-lock.json, skips the resolution step, and is faster and fully reproducible.

Use ARG for public env variables. The original Dockerfile hardcoded the URLs. Using ARG lets you pass them in at build time so the same Dockerfile works across environments.

Copy only the standalone output in the runner. Next.js has an output: 'standalone' mode that produces a minimal self-contained server bundle under .next/standalone with only the production Node dependencies your app actually needs — not all of node_modules. You also copy .next/static and public. That's it.

Note: You need output: 'standalone' in your next.config.js for this to work. Without it, the .next/standalone directory won't be generated and the build will fail.

Step 3 — The .dockerignore Files

A frequently missed cause of slow builds and inflated images is sending unnecessary files to the Docker daemon when COPY . . runs. Every file in the build context gets sent over, even if it never ends up in the image. Add these to both directories.

./backend/.dockerignore

.git
.idea
.vscode
*.md
bin/

./frontend/.dockerignore

.git
.next
node_modules
.env*.local
*.md
.vscode
.idea
npm-debug.log*
yarn-debug.log*
yarn-error.log*

The node_modules entry is especially important for the frontend. Without it, your entire local node_modules folder gets sent to the daemon and partially shadows what's installed inside the container, which causes subtle and confusing bugs.

Step 4 — Tighten the Docker Compose

With the images sorted, there were a couple of things worth fixing in the compose file too.

Container logs grow indefinitely by default. On a long-running server, uncapped JSON logs will quietly eat your disk. Adding a log driver config caps each service at 30MB total (3 files × 10MB).

The Postgres volume mount had a bug. The original used /var/lib/postgresql as the mount target. Postgres actually stores data in /var/lib/postgresql/data — mounting the parent can cause initialization to fail if that directory already has files in it.

name: pasteCTL_web

x-logging: &default-logging
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

services:
  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile
    container_name: pastectl_backend
    ports:
      - "8080:8080"
    environment:
      - FRONTEND_URL=http://localhost:3000
      - DATABASE_URL=postgres://user:password@db:5432/pastectl
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:8080/api/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
    restart: unless-stopped
    logging: *default-logging

  db:
    image: postgres:18-alpine
    container_name: pastectl_db
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: pastectl
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d pastectl"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    logging: *default-logging

  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile
      args:
        - NEXT_PUBLIC_BACKEND_URL=http://localhost:8080
        - NEXT_PUBLIC_WS_URL=ws://localhost:8080
    container_name: pastectl_frontend
    ports:
      - "3000:3000"
    depends_on:
      backend:
        condition: service_healthy
    restart: unless-stopped
    logging: *default-logging

volumes:
  pgdata:

The Final Numbers

3.58GB down to ~205MB. Both services, same functionality, no changes to application code.

What You Could Still Improve

These Dockerfiles are solid for production, but there are a few things worth exploring if you want to go further:

Distroless runner for the frontend. Google's gcr.io/distroless/nodejs22-debian12 image is more locked-down than Alpine — no shell, no package manager, no utilities. Harder to debug but a smaller attack surface.

docker scout or Trivy for CVE scanning. Smaller images have fewer vulnerabilities, but Alpine and even scratch aren't immune. Running a scanner in CI catches issues before they reach production.

BuildKit cache mounts for Go modules. Instead of the COPY go.mod trick, BuildKit's --mount=type=cache keeps the module cache between builds on the same machine, making repeated local builds significantly faster.

Pin base image digests. golang:1.26-alpine is a mutable tag — it can change under you. For reproducible builds, pin to the SHA digest: FROM golang:1.26-alpine@sha256:....

Build the images in CI and push to a registry. Right now the images are built on the host. Moving the build to GitHub Actions and pushing to GHCR or Docker Hub means your production server only pulls, never builds.

The full source for pasteCTL is on GitHub if you want to look at the actual Dockerfiles in context.

Production-Ready Rate Limiter in Go: From Side Project to Distributed System

Sumedhvats — Mon, 03 Nov 2025 09:59:11 +0000

A deep dive into three algorithms, atomic Redis operations, and building a high-performance, flexible library from scratch.

When you're building a new service, rate limiting is one of those things you know you need, but you often start with something simple. Maybe it's a basic in-memory counter. But what happens when your service grows? When you move from a single server to a distributed system, that simple counter breaks down. You're stuck rewriting your rate limiting logic.

Most Go rate limiters I found forced me into a single algorithm (usually token bucket) or locked me into a specific storage backend. This was the problem I set out to solve.

I decided to build rate-limiter-go, a library that scales with you. It provides:

Multiple battle-tested algorithms
Pluggable storage (in-memory or Redis)
Atomic Redis operations for concurrency-safe, production-ready limiting

In this post, I'm going to walk you through the journey of building it: the algorithms I explored, the edge cases I found, and the final high-performance library.

Part 1: The Quest for the "Perfect" Algorithm

Rate limiting seems simple, but there are many ways to do it, each with critical trade-offs.

1. The Naive Start: Fixed Window

This is the most intuitive approach. You divide time into fixed "windows" (e.g., one minute) and allow a certain number of requests in that window.

Mental Model:
- Set a limit (e.g., 100 requests per minute).
- If the time is 12:24:02, the window is 12:24:00 to 12:24:59.
- All requests in this period increment a single counter.
- If counter > 100, reject.
- At 12:25:00, the counter resets to 0.
The Problem: Burst Errors
This algorithm has a major flaw. Imagine your limit is 100 requests/minute. A user could send 100 requests at 12:24:59 (which are allowed) and then another 100 requests at 12:25:00 (which are also allowed, as it's a new window).

This user just sent 200 requests in two seconds, effectively doubling your intended rate limit and bypassing your protection.
When to Use It: Simple, low-traffic, or single-node setups where absolute precision isn't critical.

2. The "Smooth" Approach: Token Bucket

This algorithm is a classic for a reason. It's designed to handle bursts gracefully while maintaining a steady average rate.

Mental Model:
- Each user gets a "bucket" with a maximum capacity (e.g., 100 tokens).
- The bucket is refilled at a constant rate (e.g., 10 tokens per second).
- Every request tries to consume one token.
- If a token is available, the request is allowed.
- If the bucket is empty, the request is rejected.

This is much better. It allows a user to "save up" tokens to send a short burst (up to the bucket capacity), but they can't exceed the steady-state refill rate over the long term.

Implementation Edge Cases:
- Clock Skew: In a distributed system, different servers will have different clocks, leading to inconsistent refill calculations. Solution: Use Redis server time (TIME command) as the single source of truth.
- Float Precision: Refill rates are often fractional (e.g., 1.66 tokens/sec). This can lead to floating-point precision issues. Solution: Be careful to round values before comparison.
When to Use It: This is ideal for most public APIs. It provides smooth flow control and allows for legitimate, short-term bursts of traffic.

3. The Balanced Approach: Sliding Window Counter

This was the algorithm that struck the best balance for me. It solves the "burst error" of the Fixed Window but is simpler to implement and often more performant than a Token Bucket.

Mental Model:
This algorithm smooths out the rate by considering a weighted average of the previous window and the current window.

Imagine a 1-minute window (limit 100).
- It's 12:25:15 (so, we are 25% of the way through the current window).
- Previous Window (12:24): Had 80 requests.
- Current Window (12:25): Has 10 requests so far.
We don't just look at the 10 requests. We calculate a weighted count:
- Weight of Previous Window: 75% (since 75% of the sliding window is still in the past)
- Weight of Current Window: 25%
Weighted Count = (80 requests * 75%) + (10 requests * 25%)
Weighted Count = 60 + 2.5 = 62.5

The user's current effective count is 62.5. They can continue making requests. This approach gracefully "slides" the count from one window to the next, completely eliminating the boundary burst problem.
When to Use It: My recommendation for most general-purpose, distributed rate limiting. It provides excellent accuracy and performance without the complexity of token management.

Part 2: From Theory to Production Library

Knowing the algorithms is one thing; implementing them in a production-ready way is another. Here were my core design goals for rate-limiter-go.

1. Storage Backend Abstraction

I wanted to start with in-memory storage for development and scale to Redis in production without changing my application code.

I defined a simple Storage interface:

type Storage interface {
    Get(key string) (interface{}, error)
    Set(key string, value interface{}, ttl time.Duration) error
    Delete(key string) error
    Increment(key string, amount int, ttl time.Duration) (int64, error)
}

Now, I can initialize my limiter with either backend:

// Development: in-memory
store := storage.NewMemoryStorage()

// Production: Redis (same interface)
store := storage.NewRedisStorage("redis-cluster:6379")

// Same limiter code works with both
rateLimiter := limiter.NewSlidingWindowLimiter(store, config)

2. Atomic Redis Operations

In a concurrent system, you can't just GET a value, check it, and then SET it. This is a classic race condition.

# !! RACE CONDITION !!
# Client 1 GETS count (99)
# Client 2 GETS count (99)
# Client 1 increments to 100, SETS 100. (Allowed)
# Client 2 increments to 100, SETS 100. (Also Allowed)
# !! We just allowed 101 requests !!

The solution is to perform all operations atomically. I used Lua scripts, which Redis guarantees will run without interruption.

Here is the (simplified) Lua script for the Fixed Window algorithm. It gets, checks, increments, and sets the expiry all in one atomic step.

-- Fixed Window example (simplified)
local current = tonumber(redis.call('GET', key) or '0')
if current + increment > limit then
    return 0  -- Denied
end
redis.call('INCRBY', key, increment)
redis.call('EXPIRE', key, ttl)
return 1  -- Allowed

No race conditions. No approximate counting. Just correctness.

3. High-Performance In-Memory Storage

For the in-memory backend, the obvious choice is a sync.Mutex wrapping a map[string]int. However, Go's documentation mentions sync.Map is optimized for a specific case: "when a given key is written once but read many times."

A rate limiter cache is the opposite: keys are read and written to on almost every request.

My implementation for in-memory storage uses sync.Map but leverages its CompareAndSwap (CAS) atomic operations to safely increment counters under high concurrency, which performs better than a single, global mutex blocking all goroutines.

Part 3: Putting It All Together

Here's what the final library looks like in practice.

Quick Start: 5 Lines to Rate Limiting

This is all it takes to add rate limiting to any function.

package main

import (
    "fmt"
    "time"

    "github.com/sumedhvats/rate-limiter-go/pkg/limiter"
    "github.com/sumedhvats/rate-limiter-go/pkg/storage"
)

func main() {
    // 1. Create in-memory storage
    store := storage.NewMemoryStorage()

    // 2. Create limiter: 10 requests per minute
    rateLimiter := limiter.NewSlidingWindowLimiter(store, limiter.Config{
        Rate:   10,
        Window: 1 * time.Minute,
    })

    // 3. Check if request is allowed
    allowed, err := rateLimiter.Allow("user:alice")
    if err != nil {
        panic(err)
    }

    // 4. Deny or allow
    if !allowed {
        fmt.Println("Rate limit exceeded!")
        return
    }

    // 5. Allow
    fmt.Println("Request allowed!")
}

The Most Common Use Case: HTTP Middleware

Of course, the most common need is for an HTTP API. I built a middleware that handles everything automatically.

package main

import (
    "net/http"
    "time"

    "github.com/sumedhvats/rate-limiter-go/middleware"
    "github.com/sumedhvats/rate-limiter-go/pkg/limiter"
    "github.com/sumedhvats/rate-limiter-go/pkg/storage"
)

func main() {
    // Use Redis for a distributed system
    store := storage.NewRedisStorage("localhost:6379")

    // 100 requests per minute per IP
    rateLimiter := limiter.NewSlidingWindowLimiter(store, limiter.Config{
        Rate:   100,
        Window: 1 * time.Minute,
    })

    // Apply middleware
    mux := http.NewServeMux()
    mux.HandleFunc("/api/data", dataHandler)

    // The middleware automatically uses IP address as the key
    handler := middleware.RateLimitMiddleware(middleware.Config{
        Limiter: rateLimiter,
    })(mux)

    http.ListenAndServe(":8080", handler)
}

func dataHandler(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("Data served successfully"))
}

This middleware automatically:

Extracts the client IP (handling X-Forwarded-For proxies).
Returns a 429 Too Many Requests JSON error.
Adds standard rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset).

Part 4: The Proof: Does it Scale?

I built this for performance, so I benchmarked it heavily. Here are the results on my 12th Gen Intel i5.

This first test shows a realistic, concurrent load with many different keys (e.g., many different users hitting the API).

Multiple Keys (Realistic Load) - Concurrent
| Algorithm | Time/op | Memory/op|
|--------------------|--------------|----------|
| Sliding Window | 68 ns/op | 100 B/op |
| Token Bucket | 76 ns/op | 160 B/op |
| Fixed Window | 130 ns/op | 261 B/op |

This test shows how the system scales when hammering the cache with 10,000 unique keys.

Scalability (10K Keys) - Concurrent
| Algorithm | Time/op | Throughput |
|--------------------|----------|-----------------|
| Token Bucket | 56 ns/op | ~17M ops/sec |
| Sliding Window | 74 ns/op | ~13M ops/sec |
| Fixed Window | 95 ns/op | ~10M ops/sec |

Key Insights:

Sliding Window and Token Bucket are the clear winners, both able to handle 13-17 million operations per second on a single core.
They are incredibly lightweight, using 100-160 bytes per operation.
The performance scales linearly with the number of keys.

Conclusion

Building this library was a fantastic journey through algorithm design, concurrency patterns in Go, and atomic database operations with Redis.

I started with a simple goal: create a rate limiter that wouldn't need to be rewritten when a project scaled. The result is a library that lets you choose the right algorithm for the job, scales from a single in-memory instance to a distributed Redis cluster, and operates with atomic, concurrency-safe guarantees.

If you want to check out the code, contribute, or use the library in your own project, you can find it on GitHub.

Thanks for reading!