IronSoftware

Posted on Feb 20

Gotenberg Docker Setup: Understanding the Hidden Complexity (Fixed)

#csharp #dotnet

Gotenberg has become a popular choice for developers seeking HTML to PDF conversion in containerized environments. The project markets itself as a "developer-friendly API" that bundles Chromium and LibreOffice into a Docker image. This sounds convenient on paper, but the architecture introduces operational complexity that many teams only discover after deployment.

This article examines the real-world challenges of running Gotenberg in production: the DevOps overhead of managing a separate service, network latency implications, container orchestration complexity, and the ongoing maintenance burden. It also presents an alternative architectural approach that eliminates these concerns.

The Gotenberg Architecture

Gotenberg operates as a stateless HTTP API packaged in a Docker container. To convert an HTML document to PDF, your application must:

Run the Gotenberg container as a separate service
Send HTTP requests with multipart form data containing your HTML
Receive the PDF binary in the HTTP response
Handle timeouts, retries, and error cases across the network boundary

The basic setup requires pulling the image and running it:

docker run --rm -p 3000:3000 gotenberg/gotenberg:8

While this single command appears simple, production deployments demand considerably more configuration.

Container Configuration Complexity

The Gotenberg image exposes numerous configuration flags for tuning Chromium and LibreOffice behavior. A production-ready docker-compose configuration often looks like this:

version: "3.8"
services:
  gotenberg:
    image: gotenberg/gotenberg:8
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=false"
      - "--chromium-allow-list=file:///tmp/.*"
      - "--chromium-deny-list="
      # The next two flags relax Chromium's security checks; enable them only for trusted input
      - "--chromium-ignore-certificate-errors=true"
      - "--chromium-disable-web-security=true"
      - "--api-timeout=180s"
      - "--chromium-max-queue-size=20"
      - "--libreoffice-disable-routes=false"
    ports:
      - "3000:3000"
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 2G
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

This configuration addresses several issues developers encounter:

Memory limits prevent runaway Chromium processes from consuming all host resources
Health checks enable container orchestrators to restart failed instances
Queue size limits prevent request pile-up during traffic spikes
Timeout configuration balances between allowing complex renders and preventing hung requests

Each of these settings requires understanding both Gotenberg's internals and your specific workload characteristics.

DevOps Overhead

Running Gotenberg means operating another service in your infrastructure. This creates ongoing work that compounds over time.

Service Monitoring

Your existing application monitoring needs to extend to Gotenberg:

Response time tracking for conversion requests
Error rate monitoring for failed conversions
Memory and CPU utilization alerts
Queue depth monitoring to detect backpressure
Health check integration with your alerting system
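On the calling side, the first two items can be captured where the application invokes the conversion. A minimal sketch using the .NET `System.Diagnostics.Metrics` API; the `ConversionMetrics` class and the meter and instrument names are illustrative assumptions, and exporting the values to a backend is left out:

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

public static class ConversionMetrics
{
    // Meter and instrument names are illustrative assumptions.
    private static readonly Meter AppMeter = new Meter("MyApp.PdfConversion");
    private static readonly Histogram<double> Duration =
        AppMeter.CreateHistogram<double>("pdf_conversion_duration_ms", unit: "ms");
    private static readonly Counter<long> Failures =
        AppMeter.CreateCounter<long>("pdf_conversion_failures");

    // Wraps any conversion call, recording its duration and counting failures.
    public static async Task<T> TrackAsync<T>(Func<Task<T>> convert)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return await convert();
        }
        catch
        {
            Failures.Add(1);
            throw; // the caller still sees the original exception
        }
        finally
        {
            Duration.Record(stopwatch.Elapsed.TotalMilliseconds);
        }
    }
}
```

These client-side numbers include network time, so they complement rather than replace the metrics Gotenberg exposes itself.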

A typical Prometheus configuration for Gotenberg monitoring:

- job_name: 'gotenberg'
  static_configs:
    - targets: ['gotenberg:3000']
  metrics_path: '/prometheus/metrics'
  scrape_interval: 15s

You then need Grafana dashboards, alert rules, and runbooks for when things go wrong.

Version Management

Gotenberg releases new versions regularly. Each update potentially changes:

Chromium version (affecting rendering behavior)
LibreOffice version (affecting document conversion)
API behavior or configuration options
Resource consumption patterns

Staying on old versions risks security vulnerabilities and missing bug fixes. Upgrading requires testing to ensure your existing conversions still work correctly. This creates a recurring maintenance task that many teams underestimate.

Security Considerations

Gotenberg accepts arbitrary HTML and converts it using a full browser engine. This creates security surface area:

The service should not be exposed to the public internet
Input validation at the application layer remains essential
Network policies should restrict which services can reach Gotenberg
Container security hardening (non-root user, read-only filesystem) adds complexity

# Kubernetes NetworkPolicy example
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: gotenberg-ingress
spec:
  podSelector:
    matchLabels:
      app: gotenberg
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: your-application
      ports:
        - protocol: TCP
          port: 3000

Network Latency Impact

Every PDF conversion requires a round trip over the network. This adds latency that accumulates in high-volume scenarios.

Request Flow Analysis

A typical Gotenberg HTML to PDF conversion involves:

1. Serialization: Your application serializes HTML, CSS, and assets into multipart form data
2. Network transfer (outbound): The request travels to Gotenberg over TCP
3. Parsing: Gotenberg parses the multipart request
4. Chromium rendering: The actual PDF generation occurs
5. Network transfer (inbound): The PDF binary returns to your application
6. Deserialization: Your application reads the response bytes

Steps 2 and 5 introduce latency that does not exist with in-process conversion. Within a Kubernetes cluster, this might add 1-5ms per request. Across availability zones or regions, the penalty grows to 10-50ms or more.
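To see how these per-request figures add up, here is a rough back-of-the-envelope sketch. It assumes a fixed overhead per request and a fixed number of requests in flight at once; the `LatencyEstimate` class is hypothetical and the model ignores rendering time entirely:

```csharp
using System;

public static class LatencyEstimate
{
    // Total added network time for a batch, assuming each request pays a fixed
    // overhead and up to `concurrency` requests are in flight at the same time.
    public static TimeSpan NetworkOverhead(int requests, double perRequestMs, int concurrency = 1)
    {
        if (requests < 0 || concurrency < 1) throw new ArgumentOutOfRangeException();
        int waves = (requests + concurrency - 1) / concurrency; // ceiling division
        return TimeSpan.FromMilliseconds(waves * perRequestMs);
    }
}
```

Under this model, 1,000 sequential requests at 5ms each add 5 seconds of pure network time. At 50ms each, the same batch adds 50 seconds sequentially, or 5 seconds with 10 requests in flight.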

Throughput Constraints

Network-based architecture creates throughput limitations:

TCP connection overhead for each request (or connection pool management)
Serialization/deserialization CPU cost
Network bandwidth consumption for large HTML documents or PDFs
Potential for network-related failures (timeouts, connection resets)
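Connection pool management usually means configuring one long-lived `HttpClient` rather than creating a client per request. A minimal sketch, assuming the `gotenberg:3000` service address used elsewhere in this article; the `GotenbergHttp` class is hypothetical and the pool sizes and timeouts are illustrative values, not recommendations:

```csharp
using System;
using System.Net.Http;

public static class GotenbergHttp
{
    // Builds a client intended to be created once and shared for the
    // lifetime of the application.
    public static HttpClient CreateClient()
    {
        var handler = new SocketsHttpHandler
        {
            // Cap concurrent connections to the conversion service
            MaxConnectionsPerServer = 10,
            // Recycle connections periodically so DNS changes are picked up
            PooledConnectionLifetime = TimeSpan.FromMinutes(2),
            PooledConnectionIdleTimeout = TimeSpan.FromSeconds(30),
        };

        return new HttpClient(handler)
        {
            BaseAddress = new Uri("http://gotenberg:3000/"),
            // Slightly above a 180s server-side timeout so the server answers first
            Timeout = TimeSpan.FromSeconds(190),
        };
    }
}
```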

Consider a scenario generating 1,000 invoices. With Gotenberg, each invoice requires a network round trip. Even with connection pooling and parallelization, the network overhead accumulates:

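// Requires: using System.Net.Http; using System.Text; using System.Threading.Tasks;

// Gotenberg approach - each conversion is a network call
// Reuse one HttpClient; creating a client per call can exhaust sockets under load
private static readonly HttpClient Client = new HttpClient();

public async Task<byte[]> ConvertWithGotenberg(string html)
{
    using var content = new MultipartFormDataContent();

    // Serialize HTML into form data; the file must be named index.html
    var htmlContent = new StringContent(html, Encoding.UTF8, "text/html");
    content.Add(htmlContent, "files", "index.html");

    // Network round trip to Gotenberg
    using var response = await Client.PostAsync(
        "http://gotenberg:3000/forms/chromium/convert/html",
        content);

    // Fail on 4xx/5xx instead of returning an error body as if it were PDF bytes
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsByteArrayAsync();
}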
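A call like this typically ends up wrapped in retry logic for transient failures. A minimal sketch with exponential backoff, assuming transient failures surface as `HttpRequestException` or `TaskCanceledException` (which `HttpClient` uses for timeouts); the `RetryHelper` class, attempt count, and delays are illustrative and not part of Gotenberg or any library:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RetryHelper
{
    // Runs an operation, retrying on exception types that are commonly
    // transient for HTTP calls (connection errors and timeouts).
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        int maxAttempts = 3,
        TimeSpan? baseDelay = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        var delay = baseDelay ?? TimeSpan.FromMilliseconds(200);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (
                attempt < maxAttempts &&
                (ex is HttpRequestException || ex is TaskCanceledException))
            {
                // Exponential backoff: wait, then double the wait for the next attempt
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

A call site might look like `await RetryHelper.ExecuteAsync(() => ConvertWithGotenberg(html))`. If the wrapped call throws `HttpRequestException` for HTTP error statuses, 4xx responses would be retried as well; a stricter version would inspect the status code and retry only failures that can plausibly succeed on a second attempt.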

Container Orchestration Challenges

Production deployments rarely run a single Gotenberg instance. Scaling introduces additional complexity.

Kubernetes Deployment Considerations

A production Kubernetes deployment requires:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gotenberg
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gotenberg
  template:
    metadata:
      labels:
        app: gotenberg
    spec:
      containers:
        - name: gotenberg
          image: gotenberg/gotenberg:8
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: gotenberg
spec:
  selector:
    app: gotenberg
  ports:
    - port: 3000
      targetPort: 3000
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gotenberg
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gotenberg
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

This configuration addresses several concerns:

Replicas: Multiple instances for availability and throughput
Resource limits: Prevent individual pods from consuming excessive resources
Health probes: Enable Kubernetes to detect and replace unhealthy instances
HPA: Automatic scaling based on load

Each component requires tuning based on your workload, and misconfiguration leads to either wasted resources or degraded performance.

State and Stickiness Issues

While Gotenberg is stateless by design, certain conversion scenarios benefit from request affinity:

Multi-step conversions that share temporary files
Conversions requiring pre-warmed Chromium instances
Debugging scenarios where you need consistent routing

Implementing session affinity adds another layer of configuration and can conflict with load balancing efficiency.

Resource Overhead

Running Gotenberg means dedicating compute resources to a service that sits idle between conversion requests.

Memory Consumption

A running Gotenberg instance with Chromium consumes significant memory even when idle:

Base container: ~500MB
Chromium process: ~300-500MB additional
LibreOffice (if enabled): ~200-400MB additional
Per-request overhead: varies with document complexity

For a three-replica deployment at the 4GB per-instance limit used in the configurations above, that is 12GB of memory set aside for Gotenberg alone.

CPU Utilization Patterns

PDF conversion is CPU-intensive during rendering but leaves CPU idle between requests. Unless your traffic patterns show consistent conversion load, you pay for capacity that sits unused. The bursty nature of most PDF generation workloads makes right-sizing difficult.

An Alternative Approach: Embedded Conversion

The operational complexity of Gotenberg stems from its architecture as a separate service. An alternative approach embeds PDF conversion directly into your application process, eliminating the service boundary.

IronPDF takes this approach, packaging a Chrome-based rendering engine as a NuGet package. Conversion happens in-process without network calls:

using IronPdf;

public class InvoiceGenerator
{
    public byte[] GenerateInvoice(InvoiceData data)
    {
        // Create renderer - Chrome engine is embedded
        var renderer = new ChromePdfRenderer();

        // Configure rendering options
        renderer.RenderingOptions.MarginTop = 20;
        renderer.RenderingOptions.MarginBottom = 20;
        renderer.RenderingOptions.PaperSize = IronPdf.Rendering.PdfPaperSize.A4;

        // Build HTML from template
        string html = BuildInvoiceHtml(data);

        // Convert in-process - no network call
        PdfDocument pdf = renderer.RenderHtmlAsPdf(html);

        return pdf.BinaryData;
    }

    private string BuildInvoiceHtml(InvoiceData data)
    {
        return $@"
            <!DOCTYPE html>
            <html>
            <head>
                <style>
                    body {{ font-family: Arial, sans-serif; }}
                    .invoice-header {{ display: flex; justify-content: space-between; }}
                    .line-items {{ width: 100%; border-collapse: collapse; }}
                    .line-items th, .line-items td {{
                        border: 1px solid #ddd;
                        padding: 8px;
                        text-align: left;
                    }}
                </style>
            </head>
            <body>
                <div class='invoice-header'>
                    <h1>Invoice #{data.InvoiceNumber}</h1>
                    <p>Date: {data.Date:yyyy-MM-dd}</p>
                </div>
                <!-- Invoice content here -->
            </body>
            </html>";
    }
}
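Because the template interpolates data values straight into markup, any value that can contain user-supplied text should be HTML-encoded first, whichever rendering engine is used. A minimal sketch using `System.Net.WebUtility.HtmlEncode`; the `InvoiceHtml` helper and its parameters are hypothetical and only cover the header fragment:

```csharp
using System;
using System.Globalization;
using System.Net;

public static class InvoiceHtml
{
    // Encodes the invoice number before it is placed into the markup, so that
    // characters such as < and & are rendered as text rather than parsed as HTML.
    public static string Header(string invoiceNumber, DateTime date)
    {
        string safeNumber = WebUtility.HtmlEncode(invoiceNumber);
        string isoDate = date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        return $"<h1>Invoice #{safeNumber}</h1><p>Date: {isoDate}</p>";
    }
}
```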

Architectural Comparison

Aspect	Gotenberg (Service)	IronPDF (Embedded)
Deployment	Separate container	NuGet package
Network calls	Required	None
Scaling	Independent service scaling	Scales with application
Monitoring	Separate metrics pipeline	Application metrics
Versioning	Container image updates	Package updates
Latency	Network round-trip	In-process
Resource isolation	Container boundaries	Shared with the application process

Deployment Simplification

With embedded conversion, your Dockerfile remains straightforward:

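# Runtime stage; assumes an earlier stage named "build" that ran dotnet publish
FROM mcr.microsoft.com/dotnet/aspnet:8.0
WORKDIR /app
# On Linux images the embedded Chrome engine may need extra native packages;
# see the IronPDF Docker guide for the list that matches your base image
COPY --from=build /app/publish .
ENTRYPOINT ["dotnet", "YourApplication.dll"]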

No sidecar containers, no service mesh configuration, no inter-service authentication. Your application handles PDF conversion as another function call.

Code Comparison

Batch invoice generation illustrates the difference:

Gotenberg approach:

public async Task<List<byte[]>> GenerateInvoicesBatch(List<InvoiceData> invoices)
{
    using var client = new HttpClient { Timeout = TimeSpan.FromMinutes(5) };

    // Process in parallel with concurrency limit
    using var semaphore = new SemaphoreSlim(10);
    var tasks = invoices.Select(async invoice =>
    {
        await semaphore.WaitAsync();
        try
        {
            using var content = new MultipartFormDataContent();
            var html = BuildInvoiceHtml(invoice);
            content.Add(new StringContent(html), "files", "index.html");

            using var response = await client.PostAsync(
                "http://gotenberg:3000/forms/chromium/convert/html",
                content);

            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsByteArrayAsync();
        }
        finally
        {
            semaphore.Release();
        }
    });

    return (await Task.WhenAll(tasks)).ToList();
}

IronPDF approach:

public List<byte[]> GenerateInvoicesBatch(List<InvoiceData> invoices)
{
    var renderer = new ChromePdfRenderer();
    renderer.RenderingOptions.PaperSize = IronPdf.Rendering.PdfPaperSize.A4;

    // Process in parallel - no network, no semaphore needed for external service
    return invoices
        .AsParallel()
        .AsOrdered() // keep results in the same order as the input list
        .Select(invoice =>
        {
            string html = BuildInvoiceHtml(invoice);
            using PdfDocument pdf = renderer.RenderHtmlAsPdf(html);
            return pdf.BinaryData;
        })
        .ToList();
}

The embedded approach eliminates:

HTTP client configuration
Semaphore-based concurrency limiting for external service
Network timeout handling
Response deserialization
Retry logic for transient network failures
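One caveat: in-process rendering shares CPU and memory with the rest of the application, so a cap on parallelism can still be worthwhile even without an external service to protect. A minimal sketch using PLINQ; the `BatchRenderer` class is hypothetical and the `render` delegate stands in for a call such as `renderer.RenderHtmlAsPdf(html).BinaryData`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchRenderer
{
    // Renders every document with at most `maxParallelism` renders running at
    // once, returning results in the same order as the input.
    public static List<byte[]> RenderAll(
        IReadOnlyList<string> htmlDocuments,
        Func<string, byte[]> render,
        int maxParallelism)
    {
        return htmlDocuments
            .AsParallel()
            .AsOrdered()                              // keep output aligned with input
            .WithDegreeOfParallelism(maxParallelism)  // cap concurrent renders
            .Select(render)
            .ToList();
    }
}
```

A reasonable starting value for `maxParallelism` is probably somewhere near the number of CPU cores available to the process, adjusted after measuring memory use under load.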

Platform Support

IronPDF runs on the same platforms where your .NET application runs:

Windows (x64, x86)
Linux (Debian, Ubuntu, CentOS, Alpine)
macOS (Intel and Apple Silicon)
Docker containers
Azure App Service, AWS Lambda, Google Cloud Run

The rendering engine binaries are included in the NuGet package and extracted at runtime.

When Gotenberg Makes Sense

Despite the complexity, Gotenberg remains appropriate for certain scenarios:

Polyglot environments: When PDF generation is needed from multiple applications written in different languages
Strict resource isolation: When PDF conversion must be isolated from application memory for security or stability
Existing microservices infrastructure: When your team already operates extensive service meshes and the incremental cost is minimal
LibreOffice document conversion: When you need Word/Excel to PDF conversion alongside HTML conversion

Migration Considerations

Teams moving from Gotenberg to embedded conversion should consider:

API Surface Changes

Gotenberg uses multipart form requests. IronPDF uses method calls:

// Gotenberg: HTTP multipart request
// POST /forms/chromium/convert/html
// Content-Type: multipart/form-data
// files: index.html

// IronPDF: Method call
var pdf = renderer.RenderHtmlAsPdf(htmlString);
// or, from a file on disk
var pdfFromFile = renderer.RenderHtmlFileAsPdf("path/to/file.html");
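One way to keep this change contained is to put both styles behind a small interface, so call sites do not care which engine is in use. A minimal sketch; `IPdfConverter` and `DelegatePdfConverter` are hypothetical names, and neither Gotenberg nor IronPDF defines them:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical abstraction over "HTML in, PDF bytes out".
public interface IPdfConverter
{
    Task<byte[]> ConvertHtmlAsync(string html);
}

// Adapter that wraps any conversion function, for example an HTTP call to
// Gotenberg or an in-process renderer call, behind the same interface.
public sealed class DelegatePdfConverter : IPdfConverter
{
    private readonly Func<string, Task<byte[]>> _convert;

    public DelegatePdfConverter(Func<string, Task<byte[]>> convert)
    {
        _convert = convert ?? throw new ArgumentNullException(nameof(convert));
    }

    public Task<byte[]> ConvertHtmlAsync(string html) => _convert(html);
}
```

During a migration, the application could register a Gotenberg-backed converter first and swap in an embedded one later without touching the callers.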

Configuration Translation

Common Gotenberg configurations map to IronPDF options:

var renderer = new ChromePdfRenderer();

// Gotenberg: --chromium-disable-javascript=true
renderer.RenderingOptions.EnableJavaScript = false;

// Gotenberg: waitDelay=1s (a form field on the conversion request, not a startup flag)
renderer.RenderingOptions.WaitFor.RenderDelay(1000);

// Gotenberg: margin form fields on the request (IronPDF margins are in millimeters)
renderer.RenderingOptions.MarginTop = 10;
renderer.RenderingOptions.MarginBottom = 10;
renderer.RenderingOptions.MarginLeft = 10;
renderer.RenderingOptions.MarginRight = 10;

// Gotenberg: paper size in request
renderer.RenderingOptions.PaperSize = IronPdf.Rendering.PdfPaperSize.A4;
// or custom size
renderer.RenderingOptions.SetCustomPaperSizeInMillimeters(210, 297);
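Units deserve a check when translating margins. As far as I know, Gotenberg's margin form fields default to inches while IronPDF's margin properties take millimeters, so treat that as an assumption to verify against both sets of documentation. A small conversion sketch with a hypothetical `MarginUnits` helper:

```csharp
public static class MarginUnits
{
    private const double MillimetersPerInch = 25.4;

    // Converts a margin expressed in inches to millimeters.
    public static double InchesToMillimeters(double inches) => inches * MillimetersPerInch;
}
```

For example, a 0.5 inch margin corresponds to `MarginUnits.InchesToMillimeters(0.5)`, which is 12.7 millimeters.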

Licensing

IronPDF is commercial software. Evaluate the licensing cost against the operational cost savings from eliminating a separate service. For many teams, the reduced DevOps burden justifies the license expense.

A free trial allows testing with your actual workload before commitment.

Conclusion

Gotenberg's Docker-based architecture buys development convenience at the cost of operational complexity. Running a separate conversion service means managing containers, configuring orchestration, monitoring another system, handling network failures, and accepting latency overhead.

Embedding PDF conversion in your application process eliminates these concerns. The conversion code becomes part of your application, scaling and deploying together, without network boundaries or service management overhead.

For .NET teams, IronPDF provides Chrome-based HTML rendering as a NuGet package, matching Gotenberg's conversion quality while removing the architectural complexity. The approach particularly benefits teams without dedicated DevOps capacity or those seeking to reduce their operational footprint.

Jacob Mellor is CTO at Iron Software with over 25 years building developer tools.

References

Gotenberg Documentation - Official installation and configuration guide
Gotenberg GitHub Repository - Source code and issue tracker
Gotenberg Docker Hub - Official Docker image
IronPDF for .NET - Embedded Chrome PDF generation for .NET
IronPDF Docker Guide - Running IronPDF in containers
ChromePdfRenderer API Reference - IronPDF rendering options

For the latest IronPDF documentation and tutorials, visit ironpdf.com.
