Alexandr Bandurchin for Uptrace

Originally published at uptrace.dev

OpenTelemetry Filelog Receiver

📋 Part of the OpenTelemetry ecosystem: The Filelog Receiver is a component of the OpenTelemetry Collector that transforms file-based logs into structured OpenTelemetry format. New to OpenTelemetry? Start with What is OpenTelemetry?

Traditional log collection ties you to specific backends. The OpenTelemetry Filelog Receiver breaks this lock-in by converting file logs into a vendor-neutral format, allowing you to send data to any compatible backend (Uptrace, Jaeger, Elasticsearch, Splunk).

This guide takes you from basic file tailing to production-grade log pipelines with parsing, multiline handling, and Kubernetes integration.

What is the Filelog Receiver

The Filelog Receiver reads log files from disk, parses their content, and converts them into OpenTelemetry log records. It's part of the OpenTelemetry Collector Contrib distribution.

Why OpenTelemetry format?

OpenTelemetry provides a vendor-neutral standard for telemetry data. Instead of being locked into proprietary formats, your logs become portable:

  • Vendor-neutral: Send logs to any backend without changing instrumentation
  • Structured: Consistent format with timestamps, severity, attributes
  • Correlated: Link logs to distributed traces and metrics
  • Standardized: Part of CNCF OpenTelemetry specification

💡 Learn more: Read our OpenTelemetry Architecture guide to understand how components fit together.

Core capabilities:

  • Tails log files continuously
  • Handles log rotation automatically
  • Parses JSON, regex, and structured formats
  • Combines multiline entries (stack traces)
  • Checkpoints file positions across restarts
  • Enriches logs with metadata

How it works - the 4-step loop:

flowchart LR
  A[1. Discover Files] --> B[2. Read Lines]
  B --> C[3. Parse & Transform]
  C --> D[4. Emit Logs]
  D --> A
  1. Discover: Scans filesystem using include/exclude patterns
  2. Read: Opens files and reads new lines based on start_at setting
  3. Parse: Applies operators to structure raw text
  4. Emit: Sends structured logs to the pipeline
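
A minimal sketch of the settings that drive this loop (paths are placeholders):

receivers:
  filelog:
    include: [/var/log/app/*.log]       # 1. which files to discover
    exclude: [/var/log/app/debug.log]   # ...and which to skip
    start_at: end                       # 2. read only new lines (default)
    poll_interval: 200ms                # how often to rescan (default)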

Quick Start: Your First Log Collection

📚 Prerequisites: This guide assumes you have the OpenTelemetry Collector installed. Need help getting started? See our Collector installation guide.

💡 Backend Configuration: Examples in this guide use Uptrace as the observability backend. OpenTelemetry is vendor-neutral - you can send logs to Jaeger, Grafana Cloud, Elasticsearch, Splunk, or any OTLP-compatible platform. See backend examples at the end of this guide.

Let's start simple. Imagine your app writes JSON logs to /var/log/myapp.log:

{"time":"2025-10-06 14:30:00","level":"ERROR","message":"Connection failed","user_id":"123"}

Here's the minimal configuration:

receivers:
  filelog:
    include: [/var/log/myapp.log]
    start_at: beginning
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        severity:
          parse_from: attributes.level

exporters:
  otlp/uptrace:  # OTLP = OpenTelemetry Protocol
    endpoint: api.uptrace.dev:4317
    headers:
      uptrace-dsn: 'YOUR_UPTRACE_DSN_HERE'

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlp/uptrace]

💡 About OTLP: OpenTelemetry Protocol (OTLP) is the standard protocol for transmitting telemetry data. Learn more about configuring exporters.

What happens:

  • File is discovered and opened
  • Each line is parsed as JSON
  • time becomes the log timestamp
  • level becomes severity
  • All fields become attributes

Result: Structured logs sent to your OpenTelemetry APM.
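
If you temporarily swap in the debug exporter (see Troubleshooting below), the emitted record looks roughly like this (format approximated, values illustrative):

LogRecord #0
Timestamp: 2025-10-06 14:30:00 +0000 UTC
SeverityText: ERROR
Body: Str({"time":"2025-10-06 14:30:00","level":"ERROR","message":"Connection failed","user_id":"123"})
Attributes:
     -> time: Str(2025-10-06 14:30:00)
     -> level: Str(ERROR)
     -> message: Str(Connection failed)
     -> user_id: Str(123)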

Understanding Operators

Operators are the building blocks of log processing. Each operator does one thing well:

Raw Log → [Operator 1] → [Operator 2] → [Operator 3] → Structured Log

Common Operators

JSON Parser - for structured JSON logs:

receivers:
  filelog:
    include: [/var/log/app/*.json]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%fZ'

Regex Parser - for text logs:

receivers:
  filelog:
    include: [/var/log/app.log]
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.*)$'
        timestamp:
          parse_from: attributes.time
          layout_type: gotime  # Go-style layout requires gotime (default is strptime)
          layout: '2006-01-02 15:04:05'
        severity:
          parse_from: attributes.level

Example log:

2025-10-06 14:30:00 [ERROR] Database connection timeout

Add Attributes - enrich with metadata:

operators:
  - type: add
    field: attributes.environment
    value: production
  - type: add
    field: attributes["service.name"]  # bracket syntax keeps the dotted key flat
    value: payment-api

Container Parser - for Kubernetes (simplified):

receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]
    include_file_path: true
    operators:
      - type: container  # Handles all container formats automatically

🔗 Kubernetes Logs: For complete Kubernetes log collection, see our Kubernetes Log Management guide.

Automatically extracts:

  • Kubernetes namespace, pod, container names
  • Container runtime format (CRI-O, containerd, Docker)
  • stdout/stderr stream
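
For example, a line from /var/log/pods/default_payment-api-abc123_<uid>/app/0.log comes out with attributes along these lines (attribute names per the container operator docs, values illustrative):

attributes:
  k8s.namespace.name: default
  k8s.pod.name: payment-api-abc123
  k8s.container.name: app
  k8s.container.restart_count: "0"
  log.iostream: stdout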

Multiline Logs: Stack Traces

Stack traces span multiple lines. Without configuration, each line becomes a separate log:

2025-10-06 14:30:00 ERROR Exception occurred
    at com.example.App.main(App.java:42)
    at java.lang.Thread.run(Thread.java:829)

Solution: Use multiline to combine them:

receivers:
  filelog:
    include: [/var/log/app/errors.log]
    multiline:
      line_start_pattern: '^\d{4}-\d{2}-\d{2}'  # New entry starts with date
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>\w+) (?P<message>[\s\S]*)$'

Now the entire stack trace is captured as one log record.

Common patterns:

# Java exceptions
line_start_pattern: '^(\d{4}-\d{2}-\d{2}|Exception|Caused by:)'

# Python tracebacks
line_start_pattern: '^(Traceback|\d{4}-\d{2}-\d{2})'

# Go panics
line_start_pattern: '^(panic:|\d{4}/\d{2}/\d{2})'

Kubernetes Log Collection

🔗 Integration Tip: For automated deployment in Kubernetes, use the OpenTelemetry Operator which simplifies Collector management and auto-instrumentation.

Simple Method (Recommended)

Use the container operator for automatic parsing:

receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]
    include_file_path: true
    operators:
      - type: container

processors:
  batch:
    timeout: 10s

exporters:
  otlp/uptrace:
    endpoint: api.uptrace.dev:4317
    headers:
      uptrace-dsn: 'YOUR_UPTRACE_DSN_HERE'

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [otlp/uptrace]

This automatically:

  • Detects container runtime format
  • Parses timestamps and metadata
  • Handles partial log lines
  • Extracts Kubernetes metadata
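
If you also need pod labels, annotations, or deployment names, you can optionally add the k8sattributes processor to the same pipeline; a minimal sketch:

processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.deployment.name
        - k8s.node.name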

DaemonSet Deployment

Deploy as DaemonSet to collect from all nodes:

📖 Full Kubernetes Setup: See our complete Kubernetes Monitoring with OpenTelemetry guide for cluster-wide observability.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.136.0
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
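
The manifest above omits the Collector configuration itself. In practice you would typically mount it from a ConfigMap and point the Collector at it with --config; a sketch (names assumed):

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  config.yaml: |
    receivers:
      filelog:
        include: [/var/log/pods/*/*/*.log]
        include_file_path: true
        operators:
          - type: container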

State Persistence: Never Lose Logs

Without persistence, restarting the Collector means:

  • start_at: beginning → Re-read everything (duplicates!)
  • start_at: end → Miss logs written during downtime

Solution: Checkpointing with storage extension:

extensions:
  file_storage:
    directory: /var/lib/otelcol/file_storage
    timeout: 10s

receivers:
  filelog:
    include: [/var/log/app.log]
    storage: file_storage  # Link to storage extension

service:
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlp/uptrace]

What happens:

  • File positions saved to disk
  • On restart, reading resumes from last position
  • No duplicates, no data loss

Essential for production!
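
One caveat: the checkpoint directory itself must survive restarts. If the Collector runs in a container, keep it on a persistent volume; a Docker Compose sketch (service and volume names assumed):

services:
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.136.0
    volumes:
      - ./otelcol-config.yaml:/etc/otelcol-contrib/config.yaml
      - otelcol-storage:/var/lib/otelcol/file_storage

volumes:
  otelcol-storage: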

Real-World Examples

NGINX Access Logs

receivers:
  filelog/nginx:
    include: [/var/log/nginx/access.log]
    operators:
      - type: regex_parser
        regex: '^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" (?P<status>\d{3}) (?P<bytes_sent>\d+)'
        timestamp:
          parse_from: attributes.time
          layout_type: gotime  # Go-style layout requires gotime (default is strptime)
          layout: '02/Jan/2006:15:04:05 -0700'
        severity:
          parse_from: attributes.status
          mapping:           # keys are severity levels, values are matched inputs
            warn: "401"
            error: 5xx       # special alias for the 500-599 range

Handles:

192.168.1.1 - - [06/Oct/2025:14:30:00 +0000] "GET /api/users HTTP/1.1" 200 1234

PostgreSQL Logs

receivers:
  filelog/postgres:
    include: [/var/log/postgresql/*.log]
    multiline:
      line_start_pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
    operators:
      - type: regex_parser
        regex: '^(?P<timestamp>\S+ \S+ \w+) \[(?P<pid>\d+)\] (?P<level>\w+):  (?P<message>[\s\S]*)$'
        timestamp:
          parse_from: attributes.timestamp
          layout_type: gotime  # Go-style layout requires gotime (default is strptime)
          layout: '2006-01-02 15:04:05.000 MST'
        severity:
          parse_from: attributes.level

Application with Trace Context

receivers:
  filelog/app:
    include: [/var/log/app/*.json]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%fZ'
        trace:
          trace_id:
            parse_from: attributes.trace_id
          span_id:
            parse_from: attributes.span_id

Log example:

{
  "timestamp": "2025-10-06T14:30:00.123Z",
  "level": "error",
  "msg": "Payment failed",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7"
}

Trace context links logs to distributed traces!

Troubleshooting

Logs Not Appearing

Problem: No logs show up in your backend.

Check 1: Permissions

# Verify collector can read files
ls -la /var/log/myapp.log
# Check collector process user
ps aux | grep otelcol

Fix:

# Add collector user to log group
usermod -a -G adm otelcol
# Or adjust permissions
chmod 644 /var/log/myapp.log

Check 2: start_at Setting

Default is end (only new logs). For testing, use:

start_at: beginning

Check 3: File Glob Pattern

# Test your pattern
ls /var/log/pods/*/*/*.log
# Verify files exist
find /var/log -name "*.log"

Regex Not Matching

First, test the pattern outside the Collector:

# Test pattern locally
echo "2025-10-06 14:30:00 ERROR Test" | grep -P '^\d{4}-\d{2}-\d{2}'

Add debug exporter:

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]

Check logs:

journalctl -u otelcol -f | grep -i error

Multiline Not Working

Common mistake:

# Bad - pattern too specific
multiline:
  line_start_pattern: '^2025-10-06'

# Good - flexible pattern
multiline:
  line_start_pattern: '^\d{4}-\d{2}-\d{2}'

High Memory Usage

Cause: Too many files or large entries.

Solutions:

  1. Exclude unnecessary files:
exclude:
  - /var/log/debug-*.log
  - /var/log/archive/*.log
  2. Limit entry size:
receivers:
  filelog:
    max_log_size: 1MiB
  3. Add memory limiter:
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
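
These fixes can be combined on a single receiver (paths assumed):

receivers:
  filelog:
    include: [/var/log/*.log]
    exclude:
      - /var/log/debug-*.log
      - /var/log/archive/*.log
    max_log_size: 1MiB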

Performance Optimization

Batching

Reduce network calls:

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [otlp/uptrace]

Guidelines:

  • Low latency: timeout: 1-5s
  • High volume: send_batch_size: 1024-2048
  • Balance both for production

Resource Limits

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  batch:
    timeout: 10s

Important: Memory limiter must be first processor!
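
In the service pipeline, that ordering looks like:

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [memory_limiter, batch]  # memory_limiter first
      exporters: [otlp/uptrace]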

Polling Interval

receivers:
  filelog:
    poll_interval: 500ms  # Default: 200ms
  • Lower: More CPU, faster detection
  • Higher: Less CPU, higher latency
  • Default (200ms) good for most cases

Advanced Features

Compressed Files

receivers:
  filelog:
    include: [/var/log/archive/*.log.gz]
    compression: auto  # or 'gzip'

Header Metadata

receivers:
  filelog:
    include: [/var/log/app.log]
    start_at: beginning
    header:
      pattern: '^# METADATA:.*$'
      metadata_operators:
        - type: regex_parser
          regex: 'version="(?P<app_version>[^"]+)"'

File example:

# METADATA: version="1.2.3" env="production"
2025-10-06 14:30:00 INFO App started

The extracted metadata is attached to every log record from that file.

Router Operator

Route logs based on content:

receivers:
  filelog:
    include: [/var/log/app.log]
    operators:
      - type: json_parser
      - type: router
        routes:
          - output: error_logs
            expr: 'attributes.level == "ERROR"'
          - output: default_logs
            expr: 'true'

      - id: error_logs
        type: add
        field: attributes.alert
        value: true

      - id: default_logs
        type: noop

FAQ

  1. Why aren't my logs appearing?
    Possible causes include:

    • Collector can't read files or directories due to permission issues.
    • start_at: end only reads new logs (use beginning for testing).
    • The include glob pattern may not match any existing files.
  2. How do I handle log rotation?
    Log rotation is handled automatically. The receiver tracks files by fingerprint, not filename.
    When app.log rotates to app.log.1, it finishes reading the old file and starts the new one.
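
    If rotated files are renamed (app.log → app.log.1), one common pattern is to match the rotated names too while excluding compressed archives, so nothing written around rotation is missed (paths assumed):

   receivers:
     filelog:
       include: [/var/log/app.log, /var/log/app.log.*]
       exclude: [/var/log/app.log.*.gz]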

  3. What's the difference between attributes and resource?

    • Attributes: Log-level metadata (varies per log)
    • Resource: Service-level metadata (shared across all logs from the same service)

    operators:
      - type: add
        field: attributes.request_id       # per-log
        value: 'req-123'
      - type: add
        field: resource["service.name"]    # per-service
        value: 'api'
  4. Can I parse logs from multiple services?
    Yes. Use multiple receivers:

   receivers:
     filelog/app1:
       include: [/var/log/app1/*.log]
       operators:
         - type: add
           field: resource["service.name"]
           value: app1

     filelog/app2:
       include: [/var/log/app2/*.log]
       operators:
         - type: add
           field: resource["service.name"]
           value: app2

   service:
     pipelines:
       logs:
         receivers: [filelog/app1, filelog/app2]
         exporters: [otlp/uptrace]
  5. How do I test regex patterns?
    Use Regex101 with the "Golang" flavor to match the Collector's regex engine.

  6. What if I have mixed compressed and uncompressed files?

   receivers:
     filelog:
       include: [/var/log/app/*.log, /var/log/app/*.log.gz]
       compression: auto  # Auto-detects .gz files
  7. How do I move attributes to resource?
   operators:
     - type: move
       from: attributes["log.file.name"]  # set by include_file_name (default true)
       to: resource["log.file.name"]
  8. Can I filter logs before sending? Yes. Use a filter processor:
   processors:
     filter:
       logs:
         exclude:
           match_type: regexp
           record_attributes:
             - key: level
               value: DEBUG
  9. How do I parse key-value logs?
   operators:
     - type: key_value_parser
       parse_from: body
       delimiter: '='  # default delimiter, shown for clarity

Example: user=john status=200 duration=123ms
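
With the key_value_parser, that example yields roughly these attributes (values illustrative):

   user: john
   status: "200"
   duration: 123ms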

  10. What's the max file size limit?
    There’s no hard limit, but you can use max_log_size to prevent memory issues:

    receivers:
      filelog:
        max_log_size: 1MiB  # Per log entry
    

Backend Examples

This guide uses Uptrace in examples, but OpenTelemetry works with any OTLP-compatible backend. Here are quick configuration examples for other platforms:

Uptrace

exporters:
  otlp/uptrace:
    endpoint: api.uptrace.dev:4317
    headers:
      uptrace-dsn: 'YOUR_UPTRACE_DSN'

Grafana Cloud

exporters:
  otlp:
    endpoint: otlp-gateway.grafana.net:443
    headers:
      authorization: "Bearer YOUR_GRAFANA_TOKEN"

Jaeger

exporters:
  otlp:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true

Datadog

exporters:
  otlp:
    endpoint: trace.agent.datadoghq.com:4317
    headers:
      dd-api-key: "YOUR_DATADOG_API_KEY"

New Relic

exporters:
  otlp:
    endpoint: otlp.nr-data.net:4317
    headers:
      api-key: "YOUR_NEW_RELIC_LICENSE_KEY"

Prometheus (metrics only)

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"

Multiple Backends

Send data to multiple platforms simultaneously:

exporters:
  otlp/uptrace:
    endpoint: api.uptrace.dev:4317
    headers:
      uptrace-dsn: 'YOUR_UPTRACE_DSN'

  otlp/datadog:
    endpoint: trace.agent.datadoghq.com:4317
    headers:
      dd-api-key: "YOUR_DATADOG_API_KEY"

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [otlp/uptrace, otlp/datadog]  # Send to both!

More backends: See the OpenTelemetry Vendors directory for 40+ compatible platforms.
