Ajit Kumar

Nginx Log Analytics with GoAccess: Local Logs, S3 Backups, and Beyond

Collecting logs is only half the job. The real value comes from turning logs into insights.

If you are running Nginx on a server, you already have everything you need to understand:

  • How many people visit your site
  • Which pages are popular
  • Where traffic comes from
  • How bots and crawlers behave
  • How much bandwidth you consume

In this article, we will explore five practical ways to analyze Nginx logs:

  1. GoAccess directly on the server
  2. GoAccess with rotated logs
  3. GoAccess from S3-backed logs
  4. Third-party log analytics services
  5. Custom analytics using Python

1. What Is GoAccess?

GoAccess is a fast, open-source, terminal-based web log analyzer that can also generate HTML dashboards.

Key features:

  • Real-time and offline analysis
  • Works directly with Nginx access logs
  • No JavaScript tracking on users
  • Can visualize bots, visitors, requests, and bandwidth
  • Produces a single static HTML file

GoAccess reads raw log files and generates analytics — no instrumentation required.


2. Understanding Nginx Log Format (Important)

Before using GoAccess, confirm your log format.

Most Nginx installations use the combined format, which is predefined in Nginx and equivalent to:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

This matters because GoAccess can only parse lines that match the format you tell it to expect; with the wrong format, the report will be empty or incomplete.
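To make the field layout concrete, here is a minimal Python sketch of a regular expression that mirrors the combined format above. The regex, the `parse_combined` helper, and the sample line are illustrative assumptions, not something GoAccess or Nginx provides.

```python
import re

# Regex mirroring the combined log_format shown above (an illustrative sketch).
COMBINED_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, made up for illustration.
sample = (
    '203.0.113.7 - - [16/Dec/2025:10:15:32 +0000] '
    '"GET /index.html HTTP/1.1" 200 5124 '
    '"https://example.com/" "Mozilla/5.0"'
)
fields = parse_combined(sample)
print(fields["request"], fields["status"], fields["body_bytes_sent"])
```

If a line from your own access log returns None here, your server is probably using a custom log_format, and GoAccess would need a matching custom --log-format as well.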


3. Option 1 — GoAccess Directly from /var/log/nginx

This is the simplest and most common setup.

Installation

sudo apt install goaccess

Verify:

goaccess --version

Generate an HTML Report

sudo goaccess /var/log/nginx/access.log \
  --log-format=COMBINED \
  --date-format=%d/%b/%Y \
  --time-format=%H:%M:%S \
  -o /var/www/html/nginx_report.html

Open in browser:

http://your-domain/nginx_report.html

Advantages

  • Fast
  • No data movement
  • Great for quick diagnostics

Limitations

  • Only current log
  • Loses history after rotation
  • Not suitable for long-term analysis

4. Option 2 — GoAccess from Rotated Logs

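To analyze historical data, include rotated logs. The -f flag lets zcat pass uncompressed files such as access.log and access.log.1 through unchanged instead of failing on them:

sudo zcat -f /var/log/nginx/access.log* | \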
goaccess --log-format=COMBINED \
  --date-format=%d/%b/%Y \
  --time-format=%H:%M:%S \
  -o /var/www/html/nginx_full_report.html

This gives you multi-day analytics.
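The same idea, reading the current log and its rotated copies as one stream, can be sketched in Python. The helper names `open_log` and `iter_log_lines` are hypothetical, and the sketch assumes rotated files end in .gz when compressed.

```python
import glob
import gzip


def open_log(path):
    """Open a log file as text, transparently handling .gz rotations."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def iter_log_lines(pattern):
    """Yield every line from all files matching the glob pattern."""
    for path in sorted(glob.glob(pattern)):
        with open_log(path) as handle:
            yield from handle
```

For example, `sum(1 for _ in iter_log_lines("/var/log/nginx/access.log*"))` would count the requests across the current and rotated logs, provided the user running it can read those files.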


5. Option 3 — GoAccess from S3 Backups (Recommended for Production)

If logs are backed up to S3, you can analyze them without touching the server.

Step 1: Download logs from S3

aws s3 sync s3://ngnix-logs/16-12-2025/ip-172-31-44-115/ ./logs/

Step 2: Analyze with GoAccess

zcat logs/access.log*.gz | \
goaccess --log-format=COMBINED \
  --date-format=%d/%b/%Y \
  --time-format=%H:%M:%S \
  -o nginx_report_dec_2025.html

This can be done:

  • On your laptop
  • On a CI server
  • On a reporting EC2 instance

Why This Is Powerful

  • Zero load on production server
  • Unlimited historical analysis
  • Easy monthly or yearly reports
  • Cheap (S3 storage is low-cost)

6. Handling Bots and Crawlers in GoAccess

GoAccess automatically detects known bots.

You can:

  • View bots and humans together (default)
  • Or exclude bots:
--ignore-crawlers

For transparency, do not ignore crawlers unless you explicitly need human-only stats.
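To get a feel for what a bot/human split looks like, here is a rough Python sketch that flags user agents by substring. The marker list and function names are hypothetical and far cruder than GoAccess's own detection, so treat the counts as an approximation only.

```python
from collections import Counter

# Hypothetical marker list for illustration; real crawler lists are much longer.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_probable_bot(user_agent):
    """Rough heuristic: flag user agents containing a common crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def split_traffic(user_agents):
    """Count requests attributed to bots vs. everything else."""
    counts = Counter()
    for ua in user_agents:
        counts["bots" if is_probable_bot(ua) else "others"] += 1
    return counts


# Made-up sample user agents.
sample_agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
]
print(split_traffic(sample_agents))
```

Crawlers that send a browser-like user agent slip past a check like this, which is one more reason to view bots and humans together before filtering anything out.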


7. Option 4 — Third-Party Log Analytics Services

If you do not want to manage infrastructure, consider hosted solutions.

Popular Options

  • Datadog
  • ELK Stack (Elastic Cloud)
  • Splunk
  • Sumo Logic
  • CloudWatch Logs + Insights

How They Work

  1. Logs are shipped via agent or pipeline
  2. Data is indexed
  3. Queries and dashboards are built

Pros

  • Powerful querying
  • Alerting
  • Correlation across services

Cons

  • Cost
  • Vendor lock-in
  • Complexity for small projects

For early-stage or low-traffic projects, GoAccess is often sufficient.


8. Option 5 — Python-Based Log Analytics

If you want custom analytics, Python is a great choice.

Example: Counting Page Views per URL

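import gzip
from collections import Counter

counter = Counter()

# errors="replace" keeps odd bytes in bot requests from aborting the run
with gzip.open("access.log.2.gz", "rt", errors="replace") as f:
    for line in f:
        # The request ("GET /path HTTP/1.1") is the first quoted field
        parts = line.split('"')
        if len(parts) < 2:
            continue
        request = parts[1].split()
        # Skip malformed requests that lack a method and a path
        if len(request) < 2:
            continue
        counter[request[1]] += 1

# Print the ten most requested paths
for path, count in counter.most_common(10):
    print(path, count)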

When Python Makes Sense

  • Custom metrics
  • Feeding data to ML models
  • Exporting to CSV or databases
  • Advanced filtering

When It Doesn’t

  • Real-time dashboards
  • Visualization (GoAccess is better)

9. Automating GoAccess Reports

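You can generate daily or monthly reports using cron. A crontab entry must fit on a single line, and every % must be escaped as \% because cron otherwise treats it as a newline. The report covers whatever rotated logs are still on disk, so match your logrotate retention to the reporting period.

Example (monthly):

0 1 1 * * zcat /var/log/nginx/access.log*.gz | goaccess --log-format=COMBINED --date-format=\%d/\%b/\%Y --time-format=\%H:\%M:\%S -o /var/www/html/nginx_report_last_month.html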
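A monthly job that always writes to the same file overwrites the previous report. If you would rather keep one file per month, a small helper can derive the output name. This is a sketch, and the naming scheme and function names are hypothetical.

```python
from datetime import date, timedelta


def previous_month_label(today):
    """Return the month before `today` as 'YYYY-MM'."""
    last_day_of_previous_month = today.replace(day=1) - timedelta(days=1)
    return last_day_of_previous_month.strftime("%Y-%m")


def report_path(today, directory="/var/www/html"):
    """Build a per-month report path (the naming scheme is an assumption)."""
    return f"{directory}/nginx_report_{previous_month_label(today)}.html"


print(report_path(date(2026, 1, 1)))  # /var/www/html/nginx_report_2025-12.html
```

A wrapper script run by cron could pass the result to GoAccess as its -o argument.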

10. Verifying Analytics Accuracy

Always sanity-check:

  • Total bytes served vs AWS data transfer
  • Page size × visits ≈ bandwidth
  • Bot traffic vs real users
  • Status code distribution

Logs never lie — dashboards sometimes do.
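A few of these checks can be computed straight from the raw lines. The sketch below totals requests, body bytes, and status codes for combined-format logs; the regex, the `summarize` helper, and the sample lines are illustrative assumptions.

```python
import re
from collections import Counter

# Captures the status code and $body_bytes_sent that follow the quoted request.
STATUS_BYTES_RE = re.compile(r'" (\d{3}) (\d+) "')


def summarize(lines):
    """Return (request count, total body bytes, status-code Counter)."""
    requests = 0
    total_bytes = 0
    statuses = Counter()
    for line in lines:
        match = STATUS_BYTES_RE.search(line)
        if not match:
            continue  # skip lines that are not in combined format
        requests += 1
        statuses[match.group(1)] += 1
        total_bytes += int(match.group(2))
    return requests, total_bytes, statuses


# Made-up sample lines for illustration.
sample_lines = [
    '203.0.113.7 - - [16/Dec/2025:10:15:32 +0000] "GET / HTTP/1.1" 200 5124 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [16/Dec/2025:10:15:40 +0000] "GET /missing HTTP/1.1" 404 312 "-" "curl/8.5.0"',
]
requests, total_bytes, statuses = summarize(sample_lines)
print(requests, total_bytes, dict(statuses))
```

Keep in mind that $body_bytes_sent excludes response headers, so the total will come out somewhat lower than the transfer figure on your AWS bill.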


11. Choosing the Right Approach

| Scenario            | Best Option             |
|---------------------|-------------------------|
| Small server        | GoAccess local          |
| Historical analysis | GoAccess + rotated logs |
| Production          | GoAccess from S3        |
| Enterprise          | Third-party tools       |
| Custom logic        | Python                  |

Final Thoughts

Analytics does not require invasive tracking scripts.

Your server already knows:

  • Who visited
  • What they requested
  • How much data you served

By combining:

  • Proper log rotation
  • Reliable backups
  • GoAccess analytics

…you gain visibility, cost awareness, and operational confidence with minimal complexity.

Logs are not just records.
They are observability without guesswork.
