DEV Community

Ajit Kumar

Building an NGINX Log Discovery Tool in Rust - A Beginner's Journey

Have you ever wanted to automate the discovery of NGINX log files on your servers? In this tutorial, we'll build a practical CLI tool in Rust that automatically finds, parses, and helps you configure NGINX log analysis. Along the way, you'll learn essential Rust concepts, popular crates, and NGINX configuration parsing techniques.

🎯 What We're Building

Our tool will:

  • ✅ Detect NGINX installation
  • ✅ Extract the complete NGINX configuration
  • ✅ Parse log_format and access_log directives using regex
  • ✅ Present an interactive selection menu
  • ✅ Test log file accessibility
  • ✅ Generate a YAML configuration file

Final Result:

$ sudo ./discover_nginx_rust

🔍 NGINX Discovery Tool (Ubuntu)

✔ Found nginx at /usr/sbin/nginx
✔ Loaded full nginx configuration (10078 bytes)
ℹ No custom log_format directives found (will use nginx defaults)
✔ Found 1 access_log directive(s)

📄 Detected log sources:

[1] /var/log/nginx/access.log
    Format: combined ($remote_addr - $remote_user [$time_local]...)

💡 Use SPACE to select, ENTER to confirm

Select logs to analyze: ✔ /var/log/nginx/access.log
✅ Discovery completed successfully.

🛠️ Prerequisites

  • Basic Rust knowledge (variables, functions, structs)
  • Ubuntu/Debian system with NGINX installed
  • Rust toolchain installed (rustup)

📦 Project Setup

cargo new discover_nginx_rust
cd discover_nginx_rust

Cargo.toml:

[package]
name = "discover_nginx_rust"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1.0"      # Easy error handling
dialoguer = "0.11"  # Interactive CLI prompts
which = "6.0"       # Find executables in PATH
regex = "1.10"      # Regular expressions
chrono = "0.4"      # Date/time handling

🏗️ Architecture Overview

Our tool has four main components:

  1. System Detection - Find NGINX binary and dump config
  2. Configuration Parsing - Extract directives using regex
  3. Interactive Selection - Let users choose logs to analyze
  4. File Testing & Config Generation - Validate and output results

Let's build each part step by step!


Part 1: Core Data Structures 📊

First, let's define the data types we'll work with:

use anyhow::{Context, Result};
use dialoguer::{Confirm, MultiSelect};
use regex::Regex;
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;
use std::process::Command;
use which::which;

#[derive(Debug, Clone)]
struct LogSource {
    path: String,
    format_name: String,
    format: String,
}

#[derive(Debug, Clone)]
struct LogFormat {
    name: String,
    format: String,
}

#[derive(Debug, Clone)]
struct AccessLog {
    path: String,
    format_name: Option<String>,
}

Key Rust Concepts:

  • #[derive(Debug, Clone)]: Automatically implement common traits
  • Option<String>: Represents a value that might not exist (like format_name)
  • String vs &str: Owned strings vs string references

Why these types?

  • LogFormat: Represents an NGINX log_format directive
  • AccessLog: Represents an NGINX access_log directive
  • LogSource: Combines both - the final log file with its format

Part 2: System Detection 🔍

Finding NGINX

fn detect_nginx() -> Result<PathBuf> {
    which("nginx").context("nginx not found in PATH")
}

What's happening:

  • which(): Searches the system PATH for an executable (like the shell's which command)
  • PathBuf: An owned, mutable path (like String for file paths)
  • .context(): From the anyhow crate - wraps the error with a descriptive message
  • Result<T>: Here anyhow's alias for Result<T, anyhow::Error> - either Ok(T) or Err(error)
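
For intuition about what a PATH search involves, here is a rough std-only sketch. It is not how the which crate is implemented; find_in_path is a hypothetical helper, and it deliberately skips the executable-permission check a real lookup performs.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

// Hypothetical helper: return the first PATH entry containing a file called `name`.
// Simplification: only checks that a regular file exists, not that it is executable.
fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}
```

A caller could pass the real variable with find_in_path("nginx", &std::env::var_os("PATH").unwrap_or_default()). Taking the PATH value as a parameter keeps the function easy to test.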

Dumping NGINX Configuration

fn dump_nginx_config() -> Result<String> {
    let out = Command::new("sudo")
        .arg("nginx")
        .arg("-T")
        .output()
        .context("Failed to run sudo nginx -T")?;

    if !out.status.success() {
        let stderr = String::from_utf8_lossy(&out.stderr);
        anyhow::bail!("nginx -T failed: {}", stderr);
    }

    let text = String::from_utf8_lossy(&out.stdout).to_string();
    Ok(text)
}

Rust Concepts Explained:

  1. Command::new("sudo"): Create a new process

    • .arg(): Add command arguments
    • .output(): Run and capture stdout/stderr
  2. The ? operator:

   .output().context("Failed")?;
   // If error, return early with context
   // If ok, unwrap the value
Enter fullscreen mode Exit fullscreen mode
  1. String::from_utf8_lossy():

    • Converts bytes (&[u8]) to String
    • "Lossy" = replaces invalid UTF-8 with ๏ฟฝ
    • Safe for potentially corrupted data
  2. anyhow::bail!():

    • Early return with error message
    • Cleaner than return Err(anyhow::anyhow!("message"))
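
The lossy conversion is easy to try in isolation. The sketch below uses two hypothetical helpers (decode_output and was_lossy) to show that from_utf8_lossy borrows when the bytes are already valid and only allocates when it has to substitute U+FFFD.

```rust
use std::borrow::Cow;

// Decode captured process output into an owned String, replacing invalid bytes.
fn decode_output(bytes: &[u8]) -> String {
    let text: Cow<'_, str> = String::from_utf8_lossy(bytes);
    text.into_owned()
}

// True when at least one invalid sequence had to be replaced:
// valid input comes back as Cow::Borrowed, repaired input as Cow::Owned.
fn was_lossy(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Owned(_))
}
```

For example, decode_output(b"ok\xFF") yields "ok" followed by the replacement character, while was_lossy(b"worker_processes auto;") is false.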

The nginx -T flag: Tests the configuration and dumps it in full, including every file pulled in by include directives - exactly the single block of text we need to parse.


Part 3: Parsing with Regex 🎯

This is where the magic happens! We'll use regex to extract NGINX directives.

Understanding NGINX log_format Syntax

NGINX log formats look like:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

log_format simple '$remote_addr $request';

Key patterns:

  • Starts with log_format
  • Followed by format name
  • Format string (quoted or unquoted)
  • Can span multiple lines
  • Ends with ;
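
One detail worth spelling out: when a format spans several quoted segments, NGINX concatenates them into a single string. Stripping only the outermost pair of quotes would leave the inner ' ' boundaries in place. The following is a minimal sketch of the joining step, assuming a hypothetical join_quoted_segments helper; it does not handle the optional escape= parameter.

```rust
// Hypothetical helper: concatenate the quoted (and bare) segments of a
// log_format argument list into one format string.
fn join_quoted_segments(raw: &str) -> String {
    let mut out = String::new();
    let mut quote: Option<char> = None; // quote character of the segment we are inside
    let mut chars = raw.chars().peekable();

    while let Some(c) = chars.next() {
        if let Some(q) = quote {
            if c == '\\' && chars.peek() == Some(&q) {
                // Escaped quote inside a segment: keep the quote, drop the backslash.
                out.push(q);
                chars.next();
            } else if c == q {
                quote = None; // closing quote ends the segment
            } else {
                out.push(c);
            }
        } else if c == '\'' || c == '"' {
            quote = Some(c); // opening quote starts a segment
        } else if !c.is_whitespace() {
            out.push(c); // bare (unquoted) token characters
        }
        // Whitespace between segments is dropped.
    }
    out
}
```

Applied to the first two lines of the combined example above, this produces one continuous string with the double quotes around $request preserved.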

Parsing Log Formats

fn parse_log_formats(config: &str) -> Result<Vec<LogFormat>> {
    // (?s) enables DOTALL mode - . matches newlines
    // \s+ matches one or more whitespace
    // (\w+) captures the format name
    // (.*?) non-greedy capture until semicolon
    let re = Regex::new(
        r"(?s)log_format\s+(\w+)\s+(.*?);"
    )?;

    let mut formats = Vec::new();

    for cap in re.captures_iter(config) {
        let name = cap[1].to_string();
        let format_raw = cap[2].trim();

        // Clean up whitespace and newlines
        let format_cleaned = format_raw
            .lines()
            .map(|line| line.trim())
            .collect::<Vec<_>>()
            .join(" ");

        // Remove quotes if present
        let format = remove_quotes(&format_cleaned);

        formats.push(LogFormat {
            name,
            format: format.to_string(),
        });
    }

    Ok(formats)
}

Regex Breakdown:

Pattern      Meaning
(?s)         DOTALL flag - . matches newlines
log_format   Literal text
\s+          One or more whitespace characters
(\w+)        Capture group: word characters (format name)
(.*?)        Non-greedy capture (format string)
;            Semicolon terminator

Rust Iterator Patterns:

format_raw
    .lines()                    // Split into lines
    .map(|line| line.trim())   // Transform each line
    .collect::<Vec<_>>()       // Collect into vector
    .join(" ")                 // Join with spaces

This is functional programming in Rust - transforming data through a pipeline!
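
The same pipeline can be pulled into a small standalone function. This sketch (collapse_lines is a hypothetical name) adds one extra step, a filter that drops blank lines so they do not leave doubled spaces in the result.

```rust
// Collapse a multi-line value into one line: trim each line,
// skip empty ones, and join the rest with single spaces.
fn collapse_lines(raw: &str) -> String {
    raw.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}
```

Passing str::trim directly instead of a closure is equivalent to |line| line.trim() and is a common shorthand.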

Parsing Access Logs

fn parse_access_logs(config: &str) -> Result<Vec<AccessLog>> {
    // Matches: access_log /path/to/file [format] [options];
    let re = Regex::new(
        r"(?s)access_log\s+([^\s;]+)(?:\s+([^\s;]+))?[^;]*;"
    )?;

    let mut logs = Vec::new();

    for cap in re.captures_iter(config) {
        let path = cap[1].trim().to_string();

        // Skip disabled logs
        if path == "off" {
            continue;
        }

        // Skip syslog destinations
        if path.starts_with("syslog:") {
            eprintln!("⚠ Skipping syslog destination: {}", path);
            continue;
        }

        let format_name = cap.get(2).map(|m| {
            let name = m.as_str().trim();
            // Filter out option keywords
            if name.contains('=') || name == "buffer" || 
               name == "gzip" || name == "flush" {
                return None;
            }
            Some(name.to_string())
        }).flatten();

        logs.push(AccessLog {
            path,
            format_name,
        });
    }

    Ok(logs)
}

New Regex Patterns:

Pattern         Meaning
[^\s;]+         Match anything except whitespace or ;
(?:...)         Non-capturing group
? after group   Makes the group optional
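
Both regexes have two limitations worth knowing about: they also match directives inside # comments, and they stop at the first semicolon even when it sits inside quotes. As a hedged alternative (not what this tool uses), a small hand-written scanner can split the configuration into statements first. The function names below are hypothetical, and the escape handling is deliberately simplistic.

```rust
// Split a config dump into statements, ignoring # comments and
// treating ; { } as terminators only when outside quotes.
fn split_statements(config: &str) -> Vec<String> {
    let mut statements = Vec::new();
    let mut current = String::new();
    let mut quote: Option<char> = None;
    let mut in_comment = false;
    let mut prev = '\0';

    for c in config.chars() {
        if in_comment {
            // A comment runs to the end of the line.
            if c == '\n' {
                in_comment = false;
                current.push(' ');
            }
            continue;
        }
        match quote {
            Some(q) => {
                current.push(c);
                // Simplification: a quote preceded by a backslash does not close the string.
                if c == q && prev != '\\' {
                    quote = None;
                }
            }
            None => match c {
                '#' => in_comment = true,
                '\'' | '"' => {
                    quote = Some(c);
                    current.push(c);
                }
                ';' | '{' | '}' => {
                    let statement = current.trim();
                    if !statement.is_empty() {
                        statements.push(statement.to_string());
                    }
                    current.clear();
                }
                _ => current.push(c),
            },
        }
        prev = c;
    }
    statements
}

// Return the argument text of every statement that starts with `keyword`.
fn directive_args<'a>(statements: &'a [String], keyword: &str) -> Vec<&'a str> {
    statements
        .iter()
        .filter_map(|s| {
            let rest = s.strip_prefix(keyword)?;
            // Require whitespace after the keyword so longer names are not matched.
            rest.starts_with(char::is_whitespace).then(|| rest.trim())
        })
        .collect()
}
```

Calling directive_args(&split_statements(config), "access_log") would then return only live directives, with commented-out lines already removed.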

Working with Option:

cap.get(2)                    // Option<Match>
    .map(|m| { ... })        // Transform if Some
    .flatten()               // Option<Option<T>> -> Option<T>
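
A map followed by flatten on an Option can also be written as a single and_then call, which many Rust developers find easier to read. Here is a sketch of the same filtering logic in that style; format_name_from is a hypothetical standalone helper, not a function from the tool.

```rust
// Decide whether the token after the log path is a format name
// or one of the access_log options (buffer=..., gzip, flush=...).
fn format_name_from(token: Option<&str>) -> Option<String> {
    token.and_then(|raw| {
        let name = raw.trim();
        let is_option = name.contains('=') || matches!(name, "buffer" | "gzip" | "flush");
        if is_option {
            None
        } else {
            Some(name.to_string())
        }
    })
}
```

The closure returns an Option itself, so and_then avoids ever building the nested Option<Option<String>> that flatten has to collapse.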

Helper: Removing Quotes

fn remove_quotes(s: &str) -> &str {
    let s = s.trim();

    // Remove single quotes
    if s.starts_with('\'') && s.ends_with('\'') && s.len() >= 2 {
        return &s[1..s.len()-1];
    }

    // Remove double quotes
    if s.starts_with('"') && s.ends_with('"') && s.len() >= 2 {
        return &s[1..s.len()-1];
    }

    s
}

String Slicing in Rust:

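  • &s[1..s.len()-1]: Slice from byte index 1 up to (but not including) the last byte; safe here because both quote characters are a single byte
  • Returns a string slice (&str), not a new String
  • Zero-copy operation - no allocation, just a narrower view of the same data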
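
An alternative that avoids manual index arithmetic is strip_prefix combined with strip_suffix. This is a sketch of the same idea under a hypothetical name; it still returns a borrowed slice and still strips only one outer pair of matching quotes.

```rust
// Remove one pair of matching outer quotes, if present.
fn strip_matching_quotes(s: &str) -> &str {
    let s = s.trim();
    for q in ['\'', '"'] {
        // Both calls return None when the quote is missing, so mismatched
        // or single-character inputs fall through unchanged.
        if let Some(inner) = s.strip_prefix(q).and_then(|rest| rest.strip_suffix(q)) {
            return inner;
        }
    }
    s
}
```

Because the length check is implicit, a lone quote character is returned as-is rather than producing an empty or invalid slice.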

Part 4: Resolving Log Sources 🔗

Now we combine formats and logs:

fn resolve_sources(
    formats: Vec<LogFormat>, 
    logs: Vec<AccessLog>
) -> Vec<LogSource> {
    // Build format lookup map
    let mut fmt_map: HashMap<String, String> = HashMap::new();
    for fmt in formats {
        fmt_map.insert(fmt.name, fmt.format);
    }

    // Default nginx combined format
    let default_fmt = fmt_map
        .get("combined")
        .cloned()
        .unwrap_or_else(|| {
            r#"$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent""#.to_string()
        });

    let mut sources: Vec<LogSource> = logs
        .into_iter()
        .map(|log| {
            let format_name = log.format_name
                .clone()
                .unwrap_or_else(|| "combined".to_string());
            let format = fmt_map
                .get(&format_name)
                .cloned()
                .unwrap_or_else(|| default_fmt.clone());

            LogSource {
                path: log.path,
                format_name,
                format,
            }
        })
        .collect();

    // Deduplicate by path
    sources.sort_by(|a, b| a.path.cmp(&b.path));
    sources.dedup_by(|a, b| a.path == b.path);

    sources
}

Rust Collections:

  1. HashMap: Key-value store (like a dictionary)
   let mut map = HashMap::new();
   map.insert(key, value);
   let val = map.get(&key);  // Returns Option<&V>
  2. Iterator Patterns:
   .into_iter()              // Consume the vector
   .map(|item| transform)    // Transform each item
   .collect()                // Collect into Vec
  3. Deduplication:
   .sort_by()       // Sort so duplicates become adjacent
   .dedup_by()      // Remove consecutive duplicates

unwrap_or_else() vs unwrap_or():

// unwrap_or() evaluates immediately
map.get(key).unwrap_or(expensive_default())  // Always calls

// unwrap_or_else() is lazy
map.get(key).unwrap_or_else(|| expensive_default())  // Only if None
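
The difference is easy to observe by counting how often the fallback is built. This is a small illustrative sketch (count_fallback_builds is a made-up name) in which the Option is always Some, so a fallback is never actually needed.

```rust
use std::cell::Cell;

// Returns how many times each style built its fallback value: (eager, lazy).
fn count_fallback_builds() -> (u32, u32) {
    let eager = Cell::new(0);
    let lazy = Cell::new(0);
    let found: Option<String> = Some("main".to_string());

    // unwrap_or: the argument expression runs before the call,
    // even though `found` is Some and the value is thrown away.
    let _ = found.clone().unwrap_or({
        eager.set(eager.get() + 1);
        "combined".to_string()
    });

    // unwrap_or_else: the closure runs only when the Option is None.
    let _ = found.unwrap_or_else(|| {
        lazy.set(lazy.get() + 1);
        "combined".to_string()
    });

    (eager.get(), lazy.get())
}
```

The function returns (1, 0): the eager fallback was built once for nothing, the lazy one never ran.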

Part 5: Interactive Selection 🎮

Let's use the dialoguer crate for a nice CLI experience:

fn main() -> Result<()> {
    println!("\n🔍 NGINX Discovery Tool (Ubuntu)\n");

    let nginx_path = detect_nginx()?;
    println!("✔ Found nginx at {}", nginx_path.display());

    let cfg_text = dump_nginx_config()?;
    println!("✔ Loaded full nginx configuration ({} bytes)", cfg_text.len());

    let formats = parse_log_formats(&cfg_text)?;
    if formats.is_empty() {
        println!("ℹ No custom log_format directives found (will use nginx defaults)");
    } else {
        println!("✔ Found {} custom log format(s)", formats.len());
        for fmt in &formats {
            println!("  • {}", fmt.name);
        }
    }

    let logs = parse_access_logs(&cfg_text)?;
    println!("✔ Found {} access_log directive(s)", logs.len());

    if logs.is_empty() {
        println!("❌ No access_log directives found.");
        return Ok(());
    }

    let sources = resolve_sources(formats, logs);

    if sources.is_empty() {
        println!("❌ No valid log sources after filtering.");
        return Ok(());
    }

    println!("\n📄 Detected log sources:\n");
    for (i, s) in sources.iter().enumerate() {
        // Truncate by characters; a byte slice like &s.format[..60] can panic mid-character
        let format_preview = if s.format.chars().count() > 60 {
            format!("{}...", s.format.chars().take(60).collect::<String>())
        } else {
            s.format.clone()
        };
        println!("[{}] {}", i + 1, s.path);
        println!("    Format: {} ({})", s.format_name, format_preview);
    }

    println!("\n💡 Use SPACE to select, ENTER to confirm\n");

    let selections = MultiSelect::new()
        .with_prompt("Select logs to analyze")
        .items(&sources.iter().map(|s| s.path.as_str()).collect::<Vec<_>>())
        .interact()?;

    if selections.is_empty() {
        println!("No logs selected. Exiting.");
        return Ok(());
    }

    println!("\n🧪 Testing selected logs...\n");
    for idx in &selections {
        let src = &sources[*idx];
        test_log_file(src)?;
    }

    check_logrotate()?;

    if Confirm::new()
        .with_prompt("Generate a starter config file?")
        .default(true)
        .interact()?
    {
        generate_config(&sources, &selections)?;
    }

    println!("\n✅ Discovery completed successfully.\n");
    Ok(())
}

Error Propagation with ?:

Every function that can fail returns Result<T>. The ? operator:

let value = risky_function()?;
// Roughly equivalent to:
let value = match risky_function() {
    Ok(v) => v,
    // `?` also converts the error into the function's error type via From
    Err(e) => return Err(e.into()),
};
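
The conversion step is what lets anyhow accept almost any error type. To make it visible without anyhow, here is a std-only sketch with a hand-written error enum; ConfigError and parse_buffer_size are hypothetical names invented for this illustration.

```rust
use std::num::ParseIntError;

#[derive(Debug)]
enum ConfigError {
    BadNumber(ParseIntError),
}

// This impl is what `?` calls behind the scenes to convert the error.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e)
    }
}

// Parse the numeric part of an option such as "buffer=32".
fn parse_buffer_size(option: &str) -> Result<u32, ConfigError> {
    let digits = option.trim_start_matches("buffer=");
    // parse() fails with ParseIntError; `?` turns it into ConfigError via From.
    let size = digits.parse::<u32>()?;
    Ok(size)
}
```

With this in place, parse_buffer_size("buffer=32") succeeds, while "buffer=32k" returns ConfigError::BadNumber because of the trailing unit suffix.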

dialoguer Crate:

  • MultiSelect: Checkbox list
  • Confirm: Yes/no prompt
  • .interact()?: Show prompt and wait for input

Part 6: File Testing & Validation ✅

fn test_log_file(src: &LogSource) -> Result<()> {
    println!("▶ Testing {}", src.path);

    match fs::read_to_string(&src.path) {
        Ok(content) => {
            let line_count = content.lines().count();
            if let Some(line) = content.lines().next() {
                // Truncate by characters; a byte slice could split a multi-byte character
                let sample: String = line.chars().take(100).collect();
                println!("   Lines: {}", line_count);
                println!("   Sample: {}", sample);
                println!("   ✔ Read OK\n");
            } else {
                println!("   ⚠ File is empty\n");
            }
        }
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
            println!("   ⚠ File not found (may not exist yet)\n");
        }
        Err(e) if e.kind() == std::io::ErrorKind::PermissionDenied => {
            println!("   ⚠ Permission denied (try running with sudo)\n");
        }
        }
        Err(e) => {
            return Err(e).with_context(|| format!("Failed to read {}", src.path));
        }
    }

    Ok(())
}

Pattern Matching on Errors:

Rust's match is powerful for error handling:

match fs::read_to_string(&path) {
    Ok(content) => { /* success */ },
    Err(e) if e.kind() == ErrorKind::NotFound => { /* specific error */ },
    Err(e) => { /* catch all */ }
}

Guard Patterns (if in match):

  • Err(e) if e.kind() == ... matches only if condition is true
  • Allows fine-grained error handling
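
One practical caveat: read_to_string loads the entire file into memory and fails on bytes that are not valid UTF-8, both of which can matter for large production logs. A hedged alternative is to stream the file line by line. The scan_log function below is a hypothetical sketch, not part of the tool, and returns the line count together with the first line.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

// Count lines and capture the first one without holding the whole file in memory.
fn scan_log(path: &Path) -> io::Result<(usize, Option<String>)> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut buf = Vec::new();
    let mut count = 0;
    let mut first = None;

    loop {
        buf.clear();
        // read_until works on raw bytes, so invalid UTF-8 does not abort the scan.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break; // end of file
        }
        if first.is_none() {
            let line = String::from_utf8_lossy(&buf);
            let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
            first = Some(line.to_string());
        }
        count += 1;
    }
    Ok((count, first))
}
```

The io::Error it returns carries the same kind() values, so the NotFound and PermissionDenied guards shown above would work unchanged.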

String Slicing Safety:

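&line[..line.len().min(100)]
// Clamps the end index to the line length,
// so lines shorter than 100 bytes don't panic.
// The index counts bytes, not characters: slicing still panics
// if byte 100 falls inside a multi-byte UTF-8 character.
// line.chars().take(100).collect::<String>() truncates by characters instead.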
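
When a borrowed slice is preferable to a new String, char_indices can locate a safe cut point. This is a minimal sketch with a hypothetical helper name; it truncates to a number of characters and never splits one.

```rust
// Return at most `max_chars` characters of `s` as a borrowed slice.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    // nth(max_chars) yields the byte offset where character number
    // max_chars starts, which is always a valid slice boundary.
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // fewer than max_chars characters: keep everything
    }
}
```

For instance, truncate_chars("héllo", 2) returns "hé", whereas the byte slice &"héllo"[..2] would panic because é occupies bytes 1 and 2.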

Checking System Configuration

fn check_logrotate() -> Result<()> {
    println!("\n🔄 Checking logrotate configuration...\n");

    for p in ["/etc/logrotate.d/nginx", "/etc/logrotate.conf"] {
        if fs::metadata(p).is_ok() {
            println!("✔ Found {}", p);
        }
    }
    Ok(())
}

Rust Array Iteration:

  • ["/path1", "/path2"]: Array literal
  • for p in array: Direct iteration by value (arrays implement IntoIterator since Rust 1.53, so no .iter() is needed)

Part 7: Config Generation 📝

Finally, let's generate a YAML config file:

fn generate_config(
    sources: &[LogSource], 
    selections: &[usize]
) -> Result<()> {
    let mut out = String::new();
    out.push_str("# Generated by NGINX Discovery Tool\n");
    out.push_str(&format!(
        "# Generated: {}\n\n", 
        chrono::Local::now().format("%Y-%m-%d %H:%M:%S")
    ));
    out.push_str("server:\n  type: nginx\n\npipelines:\n");

    for idx in selections {
        let s = &sources[*idx];
        // Escape single quotes for YAML
        let escaped_format = s.format.replace('\'', "''");
        out.push_str(&format!(
            "  - name: {}\n    input: {}\n    format: '{}'\n",
            s.format_name, s.path, escaped_format
        ));
        out.push_str("    # Add additional processing here\n\n");
    }

    let filename = "logcraft.generated.yaml";
    fs::write(filename, out)?;
    println!("📝 Config written to {}", filename);
    Ok(())
}

String Building in Rust:

  1. String::new(): Create empty mutable string
  2. .push_str(): Append string slice
  3. format!() macro: Like printf - returns String
  4. replace(): String replacement (creates new String)

Borrowing and Slices:

fn generate_config(sources: &[LogSource], ...)
//                          ^^^^^^^^^^^^ slice reference
// Callers can pass &Vec<LogSource> (coerced automatically), &[LogSource], or a borrowed array
// More flexible than &Vec<T>; an owned Vec<LogSource> must still be borrowed with &
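
A short illustration of that flexibility, using a hypothetical longest_path function: the same &[String] parameter accepts a borrowed Vec, a borrowed array, and a sub-slice.

```rust
// Accepts any contiguous run of Strings, regardless of the owning container.
fn longest_path(paths: &[String]) -> Option<&str> {
    paths.iter().map(String::as_str).max_by_key(|p| p.len())
}
```

Given a Vec<String> called v, all of longest_path(&v), longest_path(&v[..1]), and longest_path(&["/x".to_string()]) compile, because each argument coerces to a slice reference.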

🚀 Building and Running

# Build release version (optimized)
cargo build --release

# Run with sudo (needed for nginx -T and log file access)
sudo ./target/release/discover_nginx_rust

Output:

🔍 NGINX Discovery Tool (Ubuntu)

✔ Found nginx at /usr/sbin/nginx
✔ Loaded full nginx configuration (10078 bytes)
ℹ No custom log_format directives found (will use nginx defaults)
✔ Found 1 access_log directive(s)

📄 Detected log sources:

[1] /var/log/nginx/access.log
    Format: combined ($remote_addr - $remote_user [$time_local]...)

💡 Use SPACE to select, ENTER to confirm

Select logs to analyze: ✔ /var/log/nginx/access.log

🧪 Testing selected logs...

▶ Testing /var/log/nginx/access.log
   Lines: 1247
   Sample: 192.168.1.100 - - [19/Jan/2026:10:30:15 +0000] "GET / HTTP/1.1" 200 612...
   ✔ Read OK

🔄 Checking logrotate configuration...

✔ Found /etc/logrotate.d/nginx

Generate a starter config file? Yes
📝 Config written to logcraft.generated.yaml

✅ Discovery completed successfully.

🎓 Key Rust Concepts Learned

1. Error Handling

Result<T, E>           // Return type for fallible operations
?                      // Propagate errors
.context("msg")        // Add error context
anyhow::bail!()        // Early return with error

2. Ownership & Borrowing

String vs &str         // Owned vs borrowed
&[T]                   // Slice reference
.clone()               // Explicit copying
into_iter()            // Consume ownership

3. Pattern Matching

match value {
    Ok(v) => ...,
    Err(e) if condition => ...,
    _ => ...
}
if let Some(x) = option { ... }

4. Iterators

.iter()                // Borrow each element
.into_iter()           // Consume collection
.map()                 // Transform
.filter()              // Select
.collect()             // Build collection

5. Options & Results

Option<T>              // Some(T) or None
Result<T, E>           // Ok(T) or Err(E)
.unwrap_or_else()      // Lazy default
.map()                 // Transform inner value
.flatten()             // Collapse nested Options

📚 Crates We Used

Crate      Purpose           Key Features
anyhow     Error handling    Context, easy error types
dialoguer  CLI interaction   Prompts, selections
which      Find executables  Cross-platform PATH search
regex      Pattern matching  NGINX config parsing
chrono     Date/time         Timestamps

🔮 What's Next?

In Part 2 of this series, we'll build a proper parser library:

  • Hand-written lexer and parser
  • Full AST (Abstract Syntax Tree)
  • Support for all NGINX directives
  • Publish to crates.io

Stay tuned!


🔗 Resources


💬 Discussion

What would you like to see in Part 2? Comment below!

  • Full NGINX config validation?
  • Config transformation tools?
  • LSP (Language Server Protocol) support?
  • Something else?

Found this helpful? Give it a ❤️ and follow for Part 2!

#rust #nginx #devops #tutorial #systems-programming

