Building an NGINX Log Discovery Tool in Rust
Have you ever wanted to automate the discovery of NGINX log files on your servers? In this tutorial, we'll build a practical CLI tool in Rust that automatically finds, parses, and helps you configure NGINX log analysis. Along the way, you'll learn essential Rust concepts, popular crates, and NGINX configuration parsing techniques.
What We're Building
Our tool will:
- Detect NGINX installation
- Extract the complete NGINX configuration
- Parse log_format and access_log directives using regex
- Present an interactive selection menu
- Test log file accessibility
- Generate a YAML configuration file
Final Result:
$ sudo ./discover_nginx_rust
NGINX Discovery Tool (Ubuntu)
✓ Found nginx at /usr/sbin/nginx
✓ Loaded full nginx configuration (10078 bytes)
ℹ No custom log_format directives found (will use nginx defaults)
✓ Found 1 access_log directive(s)
Detected log sources:
[1] /var/log/nginx/access.log
    Format: combined ($remote_addr - $remote_user [$time_local]...)
Use SPACE to select, ENTER to confirm
Select logs to analyze: ✓ /var/log/nginx/access.log
✅ Discovery completed successfully.
Prerequisites
- Basic Rust knowledge (variables, functions, structs)
- Ubuntu/Debian system with NGINX installed
- Rust toolchain installed (rustup)
Project Setup
cargo new discover_nginx_rust
cd discover_nginx_rust
Cargo.toml:
[package]
name = "discover_nginx_rust"
version = "0.1.0"
edition = "2021"
[dependencies]
anyhow = "1.0" # Easy error handling
dialoguer = "0.11" # Interactive CLI prompts
which = "6.0" # Find executables in PATH
regex = "1.10" # Regular expressions
chrono = "0.4" # Date/time handling
Architecture Overview
Our tool has four main components:
- System Detection - Find NGINX binary and dump config
- Configuration Parsing - Extract directives using regex
- Interactive Selection - Let users choose logs to analyze
- File Testing & Config Generation - Validate and output results
Let's build each part step by step!
Part 1: Core Data Structures
First, let's define the data types we'll work with:
use anyhow::{Context, Result};
use dialoguer::{Confirm, MultiSelect};
use regex::Regex;
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;
use std::process::Command;
use which::which;

// A resolved log file together with the format it is written in
#[derive(Debug, Clone)]
struct LogSource {
    path: String,
    format_name: String,
    format: String,
}

// A parsed log_format directive
#[derive(Debug, Clone)]
struct LogFormat {
    name: String,
    format: String,
}

// A parsed access_log directive; the format name is optional
#[derive(Debug, Clone)]
struct AccessLog {
    path: String,
    format_name: Option<String>,
}
Key Rust Concepts:
- #[derive(Debug, Clone)]: Automatically implements common traits
- Option<String>: Represents a value that might not exist (like format_name)
- String vs &str: Owned strings vs string references
Why these types?
- LogFormat: Represents an NGINX log_format directive
- AccessLog: Represents an NGINX access_log directive
- LogSource: Combines both - the final log file with its format
Part 2: System Detection
Finding NGINX
fn detect_nginx() -> Result<PathBuf> {
    which("nginx").context("nginx not found in PATH")
}
What's happening:
- which(): Searches for executables in the system PATH (like the shell's which)
- PathBuf: An owned, mutable path (like String for file paths)
- .context(): From the anyhow crate - adds error context
- Result<T>: Either Ok(T) or Err(error) - Rust's error handling
Dumping NGINX Configuration
fn dump_nginx_config() -> Result<String> {
    // Run `sudo nginx -T` and capture its output
    let out = Command::new("sudo")
        .arg("nginx")
        .arg("-T")
        .output()
        .context("Failed to run sudo nginx -T")?;

    if !out.status.success() {
        let stderr = String::from_utf8_lossy(&out.stderr);
        anyhow::bail!("nginx -T failed: {}", stderr);
    }

    let text = String::from_utf8_lossy(&out.stdout).to_string();
    Ok(text)
}
Rust Concepts Explained:
-
Command::new("sudo"): Create a new process-
.arg(): Add command arguments -
.output(): Run and capture stdout/stderr
-
The
?operator:
.output().context("Failed")?;
// If error, return early with context
// If ok, unwrap the value
-
String::from_utf8_lossy():- Converts bytes (
&[u8]) to String - "Lossy" = replaces invalid UTF-8 with ๏ฟฝ
- Safe for potentially corrupted data
- Converts bytes (
-
anyhow::bail!():- Early return with error message
- Cleaner than
return Err(anyhow::anyhow!("message"))
NGINX -T flag: Dumps the complete configuration including all include files - perfect for our needs!
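To make the "lossy" behavior concrete, here is a minimal std-only sketch. The helper name decode_output is an assumption for illustration and is not part of the tool; it only mirrors the conversion step used in dump_nginx_config.

```rust
/// Hypothetical helper: decode captured process output into a String.
fn decode_output(bytes: &[u8]) -> String {
    // from_utf8_lossy returns a Cow<str>: it borrows when the input is
    // already valid UTF-8 and allocates only when it has to insert
    // U+FFFD replacement characters.
    String::from_utf8_lossy(bytes).into_owned()
}
```

Calling decode_output(b"bad \xFF byte") yields a string where the invalid byte has been replaced by U+FFFD, while valid input passes through unchanged.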
Part 3: Parsing with Regex
This is where the magic happens! We'll use regex to extract NGINX directives.
Understanding NGINX log_format Syntax
NGINX log formats look like:
log_format combined '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent"';
log_format simple '$remote_addr $request';
Key patterns:
- Starts with log_format
- Followed by format name
- Format string (quoted or unquoted)
- Can span multiple lines
- Ends with ;
Parsing Log Formats
fn parse_log_formats(config: &str) -> Result<Vec<LogFormat>> {
    // (?s) enables DOTALL mode - . matches newlines
    // \s+ matches one or more whitespace
    // (\w+) captures the format name
    // (.*?) non-greedy capture until semicolon
    let re = Regex::new(
        r"(?s)log_format\s+(\w+)\s+(.*?);"
    )?;

    let mut formats = Vec::new();
    for cap in re.captures_iter(config) {
        let name = cap[1].to_string();
        let format_raw = cap[2].trim();

        // Clean up whitespace and newlines
        let format_cleaned = format_raw
            .lines()
            .map(|line| line.trim())
            .collect::<Vec<_>>()
            .join(" ");

        // Remove quotes if present
        let format = remove_quotes(&format_cleaned);

        formats.push(LogFormat {
            name,
            format: format.to_string(),
        });
    }
    Ok(formats)
}
Regex Breakdown:
| Pattern | Meaning |
|---|---|
| (?s) | DOTALL flag - . matches newlines |
| log_format | Literal text |
| \s+ | One or more whitespace characters |
| (\w+) | Capture group: word characters (format name) |
| (.*?) | Non-greedy capture (format string) |
| ; | Semicolon terminator |
Rust Iterator Patterns:
format_raw
.lines() // Split into lines
.map(|line| line.trim()) // Transform each line
.collect::<Vec<_>>() // Collect into vector
.join(" ") // Join with spaces
This is functional programming in Rust - transforming data through a pipeline!
Parsing Access Logs
fn parse_access_logs(config: &str) -> Result<Vec<AccessLog>> {
    // Matches: access_log /path/to/file [format] [options];
    let re = Regex::new(
        r"(?s)access_log\s+([^\s;]+)(?:\s+([^\s;]+))?[^;]*;"
    )?;

    let mut logs = Vec::new();
    for cap in re.captures_iter(config) {
        let path = cap[1].trim().to_string();

        // Skip disabled logs
        if path == "off" {
            continue;
        }

        // Skip syslog destinations
        if path.starts_with("syslog:") {
            eprintln!("⚠ Skipping syslog destination: {}", path);
            continue;
        }

        let format_name = cap.get(2).map(|m| {
            let name = m.as_str().trim();
            // Filter out option keywords
            if name.contains('=') || name == "buffer" ||
                name == "gzip" || name == "flush" {
                return None;
            }
            Some(name.to_string())
        }).flatten();

        logs.push(AccessLog {
            path,
            format_name,
        });
    }
    Ok(logs)
}
New Regex Patterns:
| Pattern | Meaning |
|---|---|
| [^\s;]+ | Match anything except whitespace or ; |
| (?:...) | Non-capturing group |
| ? after group | Makes the group optional |
Working with Option:
cap.get(2) // Option<Match>
.map(|m| { ... }) // Transform if Some
.flatten() // Option<Option<T>> -> Option<T>
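The map-then-flatten pair can also be written with Option::and_then, which applies a closure that itself returns an Option. The sketch below is std-only; the function name format_name_from is hypothetical and takes the already-extracted token instead of a regex match.

```rust
/// Hypothetical helper: decide whether the second token of an access_log
/// directive is a format name or an option such as buffer=32k.
fn format_name_from(token: Option<&str>) -> Option<String> {
    // and_then = map + flatten in one step
    token.and_then(|raw| {
        let name = raw.trim();
        let is_option =
            name.contains('=') || matches!(name, "buffer" | "gzip" | "flush");
        if is_option {
            None
        } else {
            Some(name.to_string())
        }
    })
}
```

Both spellings behave the same; and_then simply avoids building the intermediate Option<Option<String>>.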
Helper: Removing Quotes
fn remove_quotes(s: &str) -> &str {
    let s = s.trim();

    // Remove single quotes
    if s.starts_with('\'') && s.ends_with('\'') && s.len() >= 2 {
        return &s[1..s.len()-1];
    }

    // Remove double quotes
    if s.starts_with('"') && s.ends_with('"') && s.len() >= 2 {
        return &s[1..s.len()-1];
    }

    s
}
String Slicing in Rust:
- &s[1..s.len()-1]: Slice from byte index 1 up to, but not including, the last byte
- Returns a string slice (&str), not a new String
- Zero-copy operation - very efficient!
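One case worth checking: a multi-line log_format such as the combined example consists of several quoted segments, which NGINX concatenates. Stripping only the outermost quotes would leave the inner ' ' pairs in the result. The following is a minimal std-only sketch of one way to join the segments; join_quoted_segments is a hypothetical helper, not part of the original tool, and it assumes the value contains only quoted segments (parameters such as escape=json are not handled).

```rust
/// Hypothetical helper: concatenate the contents of adjacent quoted segments.
fn join_quoted_segments(raw: &str) -> String {
    let mut out = String::new();
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        match c {
            '\'' | '"' => {
                let quote = c;
                // Copy everything up to the matching closing quote
                while let Some(inner) = chars.next() {
                    if inner == '\\' {
                        // Keep an escape sequence verbatim
                        out.push(inner);
                        if let Some(escaped) = chars.next() {
                            out.push(escaped);
                        }
                    } else if inner == quote {
                        break;
                    } else {
                        out.push(inner);
                    }
                }
            }
            // Whitespace between segments is not part of the format
            c if c.is_whitespace() => {}
            // Unquoted characters are copied as-is
            other => out.push(other),
        }
    }
    out
}
```

With this approach, the two segments '$remote_addr [$time_local] ' and '"$request" $status' become one format string with no stray quotes in the middle.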
Part 4: Resolving Log Sources
Now we combine formats and logs:
fn resolve_sources(
    formats: Vec<LogFormat>,
    logs: Vec<AccessLog>
) -> Vec<LogSource> {
    // Build format lookup map
    let mut fmt_map: HashMap<String, String> = HashMap::new();
    for fmt in formats {
        fmt_map.insert(fmt.name, fmt.format);
    }

    // Default nginx combined format
    let default_fmt = fmt_map
        .get("combined")
        .cloned()
        .unwrap_or_else(|| {
            r#"$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent""#.to_string()
        });

    let mut sources: Vec<LogSource> = logs
        .into_iter()
        .map(|log| {
            let format_name = log.format_name
                .clone()
                .unwrap_or_else(|| "combined".to_string());
            let format = fmt_map
                .get(&format_name)
                .cloned()
                .unwrap_or_else(|| default_fmt.clone());
            LogSource {
                path: log.path,
                format_name,
                format,
            }
        })
        .collect();

    // Deduplicate by path
    sources.sort_by(|a, b| a.path.cmp(&b.path));
    sources.dedup_by(|a, b| a.path == b.path);
    sources
}
Rust Collections:
- HashMap: Key-value store (like a dictionary)
let mut map = HashMap::new();
map.insert(key, value);
let val = map.get(&key); // Returns Option<&V>
- Iterator Patterns:
.into_iter() // Consume the vector
.map(|item| transform) // Transform each item
.collect() // Collect into Vec
- Deduplication:
.sort_by() // Sort for grouping duplicates
.dedup_by() // Remove consecutive duplicates
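The sort step matters because dedup only collapses neighbors. A small std-only sketch, using plain path strings and hypothetical function names rather than the tool's LogSource type:

```rust
/// Remove duplicate paths; sorting first makes equal paths adjacent.
fn unique_paths(mut paths: Vec<String>) -> Vec<String> {
    paths.sort();
    paths.dedup();
    paths
}

/// Same input without sorting: only adjacent repeats collapse,
/// so a duplicate separated by another entry survives.
fn dedup_without_sort(mut paths: Vec<String>) -> Vec<String> {
    paths.dedup();
    paths
}
```

For the input b.log, a.log, b.log the first function returns two entries and the second still returns three.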
unwrap_or_else() vs unwrap_or():
// unwrap_or() evaluates immediately
map.get(key).unwrap_or(expensive_default()) // Always calls
// unwrap_or_else() is lazy
map.get(key).unwrap_or_else(|| expensive_default()) // Only if None
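The difference can be observed by counting how often the default expression runs. This is a std-only sketch; count_default_calls and the counters are illustrative names, not part of the tool.

```rust
use std::cell::Cell;

/// Returns (eager_calls, lazy_calls) for an Option that is Some.
fn count_default_calls() -> (u32, u32) {
    let eager_calls = Cell::new(0u32);
    let lazy_calls = Cell::new(0u32);
    let format_name: Option<String> = Some("main".to_string());

    // unwrap_or: the argument is evaluated before the call, even for Some
    let eager = format_name.clone().unwrap_or({
        eager_calls.set(eager_calls.get() + 1);
        "combined".to_string()
    });

    // unwrap_or_else: the closure runs only when the Option is None
    let lazy = format_name.clone().unwrap_or_else(|| {
        lazy_calls.set(lazy_calls.get() + 1);
        "combined".to_string()
    });

    // Both yield the value that was already present
    assert_eq!(eager, lazy);
    (eager_calls.get(), lazy_calls.get())
}
```

Because the Option is Some, the eager default is built once and thrown away, while the lazy closure never runs.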
Part 5: Interactive Selection
Let's use the dialoguer crate for a nice CLI experience:
fn main() -> Result<()> {
    println!("\nNGINX Discovery Tool (Ubuntu)\n");

    let nginx_path = detect_nginx()?;
    println!("✓ Found nginx at {}", nginx_path.display());

    let cfg_text = dump_nginx_config()?;
    println!("✓ Loaded full nginx configuration ({} bytes)", cfg_text.len());

    let formats = parse_log_formats(&cfg_text)?;
    if formats.is_empty() {
        println!("ℹ No custom log_format directives found (will use nginx defaults)");
    } else {
        println!("✓ Found {} custom log format(s)", formats.len());
        for fmt in &formats {
            println!(" • {}", fmt.name);
        }
    }

    let logs = parse_access_logs(&cfg_text)?;
    println!("✓ Found {} access_log directive(s)", logs.len());
    if logs.is_empty() {
        println!("⚠ No access_log directives found.");
        return Ok(());
    }

    let sources = resolve_sources(formats, logs);
    if sources.is_empty() {
        println!("⚠ No valid log sources after filtering.");
        return Ok(());
    }

    println!("\nDetected log sources:\n");
    for (i, s) in sources.iter().enumerate() {
        let format_preview = if s.format.len() > 60 {
            format!("{}...", &s.format[..60])
        } else {
            s.format.clone()
        };
        println!("[{}] {}", i + 1, s.path);
        println!(" Format: {} ({})", s.format_name, format_preview);
    }

    println!("\nUse SPACE to select, ENTER to confirm\n");
    let selections = MultiSelect::new()
        .with_prompt("Select logs to analyze")
        .items(&sources.iter().map(|s| s.path.as_str()).collect::<Vec<_>>())
        .interact()?;

    if selections.is_empty() {
        println!("No logs selected. Exiting.");
        return Ok(());
    }

    println!("\nTesting selected logs...\n");
    for idx in &selections {
        let src = &sources[*idx];
        test_log_file(src)?;
    }

    check_logrotate()?;

    if Confirm::new()
        .with_prompt("Generate a starter config file?")
        .default(true)
        .interact()?
    {
        generate_config(&sources, &selections)?;
    }

    println!("\n✅ Discovery completed successfully.\n");
    Ok(())
}
Error Propagation with ?:
Every function that can fail returns Result<T>. The ? operator:
let value = risky_function()?;
// Roughly equivalent to:
let value = match risky_function() {
    Ok(v) => v,
    Err(e) => return Err(e.into()), // ? also converts the error via From
};
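The conversion step is what lets ? bridge different error types, which is also how anyhow accepts errors from regex, std::io, and dialoguer in the same function. A minimal std-only sketch, with a hypothetical ConfigError type and parse_buffer_kb function invented for illustration:

```rust
use std::num::ParseIntError;

/// Hypothetical error type for this example only.
#[derive(Debug)]
enum ConfigError {
    BadNumber(ParseIntError),
}

// This impl is what ? uses to convert the error type
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e)
    }
}

/// Parse a size such as "32k" into a number of kilobytes.
fn parse_buffer_kb(raw: &str) -> Result<u32, ConfigError> {
    let digits = raw.trim_end_matches('k');
    // parse() fails with ParseIntError; ? turns it into ConfigError
    let kb = digits.parse::<u32>()?;
    Ok(kb)
}
```

Here parse_buffer_kb("32k") succeeds, while a non-numeric input comes back as ConfigError::BadNumber without any explicit map_err call.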
dialoguer Crate:
- MultiSelect: Checkbox list
- Confirm: Yes/no prompt
- .interact()?: Show prompt and wait for input
Part 6: File Testing & Validation
fn test_log_file(src: &LogSource) -> Result<()> {
    println!("▶ Testing {}", src.path);
    match fs::read_to_string(&src.path) {
        Ok(content) => {
            let line_count = content.lines().count();
            if let Some(line) = content.lines().next() {
                println!(" Lines: {}", line_count);
                println!(" Sample: {}", &line[..line.len().min(100)]);
                println!(" ✓ Read OK\n");
            } else {
                println!(" ⚠ File is empty\n");
            }
        }
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
            println!(" ⚠ File not found (may not exist yet)\n");
        }
        Err(e) if e.kind() == std::io::ErrorKind::PermissionDenied => {
            println!(" ⚠ Permission denied (try running with sudo)\n");
        }
        Err(e) => {
            return Err(e).with_context(|| format!("Failed to read {}", src.path));
        }
    }
    Ok(())
}
Pattern Matching on Errors:
Rust's match is powerful for error handling:
match fs::read_to_string(&path) {
Ok(content) => { /* success */ },
Err(e) if e.kind() == ErrorKind::NotFound => { /* specific error */ },
Err(e) => { /* catch all */ }
}
Guard Patterns (if in match):
- Err(e) if e.kind() == ... matches only if the condition is true
- Allows fine-grained error handling
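When every guard tests the same e.kind(), matching on the kind directly is an alternative that reads a little flatter. A std-only sketch; describe_read_error is a hypothetical helper and the messages are only illustrative:

```rust
use std::io::{self, ErrorKind};

/// Hypothetical helper: map an I/O error to a short user-facing hint.
fn describe_read_error(err: &io::Error) -> &'static str {
    match err.kind() {
        ErrorKind::NotFound => "file not found (may not exist yet)",
        ErrorKind::PermissionDenied => "permission denied (try running with sudo)",
        // ErrorKind is non-exhaustive, so a catch-all arm is required
        _ => "unexpected I/O error",
    }
}
```

Guards remain the better fit when the condition involves more than the error kind, for example the path or a retry counter.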
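String Slicing Safety:
&line[..line.len().min(100)]
// Caps the end index at the line length,
// so lines shorter than 100 bytes do not panic.
// Caveat: indices are bytes, so slicing still panics if byte 100
// falls inside a multi-byte UTF-8 character.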
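Log lines can contain non-ASCII bytes (for example in user agents or request paths), so a byte index is not always a valid cut point. One way to truncate by characters instead is sketched below with std only; truncate_chars is a hypothetical helper, not part of the original tool.

```rust
/// Hypothetical helper: return at most max_chars characters of s
/// without ever slicing inside a multi-byte character.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    // char_indices yields the byte offset where each character starts;
    // the offset of character number max_chars is a safe end index.
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // fewer than max_chars characters: keep everything
    }
}
```

This is still zero-copy: it returns a sub-slice of the input, just like the byte-based version.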
Checking System Configuration
fn check_logrotate() -> Result<()> {
    println!("\nChecking logrotate configuration...\n");
    for p in ["/etc/logrotate.d/nginx", "/etc/logrotate.conf"] {
        if fs::metadata(p).is_ok() {
            println!("✓ Found {}", p);
        }
    }
    Ok(())
}
Rust Array Iteration:
- ["/path1", "/path2"]: Array literal
- for p in array: Direct iteration (no .iter() needed for arrays in loops)
Part 7: Config Generation
Finally, let's generate a YAML config file:
fn generate_config(
    sources: &[LogSource],
    selections: &[usize]
) -> Result<()> {
    let mut out = String::new();
    out.push_str("# Generated by NGINX Discovery Tool\n");
    out.push_str(&format!(
        "# Generated: {}\n\n",
        chrono::Local::now().format("%Y-%m-%d %H:%M:%S")
    ));
    // YAML is indentation-sensitive: nested keys need consistent leading spaces
    out.push_str("server:\n  type: nginx\n\npipelines:\n");

    for idx in selections {
        let s = &sources[*idx];
        // Escape single quotes for YAML
        let escaped_format = s.format.replace('\'', "''");
        out.push_str(&format!(
            "  - name: {}\n    input: {}\n    format: '{}'\n",
            s.format_name, s.path, escaped_format
        ));
        out.push_str("    # Add additional processing here\n\n");
    }

    let filename = "logcraft.generated.yaml";
    fs::write(filename, out)?;
    println!("Config written to {}", filename);
    Ok(())
}
String Building in Rust:
- String::new(): Create empty mutable string
- .push_str(): Append string slice
- format!() macro: Like printf - returns String
- replace(): String replacement (creates new String)
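The quote escaping used for the format field can be isolated into a small function. In a YAML single-quoted scalar the only character that needs escaping is the single quote itself, written as two single quotes. The sketch below is std-only, and yaml_single_quoted is a hypothetical helper name.

```rust
/// Hypothetical helper: wrap a value in YAML single quotes,
/// doubling any single quote inside it.
fn yaml_single_quoted(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}
```

The same helper could be applied to the name and input fields, which generate_config writes unquoted; a path containing ": " or " #" would otherwise change how the line is parsed.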
Borrowing and Slices:
fn generate_config(sources: &[LogSource], ...)
//                          ^^^^^^^^^^^^ slice reference
// Callers can pass &Vec<LogSource> (deref coercion), &[LogSource], or a reference to an array
// More flexible than &Vec<T>; a Vec passed by value is not accepted
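A short std-only sketch of what a slice parameter accepts. It uses plain strings instead of LogSource, and longest_path is a hypothetical function for illustration.

```rust
/// Hypothetical helper: return the longest path in the slice, if any.
fn longest_path(paths: &[String]) -> Option<&str> {
    paths.iter().map(String::as_str).max_by_key(|p| p.len())
}

/// Shows the three kinds of argument a &[String] parameter accepts.
fn slice_argument_demo() -> (Option<String>, Option<String>, Option<String>) {
    let owned: Vec<String> = vec!["/a.log".to_string(), "/var/log/b.log".to_string()];
    let array = ["/tmp/x.log".to_string()];

    let from_vec = longest_path(&owned); // &Vec<String> coerces to &[String]
    let from_sub = longest_path(&owned[..1]); // an explicit sub-slice
    let from_arr = longest_path(&array); // &[String; 1] coerces too

    (
        from_vec.map(str::to_string),
        from_sub.map(str::to_string),
        from_arr.map(str::to_string),
    )
}
```

In each call the caller keeps ownership of its collection; the function only borrows a view of the elements.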
Building and Running
# Build release version (optimized)
cargo build --release
# Run with sudo (needed for nginx -T and log file access)
sudo ./target/release/discover_nginx_rust
Output:
NGINX Discovery Tool (Ubuntu)
✓ Found nginx at /usr/sbin/nginx
✓ Loaded full nginx configuration (10078 bytes)
ℹ No custom log_format directives found (will use nginx defaults)
✓ Found 1 access_log directive(s)
Detected log sources:
[1] /var/log/nginx/access.log
    Format: combined ($remote_addr - $remote_user [$time_local]...)
Use SPACE to select, ENTER to confirm
Select logs to analyze: ✓ /var/log/nginx/access.log
Testing selected logs...
▶ Testing /var/log/nginx/access.log
 Lines: 1247
 Sample: 192.168.1.100 - - [19/Jan/2026:10:30:15 +0000] "GET / HTTP/1.1" 200 612...
 ✓ Read OK
Checking logrotate configuration...
✓ Found /etc/logrotate.d/nginx
Generate a starter config file? Yes
Config written to logcraft.generated.yaml
✅ Discovery completed successfully.
Key Rust Concepts Learned
1. Error Handling
Result<T, E> // Return type for fallible operations
? // Propagate errors
.context("msg") // Add error context
anyhow::bail!() // Early return with error
2. Ownership & Borrowing
String vs &str // Owned vs borrowed
&[T] // Slice reference
.clone() // Explicit copying
into_iter() // Consume ownership
3. Pattern Matching
match value {
Ok(v) => ...,
Err(e) if condition => ...,
_ => ...
}
if let Some(x) = option { ... }
4. Iterators
.iter() // Borrow each element
.into_iter() // Consume collection
.map() // Transform
.filter() // Select
.collect() // Build collection
5. Options & Results
Option<T> // Some(T) or None
Result<T, E> // Ok(T) or Err(E)
.unwrap_or_else() // Lazy default
.map() // Transform inner value
.flatten() // Collapse nested Options
Crates We Used
| Crate | Purpose | Key Features |
|---|---|---|
| anyhow | Error handling | Context, easy error types |
| dialoguer | CLI interaction | Prompts, selections |
| which | Find executables | Cross-platform PATH search |
| regex | Pattern matching | NGINX config parsing |
| chrono | Date/time | Timestamps |
What's Next?
In Part 2 of this series, we'll build a proper parser library:
- Hand-written lexer and parser
- Full AST (Abstract Syntax Tree)
- Support for all NGINX directives
- Publish to crates.io
Stay tuned!
Resources
- Complete Code Gist (https://gist.github.com/urwithajit9/6e55447024a73d91402cc3566314e981)
- Rust Book
- NGINX Documentation
- Regex Tutorial
Discussion
What would you like to see in Part 2? Comment below!
- Full NGINX config validation?
- Config transformation tools?
- LSP (Language Server Protocol) support?
- Something else?
Found this helpful? Give it a ❤️ and follow for Part 2!