DEV Community

Mukhtar
Mukhtar

Posted on

Audit Your File System with Surveilr: A Practical Guide

How to scan your file system and query files using SQL


TL;DR: In this guide, you'll learn how to use surveilr to scan directories, capture file metadata in a SQLite database, and query for files using SQL—find PDFs, track document changes, and audit file system activity.

Prerequisites: surveilr installed


The Problem: Finding Files is Hard

If you've ever needed to track files across your system for compliance audits or project management, you know the pain:

  • Scattered files — Documents, spreadsheets, and PDFs across multiple directories
  • Manual searches — Using find and Finder/Explorer is tedious and limited
  • No audit trail — No record of what existed when
  • Point-in-time only — No way to track file changes over time

What if you could turn your entire file system into a queryable SQL database where you can find any file with SQL queries?

That's exactly what surveilr does.


What is surveilr?

surveilr is an edge-based, SQL-first surveillance platform that turns your file system into a Resource Surveillance State Database (RSSD)—a SQLite database containing all your file metadata.

Think of it as "Git for your entire file system's metadata" that you can query with SQL.

Key features:

  • 📂 Scans directories and captures file metadata (size, timestamps, hashes)
  • 🔍 Query files with SQL (no custom DSL to learn)
  • 🔄 Watch mode for continuous monitoring
  • 🔒 Works offline (edge-based, no cloud required)
  • 📊 Perfect for compliance audits (HIPAA, GDPR, SOC 2)

Let's see it in action.


Step 1: Create Your First RSSD

Initialize a new surveillance database:

surveilr admin init -d my-audit.db
Enter fullscreen mode Exit fullscreen mode

What just happened?
surveilr created my-audit.db—a SQLite database that will store all your file evidence. This is your Resource Surveillance State Database (RSSD).

💡 Pro tip: The RSSD is a standard SQLite database. Once created, it's independent of surveilr and can be queried by any tool that supports SQLite.


Step 2: Ingest Files from a Directory

Scan your Documents folder (or any directory):

# Scan your Documents folder
surveilr ingest files -r ~/Documents -d my-audit.db
Enter fullscreen mode Exit fullscreen mode

What's happening behind the scenes:

  1. surveilr walks the directory tree recursively
  2. Computes file hashes (SHA-256) for content tracking
  3. Captures metadata: size, timestamps, permissions, extensions
  4. Stores everything in SQLite tables

Step 3: Find Sensitive Files with SQL

Now comes the fun part. Open the interactive SQL shell:

surveilr shell -d my-audit.db
Enter fullscreen mode Exit fullscreen mode

📌 Note: File metadata lives in two joined tables:

  • uniform_resource — Core resource metadata (hash, size, timestamps)
  • ur_ingest_session_fs_path_entry — File-specific metadata (path, name, extension)

Query 1: Find PDFs Modified Recently

Find all PDF files modified in the last 30 days:

SELECT
    e.file_path_abs,
    e.file_basename,
    u.size_bytes / 1024 AS size_kb,
    u.last_modified_at
FROM uniform_resource u
JOIN ur_ingest_session_fs_path_entry e
  ON u.uniform_resource_id = e.uniform_resource_id
WHERE e.file_extn = 'pdf'
  AND u.last_modified_at > datetime('now', '-30 days')
ORDER BY u.last_modified_at DESC;
Enter fullscreen mode Exit fullscreen mode

Example output:

/Users/you/Documents/report-2024.pdf    | report-2024.pdf    | 2048 | 2024-05-12
/Users/you/Projects/proposal.pdf        | proposal.pdf       | 3248 | 2024-05-10
/Users/you/Desktop/invoice-may.pdf      | invoice-may.pdf    | 512  | 2024-05-08
Enter fullscreen mode Exit fullscreen mode

Query 2: Find All Excel Files

Find all Excel spreadsheets across your system:

SELECT
    e.file_path_abs,
    e.file_basename,
    u.size_bytes / 1024 AS size_kb,
    u.last_modified_at
FROM uniform_resource u
JOIN ur_ingest_session_fs_path_entry e
  ON u.uniform_resource_id = e.uniform_resource_id
WHERE e.file_extn IN ('xlsx', 'xls', 'csv')
ORDER BY u.last_modified_at DESC
LIMIT 20;
Enter fullscreen mode Exit fullscreen mode

Query 3: Find Word Documents by Name

Search for documents by filename:

SELECT
    e.file_path_abs,
    e.file_basename,
    u.last_modified_at
FROM uniform_resource u
JOIN ur_ingest_session_fs_path_entry e
  ON u.uniform_resource_id = e.uniform_resource_id
WHERE e.file_extn IN ('docx', 'doc')
  AND e.file_basename LIKE '%report%'
ORDER BY u.last_modified_at DESC;
Enter fullscreen mode Exit fullscreen mode

Step 4: Export Your Findings

Export results as JSON for reports:

surveilr shell -d my-audit.db --cmd "
SELECT e.file_path_abs, e.file_basename, u.size_bytes, u.last_modified_at
FROM uniform_resource u
JOIN ur_ingest_session_fs_path_entry e
  ON u.uniform_resource_id = e.uniform_resource_id
WHERE e.file_extn = 'pdf'
" --output json > pdf-files-report.json
Enter fullscreen mode Exit fullscreen mode

Or save as a SQL file for reuse:

cat > find-pdfs.sql <<'EOF'
SELECT e.file_path_abs, e.file_basename, u.size_bytes
FROM uniform_resource u
JOIN ur_ingest_session_fs_path_entry e
  ON u.uniform_resource_id = e.uniform_resource_id
WHERE e.file_extn = 'pdf'
EOF

surveilr shell -d my-audit.db find-pdfs.sql --output json > pdfs.json
Enter fullscreen mode Exit fullscreen mode

Step 5: Automate with Watch Mode

Set up continuous monitoring so surveilr automatically re-scans when files change:

surveilr ingest files -r ~/Documents -d my-audit.db --watch
Enter fullscreen mode Exit fullscreen mode

What happens:

  • surveilr monitors the directory for changes
  • New or modified files are automatically ingested
  • Press Ctrl+C to stop

Real-World Use Cases

🏥 HIPAA Compliance

Track PHI locations — Document where patient health information files are stored for GDPR/HIPAA audits.

🌍 GDPR Data Governance

Map PII locations — Track where personally identifiable information is stored across your organization.

🚨 Incident Response

Detect tampering — Establish file baselines and detect unauthorized modifications during security incidents.


Troubleshooting

Problem: "Permission denied" errors when scanning
Solution: Run with appropriate permissions or scan a different directory

Problem: RSSD file getting too large
Solution: Use --dry-run first to check file counts:

surveilr ingest files -r ~/Documents --dry-run
Enter fullscreen mode Exit fullscreen mode

What's Next?

You've just created your first compliance evidence database! Here's what to explore next:


Learn More


Key Takeaways

surveilr turns your file system into a queryable SQL database
Standard SQLite = no vendor lock-in, works with any SQLite tool
Perfect for compliance audits (HIPAA, GDPR, SOC 2)
Continuous monitoring with watch mode
All data stays on your machine (edge-based, no cloud required)

Result: A complete audit trail of your file system that you can query anytime


Found this helpful? Visit surveilr.com to learn more! Have questions? Open an issue.

Top comments (0)