How to scan your file system and query files using SQL
TL;DR: In this guide, you'll learn how to use surveilr to scan directories, capture file metadata in a SQLite database, and query for files using SQL—find PDFs, track document changes, and audit file system activity.
Prerequisites: surveilr installed
The Problem: Finding Files is Hard
If you've ever needed to track files across your system for compliance audits or project management, you know the pain:
- Scattered files — Documents, spreadsheets, and PDFs across multiple directories
-
Manual searches — Using
findand Finder/Explorer is tedious and limited - No audit trail — No record of what existed when
- Point-in-time only — No way to track file changes over time
What if you could turn your entire file system into a queryable SQL database where you can find any file with SQL queries?
That's exactly what surveilr does.
What is surveilr?
surveilr is an edge-based, SQL-first surveillance platform that turns your file system into a Resource Surveillance State Database (RSSD)—a SQLite database containing all your file metadata.
Think of it as "Git for your entire file system's metadata" that you can query with SQL.
Key features:
- 📂 Scans directories and captures file metadata (size, timestamps, hashes)
- 🔍 Query files with SQL (no custom DSL to learn)
- 🔄 Watch mode for continuous monitoring
- 🔒 Works offline (edge-based, no cloud required)
- 📊 Perfect for compliance audits (HIPAA, GDPR, SOC 2)
Let's see it in action.
Step 1: Create Your First RSSD
Initialize a new surveillance database:
surveilr admin init -d my-audit.db
What just happened?
surveilr created my-audit.db—a SQLite database that will store all your file evidence. This is your Resource Surveillance State Database (RSSD).
💡 Pro tip: The RSSD is a standard SQLite database. Once created, it's independent of surveilr and can be queried by any tool that supports SQLite.
Step 2: Ingest Files from a Directory
Scan your Documents folder (or any directory):
# Scan your Documents folder
surveilr ingest files -r ~/Documents -d my-audit.db
What's happening behind the scenes:
- surveilr walks the directory tree recursively
- Computes file hashes (SHA-256) for content tracking
- Captures metadata: size, timestamps, permissions, extensions
- Stores everything in SQLite tables
Step 3: Find Sensitive Files with SQL
Now comes the fun part. Open the interactive SQL shell:
surveilr shell -d my-audit.db
📌 Note: File metadata lives in two joined tables:
uniform_resource— Core resource metadata (hash, size, timestamps)ur_ingest_session_fs_path_entry— File-specific metadata (path, name, extension)
Query 1: Find PDFs Modified Recently
Find all PDF files modified in the last 30 days:
SELECT
e.file_path_abs,
e.file_basename,
u.size_bytes / 1024 AS size_kb,
u.last_modified_at
FROM uniform_resource u
JOIN ur_ingest_session_fs_path_entry e
ON u.uniform_resource_id = e.uniform_resource_id
WHERE e.file_extn = 'pdf'
AND u.last_modified_at > datetime('now', '-30 days')
ORDER BY u.last_modified_at DESC;
Example output:
/Users/you/Documents/report-2024.pdf | report-2024.pdf | 2048 | 2024-05-12
/Users/you/Projects/proposal.pdf | proposal.pdf | 3248 | 2024-05-10
/Users/you/Desktop/invoice-may.pdf | invoice-may.pdf | 512 | 2024-05-08
Query 2: Find All Excel Files
Find all Excel spreadsheets across your system:
SELECT
e.file_path_abs,
e.file_basename,
u.size_bytes / 1024 AS size_kb,
u.last_modified_at
FROM uniform_resource u
JOIN ur_ingest_session_fs_path_entry e
ON u.uniform_resource_id = e.uniform_resource_id
WHERE e.file_extn IN ('xlsx', 'xls', 'csv')
ORDER BY u.last_modified_at DESC
LIMIT 20;
Query 3: Find Word Documents by Name
Search for documents by filename:
SELECT
e.file_path_abs,
e.file_basename,
u.last_modified_at
FROM uniform_resource u
JOIN ur_ingest_session_fs_path_entry e
ON u.uniform_resource_id = e.uniform_resource_id
WHERE e.file_extn IN ('docx', 'doc')
AND e.file_basename LIKE '%report%'
ORDER BY u.last_modified_at DESC;
Step 4: Export Your Findings
Export results as JSON for reports:
surveilr shell -d my-audit.db --cmd "
SELECT e.file_path_abs, e.file_basename, u.size_bytes, u.last_modified_at
FROM uniform_resource u
JOIN ur_ingest_session_fs_path_entry e
ON u.uniform_resource_id = e.uniform_resource_id
WHERE e.file_extn = 'pdf'
" --output json > pdf-files-report.json
Or save as a SQL file for reuse:
cat > find-pdfs.sql <<'EOF'
SELECT e.file_path_abs, e.file_basename, u.size_bytes
FROM uniform_resource u
JOIN ur_ingest_session_fs_path_entry e
ON u.uniform_resource_id = e.uniform_resource_id
WHERE e.file_extn = 'pdf'
EOF
surveilr shell -d my-audit.db find-pdfs.sql --output json > pdfs.json
Step 5: Automate with Watch Mode
Set up continuous monitoring so surveilr automatically re-scans when files change:
surveilr ingest files -r ~/Documents -d my-audit.db --watch
What happens:
- surveilr monitors the directory for changes
- New or modified files are automatically ingested
- Press
Ctrl+Cto stop
Real-World Use Cases
🏥 HIPAA Compliance
Track PHI locations — Document where patient health information files are stored for GDPR/HIPAA audits.
🌍 GDPR Data Governance
Map PII locations — Track where personally identifiable information is stored across your organization.
🚨 Incident Response
Detect tampering — Establish file baselines and detect unauthorized modifications during security incidents.
Troubleshooting
Problem: "Permission denied" errors when scanning
Solution: Run with appropriate permissions or scan a different directory
Problem: RSSD file getting too large
Solution: Use --dry-run first to check file counts:
surveilr ingest files -r ~/Documents --dry-run
What's Next?
You've just created your first compliance evidence database! Here's what to explore next:
- Email Compliance Tracking — Track email communications via IMAP
- GitHub/GitLab/Jira Tracking — Project data with Singer taps
- surveilr.com — Full documentation and advanced features
Learn More
- surveilr.com — Official website and documentation
- Installation Guide — Download and install surveilr
Key Takeaways
✅ surveilr turns your file system into a queryable SQL database
✅ Standard SQLite = no vendor lock-in, works with any SQLite tool
✅ Perfect for compliance audits (HIPAA, GDPR, SOC 2)
✅ Continuous monitoring with watch mode
✅ All data stays on your machine (edge-based, no cloud required)
Result: A complete audit trail of your file system that you can query anytime
Found this helpful? Visit surveilr.com to learn more! Have questions? Open an issue.
Top comments (0)