Hello, I'm Shrijith. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project. Do give it a try and share your feedback to help improve the product.
Servers run dozens, or even hundreds, of processes at any given time. Some are critical for your applications or data; others are ephemeral system threads. BackupScout is a tool designed to automatically identify the processes that matter and classify them by backup relevance.

By the end of this post, you'll understand how BackupScout:
- Enumerates processes
- Classifies them into categories
- Flags them as High, Medium, or Low relevance
- Produces a JSON file ready for review or further automation
The heavy lifting is powered by AI Studio’s Gemini API, which classifies processes based on minimal metadata like name, binary path, and parent process. No manual rules needed.
What BackupScout Produces
Here’s an example of the JSON output:
[
  {
    "pid": 330,
    "name": "mysqld",
    "category": "Database",
    "backup_relevance": "High"
  },
  {
    "pid": 451,
    "name": "nginx",
    "category": "Web Server",
    "backup_relevance": "Medium"
  },
  {
    "pid": 22,
    "name": "kworker/1",
    "category": "Kernel Thread",
    "backup_relevance": "Low"
  }
]
This gives immediate insight into which processes are worth backing up or monitoring.
Step 1: Enumerating Processes
We use Python’s psutil library for process enumeration. It provides cross-platform access to process metadata.
import psutil

processes = []
for proc in psutil.process_iter(['pid', 'name', 'exe', 'ppid']):
    try:
        processes.append(proc.info)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue

print(processes[:5])  # Preview the first 5 processes
Sample Output:
{'pid': 1, 'name': 'systemd', 'exe': '/usr/lib/systemd/systemd', 'ppid': 0}
{'pid': 330, 'name': 'mysqld', 'exe': '/usr/sbin/mysqld', 'ppid': 1}
{'pid': 451, 'name': 'nginx', 'exe': '/usr/sbin/nginx', 'ppid': 1}
This metadata is enough for AI classification without exposing full process environments.
Step 2: Classifying Processes with AI
BackupScout uses Gemini via AI Studio for classification. The AI model assigns:
- Category (e.g., Database, Web App, Web Server, Caching, Kernel Thread, System)
- Backup relevance (High, Medium, Low)
For example, mysqld → Database → High, nginx → Web Server → Medium, kworker/1 → Kernel Thread → Low.
The AI handles these assignments automatically, so you don’t need to maintain a ruleset.
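The post doesn't show the exact prompt or API call, but the flow can be sketched roughly as follows. The prompt wording, model name, and helper names here are assumptions, not BackupScout's actual implementation:

```python
import json

def build_prompt(batch):
    """Build a classification prompt for a batch of process dicts.
    (Prompt wording is an assumption; the article doesn't show the exact prompt.)"""
    lines = "\n".join(
        f"- pid={p['pid']} name={p['name']} exe={p.get('exe')} ppid={p.get('ppid')}"
        for p in batch
    )
    return (
        "Classify each process below. Return a JSON array of objects with keys "
        "pid, name, category, backup_relevance (High/Medium/Low).\n" + lines
    )

def parse_response(text):
    """Parse the model's JSON reply, tolerating a markdown code fence."""
    text = text.strip().removeprefix("```json").removesuffix("```").strip()
    return json.loads(text)

# The actual API call might look like this (google-generativeai package):
#   import google.generativeai as genai
#   genai.configure(api_key=os.environ["GEMINI_API_KEY"])
#   model = genai.GenerativeModel("gemini-1.5-flash")
#   batch_results = parse_response(model.generate_content(build_prompt(batch)).text)
```

Parsing defensively matters here: models often wrap JSON replies in a markdown code fence, and `parse_response` strips that before decoding.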
Step 3: Batching and Incremental Processing
For servers with hundreds of processes, sending them all at once can exceed the AI input limit. BackupScout handles this by:
- Processing in batches (e.g., 10–20 processes per request)
- Saving results incrementally to disk
- Retrying failed batches automatically
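The batching step above amounts to slicing the process list into fixed-size chunks before each API request. A minimal sketch (the `chunked` helper and batch size are illustrative, not BackupScout's actual code):

```python
def chunked(items, size=15):
    """Yield successive batches of `size` items (10-20 per request, per the article)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Example: 47 fake process records split into batches of 15
processes = [{"pid": n, "name": f"proc{n}"} for n in range(47)]
batches = list(chunked(processes, 15))
print([len(b) for b in batches])  # [15, 15, 15, 2]
```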
Example of incremental saving:
import json
import os

results_file = "process_classification.json"
all_results = json.load(open(results_file)) if os.path.exists(results_file) else []

# After processing a batch:
all_results.extend(batch_results)
with open(results_file, "w") as f:
    json.dump(all_results, f, indent=2)
This ensures partial results are never lost during long scans or network failures.
Step 4: Reviewing Results
Once the JSON is ready, you can filter High-relevance processes using jq:
jq '[.[] | select(.backup_relevance=="High")]' process_classification.json
Or get a quick PID + name + category list:
jq -r '.[] | select(.backup_relevance=="High") | "\(.pid)\t\(.name)\t\(.category)"' process_classification.json
Sample Output:
330	mysqld	Database
This makes it easy to identify critical processes at a glance.
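If you'd rather stay in Python than shell out to jq, the same filtering and a quick relevance summary can be done with the standard library. The inline sample below stands in for the contents of `process_classification.json`:

```python
import json
from collections import Counter

# Inline sample standing in for process_classification.json
data = json.loads("""[
  {"pid": 330, "name": "mysqld", "category": "Database", "backup_relevance": "High"},
  {"pid": 451, "name": "nginx", "category": "Web Server", "backup_relevance": "Medium"},
  {"pid": 22, "name": "kworker/1", "category": "Kernel Thread", "backup_relevance": "Low"}
]""")

counts = Counter(p["backup_relevance"] for p in data)
high = [p["name"] for p in data if p["backup_relevance"] == "High"]

print(dict(counts))  # {'High': 1, 'Medium': 1, 'Low': 1}
print(high)          # ['mysqld']
```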
Step 5: Handling Large Servers
BackupScout is designed to work on servers with hundreds of processes. Key strategies:
- Batch AI requests to avoid hitting token limits
- Incremental saving to maintain progress
- Retries to handle network or API errors
These make the tool robust and reliable, even in real-world server environments.
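The retry strategy can be sketched as a small wrapper with exponential backoff. This is a generic pattern, not BackupScout's actual retry code; `classify_batch` in the usage note is a hypothetical name:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call `fn`, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage sketch: wrap each batch's API call
#   result = with_retries(lambda: classify_batch(batch))
```

Combined with the incremental saving from Step 3, a transient API failure costs at most one batch's worth of work.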
Step 6: Practical Use Cases
BackupScout helps in:
- Prioritizing backups for databases and web apps
- Preparing server snapshots for disaster recovery
- Identifying high-impact processes for monitoring
Because the AI handles classification, your workflow is mostly orchestration and review; you don't need to maintain manual rules.
Where We Go Next
In the next part of the series, we’ll combine everything into a full working BackupScout script:
- Enumerating processes
- Batching AI calls
- Incremental saving
- Automatic retries
The goal: a ready-to-run tool for scanning any server and discovering critical data automatically.
Reference: psutil Python Documentation
*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*
Feedback and contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: HexmosTech/git-lrc