Ayman Seif
How Unsafe AI Code Could Harm Critical Systems — And Why Execution Control Matters

DEV Weekend Challenge: Earth Day

As AI systems begin to control infrastructure, automation, and even environmental systems, unsafe execution becomes more than a technical issue.

It becomes a real-world risk.

From energy grids to automated decision systems, a single unsafe execution path could cause damage at scale.

Earth Day is a reminder that technology doesn’t exist in isolation.

Safe execution is part of responsible systems design.

What I Built

TEOS Sentinel Shield — a deterministic AI execution firewall that blocks unsafe code before it runs.

The Problem: AI agents, automation scripts, and LLM-generated code execute blindly. One eval(), one hardcoded API key, one rm -rf — and your system is compromised.

The Solution: A pre-execution security layer that analyzes code and returns clear decisions: ALLOW / WARN / BLOCK in under 2 seconds.

Built for: AI agent developers, LangChain/CrewAI users, DevOps teams, and anyone running untrusted code.

Live now: 5 free scans, no credit card required.

Demo

Try it now: TEOS Sentinel Bot on Telegram

Live Platform: teos-sentinel-shield.vercel.app

Watch it block dangerous code:

```javascript
// This attempts to delete your filesystem
eval(require("child_process").exec("rm -rf /"))
```

Result: 🔴 BLOCK | Risk Score: 100/100

Findings detected:

  • eval()
  • exec()
  • child_process
  • rm -rf

The code never executes.

🎥 2-minute video demo (replace with your Loom link)

Code

GitHub Repos:

Core scanning logic:

```javascript
// 14 risk detection rules
const LOCAL_RULES = [
  { name: 'eval()',        pat: /\beval\s*\(/i,        score: 40 },
  { name: 'exec()',        pat: /\bexec\s*\(/i,        score: 40 },
  { name: 'child_process', pat: /child_process/i,      score: 35 },
  { name: 'rm -rf',        pat: /rm\s+-rf/i,           score: 50 },
  { name: 'curl|bash',     pat: /curl.*\|.*sh/i,       score: 60 },
  { name: 'hardcoded key', pat: /api_key\s*=\s*["']/i, score: 45 },
  // ... 8 more rules
];

// Returns: ALLOW / WARN / BLOCK
function localScan(code) {
  let score = 0;
  const findings = [];
  for (const r of LOCAL_RULES) {
    if (r.pat.test(code)) { score += r.score; findings.push(r.name); }
  }
  return {
    verdict: score >= 80 ? 'BLOCK' : score >= 25 ? 'WARN' : 'ALLOW',
    score,
    findings,
  };
}
```
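Here is a minimal, self-contained sketch of how that scanner behaves on the demo payload from earlier. It uses the same rule shapes and verdict logic as `localScan` above, but only a four-rule subset, so the raw score shown is illustrative rather than the platform's capped 100-point display:

```javascript
// Minimal subset of the detection rules shown above.
const RULES = [
  { name: 'eval()',        pat: /\beval\s*\(/i,   score: 40 },
  { name: 'exec()',        pat: /\bexec\s*\(/i,   score: 40 },
  { name: 'child_process', pat: /child_process/i, score: 35 },
  { name: 'rm -rf',        pat: /rm\s+-rf/i,      score: 50 },
];

// Same shape as localScan: sum the scores of matching rules, then map to a verdict.
function scan(code) {
  let score = 0;
  const findings = [];
  for (const r of RULES) {
    if (r.pat.test(code)) { score += r.score; findings.push(r.name); }
  }
  return { verdict: score >= 80 ? 'BLOCK' : score >= 25 ? 'WARN' : 'ALLOW', score, findings };
}

// The demo payload trips all four rules (40 + 40 + 35 + 50 = 165).
const result = scan('eval(require("child_process").exec("rm -rf /"))');
console.log(result.verdict);  // 'BLOCK'
console.log(result.findings); // ['eval()', 'exec()', 'child_process', 'rm -rf']
```

Because every rule is a plain regex with a fixed score, the same input always produces the same verdict and the same findings list.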

How I Built It

The Stack

Interface: Telegram Bot API (Telegram-native UX for instant adoption)

Backend: Node.js + Express on Railway

Database: SQLite with Railway volumes (persistent user data)

Risk Engine: Custom MCP-compatible analyzer with 14 detection rules

Payments: Dodo Checkout (Starter $9.99, Builder $49, Pro $99, Sovereign $12k)

Frontend: Next.js on Vercel

Technical Decisions

1. Deterministic vs Probabilistic

I chose rule-based detection over ML because:

  • ✅ Predictable decisions (no false positives from "AI guessing")
  • ✅ Sub-2-second response time
  • ✅ Explainable results (users see exactly what triggered the block)
  • ✅ No training data needed

2. Pre-Execution vs Post-Execution

Most security tools monitor after code runs. I built before execution because:

  • Prevention > Detection
  • Zero damage vs damage control
  • Trust but verify
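The pre-execution model can be pictured as a gate in front of the runner: the scan verdict decides whether the untrusted code is ever invoked. This is an illustrative sketch, not the shipped implementation; `guardedRun`, `toyScan`, and `runUntrusted` are hypothetical names:

```javascript
// Hypothetical pre-execution gate: nothing runs until the verdict allows it.
function guardedRun(code, scan, runUntrusted) {
  const { verdict, findings } = scan(code);
  if (verdict === 'BLOCK') {
    // Zero damage: the code is rejected before it ever executes.
    throw new Error(`Blocked before execution: ${findings.join(', ')}`);
  }
  if (verdict === 'WARN') {
    console.warn('Proceeding despite suspicious patterns:', findings);
  }
  return runUntrusted(code);
}

// Toy scanner and runner, just for demonstration.
const toyScan = (code) =>
  /rm\s+-rf/.test(code)
    ? { verdict: 'BLOCK', findings: ['rm -rf'] }
    : { verdict: 'ALLOW', findings: [] };

guardedRun('console.log("hello")', toyScan, () => 'ran'); // returns 'ran'
// guardedRun('rm -rf /', toyScan, () => 'ran');          // throws before the runner is called
```

A post-execution monitor would only observe the damage; here the dangerous path is cut off before the runner is touched.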

3. Telegram-First

Built on Telegram Bot API because:

  • Zero friction (no signup, no install)
  • 900M users already have it
  • Instant global reach
  • Perfect for quick scans

4. SQLite over PostgreSQL

Started with SQLite because:

  • Simpler deployment (no external DB)
  • Railway volumes = automatic persistence
  • Fast enough for <10k users
  • Can migrate later if needed

5. MCP Integration

Made it MCP-compatible so it works with:

  • AI agent frameworks
  • LangChain tools
  • Autonomous systems
  • CI/CD pipelines

The Challenge

Speed vs Accuracy: Initial version took 8 seconds. Optimized to <2s by:

  • Parallel rule evaluation
  • Early exit on high-risk patterns
  • MCP engine timeout with local fallback
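The early-exit idea can be sketched like this (the rule subset and threshold constants here are assumptions based on the scoring used in this post; the real engine runs 14 rules):

```javascript
const BLOCK_THRESHOLD = 80;
const WARN_THRESHOLD = 25;

// Highest-scoring rules first, so dangerous code exits the loop soonest.
const RULES = [
  { name: 'curl|bash', pat: /curl.*\|.*sh/i, score: 60 },
  { name: 'rm -rf',    pat: /rm\s+-rf/i,     score: 50 },
  { name: 'eval()',    pat: /\beval\s*\(/i,  score: 40 },
];

// Stop scanning as soon as the score already guarantees a BLOCK.
function fastScan(code) {
  let score = 0;
  const findings = [];
  for (const r of RULES) {
    if (r.pat.test(code)) {
      score += r.score;
      findings.push(r.name);
      if (score >= BLOCK_THRESHOLD) {
        return { verdict: 'BLOCK', score, findings }; // early exit
      }
    }
  }
  return { verdict: score >= WARN_THRESHOLD ? 'WARN' : 'ALLOW', score, findings };
}
```

One trade-off worth noting: an early-exited scan reports only the findings matched so far, which is acceptable because the verdict is already final.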

False Positives: Tuned scoring thresholds:

  • BLOCK: score ≥ 80 (critical threats)
  • WARN: score ≥ 25 (suspicious patterns)
  • ALLOW: score < 25 (clean code)
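Those thresholds reduce to a small deterministic mapping, which is what keeps every verdict reproducible and explainable:

```javascript
// Deterministic verdict mapping for a computed risk score.
function verdictFor(score) {
  if (score >= 80) return 'BLOCK'; // critical threats
  if (score >= 25) return 'WARN';  // suspicious patterns
  return 'ALLOW';                  // clean code
}

verdictFor(100); // 'BLOCK'
verdictFor(40);  // 'WARN'
verdictFor(10);  // 'ALLOW'
```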

What's Next

  • API access for developers ($29-$499/mo tiers)
  • CI/CD GitHub Actions integration
  • Advanced dependency vulnerability database
  • Team workflows + audit logs
  • $TEOS token integration for payments

Prize Categories

Submitting for:

  • Best Use of GitHub Copilot — Used Copilot extensively for regex pattern generation and test case creation
  • Best Use of Backboard — Integrated Backboard for deployment monitoring and uptime tracking
  • Best Use of Google Gemini — Leveraged Gemini for code pattern analysis and risk categorization (via MCP engine)

Tech Stack Highlights:

  • GitHub Copilot: Accelerated development by 3x
  • Railway: Zero-config deployment with auto-scaling
  • SQLite + Railway volumes: Persistent storage without managed DB
  • Telegram Bot API: Instant global reach

Impact

Security: Prevents real exploits (eval injection, hardcoded secrets, destructive commands)

Developer Experience: 5 free scans, no signup, instant results

Accessibility: Telegram-native (no app install required)

Open Source: Full code on GitHub for community review


Built by Ayman Seif (@teosegypt) — Alexandria, Egypt 🇪🇬

Try it: t.me/teoslinker_bot

Read more: DEV.to article

AI should not execute blindly. It should execute under verified control.



*This is a submission for [Weekend Challenge: Earth Day Edition](https://dev.to/challenges/weekend-2026-04-16)*
What Happens If We Don’t Solve This?

⚡️ AI systems are moving toward autonomy.

Without execution control:

- unsafe commands can be triggered at scale
- automation systems can fail unpredictably
- trust in AI systems breaks down

The question is no longer:
Can AI generate code?

The question is:
Should that code be trusted to execute?

Top comments (1)

Ayman Seif

Curious how others are thinking about execution safety in AI systems.

Are you validating code before running it?