Ayman Seif
How Unsafe AI Code Could Harm Critical Systems — And Why Execution Control Matters

DEV Weekend Challenge: Earth Day

As AI systems begin to control infrastructure, automation, and even environmental systems, unsafe execution becomes more than a technical issue.

It becomes a real-world risk.

From energy grids to automated decision systems, a single unsafe execution path could cause damage at scale.

Earth Day is a reminder that technology doesn’t exist in isolation.

Safe execution is part of responsible systems design.

What I Built

TEOS Sentinel Shield — a deterministic AI execution firewall that blocks unsafe code before it runs.

The Problem: AI agents, automation scripts, and LLM-generated code execute blindly. One eval(), one hardcoded API key, one rm -rf — and your system is compromised.

The Solution: A pre-execution security layer that analyzes code and returns clear decisions: ALLOW / WARN / BLOCK in under 2 seconds.

Built for: AI agent developers, LangChain/CrewAI users, DevOps teams, and anyone running untrusted code.

Live now: 5 free scans, no credit card required.

Demo

Try it now: TEOS Sentinel Bot on Telegram

Live Platform: teos-sentinel-shield.vercel.app

Watch it block dangerous code:

```javascript
// This attempts to delete your filesystem
eval(require("child_process").exec("rm -rf /"))
```

Result: 🔴 BLOCK | Risk Score: 100/100

Findings detected:

  • eval()
  • exec()
  • child_process
  • rm -rf

The code never executes.

🎥 2-minute video demo (replace with your Loom link)

Code

GitHub Repos:

Core scanning logic:

```javascript
// 14 risk detection rules
const LOCAL_RULES = [
  { name: 'eval()',        pat: /\beval\s*\(/i,        score: 40 },
  { name: 'exec()',        pat: /\bexec\s*\(/i,        score: 40 },
  { name: 'child_process', pat: /child_process/i,      score: 35 },
  { name: 'rm -rf',        pat: /rm\s+-rf/i,           score: 50 },
  { name: 'curl|bash',     pat: /curl.*\|.*sh/i,       score: 60 },
  { name: 'hardcoded key', pat: /api_key\s*=\s*["']/i, score: 45 },
  // ... 8 more rules
];

// Returns: ALLOW / WARN / BLOCK
function localScan(code) {
  let score = 0;
  const findings = [];
  for (const r of LOCAL_RULES) {
    if (r.pat.test(code)) { score += r.score; findings.push(r.name); }
  }
  return {
    verdict: score >= 80 ? 'BLOCK' : score >= 25 ? 'WARN' : 'ALLOW',
    score,
    findings,
  };
}
```
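Here is a minimal, self-contained sketch of how that scanner behaves on the demo payload from earlier. It uses the same rule shapes and verdict logic as `localScan` above, but only a four-rule subset, so the raw score shown is illustrative rather than the platform's capped 100-point display:

```javascript
// Minimal subset of the detection rules shown above.
const RULES = [
  { name: 'eval()',        pat: /\beval\s*\(/i,   score: 40 },
  { name: 'exec()',        pat: /\bexec\s*\(/i,   score: 40 },
  { name: 'child_process', pat: /child_process/i, score: 35 },
  { name: 'rm -rf',        pat: /rm\s+-rf/i,      score: 50 },
];

// Same shape as localScan: sum the scores of matching rules, then map to a verdict.
function scan(code) {
  let score = 0;
  const findings = [];
  for (const r of RULES) {
    if (r.pat.test(code)) { score += r.score; findings.push(r.name); }
  }
  return { verdict: score >= 80 ? 'BLOCK' : score >= 25 ? 'WARN' : 'ALLOW', score, findings };
}

// The demo payload trips all four rules (40 + 40 + 35 + 50 = 165).
const result = scan('eval(require("child_process").exec("rm -rf /"))');
console.log(result.verdict);  // 'BLOCK'
console.log(result.findings); // ['eval()', 'exec()', 'child_process', 'rm -rf']
```

Because every rule is a plain regex with a fixed score, the same input always produces the same verdict and the same findings list.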

How I Built It

The Stack

Interface: Telegram Bot API (Telegram-native UX for instant adoption)

Backend: Node.js + Express on Railway

Database: SQLite with Railway volumes (persistent user data)

Risk Engine: Custom MCP-compatible analyzer with 14 detection rules

Payments: Dodo Checkout (Starter $9.99, Builder $49, Pro $99, Sovereign $12k)

Frontend: Next.js on Vercel

Technical Decisions

1. Deterministic vs Probabilistic

I chose rule-based detection over ML because:

  • ✅ Predictable decisions (no false positives from "AI guessing")
  • ✅ Sub-2-second response time
  • ✅ Explainable results (users see exactly what triggered the block)
  • ✅ No training data needed

2. Pre-Execution vs Post-Execution

Most security tools monitor after code runs. I built before execution because:

  • Prevention > Detection
  • Zero damage vs damage control
  • Trust but verify
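The pre-execution model can be pictured as a gate in front of the runner: the scan verdict decides whether the untrusted code is ever invoked. This is an illustrative sketch, not the shipped implementation; `guardedRun`, `toyScan`, and `runUntrusted` are hypothetical names:

```javascript
// Hypothetical pre-execution gate: nothing runs until the verdict allows it.
function guardedRun(code, scan, runUntrusted) {
  const { verdict, findings } = scan(code);
  if (verdict === 'BLOCK') {
    // Zero damage: the code is rejected before it ever executes.
    throw new Error(`Blocked before execution: ${findings.join(', ')}`);
  }
  if (verdict === 'WARN') {
    console.warn('Proceeding despite suspicious patterns:', findings);
  }
  return runUntrusted(code);
}

// Toy scanner and runner, just for demonstration.
const toyScan = (code) =>
  /rm\s+-rf/.test(code)
    ? { verdict: 'BLOCK', findings: ['rm -rf'] }
    : { verdict: 'ALLOW', findings: [] };

guardedRun('console.log("hello")', toyScan, () => 'ran'); // returns 'ran'
// guardedRun('rm -rf /', toyScan, () => 'ran');          // throws before the runner is called
```

A post-execution monitor would only observe the damage; here the dangerous path is cut off before the runner is touched.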

3. Telegram-First

Built on Telegram Bot API because:

  • Zero friction (no signup, no install)
  • 900M users already have it
  • Instant global reach
  • Perfect for quick scans

4. SQLite over PostgreSQL

Started with SQLite because:

  • Simpler deployment (no external DB)
  • Railway volumes = automatic persistence
  • Fast enough for <10k users
  • Can migrate later if needed

5. MCP Integration

Made it MCP-compatible so it works with:

  • AI agent frameworks
  • LangChain tools
  • Autonomous systems
  • CI/CD pipelines

The Challenge

Speed vs Accuracy: Initial version took 8 seconds. Optimized to <2s by:

  • Parallel rule evaluation
  • Early exit on high-risk patterns
  • MCP engine timeout with local fallback
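The early-exit idea can be sketched like this (the rule subset and threshold constants here are assumptions based on the scoring used in this post; the real engine runs 14 rules):

```javascript
const BLOCK_THRESHOLD = 80;
const WARN_THRESHOLD = 25;

// Highest-scoring rules first, so dangerous code exits the loop soonest.
const RULES = [
  { name: 'curl|bash', pat: /curl.*\|.*sh/i, score: 60 },
  { name: 'rm -rf',    pat: /rm\s+-rf/i,     score: 50 },
  { name: 'eval()',    pat: /\beval\s*\(/i,  score: 40 },
];

// Stop scanning as soon as the score already guarantees a BLOCK.
function fastScan(code) {
  let score = 0;
  const findings = [];
  for (const r of RULES) {
    if (r.pat.test(code)) {
      score += r.score;
      findings.push(r.name);
      if (score >= BLOCK_THRESHOLD) {
        return { verdict: 'BLOCK', score, findings }; // early exit
      }
    }
  }
  return { verdict: score >= WARN_THRESHOLD ? 'WARN' : 'ALLOW', score, findings };
}
```

One trade-off worth noting: an early-exited scan reports only the findings matched so far, which is acceptable because the verdict is already final.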

False Positives: Tuned scoring thresholds:

  • BLOCK: score ≥ 80 (critical threats)
  • WARN: score ≥ 25 (suspicious patterns)
  • ALLOW: score < 25 (clean code)
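Those thresholds reduce to a small deterministic mapping, which is what keeps every verdict reproducible and explainable:

```javascript
// Deterministic verdict mapping for a computed risk score.
function verdictFor(score) {
  if (score >= 80) return 'BLOCK'; // critical threats
  if (score >= 25) return 'WARN';  // suspicious patterns
  return 'ALLOW';                  // clean code
}

verdictFor(100); // 'BLOCK'
verdictFor(40);  // 'WARN'
verdictFor(10);  // 'ALLOW'
```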

What's Next

  • API access for developers ($29-$499/mo tiers)
  • CI/CD GitHub Actions integration
  • Advanced dependency vulnerability database
  • Team workflows + audit logs
  • $TEOS token integration for payments

Prize Categories

Submitting for:

  • Best Use of GitHub Copilot — Used Copilot extensively for regex pattern generation and test case creation
  • Best Use of Backboard — Integrated Backboard for deployment monitoring and uptime tracking
  • Best Use of Google Gemini — Leveraged Gemini for code pattern analysis and risk categorization (via MCP engine)

Tech Stack Highlights:

  • GitHub Copilot: Accelerated development by 3x
  • Railway: Zero-config deployment with auto-scaling
  • SQLite + Railway volumes: Persistent storage without managed DB
  • Telegram Bot API: Instant global reach

Impact

Security: Prevents real exploits (eval injection, hardcoded secrets, destructive commands)

Developer Experience: 5 free scans, no signup, instant results

Accessibility: Telegram-native (no app install required)

Open Source: Full code on GitHub for community review


Built by Ayman Seif (@teosegypt) — Alexandria, Egypt 🇪🇬

Try it: t.me/teoslinker_bot

Read more: DEV.to article

AI should not execute blindly. It should execute under verified control.



*This is a submission for [Weekend Challenge: Earth Day Edition](https://dev.to/challenges/weekend-2026-04-16)*
What Happens If We Don’t Solve This?

⚡️ AI systems are moving toward autonomy.

Without execution control:

- unsafe commands can be triggered at scale
- automation systems can fail unpredictably
- trust in AI systems breaks down

The question is no longer:
Can AI generate code?

The question is:
Should that code be trusted to execute?

Top comments (1)

Ayman Seif

Curious how others are thinking about execution safety in AI systems.

Are you validating code before running it?