DEV Community: Shaiful Islam Shabuj

@teststop : AI-Powered Adversarial Testing With No Configuration

Shaiful Islam Shabuj — Fri, 22 May 2026 15:17:16 +0000

v0.1.0 · github.com/shaifulshabuj/teststop

Every test suite ever written was written by someone who already knew how the system works.

Real users don't. That's why production still surprises us. We call it test coverage. What we actually have is assumption coverage — the set of things we imagined going wrong, written down, and checked. The things we didn't imagine are still there, waiting.

teststop flips this. Instead of writing tests, you give AI a mandate to think like a real adversarial user — someone who never read the docs, opens the same form in two tabs, retries three times when the page is slow, and pastes emoji into a date field. AI already knows all of these patterns. It just needed the right instruction.

go install github.com/shaifulshabuj/teststop/cmd/teststop@latest
teststop run

No API keys. No config file. claude or copilot CLI must be on your PATH — that's all.

What It Actually Gives You

Zero-configuration adversarial scenarios

teststop run scans your project, detects the language and system type, composes an AI mandate from that context, and hands it to Claude or Copilot in non-interactive mode. You get back structured scenarios: what the adversarial user tries, what chaos they inject, what failure they expose.
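One plausible way to do the detection step is to look for well-known manifest files. The sketch below assumes that approach; the `MARKERS` table and the `detect_language` function are illustrative names, not teststop's actual (Go) implementation.

```python
from pathlib import Path

# Hypothetical marker files; teststop's real detection rules may differ.
MARKERS = {
    "go.mod": "go",
    "package.json": "node",
    "pyproject.toml": "python",
    "Cargo.toml": "rust",
}

def detect_language(project_dir: str) -> str:
    """Return the language of the first marker file found, else 'unknown'."""
    root = Path(project_dir)
    for marker, language in MARKERS.items():
        if (root / marker).is_file():
            return language
    return "unknown"
```

The detected language would then be one of the context values folded into the mandate.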

teststop run --depth aggressive --output json

Output is JSON by default — machine-readable, agent-parseable, ready to pipe into your CI pipeline.

Tests that reduce over time (not grow)

Most testing tools add work forever. teststop does the opposite. It maintains a confidence score per system area in .teststop/memory.json:

+0.19 per passing scenario
-0.30 per failure
Areas at 0.95+ confidence are retired — teststop stops hammering them

After ~15 clean passes, an area is proven stable and teststop moves on. The test surface shrinks as your system earns its confidence. Commit memory.json to version control — it's the accumulated proof that your system works.
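The scoring rule can be sketched as a small update function. This assumes a plain additive update clamped to the range 0 to 1, using the step sizes and threshold quoted above; the function names are illustrative, and the real rule and starting value in `.teststop/memory.json` may differ.

```python
PASS_STEP = 0.19   # step sizes and threshold as quoted in the post
FAIL_STEP = 0.30
RETIRE_AT = 0.95

def update(confidence: float, passed: bool) -> float:
    """Apply one scenario result and clamp the score to [0, 1]."""
    delta = PASS_STEP if passed else -FAIL_STEP
    return min(1.0, max(0.0, confidence + delta))

def is_retired(confidence: float) -> bool:
    """An area at or above the threshold is no longer tested."""
    return confidence >= RETIRE_AT
```

Because a failure costs more than a pass earns, a single regression undoes more than one clean run. The number of clean passes an area needs depends on its starting confidence, which the post does not state.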

An AI mandate you can read and improve

The mandate that drives AI behavior is mandate/base.md — plain markdown, checked into the repo, readable and editable. It describes the adversarial user archetypes: the one who never reads docs, the one who double-clicks, the one who pastes unexpected input, the one who has a slow connection and a short fuse.

teststop mandate --show   # see exactly what AI receives

The mandate is open — file a PR to improve the adversarial patterns for your domain.

Built for AI agent workflows

Exit codes are designed for autonomous agents:

| Code | Meaning | Agent action |
| --- | --- | --- |
| `0` | Confidence met | Safe to deploy |
| `1` | Below threshold | Review required |
| `2` | Critical failure | Do NOT deploy |

teststop run --output json --quiet --no-color
# Exit 0: proceed. Exit 2: stop.
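An agent or CI step could wrap that command along these lines. This is a minimal sketch: the `decide` and `run_gate` names are hypothetical, and treating unknown exit codes as a hard stop is an assumption, not documented teststop behavior.

```python
import subprocess

# Exit-code meanings as listed above.
ACTIONS = {0: "deploy", 1: "review", 2: "halt"}

def decide(exit_code: int) -> str:
    """Map a teststop exit code to an agent action; unknown codes halt (assumed)."""
    return ACTIONS.get(exit_code, "halt")

def run_gate() -> str:
    """Run teststop with the flags shown in the post and return the action."""
    result = subprocess.run(
        ["teststop", "run", "--output", "json", "--quiet", "--no-color"],
        capture_output=True,
        text=True,
    )
    return decide(result.returncode)
```

Failing closed on anything outside 0, 1, and 2 keeps a crashed or missing binary from being read as "safe to deploy".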

Apple Container sandbox isolation

When the Apple Container sandbox is active, AI executes inside an isolated VM, not on your host. The mandate is passed as a CLI argument. AI outputs JSON to stdout. In that mode the AI process never touches your filesystem; with the direct exec fallback it runs on the host.

Works Everywhere


Languages	Any — Go, Node, Python, Rust, and more auto-detected
AI CLI	Claude Code (`claude`) or GitHub Copilot CLI (`copilot`)
Sandbox	Apple Container (macOS arm64) or direct exec fallback
CI	`TESTSTOP_SANDBOX=none` for Docker/non-macOS environments

If you're shipping AI-assisted code and wondering whether your test suite is actually covering what real users do — teststop is the missing piece.

github.com/shaifulshabuj/teststop · go install github.com/shaifulshabuj/teststop/cmd/teststop@latest

Why is software development getting MORE complex every year, when the goal never changed?

Shaiful Islam Shabuj — Mon, 18 May 2026 14:02:57 +0000

Every software product ever built is the same three things.
Input → Analyze → Output.
A notepad. A banking system. An ERP. An AI assistant.
All identical in shape. Only the analyzer changes.

So why is software development getting MORE complex every year — when the goal never changed?
I spent time mapping this out. The answer is uncomfortable.

The Tracker's Paradox
Every tool built to manage work eventually becomes work itself.

Not because the tool is bad. Because of a systemic pattern that every project tracker, documentation system, and workflow platform follows — regardless of how well it is designed:

Simple human problem: "Who is doing what? Is it done?"
A good tool is built. Solves it at first.
The tool creates its own complexity — config, plugins, training, admin roles
A new role emerges to manage the tool full-time
Teams run "tool cleanup sprints" to manage the tool that manages the work

↺ Then the replacement tool follows the exact same arc.

The numbers for a mid-size team:
Tool overhead — licensing, admin salaries, training, and developer time spent managing the tool instead of building.

The original question — "is the work getting done?" — is often harder to answer than ever.

The root cause:
The Semantic Gap — the distance between human intent (contextual, meaningful, fuzzy) and machine execution (literal, precise, syntactic).

Every layer of tooling complexity exists to bridge this gap manually. Each bridge needs its own bridge. The industry calls this progress.

The proposed exit:
One product. Intent in. Value out. Nothing in between.

Not a tool that tracks work — a system that ensures the work gets done.
Not a documentation system — a system that preserves the WHY so nobody ever has to ask.

The loop breaks when the only interface is natural language and the output is the actual value. No config. No admin. No training spiral. No cleanup sprints.

This is why I'm building:
→ DocuFlow: captures WHY decisions were made. Not static docs, but the living theory of the system. Gives AI the domain context to be genuinely useful.
GitHub: github/docuflow

→ Waymark: makes AI agent actions transparent and auditable. Builds the trust that lets humans step back: gradually, verifiably, permanently.
GitHub: github/waymark

Both are designed with exit conditions. Success means users need them LESS over time — not more.

The one question I ask every week:
"Does this make the user's problem easier to solve — or our system easier to build?"

If it's the second one: accidental complexity. Cut it.

The goal was never the software.
It was always the value on the other side of it.

What pattern have you seen the Tracker's Paradox play out in?

More details are in my LinkedIn post.

Waymark v4.7.0 is Live — The Ultimate MCP Security Layer

Shaiful Islam Shabuj — Fri, 15 May 2026 06:23:03 +0000

Just shipped: 30% faster policy enforcement, real-time approvals dashboard, and better error messages.

The Problem We Solve

AI agents like Claude are awesome for code generation, but they can:

😱 Read your .env files
💥 Delete entire directories
🗂️ Modify sensitive database schemas

Waymark prevents this by sitting between your agent and the MCP tools.

What's New

⚡ Performance

Policy evaluation: 30% faster
Dashboard renders: 50% lighter
New indexed database queries

🎯 Better UX

Clearer error messages with reasons
New CLI commands for testing policies
Real-time action telemetry

🔐 Security Improvements

Fixed symlink bypass issue
Transactional approval workflow
Immutable policy versions

Quick Example

{
  "allowedPaths": ["./src/**"],
  "blockedPaths": [".env", "secrets/"],
  "requireApproval": ["./migrations/**"],
  "blockedCommands": ["rm -rf", "DROP TABLE"]
}

→ Now Claude can write code safely while sensitive files stay protected.

The Numbers

| Scenario | Result | Status |
| --- | --- | --- |
| Read allowed file | ALLOW ✅ | Works |
| Read blocked file | BLOCK ❌ | Blocked |
| Write requires approval | PENDING ⏳ | Dashboard review |
| Dangerous command | BLOCK ❌ | Blocked |
| Safe shell command | ALLOW ✅ | Works |

100% policy enforcement across all scenarios.
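The path scenarios above can be sketched as a small decision function over the example config. This is an illustration only: the glob handling, the block-then-approval-then-allow precedence, and the default-deny fallback are assumptions, not Waymark's documented evaluation order.

```python
from fnmatch import fnmatchcase
import posixpath

POLICY = {
    "allowedPaths": ["./src/**"],
    "blockedPaths": [".env", "secrets/"],
    "requireApproval": ["./migrations/**"],
}

def _matches(path: str, pattern: str) -> bool:
    path = posixpath.normpath(path)
    if pattern.endswith("/"):  # directory prefix such as "secrets/"
        prefix = posixpath.normpath(pattern)
        return path == prefix or path.startswith(prefix + "/")
    return fnmatchcase(path, posixpath.normpath(pattern))

def decide(path: str, policy: dict = POLICY) -> str:
    """Assumed precedence: block, then approval, then allow; default is block."""
    if any(_matches(path, p) for p in policy["blockedPaths"]):
        return "BLOCK"
    if any(_matches(path, p) for p in policy["requireApproval"]):
        return "PENDING"
    if any(_matches(path, p) for p in policy["allowedPaths"]):
        return "ALLOW"
    return "BLOCK"
```

For example, `decide("./src/app.py")` returns `"ALLOW"` and `decide("migrations/001.sql")` returns `"PENDING"`. A real implementation would also need to resolve symlinks before matching, since `normpath` only cleans up the path string.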

Get Started in 2 Minutes

npm install -g @way_marks/cli@latest
npx @way_marks/cli init --template secure
npx @way_marks/cli start

Your dashboard opens at http://localhost:47000

Key Features

✅ Real-time policy enforcement — Every MCP call is validated

✅ Approval workflows — Human review for sensitive changes

✅ Full audit trail — Every action is logged to SQLite

✅ Dashboard UI — View policies, stats, action history

✅ Zero breaking changes — Upgrade from v4.6.x safely

✅ CLI tools — Test policies before deployment

What's Next?

🚀 v4.8.0 (Q3 2026)

Multi-agent coordination
Browser IDE extension
Cloud-hosted dashboard

Check the tool

github/waymark
npm install -g @way_marks/cli

Waymark v4.7.0 is ready. Download it today and build with confidence.

Safe AI collaboration starts here. 🚀

Published May 15, 2026 · Open source · MIT License

5 Startup Ideas Hidden Inside Japan's $130B IT Market (Nobody Built These Yet)

Shaiful Islam Shabuj — Thu, 14 May 2026 16:54:47 +0000

Japan is one of the world's largest economies. It has 1.2 million
software engineers. And most of them spend their workday in Excel.

Not writing code. Writing specs. In spreadsheets.

I spent time researching Japan's IT industry and found five deep,
structural pain points that have no good software solution yet. Each one has a clear product gap. Here they are:

## Pain #1 — Excel Hell

Japanese engineers at large system integrators (called SIers) deliver
Excel files as their primary output. Test case matrices, architecture
diagrams, design specs — all in grid format, never in version control.

76% of Japanese companies have documentation outdated by at least
one version (2026 survey).

Product gap: SpecSync — a CLI that parses Excel spec files and
converts them into versioned, searchable Markdown synced to Git.
Open-source core, paid cloud sync. No competitor targets this workflow.

## Pain #2 — Siloed Knowledge (属人化)

Over 70% of Japanese companies report that critical system
knowledge lives exclusively in one person's head. When they retire,
the knowledge goes with them.

Product gap: TribeMind — indexes Git history, PR comments, and
Slack logs into a queryable knowledge base. Ask why a module was
built a certain way — get an answer from years of commit context.

## Pain #3 — The Language Wall

Japan ranks 92nd globally in English proficiency, yet employs
over 90,000 foreign IT workers. Spec miscommunication between Japanese
clients and offshore English-speaking teams causes real project
failures.

Product gap: BridgeSE — converts Japanese requirements documents
into English, detects unstated cultural assumptions, and generates
bilingual handoff packets. DeepL doesn't do this. Google Translate
doesn't either.

## Pain #4 — Reporting Hell

Japanese engineers have adopted AI for coding. But the administrative
overhead — weekly progress reports, formal meeting minutes — hasn't
dropped.

Product gap: StandupAI — monitors Git commits and Jira activity,
auto-generates formal Japanese progress reports (進捗報告書) in proper
business keigo style. CLI + VS Code extension.

## Pain #5 — The Legacy Cliff

Japan's enterprise infrastructure includes COBOL systems from the
1970s, Java monoliths from the 2000s, and Classic ASP still running
on-premise in 2026. Nobody understands the full picture anymore.

Product gap: LegacyMap — deep-scans a codebase and outputs a
system map, dependency graph, risk heat map, and plain-language
narrative for each component. B2B — sold to banks, manufacturers,
and retailers.

## Watch the full breakdown

I made a short video on this (under 4 minutes, no fluff). It's my
first YouTube video.

👉 YouTube link

Which of the five would you build? Drop it in the comments here or on
YouTube.

Waymark: The Control Layer Your AI Coding Agent Was Missing

Shaiful Islam Shabuj — Tue, 12 May 2026 15:47:02 +0000

v4.5.0 · @way_marks/cli

AI coding agents are getting good at writing code. The problem is they're just as good at touching files they shouldn't, running commands you didn't ask for, and doing it all without leaving a paper trail.

Waymark sits between your agent and your codebase. Every file write and shell command gets intercepted, logged, and checked against your policy — before it happens.

npm install -g @way_marks/cli
waymark init   # registers with Claude Code or GitHub Copilot CLI
waymark start  # spins up the MCP server + dashboard

What It Actually Gives You

A policy that runs before the agent does anything

One config file. Three layers of control:

{
  "allowedPaths": ["src/**", "tests/**"],
  "blockedPaths": [".env", "*.pem", "*.key"],
  "requireApproval": ["package.json", "Dockerfile"],
  "blockedCommands": ["rm -rf", "regex:curl.*-o\\s+/"]
}

Allow — executed and logged automatically
Block — stopped cold, agent gets a clear reason
Require Approval — agent waits, you decide from the dashboard or Slack

The agent never touches .env. It never runs rm -rf. And before it rewrites your Dockerfile, it asks.
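The `blockedCommands` list mixes plain strings with a `regex:` entry. A minimal sketch of how such a list could be checked follows; it assumes plain entries match as substrings and `regex:` entries are searched anywhere in the command, which may not be how Waymark interprets them.

```python
import re

# Same entries as the config above, after JSON decoding.
BLOCKED_COMMANDS = ["rm -rf", r"regex:curl.*-o\s+/"]

def is_blocked(command: str, rules=BLOCKED_COMMANDS) -> bool:
    """Plain rules match as substrings; 'regex:' rules use re.search (assumed)."""
    for rule in rules:
        if rule.startswith("regex:"):
            if re.search(rule[len("regex:"):], command):
                return True
        elif rule in command:
            return True
    return False
```

With these rules, `curl https://x.sh -o /usr/bin/x` is blocked because it writes to an absolute path, while `curl https://example.com -o out.txt` is not.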

Every write is reversible

Before any file change goes through, Waymark snapshots the original. One click in the dashboard restores it — or you roll back an entire agent session at once. No git stash gymnastics, no "what did it do exactly."
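Snapshot-before-write is simple to sketch. The helper names and the `.orig` snapshot naming below are hypothetical; Waymark's actual storage layout is not described in the post.

```python
import shutil
from pathlib import Path

def write_with_snapshot(target: Path, new_text: str, snapshot_dir: Path):
    """Copy the current file aside (if it exists), then write the new content.

    Returns the snapshot path, or None when the file is newly created.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    snapshot = None
    if target.exists():
        snapshot = snapshot_dir / (target.name + ".orig")
        shutil.copy2(target, snapshot)
    target.write_text(new_text)
    return snapshot

def rollback(target: Path, snapshot) -> None:
    """Restore the snapshot, or delete the file if it did not exist before."""
    if snapshot is None:
        target.unlink(missing_ok=True)
    else:
        shutil.copy2(snapshot, target)
```

Rolling back a whole session would then mean replaying `rollback` over that session's recorded writes in reverse order.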

A live view of every agent on your machine

$ waymark agents

Agent     PID    Status      Ctx%  Tokens   Task                         Age
copilot   39897  thinking     52%  146,032  Refactor auth middleware       1m
claude    64586  waiting      37%   75,060  (idle)                        12m

Context-window fill, token count, current task, tool calls — all live, all local. Nothing leaves the machine.

Approvals that don't kill your flow

When an action hits a requireApproval path, Waymark sends a Slack message with Approve / Reject buttons. The agent waits. You approve from your phone. Work continues. The dashboard shows the full audit trail: what was requested, who approved it, what changed.

Works where you already work

| Platform | Status |
| --- | --- |
| Claude Desktop / Claude Code | ✅ Full support |
| GitHub Copilot CLI | ✅ Full support |
| GitHub Copilot Chat (VSCode) | ⏳ Waiting for GitHub MCP |

Setup is the same regardless of platform — waymark init picks it up.

The pitch is simple: AI agents move fast. Waymark makes sure they move in the right direction.

github/waymark · npm install -g @way_marks/cli

Previous Waymark posts:
how-i-stopped-worrying-about-claude-code-touching-files-it-shouldnt

DocuFlow: Give Your AI Agent a Persistent Memory for Your Codebase

Shaiful Islam Shabuj — Sun, 10 May 2026 11:10:41 +0000

TL;DR — DocuFlow is an open-source MCP server that gives AI agents (Claude, Copilot, Cursor) a persistent, structured wiki about your codebase. Instead of re-explaining your project every session, your agent reads once, remembers forever, and builds on previous knowledge.

npm install -g @doquflow/cli && docuflow init

The Problem: AI Agents Have Goldfish Memory

Here's a conversation most of us have had:

You: "Add a rate limiter to the auth routes."

Agent: "Sure! What authentication library are you using?"

You: "...we've been using JWT for the last 6 sessions."

Every new conversation, your AI agent starts from scratch. It re-reads files, re-discovers patterns, re-asks the same questions. This gets worse as your codebase grows. By the time you've given enough context to be useful, your context window is half-burned.

The standard answer is RAG (Retrieval-Augmented Generation). Pull relevant files, chunk them, embed them, retrieve on demand. But RAG has a hidden cost: the LLM does the same extraction work on every single query. There's no accumulation. Knowledge doesn't compound.

The LLM Wiki Pattern

DocuFlow implements a different approach called the LLM Wiki pattern:

Raw Sources (immutable)
       ↓ ingest once
Wiki Layer (LLM-maintained markdown pages)
       ↓ query anytime
Synthesized Answers + Citations

The key insight: let the LLM do the bookkeeping once, then compound that work.

When you add a new source document, DocuFlow reads it, extracts entities and concepts, integrates them into an existing wiki, updates cross-references, and flags contradictions. The next query is better because the wiki is richer — not because you did more chunking.
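The "ingest once, query many times" idea can be illustrated with a toy in-memory wiki. This is a sketch only: the `Wiki` class is hypothetical, and the `facts` argument stands in for whatever the LLM extracts from a source, which the real tool would produce itself.

```python
from collections import defaultdict

class Wiki:
    def __init__(self):
        self.pages = defaultdict(list)  # page title -> list of (source, note)
        self.links = defaultdict(set)   # page title -> titles it cross-references

    def ingest(self, source: str, facts: dict[str, str]) -> None:
        """Merge extracted facts into pages and cross-link entities from one source."""
        for title, note in facts.items():
            self.pages[title].append((source, note))
            self.links[title] |= set(facts) - {title}

    def query(self, title: str) -> list[tuple[str, str]]:
        """Answer from the accumulated page; each note carries its source as a citation."""
        return list(self.pages.get(title, []))
```

Each call to `ingest` enriches existing pages instead of producing isolated chunks, so a later `query` sees everything learned so far without re-reading the sources.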

Introducing DocuFlow

DocuFlow is an MCP server (Model Context Protocol) that works with Claude, Copilot, Cursor, and any other MCP-compatible agent. It provides 15 tools organized into four groups:

| Category | Tools | What they do |
| --- | --- | --- |
| Code Extraction | `read_module`, `list_modules`, `write_spec`, `read_specs` | Scan files → extract classes, endpoints, DB tables, deps |
| Wiki Pipeline | `ingest_source`, `query_wiki`, `wiki_search`, `save_answer_as_page`, … | Build and query the living wiki |
| Health | `lint_wiki`, `get_schema_guidance`, `preview_generation` | Keep the wiki accurate and complete |
| Dependency Graph | `generate_dependency_graph` | Visual map of imports, shared tables, and coupling |

Plus an 8-command CLI and a React web UI — all from a single npm install.

A Real Example: The TaskFlow API

Let me walk through a real project. I have a TypeScript REST API called TaskFlow — JWT auth, RBAC, PostgreSQL, Express. The kind of project where every new developer asks the same 10 questions.

Step 1: Initialize

cd taskflow-api
docuflow init

This creates .docuflow/ in your project, registers the MCP server in Claude/Copilot, and writes a CLAUDE.md that auto-loads when your agent starts.

Step 2: Write source documents

DocuFlow's wiki starts from your curated docs — markdown files you drop into .docuflow/sources/. These are the authoritative descriptions you write once:

.docuflow/sources/
├── overview.md         # What this project is, tech stack, env vars
├── auth-security.md    # JWT lifecycle, bcrypt, RBAC, rate limiting
├── architecture.md     # System diagram, request lifecycle, DB schema
├── api-reference.md    # REST endpoints with request/response examples
└── developer-guide.md  # Setup, conventions, deployment

Step 3: Sync

docuflow sync

DocuFlow ingests all 5 sources and generates 71 wiki pages — each entity, concept, and relationship gets its own page with cross-references:

✓ Ingested overview.md          → 12 pages (JWT Authentication, RBAC, Rate Limiter, …)
✓ Ingested auth-security.md     → 18 pages (Access Token, Refresh Token, bcrypt, …)
✓ Ingested architecture.md      → 16 pages (Connection Pool, Task State Machine, …)
✓ Ingested api-reference.md     → 14 pages (POST /auth/login, GET /tasks/:id, …)
✓ Ingested developer-guide.md   → 11 pages (Environment Setup, Deployment, …)
Health: 96/100 · 71 pages · 5 sources

Now every AI session that opens this project can immediately answer:

You: "Explain the token refresh flow."

Agent (reading wiki, not re-scanning files): "TaskFlow uses short-lived access tokens (15 min) paired with rotating refresh tokens (7 days). On expiry, the client POSTs to /auth/refresh with the refresh token in an httpOnly cookie. The server validates against the stored hash, issues a new access token, and rotates the refresh token. In production, a Redis denylist handles immediate revocation…"

The agent didn't read auth.ts. It read the wiki page for Refresh Token — which already has the context, cross-references to Access Token and JWT Authentication, and a link to the source document.

Step 4: The Web UI

docuflow ui

Opens http://localhost:48821, a live React interface with six views, including:

Ask — Type any question, get a synthesized answer with source citations:

[Screenshot: Ask view showing a JWT auth question with a synthesized answer]

Wiki — Browse all 71 pages organized as Entities, Concepts, Syntheses, Timelines:

[Screenshot: Wiki tree with entity pages listed]

Graph — Your entire knowledge base as an interactive D3 force graph. Each node is a wiki page; edges are cross-references:

[Screenshot: Graph view with color-coded node clusters, zoomed into the auth cluster showing the JWT Authentication hub]

Health — Real-time quality score (96/100) with actionable issues:

[Screenshot: Health dashboard showing the score gauge and orphan page list]

The UI auto-discovers all DocuFlow projects in ~/dev, ~/code, and ~/Desktop. A project picker in the corner lets you switch between your codebases.

What AI Agents Can Do With It

Once DocuFlow is registered, your agent gets a new vocabulary. Instead of:

"Let me read your package.json, then src/auth.ts, then… can you paste your middleware?"

The agent uses MCP tools:

→ query_wiki("How does authentication work?")
← Synthesized answer: JWT with refresh rotation, bcrypt, RBAC middleware...
   Sources: [entity/jwt_authentication, entity/refresh_token, concept/rbac]

→ generate_dependency_graph({ focus: "auth" })
← auth.ts imports: jsonwebtoken, bcryptjs, pg
  Shared tables with users.ts: "users"
  Most connected: auth.ts (hub for 4 modules)

→ lint_wiki()
← Health: 96/100
  Issues: 3 orphan pages, 1 stale concept
  Recommendation: link "Rate Limiter" from "Security Headers" page
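An orphan check like the one reported by `lint_wiki` can be sketched as a pass over page bodies. The `[[Title]]` link syntax and the `find_orphans` name are assumptions made for illustration; DocuFlow's actual cross-reference format is not shown in the post.

```python
import re

LINK = re.compile(r"\[\[([^\]]+)\]\]")  # assumed link syntax

def find_orphans(pages: dict[str, str]) -> list[str]:
    """Return titles that no other page links to, in sorted order."""
    linked = set()
    for title, body in pages.items():
        linked |= {t for t in LINK.findall(body) if t != title}
    return sorted(set(pages) - linked)
```

Given three pages where only "JWT Authentication" and "Refresh Token" link to each other, the function reports the third page as an orphan, which is the kind of finding the recommendation above is based on.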

Knowledge compounds across sessions. If the agent answers a complex question and saves it back:

→ save_answer_as_page({ question: "What happens when a refresh token expires?",
                         answer: "...", category: "synthesis" })
← Saved: wiki/syntheses/what_happens_when_refresh_token_expires.md

That page is now part of the wiki. The next agent to ask gets a better, richer answer — without anyone repeating themselves.

The CLI at a Glance

docuflow init               # Init + MCP registration
docuflow init --interactive # Guided domain setup (Code/Research/Business/Personal)
docuflow status             # Wiki stats, health score, version
docuflow suggest            # 5 prioritized starting-point suggestions
docuflow sync               # Re-ingest all sources, rebuild index
docuflow sync --ai          # Sync + AI-powered doc generation
docuflow watch              # Background daemon — ingests new sources in <1s
docuflow review --staged    # Review staged git changes for issues
docuflow ui                 # Start web interface on port 48821

Multi-Language, Multi-Domain

DocuFlow's extraction engine is regex-based and language-agnostic. It works on TypeScript, JavaScript, Python, Go, Ruby, Java, C#, PHP, SQL — extracting classes, functions, imports, REST endpoints, DB tables, and environment variable references from any of them.
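To give a feel for regex-based extraction, here is a small sketch that pulls ES-module imports and Express-style routes out of source text. The two patterns and the `extract` function are illustrative guesses, not DocuFlow's real rules, and they cover only a narrow slice of TypeScript and JavaScript syntax.

```python
import re

# Matches lines such as: import x from 'pkg'  /  import { a } from "pkg"
IMPORT_RE = re.compile(r"""^\s*import\s+.*?from\s+['"]([^'"]+)['"]""", re.MULTILINE)
# Matches calls such as: app.get('/path', ...)  /  router.post("/path", ...)
ROUTE_RE = re.compile(r"""\b(?:app|router)\.(get|post|put|patch|delete)\(\s*['"]([^'"]+)['"]""")

def extract(source: str) -> dict:
    """Return imported module names and (METHOD, path) endpoint pairs."""
    return {
        "imports": IMPORT_RE.findall(source),
        "endpoints": [(method.upper(), path) for method, path in ROUTE_RE.findall(source)],
    }
```

The trade-off of this style is that it needs no parser per language but can miss unusual formatting, such as imports split across several lines.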

And it's not just for code. The domain system supports Research, Business, and Personal knowledge bases too. The schema adapts, the wiki page templates change, the suggest command gives domain-appropriate recommendations.

Under the Hood

DocuFlow is a standard MCP server built with @modelcontextprotocol/sdk. It runs as a local subprocess — Claude or Copilot connects to it via stdio, same as any other MCP tool. The web UI is a single Express server on port 48821 that serves both the React frontend and a REST API bridge to the same 15 MCP tools.

Everything lives in .docuflow/ in your project directory:

.docuflow/
├── sources/       ← Your curated inputs (immutable)
├── wiki/          ← LLM-generated pages (entities, concepts, syntheses, timelines)
├── specs/         ← Agent-written technical specs
├── schema.md      ← Domain config (customize the wiki structure)
├── index.md       ← Auto-maintained page catalog
└── log.md         ← Operation history

The wiki is plain markdown files on disk — git-trackable, diffable, portable. No database, no cloud service, no API keys required.

Getting Started in 5 Minutes

# Install globally
npm install -g @doquflow/cli

# Initialize your project
cd your-project
docuflow init

# Write your first source document
cat > .docuflow/sources/overview.md << 'EOF'
# My Project
What it does, the tech stack, key concepts...
EOF

# Ingest and start the UI
docuflow sync && docuflow ui

Then open Claude/Copilot and ask anything about your project. The wiki is already loaded.

What's Next

DocuFlow is at v1.5.1 and actively developed. On the roadmap:

Auto-sync on git hooks — wiki updates automatically on every commit
Team mode — shared wiki across a team, conflict-free merge
More MCP clients — first-class support for Windsurf, Zed, VS Code Chat

Try It

npm: @doquflow/cli and @doquflow/server
GitHub: doquflows/docuflow
Email: docuflow@sshabuj.com

If you're tired of re-explaining your codebase to an AI that forgot everything overnight, give DocuFlow a try. Your future self — and your future agent — will thank you.

Built with TypeScript, @modelcontextprotocol/sdk, D3, React 18, Express.

Plus: waymark, tracking the agent's actions

How I stopped worrying about Claude Code touching files it shouldn't

Shaiful Islam Shabuj — Wed, 08 Apr 2026 01:03:15 +0000

Claude Code is powerful.
It can also silently write to your .env or run rm -rf.
You find out after it happens.

Waymark is an MCP server that intercepts
every agent action before it executes...

Waymark sits between an AI agent (Claude Desktop, Claude Code) and the filesystem. Every write_file, read_file, and bash call passes through Waymark before execution. Waymark:

Checks policy — blocks or queues the action if it violates waymark.config.json
Logs to SQLite — records every action with full input, output, and policy decision
Exposes a web UI — live dashboard at http://localhost:3001 showing all actions
Supports rollback — restores any overwritten file, or deletes any newly created file
Approval flow — pending actions can be approved (executes the action) or rejected from the UI or Slack
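The logging step in the list above could look roughly like this. The table layout and function names are hypothetical, since the post does not show Waymark's schema; the sketch uses an in-memory database by default so it can be run as-is.

```python
import json
import sqlite3

def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit log with an assumed schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS actions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               tool TEXT NOT NULL,
               input TEXT NOT NULL,
               decision TEXT NOT NULL,
               logged_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn

def log_action(conn: sqlite3.Connection, tool: str, tool_input: dict, decision: str) -> int:
    """Record one intercepted call and return its row id."""
    cur = conn.execute(
        "INSERT INTO actions (tool, input, decision) VALUES (?, ?, ?)",
        (tool, json.dumps(tool_input), decision),
    )
    conn.commit()
    return cur.lastrowid
```

Writing the row before the action executes means blocked and pending calls leave a trace as well as successful ones.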

Setup:
cd your-project
npx @way_marks/cli init
npx @way_marks/cli start

What policies would you add to the default config?
What files should be protected that aren't already?

github/waymarks
npmjs/waymarks
email/shabuj

Plus: docuflow, llm-wiki based documentation using AI agent