TL;DR:
This proof-of-concept bridges the gap between AI code generation and real-world engineering, all thanks to its underlying architecture. By embedding rules, tests, and docs directly into the scaffold, the system forces the AI to follow a strict two-phase workflow: planning first (analyzing tasks, reading docs) and then validation (auto-linting, type-checking, and updating its own documentation). This turns the AI from an unpredictable assistant into a reliable partner that produces consistent, high-quality code.
Introduction
*What if your scaffold could enforce architecture **and** write its own docs?*
The internet is overflowing with “10× developer” prompt recipes, best‑practice checklists, and mega‑threads on how to squeeze perfect code out of LLMs.
Still, when I looked for something that solved my very real, day‑to‑day problems—keeping large‑scale projects consistent, maintainable, and well‑documented without massive human overhead—I couldn’t find a solution that fit my needs.
In response, I built a Remix‑based starter project with AI‑guided scaffolding, architecture guardrails, and an example app generated entirely by the rules. The goal is to explore how we can make it painless to spin up and keep evolving—so quality stays high and standards stay intact.
If you’re curious how GenAI can help without turning your repo into spaghetti, read on to see the principles behind the template, the experiments that validated them, and the lessons you can apply to your own projects.
A quick note: This project is currently a working proof-of-concept, designed to explore the principles of "GenAI-Native" development. Think of it less as a production-ready tool and more as a research project and a source of ideas.
Motivation: The AI‑Hype vs. Real‑World Projects
Generative AI tools—GitHub Copilot, ChatGPT, Cursor—promise warp‑speed dev, but teams still hit walls:
- ❌ Inconsistent output: AI ignores folder boundaries or sneaks in anti‑patterns.
- ❌ Onboarding overhead: every codebase has its own sacred conventions.
- ❌ Manual policing: engineers still sift through PRs fixing style, tests, docs.
I set out to solve two challenges at once:
- Deliver production‑grade code consistently, not just “sometimes good”.
- Keep architecture, docs, and tests up‑to‑date with minimal human effort.
Baking lint/architecture rules straight into a template felt like the quickest path to a predictable, AI‑augmented workflow.
The Core Concept: A High-Level View
At the heart of the template is a two-phase workflow designed to transform the AI from a chaotic code generator into a disciplined engineering partner. The first phase, "Plan Before You Code," acts as a mandatory analysis step. Before writing a single line of code, the AI is forced to behave like a senior developer: it must break down the task, ask clarifying questions, identify which files it will modify, and—most importantly—read the existing `README.md` for any code it intends to touch. This initial, deliberate planning stage eradicates the common "rush-to-code" problem, ensuring that every action is contextual and well-considered.
Once the code is written, it immediately faces the second phase: a gauntlet of Critical Workflow Checks that serve as an unyielding quality gate. This isn't just a simple lint and type-check. It’s a comprehensive audit where the AI must automatically fix its own style errors, ensure perfect type-safety, and even perform a self-review for accessibility issues. This automated validation loop guarantees that no code reaches a human reviewer until it meets the project's rigorous standards for quality, consistency, and correctness.
The final, and perhaps most crucial, check in this process is the Mandatory Documentation Update. The system verifies that the AI has updated the `README.md` file to reflect the changes it just made. This closes the loop, creating a self-documenting system where the code and its explanation are never out of sync. By combining proactive planning with rigorous, automated validation, this entire workflow ensures the AI doesn't just produce code, but contributes to a healthy, maintainable, and well-documented codebase.
It's worth noting that this level of consistency is achieved by being highly specific. While the overall pattern could be adapted for any stack, the current rules are intentionally hardcoded with conventions specific to Remix and the other libraries in the stack. This trade-off—sacrificing immediate flexibility for greater accuracy—ensures the AI has unambiguous context, leading to more predictable and reliable results.
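To make the validation phase concrete, here is a minimal sketch of what such a post-change check loop could look like as a Node/TypeScript script. This is not the template's actual implementation: the commands are standard tooling, but the paths and the README-freshness heuristic are assumptions for illustration.

```ts
// Hedged sketch of a "Critical Workflow Checks" style gate.
// Paths and the README heuristic below are illustrative, not the repo's scripts.
import { execSync } from "node:child_process";
import { statSync } from "node:fs";

function run(cmd: string): void {
  console.log(`> ${cmd}`);
  execSync(cmd, { stdio: "inherit" }); // throws if the command exits non-zero
}

// 1. Auto-fix style issues, then fail on anything that remains.
run("npx eslint . --fix");

// 2. Strict type-check without emitting files.
run("npx tsc --noEmit");

// 3. Naive "docs updated?" guard: the slice's README must be at least as
//    recent as its newest source file. (Illustrative heuristic only.)
function readmeIsFresh(sliceDir: string, sourceFiles: string[]): boolean {
  const readmeTime = statSync(`${sliceDir}/README.md`).mtimeMs;
  return sourceFiles.every(file => statSync(file).mtimeMs <= readmeTime);
}

if (!readmeIsFresh("app/features/products-list", ["app/features/products-list/index.tsx"])) {
  throw new Error("README.md is stale — update the docs before opening a PR.");
}
```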
Experiments & Findings
From Chaos to Consistency
It took countless iterations of running, testing, and refining, but the rules eventually led to highly predictable behavior. The AI now consistently follows the architectural guardrails and uses the scaffolding scripts to generate new features, moving from unpredictable output to stable, structured results.
Rules forged through iteration
I ran hundreds of micro-tasks; when results disappointed me I tweaked the YAML and tried again. Over time the rule-set converged into 18 guardrails that survive even edge cases. For example, at first, rules from `task-approach` were sometimes ignored or forgotten, so I had to introduce Critical Workflow Checks to ensure they were always enforced.
Single-prompt features & sustained autonomy
A medium-sized feature now completes end-to-end with a single prompt and runs continuously for 20–30 minutes without losing focus or quality, ultimately delivering fully working, well-structured code. For example, given the prompt:

> Create a widget named ProductsList and a page to show them. Use https://dummyjson.com/docs/products for the API.
In this case, Claude Sonnet 4 via Cursor (without using Max mode) generated both the widget and the page in just 10 minutes. It automatically produced an auxiliary plan file, `task-plan-products-list.md`, having assessed this as a complex, multi-step task (more than three steps). Below is a screenshot of the AI creating that plan file:
As a result, we ended up with a fully functional, well-written feature. Here are screenshots of the code snippet and the resulting page in action:
Below is a link to the full execution log of the prompt in a single request:
View the complete prompt execution log
Yes, stricter rules cost time—but still beat manual work
Extra README edits, test stubs, and ESLint passes add overhead, yet the total cycle is faster than hand‑coding. The bigger the codebase grows, the faster those upfront costs repay themselves through fewer bugs and easier maintenance.
Compounding leverage
Once a rule is consistently green, adding more is easy—just remember to run a dedicated prompt that cross‑checks new rules against the existing set. Consistency + non‑contradiction are non‑negotiable for good results.
For example: you might start by creating a small set of base rules that define your project’s folder and file structure (e.g., “all feature slices must live under src/features/ and include a README.md, index.tsx, and styles.css”). Once those structural rules pass validation, you then introduce additional rules—say, conventions around naming components or importing shared utilities—and run a cross-validation prompt to ensure these new rules reference only the directories and files defined by your base rules, preventing any conflicts or ambiguities.
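As a rough sketch of what that cross-check could look like if you scripted it rather than prompting for it—assuming a hypothetical rule schema with `allowed_dirs` and `referenced_dirs` fields, which is not the template's actual format:

```ts
// Hedged sketch: verify a new rule only references directories the base
// structural rules define. The schema and file names are invented for this example.
import { readFileSync } from "node:fs";
import { load } from "js-yaml";

interface BaseRules { allowed_dirs: string[] }
interface NewRule { name: string; referenced_dirs: string[] }

const base = load(readFileSync("rules/base-structure.yaml", "utf8")) as BaseRules;
const candidate = load(readFileSync("rules/naming-conventions.yaml", "utf8")) as NewRule;

// Flag any directory the new rule mentions that the base rules never defined.
const unknown = candidate.referenced_dirs.filter(
  dir => !base.allowed_dirs.some(allowed => dir.startsWith(allowed)),
);

if (unknown.length > 0) {
  throw new Error(
    `Rule "${candidate.name}" references directories outside the base structure: ${unknown.join(", ")}`,
  );
}
```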
Example project in numbers
I spent roughly two days (around 8 hours each day, including breaks) working in a semi-manual loop of prompt → result → next task. Of that time, about 80 % was devoted to generating code, 15 % to validation and fixes, and the remaining 5 % to identifying and immediately applying enhancements to the rule set. The result is a sizable dummy app featuring a type-safe API layer, clean FSD (Feature-Sliced Design) folders, passing tests, and up-to-date documentation—a solid proof that the ecosystem holds under real pressure.
Interactive Review Loop
The ruleset naturally evolved into more than a generator; it became an interactive specification. I could manually prototype a feature, then enter a review loop with the AI, instructing it to "refactor this file to match our conventions." This proved incredibly effective for ensuring consistency without sacrificing manual flexibility.
For example, if I accidentally create a component by hand in a components/ folder—which shouldn’t exist—I can ask the AI to review and fix it. The AI will detect the misplaced component and relocate it into the correct directory (e.g., under features/ or widgets/), updating imports and documentation as needed.
Tips on Generating Rules with AI
If you use an AI to help write your rules, I found a three-step process is essential for success:
1. **Compress**: AI-written prompts are often verbose. By carefully editing them for conciseness, I was able to reduce the token count by an average of 61% across my rules without losing any semantic meaning. For example, before:
name: "Security Vulnerability Detection and Prevention" description: | This rule implements comprehensive security scanning and validation to identify and prevent common web application vulnerabilities. It integrates with the FSD architecture methodology to ensure security considerations are properly addressed at each layer. benefits: - "**Proactive Security**: Early detection of security vulnerabilities" - "**Comprehensive Coverage**: Scans for multiple vulnerability types" - "**Development Integration**: Seamless workflow integration" - "**Code Quality**: Maintains high security standards" - "**Risk Mitigation**: Reduces production security incidents" - "**Compliance Support**: Helps meet security requirements" flagged_patterns: xss_vulnerabilities: description: "Cross-site scripting vulnerability patterns" examples: unescaped_html: | // ❌ BAD - Dangerous innerHTML usage <div dangerouslySetInnerHTML={{__html: userInput}} /> // ❌ BAD - Direct DOM manipulation with user data element.innerHTML = userData;
And after:

```yaml
name: "Security Vulnerability Detection"
description: "Detects common web vulnerabilities in FSD architecture"
flagged_patterns:
  xss_vulnerabilities:
    - "dangerouslySetInnerHTML with user input"
    - "innerHTML with unescaped data"
    - "document.write() usage"
  injection_vulnerabilities:
    - "eval() with dynamic content"
    - "SQL queries with string concatenation"
    - "Unsafe regex patterns"
```
2. **Cross-validate**: This is critical. You must check new rules against existing ones to ensure they don't create contradictions. A conflicting rule-set is a recipe for chaotic output.
3. **Test in Practice**: A rule that looks perfect on paper can fail in execution. Always run it through a real-world scenario to see the actual output.
Anatomy of the Template
The core idea isn’t a particular stack—it’s a mesh of principles and mutually reinforcing rules. Remix + Tailwind are tactical choices, but the pattern is supposed to work with any modern framework.
For this proof-of-concept, I chose a specific set of technologies to test the principles in a real-world scenario. Each choice was made to maximize the chances of getting clean, predictable output from the AI:
Stack & Structure
| Ingredient | Why it matters |
| --- | --- |
| Remix + TypeScript | Remix provides first-class routing and a data layer out of the box, SSR when you need it, and a compiler-less mindset that keeps bundle sizes sane. The strict type-safety from TypeScript is not a luxury, but a fundamental requirement for maintaining quality with AI-generated code. |
| Tailwind CSS + shadcn/ui | The atomic nature of Tailwind is a perfect match for GenAI. An LLM can easily compose complex interfaces from small, understandable building blocks, which means almost no time is wasted on routine styling. Using shadcn/ui on top provides accessible primitives, ensuring the generated UI is high-quality from the start. |
| Feature-Sliced Design (FSD) | Clear vertical slices (`entities/`, `features/`, `widgets/`) enforce SRP, scale cleanly for big teams, and pair perfectly with DI (dependency injection) rules—AI can't "accidentally" reach across layers. |
| Platform-agnostic rules | Rules live in neutral YAML. A conversion script then translates them into platform-specific formats. For instance, they can be converted into rules for Cursor, or prompts for Claude, Copilot, and Windsurf. |
| Plop.js generators | Plop.js offers deterministic scaffolding based on templates. This is crucial as it creates a predictable file structure every time, forcing the AI to only fill in specific logic instead of generating boilerplate. This approach guarantees architectural consistency and significantly reduces token costs. |
| RTK Query | Any state/query lib would work; RTKQ ships cache, tagging, polling, and optimistic updates out of the box, so AI focuses on business logic, not plumbing (see the sketch after this table). |
| Testing Suite | Jest + Playwright + jest-axe. The concrete libs are less important than the mandate: AI must output tests, period. |
| ESLint & TS strict mode | Acts as the final quality gate. ESLint enforces not only standard code style but also custom architectural rules, such as ensuring Feature-Sliced Design (FSD) boundaries are respected. TypeScript provides strict type-safety, an industry-standard practice that is absolutely essential for validating AI-generated code. |
How AI Fits In
The template assumes that LLM prompts do almost all the mechanical work while humans supervise and merge. Four core practices make that possible:
Plop‑powered scaffolding
Every new feature or component is generated through a Plop command backed by Handlebars templates. The model never creates folders by hand—it only fills in logic. This keeps token usage low, generation fast, and the file tree perpetually aligned with the architecture.
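For readers unfamiliar with Plop, a generator of this kind might look roughly like the sketch below. The generator name, prompts, and template paths are illustrative, not the template's actual plopfile.

```ts
// plopfile.ts — a minimal sketch; the real template's generators and
// Handlebars templates are more elaborate.
import type { NodePlopAPI } from "plop";

export default function (plop: NodePlopAPI) {
  plop.setGenerator("feature", {
    description: "Scaffold an FSD feature slice",
    prompts: [
      { type: "input", name: "name", message: "Feature name (e.g. products-list):" },
    ],
    actions: [
      // Each file comes from a Handlebars template, so the structure is
      // deterministic; the AI only fills in business logic afterwards.
      { type: "add", path: "app/features/{{kebabCase name}}/index.tsx", templateFile: "plop-templates/feature/index.tsx.hbs" },
      { type: "add", path: "app/features/{{kebabCase name}}/README.md", templateFile: "plop-templates/feature/README.md.hbs" },
      { type: "add", path: "app/features/{{kebabCase name}}/{{pascalCase name}}.test.tsx", templateFile: "plop-templates/feature/test.tsx.hbs" },
    ],
  });
}
```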
Living README files
Each slice of the codebase owns a `README.md`. The upper half is written for humans; the lower half contains meta-instructions for the AI. Before making any code changes, the model must read that README, and after a change, it must update it. I specifically asked Claude Sonnet 4 to evaluate the usefulness of this README approach, and it rated the pattern 9/10 for improving accuracy and reducing hallucinations.
Hands‑free lint & type loop
After finishing a task, the model triggers `eslint --fix` and `tsc --noEmit`. If either fails, the same session patches the errors before a human even opens the PR. Result: style, a11y, and type issues rarely reach code review.
Rule‑driven orchestration
All of the above are wired together through platform‑agnostic YAML rules: scaffold → write code → lint → doc update. Each new prompt inherits a clean, validated context, so the codebase tends to improve with time instead of degrading.
In effect, the rules guide and constrain the LLM into producing tidy, decomposed code, and that tidy code then makes every next AI session easier. The system becomes a feedback loop that keeps large projects from collapsing under their own weight.
Next steps
While this GenAI-native Remix template already delivers reliable, high-quality results, it remains a work in progress. Further tuning of rule granularity, performance optimizations in the orchestration scripts, and expanded coverage for edge-case scenarios will be essential to unlock its full potential. I look forward to refining these guardrails and sharing future improvements as the ecosystem continues to evolve.
Try the concept yourself
You can find the repository here: https://github.com/tohachan/remix-genai-template
1. Clone the Template
Use `degit` to clone the repository into a new folder without the Git history.
```bash
npx degit tohachan/remix-genai-template my-genai-app
```
2. Install Dependencies
Navigate into your new project folder and install the necessary packages.
```bash
cd my-genai-app
npm install
```
3. Initialize AI Rules
Run the init script. This will convert the agnostic YAML rules into a specific format for your chosen AI development environment.
```bash
npm run init:rules
```
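Conceptually, the conversion is just a mapping from the neutral YAML into whatever format the target tool expects. Here is a hedged sketch for the Cursor case; the file layout and rule fields are assumptions, not the repository's actual script.

```ts
// scripts/init-rules.ts — illustrative only; the template's real conversion
// script and rule schema may differ.
import { mkdirSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { load } from "js-yaml";

interface Rule { name: string; description: string; body: string }

mkdirSync(".cursor/rules", { recursive: true });

for (const file of readdirSync("rules").filter(f => f.endsWith(".yaml"))) {
  const rule = load(readFileSync(`rules/${file}`, "utf8")) as Rule;
  // Cursor reads project rules from .cursor/rules/*.mdc files with frontmatter.
  const mdc = `---\ndescription: ${rule.description}\nalwaysApply: true\n---\n\n${rule.body}\n`;
  writeFileSync(`.cursor/rules/${file.replace(/\.yaml$/, ".mdc")}`, mdc);
}
```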
4. Select Your Environment & Go!
You will be prompted in the terminal to choose your target environment (e.g., Cursor, Claude, etc.). Once the script finishes, simply open the project folder in your selected tool (for example, Open Folder in Cursor), and you're ready to start building with AI guardrails!
Questions? Ideas? Drop a comment or open an issue—would love to see what you build! 😄